summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc8684.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc8684.txt')
-rw-r--r--doc/rfc/rfc8684.txt3795
1 files changed, 3795 insertions, 0 deletions
diff --git a/doc/rfc/rfc8684.txt b/doc/rfc/rfc8684.txt
new file mode 100644
index 0000000..481f5ae
--- /dev/null
+++ b/doc/rfc/rfc8684.txt
@@ -0,0 +1,3795 @@
+
+
+
+
+Internet Engineering Task Force (IETF) A. Ford
+Request for Comments: 8684 Pexip
+Obsoletes: 6824 C. Raiciu
+Category: Standards Track U. Politehnica of Bucharest
+ISSN: 2070-1721 M. Handley
+ U. College London
+ O. Bonaventure
+ U. catholique de Louvain
+ C. Paasch
+ Apple, Inc.
+ March 2020
+
+
+ TCP Extensions for Multipath Operation with Multiple Addresses
+
+Abstract
+
+ TCP/IP communication is currently restricted to a single path per
+ connection, yet multiple paths often exist between peers. The
+ simultaneous use of these multiple paths for a TCP/IP session would
+ improve resource usage within the network and thus improve user
+ experience through higher throughput and improved resilience to
+ network failure.
+
+ Multipath TCP provides the ability to simultaneously use multiple
+ paths between peers. This document presents a set of extensions to
+ traditional TCP to support multipath operation. The protocol offers
+ the same type of service to applications as TCP (i.e., a reliable
+ bytestream), and it provides the components necessary to establish
+ and use multiple TCP flows across potentially disjoint paths.
+
+ This document specifies v1 of Multipath TCP, obsoleting v0 as
+ specified in RFC 6824, through clarifications and modifications
+ primarily driven by deployment experience.
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 7841.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ https://www.rfc-editor.org/info/rfc8684.
+
+Copyright Notice
+
+ Copyright (c) 2020 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (https://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction
+ 1.1. Design Assumptions
+ 1.2. Multipath TCP in the Networking Stack
+ 1.3. Terminology
+ 1.4. MPTCP Concept
+ 1.5. Requirements Language
+ 2. Operation Overview
+ 2.1. Initiating an MPTCP Connection
+ 2.2. Associating a New Subflow with an Existing MPTCP Connection
+ 2.3. Informing the Other Host about Another Potential Address
+ 2.4. Data Transfer Using MPTCP
+ 2.5. Requesting a Change in a Path's Priority
+ 2.6. Closing an MPTCP Connection
+ 2.7. Notable Features
+ 3. MPTCP Operations: An Overview
+ 3.1. Connection Initiation
+ 3.2. Starting a New Subflow
+ 3.3. MPTCP Operation and Data Transfer
+ 3.3.1. Data Sequence Mapping
+ 3.3.2. Data Acknowledgments
+ 3.3.3. Closing a Connection
+ 3.3.4. Receiver Considerations
+ 3.3.5. Sender Considerations
+ 3.3.6. Reliability and Retransmissions
+ 3.3.7. Congestion Control Considerations
+ 3.3.8. Subflow Policy
+ 3.4. Address Knowledge Exchange (Path Management)
+ 3.4.1. Address Advertisement
+ 3.4.2. Remove Address
+ 3.5. Fast Close
+ 3.6. Subflow Reset
+ 3.7. Fallback
+ 3.8. Error Handling
+ 3.9. Heuristics
+ 3.9.1. Port Usage
+ 3.9.2. Delayed Subflow Start and Subflow Symmetry
+ 3.9.3. Failure Handling
+ 4. Semantic Issues
+ 5. Security Considerations
+ 6. Interactions with Middleboxes
+ 7. IANA Considerations
+ 7.1. TCP Option Kind Numbers
+ 7.2. MPTCP Option Subtypes
+ 7.3. MPTCP Handshake Algorithms
+ 7.4. MP_TCPRST Reason Codes
+ 8. References
+ 8.1. Normative References
+ 8.2. Informative References
+ Appendix A. Notes on Use of TCP Options
+ Appendix B. TCP Fast Open and MPTCP
+ B.1. TFO Cookie Request with MPTCP
+ B.2. Data Sequence Mapping under TFO
+ B.3. Connection Establishment Examples
+ Appendix C. Control Blocks
+ C.1. MPTCP Control Block
+ C.1.1. Authentication and Metadata
+ C.1.2. Sending Side
+ C.1.3. Receiving Side
+ C.2. TCP Control Blocks
+ C.2.1. Sending Side
+ C.2.2. Receiving Side
+ Appendix D. Finite State Machine
+ Appendix E. Changes from RFC 6824
+ Acknowledgments
+ Authors' Addresses
+
+1. Introduction
+
+ Multipath TCP (MPTCP) is a set of extensions to regular TCP [RFC0793]
+ to provide a Multipath TCP service [RFC6182], which enables a
+ transport connection to operate across multiple paths simultaneously.
+ This document presents the protocol changes required to add multipath
+ capability to TCP -- specifically, those for signaling and setting up
+ multiple paths ("subflows"), managing these subflows, reassembly of
+ data, and termination of sessions. This is not the only information
+ required to create a Multipath TCP implementation, however. This
+ document is complemented by three others:
+
+ * [RFC6182] (MPTCP architecture), which explains the motivations
+ behind Multipath TCP, contains a discussion of high-level design
+ decisions on which this design is based, and provides an
+ explanation of a functional separation through which an extensible
+ MPTCP implementation can be developed.
+
+ * [RFC6356] (congestion control), which presents a safe congestion
+ control algorithm for coupling the behavior of the multiple paths
+ in order to "do no harm" to other network users.
+
+ * [RFC6897] (application considerations), which discusses what
+ impact MPTCP will have on applications, what applications will
+ want to do with MPTCP, and as a consequence of these factors, what
+ API extensions an MPTCP implementation should present.
+
+ This document obsoletes the v0 specification of Multipath TCP
+ [RFC6824]. This document specifies MPTCP v1, which is not backward
+ compatible with MPTCP v0. This document additionally defines version
+ negotiation procedures for implementations that support both
+ versions.
+
+1.1. Design Assumptions
+
+ In order to limit the potentially huge design space, the MPTCP
+ Working Group imposed two key constraints on the Multipath TCP design
+ presented in this document:
+
+ * It must be backward compatible with current, regular TCP, to
+ increase its chances of deployment.
+
+ * It can be assumed that one or both hosts are multihomed and
+ multiaddressed.
+
+ To simplify the design, we assume that the presence of multiple
+ addresses at a host is sufficient to indicate the existence of
+ multiple paths. These paths need not be entirely disjoint: they may
+ share one or many routers between them. Even in such a situation,
+ making use of multiple paths is beneficial, improving resource
+ utilization and resilience to a subset of node failures. The
+ congestion control algorithm defined in [RFC6356] ensures that the
+ use of multiple paths does not act detrimentally. Furthermore, there
+ may be some scenarios where different TCP ports on a single host can
+ provide disjoint paths (such as through certain Equal-Cost Multipath
+ (ECMP) implementations [RFC2992]), and so the MPTCP design also
+ supports the use of ports in path identifiers.
+
+ There are three aspects to the backward compatibility listed above
+ (discussed in more detail in [RFC6182]):
+
+ External Constraints: The protocol must function through the vast
+ majority of existing middleboxes such as NATs, firewalls, and
+ proxies, and as such must resemble existing TCP as far as possible
+ on the wire. Furthermore, the protocol must not assume that the
+ segments it sends on the wire arrive unmodified at the
+ destination: they may be split or coalesced; TCP options may be
+ removed or duplicated.
+
+ Application Constraints: The protocol must be usable with no change
+ to existing applications that use the common TCP API (although it
+ is reasonable that not all features would be available to such
+ legacy applications). Furthermore, the protocol must provide the
+ same service model as regular TCP to the application.
+
+ Fallback: The protocol should be able to fall back to standard TCP
+ with no interference from the user, to be able to communicate with
+ legacy hosts.
+
+ The complementary application considerations document [RFC6897]
+ discusses the necessary features of an API to provide backward
+ compatibility, as well as API extensions to convey the behavior of
+ MPTCP at a level of control and information equivalent to that
+ available with regular, single-path TCP.
+
+ Further discussion of the design constraints and associated design
+ decisions is given in the MPTCP architecture document [RFC6182] and
+ in [howhard].
+
+1.2. Multipath TCP in the Networking Stack
+
+ MPTCP operates at the transport layer and aims to be transparent to
+ both higher and lower layers. It is a set of additional features on
+ top of standard TCP; Figure 1 illustrates this layering. MPTCP is
+ designed to be usable by legacy applications with no changes;
+ detailed discussion of its interactions with applications is given in
+ [RFC6897].
+
+ +-------------------------------+
+ | Application |
+ +---------------+ +-------------------------------+
+ | Application | | MPTCP |
+ +---------------+ + - - - - - - - + - - - - - - - +
+ | TCP | | Subflow (TCP) | Subflow (TCP) |
+ +---------------+ +-------------------------------+
+ | IP | | IP | IP |
+ +---------------+ +-------------------------------+
+
+ Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks
+
+1.3. Terminology
+
+ This document makes use of a number of terms that are either MPTCP
+ specific or have defined meaning in the context of MPTCP, as follows:
+
+ Path: A sequence of links between a sender and a receiver, defined
+ in this context by a 4-tuple of source and destination
+ address/port pairs.
+
+ Subflow: A flow of TCP segments operating over an individual path,
+ which forms part of a larger MPTCP connection. A subflow is
+ started and terminated similarly to a regular TCP connection.
+
+ (MPTCP) Connection: A set of one or more subflows, over which an
+ application can communicate between two hosts. There is a
+ one-to-one mapping between a connection and an application socket.
+
+ Data-level: The payload data is nominally transferred over a
+ connection, which in turn is transported over subflows. Thus, the
+ term "data-level" is synonymous with "connection-level", in
+ contrast to "subflow-level", which refers to properties of an
+ individual subflow.
+
+ Token: A locally unique identifier given to a multipath connection
+ by a host. May also be referred to as a "Connection ID".
+
+ Host: An end host operating an MPTCP implementation, and either
+ initiating or accepting an MPTCP connection.
+
+ In addition to these terms, note that MPTCP's interpretation of, and
+ effect on, regular single-path TCP semantics are discussed in
+ Section 4.
+
+1.4. MPTCP Concept
+
+ This section provides a high-level summary of normal operation of
+ MPTCP; this type of scenario is illustrated in Figure 2. A detailed
+ description of how MPTCP operates is given in Section 3.
+
+ Host A Host B
+ ------------------------ ------------------------
+ Address A1 Address A2 Address B1 Address B2
+ ---------- ---------- ---------- ----------
+ | | | |
+ | (initial connection setup) | |
+ |----------------------------------->| |
+ |<-----------------------------------| |
+ | | | |
+ | (additional subflow setup) |
+ | |--------------------->| |
+ | |<---------------------| |
+ | | | |
+ | | | |
+
+ Figure 2: Example MPTCP Usage Scenario
+
+ * To a non-MPTCP-aware application, MPTCP will behave the same as
+ normal TCP. Extended APIs could provide additional control to
+ MPTCP-aware applications [RFC6897]. An application begins by
+ opening a TCP socket in the normal way. MPTCP signaling and
+ operation are handled by the MPTCP implementation.
+
+ * An MPTCP connection begins similarly to a regular TCP connection.
+ This is illustrated in Figure 2, where an MPTCP connection is
+ established between addresses A1 and B1 on Hosts A and B,
+ respectively.
+
+ * If extra paths are available, additional TCP sessions (termed
+ MPTCP "subflows") are created on these paths and are combined with
+ the existing session, which continues to appear as a single
+ connection to the applications at both ends. The creation of the
+ additional TCP session is illustrated between Address A2 on Host A
+ and Address B1 on Host B.
+
+ * MPTCP identifies multiple paths by the presence of multiple
+ addresses at hosts. Combinations of these multiple addresses
+ equate to the additional paths. In the example, other potential
+ paths that could be set up are A1<->B2 and A2<->B2. Although this
+ additional session is shown as being initiated from A2, it could
+ equally have been initiated from B1 or B2.
+
+ * The discovery and setup of additional subflows will be achieved
+ through a path management method; this document describes a
+ mechanism by which a host can initiate new subflows by using its
+ own additional addresses or by signaling its available addresses
+ to the other host.
+
+ * MPTCP adds connection-level sequence numbers to allow the
+ reassembly of segments arriving on multiple subflows with
+ differing network delays.
+
+ * Subflows are terminated as regular TCP connections, with a
+ four-way FIN handshake. The MPTCP connection is terminated by a
+ connection-level FIN.
+
+1.5. Requirements Language
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in
+ BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
+ capitals, as shown here.
+
+2. Operation Overview
+
+ This section presents a single description of common MPTCP operation,
+ with reference to the protocol operation. This is a high-level
+ overview of the key functions; the full specification follows in
+ Section 3. Extensibility and negotiated features are not discussed
+ here. Considerable reference is made to symbolic names of MPTCP
+ options throughout this section -- these are subtypes of the
+ IANA-assigned MPTCP option (see Section 7), and their formats are
+ defined in the detailed protocol specification provided in Section 3.
+
+ A Multipath TCP connection provides a bidirectional bytestream
+ between two hosts communicating like normal TCP and thus does not
+ require any change to the applications. However, Multipath TCP
+ enables the hosts to use different paths with different IP addresses
+ to exchange packets belonging to the MPTCP connection. A Multipath
+ TCP connection appears like a normal TCP connection to an
+ application. However, to the network layer, each MPTCP subflow looks
+ like a regular TCP flow whose segments carry a new TCP option type.
+ Multipath TCP manages the creation, removal, and utilization of these
+ subflows to send data. The number of subflows that are managed
+ within a Multipath TCP connection is not fixed, and it can fluctuate
+ during the lifetime of the Multipath TCP connection.
+
+ All MPTCP operations are signaled with a TCP option -- a single
+ numerical type for MPTCP, with "subtypes" for each MPTCP message.
+ What follows is a summary of the purpose and rationale of these
+ messages.
+
+2.1. Initiating an MPTCP Connection
+
+ This is the same signaling as for initiating a normal TCP connection,
+ but the SYN, SYN/ACK, and initial ACK (and data) packets also carry
+ the MP_CAPABLE option. This option has a variable length and serves
+ multiple purposes. Firstly, it verifies whether the remote host
+ supports Multipath TCP; secondly, this option allows the hosts to
+ exchange some information to authenticate the establishment of
+ additional subflows. Further details are given in Section 3.1.
+
+ Host A Host B
+ ------ ------
+ MP_CAPABLE ->
+ [flags]
+ <- MP_CAPABLE
+ [B's key, flags]
+ ACK + MP_CAPABLE (+ data) ->
+ [A's key, B's key, flags, (data-level details)]
+
+ Retransmission of the ACK + MP_CAPABLE can occur if it is not known
+ if it has been received. The following diagrams show all possible
+ exchanges for the initial subflow setup to ensure this reliability.
+
+ Host A (with data to send immediately) Host B
+ ------ ------
+ MP_CAPABLE ->
+ [flags]
+ <- MP_CAPABLE
+ [B's key, flags]
+ ACK + MP_CAPABLE + data ->
+ [A's key, B's key, flags, data-level details]
+
+
+ Host A (with data to send later) Host B
+ ------ ------
+ MP_CAPABLE ->
+ [flags]
+ <- MP_CAPABLE
+ [B's key, flags]
+ ACK + MP_CAPABLE ->
+ [A's key, B's key, flags]
+
+ ACK + MP_CAPABLE + data ->
+ [A's key, B's key, flags, data-level details]
+
+
+ Host A Host B (sending first)
+ ------ ------
+ MP_CAPABLE ->
+ [flags]
+ <- MP_CAPABLE
+ [B's key, flags]
+ ACK + MP_CAPABLE ->
+ [A's key, B's key, flags]
+
+ <- ACK + DSS + data
+ [data-level details]
+
+2.2. Associating a New Subflow with an Existing MPTCP Connection
+
+ The exchange of keys in the MP_CAPABLE handshake provides material
+ that can be used to authenticate the endpoints when new subflows will
+ be set up. Additional subflows begin in the same way as initiating a
+ normal TCP connection, but the SYN, SYN/ACK, and ACK packets also
+ carry the MP_JOIN option.
+
+ Host A initiates a new subflow between one of its addresses and one
+ of Host B's addresses. The token -- generated from the key -- is
+ used to identify which MPTCP connection it is joining, and the
+ Hash-based Message Authentication Code (HMAC) is used for
+ authentication. The HMAC uses the keys exchanged in the MP_CAPABLE
+ handshake and the random numbers (nonces) exchanged in these MP_JOIN
+ options. MP_JOIN also contains flags and an Address ID that can be
+ used to refer to the source address without the sender needing to
+ know if it has been changed by a NAT. Further details are given in
+ Section 3.2.
+
+ Host A Host B
+ ------ ------
+ MP_JOIN ->
+ [B's token, A's nonce,
+ A's Address ID, flags]
+ <- MP_JOIN
+ [B's HMAC, B's nonce,
+ B's Address ID, flags]
+ ACK + MP_JOIN ->
+ [A's HMAC]
+
+ <- ACK
+
+2.3. Informing the Other Host about Another Potential Address
+
+ The set of IP addresses associated to a multihomed host may change
+ during the lifetime of an MPTCP connection. MPTCP supports the
+ addition and removal of addresses on a host both implicitly and
+ explicitly. If Host A has established a subflow starting at
+ address/port pair IP#-A1 and wants to open a second subflow starting
+ at address/port pair IP#-A2, it simply initiates the establishment of
+ the subflow as explained above. The remote host will then be
+ implicitly informed about the new address.
+
+ In some circumstances, a host may want to advertise to the remote
+ host the availability of an address without establishing a new
+ subflow -- for example, when a NAT prevents setup in one direction.
+ In the example below, Host A informs Host B about its alternative
+ IP address/port pair (IP#-A2). Host B may later send an MP_JOIN to
+ this new address. The ADD_ADDR option contains an HMAC to
+ authenticate the address as having been sent from the originator of
+ the connection. The receiver of this option echoes it back to the
+ client to indicate successful receipt. Further details are given in
+ Section 3.4.1.
+
+ Host A Host B
+ ------ ------
+ ADD_ADDR ->
+ [Echo-flag=0,
+ IP#-A2,
+ IP#-A2's Address ID,
+ HMAC of IP#-A2]
+
+ <- ADD_ADDR
+ [Echo-flag=1,
+ IP#-A2,
+ IP#-A2's Address ID,
+ HMAC of IP#-A2]
+
+ There is a corresponding signal for address removal, making use of
+ the Address ID that is signaled in the ADD_ADDR handshake. Further
+ details are given in Section 3.4.2.
+
+ Host A Host B
+ ------ ------
+ REMOVE_ADDR ->
+ [IP#-A2's Address ID]
+
+2.4. Data Transfer Using MPTCP
+
+ To ensure reliable, in-order delivery of data over subflows that may
+ appear and disappear at any time, MPTCP uses a 64-bit Data Sequence
+ Number (DSN) to number all data sent over the MPTCP connection. Each
+ subflow has its own 32-bit sequence number space, utilizing the
+ regular TCP sequence number header, and an MPTCP option maps the
+ subflow sequence space to the data sequence space. In this way, data
+ can be retransmitted on different subflows (mapped to the same DSN)
+ in the event of failure.
+
+ The Data Sequence Signal (DSS) carries the Data Sequence Mapping.
+ The Data Sequence Mapping consists of the subflow sequence number,
+ data sequence number, and length for which this mapping is valid.
+ This option can also carry a connection-level acknowledgment (the
+ "Data ACK") for the received DSN.
+
+ With MPTCP, all subflows share the same receive buffer and advertise
+ the same receive window. There are two levels of acknowledgment in
+ MPTCP. Regular TCP acknowledgments are used on each subflow to
+ acknowledge the reception of the segments sent over the subflow
+ independently of their DSN. In addition, there are connection-level
+ acknowledgments for the data sequence space. These acknowledgments
+ track the advancement of the bytestream and slide the receive window.
+
+ Further details are given in Section 3.3.
+
+ Host A Host B
+ ------ ------
+ DSS ->
+ [Data Sequence Mapping]
+ [Data ACK]
+ [Checksum]
+
+2.5. Requesting a Change in a Path's Priority
+
+ Hosts can indicate at initial subflow setup whether they wish the
+ subflow to be used as a regular or backup path -- a backup path only
+ being used if there are no regular paths available. During a
+ connection, Host A can request a change in the priority of a subflow
+ through the MP_PRIO signal to Host B. Further details are given in
+ Section 3.3.8.
+
+ Host A Host B
+ ------ ------
+ MP_PRIO ->
+
+2.6. Closing an MPTCP Connection
+
+ When a host wants to close an existing subflow but not the whole
+ connection, it can initiate a regular TCP FIN/ACK exchange.
+
+ When Host A wants to inform Host B that it has no more data to send,
+ it signals this "Data FIN" as part of the DSS (see above). It has
+ the same semantics and behavior as a regular TCP FIN, but at the
+ connection level. Once all the data on the MPTCP connection has been
+ successfully received, this message is acknowledged at the connection
+ level with a Data ACK. Further details are given in Section 3.3.3.
+
+ Host A Host B
+ ------ ------
+ DSS ->
+ [Data FIN]
+ <- DSS
+ [Data ACK]
+
+ There is an additional method of connection closure, referred to as
+ "Fast Close", which is analogous to closing a single-path TCP
+ connection with a RST signal. The MP_FASTCLOSE signal is used to
+ indicate to the peer that the connection will be abruptly closed and
+ no data will be accepted anymore. This can be used on an ACK (which
+ ensures reliability of the signal) or a RST (which does not). Both
+ examples are shown in the following diagrams. Further details are
+ given in Section 3.5.
+
+ Host A Host B
+ ------ ------
+ ACK + MP_FASTCLOSE ->
+ [B's key]
+
+ [RST on all other subflows] ->
+
+ <- [RST on all subflows]
+
+
+ Host A Host B
+ ------ ------
+ RST + MP_FASTCLOSE ->
+ [B's key] [on all subflows]
+
+ <- [RST on all subflows]
+
+2.7. Notable Features
+
+ It is worth highlighting that MPTCP's signaling has been designed
+ with several key requirements in mind:
+
+ * To cope with NATs on the path, addresses are referred to by
+ Address IDs, in case the IP packet's source address gets changed
+ by a NAT. Setting up a new TCP flow is not possible if the
+ receiver of the SYN is behind a NAT; to allow subflows to be
+ created when either end is behind a NAT, MPTCP uses the ADD_ADDR
+ message.
+
+ * MPTCP falls back to ordinary TCP if MPTCP operation is not
+ possible -- for example, if one host is not MPTCP capable or if a
+ middlebox alters the payload. This is discussed in Section 3.7.
+
+ * To address the threats identified in [RFC6181], the following
+ steps are taken: keys are sent in the clear in the MP_CAPABLE
+ messages; MP_JOIN messages are secured with HMAC-SHA256 ([RFC2104]
+ using the algorithm in [RFC6234]) using those keys; and standard
+ TCP validity checks are made on the other messages (ensuring that
+ sequence numbers are in-window [RFC5961]). Residual threats to
+ MPTCP v0 were identified in [RFC7430], and those affecting the
+ protocol (i.e., modifications to ADD_ADDR) have been incorporated
+ in this document. Further discussion of security can be found in
+ Section 5.
+
+3. MPTCP Operations: An Overview
+
+ This section describes the operation of MPTCP. The subsections below
+ discuss each key part of the protocol operation.
+
+ All MPTCP operations are signaled using optional TCP header fields.
+ A single TCP option number ("Kind") has been assigned by IANA for
+ MPTCP (see Section 7), and then individual messages will be
+ determined by a "subtype", the values of which are also stored in an
+ IANA registry (and are also listed in Section 7). As with all TCP
+ options, the Length field is specified in bytes and includes the
+ 2 bytes of Kind and Length.
+
+ Throughout this document, when reference is made to an MPTCP option
+ by symbolic name, such as "MP_CAPABLE", this refers to a TCP option
+ with the single MPTCP option type, and with the subtype value of the
+ symbolic name as defined in Section 7. This subtype is a 4-bit field
+ -- the first 4 bits of the option payload, as shown in Figure 3. The
+ MPTCP messages are defined in the following sections.
+
+ 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-----------------------+
+ | Kind | Length |Subtype| |
+ +---------------+---------------+-------+ |
+ | Subtype-specific data |
+ | (variable length) |
+ +---------------------------------------------------------------+
+
+ Figure 3: MPTCP Option Format
+
+ Those MPTCP options associated with subflow initiation are used on
+ packets with the SYN flag set. Additionally, there is one MPTCP
+ option for signaling metadata to ensure that segmented data can be
+ recombined for delivery to the application.
+
+ The remaining options, however, are signals that do not need to be on
+ a specific packet, such as those for signaling additional addresses.
+ While an implementation may desire to send MPTCP options as soon as
+ possible, it may not be possible to combine all desired options (both
+ those for MPTCP and for regular TCP, such as SACK (selective
+ acknowledgment) [RFC2018]) on a single packet. Therefore, an
+ implementation may choose to send duplicate ACKs containing the
+ additional signaling information. This changes the semantics of a
+ duplicate ACK; these are usually only sent as a signal of a lost
+ segment [RFC5681] in regular TCP. Therefore, an MPTCP implementation
+ receiving a duplicate ACK that contains an MPTCP option MUST NOT
+ treat it as a signal of congestion. Additionally, an MPTCP
+ implementation SHOULD NOT send more than two duplicate ACKs in a row
+ for the purposes of sending MPTCP options alone, in order to ensure
+ that no middleboxes misinterpret this as a sign of congestion.
+
+ Furthermore, standard TCP validity checks (such as ensuring that the
+ sequence number and acknowledgment number are within the window) MUST
+ be undertaken before processing any MPTCP signals, as described in
+ [RFC5961], and initial subflow sequence numbers SHOULD be generated
+ according to the recommendations in [RFC6528].
+
+3.1. Connection Initiation
+
+ Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a
+ single path. Each packet contains the Multipath Capable (MP_CAPABLE)
+ MPTCP option (Figure 4). This option declares its sender capable of
+ performing Multipath TCP and wishes to do so on this particular
+ connection.
+
+ 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-------+---------------+
+ | Kind | Length |Subtype|Version|A|B|C|D|E|F|G|H|
+ +---------------+---------------+-------+-------+---------------+
+ | Option Sender's Key (64 bits) |
+ | (if option Length > 4) |
+ | |
+ +---------------------------------------------------------------+
+ | Option Receiver's Key (64 bits) |
+ | (if option Length > 12) |
+ | |
+ +-------------------------------+-------------------------------+
+ | Data-Level Length (16 bits) | Checksum (16 bits, optional) |
+ +-------------------------------+-------------------------------+
+
+ Figure 4: Multipath Capable (MP_CAPABLE) Option
+
+ The MP_CAPABLE exchange in this specification (v1) is different than
+ that specified in v0. If a host supports multiple versions of MPTCP,
+ the sender of the MP_CAPABLE option SHOULD signal the highest version
+ number it supports. In return, in its MP_CAPABLE option, the
+ receiver will signal the version number it wishes to use, which MUST
+ be equal to or lower than the version number indicated in the initial
+ MP_CAPABLE. There is a caveat, though, with respect to this version
+ negotiation with old listeners that only support v0. A listener that
+ supports v0 expects that the MP_CAPABLE option in the SYN segment
+ will include the initiator's key. If, however, the initiator already
+ upgraded to v1, it won't include the key in the SYN segment. Thus,
+ the listener will ignore the MP_CAPABLE of this SYN segment and reply
+ with a SYN/ACK that does not include an MP_CAPABLE. The initiator
+ MAY choose to immediately fall back to TCP or MAY choose to attempt a
+ connection using MPTCP v0 (if the initiator supports v0), in order to
+ discover whether the listener supports the earlier version of MPTCP.
+ In general, an MPTCP v0 connection will likely be preferred over a
+ TCP connection; however, in a particular deployment scenario, it may
+ be known that the listener is unlikely to support MPTCP v0 and so the
+ initiator may prefer not to attempt a v0 connection. An initiator
+ MAY cache information for a peer about what version of MPTCP it
+ supports, if any, and use this information for future connection
+ attempts.
+
+ The MP_CAPABLE option is of variable length, with different fields
+ included, depending on which packet the option is used on. The full
+ MP_CAPABLE option is shown in Figure 4.
+
+ The MP_CAPABLE option is carried on the SYN, SYN/ACK, and ACK packets
+ that start the first subflow of an MPTCP connection, as well as the
+ first packet that carries data, if the initiator wishes to send
+ first. The data carried by each option is as follows, where
+ A = initiator and B = listener.
+
+ * SYN (A->B): only the first 4 octets (Length = 4).
+
+ * SYN/ACK (B->A): B's key for this connection (Length = 12).
+
+ * ACK (no data) (A->B): A's key followed by B's key (Length = 20).
+
+ * ACK (with first data) (A->B): A's key followed by B's key followed
+ by Data-Level Length, and optional Checksum (Length = 22 or 24).
+
+ The contents of the option are determined by the SYN and ACK flags of
+ the packet, along with the option's Length field. In Figure 4,
+ "Sender" and "Receiver" refer to the sender or receiver of the TCP
+ packet (which can be either host).
+
+ The initial SYN, containing just the MP_CAPABLE header, is used to
+ define the version of MPTCP being requested and also to exchange
+ flags to negotiate connection features, as described later.
+
+ This option is used to declare the 64-bit keys that the end hosts
+ have generated for this MPTCP connection. These keys are used to
+ authenticate the addition of future subflows to this connection.
+ This is the only time the key will be sent in the clear on the wire
+ (unless "Fast Close" (Section 3.5) is used); all future subflows will
+ identify the connection using a 32-bit "token". This token is a
+ cryptographic hash of this key. The algorithm for this process is
+ dependent on the authentication algorithm selected; the method of
+ selection is defined later in this section.
+
+ Upon reception of the initial SYN segment, a stateful server
+ generates a random key and replies with a SYN/ACK. The key's method
+ of generation is implementation specific. The key MUST be hard to
+ guess, and it MUST be unique for the sending host across all its
+ current MPTCP connections. Recommendations for generating random
+ numbers for use in keys are given in [RFC4086]. Connections will be
+ indexed at each host by the token (a one-way hash of the key).
+ Therefore, an implementation will require a mapping from each token
+ to the corresponding connection, and in turn to the keys for the
+ connection.
+
+ There is a risk that two different keys will hash to the same token.
+ The risk of hash collisions is usually small, unless the host is
+ handling many tens of thousands of connections. Therefore, an
+ implementation SHOULD check its list of connection tokens to ensure
+ that there is no collision before sending its key, and if there is,
+ then it should generate a new key. This would, however, be costly
+ for a server with thousands of connections. The subflow handshake
+ mechanism (Section 3.2) will ensure that new subflows only join the
+ correct connection, however, through the cryptographic handshake, as
+ well as checking the connection tokens in both directions, and
+ ensuring that sequence numbers are in-window. So, in the worst case,
+ if there was a token collision, the new subflow would not succeed,
+ but the MPTCP connection would continue to provide a regular TCP
+ service.
+
+ Since key generation is implementation specific, there is no
+ requirement that they simply be random numbers. An implementation is
+ free to exchange cryptographic material out of band and generate
+ these keys from this material, in order to provide additional
+ mechanisms by which to verify the identity of the communicating
+ entities. For example, an implementation could choose to link its
+ MPTCP keys to those used in higher-layer TLS or SSH connections.
+
+ If the server behaves in a stateless manner, it has to generate its
+ own key in a verifiable fashion. This verifiable way of generating
+ the key can be done by using a hash of the 4-tuple, sequence number,
+ and a local secret (similar to what is done for the TCP sequence
+ number [RFC4987]). It will thus be able to verify whether it is
+ indeed the originator of the key echoed back in the subsequent
+ MP_CAPABLE option. As for a stateful server, the tokens SHOULD be
+ checked for uniqueness; however, if uniqueness is not met and there
+ is no way to generate an alternative verifiable key, then the
+ connection MUST fall back to using regular TCP by not sending an
+ MP_CAPABLE in the SYN/ACK.
+
+ The ACK carries both A's key and B's key. This is the first time
+ that A's key is seen on the wire, although it is expected that A will
+ have generated a key locally before the initial SYN. The echoing of
+ B's key allows B to operate statelessly, as described above.
+ Therefore, A's key must be delivered reliably to B, and in order to
+ do this, the transmission of this packet must be made reliable.
+
+ If B has data to send first, then the reliable delivery of the
+ ACK + MP_CAPABLE is ensured by the receipt of this data with an MPTCP
+ Data Sequence Signal (DSS) option (Section 3.3) containing a DATA_ACK
+ for the MP_CAPABLE (which is the first octet of the data sequence
+ space). If, however, A wishes to send data first, it has two options
+ to ensure the reliable delivery of the ACK + MP_CAPABLE. If it
+ immediately has data to send, then the first ACK (with data) would
+ also contain an MP_CAPABLE option with additional data parameters
+ (the Data-Level Length and optional Checksum as shown in Figure 4).
+ If A does not immediately have data to send, it MUST include the
+ MP_CAPABLE on the first ACK, but without the additional data
+ parameters. When A does have data to send, it must repeat the
+ sending of the MP_CAPABLE option from the first ACK, with additional
+ data parameters. This MP_CAPABLE option is used in place of the DSS
+ and simply specifies (1) the Data-Level Length of the payload and
+ (2) the checksum (if the use of checksums is negotiated). This is
+ the minimal data required to establish an MPTCP connection -- it
+ allows validation of the payload, and given that it is the first
+ data, the Initial Data Sequence Number (IDSN) is also known (as it is
+ generated from the key, as described below). Conveying the keys on
+ the first data packet allows the TCP reliability mechanisms to ensure
+ that the packet is successfully delivered. The receiver will
+ acknowledge this data at the connection level with a Data ACK, as if
+ a DSS option has been received.
+
+ There could be situations where both A and B attempt to transmit
+ initial data at the same time. For example, if A did not initially
+ have data to send but then needed to transmit data before it had
+ received anything from B, it would use an MP_CAPABLE option with data
+ parameters (since it would not know if the MP_CAPABLE on the ACK was
+ received). In such a situation, B may also have transmitted data
+ with a DSS option, but it had not yet been received at A. Therefore,
+ B has received data with an MP_CAPABLE mapping after it has sent data
+ with a DSS option. To ensure that these situations can be handled,
+ it follows that the data parameters in an MP_CAPABLE are semantically
+ equivalent to those in a DSS option and can be used interchangeably.
+ Similar situations could occur when the MP_CAPABLE with data is lost
+ and retransmitted. Furthermore, in the case of TCP segmentation
+ offloading, the MP_CAPABLE with data parameters may be duplicated
+ across multiple packets, and implementations must also be able to
+ cope with duplicate MP_CAPABLE mappings as well as duplicate DSS
+ mappings.
+
+ Additionally, the MP_CAPABLE exchange allows the safe passage of
+ MPTCP options on SYN packets to be determined. If any of these
+ options are dropped, MPTCP will gracefully fall back to regular
+ single-path TCP, as documented in Section 3.7. If at any point in
+ the handshake either party thinks the MPTCP negotiation is
+ compromised -- for example, by a middlebox corrupting the TCP options
+ or by unexpected ACK numbers being present -- the host MUST stop
+ using MPTCP and no longer include MPTCP options in future TCP
+ packets. The other host will then also fall back to regular TCP
+ using the fallback mechanism. Note that new subflows MUST NOT be
+ established (using the process documented in Section 3.2) until a DSS
+ option has been successfully received across the path (as documented
+ in Section 3.3).
+
+ Like all MPTCP options, the MP_CAPABLE option starts with the Kind
+ and Length to specify the TCP option's kind and length. This
+ information is followed by the MP_CAPABLE option. The first 4 bits
+ of the first octet in the MP_CAPABLE option (Figure 4) define the
+ MPTCP Option Subtype (see Section 7; for MP_CAPABLE, this value is
+ 0x0), and the remaining 4 bits of this octet specify the MPTCP
+ version in use (for this specification, this value is 1).
+
+ The second octet is reserved for flags, allocated as follows:
+
+ A: The leftmost bit, labeled "A", SHOULD be set to 1 to
+ indicate "Checksum required", unless the system
+ administrator has decided that checksums are not
+ required (for example, if the environment is controlled
+ and no middleboxes exist that might adjust the
+ payload).
+
+ B: The second bit, labeled "B", is an extensibility flag.
+ It MUST be set to 0 for current implementations. This
+ flag will be used for an extensibility mechanism in a
+ future specification, and the impact of this flag will
+ be defined at a later date. It is expected, but not
+ mandated, that this flag would be used as part of an
+ alternative security mechanism that does not require a
+ full version upgrade of the protocol but does require
+ redefining some elements of the handshake. If
+ receiving a message with the "B" flag set to 1 and this
+ is not understood, then the MP_CAPABLE in this SYN MUST
+ be silently ignored, which triggers a fallback to
+ regular TCP; the sender is expected to retry with a
+ format compatible with this legacy specification. Note
+ that the length of the MP_CAPABLE option, and the
+ meanings of bits "D" through "H", may be altered by
+ setting B=1.
+
+ C: The third bit, labeled "C", is set to 1 to indicate
+ that the sender of this option will not accept
+ additional MPTCP subflows to the source address and
+ port, and therefore the receiver MUST NOT try to open
+ any additional subflows toward this address and port.
+ This improves efficiency in situations where the sender
+ knows a restriction is in place -- for example, if the
+ sender is behind a strict NAT or operating behind a
+ legacy Layer 4 load balancer.
+
+ D through H: The remaining bits, labeled "D" through "H", are used
+ for crypto algorithm negotiation. In this
+ specification, only the rightmost bit, labeled "H", is
+ assigned. Bit "H" indicates the use of HMAC-SHA256 (as
+ defined in Section 3.2). An implementation that only
+ supports this method MUST set bit "H" to 1 and bits "D"
+ through "G" to 0.
+
+ A crypto algorithm MUST be specified. If flag bits "D" through "H"
+ are all 0, the MP_CAPABLE option MUST be treated as invalid and
+ ignored (that is, it must be treated as a regular TCP handshake).
+
+ The selection of the authentication algorithm also impacts the
+ algorithm used to generate the token and the IDSN. In this
+ specification, with only the SHA-256 algorithm (bit "H") specified
+ and selected, the token MUST be a truncated (most significant
+ 32 bits) SHA-256 hash [RFC6234] of the key. A different, 64-bit
+ truncation (the least significant 64 bits) of the SHA-256 hash of the
+ key MUST be used as the IDSN. Note that the key MUST be hashed in
+ network byte order. Also note that the "least significant" bits MUST
+ be the rightmost bits of the SHA-256 digest, as per [RFC6234].
+ Future specifications of the use of the crypto bits may choose to
+ specify different algorithms for token and IDSN generation.
+
+ Both the crypto and checksum bits negotiate capabilities in similar
+ ways. For the "Checksum required" bit (labeled "A"), if either host
+ requires the use of checksums, checksums MUST be used. In other
+ words, the only way for checksums not to be used is if both hosts in
+ their SYNs set A=0. This decision is confirmed by the setting of the
+ "A" bit in the third packet (the ACK) of the handshake. For example,
+ if the initiator sets A=0 in the SYN but the responder sets A=1 in
+ the SYN/ACK, checksums MUST be used in both directions, and the
+ initiator will set A=1 in the ACK. The decision regarding whether to
+ use checksums will be stored by an implementation in a per-connection
+ binary state variable. If A=1 is received by a host that does not
+ want to use checksums, it MUST fall back to regular TCP by ignoring
+ the MP_CAPABLE option as if it was invalid.
+
+ For crypto negotiation, the responder has the choice. The initiator
+ creates a proposal setting a bit for each algorithm it supports to 1
+ (in this version of the specification, there is only one proposal, so
+ bit "H" will always be set to 1). The responder responds with only
+ 1 bit set -- this is the chosen algorithm. The rationale for this
+ behavior is that the responder will typically be a server with
+ potentially many thousands of connections, so it may wish to choose
+ an algorithm with minimal computational complexity, depending on the
+ load. If a responder does not support (or does not want to support)
+ any of the initiator's proposals, it MUST respond without an
+ MP_CAPABLE option, thus forcing a fallback to regular TCP.
+
+ The MP_CAPABLE option is only used in the first subflow of a
+ connection, in order to identify the connection; all subsequent
+ subflows will use the MP_JOIN option (see Section 3.2) to join the
+ existing connection.
+
+ If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it
+ is assumed that the sender of the SYN/ACK is not multipath capable;
+ thus, the MPTCP session MUST operate as a regular, single-path TCP
+ session. If a SYN does not contain an MP_CAPABLE option, the SYN/ACK
+ MUST NOT contain one in response. If the third packet (the ACK) does
+ not contain the MP_CAPABLE option, then the session MUST fall back to
+ operating as a regular, single-path TCP session. This is done to
+ maintain compatibility with middleboxes on the path that drop some or
+ all TCP options. Note that an implementation MAY choose to attempt
+ sending MPTCP options more than one time before making this decision
+ to operate as regular TCP (see Section 3.9).
+
+ If the SYN packets are unacknowledged, it is up to local policy to
+ decide how to respond. It is expected that a sender will eventually
+ fall back to single-path TCP (i.e., without the MP_CAPABLE option) in
+ order to work around middleboxes that may drop packets with unknown
+ options; however, the number of multipath-capable attempts that are
+ made first will be up to local policy. It is possible that MPTCP and
+ non-MPTCP SYNs could get reordered in the network. Therefore, the
+ final state is inferred from the presence or absence of the
+ MP_CAPABLE option in the third packet of the TCP handshake. If this
+ option is not present, the connection SHOULD fall back to regular
+ TCP, as documented in Section 3.7.
+
+ The IDSN on an MPTCP connection is generated from the key. The
+ algorithm for IDSN generation is also determined from the negotiated
+ authentication algorithm. In this specification, with only the
+ SHA-256 algorithm specified and selected, the IDSN of a host MUST be
+ the least significant 64 bits of the SHA-256 hash of its key, i.e.,
+ IDSN-A = Hash(Key-A) and IDSN-B = Hash(Key-B). This deterministic
+ generation of the IDSN allows a receiver to ensure that there are no
+ gaps in sequence space at the start of the connection. The SYN with
+ MP_CAPABLE occupies the first octet of data sequence space, although
+ this does not need to be acknowledged at the connection level until
+ the first data is sent (see Section 3.3).
+
+3.2. Starting a New Subflow
+
+ Once an MPTCP connection has begun with the MP_CAPABLE exchange,
+ further subflows can be added to the connection. Hosts have
+ knowledge of their own address(es) and can become aware of the other
+ host's addresses through signaling exchanges as described in
+ Section 3.4. Using this knowledge, a host can initiate a new subflow
+ over a currently unused pair of addresses. It is permissible for
+ either host in a connection to initiate the creation of a new
+ subflow, but it is expected that this will normally be the original
+ connection initiator (see Section 3.9 for heuristics).
+
+ A new subflow is started as a normal TCP SYN/ACK exchange. The Join
+ Connection (MP_JOIN) MPTCP option is used to identify the connection
+ to be joined by the new subflow. It uses keying material that was
+ exchanged in the initial MP_CAPABLE handshake (Section 3.1), and that
+ handshake also negotiates the crypto algorithm in use for the MP_JOIN
+ handshake.
+
+ This section specifies the behavior of MP_JOIN using the HMAC-SHA256
+ algorithm. An MP_JOIN option is present in the SYN, SYN/ACK, and ACK
+ of the three-way handshake, although in each case with a different
+ format.
+
+ In the first MP_JOIN on the SYN packet, illustrated in Figure 5, the
+ initiator sends a token, random number, and Address ID.
+
+ 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-----+-+---------------+
+ | Kind | Length = 12 |Subtype|(rsv)|B| Address ID |
+ +---------------+---------------+-------+-----+-+---------------+
+ | Receiver's Token (32 bits) |
+ +---------------------------------------------------------------+
+ | Sender's Random Number (32 bits) |
+ +---------------------------------------------------------------+
+
+ Figure 5: Join Connection (MP_JOIN) Option (for Initial SYN)
+
+ The token is used to identify the MPTCP connection and is a
+ cryptographic hash of the receiver's key, as exchanged in the initial
+ MP_CAPABLE handshake (Section 3.1). In this specification, the
+ tokens presented in this option are generated by the SHA-256
+ algorithm [RFC6234], truncated to the most significant 32 bits. The
+ token included in the MP_JOIN option is the token that the receiver
+ of the packet uses to identify this connection; i.e., Host A will
+ send Token-B (which is generated from Key-B). Note that the hash
+ generation algorithm can be overridden by the choice of cryptographic
+ handshake algorithm, as defined in Section 3.1.
+
+ The MP_JOIN SYN sends not only the token (which is static for a
+ connection) but also random numbers (nonces) that are used to prevent
+ replay attacks on the authentication method. Recommendations for the
+ generation of random numbers for this purpose are given in [RFC4086].
+
+ The MP_JOIN option includes an "Address ID". This is an identifier
+ generated by the sender of the option, used to identify the source
+ address of this packet, even if the IP header has been changed in
+ transit by a middlebox. The numeric value of this field is generated
+ by the sender and must map uniquely to a source IP address for the
+ sending host. The Address ID allows address removal (Section 3.4.2)
+ without needing to know what the source address at the receiver is,
+ thus allowing address removal through NATs. The Address ID also
+ allows correlation between new subflow setup attempts and address
+ signaling (Section 3.4.1), to prevent setting up duplicate subflows
+ on the same path, if an MP_JOIN and ADD_ADDR are sent at the same
+ time.
+
+ The Address IDs of the subflow used in the initial SYN exchange of
+ the first subflow in the connection are implicit and have the value
+ zero. A host MUST store the mappings between Address IDs and
+ addresses both for itself and the remote host. An implementation
+ will also need to know which local and remote Address IDs are
+ associated with which established subflows, for when addresses are
+ removed from a local or remote host.
+
+ The MP_JOIN option on packets with the SYN flag set also includes
+ 4 bits of flags, 3 of which are currently reserved and MUST be set to
+ 0 by the sender. The final bit, labeled "B", indicates whether the
+ sender of this option (1) wishes this subflow to be used as a backup
+ path (B=1) in the event of failure of other paths or (2) wants the
+ subflow to be used as part of the connection immediately. By setting
+ B=1, the sender of the option is requesting that the other host only
+ send data on this subflow if there are no available subflows where
+ B=0. Subflow policy is discussed in more detail in Section 3.3.8.
+
+ When receiving a SYN with an MP_JOIN option that contains a valid
+ token for an existing MPTCP connection, the recipient SHOULD respond
+ with a SYN/ACK also containing an MP_JOIN option containing a random
+ number and a truncated (leftmost 64 bits) HMAC. This version of the
+ option is shown in Figure 6. If the token is unknown or the host
+ wants to refuse subflow establishment (for example, due to a limit on
+ the number of subflows it will permit), the receiver will send back a
+ reset (RST) signal, analogous to an unknown port in TCP, containing
+ an MP_TCPRST option (Section 3.6) with an "MPTCP specific error"
+ reason code. Although calculating an HMAC requires cryptographic
+ operations, it is believed that the 32-bit token in the MP_JOIN SYN
+ gives sufficient protection against blind state exhaustion attacks;
+ therefore, there is no need to provide mechanisms to allow a
+ responder to operate statelessly at the MP_JOIN stage.
+
+ 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-----+-+---------------+
+ | Kind | Length = 16 |Subtype|(rsv)|B| Address ID |
+ +---------------+---------------+-------+-----+-+---------------+
+ | |
+ | Sender's Truncated HMAC (64 bits) |
+ | |
+ +---------------------------------------------------------------+
+ | Sender's Random Number (32 bits) |
+ +---------------------------------------------------------------+
+
+ Figure 6: Join Connection (MP_JOIN) Option (for Responding SYN/ACK)
+
+ An HMAC is sent by both hosts -- by the initiator (Host A) in the
+ third packet (the ACK) and by the responder (Host B) in the second
+ packet (the SYN/ACK). Doing the HMAC exchange at this stage allows
+ both hosts to have first exchanged random data (in the first two SYN
+ packets) that is used as the "message". This specification defines
+ that HMAC as defined in [RFC2104] is used, along with the SHA-256
+ hash algorithm [RFC6234], and that the output is truncated to the
+ leftmost 160 bits (20 octets). Due to option space limitations, the
+ HMAC included in the SYN/ACK is truncated to the leftmost 64 bits,
+ but this is acceptable, since random numbers are used; thus, an
+ attacker only has one chance to correctly guess the HMAC that matches
+ the random number previously sent by the peer (if the HMAC is
+ incorrect, the TCP connection is closed, so a new MP_JOIN negotiation
+ with a new random number is required).
+
+ The initiator's authentication information is sent in its first ACK
+ (the third packet of the handshake), as shown in Figure 7. This data
+ needs to be sent reliably, since it is the only time this HMAC is
+ sent; therefore, receipt of this packet MUST trigger a regular TCP
+ ACK in response, and the packet MUST be retransmitted if this ACK is
+ not received. In other words, sending the ACK/MP_JOIN packet places
+ the subflow in the PRE_ESTABLISHED state, and it moves to the
+ ESTABLISHED state only on receipt of an ACK from the receiver. It is
+ not permissible to send data while in the PRE_ESTABLISHED state. The
+ reserved bits in this option MUST be set to 0 by the sender.
+
+ 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-----------------------+
+ | Kind | Length = 24 |Subtype| (reserved) |
+ +---------------+---------------+-------+-----------------------+
+ | |
+ | |
+ | Sender's Truncated HMAC (160 bits) |
+ | |
+ | |
+ +---------------------------------------------------------------+
+
+ Figure 7: Join Connection (MP_JOIN) Option
+ (for Initiator's First ACK)
+
+ The key for the HMAC algorithm, in the case of the message
+ transmitted by Host A, will be Key-A followed by Key-B; and in the
+ case of Host B, Key-B followed by Key-A. These are the keys that
+ were exchanged in the original MP_CAPABLE handshake. The "message"
+ for the HMAC algorithm in each case is the concatenations of random
+ numbers for each host (denoted by R): for Host A, R-A followed by
+ R-B; and for Host B, R-B followed by R-A.
+
+ These various MPTCP options fit together to enable authenticated
+ subflow setup as illustrated in Figure 8.
+
+ Host A Host B
+ ------------------------ ----------
+ Address A1 Address A2 Address B1
+ ---------- ---------- ----------
+ | | |
+ | | SYN + MP_CAPABLE |
+ |--------------------------------------------->|
+ |<---------------------------------------------|
+ | SYN/ACK + MP_CAPABLE(Key-B) |
+ | | |
+ | ACK + MP_CAPABLE(Key-A, Key-B) |
+ |--------------------------------------------->|
+ | | |
+ | | SYN + MP_JOIN(Token-B, R-A) |
+ | |------------------------------->|
+ | |<-------------------------------|
+ | | SYN/ACK + MP_JOIN(HMAC-B, R-B) |
+ | | |
+ | | ACK + MP_JOIN(HMAC-A) |
+ | |------------------------------->|
+ | |<-------------------------------|
+ | | ACK |
+
+ HMAC-A = HMAC(Key=(Key-A + Key-B), Msg=(R-A + R-B))
+ HMAC-B = HMAC(Key=(Key-B + Key-A), Msg=(R-B + R-A))
+
+ Figure 8: Example Use of MPTCP Authentication
+
+ If the token received at Host B is unknown or local policy prohibits
+ the acceptance of the new subflow, the recipient MUST respond with a
+ TCP RST for the subflow. If appropriate, an MP_TCPRST option with an
+ "Administratively prohibited" reason code (Section 3.6) should be
+ included.
+
+ If the token is accepted at Host B but the HMAC returned to Host A
+ does not match the one expected, Host A MUST close the subflow with a
+ TCP RST. In this and all subsequent cases of sending a RST as
+ described in this section, the sender SHOULD send an MP_TCPRST option
+ (Section 3.6) on this RST packet with the reason code for an "MPTCP-
+ specific error".
+
+ If Host B does not receive the expected HMAC or the MP_JOIN option is
+ missing from the ACK, it MUST close the subflow with a TCP RST.
+
+ If the HMACs are verified as correct, then both hosts have verified
+ each other as being the same peers as those that existed at the start
+ of the connection, and they have agreed of which connection this
+ subflow will become a part.
+
+ If the SYN/ACK as received at Host A does not have an MP_JOIN option,
+ Host A MUST close the subflow with a TCP RST.
+
+ This covers all cases of the loss of an MP_JOIN. In more detail, if
+ an MP_JOIN is stripped from the SYN on the path from A to B and
+ Host B does not have a listener on the relevant port, it will respond
+ with a RST in the normal way. If in response to a SYN with an
+ MP_JOIN option a SYN/ACK is received without the MP_JOIN option
+ (because it was either stripped on the return path, or stripped on
+ the outgoing path leading to Host B responding as if it was a new
+ regular TCP session), then the subflow is unusable and Host A MUST
+ close it with a RST.
+
+ Note that additional subflows can be created between any pair of
+ ports (but see Section 3.9 for heuristics); no explicit application-
+ level accept calls or bind calls are required to open additional
+ subflows. To associate a new subflow with an existing connection,
+ the token supplied in the subflow's SYN exchange is used for
+ demultiplexing. This then binds the 5-tuple of the TCP subflow to
+ the local token of the connection. One consequence is that it is
+ possible to allow any port pairs to be used for a connection.
+
+ Demultiplexing subflow SYNs MUST be done using the token; this is
+ unlike traditional TCP, where the destination port is used for
+ demultiplexing SYN packets. Once a subflow is set up, demultiplexing
+ packets is done using the 5-tuple, as in traditional TCP. The
+ 5-tuples will be mapped to the local connection identifier (token).
+ Note that Host A will know its local token for the subflow even
+ though it is not sent on the wire -- only the responder's token is
+ sent.
+
+3.3. MPTCP Operation and Data Transfer
+
+ This section discusses the operation of MPTCP for data transfer. At
+ a high level, an MPTCP implementation will take one input data stream
+ from an application and split it into one or more subflows, with
+ sufficient control information to allow it to be reassembled and
+ delivered reliably and in order to the recipient application. The
+ following subsections define this behavior in detail.
+
+ The Data Sequence Mapping and the Data ACK are signaled in the DSS
+ option (Figure 9). Either or both can be signaled in one DSS,
+ depending on the flags set. The Data Sequence Mapping defines how
+ the sequence space on the subflow maps to the connection level, and
+ the Data ACK acknowledges receipt of data at the connection level.
+ These functions are described in more detail in the following two
+ subsections.
+
+ 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+----------------------+
+ | Kind | Length |Subtype| (reserved) |F|m|M|a|A|
+ +---------------+---------------+-------+----------------------+
+ | Data ACK (4 or 8 octets, depending on flags) |
+ +--------------------------------------------------------------+
+ | Data Sequence Number (4 or 8 octets, depending on flags) |
+ +--------------------------------------------------------------+
+ | Subflow Sequence Number (4 octets) |
+ +-------------------------------+------------------------------+
+ | Data-Level Length (2 octets) | Checksum (2 octets) |
+ +-------------------------------+------------------------------+
+
+ Figure 9: Data Sequence Signal (DSS) Option
+
+ The flags, when set, define the contents of this option, as follows:
+
+ * A = Data ACK present
+
+ * a = Data ACK is 8 octets (if not set, Data ACK is 4 octets)
+
+ * M = Data Sequence Number (DSN), Subflow Sequence Number (SSN),
+ Data-Level Length, and Checksum (if negotiated) present
+
+ * m = Data Sequence Number is 8 octets (if not set, DSN is 4 octets)
+
+ The flags "a" and "m" only have meaning if the corresponding "A" or
+ "M" flags are set; otherwise, they will be ignored. The maximum
+ length of this option, with all flags set, is 28 octets.
+
+ The "F" flag indicates "Data FIN". If present, this means that this
+ mapping covers the final data from the sender. This is the
+ connection-level equivalent of the FIN flag in single-path TCP. A
+ connection is not closed unless there has been a Data FIN exchange,
+ an MP_FASTCLOSE (Section 3.5) message, or an implementation-specific
+ connection-level send timeout. The purpose of the Data FIN and the
+ interactions between this flag, the subflow-level FIN flag, and the
+ Data Sequence Mapping are described in Section 3.3.3. The remaining
+ reserved bits MUST be set to 0 by an implementation of this
+ specification.
+
+ Note that the checksum is only present in this option if the use of
+ MPTCP checksumming has been negotiated at the MP_CAPABLE handshake
+ (see Section 3.1). The presence of the checksum can be inferred from
+ the length of the option. If a checksum is present but its use had
+ not been negotiated in the MP_CAPABLE handshake, the receiver MUST
+ close the subflow with a RST, as it is not behaving as negotiated.
+ If a checksum is not present when its use has been negotiated, the
+ receiver MUST close the subflow with a RST, as it is considered
+ broken. In both cases, this RST SHOULD be accompanied by an
+ MP_TCPRST option (Section 3.6) with the reason code for an "MPTCP-
+ specific error".
+
+3.3.1. Data Sequence Mapping
+
+ The data stream as a whole can be reassembled through the use of the
+ Data Sequence Mapping components of the DSS option (Figure 9), which
+ define the mapping from the subflow sequence number to the data
+ sequence number. This is used by the receiver to ensure in-order
+ delivery to the application layer. Meanwhile, the subflow-level
+ sequence numbers (i.e., the regular sequence numbers in the TCP
+ header) are only relevant to the subflow. It is expected (but not
+ mandated) that SACK [RFC2018] will be used at the subflow level to
+ improve efficiency.
+
+ The Data Sequence Mapping specifies a mapping from the subflow
+ sequence space to the data sequence space. This is expressed in
+ terms of starting sequence numbers for the subflow and the data
+ level, and a length of bytes for which this mapping is valid. This
+ explicit mapping for a range of data, rather than per-packet
+ signaling, was chosen to assist with compatibility with situations
+ where TCP/IP segmentation or coalescing is undertaken separately from
+ the stack that is generating the data flow (e.g., through the use of
+ TCP segmentation offloading on network interface cards, or by
+ middleboxes such as Performance Enhancing Proxies (PEPs) [RFC3135]).
+ It also allows a single mapping to cover many packets; this may be
+ useful in bulk-transfer situations.
+
+ A mapping is fixed, in that the subflow sequence number is bound to
+ the data sequence number after the mapping has been processed. A
+ sender MUST NOT change this mapping after it has been declared;
+ however, the same data sequence number can be mapped to by different
+ subflows for retransmission purposes (see Section 3.3.6). This would
+ also permit the same data to be sent simultaneously on multiple
+ subflows for resilience or efficiency purposes, especially in the
+ case of lossy links. Although the detailed specification of such
+ operation is outside the scope of this document, an implementation
+ SHOULD treat the first data that is received at a subflow for the
+ data sequence space as the data that should be delivered to the
+ application, and any subsequent data for that sequence space SHOULD
+ be ignored.
+
+ The data sequence number is specified as an absolute value, whereas
+ the subflow sequence numbering is relative (the SYN at the start of
+ the subflow has a relative subflow sequence number of 0). This is
+ done to allow middleboxes to change the Initial Sequence Number (ISN)
+ of a subflow, such as firewalls that undertake ISN randomization.
+
+ The Data Sequence Mapping also contains a checksum of the data that
+ this mapping covers, if the use of checksums has been negotiated at
+ the MP_CAPABLE exchange. Checksums are used to detect if the payload
+ has been adjusted in any way by a non-MPTCP-aware middlebox. If this
+ checksum fails, it will trigger a failure of the subflow, or a
+ fallback to regular TCP, as documented in Section 3.7, since MPTCP
+ can no longer reliably know the subflow sequence space at the
+ receiver to build Data Sequence Mappings. Without checksumming
+ enabled, corrupt data may be delivered to the application if a
+ middlebox alters segment boundaries, alters content, or does not
+ deliver all segments covered by a Data Sequence Mapping. It is
+ therefore RECOMMENDED that checksumming be used, unless it is known
+ that the network path contains no such devices.
+
+ The checksum algorithm used is the standard TCP checksum [RFC0793],
+ operating over the data covered by this mapping, along with a
+ pseudo-header as shown in Figure 10.
+
+ 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +--------------------------------------------------------------+
+ | |
+ | Data Sequence Number (8 octets) |
+ | |
+ +--------------------------------------------------------------+
+ | Subflow Sequence Number (4 octets) |
+ +-------------------------------+------------------------------+
+ | Data-Level Length (2 octets) | Zeros (2 octets) |
+ +-------------------------------+------------------------------+
+
+ Figure 10: Pseudo-Header for DSS Checksum
+
+ Note that the data sequence number used in the pseudo-header is
+ always the 64-bit value, irrespective of what length is used in the
+ DSS option itself. The standard TCP checksum algorithm has been
+ chosen, since it will be calculated anyway for the TCP subflow, and
+ if calculated first over the data before adding the pseudo-headers,
+ it only needs to be calculated once. Furthermore, since the TCP
+ checksum is additive, the checksum for a DSN_MAP can be constructed
+ by simply adding together the checksums for the data of each
+ constituent TCP segment and adding the checksum for the DSS
+ pseudo-header.
+
+ Note that checksumming relies on the TCP subflow containing
+ contiguous data; therefore, a TCP subflow MUST NOT use the Urgent
+ Pointer to interrupt an existing mapping. Further note, however,
+ that if Urgent data is received on a subflow, it SHOULD be mapped to
+ the data sequence space and delivered to the application, analogous
+ to Urgent data in regular TCP.
+
+ To avoid possible deadlock scenarios, subflow-level processing should
+ be undertaken separately from processing at the connection level.
+ Therefore, even if a mapping does not exist from the subflow space to
+ the data-level space, the data SHOULD still be ACKed at the subflow
+ (if it is in-window). This data cannot, however, be acknowledged at
+ the data level (Section 3.3.2) because its data sequence numbers are
+ unknown. Implementations MAY hold onto such unmapped data for a
+ short while, in the expectation that a mapping will arrive shortly.
+ Such unmapped data cannot be counted as being within the connection-
+ level receive window because this is relative to the data sequence
+ numbers, so if the receiver runs out of memory to hold this data, it
+ will have to be discarded. If a mapping for that subflow-level
+ sequence space does not arrive within a receive window of data, that
+ subflow SHOULD be treated as broken, closed with a RST, and any
+ unmapped data silently discarded.
+
+ Data sequence numbers are always 64-bit quantities and MUST be
+ maintained as such in implementations. If a connection is
+ progressing at a slow rate, so protection against wrapped sequence
+ numbers is not required, then an implementation MAY include just the
+ lower 32 bits of the data sequence number in the Data Sequence
+ Mapping and/or Data ACK as an optimization, and an implementation can
+ make this choice independently for each packet. An implementation
+ MUST be able to receive and process both 64-bit and 32-bit sequence
+ number values, but it is not required that an implementation be able
+ to send both.
+
+ An implementation MUST send the full 64-bit data sequence number if
+ it is transmitting at a sufficiently high rate that the 32-bit value
+ could wrap within the Maximum Segment Lifetime (MSL) [RFC7323]. The
+ lengths of the DSNs used in these values (which may be different) are
+ declared with flags in the DSS option. Implementations MUST accept a
+ 32-bit DSN and implicitly promote it to a 64-bit quantity by
+ incrementing the upper 32 bits of the sequence number each time the
+ lower 32 bits wrap. A sanity check MUST be implemented to ensure
+ that a wrap occurs at an expected time (e.g., the sequence number
+ jumps from a very high number to a very low number) and is not
+ triggered by out-of-order packets.
+
+ As with the standard TCP sequence number, the data sequence number
+ should not start at zero, but at a random value to make blind session
+ hijacking harder. This specification requires setting the IDSN of
+ each host to the least significant 64 bits of the SHA-256 hash of the
+ host's key, as described in Section 3.1. This is also required in
+ order for the receiver to know what the expected IDSN is and thus
+ determine if any initial connection-level packets are missing; this
+ is particularly relevant if two subflows start transmitting
+ simultaneously.
+
+ The mapping provided by a Data Sequence Mapping MUST apply to some or
+ all of the subflow sequence space in the TCP segment that carries the
+ option. It does not need to be included in every MPTCP packet, as
+ long as the subflow sequence space in that packet is covered by a
+ mapping known at the receiver. This can be used to reduce overhead
+ in cases where the mapping is known in advance. One such case is
+ when there is a single subflow between the hosts, and another is when
+ segments of data are scheduled in larger-than-packet-sized chunks.
+
+ An "infinite" mapping can be used to fall back to regular TCP by
+ mapping the subflow-level data to the connection-level data for the
+ remainder of the connection (see Section 3.7). This is achieved by
+ setting the Data-Level Length field of the DSS option to the reserved
+ value of 0. The checksum, in such a case, will also be set to 0.
+
+3.3.2. Data Acknowledgments
+
+ To provide full end-to-end resilience, MPTCP provides a connection-
+ level acknowledgment, to act as a cumulative ACK for the connection
+ as a whole. This is done via the "Data ACK" field of the DSS option
+ (Figure 9). The Data ACK is analogous to the behavior of the
+ standard TCP cumulative ACK -- indicating how much data has been
+ successfully received (with no holes). This can be compared to the
+ subflow-level ACK, which acts in a fashion analogous to TCP SACK,
+ given that there may still be holes in the data stream at the
+ connection level. The Data ACK specifies the next data sequence
+ number it expects to receive.
+
+ The Data ACK, as for the DSN, can be sent as the full 64-bit value or
+ as the lower 32 bits. If data is received with a 64-bit DSN, it MUST
+ be acknowledged with a 64-bit Data ACK. If the DSN received is
+ 32 bits, an implementation can choose whether to send a 32-bit or
+ 64-bit Data ACK, and an implementation MUST accept either in this
+ situation.
+
+ The Data ACK proves that the data, and all required MPTCP signaling,
+ have been received and accepted by the remote end. One key use of
+ the Data ACK signal is that it is used to indicate the left edge of
+ the advertised receive window. As explained in Section 3.3.4, the
+ receive window is shared by all subflows and is relative to the Data
+ ACK. Because of this, an implementation MUST NOT use the RCV.WND
+ field of a TCP segment at the connection level if it does not also
+ carry a DSS option with a Data ACK field. Furthermore, separating
+ the connection-level acknowledgments from the subflow level allows
+ processing to be done separately, and a receiver has the freedom to
+ drop segments after acknowledgment at the subflow level -- for
+ example, due to memory constraints when many segments arrive out of
+ order.
+
+ An MPTCP sender MUST NOT free data from the send buffer until it has
+ been acknowledged by both a Data ACK received on any subflow and at
+ the subflow level by all subflows on which the data was sent. The
+ former condition ensures liveness of the connection, and the latter
+ condition ensures liveness and self-consistence of a subflow when
+ data needs to be retransmitted. Note, however, that if some data
+ needs to be retransmitted multiple times over a subflow, there is a
+ risk of blocking the send window. In this case, the MPTCP sender can
+ decide to terminate the subflow that is behaving badly by sending a
+ RST, using an appropriate MP_TCPRST (Section 3.6) error code.
+
+ The Data ACK MAY be included in all segments; however, optimizations
+ SHOULD be considered in more advanced implementations, where the Data
+ ACK is present in segments only when the Data ACK value advances, and
+ this behavior MUST be treated as valid. This behavior ensures that
+ the send buffer is freed, while reducing overhead when the data
+ transfer is unidirectional.
+
+3.3.3. Closing a Connection
+
+ In regular TCP, a FIN announces to the receiver that the sender has
+ no more data to send. In order to allow subflows to operate
+ independently and to keep the appearance of TCP over the wire, a FIN
+ in MPTCP only affects the subflow on which it is sent. This allows
+ nodes to exercise considerable freedom over which paths are in use at
+ any one time. The semantics of a FIN remain as for regular TCP;
+ i.e., it is not until both sides have ACKed each other's FINs that
+ the subflow is fully closed.
+
+ When an application calls close() on a socket, this indicates that it
+ has no more data to send; for regular TCP, this would result in a FIN
+ on the connection. For MPTCP, an equivalent mechanism is needed;
+ this is referred to as the DATA_FIN.
+
+ A DATA_FIN is an indication that the sender has no more data to send,
+ and as such it can be used to verify that all data has been
+ successfully received. A DATA_FIN, as with the FIN on a regular TCP
+ connection, is a unidirectional signal.
+
+ The DATA_FIN is signaled by setting the "F" flag in the DSS option
+ (Figure 9) to 1. A DATA_FIN occupies 1 octet (the final octet) of
+ the connection-level sequence space. Note that the DATA_FIN is
+ included in the Data-Level Length but not at the subflow level: for
+ example, a segment with a DSN value of 80 and a Data-Level Length of
+ 11, with DATA_FIN set, would map 10 octets from the subflow into data
+ sequence space 80-89, and the DATA_FIN would be DSN 90; therefore,
+ this segment, including DATA_FIN, would be acknowledged with a
+ DATA_ACK of 91.
+
+ Note that when the DATA_FIN is not attached to a TCP segment
+ containing data, the DSS MUST have a subflow sequence number of 0, a
+ Data-Level Length of 1, and the data sequence number that corresponds
+ with the DATA_FIN itself. The checksum in this case will only cover
+ the pseudo-header.
+
+ A DATA_FIN has the same semantics and behavior as a regular TCP FIN,
+ but at the connection level. Notably, it is only DATA_ACKed once all
+ data has been successfully received at the connection level. Note,
+ therefore, that a DATA_FIN is decoupled from a subflow FIN. It is
+ only permissible to combine these signals on one subflow if there is
+ no data outstanding on other subflows. Otherwise, it may be
+ necessary to retransmit data on different subflows. Essentially, a
+ host MUST NOT close all functioning subflows unless it is safe to do
+ so, i.e., until all outstanding data has been DATA_ACKed or until the
+ segment with the DATA_FIN flag set is the only outstanding segment.
+
+ Once a DATA_FIN has been acknowledged, all remaining subflows MUST be
+ closed with standard FIN exchanges. Both hosts SHOULD send FINs on
+ all subflows, as a courtesy, to allow middleboxes to clean up state
+ even if an individual subflow has failed. Reducing the timeouts
+ (MSL) on subflows at end hosts after receiving a DATA_FIN is also
+ encouraged. In particular, any subflows where there is still
+ outstanding data queued (which has been retransmitted on other
+ subflows in order to get the DATA_FIN acknowledged) MAY be closed
+ with a RST with an MP_TCPRST (Section 3.6) error code for "too much
+ outstanding data".
+
+ A connection is considered closed once both hosts' DATA_FINs have
+ been acknowledged by DATA_ACKs.
+
+ As specified above, a standard TCP FIN on an individual subflow only
+ shuts down the subflow on which it was sent. If all subflows have
+ been closed with a FIN exchange but no DATA_FIN has been received and
+ acknowledged, the MPTCP connection is treated as closed only after a
+ timeout. This implies that an implementation will have TIME_WAIT
+ states at both the subflow level and the connection level (see
+ Appendix D). This permits "break-before-make" scenarios where
+ connectivity is lost on all subflows before a new one can be
+ re-established.
+
+3.3.4. Receiver Considerations
+
+ Regular TCP advertises a receive window in each packet, telling the
+ sender how much data the receiver is willing to accept past the
+ cumulative ACK. The receive window is used to implement flow
+ control, throttling down fast senders when receivers cannot keep up.
+
+ MPTCP also uses a unique receive window, shared between the subflows.
+ The idea is to allow any subflow to send data as long as the receiver
+ is willing to accept it. The alternative -- maintaining per-subflow
+ receive windows -- could end up stalling some subflows while others
+ would not use up their window.
+
+ The receive window is relative to the DATA_ACK. As in TCP, a
+ receiver MUST NOT shrink the right edge of the receive window (i.e.,
+ DATA_ACK + receive window). The receiver will use the data sequence
+ number to tell if a packet should be accepted at the connection
+ level.
+
+ When deciding to accept packets at the subflow level, regular TCP
+ checks the sequence number in the packet against the allowed receive
+ window. With MPTCP, such a check is done using only the connection-
+ level window. A sanity check SHOULD be performed at the subflow
+ level to ensure that the subflow and mapped sequence numbers meet the
+ following test: SSN - SUBFLOW_ACK <= DSN - DATA_ACK, where SSN is the
+ subflow sequence number of the received packet and SUBFLOW_ACK is the
+ RCV.NXT (next expected sequence number) of the subflow (with the
+ equivalent connection-level definitions for DSN and DATA_ACK).
+
+ In regular TCP, once a segment is deemed in-window, it is put in
+ either the in-order receive queue or the out-of-order queue. In
+ Multipath TCP, the same thing happens, but at the connection level: a
+ segment is placed in the connection-level in-order or out-of-order
+ queue if it is in-window at both the connection level and the subflow
+ level. The stack still has to remember, for each subflow, which
+ segments were received successfully so that it can ACK them at the
+ subflow level appropriately. Typically, this will be implemented by
+ keeping per-subflow out-of-order queues (containing only message
+ headers -- not the payloads) and remembering the value of the
+ cumulative ACK.
+
+ It is important for implementers to understand how large a receive
+ buffer is appropriate. The lower bound for full network utilization
+ is the maximum bandwidth-delay product of any one of the paths.
+ However, this might be insufficient when a packet is lost on a slower
+ subflow and needs to be retransmitted (see Section 3.3.6). A tight
+ upper bound would be the maximum round-trip time (RTT) of any path
+ multiplied by the total bandwidth available across all paths. This
+ permits all subflows to continue at full speed while a packet is
+ fast-retransmitted on the maximum RTT path. Even this might be
+ insufficient to maintain full performance in the event of a
+ retransmit timeout on the maximum RTT path. Determining the
+ relationship between retransmission strategies and receive buffer
+ sizing is left for future study.
+
+3.3.5. Sender Considerations
+
+ The sender remembers receive window advertisements from the receiver.
+ It should only update its local receive window values when the
+ largest sequence number allowed (i.e., DATA_ACK + receive window)
+ increases on the receipt of a DATA_ACK. This is important for
+ allowing the use of paths with different RTTs and thus different
+ feedback loops.
+
+ MPTCP uses a single receive window across all subflows, and if the
+ receive window was guaranteed to be unchanged end to end, a host
+ could always read the most recent receive window value. However,
+ some classes of middleboxes may alter the TCP-level receive window.
+ Typically, these will shrink the offered window, although for short
+ periods of time it may be possible for the window to be larger
+ (however, note that this would not continue for long periods, since
+ ultimately the middlebox must keep up with delivering data to the
+ receiver). Therefore, if receive window sizes differ on multiple
+ subflows, when sending data MPTCP SHOULD take the largest of the most
+ recent window sizes as the one to use in calculations. This rule is
+ implicit in the requirement not to reduce the right edge of the
+ window.
+
+ The sender MUST also remember the receive windows advertised by each
+ subflow. The allowed window for subflow i is (ack_i, ack_i +
+ rcv_wnd_i), where ack_i is the subflow-level cumulative ACK of
+ subflow i. This ensures that data will not be sent to a middlebox
+ unless there is enough buffering for the data.
+
+ Putting the two rules together, we get the following: a sender is
+ allowed to send data segments with data-level sequence numbers
+ between (DATA_ACK, DATA_ACK + receive_window). Each of these
+ segments will be mapped onto subflows, as long as subflow sequence
+ numbers are in the allowed windows for those subflows. Note that
+ subflow sequence numbers do not generally affect flow control if the
+ same receive window is advertised across all subflows. They will
+ perform flow control for those subflows with a smaller advertised
+ receive window.
+
+ The send buffer MUST, at a minimum, be as big as the receive buffer,
+ to enable the sender to reach maximum throughput.
+
+3.3.6. Reliability and Retransmissions
+
+ The Data Sequence Mapping allows senders to resend data with the same
+ data sequence number on a different subflow. When doing this, a host
+ MUST still retransmit the original data on the original subflow, in
+ order to preserve the subflow's integrity (middleboxes could replay
+ old data and/or could reject holes in subflows), and a receiver will
+ ignore these retransmissions. While this is clearly suboptimal, for
+ compatibility reasons this is sensible behavior. Optimizations could
+ be negotiated in future versions of this protocol. Note also that
+ this property would also permit a sender to always send the same
+ data, with the same data sequence number, on multiple subflows, if
+ desired for reliability reasons.
+
+ This protocol specification does not mandate any mechanisms for
+ handling retransmissions, and much will be dependent upon local
+ policy (as discussed in Section 3.3.8). One can imagine aggressive
+ connection-level retransmission policies where every packet lost at
+ the subflow level is retransmitted on a different subflow (hence
+ wasting bandwidth but possibly reducing application-to-application
+ delays) or conservative retransmission policies where connection-
+ level retransmissions are only used after a few subflow-level
+ retransmission timeouts occur.
+
+ It is envisaged that a standard connection-level retransmission
+ mechanism would be implemented around a connection-level data queue:
+ all segments that haven't been DATA_ACKed are stored. A timer is set
+ when the head of the connection level is ACKed at the subflow level
+ but is not DATA_ACKed at the data level. This timer will guard
+ against retransmission failures by middleboxes that proactively ACK
+ data.
+
+ The sender MUST keep data in its send buffer as long as the data has
+ not been acknowledged both (1) at the connection level and (2) on all
+ subflows on which it has been sent. In this way, the sender can
+ always retransmit the data if needed, on the same subflow or on a
+ different one. A special case is when a subflow fails: the sender
+ will typically resend the data on other working subflows after a
+ timeout and will keep trying to retransmit the data on the failed
+ subflow too. The sender will declare the subflow failed after a
+ predefined upper bound on retransmissions is reached (which MAY be
+ lower than the usual TCP limits of the MSL) or on the receipt of an
+ ICMP error, and only then delete the outstanding data segments.
+
+ If multiple retransmissions that indicate that a subflow is
+ performing badly are triggered, this MAY lead to a host resetting the
+ subflow with a RST. However, additional research is required to
+ understand the heuristics of how and when to reset underperforming
+ subflows. For example, a highly asymmetric path may be misdiagnosed
+ as underperforming. A RST for this purpose SHOULD be accompanied by
+ an "Unacceptable performance" MP_TCPRST option (Section 3.6).
+
+3.3.7. Congestion Control Considerations
+
+ Different subflows in an MPTCP connection have different congestion
+ windows. To achieve fairness at bottlenecks and resource pooling, it
+ is necessary to couple the congestion windows in use on each subflow,
+ in order to push most traffic to uncongested links. One algorithm
+ for achieving this is presented in [RFC6356]; the algorithm does not
+ achieve perfect resource pooling but is "safe" in that it is readily
+ deployable in the current Internet. By this we mean that it does not
+ take up more capacity on any one path than if it was a single path
+ flow using only that route, so this ensures fair coexistence with
+ single-path TCP at shared bottlenecks.
+
+ It is foreseeable that different congestion controllers will be
+ implemented for MPTCP, each aiming to achieve different properties in
+ the resource pooling / fairness / stability design space, as well as
+ those for achieving different properties in quality of service,
+ reliability, and resilience.
+
+ Regardless of the algorithm used, the design of MPTCP aims to provide
+ the congestion control implementations with sufficient information to
+ make the right decisions; this information includes, for each
+ subflow, which packets were lost and when.
+
+3.3.8. Subflow Policy
+
+ Within a local MPTCP implementation, a host may use any local policy
+ it wishes to decide how to share the traffic to be sent over the
+ available paths.
+
+ In the typical use case, where the goal is to maximize throughput,
+ all available paths will be used simultaneously for data transfer,
+ using coupled congestion control as described in [RFC6356]. It is
+ expected, however, that other use cases will appear.
+
+ For instance, one possibility is an "all-or-nothing" approach, i.e.,
+ have a second path ready for use in the event of failure of the first
+ path, but alternatives could include entirely saturating one path
+ before using an additional path (the "overflow" case). Such choices
+ would be most likely based on the monetary cost of links but may also
+ be based on properties such as the delay or jitter of links, where
+ stability (of delay or bandwidth) is more important than throughput.
+ Application requirements such as these are discussed in detail in
+ [RFC6897].
+
+ The ability to make effective choices at the sender requires full
+ knowledge of the path "cost", which is unlikely to be the case. It
+ would be desirable for a receiver to be able to signal their own
+ preferences for paths, since they will often be the multihomed party
+ and may have to pay for metered incoming bandwidth.
+
+ To enable this behavior, the MP_JOIN option (see Section 3.2)
+ contains the "B" bit, which allows a host to indicate to its peer
+ that this path should be treated as a backup path to use only in the
+ event of failure of other working subflows (i.e., a subflow where the
+ receiver has indicated that B=1 SHOULD NOT be used to send data
+ unless there are no usable subflows where B=0).
+
+ In the event that the available set of paths changes, a host may wish
+ to signal a change in priority of subflows to the peer (e.g., a
+ subflow that was previously set as a backup should now take priority
+ over all remaining subflows). Therefore, the MP_PRIO option, shown
+ in Figure 11, can be used to change the "B" flag of the subflow on
+ which it is sent.
+
+ 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-----+-+
+ | Kind | Length |Subtype|(rsv)|B|
+ +---------------+---------------+-------+-----+-+
+
+ Figure 11: Change Subflow Priority (MP_PRIO) Option
+
+ Another use of the MP_PRIO option is to set the "B" flag on a subflow
+ to cleanly "retire" its use before closing it and removing it with
+ REMOVE_ADDR (Section 3.4.2) -- for example, to support make-before-
+ break session continuity, where new subflows are added before the
+ previously used subflows are closed.
+
+ It should be noted that the backup flag is a request from a data
+ receiver to a data sender only, and the data sender SHOULD adhere to
+ these requests. A host cannot assume that the data sender will do
+ so, however, since local policies -- or technical difficulties -- may
+ override MP_PRIO requests. Note also that this signal applies to a
+ single direction, and so the sender of this option could choose to
+ continue using the subflow to send data even if it has signaled B=1
+ to the other host.
+
+3.4. Address Knowledge Exchange (Path Management)
+
+ We use the term "path management" to refer to the exchange of
+ information about additional paths between hosts, which in this
+ design is managed by multiple addresses at hosts. For more details
+ regarding the architectural thinking behind this design, see the
+ MPTCP architecture document [RFC6182].
+
+ This design makes use of two methods of sharing such information, and
+ both can be used on a connection. The first is the direct setup of
+ new subflows (described in Section 3.2), where the initiator has an
+ additional address. The second method (described in the following
+ subsections) signals addresses explicitly to the other host to allow
+ it to initiate new subflows. The two mechanisms are complementary:
+ the first is implicit and simple, while the second (explicit) is more
+ complex but is more robust. Together, these mechanisms allow
+ addresses to change in flight (and thus support operation through
+ NATs, since the source address need not be known); they also allow
+ the signaling of previously unknown addresses and of addresses
+ belonging to other address families (e.g., both IPv4 and IPv6).
+
+ Here is an example of typical operation of the protocol:
+
+ * An MPTCP connection is initially set up between address/port A1 of
+ Host A and address/port B1 of Host B. If Host A is multihomed and
+ multiaddressed, it can start an additional subflow from its
+ address A2 to B1, by sending a SYN with an MP_JOIN option from A2
+ to B1, using B's previously declared token for this connection.
+ Alternatively, if B is multihomed, it can try to set up a new
+ subflow from B2 to A1, using A's previously declared token. In
+ either case, the SYN will be sent to the port already in use for
+ the original subflow on the receiving host.
+
+ * Simultaneously (or after a timeout), an ADD_ADDR option
+ (Section 3.4.1) is sent on an existing subflow, informing the
+ receiver of the sender's alternative address(es). The recipient
+ can use this information to open a new subflow to the sender's
+ additional address(es). In our example, A will send the ADD_ADDR
+ option informing B of address/port A2. The mix of using the
+ SYN-based option and the ADD_ADDR option, including timeouts, is
+ implementation specific and can be tailored to agree with local
+ policy.
+
+ * If subflow A2-B1 is successfully set up, Host B can use the
+ Address ID in the MP_JOIN option to correlate this source address
+ with the ADD_ADDR option that will also arrive on an existing
+ subflow; now B knows not to open A2-B1, ignoring the ADD_ADDR.
+ Otherwise, if B has not received the A2-B1 MP_JOIN SYN but
+ received the ADD_ADDR, it can try to initiate a new subflow from
+ one or more of its addresses to address A2. This permits new
+ sessions to be opened if one host is behind a NAT.
+
+ Other ways of using the two signaling mechanisms are possible; for
+ instance, signaling addresses in other address families can only be
+ done explicitly using the Add Address (ADD_ADDR) option.
+
+3.4.1. Address Advertisement
+
+ The ADD_ADDR MPTCP option announces additional addresses (and,
+ optionally, ports) on which a host can be reached (Figure 12). This
+ option can be used at any time during a connection, depending on when
+ the sender wishes to enable multiple paths and/or when paths become
+ available. As with all MPTCP signals, the receiver MUST undertake
+ standard TCP validity checks, e.g., per [RFC5961], before acting
+ upon it.
+
+ 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-------+---------------+
+ | Kind | Length |Subtype|(rsv)|E| Address ID |
+ +---------------+---------------+-------+-------+---------------+
+ | Address (IPv4: 4 octets / IPv6: 16 octets) |
+ +-------------------------------+-------------------------------+
+ | Port (2 octets, optional) | |
+ +-------------------------------+ |
+ | Truncated HMAC (8 octets, if E=0) |
+ | +-------------------------------+
+ | |
+ +-------------------------------+
+
+ Figure 12: Add Address (ADD_ADDR) Option
+
+ Every address has an Address ID that can be used for uniquely
+ identifying the address within a connection for address removal. The
+ Address ID is also used to identify MP_JOIN options (see Section 3.2)
+ relating to the same address, even when address translators are in
+ use. The Address ID MUST uniquely identify the address for the
+ sender of the option (within the scope of the connection); the
+ mechanism for allocating such IDs is implementation specific.
+
+ All Address IDs learned via either MP_JOIN or ADD_ADDR SHOULD be
+ stored by the receiver in a data structure that gathers all the
+ Address-ID-to-address mappings for a connection (identified by a
+ token pair). In this way, there is a stored mapping between the
+ Address ID, observed source address, and token pair for future
+ processing of control information for a connection. Note that an
+ implementation MAY discard incoming address advertisements at will --
+ for example, to avoid updating mapping state or because advertised
+ addresses are of no use to it (for example, IPv6 addresses when it
+ has IPv4 only). Therefore, a host MUST treat address advertisements
+ as soft state, and it MAY choose to refresh advertisements
+ periodically. Note also that an implementation MAY choose to cache
+ these address advertisements even if they are not currently relevant
+ but may be relevant in the future, such as IPv4 addresses when IPv6
+ connectivity is available but IPv4 is awaiting DHCP.
+
+ This option is shown in Figure 12. The illustration is sized for
+ IPv4 addresses. For IPv6, the length of the address will be
+ 16 octets (instead of 4).
+
+ The 2 octets that specify the TCP port number to use are optional,
+ and their presence can be inferred from the length of the option.
+ Although it is expected that the majority of use cases will use the
+ same port pairs as those used for the initial subflow (e.g., port 80
+ remains port 80 on all subflows, as does the ephemeral port at the
+ client), there may be cases (such as port-based load balancing) where
+ the explicit specification of a different port is required. If no
+ port is specified, MPTCP SHOULD attempt to connect to the specified
+ address on the same port as the port that is already in use by the
+ subflow on which the ADD_ADDR signal was sent; this is discussed in
+ more detail in Section 3.9.
+
+ The Truncated HMAC parameter present in this option is the rightmost
+ 64 bits of an HMAC, negotiated and calculated in the same way as for
+ MP_JOIN as described in Section 3.2. For this specification of
+ MPTCP, as there is only one hash algorithm option specified, this
+ will be HMAC as defined in [RFC2104], using the SHA-256 hash
+ algorithm [RFC6234]. In the same way as for MP_JOIN, the key for the
+ HMAC algorithm, in the case of the message transmitted by Host A,
+ will be Key-A followed by Key-B, and in the case of Host B, Key-B
+ followed by Key-A. These are the keys that were exchanged in the
+ original MP_CAPABLE handshake. The message for the HMAC is the
+ Address ID, IP address, and port that precede the HMAC in the
+ ADD_ADDR option. If the port is not present in the ADD_ADDR option,
+ the HMAC message will nevertheless include 2 octets of value zero.
+ The rationale for the HMAC is to prevent unauthorized entities from
+ injecting ADD_ADDR signals in an attempt to hijack a connection.
+ Note that, additionally, the presence of this HMAC prevents the
+ address from being changed in flight unless the key is known by an
+ intermediary. If a host receives an ADD_ADDR option for which it
+ cannot validate the HMAC, it SHOULD silently ignore the option.
+
+ A set of four flags is present after the subtype and before the
+ Address ID. Only the rightmost bit -- labeled "E" -- is assigned in
+ this specification. The other bits are currently unassigned; they
+ MUST be set to 0 by a sender and MUST be ignored by the receiver.
+
+ The "E" flag exists to provide reliability for this option. Because
+ this option will often be sent on pure ACKs, there is no guarantee of
+ reliability. Therefore, a receiver receiving a fresh ADD_ADDR option
+ (where E=0) will send the same option back to the sender, but not
+ including the HMAC and with E=1, to indicate receipt. According to
+ local policy, the lack of this type of "echo" can indicate to the
+ initial ADD_ADDR sender that the ADD_ADDR needs to be retransmitted.
+
+ Due to the proliferation of NATs, it is reasonably likely that one
+ host may attempt to advertise private addresses [RFC1918]. It is not
+ desirable to prohibit this behavior, since there may be cases where
+ both hosts have additional interfaces on the same private network,
+ and a host MAY advertise such addresses. The MP_JOIN handshake to
+ create a new subflow (Section 3.2) provides mechanisms to minimize
+ security risks. The MP_JOIN message contains a 32-bit token that
+ uniquely identifies the connection to the receiving host. If the
+ token is unknown, the host will respond with a RST. In the unlikely
+ event that the token is valid at the receiving host, subflow setup
+ will continue, but the HMAC exchange must occur for authentication.
+ The HMAC exchange will fail and will provide sufficient protection
+ against two unconnected hosts accidentally setting up a new subflow
+ upon the signal of a private address. Further security
+ considerations around the issue of ADD_ADDR messages that
+ accidentally misdirect, or maliciously direct, new MP_JOIN attempts
+ are discussed in Section 5.
+
+ A host that receives an ADD_ADDR but finds that a connection set up
+ to that IP address and port number is unsuccessful SHOULD NOT perform
+ further connection attempts to this address/port combination for this
+ connection. A sender that wants to trigger a new incoming connection
+ attempt on a previously advertised address/port combination can
+ therefore refresh ADD_ADDR information by sending the option again.
+
+ A host can therefore send an ADD_ADDR message with an already-
+ assigned Address ID, but the address MUST be the same as the address
+ previously assigned to this Address ID. A new ADD_ADDR may have the
+ same port number or a different port number. If the port number is
+ different, the receiving host SHOULD try to set up a new subflow to
+ this new address/port combination.
+
+ A host wishing to replace an existing Address ID MUST first remove
+ the existing one (Section 3.4.2).
+
+ During normal MPTCP operation, it is unlikely that there will be
+ sufficient TCP option space for ADD_ADDR to be included along with
+ those for data sequence numbering (Section 3.3.1). Therefore, it is
+ expected that an MPTCP implementation will send the ADD_ADDR option
+ on separate ACKs. As discussed earlier, however, an MPTCP
+ implementation MUST NOT treat duplicate ACKs with any MPTCP option,
+ with the exception of the DSS option, as indications of congestion
+ [RFC5681], and an MPTCP implementation SHOULD NOT send more than two
+ duplicate ACKs in a row for signaling purposes.
+
+3.4.2. Remove Address
+
+ If, during the lifetime of an MPTCP connection, a previously
+ announced address becomes invalid (e.g., if the interface disappears
+ or an IPv6 address is no longer preferred), the affected host SHOULD
+ announce this situation so that the peer can remove subflows related
+ to this address. Even if an address is not in use by an MPTCP
+ connection, if it has been previously announced, an implementation
+ SHOULD announce its removal. A host MAY also choose to announce that
+ a valid IP address should not be used any longer -- for example, for
+ make-before-break session continuity.
+
+ This is achieved through the Remove Address (REMOVE_ADDR) option
+ (Figure 13), which will remove a previously added address (or list of
+ addresses) from a connection and terminate any subflows currently
+ using that address.
+
+ 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-------+---------------+
+ | Kind |Length = 3 + n |Subtype|(resvd)| Address ID | ...
+ +---------------+---------------+-------+-------+---------------+
+ (followed by n-1 Address IDs, if required)
+
+ Figure 13: Remove Address (REMOVE_ADDR) Option
+
+ For security purposes, if a host receives a REMOVE_ADDR option, it
+ must ensure that the affected path or paths are no longer in use
+ before it instigates closure. The receipt of REMOVE_ADDR SHOULD
+ first trigger the sending of a TCP keepalive [RFC1122] on the path,
+ and if a response is received, the path SHOULD NOT be removed. If
+ the path is found to still be alive, the receiving host SHOULD no
+ longer use the specified address for future connections, but it is
+ the responsibility of the host that sent the REMOVE_ADDR to shut down
+ the subflow. Before the address is removed, the requesting host MAY
+ also use MP_PRIO (Section 3.3.8) to request that a path no longer be
+ used. Typical TCP validity tests on the subflow (e.g., ensuring that
+ sequence and ACK numbers are correct) MUST also be undertaken. An
+ implementation can use indications of these test failures as part of
+ intrusion detection or error logging.
+
+ The sending and receipt (if no keepalive response was received) of
+ this message SHOULD trigger the sending of RSTs by both hosts on the
+ affected subflow(s) (if possible), as a courtesy, to allow the
+ cleanup of middlebox state before cleaning up any local state.
+
+ Address removal is undertaken according to the Address ID, so as to
+ permit the use of NATs and other middleboxes that rewrite source
+ addresses. If an Address ID is not known, the receiver will silently
+ ignore the request.
+
+ A subflow that is still functioning MUST be closed with a FIN
+ exchange as in regular TCP, rather than using this option. For more
+ information, see Section 3.3.3.
+
+3.5. Fast Close
+
+ Regular TCP has the means of sending a RST signal to abruptly close a
+ connection. With MPTCP, a regular RST only has the scope of the
+ subflow; it will only close the applicable subflow and will not
+ affect the remaining subflows. MPTCP's connection will stay alive at
+ the data level, in order to permit break-before-make handover between
+ subflows. It is therefore necessary to provide an MPTCP-level
+ "reset" to allow the abrupt closure of the whole MPTCP connection;
+ this is done via the MP_FASTCLOSE option.
+
+ MP_FASTCLOSE is used to indicate to the peer that the connection will
+ be abruptly closed and no data will be accepted anymore. The reasons
+ for triggering an MP_FASTCLOSE are implementation specific. Regular
+ TCP does not allow the sending of a RST while the connection is in a
+ synchronized state [RFC0793]. Nevertheless, implementations allow
+ the sending of a RST in this state if, for example, the operating
+ system is running out of resources. In these cases, MPTCP should
+ send the MP_FASTCLOSE. This option is illustrated in Figure 14.
+
+ 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-----------------------+
+ | Kind | Length |Subtype| (reserved) |
+ +---------------+---------------+-------+-----------------------+
+ | Option Receiver's Key |
+ | (64 bits) |
+ | |
+ +---------------------------------------------------------------+
+
+ Figure 14: Fast Close (MP_FASTCLOSE) Option
+
+ If Host A wants to force the closure of an MPTCP connection, it can
+ do so via two options:
+
+ * Option A (ACK): Host A sends an ACK containing the MP_FASTCLOSE
+ option on one subflow, containing the key of Host B as declared in
+ the initial connection handshake. On all the other subflows,
+ Host A sends a regular TCP RST to close these subflows and tears
+ them down. Host A now enters FASTCLOSE_WAIT state.
+
+ * Option R (RST): Host A sends a RST containing the MP_FASTCLOSE
+ option on all subflows, containing the key of Host B as declared
+ in the initial connection handshake. Host A can tear down the
+ subflows and the connection immediately.
+
+ If Host A decides to force the closure by using Option A and sending
+ an ACK with the MP_FASTCLOSE option, the connection shall proceed as
+ follows:
+
+ * Upon receipt of an ACK with MP_FASTCLOSE by Host B, containing the
+ valid key, Host B answers on the same subflow with a TCP RST and
+ tears down all subflows also through sending TCP RST signals.
+ Host B can now close the whole MPTCP connection (it transitions
+ directly to CLOSED state).
+
+ * As soon as Host A has received the TCP RST on the remaining
+ subflow, it can close this subflow and tear down the whole
+ connection (transition from FASTCLOSE_WAIT state to CLOSED state).
+ If Host A receives an MP_FASTCLOSE instead of a TCP RST, both
+ hosts attempted fast closure simultaneously. Host A should reply
+ with a TCP RST and tear down the connection.
+
+ * If Host A does not receive a TCP RST in reply to its MP_FASTCLOSE
+ after one retransmission timeout (RTO) (the RTO of the subflow
+ where the MP_FASTCLOSE has been sent), it SHOULD retransmit the
+ MP_FASTCLOSE. To keep this connection from being retained for a
+ long time, the number of retransmissions SHOULD be limited; this
+ limit is implementation specific. A RECOMMENDED number is 3. If
+ no TCP RST is received in response, Host A SHOULD send a TCP RST
+ with the MP_FASTCLOSE option itself when it releases state in
+ order to clear any remaining state at middleboxes.
+
+ If, however, Host A decides to force the closure by using Option R
+ and sending a RST with the MP_FASTCLOSE option, Host B will act as
+ follows: upon receipt of a RST with MP_FASTCLOSE, containing the
+ valid key, Host B tears down all subflows by sending a TCP RST.
+ Host B can now close the whole MPTCP connection (it transitions
+ directly to CLOSED state).
+
+3.6. Subflow Reset
+
+ An implementation of MPTCP may also need to send a regular TCP RST to
+ force the closure of a subflow. A host sends a TCP RST in order to
+ close a subflow or reject an attempt to open a subflow (MP_JOIN). In
+ order to let the receiving host know why a subflow is being closed or
+ rejected, the TCP RST packet MAY include the MP_TCPRST option
+ (Figure 15). The host MAY use this information to decide, for
+ example, whether it tries to re-establish the subflow immediately,
+ later, or never.
+
+ 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-----------------------+
+ | Kind | Length |Subtype|U|V|W|T| Reason |
+ +---------------+---------------+-------+-----------------------+
+
+ Figure 15: TCP RST Reason (MP_TCPRST) Option
+
+ The MP_TCPRST option contains a reason code that allows the sender of
+ the option to provide more information about the reason for the
+ termination of the subflow. Using 12 bits of option space, the first
+ 4 bits are reserved for flags (only one of which is currently
+ defined), and the remaining octet is used to express a reason code
+ for this subflow termination, from which a receiver MAY infer
+ information about the usability of this path.
+
+ The "T" flag is used by the sender to indicate whether the error
+ condition that is reported is Transient ("T" bit set to 1) or
+ Permanent ("T" bit set to 0). If the error condition is considered
+ to be Transient by the sender of the RST segment, the recipient of
+ this segment MAY try to re-establish a subflow for this connection
+ over the failed path. The time at which a receiver may try to
+ re-establish this subflow is implementation specific but SHOULD take
+ into account the properties of the failure as defined by the provided
+ reason code. If the error condition is considered to be Permanent,
+ the receiver of the RST segment SHOULD NOT try to re-establish a
+ subflow for this connection over this path. The "U", "V", and "W"
+ flags are not defined by this specification and are reserved for
+ future use. An implementation of this specification MUST set these
+ flags to 0, and a receiver MUST ignore them.
+
+ "Reason" is an 8-bit field that indicates the reason code for the
+ termination of the subflow. The following codes are defined in this
+ document:
+
+ * Unspecified error (code 0x00). This is the default error; it
+ implies that the subflow is no longer available. The presence of
+ this option shows that the RST was generated by an MPTCP-aware
+ device.
+
+ * MPTCP-specific error (code 0x01). An error has been detected in
+ the processing of MPTCP options. This is the usual reason code to
+ return in the cases where a RST is being sent to close a subflow
+ because of an invalid response.
+
+ * Lack of resources (code 0x02). This code indicates that the
+ sending host does not have enough resources to support the
+ terminated subflow.
+
+ * Administratively prohibited (code 0x03). This code indicates that
+ the requested subflow is prohibited by the policies of the sending
+ host.
+
+ * Too much outstanding data (code 0x04). This code indicates that
+ there is an excessive amount of data that needs to be transmitted
+ over the terminated subflow while having already been acknowledged
+ over one or more other subflows. This may occur if a path has
+ been unavailable for a short period and it is more efficient to
+ reset and start again than it is to retransmit the queued data.
+
+ * Unacceptable performance (code 0x05). This code indicates that
+ the performance of this subflow was too low compared to the other
+ subflows of this Multipath TCP connection.
+
+ * Middlebox interference (code 0x06). Middlebox interference has
+ been detected over this subflow, making MPTCP signaling invalid.
+ For example, this may be sent if the checksum does not validate.
+
+3.7. Fallback
+
+ Sometimes, middleboxes will exist on a path that could prevent the
+ operation of MPTCP. MPTCP has been designed to cope with many
+ middlebox modifications (see Section 6), but there are still some
+ cases where a subflow could fail to operate within the MPTCP
+ requirements. Notably, these cases are the following: the loss of
+ MPTCP options on a path and the modification of payload data. If
+ such an event occurs, it is necessary to "fall back" to the previous,
+ safe operation. This may be either falling back to regular TCP or
+ removing a problematic subflow.
+
+ At the start of an MPTCP connection (i.e., the first subflow), it is
+ important to ensure that the path is fully MPTCP capable and the
+ necessary MPTCP options can reach each host. The handshake as
+ described in Section 3.1 SHOULD fall back to regular TCP if either of
+ the SYN messages does not have the MPTCP options: this is the same,
+ and desired, behavior in the case where a host is not MPTCP capable
+ or the path does not support the MPTCP options. When attempting to
+ join an existing MPTCP connection (Section 3.2), if a path is not
+ MPTCP capable and the MPTCP options do not get through on the SYNs,
+ the subflow will be closed according to the MP_JOIN logic.
+
+ There is, however, another corner case that should be addressed: the
+ case where MPTCP options get through on the SYN but not on regular
+ packets. If the subflow is the first subflow and thus all data in
+ flight is contiguous, this situation can be resolved by using the
+ following rules:
+
+ * A sender MUST include a DSS option with Data Sequence Mapping in
+ every segment until one of the sent segments has been acknowledged
+ with a DSS option containing a Data ACK. Upon reception of the
+ acknowledgment, the sender has the confirmation that the DSS
+ option passes in both directions and may choose to send fewer DSS
+ options than once per segment.
+
+ * If, however, an ACK is received for data (not just for the SYN)
+ without a DSS option containing a Data ACK, the sender determines
+ that the path is not MPTCP capable. In the case of this occurring
+ on an additional subflow (i.e., one started with MP_JOIN), the
+ host MUST close the subflow with a RST, which SHOULD contain an
+ MP_TCPRST option (Section 3.6) with a "Middlebox interference"
+ reason code.
+
+ * In the case of such an ACK being received on the first subflow
+ (i.e., that started with MP_CAPABLE), before any additional
+ subflows are added, the implementation MUST drop out of MPTCP mode
+ and fall back to regular TCP. The sender will send one final Data
+ Sequence Mapping, with the Data-Level Length value of 0 indicating
+ an infinite mapping (to inform the other end in case the path
+ drops options in one direction only), and then revert to sending
+ data on the single subflow without any MPTCP options.
+
+ * If a subflow breaks during operation, e.g., if it is rerouted and
+ MPTCP options are no longer permitted, then once this is detected
+ (by the subflow-level receive buffer filling up, since there is no
+ mapping available in order to DATA_ACK this data), the subflow
+ SHOULD be treated as broken and closed with a RST, since no data
+ can be delivered to the application layer and no fallback signal
+ can be reliably sent. This RST SHOULD include the MP_TCPRST
+ option (Section 3.6) with a "Middlebox interference" reason code.
+
+ These rules should cover all cases where such a failure could happen
+ -- whether it's on the forward or reverse path and whether the server
+ or the client first sends data.
+
+ So far, this section has discussed the loss of MPTCP options, either
+ initially or during the course of the connection. As described in
+ Section 3.3, each portion of data for which there is a mapping is
+ protected by a checksum, if checksums have been negotiated. This
+ mechanism is used to detect if middleboxes have made any adjustments
+ to the payload (added, removed, or changed data). A checksum will
+ fail if the data has been changed in any way. The use of a checksum
+ will also detect whether the length of data on the subflow is
+ increased or decreased, and this means the Data Sequence Mapping is
+ no longer valid. The sender no longer knows what subflow-level
+ sequence number the receiver is genuinely operating at (the middlebox
+ will be faking ACKs in return), and it cannot signal any further
+ mappings. Furthermore, in addition to the possibility of payload
+ modifications that are valid at the application layer, it is possible
+ that such modifications could be triggered across MPTCP segment
+ boundaries, corrupting the data. Therefore, all data from the start
+ of the segment that failed the checksum onward is not trustworthy.
+
+ Note that if checksum usage has not been negotiated, this fallback
+ mechanism cannot be used unless there is some higher-layer or
+ lower-layer signal to inform the MPTCP implementation that the
+ payload has been tampered with.
+
+ When multiple subflows are in use, the data in flight on a subflow
+ will likely involve data that is not contiguously part of the
+ connection-level stream, since segments will be spread across the
+ multiple subflows. Due to the problems identified above, it is not
+ possible to determine what adjustments have been done to the data
+ (notably, any changes to the subflow sequence numbering). Therefore,
+ it is not possible to recover the subflow, and the affected subflow
+ must be immediately closed with a RST that includes an MP_FAIL option
+ (Figure 16), which defines the data sequence number at the start of
+ the segment (defined by the Data Sequence Mapping) that had the
+ checksum failure. Note that the MP_FAIL option requires the use of
+ the full 64-bit sequence number, even if 32-bit sequence numbers are
+ normally in use in the DSS signals on the path.
+
+ 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+----------------------+
+ | Kind | Length=12 |Subtype| (reserved) |
+ +---------------+---------------+-------+----------------------+
+ | |
+ | Data Sequence Number (8 octets) |
+ | |
+ +--------------------------------------------------------------+
+
+ Figure 16: Fallback (MP_FAIL) Option
+
+ The receiver of this option MUST discard all data following the data
+ sequence number specified. Failed data MUST NOT be DATA_ACKed and so
+ will be retransmitted on other subflows (Section 3.3.6).
+
+ A special case is when there is a single subflow and it fails with a
+ checksum error. If it is known that all unacknowledged data in
+ flight is contiguous (which will usually be the case with a single
+ subflow), an infinite mapping can be applied to the subflow without
+ the need to close it first, essentially turning off all further MPTCP
+ signaling. In this case, if a receiver identifies a checksum failure
+ when there is only one path, it will send back an MP_FAIL option on
+ the subflow-level ACK, referring to the data-level sequence number of
+ the start of the segment on which the checksum error was detected.
+ The sender will receive this information and, if all unacknowledged
+ data in flight is contiguous, will signal an infinite mapping. This
+ infinite mapping will be a DSS option (Section 3.3) on the first new
+ packet, containing a Data Sequence Mapping that acts retroactively,
+ referring to the start of the subflow sequence number of the most
+ recent segment that was known to be delivered intact (i.e., was
+ successfully DATA_ACKed). From that point onward, data can be
+ altered by a middlebox without affecting MPTCP, as the data stream is
+ equivalent to a regular, legacy TCP session. While in theory paths
+ may only be damaged in one direction -- and the MP_FAIL signal
+ affects only one direction of traffic -- for simplicity of
+ implementation, the receiver of an MP_FAIL MUST also respond with an
+ MP_FAIL in the reverse direction and entirely revert to a regular TCP
+ session.
+
+ In the rare case that the data is not contiguous (which could happen
+ when there is only one subflow but it is retransmitting data from a
+ subflow that has recently been uncleanly closed), the receiver MUST
+ close the subflow with a RST with MP_FAIL. The receiver MUST discard
+ all data that follows the data sequence number specified. The sender
+ MAY attempt to create a new subflow belonging to the same connection
+ and, if it chooses to do so, SHOULD immediately place the single
+ subflow in single-path mode by setting an infinite Data Sequence
+ Mapping. This mapping will begin from the data-level sequence number
+ that was declared in the MP_FAIL.
+
+ After a sender signals an infinite mapping, it MUST only use subflow
+ ACKs to clear its send buffer. This is because Data ACKs may become
+ misaligned with the subflow ACKs when middleboxes insert or delete
+ data. The receiver SHOULD stop generating Data ACKs after it
+ receives an infinite mapping.
+
+ When a connection has fallen back with an infinite mapping, only one
+ subflow can send data; otherwise, the receiver would not know how to
+ reorder the data. In practice, this means that all MPTCP subflows
+ will have to be terminated except one. Once MPTCP falls back to
+ regular TCP, it MUST NOT revert to MPTCP later in the connection.
+
+ It should be emphasized that MPTCP is not attempting to prevent the
+ use of middleboxes that want to adjust the payload. An MPTCP-aware
+ middlebox could provide such functionality by also rewriting
+ checksums.
+
+3.8. Error Handling
+
+ In addition to the fallback mechanism described above, the standard
+ classes of TCP errors may need to be handled in an MPTCP-specific
+ way. Note that changing semantics -- such as the relevance of a RST
+ -- are covered in Section 4. Where possible, we do not want to
+ deviate from regular TCP behavior.
+
+ The following list covers possible errors and the appropriate MPTCP
+ behavior:
+
+ * Unknown token in MP_JOIN (or HMAC failure in MP_JOIN ACK, or
+ missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's
+ behavior on an unknown port)
+
+ * DSN out of window (during normal operation): drop the data; do not
+ send Data ACKs
+
+ * Remove request for unknown Address ID: silently ignore
+
+3.9. Heuristics
+
+ There are a number of heuristics that are needed for performance or
+ deployment but that are not required for protocol correctness. In
+ this section, we detail such heuristics. Note that discussions of
+ buffering and certain sender and receiver window behaviors are
+ presented in Sections 3.3.4 and 3.3.5, and retransmission is
+ discussed in Section 3.3.6.
+
+3.9.1. Port Usage
+
+ Under typical operation, an MPTCP implementation SHOULD use the same
+ ports as the ports that are already in use. In other words, the
+ destination port of a SYN containing an MP_JOIN option SHOULD be the
+ same as the remote port of the first subflow in the connection. The
+ local port for such SYNs SHOULD also be the same as the port for the
+ first subflow (and as such, an implementation SHOULD reserve
+ ephemeral ports across all local IP addresses), although there may be
+ cases where this is infeasible. This strategy is intended to
+ maximize the probability of the SYN being permitted by a firewall or
+ NAT at the recipient and to avoid confusing any network-monitoring
+ software.
+
+ There may also be cases, however, where a host wishes to signal that
+ a specific port should be used; this facility is provided in the
+ ADD_ADDR option as documented in Section 3.4.1. It is therefore
+ feasible to allow multiple subflows between the same two addresses
+ but using different port pairs, and such a facility could be used to
+ allow load balancing within the network based on 5-tuples (e.g., some
+ ECMP implementations [RFC2992]).
+
+3.9.2. Delayed Subflow Start and Subflow Symmetry
+
+ Many TCP connections are short-lived and consist only of a few
+ segments, and so the overhead of using MPTCP outweighs any benefits.
+ A heuristic is required, therefore, to decide when to start using
+ additional subflows in an MPTCP connection. Experimental deployments
+ have shown that MPTCP can be applied in a range of scenarios, so an
+ implementation will likely need to take into account such factors as
+ the type of traffic being sent and the duration of the session; this
+ information MAY be signaled by the application layer.
+
+ However, for standard TCP traffic, a suggested general-purpose
+ heuristic that an implementation MAY choose to employ is as follows.
+
+ If a host has data buffered for its peer (which implies that the
+ application has received a request for data), the host opens one
+ subflow for each initial window's worth of data that is buffered.
+
+ Consideration should also be given to limiting the rate of adding new
+ subflows, as well as limiting the total number of subflows open for a
+ particular connection. A host may choose to vary these values based
+ on its load or knowledge of traffic and path characteristics.
+
+ Note that this heuristic alone is probably insufficient. Traffic for
+ many common applications, such as downloads, is highly asymmetric,
+ and the host that is multihomed may well be the client that will
+ never fill its buffers and thus never use MPTCP according to this
+ heuristic. Advanced APIs that allow an application to signal its
+ traffic requirements would aid in these decisions.
+
+ An additional time-based heuristic could be applied, opening
+ additional subflows after a given period of time has passed. This
+ would alleviate the above issue and also provide resilience for
+ low-bandwidth but long-lived applications.
+
+ Another issue is that both communicating hosts may simultaneously try
+ to set up a subflow between the same pair of addresses. This leads
+ to an inefficient use of resources.
+
+ If the same ports are used on all subflows, as recommended above,
+ then standard TCP simultaneous-open logic should take care of this
+ situation and only one subflow will be established between the
+ address pairs. However, this relies on the same ports being used at
+ both end hosts. If a host does not support TCP simultaneous open, it
+ is RECOMMENDED that some element of randomization be applied to the
+ time to wait before opening new subflows, so that only one subflow is
+ created between a given address pair. If, however, hosts signal
+ additional ports to use (for example, for leveraging ECMP on-path),
+ this heuristic is not appropriate.
+
+ This section has shown some of the factors that an implementer should
+ consider when developing MPTCP heuristics, but it is not intended to
+ be prescriptive.
+
+3.9.3. Failure Handling
+
+ Requirements for MPTCP's handling of unexpected signals are given in
+ Section 3.8. There are other failure cases, however, where hosts can
+ choose appropriate behavior.
+
+ For example, Section 3.1 suggests that a host SHOULD fall back to
+ trying regular TCP SYNs after one or more failures of MPTCP SYNs for
+ a connection. A host may keep a system-wide cache of such
+ information, so that it can back off from using MPTCP, firstly for
+ that particular destination host and, eventually, on a whole
+ interface, if MPTCP connections continue to fail. The duration of
+ such a cache would be implementation specific.
+
+ Another failure could occur when the MP_JOIN handshake fails.
+ Section 3.8 specifies that an incorrect handshake MUST lead to the
+ subflow being closed with a RST. A host operating an active
+ intrusion-detection system may choose to start blocking MP_JOIN
+ packets from the source host if multiple failed MP_JOIN attempts are
+ seen. From the connection initiator's point of view, if an MP_JOIN
+ fails, it SHOULD NOT attempt to connect to the same IP address and
+ port during the lifetime of the connection, unless the other host
+ refreshes the information with another ADD_ADDR option. Note that
+ the ADD_ADDR option is informational only and does not guarantee that
+ the other host will attempt a connection.
+
+ In addition, an implementation may learn, over a number of
+ connections, that certain interfaces or destination addresses
+ consistently fail and may default to not trying to use MPTCP for such
+ interfaces or addresses. The behavior of subflows that perform
+ particularly badly or subflows that regularly fail during use could
+ also be learned, so that an implementation can temporarily choose not
+ to use these paths.
+
+4. Semantic Issues
+
+ In order to support multipath operation, the semantics of some TCP
+ components have changed. To help clarify, this section lists these
+ semantic changes as a point of reference.
+
+ Sequence number: The (in-header) TCP sequence number is specific to
+ the subflow. To allow the receiver to reorder application data,
+ an additional data-level sequence space is used. In this
+ data-level sequence space, the initial SYN and the final DATA_FIN
+ occupy 1 octet of sequence space. This is done to ensure that
+ these signals are acknowledged at the connection level. There is
+ an explicit mapping of data sequence space to subflow sequence
+ space, which is signaled through TCP options in data packets.
+
+ ACK: The ACK field in the TCP header acknowledges only the subflow
+ sequence number -- not the data-level sequence space.
+ Implementations SHOULD NOT attempt to infer a data-level
+ acknowledgment from the subflow ACKs. This separates subflow-
+ level and connection-level processing at an end host.
+
+ Duplicate ACK: A duplicate ACK that includes any MPTCP signaling
+ (with the exception of the DSS option) MUST NOT be treated as a
+ signal of congestion. To limit the chances of non-MPTCP-aware
+ entities mistakenly interpreting duplicate ACKs as a signal of
+ congestion, MPTCP SHOULD NOT send more than two duplicate ACKs
+ containing (non-DSS) MPTCP signals in a row.
+
+ Receive Window: The receive window in the TCP header indicates the
+ amount of free buffer space for the whole data-level connection
+ (as opposed to the amount of space for this subflow) that is
+ available at the receiver. The semantics are the same as for
+ regular TCP, but to maintain these semantics the receive window
+ must be interpreted at the sender as relative to the sequence
+ number given in the DATA_ACK rather than the subflow ACK in the
+ TCP header. In this way, the original role of flow control is
+ preserved. Note that some middleboxes may change the receive
+ window, and so a host SHOULD use the maximum value of those
+ recently seen on the constituent subflows for the connection-level
+ receive window and also needs to maintain a subflow-level window
+ for subflow-level processing.
+
+ FIN: The FIN flag in the TCP header applies only to the subflow it
+ is sent on -- not to the whole connection. For connection-level
+ FIN semantics, the DATA_FIN option is used.
+
+ RST: The RST flag in the TCP header applies only to the subflow it
+ is sent on -- not to the whole connection. The MP_FASTCLOSE
+ option provides the Fast Close functionality of a RST at the MPTCP
+ connection level.
+
+ Address List: Address list management (i.e., knowledge of the local
+ and remote hosts' lists of available IP addresses) is handled on a
+ per-connection basis (as opposed to per subflow, per host, or per
+ pair of communicating hosts). This permits the application of
+ per-connection local policy. Adding an address to one connection
+ (either explicitly through an ADD_ADDR message or implicitly
+ through an MP_JOIN) has no implications for other connections
+ between the same pair of hosts.
+
+ 5-tuple: The 5-tuple (protocol, local address, local port, remote
+ address, remote port) presented by kernel APIs to the application
+ layer in a non-multipath-aware application is that of the first
+ subflow, even if the subflow has since been closed and removed
+ from the connection. This decision, and other related API issues,
+ are discussed in more detail in [RFC6897].
+
+5. Security Considerations
+
+ As identified in [RFC6181], the addition of multipath capability to
+ TCP will bring with it a number of new classes of threats. In order
+ to prevent these threats, [RFC6182] presents a set of requirements
+ for a security solution for MPTCP. The fundamental goal is for the
+ security of MPTCP to be "no worse" than regular TCP today. The key
+ security requirements are as follows:
+
+ * Provide a mechanism to confirm that the parties in a subflow
+ handshake are the same as the parties in the original connection
+ setup.
+
+ * Provide verification that the peer can receive traffic at a new
+ address before using it as part of a connection.
+
+ * Provide replay protection, i.e., ensure that a request to
+ add/remove a subflow is "fresh".
+
+ In order to achieve these goals, MPTCP includes a hash-based
+ handshake algorithm, as documented in Sections 3.1 and 3.2.
+
+ The security of the MPTCP connection hangs on the use of keys that
+ are shared once at the start of the first subflow and are never sent
+ again over the network (unless used in the Fast Close mechanism
+ (Section 3.5)). To ease demultiplexing while not giving away any
+ cryptographic material, future subflows use a truncated cryptographic
+ hash of this key as the connection identification "token". The keys
+ are concatenated and used as keys for creating Hash-based Message
+ Authentication Codes (HMACs) used on subflow setup, in order to
+ verify that the parties in the handshake are the same as the parties
+ in the original connection setup. It also provides verification that
+ the peer can receive traffic at this new address. Replay attacks
+ would still be possible when only keys are used; therefore, the
+ handshakes use single-use random numbers (nonces) at both ends --
+ this ensures that the HMAC will never be the same on two handshakes.
+ Guidance on generating random numbers suitable for use as keys is
+ given in [RFC4086] and discussed in Section 3.1. The nonces are
+ valid for the lifetime of the TCP connection attempt. HMAC is also
+ used to secure the ADD_ADDR option, due to the threats identified in
+ [RFC7430].
+
+ The use of crypto capability bits in the initial connection handshake
+ to negotiate the use of a particular algorithm allows the deployment
+ of additional crypto mechanisms in the future. This negotiation
+ would nevertheless be susceptible to a bid-down attack by an on-path
+ active attacker who could modify the crypto capability bits in the
+ response from the receiver to use a less secure crypto mechanism.
+ The security mechanism presented in this document should therefore
+ protect against all forms of flooding and hijacking attacks discussed
+ in [RFC6181].
+
+ The version negotiation specified in Section 3.1, if differing MPTCP
+ versions shared a common negotiation format, would allow an on-path
+ attacker to apply a theoretical bid-down attack. Since the v1 and v0
+ protocols have a different handshake, such an attack would require
+ that the client re-establish the connection using v0 and that the
+ server support v0. Note that an on-path attacker would have access
+ to the raw data, negating any other TCP-level security mechanisms.
+ As also noted in Appendix E, this document specifies the removal of
+ the AddrID field [RFC6824] in the MP_PRIO option (Section 3.3.8).
+ This change eliminates the possibility of a theoretical attack where
+ a subflow could be placed in "backup" mode by an attacker.
+
+ During normal operation, regular TCP protection mechanisms (such as
+ ensuring that sequence numbers are in-window) will provide the same
+ level of protection against attacks on individual TCP subflows as the
+ level of protection that exists for regular TCP today.
+ Implementations will introduce additional buffers compared to regular
+ TCP, to reassemble data at the connection level. The application of
+ window sizing will minimize the risk of denial-of-service attacks
+ consuming resources.
+
+ As discussed in Section 3.4.1, a host may advertise its private
+ addresses, but these might point to different hosts in the receiver's
+ network. The MP_JOIN handshake (Section 3.2) will ensure that this
+ does not succeed in setting up a subflow to the incorrect host.
+ However, it could still create unwanted TCP handshake traffic. This
+ feature of MPTCP could be a target for denial-of-service exploits,
+ with malicious participants in MPTCP connections encouraging the
+ recipient to target other hosts in the network. Therefore,
+ implementations should consider heuristics (Section 3.9) at both the
+ sender and receiver to reduce the impact of this.
+
+ To further protect against malicious ADD_ADDR messages sent by an
+ off-path attacker, the ADD_ADDR includes an HMAC using the keys
+ negotiated during the handshake. This effectively prevents an
+ attacker from diverting an MPTCP connection through an off-path
+ ADD_ADDR injection into the stream.
+
+ A small security risk could theoretically exist with key reuse, but
+ in order to accomplish a replay attack, both the sender and receiver
+ keys, and the sender and receiver random numbers, in the MP_JOIN
+ handshake (Section 3.2) would have to match.
+
+ While this specification defines a "medium" security solution,
+ meeting the criteria specified at the start of this section and in
+ the threat analysis document [RFC6181], since attacks only ever get
+ worse, it is likely that a future version of MPTCP would need to be
+ able to support stronger security. There are several ways the
+ security of MPTCP could potentially be improved; some of these would
+ be compatible with MPTCP as defined in this document, while others
+ may not be. For now, the best approach is to gain experience with
+ the current approach, establish what might work, and check that the
+ threat analysis is still accurate.
+
+ Possible ways of improving MPTCP security could include:
+
+ * defining a new MPTCP cryptographic algorithm, as negotiated in
+ MP_CAPABLE. If an implementation was being deployed in a
+ controlled environment where additional assumptions could be made,
+ such as the ability for the servers to store state during the TCP
+ handshake, then it may be possible to use a stronger cryptographic
+ algorithm than would otherwise be possible.
+
+ * defining how to secure data transfer with MPTCP, while not
+ changing the signaling part of the protocol.
+
+ * defining security that requires more option space, perhaps in
+ conjunction with a "long options" proposal for extending the TCP
+ option space (such as those surveyed in [TCPLO]), or perhaps
+ building on the current approach with a second stage of security
+ based on MPTCP options.
+
+ * revisiting the working group's decision to exclusively use TCP
+ options for MPTCP signaling and instead looking at the possibility
+ of using TCP payloads as well.
+
+ MPTCP has been designed with several methods available to indicate a
+ new security mechanism, including:
+
+ * available flags in MP_CAPABLE (Figure 4).
+
+ * available subtypes in the MPTCP option (Figure 3).
+
+ * the Version field in MP_CAPABLE (Figure 4).
+
+6. Interactions with Middleboxes
+
+ Multipath TCP was designed to be deployable in the present world.
+ Its design takes into account "reasonable" existing middlebox
+ behavior. In this section, we outline a few representative
+ middlebox-related failure scenarios and show how Multipath TCP
+ handles them. Next, we list the design decisions Multipath TCP has
+ made to accommodate the different middleboxes.
+
+ A primary concern is our use of a new TCP option. Middleboxes should
+ forward packets with unknown options unchanged, yet there are some
+ that don't. We expect these middleboxes to strip options and pass
+ the data, drop packets with new options, copy the same option into
+ multiple segments (e.g., when doing segmentation), or drop options
+ during segment coalescing.
+
+ MPTCP uses a single new TCP option called "Kind", and all message
+ types are defined by "subtype" values (see Section 7). This should
+ reduce the chances of only some types of MPTCP options being passed;
+ instead, the key differing characteristics are different paths and
+ the presence of the SYN flag.
+
+ MPTCP SYN packets on the first subflow of a connection contain the
+ MP_CAPABLE option (Section 3.1). If this is dropped, MPTCP SHOULD
+ fall back to regular TCP. If packets with the MP_JOIN option
+ (Section 3.2) are dropped, the paths will simply not be used.
+
+ If a middlebox strips options but otherwise passes the packets
+ unchanged, MPTCP will behave safely. If an MP_CAPABLE option is
+ dropped on either the outgoing path or the return path, the
+ initiating host can fall back to regular TCP, as illustrated in
+ Figure 17 and discussed in Section 3.1.
+
+ Host A Host B
+ | Middlebox M |
+ | | |
+ | SYN (MP_CAPABLE) | SYN |
+ |-------------------|---------------->|
+ | SYN/ACK |
+ |<------------------------------------|
+ a) MP_CAPABLE option stripped on outgoing path
+
+ Host A Host B
+ | SYN (MP_CAPABLE) |
+ |-------------------------------------->|
+ | Middlebox M |
+ | | |
+ | SYN/ACK |SYN/ACK (MP_CAPABLE)|
+ |<-----------------|--------------------|
+ b) MP_CAPABLE option stripped on return path
+
+ Figure 17: Connection Setup with Middleboxes That Strip Options
+ from Packets
+
+ Subflow SYNs contain the MP_JOIN option. If this option is stripped
+ on the outgoing path, the SYN will appear to be a regular SYN to
+ Host B. Depending on whether there is a listening socket on the
+ target port, Host B will reply with either a SYN/ACK or a RST
+ (subflow connection fails). When Host A receives the SYN/ACK, it
+ sends a RST because the SYN/ACK does not contain the MP_JOIN option
+ and its token. Either way, the subflow setup fails but otherwise
+ does not affect the MPTCP connection as a whole.
+
+ We now examine data flow with MPTCP, assuming that the flow is
+ correctly set up, which implies that the options in the SYN packets
+ were allowed through by the relevant middleboxes. If options are
+ allowed through and there is no resegmentation or coalescing to TCP
+ segments, Multipath TCP flows can proceed without problems.
+
+ The case when options get stripped on data packets is discussed in
+ Section 3.7. If only some MPTCP options are stripped, behavior is
+ not deterministic. If some Data Sequence Mappings are lost, the
+ connection can continue so long as mappings exist for the subflow-
+ level data (e.g., if multiple maps have been sent that reinforce each
+ other). If some subflow-level space is left unmapped, however, the
+ subflow is treated as broken and is closed, using the process
+ described in Section 3.7. MPTCP should survive with a loss of some
+ Data ACKs, but performance will degrade as the fraction of stripped
+ options increases. We do not expect such cases to appear in
+ practice, though: most middleboxes will either strip all options or
+ let them all through.
+
+ We end this section with a list of middlebox classes, their behavior,
+ and the elements in the MPTCP design that allow operation through
+ such middleboxes. Issues surrounding dropping packets with options
+ or stripping options were discussed above and are not included here:
+
+ * NATs (Network Address (and port) Translators) [RFC3022] change the
+ source address (and often the source port) of packets. This means
+ that a host will not know its public-facing address for signaling
+ in MPTCP. Therefore, MPTCP permits implicit address addition via
+ the MP_JOIN option, and the handshake mechanism ensures that
+ connection attempts to private addresses [RFC1918], since they are
+ authenticated, will only set up subflows to the correct hosts.
+ Explicit address removal is undertaken by an Address ID to allow
+ no knowledge of the source address.
+
+ * Performance Enhancing Proxies (PEPs) [RFC3135] might proactively
+ ACK data to increase performance. MPTCP, however, relies on
+ accurate congestion control signals from the end host, and
+ non-MPTCP-aware PEPs will not be able to provide such signals.
+ MPTCP will, therefore, fall back to single-path TCP or close the
+ problematic subflow (see Section 3.7).
+
+ * Traffic normalizers [norm] may not allow holes in sequence
+ numbers, and they may cache packets and retransmit the same data.
+ MPTCP looks like standard TCP on the wire and will not retransmit
+ different data on the same subflow sequence number. In the event
+ of a retransmission, the same data will be retransmitted on the
+ original TCP subflow even if it is additionally retransmitted at
+ the connection level on a different subflow.
+
+ * Firewalls [RFC2979] might perform Initial Sequence Number (ISN)
+ randomization on TCP connections. MPTCP uses relative sequence
+ numbers in Data Sequence Mappings to cope with this. Like NATs,
+ firewalls will not permit many incoming connections, so MPTCP
+ supports address signaling (ADD_ADDR) so that a multiaddressed
+ host can invite its peer behind the firewall/NAT to connect out to
+ its additional interface.
+
+ * Intrusion Detection Systems / Intrusion Prevention Systems
+ (IDSs/IPSs) observe packet streams for patterns and content that
+ could threaten a network. MPTCP may require the instrumentation
+ of additional paths, and an MPTCP-aware IDS or IPS would need to
+ read MPTCP tokens to correlate data from multiple subflows to
+ maintain comparable visibility into all of the traffic between
+ devices. Without such changes, an IDS would get an incomplete
+ view of the traffic, increasing the risk of missing traffic of
+ interest (false negatives) and increasing the chances of
+ erroneously identifying a subflow as a risk due to only seeing
+ partial data (false positives).
+
+ * Application-level middleboxes such as content-aware firewalls may
+ alter the payload within a subflow -- for example, rewriting URIs
+ in HTTP traffic. MPTCP will detect such changes using the
+ checksum and close the affected subflow(s), if there are other
+ subflows that can be used. If all subflows are affected, MPTCP
+ will fall back to TCP, allowing such middleboxes to change the
+ payload. MPTCP-aware middleboxes should be able to adjust the
+ payload and MPTCP metadata in order not to break the connection.
+
+ In addition, all classes of middleboxes may affect TCP traffic in the
+ following ways:
+
+ * TCP options may be removed, or packets with unknown options
+ dropped, by many classes of middleboxes. It is intended that the
+ initial SYN exchange, with a TCP option, will be sufficient to
+ identify the path's capabilities. If such a packet does not get
+ through, MPTCP will end up falling back to regular TCP.
+
+ * Segmentation/coalescing (e.g., TCP segmentation offloading) might
+ copy options between packets and might strip some options.
+ MPTCP's Data Sequence Mapping includes the relative subflow
+ sequence number instead of using the sequence number in the
+ segment. In this way, the mapping is independent of the packets
+ that carry it.
+
+ * The receive window may be shrunk by some middleboxes at the
+ subflow level. MPTCP will use the maximum window at the data
+ level but will also obey subflow-specific windows.
+
+7. IANA Considerations
+
+ This document obsoletes [RFC6824]. As such, IANA has updated several
+ registries to point to this document. In addition, this document
+ creates one new registry. These topics are described in the
+ following subsections.
+
+7.1. TCP Option Kind Numbers
+
+ IANA has updated the "TCP Option Kind Numbers" registry to point to
+ this document for Multipath TCP, as shown in Table 1:
+
+ +------+--------+-----------------------+-----------+
+ | Kind | Length | Meaning | Reference |
+ +======+========+=======================+===========+
+ | 30 | N | Multipath TCP (MPTCP) | RFC 8684 |
+ +------+--------+-----------------------+-----------+
+
+ Table 1: TCP Option Kind Numbers
+
+7.2. MPTCP Option Subtypes
+
+ The 4-bit MPTCP subtype in the "MPTCP Option Subtypes" subregistry
+ under the "Transmission Control Protocol (TCP) Parameters" registry
+ was defined in [RFC6824]. Since [RFC6824] is an Experimental RFC and
+ not a Standards Track RFC, and since no further entries have occurred
+ beyond those pointing to [RFC6824], IANA has replaced the existing
+ registry with the contents of Table 2 and with the following
+ explanatory note.
+
+ Note: This registry specifies the MPTCP Option Subtypes for MPTCP v1,
+ which obsoletes the Experimental MPTCP v0. For the MPTCP v0
+ subtypes, please refer to [RFC6824].
+
+ +-------+-----------------+----------------------+-------------+
+ | Value | Symbol | Name | Reference |
+ +=======+=================+======================+=============+
+ | 0x0 | MP_CAPABLE | Multipath Capable | RFC 8684, |
+ | | | | Section 3.1 |
+ +-------+-----------------+----------------------+-------------+
+ | 0x1 | MP_JOIN | Join Connection | RFC 8684, |
+ | | | | Section 3.2 |
+ +-------+-----------------+----------------------+-------------+
+ | 0x2 | DSS | Data Sequence Signal | RFC 8684, |
+ | | | (Data ACK and Data | Section 3.3 |
+ | | | Sequence Mapping) | |
+ +-------+-----------------+----------------------+-------------+
+ | 0x3 | ADD_ADDR | Add Address | RFC 8684, |
+ | | | | Section |
+ | | | | 3.4.1 |
+ +-------+-----------------+----------------------+-------------+
+ | 0x4 | REMOVE_ADDR | Remove Address | RFC 8684, |
+ | | | | Section |
+ | | | | 3.4.2 |
+ +-------+-----------------+----------------------+-------------+
+ | 0x5 | MP_PRIO | Change Subflow | RFC 8684, |
+ | | | Priority | Section |
+ | | | | 3.3.8 |
+ +-------+-----------------+----------------------+-------------+
+ | 0x6 | MP_FAIL | Fallback | RFC 8684, |
+ | | | | Section 3.7 |
+ +-------+-----------------+----------------------+-------------+
+ | 0x7 | MP_FASTCLOSE | Fast Close | RFC 8684, |
+ | | | | Section 3.5 |
+ +-------+-----------------+----------------------+-------------+
+ | 0x8 | MP_TCPRST | Subflow Reset | RFC 8684, |
+ | | | | Section 3.6 |
+ +-------+-----------------+----------------------+-------------+
+ | 0xf | MP_EXPERIMENTAL | Reserved for Private | |
+ | | | Use | |
+ +-------+-----------------+----------------------+-------------+
+
+ Table 2: MPTCP Option Subtypes
+
+ Values 0x9 through 0xe are currently unassigned. Option 0xf is
+ reserved for use by private experiments. Its use may be formalized
+ in a future specification. Future assignments in this registry are
+ to be defined by Standards Action as defined by [RFC8126].
+ Assignments consist of the MPTCP subtype's symbolic name, its
+ associated value, and a reference to its specification.
+
+7.3. MPTCP Handshake Algorithms
+
+ The "MPTCP Handshake Algorithms" subregistry under the "Transmission
+ Control Protocol (TCP) Parameters" registry was defined in [RFC6824].
+ Since [RFC6824] is an Experimental RFC and not a Standards Track RFC,
+ and since no further entries have occurred beyond those pointing to
+ [RFC6824], IANA has replaced the existing registry with the contents
+ of Table 3 and with the following explanatory note.
+
+ Note: This registry specifies the MPTCP Handshake Algorithms for
+ MPTCP v1, which obsoletes the Experimental MPTCP v0. For the MPTCP
+ v0 subtypes, please refer to [RFC6824].
+
+ +----------+---------------------------------+-------------+
+ | Flag Bit | Meaning | Reference |
+ +==========+=================================+=============+
+ | A | Checksum required | RFC 8684, |
+ | | | Section 3.1 |
+ +----------+---------------------------------+-------------+
+ | B | Extensibility | RFC 8684, |
+ | | | Section 3.1 |
+ +----------+---------------------------------+-------------+
+ | C | Do not attempt to establish new | RFC 8684, |
+ | | subflows to the source address. | Section 3.1 |
+ +----------+---------------------------------+-------------+
+ | D-G | Unassigned | |
+ +----------+---------------------------------+-------------+
+ | H | HMAC-SHA256 | RFC 8684, |
+ | | | Section 3.2 |
+ +----------+---------------------------------+-------------+
+
+ Table 3: MPTCP Handshake Algorithms
+
+ Note that the meanings of bits "D" through "H" can be dependent upon
+ bit "B", depending on how the Extensibility parameter is defined in
+ future specifications; see Section 3.1 for more information.
+
+ Future assignments in this registry are also to be defined by
+ Standards Action as defined by [RFC8126]. Assignments consist of the
+ value of the flags, a symbolic name for the algorithm, and a
+ reference to its specification.
+
+7.4. MP_TCPRST Reason Codes
+
+ IANA has created a further subregistry, "MPTCP MP_TCPRST Reason
+ Codes" under the "Transmission Control Protocol (TCP) Parameters"
+ registry, based on the reason code in the MP_TCPRST (Section 3.6)
+ message. Initial values for this registry are given in Table 4;
+ future assignments are to be defined by Specification Required as
+ defined by [RFC8126]. Assignments consist of the value of the code,
+ a short description of its meaning, and a reference to its
+ specification. The maximum value is 0xff.
+
+ +------+-----------------------------+-----------------------+
+ | Code | Meaning | Reference |
+ +======+=============================+=======================+
+ | 0x00 | Unspecified error | RFC 8684, Section 3.6 |
+ +------+-----------------------------+-----------------------+
+ | 0x01 | MPTCP-specific error | RFC 8684, Section 3.6 |
+ +------+-----------------------------+-----------------------+
+ | 0x02 | Lack of resources | RFC 8684, Section 3.6 |
+ +------+-----------------------------+-----------------------+
+ | 0x03 | Administratively prohibited | RFC 8684, Section 3.6 |
+ +------+-----------------------------+-----------------------+
+ | 0x04 | Too much outstanding data | RFC 8684, Section 3.6 |
+ +------+-----------------------------+-----------------------+
+ | 0x05 | Unacceptable performance | RFC 8684, Section 3.6 |
+ +------+-----------------------------+-----------------------+
+ | 0x06 | Middlebox interference | RFC 8684, Section 3.6 |
+ +------+-----------------------------+-----------------------+
+
+ Table 4: MPTCP MP_TCPRST Reason Codes
+
+ As guidance to the designated expert [RFC8126], assignments should
+ not normally be refused unless codepoint space is becoming scarce,
+ provided that there is a clear distinction from other, already-
+ existing codes and also provided that there is sufficient guidance
+ for implementers both sending and receiving these codes.
+
+8. References
+
+8.1. Normative References
+
+ [RFC0793] Postel, J., "Transmission Control Protocol", STD 7,
+ RFC 793, DOI 10.17487/RFC0793, September 1981,
+ <https://www.rfc-editor.org/info/rfc793>.
+
+ [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
+ Hashing for Message Authentication", RFC 2104,
+ DOI 10.17487/RFC2104, February 1997,
+ <https://www.rfc-editor.org/info/rfc2104>.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119,
+ DOI 10.17487/RFC2119, March 1997,
+ <https://www.rfc-editor.org/info/rfc2119>.
+
+ [RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's
+ Robustness to Blind In-Window Attacks", RFC 5961,
+ DOI 10.17487/RFC5961, August 2010,
+ <https://www.rfc-editor.org/info/rfc5961>.
+
+ [RFC6234] Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms
+ (SHA and SHA-based HMAC and HKDF)", RFC 6234,
+ DOI 10.17487/RFC6234, May 2011,
+ <https://www.rfc-editor.org/info/rfc6234>.
+
+ [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
+ 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
+ May 2017, <https://www.rfc-editor.org/info/rfc8174>.
+
+8.2. Informative References
+
+ [deployments]
+ Bonaventure, O. and S. Seo, "Multipath TCP Deployments",
+ IETF Journal 2016, November 2016,
+ <https://www.ietfjournal.org/multipath-tcp-deployments/>.
+
+ [howhard] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M.,
+ Duchene, F., Bonaventure, O., and M. Handley, "How Hard
+ Can It Be? Designing and Implementing a Deployable
+ Multipath TCP", Usenix Symposium on Networked Systems
+ Design and Implementation 2012, April 2012,
+ <https://www.usenix.org/conference/nsdi12/technical-
+ sessions/presentation/raiciu>.
+
+ [norm] Handley, M., Paxson, V., and C. Kreibich, "Network
+ Intrusion Detection: Evasion, Traffic Normalization, and
+ End-to-End Protocol Semantics", Usenix Security
+ Symposium 2001, August 2001,
+ <https://www.usenix.org/legacy/events/sec01/full_papers/
+ handley/handley.pdf>.
+
+ [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
+ Communication Layers", STD 3, RFC 1122,
+ DOI 10.17487/RFC1122, October 1989,
+ <https://www.rfc-editor.org/info/rfc1122>.
+
+ [RFC1918] Rekhter, Y., Moskowitz, B., Karrenberg, D., de Groot, G.
+ J., and E. Lear, "Address Allocation for Private
+ Internets", BCP 5, RFC 1918, DOI 10.17487/RFC1918,
+ February 1996, <https://www.rfc-editor.org/info/rfc1918>.
+
+ [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
+ Selective Acknowledgment Options", RFC 2018,
+ DOI 10.17487/RFC2018, October 1996,
+ <https://www.rfc-editor.org/info/rfc2018>.
+
+ [RFC2979] Freed, N., "Behavior of and Requirements for Internet
+ Firewalls", RFC 2979, DOI 10.17487/RFC2979, October 2000,
+ <https://www.rfc-editor.org/info/rfc2979>.
+
+ [RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path
+ Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000,
+ <https://www.rfc-editor.org/info/rfc2992>.
+
+ [RFC3022] Srisuresh, P. and K. Egevang, "Traditional IP Network
+ Address Translator (Traditional NAT)", RFC 3022,
+ DOI 10.17487/RFC3022, January 2001,
+ <https://www.rfc-editor.org/info/rfc3022>.
+
+ [RFC3135] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
+ Shelby, "Performance Enhancing Proxies Intended to
+ Mitigate Link-Related Degradations", RFC 3135,
+ DOI 10.17487/RFC3135, June 2001,
+ <https://www.rfc-editor.org/info/rfc3135>.
+
+ [RFC4086] Eastlake 3rd, D., Schiller, J., and S. Crocker,
+ "Randomness Requirements for Security", BCP 106, RFC 4086,
+ DOI 10.17487/RFC4086, June 2005,
+ <https://www.rfc-editor.org/info/rfc4086>.
+
+ [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common
+ Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007,
+ <https://www.rfc-editor.org/info/rfc4987>.
+
+ [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
+ Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
+ <https://www.rfc-editor.org/info/rfc5681>.
+
+ [RFC6181] Bagnulo, M., "Threat Analysis for TCP Extensions for
+ Multipath Operation with Multiple Addresses", RFC 6181,
+ DOI 10.17487/RFC6181, March 2011,
+ <https://www.rfc-editor.org/info/rfc6181>.
+
+ [RFC6182] Ford, A., Raiciu, C., Handley, M., Barre, S., and J.
+ Iyengar, "Architectural Guidelines for Multipath TCP
+ Development", RFC 6182, DOI 10.17487/RFC6182, March 2011,
+ <https://www.rfc-editor.org/info/rfc6182>.
+
+ [RFC6356] Raiciu, C., Handley, M., and D. Wischik, "Coupled
+ Congestion Control for Multipath Transport Protocols",
+ RFC 6356, DOI 10.17487/RFC6356, October 2011,
+ <https://www.rfc-editor.org/info/rfc6356>.
+
+ [RFC6528] Gont, F. and S. Bellovin, "Defending against Sequence
+ Number Attacks", RFC 6528, DOI 10.17487/RFC6528, February
+ 2012, <https://www.rfc-editor.org/info/rfc6528>.
+
+ [RFC6824] Ford, A., Raiciu, C., Handley, M., and O. Bonaventure,
+ "TCP Extensions for Multipath Operation with Multiple
+ Addresses", RFC 6824, DOI 10.17487/RFC6824, January 2013,
+ <https://www.rfc-editor.org/info/rfc6824>.
+
+ [RFC6897] Scharf, M. and A. Ford, "Multipath TCP (MPTCP) Application
+ Interface Considerations", RFC 6897, DOI 10.17487/RFC6897,
+ March 2013, <https://www.rfc-editor.org/info/rfc6897>.
+
+ [RFC7323] Borman, D., Braden, B., Jacobson, V., and R.
+ Scheffenegger, Ed., "TCP Extensions for High Performance",
+ RFC 7323, DOI 10.17487/RFC7323, September 2014,
+ <https://www.rfc-editor.org/info/rfc7323>.
+
+ [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
+ Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
+ <https://www.rfc-editor.org/info/rfc7413>.
+
+ [RFC7430] Bagnulo, M., Paasch, C., Gont, F., Bonaventure, O., and C.
+ Raiciu, "Analysis of Residual Threats and Possible Fixes
+ for Multipath TCP (MPTCP)", RFC 7430,
+ DOI 10.17487/RFC7430, July 2015,
+ <https://www.rfc-editor.org/info/rfc7430>.
+
+ [RFC8041] Bonaventure, O., Paasch, C., and G. Detal, "Use Cases and
+ Operational Experience with Multipath TCP", RFC 8041,
+ DOI 10.17487/RFC8041, January 2017,
+ <https://www.rfc-editor.org/info/rfc8041>.
+
+ [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for
+ Writing an IANA Considerations Section in RFCs", BCP 26,
+ RFC 8126, DOI 10.17487/RFC8126, June 2017,
+ <https://www.rfc-editor.org/info/rfc8126>.
+
+ [TCPLO] Ramaiah, A., "TCP option space extension", Work in
+ Progress, Internet-Draft, draft-ananth-tcpm-tcpoptext-00,
+ 26 March 2012, <https://tools.ietf.org/html/draft-ananth-
+ tcpm-tcpoptext-00>.
+
+Appendix A. Notes on Use of TCP Options
+
+ The TCP option space is limited due to the length of the Data Offset
+ field in the TCP header (4 bits), which defines the TCP header length
+ in 32-bit words. With the standard TCP header being 20 bytes, this
+ leaves a maximum of 40 bytes for options, and many of these may
+ already be used by options such as timestamp and SACK.
+
+ We performed a brief study on the commonly used TCP options in SYN,
+ data, and pure ACK packets and found that there is enough room to fit
+ all the options discussed in this document.
+
+ SYN packets typically include the following options: Maximum Segment
+ Size (MSS) (4 bytes), window scale (3 bytes), SACK permitted
+ (2 bytes), and timestamp (10 bytes). The sum of these options is
+ 19 bytes. Some operating systems appear to pad each option up to a
+ word boundary, thus using 24 bytes (a brief survey suggests that
+ Windows XP and Mac OS X do this, whereas Linux does not).
+ Optimistically, therefore, we have 21 bytes available, or 16 if
+ options have to be word-aligned. In either case, however, the SYN
+ versions of MP_CAPABLE (12 bytes) and MP_JOIN (12 or 16 bytes) will
+ fit in this remaining space.
+
+ Note that due to the use of a 64-bit data-level sequence space, it is
+ feasible that MPTCP will not require the timestamp option for
+ protection against wrapped sequence numbers (per the Protection
+ Against Wrapped Sequences (PAWS) mechanism, as described in
+ [RFC7323]), since the data-level sequence space has far less chance
+ of wrapping. Confirmation of the validity of this optimization is
+ left for further study.
+
+ TCP data packets typically carry timestamp options in every packet,
+ taking 10 bytes (or 12, with padding). That leaves 30 bytes (or 28,
+ if word-aligned). The DSS option varies in length, depending on
+ (1) whether the Data Sequence Mapping, DATA_ACK, or both are
+ included, (2) whether the sequence numbers in use are 4 or 8 octets,
+ and (3) whether the checksum is present. The maximum size of the DSS
+ option is 28 bytes, so even that will fit in the available space.
+ But unless a connection is both bidirectional and high-bandwidth, it
+ is unlikely that all that option space will be required on each DSS
+ option.
+
+ Within the DSS option, it is not necessary to include the Data
+ Sequence Mapping and DATA_ACK in each packet, and in many cases it
+ may be possible to alternate their presence (so long as the mapping
+ covers the data being sent in the subsequent packet). It would also
+ be possible to alternate between 4-byte and 8-byte sequence numbers
+ in each option.
+
+ On subflow and connection setup, an MPTCP option is also set on the
+ third packet (an ACK). These are 20 bytes (for MP_CAPABLE) and
+ 24 bytes (for MP_JOIN), both of which will fit in the available
+ option space.
+
+ Pure ACKs in TCP typically contain only timestamps (10 bytes). Here,
+ Multipath TCP typically needs to encode only the DATA_ACK (maximum of
+ 12 bytes). Occasionally, ACKs will contain SACK information.
+ Depending on the number of lost packets, SACK may utilize the entire
+ option space. If a DATA_ACK had to be included, then it is probably
+ necessary to reduce the number of SACK blocks to accommodate the
+ DATA_ACK. However, the presence of the DATA_ACK is unlikely to be
+ necessary in a case where SACK is in use, since until at least some
+ of the SACK blocks have been retransmitted, the cumulative data-level
+ ACK will not be moving forward (or if it does, due to retransmissions
+ on another path, then that path can also be used to transmit the new
+ DATA_ACK).
+
+ The ADD_ADDR option can be between 16 and 30 bytes, depending on
+ (1) whether IPv4 or IPv6 is used and (2) whether or not the port
+ number is present. It is unlikely that such signaling would fit in a
+ data packet (although if there is space, it is fine to include it).
+ It is recommended that duplicate ACKs not be used with any other
+ payload or options, in order to transmit these rare signals. Note
+ that this is the reason for mandating that duplicate ACKs with MPTCP
+ options not be taken as a signal of congestion.
+
+Appendix B. TCP Fast Open and MPTCP
+
+ TCP Fast Open (TFO) is an experimental TCP extension, described in
+ [RFC7413], which has been introduced to allow the sending of data one
+ RTT earlier than with regular TCP. This is considered a valuable
+ gain, as very short connections are very common, especially for HTTP
+ request/response schemes. It achieves this by sending the SYN
+ segment together with the application's data and allowing the
+ listener to reply immediately with data after the SYN/ACK. [RFC7413]
+ secures this mechanism by using a new TCP option that includes a
+ cookie that is negotiated in a preceding connection.
+
+ When using TFO in conjunction with MPTCP, there are two key points to
+ take into account, as detailed below.
+
+B.1. TFO Cookie Request with MPTCP
+
+ When a TFO initiator first connects to a listener, it cannot
+ immediately include data in the SYN for security reasons [RFC7413].
+ Instead, it requests a cookie that will be used in subsequent
+ connections. This is done with the TCP cookie request/response
+ options, of 2 bytes and 6-18 bytes, respectively (depending on the
+ chosen cookie length).
+
+ TFO and MPTCP can be combined, provided that the total length of all
+ the options does not exceed the maximum 40 bytes possible in TCP:
+
+ * In the SYN: MPTCP uses a 4-byte MP_CAPABLE option. The sum of the
+ MPTCP and TFO options is 6 bytes. With typical TCP options using
+ up to 19 bytes in the SYN (24 bytes if options are padded at a
+ word boundary), there is enough space to combine the MP_CAPABLE
+ with the TFO cookie request.
+
+ * In the SYN + ACK: MPTCP uses a 12-byte MP_CAPABLE option, but now
+ the TFO option can be as long as 18 bytes. Since the maximum
+ option length may be exceeded, it is up to the listener to avoid
+ this problem by using a shorter cookie. As an example, if we
+ consider that 19 bytes are used for classical TCP options, the
+ maximum possible cookie length would be 7 bytes. Note that, for
+ the SYN packet, the same limitation applies to subsequent
+ connections (because the initiator then echoes the cookie back to
+ the listener). Finally, if the security impact of reducing the
+ cookie size is not deemed acceptable, the listener can reduce the
+ amount of space used by other TCP options by omitting the TCP
+ timestamps (as outlined in Appendix A).
+
+B.2. Data Sequence Mapping under TFO
+
+ In the TCP establishment phase, MPTCP uses a key exchange that is
+ used to generate the Initial Data Sequence Numbers (IDSNs). In
+ particular, the SYN with MP_CAPABLE occupies the first octet of data
+ sequence space. With TFO, one way to handle the data sent together
+ with the SYN would be to consider an implicit DSS mapping that covers
+ that SYN segment (since there is not enough space in the SYN to
+ include a DSS option). The problem with that approach is that if a
+ middlebox modifies the TFO data, this will not be noticed by MPTCP
+ because of the absence of a DSS checksum. For example, a TCP-aware
+ (but not MPTCP-aware) middlebox could insert bytes at the beginning
+ of the stream and adapt the TCP checksum and sequence numbers
+ accordingly. With an implicit mapping, this information would give
+ to the initiator and listener a different view of the DSS mapping;
+ there would be no way to detect this inconsistency, because the DSS
+ checksum is not present.
+
+ To solve this issue, the TFO data must not be considered part of the
+ data sequence number space: the SYN with MP_CAPABLE still occupies
+ the first octet of data sequence space, but then the first non-TFO
+ data byte occupies the second octet. This guarantees that, if the
+ use of the DSS checksum is negotiated, all data in the data sequence
+ number space is checksummed. We also note that this does not entail
+ a loss of functionality, because TFO data is always only sent on the
+ initial subflow, before any attempt to create additional subflows.
+
+B.3. Connection Establishment Examples
+
+ A few examples of possible "TFO + MPTCP" establishment scenarios are
+ shown below.
+
+ Before an initiator can send data together with the SYN, it must
+ request a cookie from the listener, as shown in Figure 18. (Note:
+ The sequence number and length are annotated in Figure 18 as
+ Seq(Length) (e.g., "S. 0(0)") and used as such in the subsequent
+ figures (e.g., "S 0(20)" in Figure 19).) This is done by simply
+ combining the TFO and MPTCP options.
+
+ initiator listener
+ | |
+ | S Seq=0(Length=0) <MP_CAPABLE>, <TFO cookie request> |
+ | --------------------------------------------------------> |
+ | |
+ | S. 0(0) ack 1 <MP_CAPABLE>, <TFO cookie> |
+ | <-------------------------------------------------------- |
+ | |
+ | . 0(0) ack 1 <MP_CAPABLE> |
+ | --------------------------------------------------------> |
+ | |
+
+ Figure 18: Cookie Request
+
+ Once this is done, the received cookie can be used for TFO, as shown
+ in Figure 19. In this example, the initiator first sends 20 bytes in
+ the SYN. The listener immediately replies with 100 bytes following
+ the SYN-ACK, to which the initiator replies with 20 more bytes. Note
+ that the last segment in the figure has a TCP sequence number of 21,
+ while the DSS subflow sequence number is 1 (because the TFO data is
+ not part of the data sequence number space, as explained in
+ Appendix B.2.
+
+ initiator listener
+ | |
+ | S 0(20) <MP_CAPABLE>, <TFO cookie> |
+ | --------------------------------------------------------> |
+ | |
+ | S. 0(0) ack 21 <MP_CAPABLE> |
+ | <-------------------------------------------------------- |
+ | |
+ | . 1(100) ack 21 <DSS ack=1 seq=1 ssn=1 dlen=100> |
+ | <-------------------------------------------------------- |
+ | |
+ | . 21(0) ack 1 <MP_CAPABLE> |
+ | --------------------------------------------------------> |
+ | |
+ | . 21(20) ack 101 <DSS ack=101 seq=1 ssn=1 dlen=20> |
+ | --------------------------------------------------------> |
+ | |
+
+ Figure 19: The Listener Supports TFO
+
+ In Figure 20, the listener does not support TFO. The initiator
+ detects that no state is created in the listener (as no data is
+ ACKed) and now sends the MP_CAPABLE in the third packet, in order for
+ the listener to build its MPTCP context at the end of the
+ establishment. Now, the TFO data, when retransmitted, becomes part
+ of the Data Sequence Mapping because it is effectively sent (in fact
+ re-sent) after the establishment.
+
+ initiator listener
+ | |
+ | S 0(20) <MP_CAPABLE>, <TFO cookie> |
+ | --------------------------------------------------------> |
+ | |
+ | S. 0(0) ack 1 <MP_CAPABLE> |
+ | <-------------------------------------------------------- |
+ | |
+ | . 1(0) ack 1 <MP_CAPABLE> |
+ | --------------------------------------------------------> |
+ | |
+ | . 1(20) ack 1 <DSS ack=1 seq=1 ssn=1 dlen=20> |
+ | --------------------------------------------------------> |
+ | |
+ | . 0(0) ack 21 <DSS ack=21 seq=1 ssn=1 dlen=0> |
+ | <-------------------------------------------------------- |
+ | |
+
+ Figure 20: The Listener Does Not Support TFO
+
+ It is also possible that the listener acknowledges only part of the
+ TFO data, as illustrated in Figure 21. The initiator will simply
+ retransmit the missing data together with a DSS mapping.
+
+ initiator listener
+ | |
+ | S 0(1000) <MP_CAPABLE>, <TFO cookie> |
+ | --------------------------------------------------------> |
+ | |
+ | S. 0(0) ack 501 <MP_CAPABLE> |
+ | <-------------------------------------------------------- |
+ | |
+ | . 501(0) ack 1 <MP_CAPABLE> |
+ | --------------------------------------------------------> |
+ | |
+ | . 501(500) ack 1 <DSS ack=1 seq=1 ssn=1 dlen=500> |
+ | --------------------------------------------------------> |
+ | |
+
+ Figure 21: Partial Data Acknowledgment
+
+Appendix C. Control Blocks
+
+ Conceptually, an MPTCP connection can be represented as an MPTCP
+ protocol control block (PCB) that contains several variables that
+ track the progress and the state of the MPTCP connection and a set of
+ linked TCP control blocks that correspond to the subflows that have
+ been established.
+
+ RFC 793 [RFC0793] specifies several state variables. Whenever
+ possible, we reuse the same terminology as RFC 793 to describe the
+ state variables that are maintained by MPTCP.
+
+C.1. MPTCP Control Block
+
+ The MPTCP control block contains the following variables per
+ connection.
+
+C.1.1. Authentication and Metadata
+
+ Local.Token (32 bits): This is the token chosen by the local host on
+ this MPTCP connection. The token must be unique among all
+ established MPTCP connections and is generated from the local key.
+
+ Local.Key (64 bits): This is the key sent by the local host on this
+ MPTCP connection.
+
+ Remote.Token (32 bits): This is the token chosen by the remote host
+ on this MPTCP connection, generated from the remote key.
+
+ Remote.Key (64 bits): This is the key chosen by the remote host on
+ this MPTCP connection.
+
+ MPTCP.Checksum (flag): This flag is set to true if at least one of
+ the hosts has set the "A" bit in the MP_CAPABLE options exchanged
+ during connection establishment; otherwise, it is set to false.
+ If this flag is set, the checksum must be computed in all DSS
+ options.
+
+C.1.2. Sending Side
+
+ SND.UNA (64 bits): This is the data sequence number of the next byte
+ to be acknowledged, at the MPTCP connection level. This variable
+ is updated upon reception of a DSS option containing a DATA_ACK.
+
+ SND.NXT (64 bits): This is the data sequence number of the next byte
+ to be sent. SND.NXT is used to determine the value of the DSN in
+ the DSS option.
+
+ SND.WND (32 bits): This is the send window. 32 bits if the features
+ in RFC 7323 are used; 16 bits otherwise. MPTCP maintains the send
+ window at the MPTCP connection level, and the same window is
+ shared by all subflows. All subflows use the MPTCP connection-
+ level SND.WND to compute the SEQ.WND value that is sent in each
+ transmitted segment.
+
+C.1.3. Receiving Side
+
+ RCV.NXT (64 bits): This is the data sequence number of the next byte
+ that is expected on the MPTCP connection. This state variable is
+ modified upon reception of in-order data. The value of RCV.NXT is
+ used to specify the DATA_ACK that is sent in the DSS option on all
+ subflows.
+
+ RCV.WND (32 bits): This is the connection-level receive window,
+ which is the maximum of the RCV.WND on all the subflows. 32 bits
+ if the features in RFC 7323 are used; 16 bits otherwise.
+
+C.2. TCP Control Blocks
+
+ The MPTCP control block also contains a list of the TCP control
+ blocks that are associated with the MPTCP connection.
+
+ Note that the TCP control block on the TCP subflows does not contain
+ the RCV.WND and SND.WND state variables, as these are maintained at
+ the MPTCP connection level and not at the subflow level.
+
+ Inside each TCP control block, the following state variables are
+ defined.
+
+C.2.1. Sending Side
+
+ SND.UNA (32 bits): This is the sequence number of the next byte to
+ be acknowledged on the subflow. This variable is updated upon
+ reception of each TCP acknowledgment on the subflow.
+
+ SND.NXT (32 bits): This is the sequence number of the next byte to
+ be sent on the subflow. SND.NXT is used to set the value of
+ SEG.SEQ upon transmission of the next segment.
+
+C.2.2. Receiving Side
+
+ RCV.NXT (32 bits): This is the sequence number of the next byte that
+ is expected on the subflow. This state variable is modified upon
+ reception of in-order segments. The value of RCV.NXT is copied to
+ the SEG.ACK field of the next segments transmitted on the subflow.
+
+ RCV.WND (32 bits): This is the subflow-level receive window that is
+ updated with the window field from the segments received on this
+ subflow. 32 bits if the features in RFC 7323 are used; 16 bits
+ otherwise.
+
+Appendix D. Finite State Machine
+
+ The diagram in Figure 22 shows the Finite State Machine for
+ connection-level closure. This illustrates how the DATA_FIN
+ connection-level signal (indicated in the diagram as the DFIN flag on
+ a DATA_ACK) (1) interacts with subflow-level FINs and (2) permits
+ break-before-make handover between subflows.
+
+ +---------+
+ | M_ESTAB |
+ +---------+
+ M_CLOSE | | rcv DATA_FIN
+ ------- | | -------
+ +---------+ snd DATA_FIN / \ snd DATA_ACK[DFIN] +-------+
+ | M_FIN |<----------------- ------------------->|M_CLOSE|
+ | WAIT-1 |--------------------------- | WAIT |
+ +---------+ rcv DATA_FIN \ +-------+
+ | rcv DATA_ACK[DFIN] ------- | M_CLOSE |
+ | -------------- snd DATA_ACK | ------- |
+ | CLOSE all subflows | snd DATA_FIN |
+ V V V
+ +-----------+ +-----------+ +----------+
+ |M_FINWAIT-2| | M_CLOSING | |M_LAST-ACK|
+ +-----------+ +-----------+ +----------+
+ | rcv DATA_ACK[DFIN] | rcv DATA_ACK[DFIN] |
+ | rcv DATA_FIN -------------- | -------------- |
+ | ------- CLOSE all subflows | CLOSE all subflows |
+ | snd DATA_ACK[DFIN] V delete MPTCP PCB V
+ \ +-----------+ +--------+
+ ------------------------>|M_TIME WAIT|---------------->|M_CLOSED|
+ +-----------+ +--------+
+ All subflows in CLOSED
+ ------------
+ delete MPTCP PCB
+
+ Figure 22: Finite State Machine for Connection Closure
+
+Appendix E. Changes from RFC 6824
+
+ This appendix lists the key technical changes between [RFC6824],
+ which specifies MPTCP v0; and this document, which obsoletes
+ [RFC6824] and specifies MPTCP v1. Note that this specification is
+ not backward compatible with [RFC6824].
+
+ * This document incorporates lessons learned from the various
+ implementations, deployments, and experiments gathered in the
+ documents "Use Cases and Operational Experience with Multipath
+ TCP" [RFC8041] and the IETF Journal article "Multipath TCP
+ Deployments" [deployments].
+
+ * Connection initiation, through the exchange of the MP_CAPABLE
+ MPTCP option, is different from [RFC6824]. The SYN no longer
+ includes the initiator's key, to allow the MP_CAPABLE option on
+ the SYN to be shorter in length and to avoid duplicating the
+ sending of keying material.
+
+ * This also ensures reliable delivery of the key on the MP_CAPABLE
+ option by allowing its transmission to be combined with data and
+ thus using TCP's built-in reliability mechanism. If the initiator
+ does not immediately have data to send, the MP_CAPABLE option with
+ the keys will be repeated on the first data packet. If the other
+ end is the first to send, then the presence of the DSS option
+ implicitly confirms the receipt of the MP_CAPABLE.
+
+ * In the Flags field of MP_CAPABLE, "C" is now assigned to mean that
+ the sender of this option will not accept additional MPTCP
+ subflows to the source address and port. This improves efficiency
+ -- for example, in cases where the sender is behind a strict NAT.
+
+ * In the Flags field of MP_CAPABLE, "H" now indicates the use of
+ HMAC-SHA256 (rather than HMAC-SHA1).
+
+ * Connection initiation also defines the procedure for version
+ negotiation, for implementations that support both v0 [RFC6824]
+ and v1 (this document).
+
+ * The HMAC-SHA256 (rather than HMAC-SHA1) algorithm is used, as it
+ provides better security. It is used to generate the token in the
+ MP_JOIN and ADD_ADDR messages and to set the IDSN.
+
+ * A new subflow-level option exists to signal reasons for sending a
+ RST on a subflow (MP_TCPRST (Section 3.6)); this can help an
+ implementation decide whether to attempt later reconnection.
+
+ * The MP_PRIO option (Section 3.3.8), which is used to signal a
+ change of priority for a subflow, no longer includes the AddrID
+ field. Its purpose was to allow the changed priority to be
+ applied on a subflow other than the one it was sent on. However,
+ it was determined that this could be used by a man-in-the-middle
+ to divert all traffic onto its own path, and MP_PRIO does not
+ include a token or other type of security mechanism.
+
+ * The ADD_ADDR option (Section 3.4.1), which is used to inform the
+ other host about another potential address, is different in
+ several ways. It now includes an HMAC of the added address, for
+ enhanced security. In addition, reliability for the ADD_ADDR
+ option has been added: the IPVer field is replaced with a flag
+ field, and one flag is assigned ("E") that is used as an "echo" so
+ a host can indicate that it has received the option.
+
+ * This document describes an additional way of performing a Fast
+ Close -- by sending an MP_FASTCLOSE option on a RST on all
+ subflows. This allows the host to tear down the subflows and the
+ connection immediately.
+
+ * IANA has reserved the MPTCP option subtype of value 0xf for
+ Private Use (Section 7.2). This document doesn't define how to
+ use that value.
+
+ * This document adds a new appendix (Appendix B), which discusses
+ the usage of both MPTCP options and TFO options on the same
+ packet.
+
+Acknowledgments
+
+ The authors gratefully acknowledge significant input into this
+ document from Sebastien Barre and Andrew McDonald.
+
+ The authors also wish to acknowledge reviews and contributions from
+ Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock,
+ Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo,
+ Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing,
+ Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey
+ Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks,
+ Sean Turner, Stephen Farrell, Martin Stiemerling, Gregory Detal,
+ Fabien Duchene, Xavier de Foy, Rahul Jadhav, Klemens Schragel, Mirja
+ Kühlewind, Sheng Jiang, Alissa Cooper, Ines Robles, Roman Danyliw,
+ Adam Roach, Eric Vyncke, and Ben Kaduk.
+
+Authors' Addresses
+
+ Alan Ford
+ Pexip
+
+ Email: alan.ford@gmail.com
+
+
+ Costin Raiciu
+ University Politehnica of Bucharest
+ Splaiul Independentei 313
+ Bucharest
+ Romania
+
+ Email: costin.raiciu@cs.pub.ro
+
+
+ Mark Handley
+ University College London
+ Gower Street
+ London
+ WC1E 6BT
+ United Kingdom
+
+ Email: m.handley@cs.ucl.ac.uk
+
+
+ Olivier Bonaventure
+ Université catholique de Louvain
+ Pl. Ste Barbe, 2
+ 1348 Louvain-la-Neuve
+ Belgium
+
+ Email: olivier.bonaventure@uclouvain.be
+
+
+ Christoph Paasch
+ Apple, Inc.
+ Cupertino, CA
+ United States of America
+
+ Email: cpaasch@apple.com