diff --git a/doc/rfc/rfc6824.txt b/doc/rfc/rfc6824.txt
new file mode 100644
index 0000000..c3d677c
--- /dev/null
+++ b/doc/rfc/rfc6824.txt
@@ -0,0 +1,3587 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF) A. Ford
+Request for Comments: 6824 Cisco
+Category: Experimental C. Raiciu
+ISSN: 2070-1721 U. Politechnica of Bucharest
+ M. Handley
+ U. College London
+ O. Bonaventure
+ U. catholique de Louvain
+ January 2013
+
+
+ TCP Extensions for Multipath Operation with Multiple Addresses
+
+Abstract
+
+ TCP/IP communication is currently restricted to a single path per
+ connection, yet multiple paths often exist between peers. The
+ simultaneous use of these multiple paths for a TCP/IP session would
+ improve resource usage within the network and, thus, improve user
+ experience through higher throughput and improved resilience to
+ network failure.
+
+ Multipath TCP provides the ability to simultaneously use multiple
+ paths between peers. This document presents a set of extensions to
+ traditional TCP to support multipath operation. The protocol offers
+ the same type of service to applications as TCP (i.e., reliable
+ bytestream), and it provides the components necessary to establish
+ and use multiple TCP flows across potentially disjoint paths.
+
+Status of This Memo
+
+ This document is not an Internet Standards Track specification; it is
+ published for examination, experimental implementation, and
+ evaluation.
+
+ This document defines an Experimental Protocol for the Internet
+ community. This document is a product of the Internet Engineering
+ Task Force (IETF). It represents the consensus of the IETF
+ community. It has received public review and has been approved for
+ publication by the Internet Engineering Steering Group (IESG). Not
+ all documents approved by the IESG are a candidate for any level of
+ Internet Standard; see Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc6824.
+
+
+
+
+
+Ford, et al. Experimental [Page 1]
+
+RFC 6824 Multipath TCP January 2013
+
+
+Copyright Notice
+
+ Copyright (c) 2013 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction ....................................................4
+ 1.1. Design Assumptions .........................................4
+ 1.2. Multipath TCP in the Networking Stack ......................5
+ 1.3. Terminology ................................................6
+ 1.4. MPTCP Concept ..............................................7
+ 1.5. Requirements Language ......................................8
+ 2. Operation Overview ..............................................8
+ 2.1. Initiating an MPTCP Connection .............................9
+ 2.2. Associating a New Subflow with an Existing MPTCP
+ Connection .................................................9
+ 2.3. Informing the Other Host about Another Potential Address ..10
+ 2.4. Data Transfer Using MPTCP .................................11
+ 2.5. Requesting a Change in a Path's Priority ..................11
+ 2.6. Closing an MPTCP Connection ...............................12
+ 2.7. Notable Features ..........................................12
+ 3. MPTCP Protocol .................................................12
+ 3.1. Connection Initiation .....................................14
+ 3.2. Starting a New Subflow ....................................18
+ 3.3. General MPTCP Operation ...................................23
+ 3.3.1. Data Sequence Mapping ..............................25
+ 3.3.2. Data Acknowledgments ...............................28
+ 3.3.3. Closing a Connection ...............................29
+ 3.3.4. Receiver Considerations ............................30
+ 3.3.5. Sender Considerations ..............................31
+ 3.3.6. Reliability and Retransmissions ....................32
+ 3.3.7. Congestion Control Considerations ..................33
+ 3.3.8. Subflow Policy .....................................34
+ 3.4. Address Knowledge Exchange (Path Management) ..............35
+ 3.4.1. Address Advertisement ..............................36
+ 3.4.2. Remove Address .....................................39
+ 3.5. Fast Close ................................................40
+ 3.6. Fallback ..................................................41
+ 3.7. Error Handling ............................................45
+ 3.8. Heuristics ................................................45
+ 3.8.1. Port Usage .........................................46
+ 3.8.2. Delayed Subflow Start ..............................46
+ 3.8.3. Failure Handling ...................................47
+ 4. Semantic Issues ................................................48
+ 5. Security Considerations ........................................49
+ 6. Interactions with Middleboxes ..................................51
+ 7. Acknowledgments ................................................55
+ 8. IANA Considerations ............................................55
+ 9. References .....................................................57
+ 9.1. Normative References ......................................57
+ 9.2. Informative References ....................................57
+ Appendix A. Notes on Use of TCP Options ...........................59
+ Appendix B. Control Blocks ........................................60
+ B.1. MPTCP Control Block .......................................60
+ B.1.1. Authentication and Metadata ........................60
+ B.1.2. Sending Side .......................................61
+ B.1.3. Receiving Side .....................................61
+ B.2. TCP Control Blocks ........................................62
+ B.2.1. Sending Side .......................................62
+ B.2.2. Receiving Side .....................................62
+ Appendix C. Finite State Machine ..................................63
+
+1. Introduction
+
+ Multipath TCP (MPTCP) is a set of extensions to regular TCP [1] to
+ provide a Multipath TCP [2] service, which enables a transport
+ connection to operate across multiple paths simultaneously. This
+ document presents the protocol changes required to add multipath
+ capability to TCP; specifically, those for signaling and setting up
+ multiple paths ("subflows"), managing these subflows, reassembly of
+ data, and termination of sessions. This is not the only information
+ required to create a Multipath TCP implementation, however. This
+ document is complemented by three others:
+
+ o Architecture [2], which explains the motivations behind Multipath
+ TCP, contains a discussion of high-level design decisions on which
+ this design is based, and an explanation of a functional
+ separation through which an extensible MPTCP implementation can be
+ developed.
+
+ o Congestion control [5] presents a safe congestion control
+ algorithm for coupling the behavior of the multiple paths in order
+ to "do no harm" to other network users.
+
+ o Application considerations [6] discusses what impact MPTCP will
+ have on applications, what applications will want to do with
+ MPTCP, and as a consequence of these factors, what API extensions
+ an MPTCP implementation should present.
+
+1.1. Design Assumptions
+
+ In order to limit the potentially huge design space, the working
+ group imposed two key constraints on the Multipath TCP design
+ presented in this document:
+
+ o It must be backwards-compatible with current, regular TCP, to
+ increase its chances of deployment.
+
+ o It can be assumed that one or both hosts are multihomed and
+ multiaddressed.
+
+ To simplify the design, we assume that the presence of multiple
+ addresses at a host is sufficient to indicate the existence of
+ multiple paths. These paths need not be entirely disjoint: they may
+ share one or many routers between them. Even in such a situation,
+ making use of multiple paths is beneficial, improving resource
+ utilization and resilience to a subset of node failures. The
+ congestion control algorithms defined in [5] ensure this does not act
+ detrimentally. Furthermore, there may be some scenarios where
+ different TCP ports on a single host can provide disjoint paths (such
+
+ as through certain Equal-Cost Multipath (ECMP) implementations [7]),
+ and so the MPTCP design also supports the use of ports in path
+ identifiers.
+
+ There are three aspects to the backwards-compatibility listed above
+ (discussed in more detail in [2]):
+
+ External Constraints: The protocol must function through the vast
+ majority of existing middleboxes such as NATs, firewalls, and
+ proxies, and as such must resemble existing TCP as far as possible
+ on the wire. Furthermore, the protocol must not assume the
+ segments it sends on the wire arrive unmodified at the
+ destination: they may be split or coalesced; TCP options may be
+ removed or duplicated.
+
+ Application Constraints: The protocol must be usable with no change
+ to existing applications that use the common TCP API (although it
+ is reasonable that not all features would be available to such
+ legacy applications). Furthermore, the protocol must provide the
+ same service model as regular TCP to the application.
+
+ Fallback: The protocol should be able to fall back to standard TCP
+ with no interference from the user, to be able to communicate with
+ legacy hosts.
+
+ The complementary application considerations document [6] discusses
+ the necessary features of an API to provide backwards-compatibility,
+ as well as API extensions to convey the behavior of MPTCP at a level
+ of control and information equivalent to that available with regular,
+ single-path TCP.
+
+ Further discussion of the design constraints and associated design
+ decisions is given in the MPTCP Architecture document [2] and in
+ [8].
+
+1.2. Multipath TCP in the Networking Stack
+
+ MPTCP operates at the transport layer and aims to be transparent to
+ both higher and lower layers. It is a set of additional features on
+ top of standard TCP; Figure 1 illustrates this layering. MPTCP is
+ designed to be usable by legacy applications with no changes;
+ detailed discussion of its interactions with applications is given in
+ [6].
+
+                                +-------------------------------+
+                                |          Application          |
+   +---------------+            +-------------------------------+
+   |  Application  |            |             MPTCP             |
+   +---------------+            + - - - - - - - + - - - - - - - +
+   |      TCP      |            | Subflow (TCP) | Subflow (TCP) |
+   +---------------+            +-------------------------------+
+   |      IP       |            |       IP      |      IP       |
+   +---------------+            +-------------------------------+
+
+ Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks
+
+1.3. Terminology
+
+ This document makes use of a number of terms that are either MPTCP-
+ specific or have defined meaning in the context of MPTCP, as follows:
+
+ Path: A sequence of links between a sender and a receiver, defined
+ in this context by a 4-tuple of source and destination address/
+ port pairs.
+
+ Subflow: A flow of TCP segments operating over an individual path,
+ which forms part of a larger MPTCP connection. A subflow is
+ started and terminated similar to a regular TCP connection.
+
+ (MPTCP) Connection: A set of one or more subflows, over which an
+ application can communicate between two hosts. There is a one-to-
+ one mapping between a connection and an application socket.
+
+ Data-level: The payload data is nominally transferred over a
+ connection, which in turn is transported over subflows. Thus, the
+ term "data-level" is synonymous with "connection level", in
+ contrast to "subflow-level", which refers to properties of an
+ individual subflow.
+
+ Token: A locally unique identifier given to a multipath connection
+ by a host. May also be referred to as a "Connection ID".
+
+ Host: An end host operating an MPTCP implementation, and either
+ initiating or accepting an MPTCP connection.
+
+ In addition to these terms, note that MPTCP's interpretation of, and
+ effect on, regular single-path TCP semantics are discussed in
+ Section 4.
+
+1.4. MPTCP Concept
+
+ This section provides a high-level summary of normal operation of
+ MPTCP, and is illustrated by the scenario shown in Figure 2. A
+ detailed description of operation is given in Section 3.
+
+ o To a non-MPTCP-aware application, MPTCP will behave the same as
+ normal TCP. Extended APIs could provide additional control to
+ MPTCP-aware applications [6]. An application begins by opening a
+ TCP socket in the normal way. MPTCP signaling and operation are
+ handled by the MPTCP implementation.
+
+ o An MPTCP connection begins similarly to a regular TCP connection.
+ This is illustrated in Figure 2 where an MPTCP connection is
+ established between addresses A1 and B1 on Hosts A and B,
+ respectively.
+
+ o If extra paths are available, additional TCP sessions (termed
+ MPTCP "subflows") are created on these paths, and are combined
+ with the existing session, which continues to appear as a single
+ connection to the applications at both ends. The creation of the
+ additional TCP session is illustrated between Address A2 on Host A
+ and Address B1 on Host B.
+
+ o MPTCP identifies multiple paths by the presence of multiple
+ addresses at hosts. Combinations of these multiple addresses
+ equate to the additional paths. In the example, other potential
+ paths that could be set up are A1<->B2 and A2<->B2. Although this
+ additional session is shown as being initiated from A2, it could
+ equally have been initiated from B1.
+
+ o The discovery and setup of additional subflows will be achieved
+ through a path management method; this document describes a
+ mechanism by which a host can initiate new subflows by using its
+ own additional addresses, or by signaling its available addresses
+ to the other host.
+
+ o MPTCP adds connection-level sequence numbers to allow the
+ reassembly of segments arriving on multiple subflows with
+ differing network delays.
+
+ o Subflows are terminated as regular TCP connections, with a four-
+ way FIN handshake. The MPTCP connection is terminated by a
+ connection-level FIN.
+
+               Host A                               Host B
+      ------------------------             ------------------------
+      Address A1    Address A2             Address B1    Address B2
+      ----------    ----------             ----------    ----------
+          |             |                      |             |
+          |     (initial connection setup)     |             |
+          |----------------------------------->|             |
+          |<-----------------------------------|             |
+          |             |                      |             |
+          |            (additional subflow setup)            |
+          |             |--------------------->|             |
+          |             |<---------------------|             |
+          |             |                      |             |
+          |             |                      |             |
+
+ Figure 2: Example MPTCP Usage Scenario
+
+1.5. Requirements Language
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [3].
+
+2. Operation Overview
+
+ This section presents a single description of common MPTCP operation,
+ with reference to the protocol operation. This is a high-level
+ overview of the key functions; the full specification follows in
+ Section 3. Extensibility and negotiated features are not discussed
+ here. Considerable reference is made to symbolic names of MPTCP
+ options throughout this section -- these are subtypes of the IANA-
+ assigned MPTCP option (see Section 8), and their formats are defined
+ in the detailed protocol specification that follows in Section 3.
+
+ A Multipath TCP connection provides a bidirectional bytestream
+ between two hosts communicating like normal TCP and, thus, does not
+ require any change to the applications. However, Multipath TCP
+ enables the hosts to use different paths with different IP addresses
+ to exchange packets belonging to the MPTCP connection. A Multipath
+ TCP connection appears like a normal TCP connection to an
+ application. However, to the network layer, each MPTCP subflow looks
+ like a regular TCP flow whose segments carry a new TCP option type.
+ Multipath TCP manages the creation, removal, and utilization of these
+ subflows to send data. The number of subflows that are managed
+ within a Multipath TCP connection is not fixed and it can fluctuate
+ during the lifetime of the Multipath TCP connection.
+
+ All MPTCP operations are signaled with a TCP option -- a single
+ numerical type for MPTCP, with "sub-types" for each MPTCP message.
+ What follows is a summary of the purpose and rationale of these
+ messages.
+
+2.1. Initiating an MPTCP Connection
+
+ This is the same signaling as for initiating a normal TCP connection,
+ but the SYN, SYN/ACK, and ACK packets also carry the MP_CAPABLE
+ option. This is variable length and serves multiple purposes.
+ Firstly, it verifies whether the remote host supports Multipath TCP;
+ secondly, this option allows the hosts to exchange some information
+ to authenticate the establishment of additional subflows. Further
+ details are given in Section 3.1.
+
+      Host A                               Host B
+      ------                               ------
+      MP_CAPABLE            ->
+      [A's key, flags]
+                            <-             MP_CAPABLE
+                                           [B's key, flags]
+      ACK + MP_CAPABLE      ->
+      [A's key, B's key, flags]
+
+2.2. Associating a New Subflow with an Existing MPTCP Connection
+
+ The exchange of keys in the MP_CAPABLE handshake provides material
+ that can be used to authenticate the endpoints when new subflows are
+ set up. Additional subflows begin in the same way as initiating a
+ normal TCP connection, but the SYN, SYN/ACK, and ACK packets also
+ carry the MP_JOIN option.
+
+ Host A initiates a new subflow between one of its addresses and one
+ of Host B's addresses. The token -- generated from the key -- is
+ used to identify which MPTCP connection it is joining, and the HMAC
+ is used for authentication. The Hash-based Message Authentication
+ Code (HMAC) uses the keys exchanged in the MP_CAPABLE handshake, and
+ the random numbers (nonces) exchanged in these MP_JOIN options.
+ MP_JOIN also contains flags and an Address ID that can be used to
+ refer to the source address without the sender needing to know if it
+ has been changed by a NAT. Further details are in Section 3.2.
+
+      Host A                               Host B
+      ------                               ------
+      MP_JOIN               ->
+      [B's token, A's nonce,
+       A's Address ID, flags]
+                            <-             MP_JOIN
+                                           [B's HMAC, B's nonce,
+                                            B's Address ID, flags]
+      ACK + MP_JOIN         ->
+      [A's HMAC]
+
+                            <-             ACK
+
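+ The authentication step of the MP_JOIN exchange above can be
+ sketched in Python as follows. This is a minimal illustration, not
+ the normative procedure: the function name join_hmac is
+ hypothetical, and the key ordering (the sender's key first), the
+ message (the sender's nonce followed by the peer's nonce), and the
+ truncation of B's HMAC to its leftmost 64 bits in the SYN/ACK are
+ assumptions taken from the detailed description in Section 3.2.
+
```python
import hashlib
import hmac


def join_hmac(own_key: bytes, peer_key: bytes,
              own_nonce: bytes, peer_nonce: bytes) -> bytes:
    """HMAC-SHA1 computed by the host that sends it.

    Assumption (Section 3.2): the HMAC key is the sender's key followed by
    the peer's key, and the message is the sender's nonce followed by the
    peer's nonce.
    """
    return hmac.new(own_key + peer_key, own_nonce + peer_nonce,
                    hashlib.sha1).digest()


# Example values; real keys come from MP_CAPABLE, nonces from MP_JOIN.
key_a, key_b = b"A" * 8, b"B" * 8
nonce_a, nonce_b = b"\x01" * 4, b"\x02" * 4

# Host B sends a truncated HMAC in the SYN/ACK (assumed: leftmost 64 bits).
hmac_b_truncated = join_hmac(key_b, key_a, nonce_b, nonce_a)[:8]

# Host A sends its full 160-bit HMAC in the third ACK.
hmac_a = join_hmac(key_a, key_b, nonce_a, nonce_b)

# Host B verifies A's value by recomputing it from A's point of view.
expected = join_hmac(key_a, key_b, nonce_a, nonce_b)
a_is_authentic = hmac.compare_digest(hmac_a, expected)
```
+
+ Because the two hosts concatenate the keys and nonces in opposite
+ orders, the HMAC sent by Host A differs from the one sent by Host B,
+ so one cannot simply be replayed as the other.
+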
+2.3. Informing the Other Host about Another Potential Address
+
+ The set of IP addresses associated with a multihomed host may change
+ during the lifetime of an MPTCP connection. MPTCP supports the
+ addition and removal of addresses on a host both implicitly and
+ explicitly. If Host A has established a subflow starting at address
+ IP#-A1 and wants to open a second subflow starting at address IP#-A2,
+ it simply initiates the establishment of the subflow as explained
+ above. The remote host will then be implicitly informed about the
+ new address.
+
+ In some circumstances, a host may want to advertise to the remote
+ host the availability of an address without establishing a new
+ subflow, for example, when a NAT prevents setup in one direction. In
+ the example below, Host A informs Host B about its alternative IP
+ address (IP#-A2). Host B may later send an MP_JOIN to this new
+ address. Due to the presence of middleboxes that may translate IP
+ addresses, this option uses an address identifier to unambiguously
+ identify an address on a host. Further details are in Section 3.4.1.
+
+ Host A Host B
+ ------ ------
+ ADD_ADDR ->
+ [IP#-A2,
+ IP#-A2's Address ID]
+
+ There is a corresponding signal for address removal, making use of
+ the Address ID that is signaled in the add address handshake.
+ Further details are in Section 3.4.2.
+
+ Host A Host B
+ ------ ------
+ REMOVE_ADDR ->
+ [IP#-A2's Address ID]
+
+2.4. Data Transfer Using MPTCP
+
+ To ensure reliable, in-order delivery of data over subflows that may
+ appear and disappear at any time, MPTCP uses a 64-bit data sequence
+ number (DSN) to number all data sent over the MPTCP connection. Each
+ subflow has its own 32-bit sequence number space and an MPTCP option
+ maps the subflow sequence space to the data sequence space. In this
+ way, data can be retransmitted on different subflows (mapped to the
+ same DSN) in the event of failure.
+
+ The "Data Sequence Signal" carries the "Data Sequence Mapping". The
+ data sequence mapping consists of the subflow sequence number, data
+ sequence number, and length for which this mapping is valid. This
+ option can also carry a connection-level acknowledgment (the "Data
+ ACK") for the received DSN.
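+
+ A data sequence mapping can be pictured as a small record that
+ translates subflow sequence numbers into data sequence numbers. The
+ Python sketch below is illustrative only: the class name Mapping and
+ the method to_dsn are hypothetical, and sequence numbers are treated
+ as plain integers (the exact encoding on the wire is specified in
+ Section 3.3.1).
+
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    dsn: int     # first data sequence number covered (64-bit space)
    ssn: int     # first subflow sequence number covered (32-bit space)
    length: int  # number of bytes for which the mapping is valid

    def to_dsn(self, subflow_seq: int) -> int:
        """Translate a subflow sequence number into a data sequence number."""
        offset = (subflow_seq - self.ssn) % 2**32  # tolerate 32-bit wrap
        if offset >= self.length:
            raise ValueError("sequence number is outside this mapping")
        return (self.dsn + offset) % 2**64


# The same data (DSN 1000..1099) sent first on one subflow and then
# retransmitted on another: different subflow sequence numbers, same DSN.
first = Mapping(dsn=1000, ssn=1, length=100)
retransmit = Mapping(dsn=1000, ssn=501, length=100)
```
+
+ Both mappings resolve the eleventh byte to the same data sequence
+ number, which is what allows the receiver to reassemble data sent or
+ retransmitted on different subflows.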
+
+ With MPTCP, all subflows share the same receive buffer and advertise
+ the same receive window. There are two levels of acknowledgment in
+ MPTCP. Regular TCP acknowledgments are used on each subflow to
+ acknowledge the reception of the segments sent over the subflow
+ independently of their DSN. In addition, there are connection-level
+ acknowledgments for the data sequence space. These acknowledgments
+ track the advancement of the bytestream and slide the receiving
+ window.
+
+ Further details are in Section 3.3.
+
+ Host A Host B
+ ------ ------
+ DATA_SEQUENCE_SIGNAL ->
+ [Data Sequence Mapping]
+ [Data ACK]
+ [Checksum]
+
+2.5. Requesting a Change in a Path's Priority
+
+ Hosts can indicate at initial subflow setup whether they wish the
+ subflow to be used as a regular or backup path -- a backup path only
+ being used if there are no regular paths available. During a
+ connection, Host A can request a change in the priority of a subflow
+ through the MP_PRIO signal to Host B. Further details are in
+ Section 3.3.8.
+
+ Host A Host B
+ ------ ------
+ MP_PRIO ->
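+
+ The backup semantics described above (a backup path is used only if
+ no regular path is available) can be sketched as a simple filter
+ over the current subflows. This is a hypothetical illustration: the
+ Subflow record, its field names, and the function
+ eligible_subflows are not defined by this document, and a real
+ scheduler would take many more factors into account.
+
```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subflow:
    name: str
    backup: bool         # priority signaled at setup or via MP_PRIO
    usable: bool = True  # hypothetical liveness flag


def eligible_subflows(subflows: List[Subflow]) -> List[Subflow]:
    """Return regular subflows if any are usable, else the backup ones."""
    usable = [s for s in subflows if s.usable]
    regular = [s for s in usable if not s.backup]
    return regular if regular else usable


flows = [Subflow("wifi", backup=False), Subflow("cellular", backup=True)]
preferred = [s.name for s in eligible_subflows(flows)]

# If the regular path fails, the backup path becomes eligible.
flows[0].usable = False
fallback = [s.name for s in eligible_subflows(flows)]
```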
+
+2.6. Closing an MPTCP Connection
+
+ When Host A wants to inform Host B that it has no more data to send,
+ it signals this "Data FIN" as part of the Data Sequence Signal (see
+ above). It has the same semantics and behavior as a regular TCP FIN,
+ but at the connection level. Once all the data on the MPTCP
+ connection has been successfully received, then this message is
+ acknowledged at the connection level with a DATA_ACK. Further
+ details are in Section 3.3.3.
+
+      Host A                               Host B
+      ------                               ------
+      DATA_SEQUENCE_SIGNAL  ->
+      [Data FIN]
+
+                            <-             (MPTCP DATA_ACK)
+
+2.7. Notable Features
+
+ It is worth highlighting that MPTCP's signaling has been designed
+ with several key requirements in mind:
+
+ o To cope with NATs on the path, addresses are referred to by
+ Address IDs, in case the IP packet's source address gets changed
+ by a NAT. Setting up a new TCP flow is not possible if the
+ passive opener is behind a NAT; to allow subflows to be created
+ when either end is behind a NAT, MPTCP uses the ADD_ADDR message.
+
+ o MPTCP falls back to ordinary TCP if MPTCP operation is not
+ possible, for example, if one host is not MPTCP capable or if a
+ middlebox alters the payload.
+
+ o To meet the threats identified in [9], the following steps are
+ taken: keys are sent in the clear in the MP_CAPABLE messages;
+ MP_JOIN messages are secured with HMAC-SHA1 ([10], [4]) using
+ those keys; and standard TCP validity checks are made on the other
+ messages (ensuring sequence numbers are in-window).
+
+3. MPTCP Protocol
+
+ This section describes the operation of the MPTCP protocol, and is
+ subdivided into sections for each key part of the protocol operation.
+
+ All MPTCP operations are signaled using optional TCP header fields.
+ A single TCP option number ("Kind") has been assigned by IANA for
+ MPTCP (see Section 8), and then individual messages will be
+ determined by a "subtype", the values of which are also stored in an
+ IANA registry (and are also listed in Section 8).
+
+ Throughout this document, when reference is made to an MPTCP option
+ by symbolic name, such as "MP_CAPABLE", this refers to a TCP option
+ with the single MPTCP option type, and with the subtype value of the
+ symbolic name as defined in Section 8. This subtype is a 4-bit field
+ -- the first 4 bits of the option payload, as shown in Figure 3. The
+ MPTCP messages are defined in the following sections.
+
+                        1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +---------------+---------------+-------+-----------------------+
+   |     Kind      |    Length     |Subtype|                       |
+   +---------------+---------------+-------+                       |
+   |                     Subtype-specific data                     |
+   |                       (variable length)                       |
+   +---------------------------------------------------------------+
+
+ Figure 3: MPTCP Option Format
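+
+ As an illustration of the layout in Figure 3, the following Python
+ sketch splits one raw MPTCP option into its subtype and remaining
+ bytes. It is a minimal example rather than a reference parser: the
+ function name parse_mptcp_option is hypothetical, and the Kind value
+ (30) and the subtype numbers are assumptions taken from the IANA
+ assignments listed in Section 8.
+
```python
MPTCP_KIND = 30  # assumption: Kind value from the Section 8 IANA assignment

# Assumption: subtype numbers as listed in Section 8.
SUBTYPE_NAMES = {
    0x0: "MP_CAPABLE",
    0x1: "MP_JOIN",
    0x2: "DSS",
    0x3: "ADD_ADDR",
    0x4: "REMOVE_ADDR",
    0x5: "MP_PRIO",
    0x6: "MP_FAIL",
    0x7: "MP_FASTCLOSE",
}


def parse_mptcp_option(raw: bytes):
    """Return (subtype_name, low_nibble, remaining_bytes) for one option."""
    if len(raw) < 3:
        raise ValueError("an MPTCP option needs Kind, Length, and Subtype")
    kind, length = raw[0], raw[1]
    if kind != MPTCP_KIND:
        raise ValueError(f"not an MPTCP option (Kind={kind})")
    if length != len(raw):
        raise ValueError("Length field does not match the option size")
    subtype = raw[2] >> 4       # first 4 bits of the option payload
    low_nibble = raw[2] & 0x0F  # first 4 bits of subtype-specific data
    return SUBTYPE_NAMES.get(subtype, "UNKNOWN"), low_nibble, raw[3:]


# A 12-byte option whose subtype nibble is 0 (MP_CAPABLE).
example = bytes([30, 12, 0x00, 0x81]) + bytes(8)
name, nibble, rest = parse_mptcp_option(example)
```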
+
+ Those MPTCP options associated with subflow initiation are used on
+ packets with the SYN flag set. Additionally, there is one MPTCP
+ option for signaling metadata to ensure segmented data can be
+ recombined for delivery to the application.
+
+ The remaining options, however, are signals that do not need to be on
+ a specific packet, such as those for signaling additional addresses.
+ Whilst an implementation may desire to send MPTCP options as soon as
+ possible, it may not be possible to combine all desired options (both
+ those for MPTCP and for regular TCP, such as SACK (selective
+ acknowledgment) [11]) on a single packet. Therefore, an
+ implementation may choose to send duplicate ACKs containing the
+ additional signaling information. This changes the semantics of a
+ duplicate ACK; these are usually only sent as a signal of a lost
+ segment [12] in regular TCP. Therefore, an MPTCP implementation
+ receiving a duplicate ACK that contains an MPTCP option MUST NOT
+ treat it as a signal of congestion. Additionally, an MPTCP
+ implementation SHOULD NOT send more than two duplicate ACKs in a row
+ for the purposes of sending MPTCP options alone, in order to ensure
+ no middleboxes misinterpret this as a sign of congestion.
+
+ Furthermore, standard TCP validity checks (such as ensuring the
+ sequence number and acknowledgment number are within window) MUST be
+ undertaken before processing any MPTCP signals, as described in [13].
+
+3.1. Connection Initiation
+
+ Connection initiation begins with a SYN, SYN/ACK, ACK exchange on a
+ single path. Each packet contains the Multipath Capable (MP_CAPABLE)
+ TCP option (Figure 4). This option declares its sender is capable of
+ performing Multipath TCP and wishes to do so on this particular
+ connection.
+
+ This option is used to declare the 64-bit key that the sender has
+ generated for this MPTCP connection. This key is used to
+ authenticate the addition of future subflows to this connection.
+ This is the only time the key will be sent in the clear on the wire
+ (unless "fast close", Section 3.5, is used); all future subflows will
+ identify the connection using a 32-bit "token". This token is a
+ cryptographic hash of this key. The algorithm for this process is
+ dependent on the authentication algorithm selected; the method of
+ selection is defined later in this section.
+
+ This key is generated by its sender, and its method of generation is
+ implementation specific. The key MUST be hard to guess, and it MUST
+ be unique for the sending host at any one time. Recommendations for
+ generating random numbers for use in keys are given in [14].
+ Connections will be indexed at each host by the token (a one-way hash
+ of the key). Therefore, an implementation will require a mapping
+ from each token to the corresponding connection, and in turn to the
+ keys for the connection.
+
+ There is a risk that two different keys will hash to the same token.
+ The risk of hash collisions is usually small, unless the host is
+ handling many tens of thousands of connections. Therefore, an
+ implementation SHOULD check its list of connection tokens to ensure
+ there is not a collision before sending its key in the SYN/ACK. This
+ would, however, be costly for a server with thousands of connections.
+ The subflow handshake mechanism (Section 3.2) will ensure that new
+ subflows only join the correct connection, however, through the
+ cryptographic handshake, as well as checking the connection tokens in
+ both directions, and ensuring sequence numbers are in-window. So in
+ the worst case if there was a token collision, the new subflow would
+ not succeed, but the MPTCP connection would continue to provide a
+ regular TCP service.
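+
+ The relationship between key and token, and the recommended
+ collision check, can be sketched in Python as follows. This is an
+ illustration under stated assumptions: the truncation to the most
+ significant 32 bits of a SHA-1 hash is the derivation given in
+ Section 3.2 for the HMAC-SHA1 algorithm, and the names
+ token_from_key, new_local_key, and the connections table are
+ hypothetical.
+
```python
import hashlib
import secrets


def token_from_key(key: bytes) -> int:
    """Derive the 32-bit token from a 64-bit key.

    Assumption (Section 3.2): the token is the most significant 32 bits
    of the SHA-1 hash of the key.
    """
    if len(key) != 8:
        raise ValueError("MPTCP keys are 64 bits long")
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")


def new_local_key(connections: dict) -> bytes:
    """Pick a hard-to-guess key whose token is unused on this host.

    `connections` is a hypothetical table mapping token -> connection.
    """
    while True:
        key = secrets.token_bytes(8)  # cryptographically strong randomness
        if token_from_key(key) not in connections:
            return key


connections = {}
key = new_local_key(connections)
connections[token_from_key(key)] = {"local_key": key}
```
+
+ Keeping the table indexed by token gives the mapping from token to
+ connection, and from there to the keys, that the text above calls
+ for.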
+
+ The MP_CAPABLE option is carried on the SYN, SYN/ACK, and ACK packets
+ that start the first subflow of an MPTCP connection. The data
+ carried by each packet is as follows, where A = initiator and B =
+ listener.
+
+ o SYN (A->B): A's Key for this connection.
+
+ o SYN/ACK (B->A): B's Key for this connection.
+
+ o ACK (A->B): A's Key followed by B's Key.
+
+ The contents of the option are determined by the SYN and ACK flags of
+ the packet, verified by the option's length field. For the diagram
+ shown in Figure 4, "sender" and "receiver" refer to the sender or
+ receiver of the TCP packet (which can be either host). If the SYN
+ flag is set, a single key is included; if only an ACK flag is set,
+ both keys are present.
+
+ B's Key is echoed in the ACK in order to allow the listener (Host B)
+ to act statelessly until the TCP connection reaches the ESTABLISHED
+ state. If the listener acts in this way, however, it MUST generate
+ its key in a way that would allow it to verify that it generated the
+ key when it is echoed in the ACK.
+
+ This exchange allows the safe passage of MPTCP options on SYN packets
+ to be determined. If any of these options are dropped, MPTCP will
+ gracefully fall back to regular single-path TCP, as documented in
+ Section 3.6. Note that new subflows MUST NOT be established (using
+ the process documented in Section 3.2) until a Data Sequence Signal
+ (DSS) option has been successfully received across the path (as
+ documented in Section 3.3).
+
+                        1                   2                   3
+    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +---------------+---------------+-------+-------+---------------+
+   |     Kind      |    Length     |Subtype|Version|A|B|C|D|E|F|G|H|
+   +---------------+---------------+-------+-------+---------------+
+   |                 Option Sender's Key (64 bits)                 |
+   |                                                               |
+   |                                                               |
+   +---------------------------------------------------------------+
+   |                Option Receiver's Key (64 bits)                |
+   |                   (if option Length == 20)                    |
+   |                                                               |
+   +---------------------------------------------------------------+
+
+
+ Figure 4: Multipath Capable (MP_CAPABLE) Option
+
+ The first 4 bits of the first octet in the MP_CAPABLE option
+ (Figure 4) define the MPTCP option subtype (see Section 8; for
+ MP_CAPABLE, this is 0), and the remaining 4 bits of this octet
+ specify the MPTCP version in use (for this specification, this is 0).
+
+ The second octet is reserved for flags, allocated as follows:
+
+ A: The leftmost bit, labeled "A", SHOULD be set to 1 to indicate
+ "Checksum Required", unless the system administrator has decided
+ that checksums are not required (for example, if the environment
+ is controlled and no middleboxes exist that might adjust the
+ payload).
+
+ B: The second bit, labeled "B", is an extensibility flag, and MUST be
+ set to 0 for current implementations. This will be used for an
+ extensibility mechanism in a future specification, and the impact
+ of this flag will be defined at a later date. If a host receives
+ a SYN with the "B" flag set to 1 and does not understand it, the
+ host MUST silently ignore that SYN; the sender is expected to
+ retry with a format compatible with this legacy specification.
+ Note that the length of the MP_CAPABLE option, and the meanings of
+ bits "C" through "H", may be altered by setting B=1.
+
+ C through H: The remaining bits, labeled "C" through "H", are used
+ for crypto algorithm negotiation. Currently only the rightmost
+ bit, labeled "H", is assigned. Bit "H" indicates the use of HMAC-
+ SHA1 (as defined in Section 3.2). An implementation that only
+ supports this method MUST set bit "H" to 1, and bits "C" through
+ "G" to 0.
+
+ A crypto algorithm MUST be specified. If flag bits C through H are
+ all 0, the MP_CAPABLE option MUST be treated as invalid and ignored
+ (that is, it must be treated as a regular TCP handshake).
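+
+ As an illustration only, the layout in Figure 4 and the validity rule
+ above could be parsed roughly as follows. The function name and the
+ returned dictionary are hypothetical. The Kind value 30 is the MPTCP
+ option kind assigned in Section 8, and the lengths 12 and 20 follow
+ from one or two 64-bit keys after the 4-octet header. Handling of
+ the "B" flag and of unknown versions is left out of this sketch.
+

```python
import struct

MPTCP_KIND = 30           # TCP option kind for MPTCP (Section 8)
MP_CAPABLE_SUBTYPE = 0x0  # subtype of MP_CAPABLE
CRYPTO_MASK = 0x3F        # flag bits "C" through "H"


def parse_mp_capable(option: bytes):
    """Parse an MP_CAPABLE option; return None if it must be ignored."""
    if len(option) not in (12, 20):
        return None
    kind, length, subtype_version, flags = struct.unpack("!BBBB", option[:4])
    if kind != MPTCP_KIND or length != len(option):
        return None
    if subtype_version >> 4 != MP_CAPABLE_SUBTYPE:
        return None
    if flags & CRYPTO_MASK == 0:
        # No crypto algorithm specified: treat as a regular TCP handshake.
        return None
    # One key on SYN and SYN/ACK (length 12), two keys on the ACK (length 20).
    keys = struct.unpack("!%dQ" % ((length - 4) // 8), option[4:])
    return {
        "version": subtype_version & 0x0F,
        "checksum_required": bool(flags & 0x80),   # bit "A"
        "hmac_sha1": bool(flags & 0x01),           # bit "H"
        "sender_key": keys[0],
        "receiver_key": keys[1] if len(keys) == 2 else None,
    }
```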
+
+ The selection of the authentication algorithm also impacts the
+ algorithm used to generate the token and the initial data sequence
+ number (IDSN). In this specification, with only the SHA-1 algorithm
+ (bit "H") specified and selected, the token MUST be a truncated (most
+ significant 32 bits) SHA-1 hash ([4], [15]) of the key. A different,
+ 64-bit truncation (the least significant 64 bits) of the SHA-1 hash
+ of the key MUST be used as the initial data sequence number. Note
+ that the key MUST be hashed in network byte order. Also note that
+ the "least significant" bits MUST be the rightmost bits of the SHA-1
+ digest, as per [4]. Future specifications of the use of the crypto
+ bits may choose to specify different algorithms for token and IDSN
+ generation.
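+
+ The token and IDSN derivation described above can be sketched as
+ follows for the SHA-1 case. The function name is hypothetical, and
+ the key is assumed to be held as 8 octets in network byte order.
+

```python
import hashlib
from typing import Tuple


def token_and_idsn(key: bytes) -> Tuple[int, int]:
    """Derive (token, IDSN) from a 64-bit key when HMAC-SHA1 is selected."""
    if len(key) != 8:
        raise ValueError("an MPTCP key is 64 bits (8 octets)")
    digest = hashlib.sha1(key).digest()          # 160-bit (20-octet) digest
    token = int.from_bytes(digest[:4], "big")    # most significant 32 bits
    idsn = int.from_bytes(digest[-8:], "big")    # least significant 64 bits
    return token, idsn
```
+
+ Each host would run this over both keys: its own key yields the
+ token it is identified by and its own IDSN, and the peer's key
+ yields the token to place in an MP_JOIN and the peer's IDSN.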
+
+ Both the crypto and checksum bits negotiate capabilities in similar
+ ways. For the Checksum Required bit (labeled "A"), if either host
+ requires the use of checksums, checksums MUST be used. In other
+ words, the only way for checksums not to be used is if both hosts in
+ their SYNs set A=0. This decision is confirmed by the setting of the
+ "A" bit in the third packet (the ACK) of the handshake. For example,
+ if the initiator sets A=0 in the SYN, but the responder sets A=1 in
+ the SYN/ACK, checksums MUST be used in both directions, and the
+ initiator will set A=1 in the ACK. The decision whether to use
+ checksums will be stored by an implementation in a per-connection
+ binary state variable.
+
+ For crypto negotiation, the responder has the choice. The initiator
+ creates a proposal setting a bit for each algorithm it supports to 1
+ (in this version of the specification, there is only one proposal, so
+ bit "H" will always be set to 1). The responder responds with only 1
+ bit set -- this is the chosen algorithm. The rationale for this
+ behavior is that the responder will typically be a server with
+ potentially many thousands of connections, so it may wish to choose
+ an algorithm with minimal computational complexity, depending on the
+ load. If a responder does not support (or does not want to support)
+ any of the initiator's proposals, it can respond without an
+ MP_CAPABLE option, thus forcing a fallback to regular TCP.
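+
+ The two negotiation rules above might be expressed as in the sketch
+ below, operating on the flags octet of Figure 4. The names are
+ hypothetical, and the preference for the lowest-order common bit is
+ an assumption; this specification leaves the responder's choice
+ among several proposals to the implementation.
+

```python
CHECKSUM_REQUIRED = 0x80   # flag bit "A" (leftmost)
CRYPTO_MASK = 0x3F         # flag bits "C" through "H"
HMAC_SHA1 = 0x01           # flag bit "H" (rightmost)


def checksums_in_use(syn_flags: int, synack_flags: int) -> bool:
    """Checksums are used unless both hosts set A=0 in their SYNs."""
    return bool((syn_flags | synack_flags) & CHECKSUM_REQUIRED)


def choose_algorithm(proposal_flags: int, supported: int = HMAC_SHA1):
    """Responder side: pick exactly one of the proposed algorithm bits.

    Returns None when no proposal is acceptable, in which case the
    responder would reply without MP_CAPABLE and fall back to regular TCP.
    """
    common = proposal_flags & supported & CRYPTO_MASK
    if common == 0:
        return None
    return common & -common   # assumption: prefer the lowest-order common bit
```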
+
+ The MP_CAPABLE option is only used in the first subflow of a
+ connection, in order to identify the connection; all following
+ subflows will use the "Join" option (see Section 3.2) to join the
+ existing connection.
+
+ If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it
+ is assumed that the passive opener is not multipath capable; thus,
+ the MPTCP session MUST operate as a regular, single-path TCP. If a
+ SYN does not contain an MP_CAPABLE option, the SYN/ACK MUST NOT
+ contain one in response. If the third packet (the ACK) does not
+ contain the MP_CAPABLE option, then the session MUST fall back to
+ operating as a regular, single-path TCP. This is to maintain
+ compatibility with middleboxes on the path that drop some or all TCP
+ options. Note that an implementation MAY choose to attempt sending
+ MPTCP options more than one time before making this decision to
+ operate as regular TCP (see Section 3.8).
+
+ If the SYN packets are unacknowledged, it is up to local policy to
+ decide how to respond. It is expected that a sender will eventually
+ fall back to single-path TCP (i.e., without the MP_CAPABLE option) in
+ order to work around middleboxes that may drop packets with unknown
+ options; however, the number of multipath-capable attempts that are
+ made first will be up to local policy. It is possible that MPTCP and
+ non-MPTCP SYNs could get reordered in the network. Therefore, the
+ final state is inferred from the presence or absence of the
+ MP_CAPABLE option in the third packet of the TCP handshake. If this
+ option is not present, the connection SHOULD fall back to regular
+ TCP, as documented in Section 3.6.
+
+ The initial data sequence number on an MPTCP connection is generated
+ from the key. The algorithm for IDSN generation is also determined
+ from the negotiated authentication algorithm. In this specification,
+ with only the SHA-1 algorithm specified and selected, the IDSN of a
+ host MUST be the least significant 64 bits of the SHA-1 hash of its
+ key, i.e., IDSN-A = Hash(Key-A) and IDSN-B = Hash(Key-B). This
+ deterministic generation of the IDSN allows a receiver to ensure that
+ there are no gaps in sequence space at the start of the connection.
+ The SYN with MP_CAPABLE occupies the first octet of data sequence
+ space, although this does not need to be acknowledged at the
+ connection level until the first data is sent (see Section 3.3).
+
+3.2. Starting a New Subflow
+
+ Once an MPTCP connection has begun with the MP_CAPABLE exchange,
+ further subflows can be added to the connection. Hosts have
+ knowledge of their own address(es), and can become aware of the other
+ host's addresses through signaling exchanges as described in
+ Section 3.4. Using this knowledge, a host can initiate a new subflow
+ over a currently unused pair of addresses. It is permitted for
+ either host in a connection to initiate the creation of a new
+ subflow, but it is expected that this will normally be the original
+ connection initiator (see Section 3.8 for heuristics).
+
+ A new subflow is started as a normal TCP SYN/ACK exchange. The Join
+ Connection (MP_JOIN) TCP option is used to identify the connection to
+ be joined by the new subflow. It uses keying material that was
+ exchanged in the initial MP_CAPABLE handshake (Section 3.1), and that
+ handshake also negotiates the crypto algorithm in use for the MP_JOIN
+ handshake.
+
+ This section specifies the behavior of MP_JOIN using the HMAC-SHA1
+ algorithm. An MP_JOIN option is present in the SYN, SYN/ACK, and ACK
+ of the three-way handshake, although in each case with a different
+ format.
+
+ In the first MP_JOIN on the SYN packet, illustrated in Figure 5, the
+ initiator sends a token, random number, and address ID.
+
+ The token is used to identify the MPTCP connection and is a
+ cryptographic hash of the receiver's key, as exchanged in the initial
+ MP_CAPABLE handshake (Section 3.1). In this specification, the
+ tokens presented in this option are generated by the SHA-1 ([4],
+ [15]) algorithm, truncated to the most significant 32 bits. The
+ token included in the MP_JOIN option is the token that the receiver
+ of the packet uses to identify this connection; i.e., Host A will
+ send Token-B (which is generated from Key-B). Note that the hash
+ generation algorithm can be overridden by the choice of cryptographic
+ handshake algorithm, as defined in Section 3.1.
+
+ The MP_JOIN SYN sends not only the token (which is static for a
+ connection) but also random numbers (nonces) that are used to prevent
+ replay attacks on the authentication method. Recommendations for the
+ generation of random numbers for this purpose are given in [14].
+
+ The MP_JOIN option includes an "Address ID". This is an identifier
+ that only has significance within a single connection, where it
+ identifies the source address of this packet, even if the IP header
+ has been changed in transit by a middlebox. The Address ID allows
+ address removal (Section 3.4.2) without needing to know what the
+ source address at the receiver is, thus allowing address removal
+ through NATs. The Address ID also allows correlation between new
+ subflow setup attempts and address signaling (Section 3.4.1), to
+ prevent setting up duplicate subflows on the same path, if an MP_JOIN
+ and ADD_ADDR are sent at the same time.
+
+ The Address IDs of the subflow used in the initial SYN exchange of
+ the first subflow in the connection are implicit, and have the value
+ zero. A host MUST store the mappings between Address IDs and
+ addresses both for itself and the remote host. An implementation
+ will also need to know which local and remote Address IDs are
+ associated with which established subflows, for when addresses are
+ removed from a local or remote host.
+
+ The MP_JOIN option on packets with the SYN flag set also includes 4
+ bits of flags, 3 of which are currently reserved and MUST be set to
+ zero by the sender. The final bit, labeled "B", indicates whether
+ the sender of this option wishes this subflow to be used as a backup
+ path (B=1) in the event of failure of other paths, or whether it
+ wants it to be used as part of the connection immediately. By
+ setting B=1, the sender of the option is requesting the other host to
+ only send data on this subflow if there are no available subflows
+ where B=0. Subflow policy is discussed in more detail in
+ Section 3.3.8.
+
+                      1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-----+-+---------------+
+ |     Kind      |  Length = 12  |Subtype|     |B|   Address ID  |
+ +---------------+---------------+-------+-----+-+---------------+
+ |                  Receiver's Token (32 bits)                   |
+ +---------------------------------------------------------------+
+ |               Sender's Random Number (32 bits)                |
+ +---------------------------------------------------------------+
+
+ Figure 5: Join Connection (MP_JOIN) Option (for Initial SYN)
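+
+ A minimal sketch of encoding the option in Figure 5 is given below.
+ The function name and parameters are hypothetical. The Kind value
+ 30 and the MP_JOIN subtype 0x1 are taken from Section 8. Drawing
+ the random number from the operating system's generator is an
+ assumption of this sketch; see [14] for actual recommendations.
+

```python
import os
import struct
from typing import Optional

MPTCP_KIND = 30        # TCP option kind for MPTCP (Section 8)
MP_JOIN_SUBTYPE = 0x1  # subtype of MP_JOIN (Section 8)


def build_mp_join_syn(receiver_token: int, address_id: int,
                      backup: bool = False,
                      nonce: Optional[int] = None) -> bytes:
    """Encode the 12-octet MP_JOIN option carried on the initial SYN."""
    if not 0 <= receiver_token < 2**32:
        raise ValueError("token must fit in 32 bits")
    if not 0 <= address_id < 2**8:
        raise ValueError("Address ID must fit in 8 bits")
    if nonce is None:
        nonce = int.from_bytes(os.urandom(4), "big")   # assumption: OS CSPRNG
    # Upper 4 bits: subtype. Lower 4 bits: 3 reserved zero bits and "B".
    subtype_and_flags = (MP_JOIN_SUBTYPE << 4) | (0x1 if backup else 0x0)
    return struct.pack("!BBBBII", MPTCP_KIND, 12, subtype_and_flags,
                       address_id, receiver_token, nonce)
```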
+
+ When receiving a SYN with an MP_JOIN option that contains a valid
+ token for an existing MPTCP connection, the recipient SHOULD respond
+ with a SYN/ACK also containing an MP_JOIN option containing a random
+ number and a truncated (leftmost 64 bits) Hash-based Message
+ Authentication Code (HMAC). This version of the option is shown in
+ Figure 6. If the token is unknown, or the host wants to refuse
+ subflow establishment (for example, due to a limit on the number of
+ subflows it will permit), the receiver will send back a reset (RST)
+ signal, analogous to an unknown port in TCP. Although calculating an
+ HMAC requires cryptographic operations, it is believed that the 32-
+ bit token in the MP_JOIN SYN gives sufficient protection against
+ blind state exhaustion attacks; therefore, there is no need to
+ provide mechanisms to allow a responder to operate statelessly at the
+ MP_JOIN stage.
+
+ An HMAC is sent by both hosts -- by the initiator (Host A) in the
+ third packet (the ACK) and by the responder (Host B) in the second
+ packet (the SYN/ACK). Doing the HMAC exchange at this stage allows
+ both hosts to have first exchanged random data (in the first two SYN
+ packets) that is used as the "message". This specification defines
+ that HMAC as defined in [10] is used, along with the SHA-1 hash
+ algorithm [4] (potentially implemented as in [15]), thus generating a
+ 160-bit / 20-octet HMAC. Due to option space limitations, the HMAC
+ included in the SYN/ACK is truncated to the leftmost 64 bits, but
+ this is acceptable since random numbers are used; thus, an attacker
+ only has one chance to guess the HMAC correctly (if the HMAC is
+ incorrect, the TCP connection is closed, so a new MP_JOIN negotiation
+ with a new random number is required).
+
+ The initiator's authentication information is sent in its first ACK
+ (the third packet of the handshake), as shown in Figure 7. This data
+ needs to be sent reliably, since it is the only time this HMAC is
+ sent; therefore, receipt of this packet MUST trigger a regular TCP
+ ACK in response, and the packet MUST be retransmitted if this ACK is
+ not received. In other words, sending the ACK/MP_JOIN packet places
+ the subflow in the PRE_ESTABLISHED state, and it moves to the
+ ESTABLISHED state only on receipt of an ACK from the receiver. It is
+ not permitted to send data while in the PRE_ESTABLISHED state. The
+ reserved bits in this option MUST be set to zero by the sender.
+
+ The key for the HMAC algorithm, in the case of the message
+ transmitted by Host A, will be Key-A followed by Key-B, and in the
+ case of Host B, Key-B followed by Key-A. These are the keys that
+ were exchanged in the original MP_CAPABLE handshake. The "message"
+ for the HMAC algorithm in each case is the concatenation of the two
+ hosts' random numbers (denoted by R): for Host A, R-A followed by R-B;
+ and for Host B, R-B followed by R-A.
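+
+ The key and message construction above can be sketched with a
+ standard HMAC-SHA1 routine as follows. The function names are
+ hypothetical; keys are assumed to be 8-octet values and random
+ numbers 4-octet values, both in network byte order.
+

```python
import hashlib
import hmac


def join_hmac(own_key: bytes, peer_key: bytes,
              own_nonce: bytes, peer_nonce: bytes) -> bytes:
    """Full 160-bit HMAC-SHA1 as computed by the host that owns own_key."""
    return hmac.new(own_key + peer_key, own_nonce + peer_nonce,
                    hashlib.sha1).digest()


def join_hmac_synack(key_b: bytes, key_a: bytes,
                     r_b: bytes, r_a: bytes) -> bytes:
    """Host B's value for the SYN/ACK: leftmost 64 bits of its HMAC."""
    return join_hmac(key_b, key_a, r_b, r_a)[:8]
```
+
+ A receiver would recompute the expected value with the roles
+ swapped and compare it to the received one, preferably with a
+ constant-time comparison such as Python's hmac.compare_digest.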
+
+                      1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-----+-+---------------+
+ |     Kind      |  Length = 16  |Subtype|     |B|   Address ID  |
+ +---------------+---------------+-------+-----+-+---------------+
+ |                                                               |
+ |               Sender's Truncated HMAC (64 bits)               |
+ |                                                               |
+ +---------------------------------------------------------------+
+ |               Sender's Random Number (32 bits)                |
+ +---------------------------------------------------------------+
+
+ Figure 6: Join Connection (MP_JOIN) Option (for Responding SYN/ACK)
+
+
+                      1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-----------------------+
+ |     Kind      |  Length = 24  |Subtype|      (reserved)       |
+ +---------------+---------------+-------+-----------------------+
+ |                                                               |
+ |                                                               |
+ |                   Sender's HMAC (160 bits)                    |
+ |                                                               |
+ |                                                               |
+ +---------------------------------------------------------------+
+
+ Figure 7: Join Connection (MP_JOIN) Option (for Third ACK)
+
+ These various TCP options fit together to enable authenticated
+ subflow setup as illustrated in Figure 8.
+
+          Host A                                  Host B
+ ------------------------                       ----------
+ Address A1    Address A2                       Address B1
+ ----------    ----------                       ----------
+     |             |                                |
+     |            SYN + MP_CAPABLE(Key-A)           |
+     |--------------------------------------------->|
+     |<---------------------------------------------|
+     |          SYN/ACK + MP_CAPABLE(Key-B)         |
+     |             |                                |
+     |        ACK + MP_CAPABLE(Key-A, Key-B)        |
+     |--------------------------------------------->|
+     |             |                                |
+     |             |   SYN + MP_JOIN(Token-B, R-A)  |
+     |             |------------------------------->|
+     |             |<-------------------------------|
+     |             | SYN/ACK + MP_JOIN(HMAC-B, R-B) |
+     |             |                                |
+     |             |     ACK + MP_JOIN(HMAC-A)      |
+     |             |------------------------------->|
+     |             |<-------------------------------|
+     |             |             ACK                |
+
+ HMAC-A = HMAC(Key=(Key-A+Key-B), Msg=(R-A+R-B))
+ HMAC-B = HMAC(Key=(Key-B+Key-A), Msg=(R-B+R-A))
+
+ Figure 8: Example Use of MPTCP Authentication
+
+ If the token received at Host B is unknown or local policy prohibits
+ the acceptance of the new subflow, the recipient MUST respond with a
+ TCP RST for the subflow.
+
+ If the token is accepted at Host B, but the HMAC returned to Host A
+ does not match the one expected, Host A MUST close the subflow with a
+ TCP RST.
+
+ If Host B does not receive the expected HMAC, or the MP_JOIN option
+ is missing from the ACK, it MUST close the subflow with a TCP RST.
+
+ If the HMACs are verified as correct, then both hosts have
+ authenticated each other as being the same peers as existed at the
+ start of the connection, and they have agreed on the connection of
+ which this subflow will become a part.
+
+ If the SYN/ACK as received at Host A does not have an MP_JOIN option,
+ Host A MUST close the subflow with a RST.
+
+ This covers all cases of the loss of an MP_JOIN. In more detail, if
+ MP_JOIN is stripped from the SYN on the path from A to B, and Host B
+ does not have a passive opener on the relevant port, it will respond
+ with a RST in the normal way. If in response to a SYN with an
+ MP_JOIN option, a SYN/ACK is received without the MP_JOIN option
+ (either since it was stripped on the return path, or it was stripped
+ on the outgoing path but the passive opener on Host B responded as if
+ it were a new regular TCP session), then the subflow is unusable and
+ Host A MUST close it with a RST.
+
+ Note that additional subflows can be created between any pair of
+ ports (but see Section 3.8 for heuristics); no explicit application-
+ level accept calls or bind calls are required to open additional
+ subflows. To associate a new subflow with an existing connection,
+ the token supplied in the subflow's SYN exchange is used for
+ demultiplexing. This then binds the 5-tuple of the TCP subflow to
+ the local token of the connection. A consequence is that it is
+ possible to allow any port pairs to be used for a connection.
+
+ Demultiplexing subflow SYNs MUST be done using the token; this is
+ unlike traditional TCP, where the destination port is used for
+ demultiplexing SYN packets. Once a subflow is set up, demultiplexing
+ packets is done using the 5-tuple, as in traditional TCP. The
+ 5-tuples will be mapped to the local connection identifier (token).
+ Note that Host A will know its local token for the subflow even
+ though it is not sent on the wire -- only the responder's token is
+ sent.
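+
+ The two-stage demultiplexing described above could be modeled as in
+ the following sketch. The class, its method names, and the shape of
+ the 5-tuple are hypothetical, and a real implementation would only
+ treat the binding as usable once the MP_JOIN handshake has been
+ authenticated.
+

```python
from typing import Dict, Optional, Tuple

# (protocol, source address, source port, destination address, destination port)
FiveTuple = Tuple[str, str, int, str, int]


class SubflowDemux:
    """Maps subflow SYNs by token and later segments by 5-tuple."""

    def __init__(self) -> None:
        self.by_token: Dict[int, object] = {}
        self.by_tuple: Dict[FiveTuple, object] = {}

    def register_connection(self, local_token: int, connection: object) -> None:
        self.by_token[local_token] = connection

    def on_join_syn(self, token: int, five_tuple: FiveTuple) -> Optional[object]:
        """Look up the connection by token and bind the 5-tuple to it."""
        connection = self.by_token.get(token)
        if connection is None:
            return None   # unknown token: the caller answers with a RST
        self.by_tuple[five_tuple] = connection
        return connection

    def on_segment(self, five_tuple: FiveTuple) -> Optional[object]:
        """After setup, demultiplex by 5-tuple as in traditional TCP."""
        return self.by_tuple.get(five_tuple)
```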
+
+3.3. General MPTCP Operation
+
+ This section discusses operation of MPTCP for data transfer. At a
+ high level, an MPTCP implementation will take one input data stream
+ from an application, and split it into one or more subflows, with
+ sufficient control information to allow it to be reassembled and
+ delivered reliably and in order to the recipient application. The
+ following subsections define this behavior in detail.
+
+ The data sequence mapping and the Data ACK are signaled in the Data
+ Sequence Signal (DSS) option (Figure 9). Either or both can be
+ signaled in one DSS, dependent on the flags set. The data sequence
+ mapping defines how the sequence space on the subflow maps to the
+ connection level, and the Data ACK acknowledges receipt of data at
+ the connection level. These functions are described in more detail
+ in the following two subsections.
+
+                      1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-----------------------+
+ |     Kind      |    Length     |Subtype| (reserved)  |F|m|M|a|A|
+ +---------------+---------------+-------+-----------------------+
+ |         Data ACK (4 or 8 octets, depending on flags)          |
+ +---------------------------------------------------------------+
+ |   Data sequence number (4 or 8 octets, depending on flags)    |
+ +---------------------------------------------------------------+
+ |              Subflow Sequence Number (4 octets)               |
+ +-------------------------------+-------------------------------+
+ | Data-Level Length (2 octets)  |      Checksum (2 octets)      |
+ +-------------------------------+-------------------------------+
+
+ Figure 9: Data Sequence Signal (DSS) Option
+
+ The flags, when set, define the contents of this option, as follows:
+
+ o A = Data ACK present
+
+ o a = Data ACK is 8 octets (if not set, Data ACK is 4 octets)
+
+ o M = Data Sequence Number (DSN), Subflow Sequence Number (SSN),
+ Data-Level Length, and Checksum present
+
+ o m = Data sequence number is 8 octets (if not set, DSN is 4 octets)
+
+ The flags 'a' and 'm' only have meaning if the corresponding 'A' or
+ 'M' flags are set; otherwise, they will be ignored. The maximum
+ length of this option, with all flags set, is 28 octets.
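+
+ As a worked check of the 28-octet figure, the option length implied
+ by the flags can be computed as in this sketch. The constant and
+ function names are hypothetical, and the bit values assume the flag
+ order shown in Figure 9, with "A" as the least significant bit of
+ the fourth octet.
+

```python
DSS_FLAG_A = 0x01   # Data ACK present
DSS_FLAG_a = 0x02   # Data ACK is 8 octets
DSS_FLAG_M = 0x04   # DSN, SSN, Data-Level Length (and Checksum) present
DSS_FLAG_m = 0x08   # DSN is 8 octets
DSS_FLAG_F = 0x10   # DATA_FIN


def dss_option_length(flags: int, checksum_negotiated: bool) -> int:
    """Return the expected DSS option length in octets for these flags."""
    length = 4                                   # Kind, Length, Subtype, flags
    if flags & DSS_FLAG_A:
        length += 8 if flags & DSS_FLAG_a else 4
    if flags & DSS_FLAG_M:
        length += 8 if flags & DSS_FLAG_m else 4  # data sequence number
        length += 4 + 2                           # SSN and Data-Level Length
        if checksum_negotiated:
            length += 2                           # checksum
    return length
```
+
+ With all flags set and checksums negotiated, this gives
+ 4 + 8 + 8 + 4 + 2 + 2 = 28 octets.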
+
+ The 'F' flag indicates "DATA_FIN". If present, this means that this
+ mapping covers the final data from the sender. This is the
+ connection-level equivalent to the FIN flag in single-path TCP. A
+ connection is not closed unless there has been a DATA_FIN exchange or
+ a timeout. The purpose of the DATA_FIN and the interactions between
+ this flag, the subflow-level FIN flag, and the data sequence mapping
+ are described in Section 3.3.3. The remaining reserved bits MUST be
+ set to zero by an implementation of this specification.
+
+ Note that the checksum is only present in this option if the use of
+ MPTCP checksumming has been negotiated at the MP_CAPABLE handshake
+ (see Section 3.1). The presence of the checksum can be inferred from
+ the length of the option. If a checksum is present, but its use had
+ not been negotiated in the MP_CAPABLE handshake, the checksum field
+ MUST be ignored. If a checksum is not present when its use has been
+ negotiated, the receiver MUST close the subflow with a RST as it is
+ considered broken.
+
+3.3.1. Data Sequence Mapping
+
+ The data stream as a whole can be reassembled through the use of the
+ data sequence mapping components of the DSS option (Figure 9), which
+ define the mapping from the subflow sequence number to the data
+ sequence number. This is used by the receiver to ensure in-order
+ delivery to the application layer. Meanwhile, the subflow-level
+ sequence numbers (i.e., the regular sequence numbers in the TCP
+ header) have subflow-only relevance. It is expected (but not
+ mandated) that SACK [11] is used at the subflow level to improve
+ efficiency.
+
+ The data sequence mapping specifies a mapping from subflow sequence
+ space to data sequence space. This is expressed in terms of starting
+ sequence numbers for the subflow and the data level, and a length of
+ bytes for which this mapping is valid. This explicit mapping for a
+ range of data was chosen rather than per-packet signaling to assist
+ with compatibility with situations where TCP/IP segmentation or
+ coalescing is undertaken separately from the stack that is generating
+ the data flow (e.g., through the use of TCP segmentation offloading
+ on network interface cards, or by middleboxes such as performance
+ enhancing proxies). It also allows a single mapping to cover many
+ packets, which may be useful in bulk transfer situations.
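+
+ One way to picture a mapping is as a (data sequence number, subflow
+ sequence number, length) triple, as in the sketch below. The class
+ and method names are hypothetical, and wrapping of the 32-bit
+ subflow sequence number is ignored for brevity.
+

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSequenceMapping:
    dsn: int      # absolute 64-bit data sequence number of the first byte
    ssn: int      # relative subflow sequence number of the first byte
    length: int   # number of bytes for which the mapping is valid

    def to_dsn(self, ssn: int) -> Optional[int]:
        """Translate a subflow sequence number covered by this mapping."""
        offset = ssn - self.ssn
        if 0 <= offset < self.length:
            return (self.dsn + offset) % 2**64
        return None   # not covered by this mapping
```
+
+ A single instance covers every segment whose bytes fall inside the
+ range, however the subflow data was segmented or coalesced.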
+
+ A mapping is fixed, in that the subflow sequence number is bound to
+ the data sequence number after the mapping has been processed. A
+ sender MUST NOT change this mapping after it has been declared;
+ however, the same data sequence number can be mapped to by different
+ subflows for retransmission purposes (see Section 3.3.6). This would
+ also permit the same data to be sent simultaneously on multiple
+ subflows for resilience or efficiency purposes, especially in the
+ case of lossy links. Although the detailed specification of such
+ operation is outside the scope of this document, an implementation
+ SHOULD deliver to the application the first data that is received
+ on any subflow for a given range of data sequence space, and SHOULD
+ ignore any later data for that sequence space.
+
+ The data sequence number is specified as an absolute value, whereas
+ the subflow sequence numbering is relative (the SYN at the start of
+ the subflow has relative subflow sequence number 0). This is to
+ allow middleboxes to change the initial sequence number of a subflow,
+ such as firewalls that undertake ISN randomization.
+
+ The data sequence mapping also contains a checksum of the data that
+ this mapping covers, if use of checksums has been negotiated at the
+ MP_CAPABLE exchange. Checksums are used to detect if the payload has
+ been adjusted in any way by a non-MPTCP-aware middlebox. If this
+ checksum fails, it will trigger a failure of the subflow, or a
+ fallback to regular TCP, as documented in Section 3.6, since MPTCP
+ can no longer reliably know the subflow sequence space at the
+ receiver to build data sequence mappings.
+
+ The checksum algorithm used is the standard TCP checksum [1],
+ operating over the data covered by this mapping, along with a pseudo-
+ header as shown in Figure 10.
+
+                      1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------------------------------------------------------+
+ |                                                               |
+ |                Data Sequence Number (8 octets)                |
+ |                                                               |
+ +---------------------------------------------------------------+
+ |              Subflow Sequence Number (4 octets)               |
+ +-------------------------------+-------------------------------+
+ | Data-Level Length (2 octets)  |       Zeros (2 octets)        |
+ +-------------------------------+-------------------------------+
+
+ Figure 10: Pseudo-Header for DSS Checksum
+
+ Note that the data sequence number used in the pseudo-header is
+ always the 64-bit value, irrespective of what length is used in the
+ DSS option itself. The standard TCP checksum algorithm has been
+ chosen since it will be calculated anyway for the TCP subflow, and if
+ calculated first over the data before adding the pseudo-headers, it
+ only needs to be calculated once. Furthermore, since the TCP
+ checksum is additive, the checksum for a DSN_MAP can be constructed
+ by simply adding together the checksums for the data of each
+ constituent TCP segment, and adding the checksum for the DSS pseudo-
+ header.
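+
+ The checksum computation over the pseudo-header of Figure 10 and
+ the mapped data might look like the sketch below. The function
+ names are hypothetical. The payload sum is computed separately from
+ the pseudo-header sum to mirror the additive property noted above.
+

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum, as used by the standard TCP checksum."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length data with a zero
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return total


def dss_checksum(dsn64: int, ssn: int, data_len: int, payload: bytes) -> int:
    """Checksum over the DSS pseudo-header followed by the mapped data."""
    # The pseudo-header always carries the full 64-bit DSN and a zeroed
    # checksum field.
    pseudo = struct.pack("!QIHH", dsn64, ssn, data_len, 0)
    total = ones_complement_sum(pseudo) + ones_complement_sum(payload)
    total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```
+
+ A receiver can verify a mapping by summing the pseudo-header, with
+ the received checksum in place of the zeros, together with the
+ data; the result is 0xFFFF when the data is intact.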
+
+ Note that checksumming relies on the TCP subflow containing
+ contiguous data; therefore, a TCP subflow MUST NOT use the Urgent
+ Pointer to interrupt an existing mapping. Further note, however,
+ that if Urgent data is received on a subflow, it SHOULD be mapped to
+ the data sequence space and delivered to the application in the same
+ way as Urgent data in regular TCP.
+
+ To avoid possible deadlock scenarios, subflow-level processing should
+ be undertaken separately from that at connection level. Therefore,
+ even if a mapping does not exist from the subflow space to the data-
+ level space, the data SHOULD still be ACKed at the subflow (if it is
+ in-window). This data cannot, however, be acknowledged at the data
+ level (Section 3.3.2) because its data sequence numbers are unknown.
+ Implementations MAY hold onto such unmapped data for a short while in
+ the expectation that a mapping will arrive shortly. Such unmapped
+ data cannot be counted as being within the connection level receive
+ window because this is relative to the data sequence numbers, so if
+ the receiver runs out of memory to hold this data, it will have to be
+ discarded. If a mapping for that subflow-level sequence space does
+ not arrive within a receive window of data, that subflow SHOULD be
+ treated as broken, closed with a RST, and any unmapped data silently
+ discarded.
+
+ Data sequence numbers are always 64-bit quantities, and MUST be
+ maintained as such in implementations. If a connection is
+ progressing at a slow rate, so protection against wrapped sequence
+ numbers is not required, then it is permissible to include just the
+ lower 32 bits of the data sequence number in the data sequence
+ mapping and/or Data ACK as an optimization, and an implementation can
+ make this choice independently for each packet.
+
+ An implementation MUST send the full 64-bit data sequence number if
+ it is transmitting at a sufficiently high rate that the 32-bit value
+ could wrap within the Maximum Segment Lifetime (MSL) [16]. The
+ lengths of the DSNs used in these values (which may be different) are
+ declared with flags in the DSS option. Implementations MUST accept a
+ 32-bit DSN and implicitly promote it to a 64-bit quantity by
+ incrementing the upper 32 bits of sequence number each time the lower
+ 32 bits wrap. A sanity check MUST be implemented to ensure that a
+ wrap occurs at an expected time (e.g., the sequence number jumps from
+ a very high number to a very low number) and is not triggered by out-
+ of-order packets.
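+
+ One possible way to promote a 32-bit DSN, with a simple sanity
+ check against out-of-order packets, is sketched below. The function
+ name is hypothetical, and choosing the 64-bit value closest to the
+ most recently seen DSN is an assumption of this sketch rather than
+ a rule stated in this specification.
+

```python
def promote_dsn(dsn32: int, last_dsn64: int) -> int:
    """Expand a 32-bit DSN to 64 bits, relative to the last known DSN."""
    candidate = (last_dsn64 & ~0xFFFFFFFF) | dsn32
    diff = candidate - last_dsn64
    if diff < -2**31:
        candidate += 2**32    # the lower 32 bits wrapped: bump the upper half
    elif diff > 2**31:
        candidate -= 2**32    # late packet from before the most recent wrap
    return candidate % 2**64
```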
+
+ As with the standard TCP sequence number, the data sequence number
+ should not start at zero, but at a random value to make blind session
+ hijacking harder. This specification requires setting the initial
+ data sequence number (IDSN) of each host to the least significant 64
+ bits of the SHA-1 hash of the host's key, as described in
+ Section 3.1.
+
+ A data sequence mapping does not need to be included in every MPTCP
+ packet, as long as the subflow sequence space in that packet is
+ covered by a mapping known at the receiver. This can be used to
+ reduce overhead in cases where the mapping is known in advance; one
+ such case is when there is a single subflow between the hosts,
+ another is when segments of data are scheduled in larger than packet-
+ sized chunks.
+
+ An "infinite" mapping can be used to fall back to regular TCP by
+ mapping the subflow-level data to the connection-level data for the
+ remainder of the connection (see Section 3.6). This is achieved by
+ setting the Data-Level Length field of the DSS option to the reserved
+ value of 0. The checksum, in such a case, will also be set to zero.
+
+3.3.2. Data Acknowledgments
+
+ To provide full end-to-end resilience, MPTCP provides a connection-
+ level acknowledgment, to act as a cumulative ACK for the connection
+ as a whole. This is the "Data ACK" field of the DSS option
+ (Figure 9). The Data ACK is analogous to the behavior of the
+ standard TCP cumulative ACK -- indicating how much data has been
+ successfully received (with no holes). This is in comparison to the
+ subflow-level ACK, which acts analogously to TCP SACK, given that there
+ may still be holes in the data stream at the connection level. The
+ Data ACK specifies the next data sequence number it expects to
+ receive.
+
+ The Data ACK, as for the DSN, can be sent as the full 64-bit value,
+ or as the lower 32 bits. If data is received with a 64-bit DSN, it
+ MUST be acknowledged with a 64-bit Data ACK. If the DSN received is
+ 32 bits, it is valid for the implementation to choose whether to send
+ a 32-bit or 64-bit Data ACK.
+
+ The Data ACK proves that the data, and all required MPTCP signaling,
+ has been received and accepted by the remote end. One key use of the
+ Data ACK signal is that it is used to indicate the left edge of the
+ advertised receive window. As explained in Section 3.3.4, the
+ receive window is shared by all subflows and is relative to the Data
+ ACK. Because of this, an implementation MUST NOT use the RCV.WND
+ field of a TCP segment at the connection level if it does not also
+ carry a DSS option with a Data ACK field. Furthermore, separating
+ the connection-level acknowledgments from the subflow level allows
+ processing to be done separately, and a receiver has the freedom to
+ drop segments after acknowledgment at the subflow level, for example,
+ due to memory constraints when many segments arrive out of order.
+
+ An MPTCP sender MUST NOT free data from the send buffer until it has
+ been acknowledged by both a Data ACK received on any subflow and at
+ the subflow level by all subflows on which the data was sent. The
+ former condition ensures liveness of the connection and the latter
+ condition ensures liveness and self-consistency of a subflow when
+ data needs to be retransmitted. Note, however, that if some data
+ needs to be retransmitted multiple times over a subflow, there is a
+ risk of blocking the sending window. In this case, the MPTCP sender
+ can decide to terminate the subflow that is behaving badly by sending
+ a RST.
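+
+ The buffer-release rule above can be sketched as follows. The
+ SentChunk record and the can_free() helper are hypothetical names
+ introduced for illustration; sequence numbers are treated as plain
+ integers and wraparound is ignored.
+

```python
from dataclasses import dataclass, field


@dataclass
class SentChunk:
    dsn_end: int                       # DSN just past the chunk's last octet
    # subflow id -> subflow sequence number just past the chunk on that subflow
    subflow_end: dict = field(default_factory=dict)


def can_free(chunk, data_ack, subflow_acks):
    """True once the chunk is acknowledged at both levels.

    data_ack     -- highest Data ACK received on any subflow
    subflow_acks -- subflow id -> cumulative subflow-level ACK
    """
    data_acked = data_ack >= chunk.dsn_end
    subflow_acked = all(
        sf in subflow_acks and subflow_acks[sf] >= end
        for sf, end in chunk.subflow_end.items()
    )
    return data_acked and subflow_acked
```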
+
+ The Data ACK MAY be included in all segments; however, optimizations
+ SHOULD be considered in more advanced implementations, where the Data
+ ACK is present in segments only when the Data ACK value advances, and
+ this behavior MUST be treated as valid. This behavior ensures the
+ sender buffer is freed, while reducing overhead when the data
+ transfer is unidirectional.
+
+3.3.3. Closing a Connection
+
+ In regular TCP, a FIN announces to the receiver that the sender has no
+ more data to send. In order to allow subflows to operate
+ independently and to keep the appearance of TCP over the wire, a FIN
+ in MPTCP only affects the subflow on which it is sent. This allows
+ nodes to exercise considerable freedom over which paths are in use at
+ any one time. The semantics of a FIN remain as for regular TCP;
+ i.e., it is not until both sides have ACKed each other's FINs that
+ the subflow is fully closed.
+
+ When an application calls close() on a socket, this indicates that it
+ has no more data to send; for regular TCP, this would result in a FIN
+ on the connection. For MPTCP, an equivalent mechanism is needed, and
+ this is referred to as the DATA_FIN.
+
+ A DATA_FIN is an indication that the sender has no more data to send,
+ and as such can be used to verify that all data has been successfully
+ received. A DATA_FIN, as with the FIN on a regular TCP connection,
+ is a unidirectional signal.
+
+ The DATA_FIN is signaled by setting the 'F' flag in the Data Sequence
+ Signal option (Figure 9) to 1. A DATA_FIN occupies 1 octet (the
+ final octet) of the connection-level sequence space. Note that the
+ DATA_FIN is included in the Data-Level Length, but not at the subflow
+ level: for example, a segment with DSN 80 and Data-Level Length 11,
+ with DATA_FIN set, would map 10 octets from the subflow into data
+ sequence space 80-89, and the DATA_FIN itself is DSN 90; therefore,
+ this segment including DATA_FIN would be acknowledged with a DATA_ACK
+ of 91.
+
+ Note that when the DATA_FIN is not attached to a TCP segment
+ containing data, the Data Sequence Signal MUST have a subflow
+ sequence number of 0, a Data-Level Length of 1, and the data sequence
+ number that corresponds with the DATA_FIN itself. The checksum in
+ this case will only cover the pseudo-header.
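+
+ The DATA_FIN sequence-space arithmetic above can be checked with a
+ short sketch. The helper name dss_coverage() is hypothetical, and
+ wraparound of the data sequence number is ignored so that the
+ arithmetic stays visible.
+

```python
def dss_coverage(dsn, data_level_length, data_fin):
    """Return (payload_octets, data_fin_dsn, expected_data_ack)."""
    if data_level_length == 0:
        raise ValueError("a Data-Level Length of 0 denotes an infinite mapping")
    # The DATA_FIN takes the final octet of the mapping at the connection
    # level but carries no octet at the subflow level.
    payload_octets = data_level_length - 1 if data_fin else data_level_length
    data_fin_dsn = dsn + payload_octets if data_fin else None
    expected_data_ack = dsn + data_level_length
    return payload_octets, data_fin_dsn, expected_data_ack


print(dss_coverage(80, 11, True))   # (10, 90, 91): the example from the text
print(dss_coverage(90, 1, True))    # (0, 90, 91): DATA_FIN with no payload
```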
+
+ A DATA_FIN has the same semantics and behavior as a regular TCP FIN,
+ but at the connection level. Notably, it is only DATA_ACKed once all
+ data has been successfully received at the connection level. Note,
+ therefore, that a DATA_FIN is decoupled from a subflow FIN. It is
+ only permissible to combine these signals on one subflow if there is
+ no data outstanding on other subflows. Otherwise, it may be
+ necessary to retransmit data on different subflows. Essentially, a
+ host MUST NOT close all functioning subflows unless it is safe to do
+ so, i.e., until all outstanding data has been DATA_ACKed, or until
+ the segment with the DATA_FIN flag set is the only outstanding
+ segment.
+
+ Once a DATA_FIN has been acknowledged, all remaining subflows MUST be
+ closed with standard FIN exchanges. Both hosts SHOULD send FINs on
+ all subflows, as a courtesy to allow middleboxes to clean up state
+ even if an individual subflow has failed. End hosts are also
+ encouraged to reduce the timeouts (Maximum Segment Life) on subflows.
+ In particular, any subflows where there is still outstanding data
+ queued (which has been retransmitted on other subflows in order to
+ get the DATA_FIN acknowledged) MAY be closed with a RST.
+
+ A connection is considered closed once both hosts' DATA_FINs have
+ been acknowledged by DATA_ACKs.
+
+ As specified above, a standard TCP FIN on an individual subflow only
+ shuts down the subflow on which it was sent. If all subflows have
+ been closed with a FIN exchange, but no DATA_FIN has been received
+ and acknowledged, the MPTCP connection is treated as closed only
+ after a timeout. This implies that an implementation will have
+ TIME_WAIT states at both the subflow and connection levels (see
+ Appendix C). This permits "break-before-make" scenarios where
+ connectivity is lost on all subflows before a new one can be re-
+ established.
+
+3.3.4. Receiver Considerations
+
+ Regular TCP advertises a receive window in each packet, telling the
+ sender how much data the receiver is willing to accept past the
+ cumulative ack. The receive window is used to implement flow
+ control, throttling down fast senders when receivers cannot keep up.
+
+ MPTCP also uses a single receive window, shared between the subflows.
+ The idea is to allow any subflow to send data as long as the receiver
+ is willing to accept it. The alternative, maintaining per-subflow
+ receive windows, could end up stalling some subflows while others
+ would not use up their window.
+
+ The receive window is relative to the DATA_ACK. As in TCP, a
+ receiver MUST NOT shrink the right edge of the receive window (i.e.,
+ DATA_ACK + receive window). The receiver will use the data sequence
+ number to tell if a packet should be accepted at the connection
+ level.
+
+ When deciding to accept packets at subflow level, regular TCP checks
+ the sequence number in the packet against the allowed receive window.
+ With multipath, such a check is done using only the connection-level
+ window. A sanity check SHOULD be performed at subflow level to
+ ensure that the subflow and mapped sequence numbers meet the
+ following test: SSN - SUBFLOW_ACK <= DSN - DATA_ACK, where SSN is the
+ subflow sequence number of the received packet and SUBFLOW_ACK is the
+ RCV.NXT (next expected sequence number) of the subflow (with the
+ equivalent connection-level definitions for DSN and DATA_ACK).
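+
+ A minimal sketch of the two receiver-side checks described above.
+ The function names are hypothetical, wraparound is ignored, and the
+ window check is deliberately simplified: it accepts a segment only
+ if it lies entirely inside the connection-level window.
+

```python
def in_connection_window(dsn, length, data_ack, rcv_wnd):
    """Simplified check: the segment fits in [DATA_ACK, DATA_ACK + window)."""
    return data_ack <= dsn and dsn + length <= data_ack + rcv_wnd


def passes_sanity_check(ssn, subflow_ack, dsn, data_ack):
    """The test from the text: SSN - SUBFLOW_ACK <= DSN - DATA_ACK."""
    return ssn - subflow_ack <= dsn - data_ack
```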
+
+ In regular TCP, once a segment is deemed in-window, it is put either
+ in the in-order receive queue or in the out-of-order queue. In
+ Multipath TCP, the same happens but at the connection level: a
+ segment is placed in the connection level in-order or out-of-order
+ queue if it is in-window at both connection and subflow levels. The
+ stack still has to remember, for each subflow, which segments were
+ received successfully so that it can ACK them at subflow level
+ appropriately. Typically, this will be implemented by keeping per
+ subflow out-of-order queues (containing only message headers, not the
+ payloads) and remembering the value of the cumulative ACK.
+
+ It is important for implementers to understand how large a receiver
+ buffer is appropriate. The lower bound for full network utilization
+ is the maximum bandwidth-delay product of any one of the paths.
+ However, this might be insufficient when a packet is lost on a slower
+ subflow and needs to be retransmitted (see Section 3.3.6). A tight
+ upper bound would be the maximum round-trip time (RTT) of any path
+ multiplied by the total bandwidth available across all paths. This
+ permits all subflows to continue at full speed while a packet is
+ fast-retransmitted on the maximum RTT path. Even this might be
+ insufficient to maintain full performance in the event of a
+ retransmit timeout on the maximum RTT path. It is for future study
+ to determine the relationship between retransmission strategies and
+ receive buffer sizing.
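+
+ The two buffer bounds above can be computed as in the following
+ sketch. The function name and the choice of units (bytes per second
+ and integer milliseconds) are assumptions made for illustration.
+

```python
def receive_buffer_bounds(paths):
    """paths: iterable of (bandwidth_bytes_per_second, rtt_milliseconds).

    Returns (lower_bound, upper_bound) in bytes.
    """
    paths = list(paths)
    # Lower bound: the largest bandwidth-delay product of any single path.
    lower = max(bw * rtt for bw, rtt in paths) // 1000
    # Upper bound: the largest RTT multiplied by the total bandwidth.
    upper = max(rtt for _, rtt in paths) * sum(bw for bw, _ in paths) // 1000
    return lower, upper


# 10 Mbit/s at 10 ms and 5 Mbit/s at 100 ms
print(receive_buffer_bounds([(1_250_000, 10), (625_000, 100)]))  # (62500, 187500)
```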
+
+3.3.5. Sender Considerations
+
+ The sender remembers receive window advertisements from the
+ receiver. It should only update its local receive window values when
+ the largest sequence number allowed (i.e., DATA_ACK + receive window)
+ increases, on the receipt of a DATA_ACK. This is important to allow
+ using paths with different RTTs, and thus different feedback loops.
+
+ MPTCP uses a single receive window across all subflows, and if the
+ receive window was guaranteed to be unchanged end-to-end, a host
+ could always read the most recent receive window value. However,
+ some classes of middleboxes may alter the TCP-level receive window.
+ Typically, these will shrink the offered window, although for short
+ periods of time it may be possible for the window to be larger
+ (however, note that this would not continue for long periods since
+ ultimately the middlebox must keep up with delivering data to the
+ receiver). Therefore, if receive window sizes differ on multiple
+ subflows, when sending data MPTCP SHOULD take the largest of the most
+ recent window sizes as the one to use in calculations. This rule is
+ implicit in the requirement not to reduce the right edge of the
+ window.
+
+ The sender MUST also remember the receive windows advertised by each
+ subflow. The allowed window for subflow i is (ack_i, ack_i +
+ rcv_wnd_i), where ack_i is the subflow-level cumulative ACK of
+ subflow i. This ensures data will not be sent to a middlebox unless
+ there is enough buffering for the data.
+
+ Putting the two rules together, we get the following: a sender is
+ allowed to send data segments with data-level sequence numbers
+ between (DATA_ACK, DATA_ACK + receive_window). Each of these
+ segments will be mapped onto subflows, as long as subflow sequence
+ numbers are in the allowed windows for those subflows. Note that
+ subflow sequence numbers do not generally affect flow control if the
+ same receive window is advertised across all subflows. They will
+ perform flow control for those subflows with a smaller advertised
+ receive window.
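+
+ Putting the sender rules into a sketch: the right edge of the
+ connection-level window only ever moves forward, and a segment may
+ be sent only if both the connection-level window and the window of
+ the chosen subflow allow it. Function and parameter names are
+ hypothetical; wraparound is ignored.
+

```python
def update_right_edge(current_edge, data_ack, rcv_wnd):
    """Advance the connection-level right edge; never move it backwards."""
    return max(current_edge, data_ack + rcv_wnd)


def may_send(dsn_end, right_edge, ssn_end, subflow_ack, subflow_rcv_wnd):
    """dsn_end / ssn_end are the sequence numbers just past the segment."""
    fits_connection = dsn_end <= right_edge
    fits_subflow = ssn_end <= subflow_ack + subflow_rcv_wnd
    return fits_connection and fits_subflow
```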
+
+ The send buffer MUST, at a minimum, be as big as the receive buffer,
+ to enable the sender to reach maximum throughput.
+
+3.3.6. Reliability and Retransmissions
+
+ The data sequence mapping allows senders to resend data with the same
+ data sequence number on a different subflow. When doing this, a host
+ MUST still retransmit the original data on the original subflow, in
+ order to preserve the subflow integrity (middleboxes could replay old
+ data, and/or could reject holes in subflows), and a receiver will
+ ignore these retransmissions. While this is clearly suboptimal, for
+ compatibility reasons this is sensible behavior. Optimizations could
+ be negotiated in future versions of this protocol.
+
+ This protocol specification does not mandate any mechanisms for
+ handling retransmissions, and much will be dependent upon local
+ policy (as discussed in Section 3.3.8). One can imagine aggressive
+ connection-level retransmissions policies where every packet lost at
+ subflow level is retransmitted on a different subflow (hence, wasting
+ bandwidth but possibly reducing application-to-application delays),
+ or conservative retransmission policies where connection-level
+ retransmits are only used after a few subflow-level retransmission
+ timeouts occur.
+
+ It is envisaged that a standard connection-level retransmission
+ mechanism would be implemented around a connection-level data queue:
+ all segments that haven't been DATA_ACKed are stored. A timer is set
+ when the head of the connection-level queue is ACKed at subflow level
+ but its corresponding data is not ACKed at data level. This timer will
+ guard against failures in retransmission by middleboxes that
+ proactively ACK data.
+
+ The sender MUST keep data in its send buffer as long as the data has
+ not been acknowledged at both connection level and on all subflows on
+ which it has been sent. In this way, the sender can always
+ retransmit the data if needed, on the same subflow or on a different
+ one. A special case is when a subflow fails: the sender will
+ typically resend the data on other working subflows after a timeout,
+ and will keep trying to retransmit the data on the failed subflow
+ too. The sender will declare the subflow failed after a predefined
+ upper bound on retransmissions is reached (which MAY be lower than
+ the usual TCP limits of the Maximum Segment Life), or on the receipt
+ of an ICMP error, and only then delete the outstanding data segments.
+
+ Multiple retransmissions are triggers that will indicate that a
+ subflow performs badly and could lead to a host resetting the subflow
+ with a RST. However, additional research is required to understand
+ the heuristics of how and when to reset underperforming subflows.
+ For example, a highly asymmetric path may be misdiagnosed as
+ underperforming.
+
+3.3.7. Congestion Control Considerations
+
+ Different subflows in an MPTCP connection have different congestion
+ windows. To achieve fairness at bottlenecks and resource pooling, it
+ is necessary to couple the congestion windows in use on each subflow,
+ in order to push most traffic to uncongested links. One algorithm
+ for achieving this is presented in [5]; the algorithm does not
+ achieve perfect resource pooling but is "safe" in that it is readily
+ deployable in the current Internet. By this, we mean that it does
+ not take up more capacity on any one path than if it was a single
+ path flow using only that route, so this ensures fair coexistence
+ with single-path TCP at shared bottlenecks.
+
+ It is foreseeable that different congestion controllers will be
+ implemented for MPTCP, each aiming to achieve different properties in
+ the resource pooling/fairness/stability design space, as well as
+ those for achieving different properties in quality of service,
+ reliability, and resilience.
+
+ Regardless of the algorithm used, the design of the MPTCP protocol
+ aims to provide the congestion control implementations sufficient
+ information to take the right decisions; this information includes,
+ for each subflow, which packets were lost and when.
+
+3.3.8. Subflow Policy
+
+ Within a local MPTCP implementation, a host may use any local policy
+ it wishes to decide how to share the traffic to be sent over the
+ available paths.
+
+ In the typical use case, where the goal is to maximize throughput,
+ all available paths will be used simultaneously for data transfer,
+ using coupled congestion control as described in [5]. It is
+ expected, however, that other use cases will appear.
+
+ For instance, a possibility is an 'all-or-nothing' approach, i.e.,
+ have a second path ready for use in the event of failure of the first
+ path, but alternatives could include entirely saturating one path
+ before using an additional path (the 'overflow' case). Such choices
+ would be most likely based on the monetary cost of links, but may
+ also be based on properties such as the delay or jitter of links,
+ where stability (of delay or bandwidth) is more important than
+ throughput. Application requirements such as these are discussed in
+ detail in [6].
+
+ The ability to make effective choices at the sender requires full
+ knowledge of the path "cost", which is unlikely to be the case. It
+ would be desirable for a receiver to be able to signal their own
+ preferences for paths, since they will often be the multihomed party,
+ and may have to pay for metered incoming bandwidth.
+
+ Whilst fine-grained control may be the most powerful solution, that
+ would require some mechanism such as overloading the Explicit
+ Congestion Notification (ECN) signal [17], which is undesirable, and
+ it is felt that there would not be sufficient benefit to justify an
+ entirely new signal. Therefore, the MP_JOIN option (see Section 3.2)
+ contains the 'B' bit, which allows a host to indicate to its peer
+ that this path should be treated as a backup path to use only in the
+ event of failure of other working subflows (i.e., a subflow where the
+ receiver has indicated B=1 SHOULD NOT be used to send data unless
+ there are no usable subflows where B=0).
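+
+ One possible reading of the backup rule above, as a sketch: backup
+ subflows are considered only when no usable non-backup subflow
+ remains. The dictionary layout and the function name are
+ hypothetical, and a real scheduler would apply further local policy.
+

```python
def eligible_subflows(subflows):
    """subflows: list of dicts with 'usable' and 'backup' (the peer's B bit)."""
    usable = [s for s in subflows if s["usable"]]
    regular = [s for s in usable if not s["backup"]]
    # Fall back to backup subflows only if no regular subflow is usable.
    return regular if regular else usable
```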
+
+ In the event that the available set of paths changes, a host may wish
+ to signal a change in priority of subflows to the peer (e.g., a
+ subflow that was previously set as backup should now take priority
+ over all remaining subflows). Therefore, the MP_PRIO option, shown
+ in Figure 11, can be used to change the 'B' flag of the subflow on
+ which it is sent.
+
+                      1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-----+-+--------------+
+ |     Kind      |     Length    |Subtype|     |B| AddrID (opt) |
+ +---------------+---------------+-------+-----+-+--------------+
+
+ Figure 11: Change Subflow Priority (MP_PRIO) Option
+
+ It should be noted that the backup flag is a request from a data
+ receiver to a data sender only, and the data sender SHOULD adhere to
+ these requests. A host cannot assume that the data sender will do
+ so, however, since local policies -- or technical difficulties -- may
+ override MP_PRIO requests. Note also that this signal applies to a
+ single direction, and so the sender of this option could choose to
+ continue using the subflow to send data even if it has signaled B=1
+ to the other host.
+
+ This option can also be applied to other subflows than the one on
+ which it is sent, by setting the optional Address ID field. This
+ applies the given setting of B to all subflows in this connection
+ that use the address identified by the given Address ID. The
+ presence of this field is determined by the option length; if
+ Length==4 then it is present. If Length==3, then it applies to the
+ current subflow only. The use case of this is that a host can signal
+ to its peer that an address is temporarily unavailable (for example,
+ if it has radio coverage issues) and the peer should therefore drop
+ to backup state on all subflows using that Address ID.
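+
+ A sketch of parsing the MP_PRIO option laid out in Figure 11. The
+ function name is hypothetical; the option Kind (30) and the MP_PRIO
+ subtype (0x5) are the values assigned elsewhere in this document,
+ and are stated here as constants of the sketch.
+

```python
MPTCP_KIND = 30
MP_PRIO_SUBTYPE = 0x5


def parse_mp_prio(option: bytes):
    """Return (backup, address_id).

    address_id is None when Length == 3, i.e., the option applies to the
    subflow on which it was received.
    """
    if len(option) < 3 or option[0] != MPTCP_KIND or option[1] != len(option):
        raise ValueError("not a well-formed MPTCP option")
    if option[2] >> 4 != MP_PRIO_SUBTYPE:
        raise ValueError("not an MP_PRIO option")
    backup = bool(option[2] & 0x01)      # 'B' is the lowest bit of octet 3
    if option[1] == 3:
        return backup, None
    if option[1] == 4:
        return backup, option[3]
    raise ValueError("MP_PRIO Length must be 3 or 4")
```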
+
+3.4. Address Knowledge Exchange (Path Management)
+
+ We use the term "path management" to refer to the exchange of
+ information about additional paths between hosts, which in this
+ design is managed by multiple addresses at hosts. For more detail of
+ the architectural thinking behind this design, see the MPTCP
+ Architecture document [2].
+
+ This design makes use of two methods of sharing such information, and
+ both can be used on a connection. The first is the direct setup of
+ new subflows, already described in Section 3.2, where the initiator
+ has an additional address. The second method, described in the
+ following subsections, signals addresses explicitly to the other host
+ to allow it to initiate new subflows. The two mechanisms are
+ complementary: the first is implicit and simple, while the second,
+ explicit method is more complex but more robust. Together, they allow
+ addresses to change in flight (and thus support operation through
+ NATs, since the source address need not be known), and also allow the
+ signaling of previously unknown addresses, and of addresses belonging
+ to other address families (e.g., both IPv4 and IPv6).
+
+ Here is an example of typical operation of the protocol:
+
+ o An MPTCP connection is initially set up between address/port A1 of
+ Host A and address/port B1 of Host B. If Host A is multihomed and
+ multiaddressed, it can start an additional subflow from its
+ address A2 to B1, by sending a SYN with a Join option from A2 to
+ B1, using B's previously declared token for this connection.
+ Alternatively, if B is multihomed, it can try to set up a new
+ subflow from B2 to A1, using A's previously declared token. In
+ either case, the SYN will be sent to the port already in use for
+ the original subflow on the receiving host.
+
+ o Simultaneously (or after a timeout), an ADD_ADDR option
+ (Section 3.4.1) is sent on an existing subflow, informing the
+ receiver of the sender's alternative address(es). The recipient
+ can use this information to open a new subflow to the sender's
+ additional address. In our example, A will send an ADD_ADDR option
+ informing B of address/port A2. The mix of using the SYN-based
+ option and the ADD_ADDR option, including timeouts, is
+ implementation specific and can be tailored to agree with local
+ policy.
+
+ o If subflow A2-B1 is successfully set up, Host B can use the
+ Address ID in the Join option to correlate this with the ADD_ADDR
+ option that will also arrive on an existing subflow; now B knows
+ not to open A2-B1, ignoring the ADD_ADDR. Otherwise, if B has not
+ received the A2-B1 MP_JOIN SYN but received the ADD_ADDR, it can
+ try to initiate a new subflow from one or more of its addresses to
+ address A2. This permits new sessions to be opened if one host is
+ behind a NAT.
+
+ Other ways of using the two signaling mechanisms are possible; for
+ instance, signaling addresses in other address families can only be
+ done explicitly using the Add Address option.
+
+3.4.1. Address Advertisement
+
+ The Add Address (ADD_ADDR) TCP option announces additional addresses
+ (and optionally, ports) on which a host can be reached (Figure 12).
+ Multiple instances of this TCP option can be added in a single
+ message if there is sufficient TCP option space; otherwise, multiple
+ TCP messages containing this option will be sent. This option can be
+ used at any time during a connection, depending on when the sender
+ wishes to enable multiple paths and/or when paths become available.
+ As with all MPTCP signals, the receiver MUST undertake standard TCP
+ validity checks before acting upon it.
+
+ Every address has an Address ID that can be used for uniquely
+ identifying the address within a connection for address removal.
+ This is also used to identify MP_JOIN options (see Section 3.2)
+ relating to the same address, even when address translators are in
+ use. The Address ID MUST uniquely identify the address to the sender
+ (within the scope of the connection), but the mechanism for
+ allocating such IDs is implementation specific.
+
+ All address IDs learned via either MP_JOIN or ADD_ADDR SHOULD be
+ stored by the receiver in a data structure that gathers all the
+ Address ID to address mappings for a connection (identified by a
+ token pair). In this way, there is a stored mapping between Address
+ ID, observed source address, and token pair for future processing of
+ control information for a connection. Note that an implementation
+ MAY discard incoming address advertisements at will, for example, for
+ avoiding the required mapping state, or because advertised addresses
+ are of no use to it (for example, IPv6 addresses when it has IPv4
+ only). Therefore, a host MUST treat address advertisements as soft
+ state, and it MAY choose to refresh advertisements periodically.
+
+ This option is shown in Figure 12. The illustration is sized for
+ IPv4 addresses (IPVer = 4). For IPv6, the IPVer field will read 6,
+ and the length of the address will be 16 octets (instead of 4).
+
+ The final 2 octets, specifying the TCP port number to use, are
+ optional; their presence can be inferred from the length of the option.
+ Although it is expected that the majority of use cases will use the
+ same port pairs as used for the initial subflow (e.g., port 80
+ remains port 80 on all subflows, as does the ephemeral port at the
+ client), there may be cases (such as port-based load balancing) where
+ the explicit specification of a different port is required. If no
+ port is specified, MPTCP SHOULD attempt to connect to the specified
+ address on the same port as is already in use by the subflow on which
+ the ADD_ADDR signal was sent; this is discussed in more detail in
+ Section 3.8.
+
+                      1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-------+---------------+
+ |     Kind      |     Length    |Subtype| IPVer |  Address ID   |
+ +---------------+---------------+-------+-------+---------------+
+ |          Address (IPv4 - 4 octets / IPv6 - 16 octets)         |
+ +-------------------------------+-------------------------------+
+ |   Port (2 octets, optional)   |
+ +-------------------------------+
+
+ Figure 12: Add Address (ADD_ADDR) Option
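+
+ A sketch of parsing the ADD_ADDR layout in Figure 12, inferring the
+ presence of the port from the option length. The function name is
+ hypothetical; Kind 30 and subtype 0x3 are the values assigned
+ elsewhere in this document.
+

```python
import ipaddress
import struct

MPTCP_KIND = 30
ADD_ADDR_SUBTYPE = 0x3


def parse_add_addr(option: bytes):
    """Return (address_id, ip_address, port); port is None if absent."""
    if len(option) < 4 or option[0] != MPTCP_KIND or option[1] != len(option):
        raise ValueError("not a well-formed MPTCP option")
    subtype, ipver = option[2] >> 4, option[2] & 0x0F
    if subtype != ADD_ADDR_SUBTYPE:
        raise ValueError("not an ADD_ADDR option")
    addr_len = {4: 4, 6: 16}.get(ipver)
    if addr_len is None:
        raise ValueError("unknown IPVer")
    rest = option[4:]
    if len(rest) == addr_len:
        port = None                      # no port: reuse the current one
    elif len(rest) == addr_len + 2:
        (port,) = struct.unpack("!H", rest[addr_len:])
    else:
        raise ValueError("Length does not match IPVer")
    return option[3], ipaddress.ip_address(rest[:addr_len]), port
```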
+
+ Due to the proliferation of NATs, it is reasonably likely that one
+ host may attempt to advertise private addresses [18]. It is not
+ desirable to prohibit this, since there may be cases where both hosts
+ have additional interfaces on the same private network, and a host
+ MAY want to advertise such addresses. The MP_JOIN handshake to
+ create a new subflow (Section 3.2) provides mechanisms to minimize
+ security risks. The MP_JOIN message contains a 32-bit token that
+ uniquely identifies the connection to the receiving host. If the
+ token is unknown, the host will return with a RST. In the unlikely
+ event that the token is known, subflow setup will continue, but the
+ HMAC exchange must occur for authentication. This will fail, and
+ will provide sufficient protection against two unconnected hosts
+ accidentally setting up a new subflow upon the signal of a private
+ address. Further security considerations around the issue of
+ ADD_ADDR messages that accidentally misdirect, or maliciously direct,
+ new MP_JOIN attempts are discussed in Section 5.
+
+ Ideally, ADD_ADDR and REMOVE_ADDR options would be sent reliably, and
+ in order, to the other end. This would ensure that this address
+ management does not unnecessarily cause an outage in the connection
+ when remove/add addresses are processed in reverse order, and also to
+ ensure that all possible paths are used. Note, however, that losing
+ reliability and ordering will not break the multipath connections; it
+ will just reduce the opportunity to open multiple paths and to
+ survive different patterns of path failures.
+
+ Therefore, implementing reliability signals for these TCP options is
+ not necessary. In order to minimize the impact of the loss of these
+ options, however, it is RECOMMENDED that a sender should send these
+ options on all available subflows. If these options need to be
+ received in order, an implementation SHOULD only send one ADD_ADDR/
+ REMOVE_ADDR option per RTT, to minimize the risk of misordering.
+
+ A host can send an ADD_ADDR message with an already assigned Address
+ ID, but the Address MUST be the same as previously assigned to this
+ Address ID, and the Port MUST be different from one already in use
+ for this Address ID. If these conditions are not met, the receiver
+ SHOULD silently ignore the ADD_ADDR. A host wishing to replace an
+ existing Address ID MUST first remove the existing one
+ (Section 3.4.2).
+
+ A host that receives an ADD_ADDR but finds that a connection attempt
+ to that IP address and port number is unsuccessful SHOULD NOT perform
+ further connection attempts to this address/port combination for this
+ connection. A sender that wants to trigger a new incoming connection
+ attempt on a previously advertised address/port combination can
+ therefore refresh ADD_ADDR information by sending the option again.
+
+ During normal MPTCP operation, it is unlikely that there will be
+ sufficient TCP option space for ADD_ADDR to be included along with
+ those for data sequence numbering (Section 3.3.1). Therefore, it is
+ expected that an MPTCP implementation will send the ADD_ADDR option
+ on separate ACKs. As discussed earlier, however, an MPTCP
+ implementation MUST NOT treat duplicate ACKs with any MPTCP option,
+ with the exception of the DSS option, as indications of congestion
+ [12], and an MPTCP implementation SHOULD NOT send more than two
+ duplicate ACKs in a row for signaling purposes.
+
+3.4.2. Remove Address
+
+ If, during the lifetime of an MPTCP connection, a previously
+ announced address becomes invalid (e.g., if the interface
+ disappears), the affected host SHOULD announce this so that the peer
+ can remove subflows related to this address.
+
+ This is achieved through the Remove Address (REMOVE_ADDR) option
+ (Figure 13), which will remove a previously added address (or list of
+ addresses) from a connection and terminate any subflows currently
+ using that address.
+
+ For security purposes, if a host receives a REMOVE_ADDR option, it
+ must ensure the affected path(s) are no longer in use before it
+ instigates closure. The receipt of REMOVE_ADDR SHOULD first trigger
+ the sending of a TCP keepalive [19] on the path, and if a response is
+ received the path SHOULD NOT be removed. Typical TCP validity tests
+ on the subflow (e.g., ensuring sequence and ACK numbers are correct)
+ MUST also be undertaken. An implementation can use indications of
+ these test failures as part of intrusion detection or error logging.
+
+ The sending and receipt (if no keepalive response was received) of
+ this message SHOULD trigger the sending of RSTs by both hosts on the
+ affected subflow(s) (if possible), as a courtesy to allow middleboxes
+ to clean up state, before cleaning up any local state.
+
+ Address removal is undertaken by ID, so as to permit the use of NATs
+ and other middleboxes that rewrite source addresses. If there is no
+ address at the requested ID, the receiver will silently ignore the
+ request.
+
+ A subflow that is still functioning MUST be closed with a FIN
+ exchange as in regular TCP, rather than using this option. For more
+ information, see Section 3.3.3.
+
+                      1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------+-------+---------------+
+ |     Kind      |  Length = 3+n |Subtype|(resvd)|   Address ID  | ...
+ +---------------+---------------+-------+-------+---------------+
+                          (followed by n-1 Address IDs, if required)
+
+ Figure 13: Remove Address (REMOVE_ADDR) Option
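+
+ A sketch of extracting the list of Address IDs from the REMOVE_ADDR
+ layout in Figure 13, where Length = 3+n. The function name is
+ hypothetical; Kind 30 and subtype 0x4 are the values assigned
+ elsewhere in this document.
+

```python
MPTCP_KIND = 30
REMOVE_ADDR_SUBTYPE = 0x4


def parse_remove_addr(option: bytes):
    """Return the list of n Address IDs carried by the option."""
    if len(option) < 4 or option[0] != MPTCP_KIND or option[1] != len(option):
        raise ValueError("not a well-formed MPTCP option")
    if option[2] >> 4 != REMOVE_ADDR_SUBTYPE:
        raise ValueError("not a REMOVE_ADDR option")
    # Octets after the 3-octet header are Address IDs, one octet each.
    return list(option[3:])
```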
+
+3.5. Fast Close
+
+ Regular TCP has the means of sending a reset (RST) signal to abruptly
+ close a connection. With MPTCP, the RST only has the scope of the
+ subflow and will only close the concerned subflow but not affect the
+ remaining subflows. MPTCP's connection will stay alive at the data
+ level, in order to permit break-before-make handover between
+ subflows. It is therefore necessary to provide an MPTCP-level
+ "reset" to allow the abrupt closure of the whole MPTCP connection,
+ and this is the MP_FASTCLOSE option.
+
+ MP_FASTCLOSE is used to indicate to the peer that the connection will
+ be abruptly closed and no further data will be accepted. The reasons
+ for triggering an MP_FASTCLOSE are implementation specific. Regular
+ TCP does not allow sending a RST while the connection is in a
+ synchronized state [1]. Nevertheless, implementations allow the
+ sending of a RST in this state, if, for example, the operating system
+ is running out of resources. In these cases, MPTCP should send the
+ MP_FASTCLOSE. This option is illustrated in Figure 14.
+
+                           1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +---------------+---------------+-------+-----------------------+
+      |     Kind      |    Length     |Subtype|      (reserved)       |
+      +---------------+---------------+-------+-----------------------+
+      |                     Option Receiver's Key                     |
+      |                           (64 bits)                           |
+      |                                                               |
+      +---------------------------------------------------------------+
+
+ Figure 14: Fast Close (MP_FASTCLOSE) Option
+
+ If Host A wants to force the closure of an MPTCP connection, the
+ MPTCP Fast Close procedure is as follows:
+
+ o Host A sends an ACK containing the MP_FASTCLOSE option on one
+ subflow, containing the key of Host B as declared in the initial
+ connection handshake. On all the other subflows, Host A sends a
+ regular TCP RST to close these subflows, and tears them down.
+ Host A now enters FASTCLOSE_WAIT state.
+
+ o Upon receipt of an MP_FASTCLOSE, containing the valid key, Host B
+ answers on the same subflow with a TCP RST and tears down all
+ subflows. Host B can now close the whole MPTCP connection (it
+ transitions directly to CLOSED state).
+
+ o As soon as Host A has received the TCP RST on the remaining
+ subflow, it can close this subflow and tear down the whole
+ connection (transition from FASTCLOSE_WAIT to CLOSED states). If
+ Host A receives an MP_FASTCLOSE instead of a TCP RST, both hosts
+ attempted fast closure simultaneously. Host A should reply with a
+ TCP RST and tear down the connection.
+
+ o If Host A does not receive a TCP RST in reply to its MP_FASTCLOSE
+ after one retransmission timeout (RTO) (the RTO of the subflow
+ where the MP_FASTCLOSE has been sent), it SHOULD retransmit the
+ MP_FASTCLOSE. The number of retransmissions SHOULD be limited to
+ prevent this connection from being retained for a long time, but
+ this limit is implementation specific. A RECOMMENDED number is 3.
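+
+ The procedure above, seen from Host A, can be sketched as a small
+ state machine. This is a non-normative illustration: the Kind (30)
+ and subtype (0x7) come from Section 8 and the option length of 12
+ follows from Figure 14, while the class, the event names, the action
+ strings, and the choice to move to CLOSED once the retransmission
+ limit is exhausted are assumptions of this sketch. Validation of the
+ key in a received MP_FASTCLOSE is left out for brevity.
+
```python
import struct

MPTCP_KIND = 30
MP_FASTCLOSE_SUBTYPE = 0x7
MAX_RETRANSMITS = 3          # the RECOMMENDED limit


def build_mp_fastclose(peer_key):
    """Encode MP_FASTCLOSE: subtype, 12 reserved bits, receiver's key."""
    return struct.pack("!BBHQ", MPTCP_KIND, 12,
                       MP_FASTCLOSE_SUBTYPE << 12, peer_key)


class FastCloseSender:
    """Host A's side of the Fast Close procedure (illustrative only)."""

    def __init__(self, subflows):
        self.subflows = list(subflows)
        self.state = "ESTABLISHED"
        self.retransmits = 0

    def start(self):
        """Return (subflow, action) pairs to emit, then wait for a RST."""
        actions = [(self.subflows[0], "ACK+MP_FASTCLOSE")]
        actions += [(s, "RST") for s in self.subflows[1:]]
        self.state = "FASTCLOSE_WAIT"
        return actions

    def on_event(self, event):
        """Handle 'RST', 'MP_FASTCLOSE', or 'RTO' on the kept subflow."""
        if self.state != "FASTCLOSE_WAIT":
            return None
        if event == "RST":
            self.state = "CLOSED"
            return None
        if event == "MP_FASTCLOSE":      # simultaneous fast close
            self.state = "CLOSED"
            return "RST"
        if event == "RTO":
            if self.retransmits < MAX_RETRANSMITS:
                self.retransmits += 1
                return "ACK+MP_FASTCLOSE"
            self.state = "CLOSED"        # assumption: give up after limit
            return None
        raise ValueError("unknown event: %r" % (event,))
```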
+
+3.6. Fallback
+
+ Sometimes, middleboxes will exist on a path that could prevent the
+ operation of MPTCP. MPTCP has been designed in order to cope with
+ many middlebox modifications (see Section 6), but there are still
+ some cases where a subflow could fail to operate within the MPTCP
+ requirements. These cases are notably the following: the loss of TCP
+ options on a path and the modification of payload data. If such an
+ event occurs, it is necessary to "fall back" to the previous, safe
+ operation. This may be either falling back to regular TCP or
+ removing a problematic subflow.
+
+ At the start of an MPTCP connection (i.e., the first subflow), it is
+ important to ensure that the path is fully MPTCP capable and the
+ necessary TCP options can reach each host. The handshake as
+ described in Section 3.1 SHOULD fall back to regular TCP if either of
+ the SYN messages does not have the MPTCP options: this is the same, and
+ desired, behavior in the case where a host is not MPTCP capable, or
+ the path does not support the MPTCP options. When attempting to join
+ an existing MPTCP connection (Section 3.2), if a path is not MPTCP
+ capable and the TCP options do not get through on the SYNs, the
+ subflow will be closed according to the MP_JOIN logic.
+
+ There is, however, another corner case that should be addressed.
+ This is the case of MPTCP options getting through on the SYN, but not on
+ regular packets. This can be resolved if the subflow is the first
+ subflow, and thus all data in flight is contiguous, using the
+ following rules.
+
+ A sender MUST include a DSS option with data sequence mapping in
+ every segment until one of the sent segments has been acknowledged
+ with a DSS option containing a Data ACK. Upon reception of the
+ acknowledgment, the sender has the confirmation that the DSS option
+ passes in both directions and may choose to send fewer DSS options
+ than once per segment.
+
+ If, however, an ACK is received for data (not just for the SYN)
+ without a DSS option containing a Data ACK, the sender determines the
+ path is not MPTCP capable. In the case of this occurring on an
+ additional subflow (i.e., one started with MP_JOIN), the host MUST
+ close the subflow with a RST. In the case of the first subflow
+ (i.e., that started with MP_CAPABLE), it MUST drop out of an MPTCP
+ mode back to regular TCP. The sender will send one final data
+ sequence mapping, with the Data-Level Length value of 0 indicating an
+ infinite mapping (in case the path drops options in one direction
+ only), and then revert to sending data on the single subflow without
+ any MPTCP options.
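+
+ The sender-side rules in the two preceding paragraphs can be
+ summarized as a single decision, sketched below. The function name,
+ its boolean parameters, and the returned labels are hypothetical;
+ the treatment of an ACK without a Data ACK after the path has
+ already been confirmed is an assumption of this sketch rather than
+ something stated here.
+
```python
def classify_data_ack(carries_data_ack, dss_confirmed, initial_subflow):
    """Decide what a sender does on an ACK that covers data (not the SYN).

    carries_data_ack: the ACK has a DSS option containing a Data ACK
    dss_confirmed:    an earlier ACK already confirmed DSS on this path
    initial_subflow:  the subflow was started with MP_CAPABLE
    """
    if carries_data_ack:
        # DSS passes in both directions; DSS may now be sent less often.
        return "confirmed"
    if dss_confirmed:
        # Assumption: once confirmed, a bare ACK alone is not a failure.
        return "continue"
    if initial_subflow:
        # Send one infinite mapping (Data-Level Length 0), then carry on
        # as regular TCP without any MPTCP options.
        return "fallback_to_tcp"
    # Subflow started with MP_JOIN: close it with a RST.
    return "reset_subflow"
```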
+
+ Note that this rule essentially prohibits the sending of data on the
+ third packet of an MP_CAPABLE or MP_JOIN handshake, since both that
+ option and a DSS cannot fit in TCP option space. If the initiator is
+ to send first, another segment must be sent that contains the data
+ and DSS. Note also that an additional subflow cannot be used until
+ the initial path has been verified as MPTCP capable.
+
+ These rules should cover all cases where such a failure could happen:
+ whether it's on the forward or reverse path and whether the server or
+ the client first sends data. If lost options on data packets occur
+ on any other subflow apart from the initial subflow, it should be
+ treated as a standard path failure. The data would not be DATA_ACKed
+ (since there is no mapping for the data), and the subflow can be
+ closed with a RST.
+
+ The case described above is a specialized case of fallback, for when
+ the lack of MPTCP support is detected before any data is acknowledged
+ at the connection level on a subflow. More generally, fallback
+ (either closing a subflow, or to regular TCP) can become necessary at
+ any point during a connection if a non-MPTCP-aware middlebox changes
+ the data stream.
+
+ As described in Section 3.3, each portion of data for which there is
+ a mapping is protected by a checksum. This mechanism is used to
+ detect if middleboxes have made any adjustments to the payload
+ (added, removed, or changed data). A checksum will fail if the data
+ has been changed in any way. This will also detect if the length of
+ data on the subflow is increased or decreased, and this means the
+ data sequence mapping is no longer valid. The sender no longer knows
+ what subflow-level sequence number the receiver is genuinely
+ operating at (the middlebox will be faking ACKs in return), and it
+ cannot signal any further mappings. Furthermore, in addition to the
+ possibility of payload modifications that are valid at the
+ application layer, there is the possibility that false positives
+ could be hit across MPTCP segment boundaries, corrupting the data.
+ Therefore, all data from the start of the segment that failed the
+ checksum onwards is not trustworthy.
+
+ When multiple subflows are in use, the data in flight on a subflow
+ will likely involve data that is not contiguously part of the
+ connection-level stream, since segments will be spread across the
+ multiple subflows. Due to the problems identified above, it is not
+ possible to determine what the adjustment has done to the data
+ (notably, any changes to the subflow sequence numbering). Therefore,
+ it is not possible to recover the subflow, and the affected subflow
+ must be immediately closed with a RST, featuring an MP_FAIL option
+ (Figure 15), which defines the data sequence number at the start of
+ the segment (defined by the data sequence mapping) that had the
+ checksum failure. Note that the MP_FAIL option requires the use of
+ the full 64-bit sequence number, even if 32-bit sequence numbers are
+ normally in use in the DSS signals on the path.
+
+                           1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +---------------+---------------+-------+-----------------------+
+      |     Kind      |   Length=12   |Subtype|      (reserved)       |
+      +---------------+---------------+-------+-----------------------+
+      |                                                               |
+      |                Data Sequence Number (8 octets)                |
+      |                                                               |
+      +---------------------------------------------------------------+
+
+ Figure 15: Fallback (MP_FAIL) Option
+
+ The receiver MUST discard all data following the data sequence number
+ specified. Failed data MUST NOT be DATA_ACKed and so will be
+ retransmitted on other subflows (Section 3.3.6).
+
+ A special case is when there is a single subflow and it fails with a
+ checksum error. If it is known that all unacknowledged data in
+ flight is contiguous (which will usually be the case with a single
+ subflow), an infinite mapping can be applied to the subflow without
+ the need to close it first, and essentially turn off all further
+ MPTCP signaling. In this case, if a receiver identifies a checksum
+ failure when there is only one path, it will send back an MP_FAIL
+ option on the subflow-level ACK, referring to the data-level sequence
+ number of the start of the segment on which the checksum error was
+ detected. The sender will receive this, and if all unacknowledged
+ data in flight is contiguous, will signal an infinite mapping. This
+ infinite mapping will be a DSS option (Section 3.3) on the first new
+ packet, containing a data sequence mapping that acts retroactively,
+ referring to the start of the subflow sequence number of the last
+ segment that was known to be delivered intact. From that point
+ onwards, data can be altered by a middlebox without affecting MPTCP,
+ as the data stream is equivalent to a regular, legacy TCP session.
+
+ In the rare case that the data is not contiguous (which could happen
+ when there is only one subflow but it is retransmitting data from a
+ subflow that has recently been uncleanly closed), the receiver MUST
+ close the subflow with a RST with MP_FAIL. The receiver MUST discard
+ all data that follows the data sequence number specified. The sender
+ MAY attempt to create a new subflow belonging to the same connection,
+ and, if it chooses to do so, SHOULD place the single subflow
+ immediately in single-path mode by setting an infinite data sequence
+ mapping. This mapping will begin from the data-level sequence number
+ that was declared in the MP_FAIL.
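+
+ The receiver's reaction to a checksum failure, as described in the
+ paragraphs above, might be sketched as follows. This is illustrative
+ only: the Kind (30) and subtype (0x6) are taken from Section 8 and
+ the layout from Figure 15, but the function names and the way the
+ host learns whether in-flight data is contiguous (passed in here as
+ a flag) are assumptions.
+
```python
import struct

MPTCP_KIND = 30
MP_FAIL_SUBTYPE = 0x6


def build_mp_fail(dsn):
    """Encode MP_FAIL with the full 64-bit data sequence number."""
    return struct.pack("!BBHQ", MPTCP_KIND, 12,
                       MP_FAIL_SUBTYPE << 12, dsn % 2 ** 64)


def on_checksum_failure(failed_dsn, subflow_count, in_flight_contiguous):
    """Return (segment_type, option) to send after a checksum failure.

    failed_dsn is the data sequence number at the start of the segment
    (as given by its data sequence mapping) that failed the checksum.
    """
    option = build_mp_fail(failed_dsn)
    if subflow_count == 1 and in_flight_contiguous:
        # Single subflow: signal MP_FAIL on the subflow-level ACK and
        # expect the sender to answer with an infinite mapping.
        return ("ACK", option)
    # Otherwise the subflow cannot be recovered: RST with MP_FAIL.
    return ("RST", option)
```
+
+ In both cases the receiver discards all data from failed_dsn onwards
+ and does not DATA_ACK it.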
+
+ After a sender signals an infinite mapping, it MUST only use subflow
+ ACKs to clear its send buffer. This is because Data ACKs may become
+ misaligned with the subflow ACKs when middleboxes insert or delete
+ data. The receiver SHOULD stop generating Data ACKs after it receives
+ an infinite mapping.
+
+ When a connection has fallen back, only one subflow can send data;
+ otherwise, the receiver would not know how to reorder the data. In
+ practice, this means that all MPTCP subflows will have to be
+ terminated except one. Once MPTCP falls back to regular TCP, it MUST
+ NOT revert to MPTCP later in the connection.
+
+ It should be emphasized that we are not attempting to prevent the use
+ of middleboxes that want to adjust the payload. An MPTCP-aware
+ middlebox could provide such functionality by also rewriting
+ checksums.
+
+3.7. Error Handling
+
+ In addition to the fallback mechanism as described above, the
+ standard classes of TCP errors may need to be handled in an MPTCP-
+ specific way. Note that changing semantics -- such as the relevance
+ of a RST -- are covered in Section 4. Where possible, we do not want
+ to deviate from regular TCP behavior.
+
+ The following list covers possible errors and the appropriate MPTCP
+ behavior:
+
+ o Unknown token in MP_JOIN (or HMAC failure in MP_JOIN ACK, or
+ missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's
+ behavior on an unknown port)
+
+ o DSN out of window (during normal operation): drop the data, do not
+ send Data ACKs
+
+ o Remove request for unknown address ID: silently ignore
+
+3.8. Heuristics
+
+ There are a number of heuristics that are needed for performance or
+ deployment but that are not required for protocol correctness. In
+ this section, we detail such heuristics. Note that buffering and
+ certain sender and receiver window behaviors are discussed in
+ Sections 3.3.4 and 3.3.5, and retransmission in Section 3.3.6.
+
+3.8.1. Port Usage
+
+ Under typical operation, an MPTCP implementation SHOULD use the same
+ ports as already in use. In other words, the destination port of a
+ SYN containing an MP_JOIN option SHOULD be the same as the remote
+ port of the first subflow in the connection. The local port for such
+ SYNs SHOULD also be the same as for the first subflow (and as such,
+ an implementation SHOULD reserve ephemeral ports across all local IP
+ addresses), although there may be cases where this is infeasible.
+ This strategy is intended to maximize the probability of the SYN
+ being permitted by a firewall or NAT at the recipient and to avoid
+ confusing any network monitoring software.
+
+ There may also be cases, however, where the passive opener wishes to
+ signal to the other host that a specific port should be used, and
+ this facility is provided in the Add Address option as documented in
+ Section 3.4.1. It is therefore feasible to allow multiple subflows
+ between the same two addresses but using different port pairs, and
+ such a facility could be used to allow load balancing within the
+ network based on 5-tuples (e.g., some ECMP implementations [7]).
+
+3.8.2. Delayed Subflow Start
+
+ Many TCP connections are short-lived and consist only of a few
+ segments, and so the overheads of using MPTCP outweigh any benefits.
+ A heuristic is required, therefore, to decide when to start using
+ additional subflows in an MPTCP connection. We expect that
+ experience gathered from deployments will provide further guidance on
+ this, and that the choice will depend on application characteristics
+ (which are likely to change over time). However, a suggested
+ general-purpose heuristic that an implementation MAY choose to employ
+ is as follows. Results from experimental deployments are needed in
+ order to verify the correctness of this proposal.
+
+ If a host has data buffered for its peer (which implies that the
+ application has received a request for data), the host opens one
+ subflow for each initial window's worth of data that is buffered.
+
+ Consideration should also be given to limiting the rate of adding new
+ subflows, as well as limiting the total number of subflows open for a
+ particular connection. A host may choose to vary these values based
+ on its load or knowledge of traffic and path characteristics.
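+
+ A minimal sketch of the suggested heuristic, combined with a cap on
+ the total number of subflows, is given below. The function and
+ parameter names are hypothetical, the initial window is supplied by
+ the caller in bytes, and the floor of one subflow (the existing
+ first subflow) is an assumption of this sketch.
+
```python
def subflows_wanted(buffered_bytes, initial_window_bytes, max_subflows):
    """Total subflows the heuristic would like open for a connection.

    One subflow per initial window's worth of buffered data, never
    fewer than the one that already exists and never more than the
    configured cap.
    """
    if initial_window_bytes <= 0 or max_subflows < 1:
        raise ValueError("window and cap must be positive")
    wanted = buffered_bytes // initial_window_bytes
    return max(1, min(wanted, max_subflows))
```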
+
+ Note that this heuristic alone is probably insufficient. Traffic for
+ many common applications, such as downloads, is highly asymmetric and
+ the host that is multihomed may well be the client that will never
+ fill its buffers, and thus never use MPTCP. Advanced APIs that allow
+ an application to signal its traffic requirements would aid in these
+ decisions.
+
+ An additional time-based heuristic could be applied, opening
+ additional subflows after a given period of time has passed. This
+ would alleviate the above issue, and also provide resilience for low-
+ bandwidth but long-lived applications.
+
+ This section has shown some of the considerations that an implementer
+ should take into account when developing MPTCP heuristics, but is not
+ intended to be prescriptive.
+
+3.8.3. Failure Handling
+
+ Requirements for MPTCP's handling of unexpected signals have been
+ given in Section 3.7. There are other failure cases, however, where
+ a host can choose appropriate behavior.
+
+ For example, Section 3.1 suggests that a host SHOULD fall back to
+ trying regular TCP SYNs after one or more failures of MPTCP SYNs for
+ a connection. A host may keep a system-wide cache of such
+ information, so that it can back off from using MPTCP, firstly for
+ that particular destination host, and eventually on a whole
+ interface, if MPTCP connections continue failing.
+
+ Another failure could occur when the MP_JOIN handshake fails.
+ Section 3.7 specifies that an incorrect handshake MUST lead to the
+ subflow being closed with a RST. A host operating an active
+ intrusion detection system may choose to start blocking MP_JOIN
+ packets from the source host if multiple failed MP_JOIN attempts are
+ seen. From the connection initiator's point of view, if an MP_JOIN
+ fails, it SHOULD NOT attempt to connect to the same IP address and
+ port during the lifetime of the connection, unless the other host
+ refreshes the information with another ADD_ADDR option. Note that
+ the ADD_ADDR option is informational only, and does not guarantee the
+ other host will attempt a connection.
+
+ In addition, an implementation may learn, over a number of
+ connections, that certain interfaces or destination addresses
+ consistently fail and may default to not trying to use MPTCP for
+ these. Behavior could also be learned for particularly badly
+ performing subflows or subflows that regularly fail during use, in
+ order to temporarily choose not to use these paths.
+
+4. Semantic Issues
+
+ In order to support multipath operation, the semantics of some TCP
+ components have changed. To aid clarity, this section collects these
+ semantic changes as a reference.
+
+ Sequence number: The (in-header) TCP sequence number is specific to
+ the subflow. To allow the receiver to reorder application data,
+ an additional data-level sequence space is used. In this data-
+ level sequence space, the initial SYN and the final DATA_FIN
+ occupy 1 octet of sequence space. There is an explicit mapping of
+ data sequence space to subflow sequence space, which is signaled
+ through TCP options in data packets.
+
+ ACK: The ACK field in the TCP header acknowledges only the subflow
+ sequence number, not the data-level sequence space.
+ Implementations SHOULD NOT attempt to infer a data-level
+ acknowledgment from the subflow ACKs. This separates subflow- and
+ connection-level processing at an end host.
+
+ Duplicate ACK: A duplicate ACK that includes any MPTCP signaling
+ (with the exception of the DSS option) MUST NOT be treated as a
+ signal of congestion. To limit the chances of non-MPTCP-aware
+ entities mistakenly interpreting duplicate ACKs as a signal of
+ congestion, MPTCP SHOULD NOT send more than two duplicate ACKs
+ containing (non-DSS) MPTCP signals in a row.
+
+ Receive Window: The receive window in the TCP header indicates the
+ amount of free buffer space for the whole data-level connection
+ (as opposed to for this subflow) that is available at the
+ receiver. This is the same semantics as regular TCP, but to
+ maintain these semantics the receive window must be interpreted at
+ the sender as relative to the sequence number given in the
+ DATA_ACK rather than the subflow ACK in the TCP header. In this
+ way, the original flow control role is preserved. Note that some
+ middleboxes may change the receive window, and so a host SHOULD
+ use the maximum value of those recently seen on the constituent
+ subflows for the connection-level receive window, and also needs
+ to maintain a subflow-level window for subflow-level processing.
+
+ FIN: The FIN flag in the TCP header applies only to the subflow it
+ is sent on, not to the whole connection. For connection-level FIN
+ semantics, the DATA_FIN option is used.
+
+ RST: The RST flag in the TCP header applies only to the subflow it
+ is sent on, not to the whole connection. The MP_FASTCLOSE option
+ provides the fast close functionality of a RST at the MPTCP
+ connection level.
+
+ Address List: Address list management (i.e., knowledge of the local
+ and remote hosts' lists of available IP addresses) is handled on a
+ per-connection basis (as opposed to per subflow, per host, or per
+ pair of communicating hosts). This permits the application of
+ per-connection local policy. Adding an address to one connection
+ (either explicitly through an Add Address message, or implicitly
+ through a Join) has no implication for other connections between
+ the same pair of hosts.
+
+ 5-tuple: The 5-tuple (protocol, local address, local port, remote
+ address, remote port) presented by kernel APIs to the application
+ layer in a non-multipath-aware application is that of the first
+ subflow, even if the subflow has since been closed and removed
+ from the connection. This decision, and other related API issues,
+ are discussed in more detail in [6].
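+
+ To make the Receive Window entry above concrete, the following
+ sketch computes how far a sender may transmit at the connection
+ level: the window is taken as the maximum of the values recently
+ seen on the subflows and is applied relative to the DATA_ACK. The
+ function names and the list of recent window values are assumptions
+ of this sketch; arithmetic is modulo 2^64 because data sequence
+ numbers are 64 bits wide.
+
```python
DSN_MODULUS = 2 ** 64


def connection_window(recent_subflow_windows):
    """Connection-level window: the largest recently advertised value."""
    return max(recent_subflow_windows)


def send_limit(data_ack, recent_subflow_windows):
    """First data sequence number that lies beyond the receive window."""
    window = connection_window(recent_subflow_windows)
    return (data_ack + window) % DSN_MODULUS


def may_send(dsn, length, data_ack, recent_subflow_windows):
    """True if octets [dsn, dsn + length) fit inside the window."""
    end_offset = (dsn + length - data_ack) % DSN_MODULUS
    return end_offset <= connection_window(recent_subflow_windows)
```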
+
+5. Security Considerations
+
+ As identified in [9], the addition of multipath capability to TCP
+ will bring with it a number of new classes of threat. In order to
+ prevent these, [2] presents a set of requirements for a security
+ solution for MPTCP. The fundamental goal is for the security of
+ MPTCP to be "no worse" than regular TCP today, and the key security
+ requirements are:
+
+ o Provide a mechanism to confirm that the parties in a subflow
+ handshake are the same as in the original connection setup.
+
+ o Provide verification that the peer can receive traffic at a new
+ address before using it as part of a connection.
+
+ o Provide replay protection, i.e., ensure that a request to add/
+ remove a subflow is 'fresh'.
+
+ In order to achieve these goals, MPTCP includes a hash-based
+ handshake algorithm documented in Sections 3.1 and 3.2.
+
+ The security of the MPTCP connection hangs on the use of keys that
+ are shared once at the start of the first subflow, and are never sent
+ again over the network (unless used in the fast close mechanism,
+ Section 3.5). To ease demultiplexing while not giving away any
+ cryptographic material, future subflows use a truncated cryptographic
+ hash of this key as the connection identification "token". The keys
+ are concatenated and used as keys for creating Hash-based Message
+ Authentication Codes (HMACs) used on subflow setup, in order to
+ verify that the parties in the handshake are the same as in the
+ original connection setup. It also provides verification that the
+ peer can receive traffic at this new address. Replay attacks would
+ still be possible when only keys are used; therefore, the handshakes
+ use single-use random numbers (nonces) at both ends -- this ensures
+ the HMAC will never be the same on two handshakes. Guidance on
+ generating random numbers suitable for use as keys is given in [14]
+ and discussed in Section 3.1.
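+
+ The token and HMAC construction summarized above can be illustrated
+ with Python's standard hashlib and hmac modules. This is a sketch
+ under our reading of Sections 3.1 and 3.2 (token: most significant
+ 32 bits of the SHA-1 hash of a key; HMAC-SHA1 keyed with the
+ sender's key followed by the peer's key, over the sender's nonce
+ followed by the peer's nonce). The function names are hypothetical,
+ and keys and nonces are passed as bytes in network byte order.
+
```python
import hashlib
import hmac


def connection_token(key):
    """Most significant 32 bits of SHA-1(key), as an integer."""
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big")


def join_hmac(own_key, peer_key, own_nonce, peer_nonce):
    """HMAC-SHA1 keyed with both keys over both nonces (sender first)."""
    return hmac.new(own_key + peer_key, own_nonce + peer_nonce,
                    hashlib.sha1).digest()


def verify_join_hmac(received, own_key, peer_key, own_nonce, peer_nonce):
    """Check a peer's HMAC; the peer puts its own key and nonce first."""
    expected = join_hmac(peer_key, own_key, peer_nonce, own_nonce)
    # compare_digest avoids leaking the mismatch position via timing;
    # the prefix comparison allows for a truncated HMAC in the option.
    return hmac.compare_digest(expected[:len(received)], received)
```
+
+ Because each handshake contributes fresh nonces, the HMAC input
+ differs between handshakes even though the keys stay the same.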
+
+ The use of crypto capability bits in the initial connection handshake
+ to negotiate use of a particular algorithm allows the deployment of
+ additional crypto mechanisms in the future. Note that this would be
+ susceptible to bid-down attacks only if the attacker was on-path (and
+ thus would be able to modify the data anyway). The security
+ mechanism presented in this document should therefore protect against
+ all forms of flooding and hijacking attacks discussed in [9].
+
+ During normal operation, regular TCP protection mechanisms (such as
+ ensuring sequence numbers are in-window) will provide the same level
+ of protection against attacks on individual TCP subflows as exists
+ for regular TCP today. Implementations will introduce additional
+ buffers compared to regular TCP, to reassemble data at the connection
+ level. The application of window sizing will minimize the risk of
+ denial-of-service attacks consuming resources.
+
+ As discussed in Section 3.4.1, a host may advertise its private
+ addresses, but these might point to different hosts in the receiver's
+ network. The MP_JOIN handshake (Section 3.2) will ensure that this
+ does not succeed in setting up a subflow to the incorrect host.
+ However, it could still create unwanted TCP handshake traffic. This
+ feature of MPTCP could be a target for denial-of-service exploits,
+ with malicious participants in MPTCP connections encouraging the
+ recipient to target other hosts in the network. Therefore,
+ implementations should consider heuristics (Section 3.8) at both the
+ sender and receiver to reduce the impact of this.
+
+ A small security risk could theoretically exist with key reuse, but
+ in order to accomplish a replay attack, both the sender and receiver
+ keys, and the sender and receiver random numbers, in the MP_JOIN
+ handshake (Section 3.2) would have to match.
+
+ Whilst this specification defines a "medium" security solution,
+ meeting the criteria specified at the start of this section and the
+ threat analysis ([9]), since attacks only ever get worse, it is
+ likely that a future Standards Track version of MPTCP would need to
+ be able to support stronger security. There are several ways the
+ security of MPTCP could potentially be improved; some of these would
+ be compatible with MPTCP as defined in this document, whilst others
+ may not be. For now, the best approach is to get experience with the
+ current approach, establish what might work, and check that the
+ threat analysis is still accurate.
+
+ Possible ways of improving MPTCP security could include:
+
+ o defining a new MPTCP cryptographic algorithm, as negotiated in
+ MP_CAPABLE. A sub-case could be to include an additional
+ deployment assumption, such as stateful servers, in order to allow
+ a more powerful algorithm to be used.
+
+ o defining how to secure data transfer with MPTCP, whilst not
+ changing the signaling part of the protocol.
+
+ o defining security that requires more option space, perhaps in
+ conjunction with a "long options" proposal for extending the TCP
+ options space (such as those surveyed in [20]), or perhaps
+ building on the current approach with a second stage of MPTCP-
+ option-based security.
+
+ o revisiting the working group's decision to exclusively use TCP
+ options for MPTCP signaling, and instead look at also making use
+ of the TCP payloads.
+
+ MPTCP has been designed with several methods available to indicate a
+ new security mechanism, including:
+
+ o available flags in MP_CAPABLE (Figure 4);
+
+ o available subtypes in the MPTCP option (Figure 3);
+
+ o the version field in MP_CAPABLE (Figure 4).
+
+6. Interactions with Middleboxes
+
+ Multipath TCP was designed to be deployable in the present world.
+ Its design takes into account "reasonable" existing middlebox
+ behavior. In this section, we outline a few representative
+ middlebox-related failure scenarios and show how Multipath TCP
+ handles them. Next, we list the design decisions MPTCP has made
+ to accommodate the different middleboxes.
+
+ A primary concern is our use of a new TCP option. Middleboxes should
+ forward packets with unknown options unchanged, yet there are some
+ that don't. These we expect will either strip options and pass the
+ data, drop packets with new options, copy the same option into
+ multiple segments (e.g., when doing segmentation), or drop options
+ during segment coalescing.
+
+ MPTCP uses a single new TCP option "Kind", and all message types are
+ defined by "subtype" values (see Section 8). This should reduce the
+ chances of only some types of MPTCP options being passed; instead,
+ the key differing characteristics are the different paths and the
+ presence of the SYN flag.
+
+ MPTCP SYN packets on the first subflow of a connection contain the
+ MP_CAPABLE option (Section 3.1). If this is dropped, MPTCP SHOULD
+ fall back to regular TCP. If packets with the MP_JOIN option
+ (Section 3.2) are dropped, the paths will simply not be used.
+
+ If a middlebox strips options but otherwise passes the packets
+ unchanged, MPTCP will behave safely. If an MP_CAPABLE option is
+ dropped on either the outgoing or the return path, the initiating
+ host can fall back to regular TCP, as illustrated in Figure 16 and
+ discussed in Section 3.1.
+
+ Subflow SYNs contain the MP_JOIN option. If this option is stripped
+ on the outgoing path, the SYN will appear to be a regular SYN to Host
+ B. Depending on whether there is a listening socket on the target
+ port, Host B will reply either with SYN/ACK or RST (subflow
+ connection fails). When Host A receives the SYN/ACK it sends a RST
+ because the SYN/ACK does not contain the MP_JOIN option and its
+ token. Either way, the subflow setup fails, but otherwise does not
+ affect the MPTCP connection as a whole.
+
+     Host A                                Host B
+        |              Middlebox M            |
+        |                   |                 |
+        |  SYN(MP_CAPABLE)  |       SYN       |
+        |-------------------|---------------->|
+        |               SYN/ACK               |
+        |<------------------------------------|
+     a) MP_CAPABLE option stripped on outgoing path
+
+     Host A                                Host B
+        |           SYN(MP_CAPABLE)           |
+        |------------------------------------>|
+        |            Middlebox M              |
+        |                 |                   |
+        |     SYN/ACK     |SYN/ACK(MP_CAPABLE)|
+        |<----------------|-------------------|
+     b) MP_CAPABLE option stripped on return path
+
+ Figure 16: Connection Setup with Middleboxes that
+ Strip Options from Packets
+
+ We now examine data flow with MPTCP, assuming the flow is correctly
+ set up, which implies the options in the SYN packets were allowed
+ through by the relevant middleboxes. If options are allowed through
+ and there is no resegmentation or coalescing of TCP segments,
+ Multipath TCP flows can proceed without problems.
+
+ The case when options get stripped on data packets has been discussed
+ in Section 3.6. If only a fraction of options are stripped,
+ behavior is not deterministic. If some data sequence mappings are
+ lost, the connection can continue so long as mappings exist for the
+ subflow-level data (e.g., if multiple maps have been sent that
+ reinforce each other). If some subflow-level space is left unmapped,
+ however, the subflow is treated as broken and is closed, through the
+ process described in Section 3.6. MPTCP should survive with a loss
+ of some Data ACKs, but performance will degrade as the fraction of
+ stripped options increases. We do not expect such cases to appear in
+ practice, though: most middleboxes will either strip all options or
+ let them all through.
+
+ We end this section with a list of middlebox classes, their behavior,
+ and the elements in the MPTCP design that allow operation through
+ such middleboxes. Issues surrounding dropping packets with options
+ or stripping options were discussed above, and are not included here:
+
+ o NATs [21] (Network Address (and Port) Translators) change the
+ source address (and often source port) of packets. This means
+ that a host will not know its public-facing address for signaling
+ in MPTCP. Therefore, MPTCP permits implicit address addition via
+ the MP_JOIN option, and the handshake mechanism ensures that
+ connection attempts to private addresses [18] do not cause
+ problems. Explicit address removal refers to an address by its
+ Address ID, so no knowledge of the source address is needed.
+
+ o Performance Enhancing Proxies (PEPs) [22] might proactively ACK
+ data to increase performance. MPTCP, however, relies on accurate
+ congestion control signals from the end host, and non-MPTCP-aware
+ PEPs will not be able to provide such signals. MPTCP will,
+ therefore, fall back to single-path TCP, or close the problematic
+ subflow (see Section 3.6).
+
+ o Traffic Normalizers [23] may not allow holes in sequence numbers,
+ and may cache packets and retransmit the same data. MPTCP looks
+ like standard TCP on the wire, and will not retransmit different
+ data on the same subflow sequence number. In the event of a
+ retransmission, the same data will be retransmitted on the
+ original TCP subflow even if it is additionally retransmitted at
+ the connection level on a different subflow.
+
+ o Firewalls [24] might perform initial sequence number randomization
+ on TCP connections. MPTCP uses relative sequence numbers in data
+ sequence mapping to cope with this. Like NATs, firewalls will not
+ permit many incoming connections, so MPTCP supports address
+ signaling (ADD_ADDR) so that a multiaddressed host can invite its
+ peer behind the firewall/NAT to connect out to its additional
+ interface.
+
+ o Intrusion Detection Systems look out for traffic patterns and
+ content that could threaten a network. Multipath operation means
+ that such data may be spread across subflows, which makes it more
+ difficult for an IDS to analyze the traffic as a whole and
+ potentially increases the risk of false positives. An MPTCP-aware
+ IDS, however, can read the tokens to correlate multiple subflows
+ and reassemble them for analysis.
+
+ o Application-level middleboxes such as content-aware firewalls may
+ alter the payload within a subflow, such as rewriting URIs in HTTP
+ traffic. MPTCP will detect these using the checksum and close the
+ affected subflow(s), if there are other subflows that can be used.
+ If all subflows are affected, multipath will fall back to TCP,
+ allowing such middleboxes to change the payload. MPTCP-aware
+ middleboxes should be able to adjust the payload and MPTCP
+ metadata in order not to break the connection.
+
+ In addition, all classes of middleboxes may affect TCP traffic in the
+ following ways:
+
+ o TCP options may be removed, or packets with unknown options
+ dropped, by many classes of middleboxes. It is intended that the
+ initial SYN exchange, with a TCP option, will be sufficient to
+ identify the path capabilities. If such a packet does not get
+ through, MPTCP will end up falling back to regular TCP.
+
+ o Segmentation/Coalescing (e.g., TCP segmentation offloading) might
+ copy options between packets and might strip some options.
+ MPTCP's data sequence mapping includes the relative subflow
+ sequence number instead of using the sequence number in the
+ segment. In this way, the mapping is independent of the packets
+ that carry it.
+
+ o The receive window may be shrunk by some middleboxes at the
+ subflow level. MPTCP will use the maximum window at data level,
+ but will also obey subflow-specific windows.
+
+7. Acknowledgments
+
+ The authors were originally supported by Trilogy
+ (http://www.trilogy-project.org), a research project (ICT-216372)
+ partially funded by the European Community under its Seventh
+ Framework Program.
+
+ Alan Ford was originally supported by Roke Manor Research.
+
+ The authors gratefully acknowledge significant input into this
+ document from Sebastien Barre, Christoph Paasch, and Andrew McDonald.
+
+ The authors also wish to acknowledge reviews and contributions from
+ Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock,
+ Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo,
+ Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing,
+ Andrew McGregor, Georg Hampel, Anumita Biswas, Wes Eddy, Alexey
+ Melnikov, Francis Dupont, Adrian Farrel, Barry Leiba, Robert Sparks,
+ Sean Turner, Stephen Farrell, and Martin Stiemerling.
+
+8. IANA Considerations
+
+ This document defines a new TCP option for MPTCP, assigned a value of
+ 30 (decimal) from the TCP option space. This value is the value of
+ "Kind" as seen in all MPTCP options in this document. This value is
+ defined as:
+
+ +------+--------+-----------------------+-----------+
+ | Kind | Length |        Meaning        | Reference |
+ +------+--------+-----------------------+-----------+
+ |  30  |   N    | Multipath TCP (MPTCP) |  RFC 6824 |
+ +------+--------+-----------------------+-----------+
+
+ Table 1: TCP Option Kind Numbers
+
+ This document also defines a 4-bit subtype field, for which IANA has
+ created and will maintain a new sub-registry entitled "MPTCP Option
+ Subtypes" under the "Transmission Control Protocol (TCP) Parameters"
+ registry. Initial values for the MPTCP option subtype registry are
+ given below; future assignments are to be defined by Standards Action
+ as defined by [25]. Assignments consist of the MPTCP subtype's
+ symbolic name and its associated value, as per the following table.
+
+ +-------+--------------+----------------------------+---------------+
+ | Value |    Symbol    |            Name            |   Reference   |
+ +-------+--------------+----------------------------+---------------+
+ |  0x0  | MP_CAPABLE   | Multipath Capable          | Section 3.1   |
+ |  0x1  | MP_JOIN      | Join Connection            | Section 3.2   |
+ |  0x2  | DSS          | Data Sequence Signal (Data | Section 3.3   |
+ |       |              | ACK and data sequence      |               |
+ |       |              | mapping)                   |               |
+ |  0x3  | ADD_ADDR     | Add Address                | Section 3.4.1 |
+ |  0x4  | REMOVE_ADDR  | Remove Address             | Section 3.4.2 |
+ |  0x5  | MP_PRIO      | Change Subflow Priority    | Section 3.3.8 |
+ |  0x6  | MP_FAIL      | Fallback                   | Section 3.6   |
+ |  0x7  | MP_FASTCLOSE | Fast Close                 | Section 3.5   |
+ +-------+--------------+----------------------------+---------------+
+
+ Table 2: MPTCP Option Subtypes
+
+ Values 0x8 through 0xe are currently unassigned. The value 0xf is
+ reserved for Private Use within controlled testbeds.
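+
+ As an illustration of how Tables 1 and 2 are used together, the
+ sketch below classifies a raw TCP option. It assumes, following the
+ option formats in Section 3, that the 4-bit subtype occupies the
+ upper half of the byte after Length; the function and dictionary
+ names are hypothetical.

```python
MPTCP_KIND = 30  # TCP option Kind assigned to MPTCP (Table 1)

# Subtype values and symbols from Table 2.
SUBTYPES = {
    0x0: "MP_CAPABLE",
    0x1: "MP_JOIN",
    0x2: "DSS",
    0x3: "ADD_ADDR",
    0x4: "REMOVE_ADDR",
    0x5: "MP_PRIO",
    0x6: "MP_FAIL",
    0x7: "MP_FASTCLOSE",
}


def mptcp_subtype(option: bytes) -> str:
    """Return the symbolic subtype of one complete TCP option."""
    if len(option) < 3 or option[0] != MPTCP_KIND:
        raise ValueError("not an MPTCP option")
    if option[1] != len(option):
        raise ValueError("Length field does not match option size")
    value = option[2] >> 4          # assumed: subtype is the high nibble
    if value in SUBTYPES:
        return SUBTYPES[value]
    # 0xf is reserved for Private Use; 0x8 through 0xe are unassigned.
    return "PRIVATE_USE" if value == 0xF else "UNASSIGNED"
```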
+
+ IANA has created another sub-registry, "MPTCP Handshake Algorithms"
+ under the "Transmission Control Protocol (TCP) Parameters" registry,
+ based on the flags in MP_CAPABLE (Section 3.1). The flags consist of
+ 8 bits, labeled "A" through "H", and this document assigns the bits
+ as follows:
+
+ +----------+-------------------+-----------------------+
+ | Flag Bit |      Meaning      |       Reference       |
+ +----------+-------------------+-----------------------+
+ |    A     | Checksum required | RFC 6824, Section 3.1 |
+ |    B     | Extensibility     | RFC 6824, Section 3.1 |
+ |   C-G    | Unassigned        |                       |
+ |    H     | HMAC-SHA1         | RFC 6824, Section 3.2 |
+ +----------+-------------------+-----------------------+
+
+ Table 3: MPTCP Handshake Algorithms
+
+ Note that the meanings of bits C through H can be dependent upon bit
+ B, depending on how Extensibility is defined in future
+ specifications; see Section 3.1 for more information.
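+
+ The flag assignments in Table 3 can be decoded as sketched below.
+ The sketch assumes that "A" is the leftmost (most significant) bit
+ of the 8-bit flags field, as laid out in Section 3.1, and that
+ checksums are used when either host sets "A"; the helper names are
+ hypothetical.

```python
FLAG_NAMES = "ABCDEFGH"  # assumed order: "A" is the most significant bit


def decode_flags(flags_byte: int) -> set:
    """Return the set of flag letters set in an MP_CAPABLE flags byte."""
    if not 0 <= flags_byte <= 0xFF:
        raise ValueError("flags field is 8 bits")
    return {
        name
        for position, name in enumerate(FLAG_NAMES)
        if flags_byte & (0x80 >> position)
    }


def checksum_in_use(local_flags: int, remote_flags: int) -> bool:
    # Checksums are computed if at least one host asked for them.
    return "A" in decode_flags(local_flags) or "A" in decode_flags(remote_flags)
```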
+
+ Future assignments in this registry are also to be defined by
+ Standards Action as defined by [25]. Assignments consist of the
+ value of the flags, a symbolic name for the algorithm, and a
+ reference to its specification.
+
+9. References
+
+9.1. Normative References
+
+ [1] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
+ September 1981.
+
+ [2] Ford, A., Raiciu, C., Handley, M., Barre, S., and J. Iyengar,
+ "Architectural Guidelines for Multipath TCP Development",
+ RFC 6182, March 2011.
+
+ [3] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+ [4] National Institute of Standards and Technology, "Secure Hash
+ Standard", Federal Information Processing Standard
+ (FIPS) 180-3, October 2008, <http://csrc.nist.gov/publications/
+ fips/fips180-3/fips180-3_final.pdf>.
+
+9.2. Informative References
+
+ [5] Raiciu, C., Handley, M., and D. Wischik, "Coupled Congestion
+ Control for Multipath Transport Protocols", RFC 6356,
+ October 2011.
+
+ [6] Scharf, M. and A. Ford, "MPTCP Application Interface
+ Considerations", Work in Progress, October 2012.
+
+ [7] Hopps, C., "Analysis of an Equal-Cost Multi-Path Algorithm",
+ RFC 2992, November 2000.
+
+ [8] Raiciu, C., Paasch, C., Barre, S., Ford, A., Honda, M.,
+ Duchene, F., Bonaventure, O., and M. Handley, "How Hard Can It
+ Be? Designing and Implementing a Deployable Multipath TCP",
+ Usenix Symposium on Networked Systems Design and
+ Implementation 2012, 2012, <https://www.usenix.org/conference/
+ nsdi12/how-hard-can-it-be-designing-and-implementing-
+ deployable-multipath-tcp>.
+
+ [9] Bagnulo, M., "Threat Analysis for TCP Extensions for Multipath
+ Operation with Multiple Addresses", RFC 6181, March 2011.
+
+ [10] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-Hashing
+ for Message Authentication", RFC 2104, February 1997.
+
+ [11] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
+ Selective Acknowledgment Options", RFC 2018, October 1996.
+
+ [12] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
+ Control", RFC 5681, September 2009.
+
+ [13] Gont, F., "Survey of Security Hardening Methods for
+ Transmission Control Protocol (TCP) Implementations", Work
+ in Progress, March 2012.
+
+ [14] Eastlake, D., Schiller, J., and S. Crocker, "Randomness
+ Requirements for Security", BCP 106, RFC 4086, June 2005.
+
+ [15] Eastlake, D. and T. Hansen, "US Secure Hash Algorithms (SHA and
+ SHA-based HMAC and HKDF)", RFC 6234, May 2011.
+
+ [16] Jacobson, V., Braden, B., and D. Borman, "TCP Extensions for
+ High Performance", RFC 1323, May 1992.
+
+ [17] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
+ Explicit Congestion Notification (ECN) to IP", RFC 3168,
+ September 2001.
+
+ [18] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E.
+ Lear, "Address Allocation for Private Internets", BCP 5,
+ RFC 1918, February 1996.
+
+ [19] Braden, R., "Requirements for Internet Hosts - Communication
+ Layers", STD 3, RFC 1122, October 1989.
+
+ [20] Ramaiah, A., "TCP option space extension", Work in Progress,
+ March 2012.
+
+ [21] Srisuresh, P. and K. Egevang, "Traditional IP Network Address
+ Translator (Traditional NAT)", RFC 3022, January 2001.
+
+ [22] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
+ Shelby, "Performance Enhancing Proxies Intended to Mitigate
+ Link-Related Degradations", RFC 3135, June 2001.
+
+ [23] Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion
+ Detection: Evasion, Traffic Normalization, and End-to-End
+ Protocol Semantics", Usenix Security 2001, 2001,
+ <http://www.usenix.org/events/sec01/full_papers/
+ handley/handley.pdf>.
+
+ [24] Freed, N., "Behavior of and Requirements for Internet
+ Firewalls", RFC 2979, October 2000.
+
+ [25] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
+ Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.
+
+Appendix A. Notes on Use of TCP Options
+
+ The TCP option space is limited due to the length of the Data Offset
+ field in the TCP header (4 bits), which defines the TCP header length
+ in 32-bit words. With the standard TCP header being 20 bytes, this
+ leaves a maximum of 40 bytes for options, and many of these may
+ already be used by options such as timestamp and SACK.
+
+ We have performed a brief study on the commonly used TCP options in
+ SYN, data, and pure ACK packets, and found that there is enough room
+ to fit all the options we propose using in this document.
+
+ SYN packets typically include Maximum Segment Size (MSS) (4 bytes),
+ window scale (3 bytes), SACK permitted (2 bytes), and timestamp (10
+ bytes) options. Together these sum to 19 bytes. Some operating
+ systems appear to pad each option up to a word boundary, thus using
+ 24 bytes (a brief survey suggests Windows XP and Mac OS X do this,
+ whereas Linux does not). Optimistically, therefore, we have 21 bytes
+ spare, or 16 if it has to be word-aligned. In either case, however,
+ the SYN versions of Multipath Capable (12 bytes) and Join (12 or 16
+ bytes) options will fit in this remaining space.
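+
+ The arithmetic in the paragraphs above can be checked with a short
+ calculation. This is a sketch only; the option sizes are those
+ quoted in the text, and the variable names are illustrative.

```python
MAX_TCP_HEADER = 15 * 4      # 4-bit Data Offset counts 32-bit words
BASE_TCP_HEADER = 20         # standard TCP header without options
OPTION_SPACE = MAX_TCP_HEADER - BASE_TCP_HEADER

# Typical SYN options and their sizes in bytes, as listed in the text.
SYN_OPTIONS = {
    "MSS": 4,
    "window scale": 3,
    "SACK permitted": 2,
    "timestamp": 10,
}


def pad_to_word(size: int) -> int:
    """Round an option size up to the next 32-bit word boundary."""
    return (size + 3) // 4 * 4


packed = sum(SYN_OPTIONS.values())
padded = sum(pad_to_word(size) for size in SYN_OPTIONS.values())
spare_packed = OPTION_SPACE - packed
spare_padded = OPTION_SPACE - padded

# Largest MPTCP options carried on SYN and SYN/ACK packets.
MP_CAPABLE_SYN = 12
MP_JOIN_MAX = 16
fits = max(MP_CAPABLE_SYN, MP_JOIN_MAX) <= spare_padded
```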
+
+ TCP data packets typically carry timestamp options in every packet,
+ taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28,
+ if word-aligned). The Data Sequence Signal (DSS) option varies in
+ length depending on whether the data sequence mapping and DATA_ACK
+ are included, and whether the sequence numbers in use are 4 or 8
+ octets. The maximum size of the DSS option is 28 bytes, so even that
+ will fit in the available space. But unless a connection is both
+ bidirectional and high-bandwidth, it is unlikely that all that option
+ space will be required on each DSS option.
+
+ Within the DSS option, it is not necessary to include the data
+ sequence mapping and DATA_ACK in each packet, and in many cases it
+ may be possible to alternate their presence (so long as the mapping
+ covers the data being sent in the following packet). It would also
+ be possible to alternate between 4- and 8-byte sequence numbers in
+ each option.
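+
+ The way the DSS option length varies can be sketched as follows.
+ The field sizes used here (a 4-byte fixed part, a 4-octet subflow
+ sequence number, a 2-octet data-level length, and a 2-octet
+ checksum) are taken from the DSS layout in Section 3.3 and should
+ be treated as assumptions of this sketch; the function name is
+ hypothetical.

```python
def dss_length(data_ack_octets=None, dsn_octets=None, checksum=True):
    """Length in bytes of a DSS option.

    data_ack_octets: None if no DATA_ACK is carried, else 4 or 8.
    dsn_octets:      None if no mapping is carried, else 4 or 8.
    checksum:        whether the mapping carries a checksum.
    """
    for size in (data_ack_octets, dsn_octets):
        if size not in (None, 4, 8):
            raise ValueError("sequence numbers are 4 or 8 octets")
    length = 4                           # Kind, Length, Subtype and flags
    if data_ack_octets is not None:
        length += data_ack_octets
    if dsn_octets is not None:
        length += dsn_octets + 4 + 2     # DSN, subflow seq, data length
        if checksum:
            length += 2
    return length
```

+ Under these assumptions, dss_length(8, 8) gives the 28-byte maximum,
+ which together with a padded timestamp option exactly fills the
+ 40 bytes of option space.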
+
+ On subflow and connection setup, an MPTCP option is also set on the
+ third packet (an ACK). These are 20 bytes (for Multipath Capable)
+ and 24 bytes (for Join), both of which will fit in the available
+ option space.
+
+ Pure ACKs in TCP typically contain only timestamps (10 bytes). Here,
+ Multipath TCP typically needs to encode only the DATA_ACK (maximum of
+ 12 bytes). Occasionally, ACKs will contain SACK information.
+ Depending on the number of lost packets, SACK may utilize the entire
+ option space. If a DATA_ACK had to be included, then it is probably
+ necessary to reduce the number of SACK blocks to accommodate the
+ DATA_ACK. However, the presence of the DATA_ACK is unlikely to be
+ necessary in a case where SACK is in use, since until at least some
+ of the SACK blocks have been retransmitted, the cumulative data-level
+ ACK will not be moving forward (or if it does, due to retransmissions
+ on another path, then that path can also be used to transmit the new
+ DATA_ACK).
+
+ The ADD_ADDR option can be between 8 and 22 bytes, depending on
+ whether IPv4 or IPv6 is used, and whether or not the port number is
+ present. It is unlikely that such signaling would fit in a data
+ packet (although if there is space, it is fine to include it). It is
+ recommended to use duplicate ACKs with no other payload or options in
+ order to transmit these rare signals. Note this is the reason for
+ mandating that duplicate ACKs with MPTCP options are not taken as a
+ signal of congestion.
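+
+ The 8-to-22-byte range quoted for ADD_ADDR follows from its fields.
+ The sketch below assumes a 4-byte fixed part (Kind, Length,
+ Subtype/IPVer, Address ID) as laid out in Section 3.4.1; the
+ function name is hypothetical.

```python
def add_addr_length(ip_version: int, with_port: bool) -> int:
    """Length in bytes of an ADD_ADDR option."""
    address_bytes = {4: 4, 6: 16}
    if ip_version not in address_bytes:
        raise ValueError("ip_version must be 4 or 6")
    fixed = 4                            # assumed fixed part of the option
    port = 2 if with_port else 0
    return fixed + address_bytes[ip_version] + port


# All four combinations, smallest to largest.
ADD_ADDR_SIZES = sorted(
    add_addr_length(version, port)
    for version in (4, 6)
    for port in (False, True)
)
```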
+
+ Finally, there are issues with reliable delivery of options.
+ Options can be sent on pure ACKs, which TCP does not retransmit, so
+ such options are not delivered reliably. This is not an issue for
+ DATA_ACKs due to their cumulative nature, but
+ may be an issue for ADD_ADDR/REMOVE_ADDR options. Here, it is
+ recommended to send these options redundantly (whether on multiple
+ paths or on the same path on a number of ACKs -- but interspersed
+ with data in order to avoid interpretation as congestion). The cases
+ where options are stripped by middleboxes are discussed in Section 6.
+
+Appendix B. Control Blocks
+
+ Conceptually, an MPTCP connection can be represented as an MPTCP
+ control block that contains several variables that track the progress
+ and the state of the MPTCP connection and a set of linked TCP control
+ blocks that correspond to the subflows that have been established.
+
+ RFC 793 [1] specifies several state variables. Whenever possible, we
+ reuse the same terminology as RFC 793 to describe the state variables
+ that are maintained by MPTCP.
+
+B.1. MPTCP Control Block
+
+ The MPTCP control block contains the following variables per
+ connection.
+
+B.1.1. Authentication and Metadata
+
+ Local.Token (32 bits): This is the token chosen by the local host on
+ this MPTCP connection. The token MUST be unique among all
+ established MPTCP connections, generated from the local key.
+
+ Local.Key (64 bits): This is the key sent by the local host on this
+ MPTCP connection.
+
+ Remote.Token (32 bits): This is the token chosen by the remote host
+ on this MPTCP connection, generated from the remote key.
+
+ Remote.Key (64 bits): This is the key chosen by the remote host on
+ this MPTCP connection.
+
+ MPTCP.Checksum (flag): This flag is set to true if at least one of
+ the hosts has set the A bit in the MP_CAPABLE options exchanged
+ during connection establishment, and is set to false otherwise.
+ If this flag is set, the checksum must be computed in all DSS
+ options.
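+
+ The relationship between Local.Key and Local.Token can be sketched
+ as below. The sketch assumes the derivation given in Section 3.2
+ (the token is the most significant 32 bits of the SHA-1 hash of the
+ key); the function names and the set used to track tokens in use
+ are hypothetical.

```python
import hashlib
import os


def token_from_key(key: bytes) -> int:
    """Derive the 32-bit token from a 64-bit key."""
    if len(key) != 8:
        raise ValueError("MPTCP keys are 64 bits")
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big")   # most significant 32 bits


def new_local_key(tokens_in_use, rand=os.urandom):
    """Pick a random key whose token is not already in use locally."""
    while True:
        key = rand(8)
        token = token_from_key(key)
        if token not in tokens_in_use:
            return key, token
```

+ Regenerating the key on a collision is one way to meet the
+ requirement that the token be unique among established connections.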
+
+B.1.2. Sending Side
+
+ SND.UNA (64 bits): This is the data sequence number of the next byte
+ to be acknowledged, at the MPTCP connection level. This variable
+ is updated upon reception of a DSS option containing a DATA_ACK.
+
+ SND.NXT (64 bits): This is the data sequence number of the next byte
+ to be sent. SND.NXT is used to determine the value of the DSN in
+ the DSS option.
+
+ SND.WND (32 bits with RFC 1323, 16 bits otherwise): This is the
+ sending window. MPTCP maintains the sending window at the MPTCP
+ connection level and the same window is shared by all subflows.
+ All subflows use the MPTCP connection level SND.WND to compute the
+ SEG.WND value that is sent in each transmitted segment.
+
+B.1.3. Receiving Side
+
+ RCV.NXT (64 bits): This is the data sequence number of the next byte
+ that is expected on the MPTCP connection. This state variable is
+ modified upon reception of in-order data. The value of RCV.NXT is
+ used to specify the DATA_ACK that is sent in the DSS option on all
+ subflows.
+
+ RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the
+ connection-level receive window, which is the maximum of the
+ RCV.WND on all the subflows.
+
+B.2. TCP Control Blocks
+
+ The MPTCP control block also contains a list of the TCP control
+ blocks that are associated to the MPTCP connection.
+
+ Note that the TCP control block on the TCP subflows does not contain
+ the RCV.WND and SND.WND state variables as these are maintained at
+ the MPTCP connection level and not at the subflow level.
+
+ Inside each TCP control block, the following state variables are
+ defined.
+
+B.2.1. Sending Side
+
+ SND.UNA (32 bits): This is the sequence number of the next byte to
+ be acknowledged on the subflow. This variable is updated upon
+ reception of each TCP acknowledgment on the subflow.
+
+ SND.NXT (32 bits): This is the sequence number of the next byte to
+ be sent on the subflow. SND.NXT is used to set the value of
+ SEG.SEQ upon transmission of the next segment.
+
+B.2.2. Receiving Side
+
+ RCV.NXT (32 bits): This is the sequence number of the next byte that
+ is expected on the subflow. This state variable is modified upon
+ reception of in-order segments. The value of RCV.NXT is copied to
+ the SEG.ACK field of the next segments transmitted on the subflow.
+
+ RCV.WND (32 bits with RFC 1323, 16 bits otherwise): This is the
+ subflow-level receive window that is updated with the window field
+ from the segments received on this subflow.
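+
+ The two levels of state described in this appendix can be pictured
+ with the following data structures. This is an illustrative sketch,
+ not an implementation: the class and attribute names are
+ hypothetical, and only the variables listed above are modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubflowTCB:
    """Per-subflow TCP state (Appendix B.2); 32-bit sequence space."""
    snd_una: int = 0   # next byte to be acknowledged on the subflow
    snd_nxt: int = 0   # next byte to be sent on the subflow
    rcv_nxt: int = 0   # next byte expected on the subflow
    rcv_wnd: int = 0   # subflow-level receive window


@dataclass
class MPTCPControlBlock:
    """Connection-level state (Appendix B.1); 64-bit data sequence space."""
    local_key: int
    local_token: int
    remote_key: int
    remote_token: int
    checksum: bool
    snd_una: int = 0
    snd_nxt: int = 0
    snd_wnd: int = 0   # shared by all subflows
    rcv_nxt: int = 0
    subflows: List[SubflowTCB] = field(default_factory=list)

    @property
    def rcv_wnd(self) -> int:
        # Connection-level window: maximum of RCV.WND over all subflows.
        return max((s.rcv_wnd for s in self.subflows), default=0)
```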
+
+Appendix C. Finite State Machine
+
+ The diagram in Figure 17 shows the Finite State Machine for
+ connection-level closure. This illustrates how the DATA_FIN
+ connection-level signal (indicated as the DFIN flag on a DATA_ACK)
+ interacts with subflow-level FINs, and permits "break-before-make"
+ handover between subflows.
+
+                                +---------+
+                                | M_ESTAB |
+                                +---------+
+                       M_CLOSE    |     |    rcv DATA_FIN
+                        -------   |     |    -------
+   +---------+       snd DATA_FIN /       \ snd DATA_ACK[DFIN] +---------+
+   |  M_FIN  |<-----------------           ------------------->| M_CLOSE |
+   | WAIT-1  |---------------------------                      |   WAIT  |
+   +---------+               rcv DATA_FIN \                    +---------+
+     | rcv DATA_ACK[DFIN]         ------- |                   M_CLOSE |
+     | --------------        snd DATA_ACK |                   ------- |
+     | CLOSE all subflows                 |              snd DATA_FIN |
+     V                                    V                           V
+   +-----------+              +-----------+                  +-----------+
+   |M_FINWAIT-2|              | M_CLOSING |                  | M_LAST-ACK|
+   +-----------+              +-----------+                  +-----------+
+     |              rcv DATA_ACK[DFIN] |           rcv DATA_ACK[DFIN] |
+     | rcv DATA_FIN     -------------- |               -------------- |
+     |  -------     CLOSE all subflows |           CLOSE all subflows |
+     | snd DATA_ACK[DFIN]              V            delete MPTCP PCB  V
+     \                          +-----------+                  +---------+
+       ------------------------>|M_TIME WAIT|----------------->| M_CLOSED|
+                                +-----------+                  +---------+
+                                           All subflows in CLOSED
+                                                ------------
+                                              delete MPTCP PCB
+
+ Figure 17: Finite State Machine for Connection Closure
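+
+ The state machine in Figure 17 can also be written as a transition
+ table. The sketch below is one reading of the figure, not a
+ normative definition: the state and event strings are illustrative
+ spellings of the figure's labels, and the actions are carried only
+ as descriptive text.

```python
# (state, event) -> (next state, actions), transcribed from Figure 17.
TRANSITIONS = {
    ("M_ESTAB", "M_CLOSE"): ("M_FIN_WAIT_1", ["snd DATA_FIN"]),
    ("M_ESTAB", "rcv DATA_FIN"): ("M_CLOSE_WAIT", ["snd DATA_ACK[DFIN]"]),
    ("M_FIN_WAIT_1", "rcv DATA_ACK[DFIN]"):
        ("M_FIN_WAIT_2", ["CLOSE all subflows"]),
    ("M_FIN_WAIT_1", "rcv DATA_FIN"): ("M_CLOSING", ["snd DATA_ACK"]),
    ("M_CLOSE_WAIT", "M_CLOSE"): ("M_LAST_ACK", ["snd DATA_FIN"]),
    ("M_FIN_WAIT_2", "rcv DATA_FIN"):
        ("M_TIME_WAIT", ["snd DATA_ACK[DFIN]"]),
    ("M_CLOSING", "rcv DATA_ACK[DFIN]"):
        ("M_TIME_WAIT", ["CLOSE all subflows"]),
    ("M_LAST_ACK", "rcv DATA_ACK[DFIN]"):
        ("M_CLOSED", ["CLOSE all subflows", "delete MPTCP PCB"]),
    ("M_TIME_WAIT", "all subflows CLOSED"):
        ("M_CLOSED", ["delete MPTCP PCB"]),
}


def step(state, event):
    """Return (next_state, actions) or raise if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event}") from None


def run(events, state="M_ESTAB"):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state, _actions = step(state, event)
    return state
```

+ For example, an active close followed by the peer's DATA_FIN walks
+ M_ESTAB, M_FIN_WAIT_1, M_FIN_WAIT_2, M_TIME_WAIT, and finally
+ M_CLOSED once all subflows have closed.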
+
+Authors' Addresses
+
+ Alan Ford
+ Cisco
+ Ruscombe Business Park
+ Ruscombe, Berkshire RG10 9NN
+ UK
+
+ EMail: alanford@cisco.com
+
+
+ Costin Raiciu
+ University Politehnica of Bucharest
+ Splaiul Independentei 313
+ Bucharest
+ Romania
+
+ EMail: costin.raiciu@cs.pub.ro
+
+
+ Mark Handley
+ University College London
+ Gower Street
+ London WC1E 6BT
+ UK
+
+ EMail: m.handley@cs.ucl.ac.uk
+
+
+ Olivier Bonaventure
+ Universite catholique de Louvain
+ Pl. Ste Barbe, 2
+ Louvain-la-Neuve 1348
+ Belgium
+
+ EMail: olivier.bonaventure@uclouvain.be
+