1 files changed, 4147 insertions, 0 deletions
diff --git a/doc/rfc/rfc5044.txt b/doc/rfc/rfc5044.txt
new file mode 100644
index 0000000..075c8d5
--- /dev/null
+++ b/doc/rfc/rfc5044.txt
@@ -0,0 +1,4147 @@
+
+
+
+
+
+
+Network Working Group                                          P. Culley
+Request for Comments: 5044                       Hewlett-Packard Company
+Category: Standards Track                                       U. Elzur
+                                                    Broadcom Corporation
+                                                                R. Recio
+                                                         IBM Corporation
+                                                               S. Bailey
+                                                   Sandburst Corporation
+                                                              J. Carrier
+                                                               Cray Inc.
+                                                            October 2007
+
+
+            Marker PDU Aligned Framing for TCP Specification
+
+Status of This Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Abstract
+
+   Marker PDU Aligned Framing (MPA) is designed to work as an
+   "adaptation layer" between TCP and the Direct Data Placement protocol
+   (DDP) as described in RFC 5041.  It preserves the reliable, in-order
+   delivery of TCP, while adding the preservation of higher-level
+   protocol record boundaries that DDP requires.  MPA is fully compliant
+   with applicable TCP RFCs and can be utilized with existing TCP
+   implementations.  MPA also supports integrated implementations that
+   combine TCP, MPA and DDP to reduce buffering requirements in the
+   implementation and improve performance at the system level.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                     [Page 1]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+Table of Contents
+
+   1. Introduction ....................................................4
+      1.1. Motivation .................................................4
+      1.2. Protocol Overview ..........................................5
+   2. Glossary ........................................................8
+   3. MPA's Interactions with DDP ....................................11
+   4. MPA Full Operation Phase .......................................13
+      4.1. FPDU Format ...............................................13
+      4.2. Marker Format .............................................14
+      4.3. MPA Markers ...............................................14
+      4.4. CRC Calculation ...........................................16
+      4.5. FPDU Size Considerations ..................................21
+   5. MPA's interactions with TCP ....................................22
+      5.1. MPA transmitters with a standard layered TCP ..............22
+      5.2. MPA receivers with a standard layered TCP .................23
+   6. MPA Receiver FPDU Identification ...............................24
+   7. Connection Semantics ...........................................24
+      7.1. Connection Setup ..........................................24
+           7.1.1. MPA Request and Reply Frame Format .................26
+           7.1.2. Connection Startup Rules ...........................28
+           7.1.3. Example Delayed Startup Sequence ...................30
+           7.1.4. Use of Private Data ................................33
+                  7.1.4.1. Motivation ................................33
+                  7.1.4.2. Example Immediate Startup Using
+                           Private Data ..............................35
+           7.1.5. "Dual Stack" Implementations .......................37
+      7.2. Normal Connection Teardown ................................38
+   8. Error Semantics ................................................39
+   9. Security Considerations ........................................40
+      9.1. Protocol-Specific Security Considerations .................40
+           9.1.1. Spoofing ...........................................40
+                  9.1.1.1. Impersonation .............................41
+                  9.1.1.2. Stream Hijacking ..........................41
+                  9.1.1.3. Man-in-the-Middle Attack ..................41
+           9.1.2. Eavesdropping ......................................42
+      9.2. Introduction to Security Options ..........................42
+      9.3. Using IPsec with MPA ......................................43
+      9.4. Requirements for IPsec Encapsulation of MPA/DDP ...........43
+   10. IANA Considerations ...........................................44
+   Appendix A. Optimized MPA-Aware TCP Implementations ...............45
+      A.1. Optimized MPA/TCP Transmitters ............................46
+      A.2. Effects of Optimized MPA/TCP Segmentation .................46
+      A.3. Optimized MPA/TCP Receivers ...............................48
+      A.4. Re-segmenting Middleboxes and Non-Optimized MPA/TCP
+           Senders ...................................................49
+      A.5. Receiver Implementation ...................................50
+           A.5.1. Network Layer Reassembly Buffers ...................51
+
+
+
+Culley, et al.              Standards Track                     [Page 2]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+           A.5.2. TCP Reassembly Buffers .............................52
+   Appendix B. Analysis of MPA over TCP Operations ...................52
+      B.1. Assumptions ...............................................53
+           B.1.1. MPA Is Layered beneath DDP .........................53
+           B.1.2. MPA Preserves DDP Message Framing ..................53
+           B.1.3. The Size of the ULPDU Passed to MPA Is Less Than
+                  EMSS Under Normal Conditions .......................53
+           B.1.4. Out-of-Order Placement but NO Out-of-Order Delivery.54
+     B.2.  The Value of FPDU Alignment ...............................54
+           B.2.1. Impact of Lack of FPDU Alignment on the Receiver
+                  Computational Load and Complexity ..................56
+           B.2.2. FPDU Alignment Effects on TCP Wire Protocol ........60
+   Appendix C. IETF Implementation Interoperability with RDMA
+               Consortium Protocols ..................................62
+     C.1. Negotiated Parameters ......................................63
+     C.2. RDMAC RNIC and Non-Permissive IETF RNIC ....................64
+          C.2.1. RDMAC RNIC Initiator ................................65
+          C.2.2. Non-Permissive IETF RNIC Initiator ..................65
+          C.2.3. RDMAC RNIC and Permissive IETF RNIC .................65
+          C.2.4. RDMAC RNIC Initiator ................................66
+          C.2.5. Permissive IETF RNIC Initiator ......................67
+     C.3. Non-Permissive IETF RNIC and Permissive IETF RNIC ..........67
+   Normative References ..............................................68
+   Informative References ............................................68
+   Contributors ......................................................70
+
+Table of Figures
+
+   Figure 1: ULP MPA TCP Layering .....................................5
+   Figure 2: FPDU Format .............................................13
+   Figure 3: Marker Format ...........................................14
+   Figure 4: Example FPDU Format with Marker .........................16
+   Figure 5: Annotated Hex Dump of an FPDU ...........................19
+   Figure 6: Annotated Hex Dump of an FPDU with Marker ...............20
+   Figure 7: Fully Layered Implementation ............................22
+   Figure 8: MPA Request/Reply Frame .................................26
+   Figure 9: Example Delayed Startup Negotiation .....................31
+   Figure 10: Example Immediate Startup Negotiation ..................35
+   Figure 11: Optimized MPA/TCP Implementation .......................45
+   Figure 12: Non-Aligned FPDU Freely Placed in TCP Octet Stream .....56
+   Figure 13: Aligned FPDU Placed Immediately after TCP Header .......58
+   Figure 14: Connection Parameters for the RNIC Types ...............63
+   Figure 15: MPA Negotiation between an RDMAC RNIC and a
+              Non-Permissive IETF RNIC ...............................65
+   Figure 16: MPA Negotiation between an RDMAC RNIC and a Permissive
+              IETF RNIC ..............................................66
+   Figure 17: MPA Negotiation between a Non-Permissive IETF RNIC and
+              a Permissive IETF RNIC .................................67
+
+
+
+Culley, et al.              Standards Track                     [Page 3]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+1.  Introduction
+
+   This section discusses the reason for creating MPA on TCP and a
+   general overview of the protocol.
+
+1.1.  Motivation
+
+   The Direct Data Placement protocol [DDP], when used with TCP
+   [RFC793], requires a mechanism to detect record boundaries.  The DDP
+   records are referred to as Upper Layer Protocol Data Units by this
+   document.  The ability to locate the Upper Layer Protocol Data Unit
+   (ULPDU) boundary is useful to a hardware network adapter that uses
+   DDP to directly place the data in the application buffer based on the
+   control information carried in the ULPDU header.  This may be done
+   without requiring that the packets arrive in order.  Potential
+   benefits of this capability are the avoidance of the memory copy
+   overhead and a smaller memory requirement for handling out-of-order
+   or dropped packets.
+
+   Many approaches have been proposed for a generalized framing
+   mechanism.  Some are probabilistic in nature and others are
+   deterministic.  An example probabilistic approach is characterized by
+   a detectable value embedded in the octet stream, with no method of
+   preventing that value elsewhere within user data.  It is
+   probabilistic because under some conditions the receiver may
+   incorrectly interpret application data as the detectable value.
+   Under these conditions, the protocol may fail with unacceptable
+   frequency.  One deterministic approach is characterized by embedded
+   controls at known locations in the octet stream.  Because the
+   receiver can guarantee it will only examine the data stream at
+   locations that are known to contain the embedded control, the
+   protocol can never misinterpret application data as being embedded
+   control data.  For unambiguous handling of an out-of-order packet, a
+   deterministic approach is preferred.
+
+   The MPA protocol provides a framing mechanism for DDP running over
+   TCP using the deterministic approach.  It allows the location of the
+   ULPDU to be determined in the TCP stream even if the TCP segments
+   arrive out of order.
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                     [Page 4]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+1.2.  Protocol Overview
+
+   The layering of PDUs with MPA is shown in Figure 1, below.
+
+               +------------------+
+               |     ULP client   |
+               +------------------+  <- Consumer messages
+               |        DDP       |
+               +------------------+  <- ULPDUs
+               |        MPA*      |
+               +------------------+  <- FPDUs (containing ULPDUs)
+               |        TCP*      |
+               +------------------+  <- TCP Segments (containing FPDUs)
+               |      IP etc.     |
+               +------------------+
+                * These may be fully layered or optimized together.
+
+                       Figure 1: ULP MPA TCP Layering
+
+   MPA is described as an extra layer above TCP and below DDP.  The
+   operation sequence is:
+
+   1.  A TCP connection is established by ULP action.  This is done
+       using methods not described by this specification.  The ULP may
+       exchange some amount of data in streaming mode prior to starting
+       MPA, but is not required to do so.
+
+   2.  The Consumer negotiates the use of DDP and MPA at both ends of a
+       connection.  The mechanisms to do this are not described in this
+       specification.  The negotiation may be done in streaming mode, or
+       by some other mechanism (such as a pre-arranged port number).
+
+   3.  The ULP activates MPA on each end in the Startup Phase, either as
+       an Initiator or a Responder, as determined by the ULP.  This mode
+       verifies the usage of MPA, specifies the use of CRC and Markers,
+       and allows the ULP to communicate some additional data via a
+       Private Data exchange.  See Section 7.1, Connection Setup, for
+       more details on the startup process.
+
+   4.  At the end of the Startup Phase, the ULP puts MPA (and DDP) into
+       Full Operation and begins sending DDP data as further described
+       below.  In this document, DDP data chunks are called ULPDUs.  For
+       a description of the DDP data, see [DDP].
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                     [Page 5]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   Following is a description of data transfer when MPA is in Full
+   Operation.
+
+   1.  DDP determines the Maximum ULPDU (MULPDU) size by querying MPA
+       for this value.  MPA derives this information from TCP or IP,
+       when it is available, or chooses a reasonable value.
+
+   2.  DDP creates ULPDUs of MULPDU size or smaller, and hands them to
+       MPA at the sender.
+
+   3.  MPA creates a Framed Protocol Data Unit (FPDU) by prepending a
+       header, optionally inserting Markers, and appending a CRC field
+       after the ULPDU and PAD (if any).  MPA delivers the FPDU to TCP.
+
+   4.  The TCP sender puts the FPDUs into the TCP stream.  If the sender
+       is optimized MPA/TCP, it segments the TCP stream in such a way
+       that a TCP Segment boundary is also the boundary of an FPDU.  TCP
+       then passes each segment to the IP layer for transmission.
+
+   5.  The receiver may or may not be optimized.  If it is optimized
+       MPA/TCP, it may separate passing the TCP payload to MPA from
+       passing the TCP payload ordering information to MPA.  In either
+       case, RFC-compliant TCP wire behavior is observed at both the
+       sender and receiver.
+
+   6.  The MPA receiver locates and assembles complete FPDUs within the
+       stream, verifies their integrity, and removes MPA Markers (when
+       present), ULPDU_Length, PAD, and the CRC field.
+
+   7.  MPA then provides the complete ULPDUs to DDP.  MPA may also
+       separate passing MPA payload to DDP from passing the MPA payload
+       ordering information.
+
+   A fully layered MPA on TCP is implemented as a data stream ULP for
+   TCP and is therefore RFC compliant.
+
+   An optimized DDP/MPA/TCP uses a TCP layer that potentially contains
+   some additional behaviors as suggested in this document.  When
+   DDP/MPA/TCP are cross-layer optimized, the behavior of TCP
+   (especially sender segmentation) may change from that of the un-
+   optimized implementation, but the changes are within the bounds
+   permitted by the TCP RFC specifications, and will interoperate with
+   an un-optimized TCP.  The additional behaviors are described in
+   Appendix A and are not normative; they are described at a TCP
+   interface layer as a convenience.  Implementations may achieve the
+   described functionality using any method, including cross-layer
+   optimizations between TCP, MPA, and DDP.
+
+
+
+
+Culley, et al.              Standards Track                     [Page 6]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   An optimized DDP/MPA/TCP sender is able to segment the data stream
+   such that TCP segments begin with FPDUs (FPDU Alignment).  This has
+   significant advantages for receivers.  When segments arrive with
+   aligned FPDUs, the receiver usually need not buffer any portion of
+   the segment, allowing DDP to place it in its destination memory
+   immediately, thus avoiding copies from intermediate buffers (DDP's
+   reason for existence).
+
+   An optimized DDP/MPA/TCP receiver allows a DDP on MPA implementation
+   to locate the start of ULPDUs that may be received out of order.  It
+   also allows the implementation to determine if the entire ULPDU has
+   been received.  As a result, MPA can pass out-of-order ULPDUs to DDP
+   for immediate use.  This enables a DDP on MPA implementation to save
+   a significant amount of intermediate storage by placing the ULPDUs in
+   the right locations in the application buffers when they arrive,
+   rather than waiting until full ordering can be restored.
+
+   The ability of a receiver to recover out-of-order ULPDUs is optional
+   and declared to the transmitter during startup.  When the receiver
+   declares that it does not support out-of-order recovery, the
+   transmitter does not add the control information to the data stream
+   needed for out-of-order recovery.
+
+   If the receiver is fully layered, then MPA receives a strictly
+   ordered stream of data and does not deal with out-of-order ULPDUs.
+   In this case, MPA passes each ULPDU to DDP when the last bytes arrive
+   from TCP, along with the indication that they are in order.
+
+   MPA implementations that support recovery of out-of-order ULPDUs MUST
+   support a mechanism to indicate the ordering of ULPDUs as the sender
+   transmitted them and indicate when missing intermediate segments
+   arrive.  These mechanisms allow DDP to reestablish record ordering
+   and report Delivery of complete messages (groups of records).
+
+   MPA also addresses enhanced data integrity.  Some users of TCP have
+   noted that the TCP checksum is not as strong as could be desired (see
+   [CRCTCP]).  Studies such as [CRCTCP] have shown that the TCP checksum
+   indicates segments in error at a much higher rate than the underlying
+   link characteristics would indicate.  With these higher error rates,
+   the chance that an error will escape detection, when using only the
+   TCP checksum for data integrity, becomes a concern.  A stronger
+   integrity check can reduce the chance of data errors being missed.
+
+   MPA includes a CRC check to increase the ULPDU data integrity to the
+   level provided by other modern protocols, such as SCTP [RFC4960].  It
+   is possible to disable this CRC check; however, CRCs MUST be enabled
+   unless it is clear that the end-to-end connection through the network
+   has data integrity at least as good as an MPA with CRC enabled (for
+
+
+
+Culley, et al.              Standards Track                     [Page 7]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   example, when IPsec is implemented end to end).  DDP's ULP expects
+   this level of data integrity and therefore the ULP does not have to
+   provide its own duplicate data integrity and error recovery for lost
+   data.
+
+2.  Glossary
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in [RFC2119].
+
+   Consumer - the ULPs or applications that lie above MPA and DDP.  The
+       Consumer is responsible for making TCP connections, starting MPA
+       and DDP connections, and generally controlling operations.
+
+   CRC - Cyclic Redundancy Check.
+
+   Delivery - (Delivered, Delivers) - For MPA, Delivery is defined as
+       the process of informing DDP that a particular PDU is ordered for
+       use.  A PDU is Delivered in the exact order that it was sent by
+       the original sender; MPA uses TCP's byte stream ordering to
+       determine when Delivery is possible.  This is specifically
+       different from "passing the PDU to DDP", which may generally
+       occur in any order, while the order of Delivery is strictly
+       defined.
+
+   EMSS - Effective Maximum Segment Size.  EMSS is the smaller of the
+       TCP maximum segment size (MSS) as defined in RFC 793 [RFC793],
+       and the current path Maximum Transmission Unit (MTU) [RFC1191].
+
+   FPDU - Framed Protocol Data Unit.  The unit of data created by an MPA
+       sender.
+
+   FPDU Alignment - The property that an FPDU is Header Aligned with the
+       TCP segment, and the TCP segment includes an integer number of
+       FPDUs.  A TCP segment with an FPDU Alignment allows immediate
+       processing of the contained FPDUs without waiting on other TCP
+       segments to arrive or combining with prior segments.
+
+   FPDU Pointer (FPDUPTR) - This field of the Marker is used to indicate
+       the beginning of an FPDU.
+
+   Full Operation (Full Operation Phase) - After the completion of the
+       Startup Phase, MPA begins exchanging FPDUs.
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                     [Page 8]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   Header Alignment - The property that a TCP segment begins with an
+       FPDU.  The FPDU is Header Aligned when the FPDU header is exactly
+       at the start of the TCP segment (right behind the TCP headers on
+       the wire).
+
+   Initiator - The endpoint of a connection that sends the MPA Request
+       Frame, i.e., the first to actually send data (which may not be
+       the one that sends the TCP SYN).
+
+   Marker - A four-octet field that is placed in the MPA data stream at
+       fixed octet intervals (every 512 octets).
+
+   MPA-aware TCP - A TCP implementation that is aware of the receiver
+       efficiencies of MPA FPDU Alignment and is capable of sending TCP
+       segments that begin with an FPDU.
+
+   MPA-enabled - MPA is enabled if the MPA protocol is visible on the
+       wire.  When the sender is MPA-enabled, it is inserting framing
+       and Markers.  When the receiver is MPA-enabled, it is
+       interpreting framing and Markers.
+
+   MPA Request Frame - Data sent from the MPA Initiator to the MPA
+       Responder during the Startup Phase.
+
+   MPA Reply Frame - Data sent from the MPA Responder to the MPA
+       Initiator during the Startup Phase.
+
+   MPA - Marker-based ULP PDU Aligned Framing for TCP protocol.  This
+       document defines the MPA protocol.
+
+   MULPDU - Maximum ULPDU.  The current maximum size of the record that
+       is acceptable for DDP to pass to MPA for transmission.
+
+   Node - A computing device attached to one or more links of a network.
+       A Node in this context does not refer to a specific application
+       or protocol instantiation running on the computer.  A Node may
+       consist of one or more MPA on TCP devices installed in a host
+       computer.
+
+   PAD - A 1-3 octet group of zeros used to fill an FPDU to an exact
+       modulo 4 size.
+
+   PDU - Protocol data unit
+
+   Private Data - A block of data exchanged between MPA endpoints during
+       initial connection setup.
+
+
+
+
+
+Culley, et al.              Standards Track                     [Page 9]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   Protection Domain - An RDMA concept (see [VERBS-RDMA] and [RDMASEC])
+       that ties use of various endpoint resources (memory access, etc.)
+       to the specific RDMA/DDP/MPA connection.
+
+   RDDP - A suite of protocols including MPA, [DDP], [RDMAP], an overall
+       security document [RDMASEC], a problem statement [RFC4297], an
+       architecture document [RFC4296], and an applicability document
+       [APPL].
+
+   RDMA - Remote Direct Memory Access; a protocol that uses DDP and MPA
+       to enable applications to transfer data directly from memory
+       buffers.  See [RDMAP].
+
+   Remote Peer - The MPA protocol implementation on the opposite end of
+       the connection.  Used to refer to the remote entity when
+       describing protocol exchanges or other interactions between two
+       Nodes.
+
+   Responder - The connection endpoint that responds to an incoming MPA
+       connection request (the MAP Request Frame).  This may not be the
+       endpoint that awaited the TCP SYN.
+
+   Startup Phase - The initial exchanges of an MPA connection that
+       serves to more fully identify MPA endpoints to each other and
+       pass connection specific setup information to each other.
+
+   ULP - Upper Layer Protocol.  The protocol layer above the protocol
+       layer currently being referenced.  The ULP for MPA is DDP [DDP].
+
+   ULPDU - Upper Layer Protocol Data Unit.  The data record defined by
+       the layer above MPA (DDP).  ULPDU corresponds to DDP's DDP
+       segment.
+
+   ULPDU_Length - A field in the FPDU describing the length of the
+       included ULPDU.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 10]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+3.  MPA's Interactions with DDP
+
+   DDP requires MPA to maintain DDP record boundaries from the sender to
+   the receiver.  When using MPA on TCP to send data, DDP provides
+   records (ULPDUs) to MPA.  MPA will use the reliable transmission
+   abilities of TCP to transmit the data, and will insert appropriate
+   additional information into the TCP stream to allow the MPA receiver
+   to locate the record boundary information.
+
+   As such, MPA accepts complete records (ULPDUs) from DDP at the sender
+   and returns them to DDP at the receiver.
+
+   MPA MUST encapsulate the ULPDU such that there is exactly one ULPDU
+   contained in one FPDU.
+
+   MPA over a standard TCP stack can usually provide FPDU Alignment with
+   the TCP Header if the FPDU is equal to TCP's EMSS.  An optimized
+   MPA/TCP stack can also maintain alignment as long as the FPDU is less
+   than or equal to TCP's EMSS.  Since FPDU Alignment is generally
+   desired by the receiver, DDP cooperates with MPA to ensure FPDUs'
+   lengths do not exceed the EMSS under normal conditions.  This is done
+   with the MULPDU mechanism.
+
+   MPA MUST provide information to DDP on the current maximum size of
+   the record that is acceptable to send (MULPDU).  DDP SHOULD limit
+   each record size to MULPDU.  The range of MULPDU values MUST be
+   between 128 octets and 64768 octets, inclusive.
+
+   The sending DDP MUST NOT post a ULPDU larger than 64768 octets to
+   MPA.  DDP MAY post a ULPDU of any size between one and 64768 octets;
+   however, MPA is not REQUIRED to support a ULPDU Length that is
+   greater than the current MULPDU.
+
+   While the maximum theoretical length supported by the MPA header
+   ULPDU_Length field is 65535, TCP over IP requires the IP datagram
+   maximum length to be 65535 octets.  To enable MPA to support FPDU
+   Alignment, the maximum size of the FPDU must fit within an IP
+   datagram.  Thus, the ULPDU limit of 64768 octets was derived by
+   taking the maximum IP datagram length, subtracting from it the
+   maximum total length of the sum of the IPv4 header, TCP header, IPv4
+   options, TCP options, and the worst-case MPA overhead, and then
+   rounding the result down to a 128-octet boundary.
+
+   Note that MULPDU will be significantly smaller than the theoretical
+   maximum in most implementations for most circumstances, due to link
+   MTUs, use of extra headers such as required for IPsec, etc.
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 11]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   On receive, MPA MUST pass each ULPDU with its length to DDP when it
+   has been validated.
+
+   If an MPA implementation supports passing out-of-order ULPDUs to DDP,
+   the MPA implementation SHOULD:
+
+   *   Pass each ULPDU with its length to DDP as soon as it has been
+       fully received and validated.
+
+   *   Provide a mechanism to indicate the ordering of ULPDUs as the
+       sender transmitted them.  One possible mechanism might be
+       providing the TCP sequence number for each ULPDU.
+
+   *   Provide a mechanism to indicate when a given ULPDU (and prior
+       ULPDUs) are complete (Delivered to DDP).  One possible mechanism
+       might be to allow DDP to see the current outgoing TCP ACK
+       sequence number.
+
+   *   Provide an indication to DDP that the TCP has closed or has begun
+       to close the connection (e.g., received a FIN).
+
+   MPA MUST provide the protocol version negotiated with its peer to
+   DDP.  DDP will use this version to set the version in its header and
+   to report the version to [RDMAP].
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 12]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+4.  MPA Full Operation Phase
+
+   The following sections describe the main semantics of the Full
+   Operation Phase of MPA.
+
+4.1.  FPDU Format
+
+   MPA senders create FPDUs out of ULPDUs.  The format of an FPDU shown
+   below MUST be used for all MPA FPDUs.  For purposes of clarity,
+   Markers are not shown in Figure 2.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |          ULPDU_Length         |                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
+      |                                                               |
+      ~                                                               ~
+      ~                            ULPDU                              ~
+      |                                                               |
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               |          PAD (0-3 octets)     |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                             CRC                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+                           Figure 2: FPDU Format
+
+   ULPDU_Length: 16 bits (unsigned integer).  This is the number of
+   octets of the contained ULPDU.  It does not include the length of the
+   FPDU header itself, the pad, the CRC, or of any Markers that fall
+   within the ULPDU.  The 16-bit ULPDU Length field is large enough to
+   support the largest IP datagrams for IPv4 or IPv6.
+
+   PAD: The PAD field trails the ULPDU and contains between 0 and 3
+   octets of data.  The pad data MUST be set to zero by the sender and
+   ignored by the receiver (except for CRC checking).  The length of the
+   pad is set so as to make the size of the FPDU an integral multiple of
+   four.
+
+   CRC: 32 bits.  When CRCs are enabled, this field contains a CRC32c
+   check value, which is used to verify the entire contents of the FPDU,
+   using CRC32c.  See Section 4.4, CRC Calculation.  When CRCs are not
+   enabled, this field is still present, may contain any value, and MUST
+   NOT be checked.
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 13]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   The FPDU adds a minimum of 6 octets to the length of the ULPDU.  In
+   addition, the total length of the FPDU will include the length of any
+   Markers and from 0 to 3 pad octets added to round-up the ULPDU size.
+
+4.2.  Marker Format
+
+   The format of a Marker MUST be as specified in Figure 3:
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |           RESERVED            |            FPDUPTR            |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+                          Figure 3: Marker Format
+
+   RESERVED: The Reserved field MUST be set to zero on transmit and
+   ignored on receive (except for CRC calculation).
+
+   FPDUPTR: The FPDU Pointer is a relative pointer, 16 bits long,
+   interpreted as an unsigned integer that indicates the number of
+   octets in the TCP stream from the beginning of the ULPDU Length field
+   to the first octet of the entire Marker.  The least significant two
+   bits MUST always be set to zero at the transmitter, and the receivers
+   MUST always treat these as zero for calculations.
+
+4.3.  MPA Markers
+
+   MPA Markers are used to identify the start of FPDUs when packets are
+   received out of order.  This is done by locating the Markers at fixed
+   intervals in the data stream (which is correlated to the TCP sequence
+   number) and using the Marker value to locate the preceding FPDU
+   start.
+
+   All MPA Markers are included in the containing FPDU CRC calculation
+   (when both CRCs and Markers are in use).
+
+   The MPA receiver's ability to locate out-of-order FPDUs and pass the
+   ULPDUs to DDP is implementation dependent.  MPA/DDP allows those
+   receivers that are able to deal with out-of-order FPDUs in this way
+   to require the insertion of Markers in the data stream.  When the
+   receiver cannot deal with out-of-order FPDUs in this way, it may
+   disable the insertion of Markers at the sender.  All MPA senders MUST
+   be able to generate Markers when their use is declared by the
+   opposing receiver (see Section 7.1, Connection Setup).
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 14]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   When Markers are enabled, MPA senders MUST insert a Marker into the
+   data stream at a 512-octet periodic interval in the TCP Sequence
+   Number Space.  The Marker contains a 16-bit unsigned integer referred
+   to as the FPDUPTR (FPDU Pointer).
+
+   If the FPDUPTR's value is non-zero, the FPDU Pointer is a 16-bit
+   relative back-pointer.  FPDUPTR MUST contain the number of octets in
+   the TCP stream from the beginning of the ULPDU Length field to the
+   first octet of the Marker, unless the Marker falls between FPDUs.
+   Thus, the location of the first octet of the previous FPDU header can
+   be determined by subtracting the value of the given Marker from the
+   current octet-stream sequence number (i.e., TCP sequence number) of
+   the first octet of the Marker.  Note that this computation MUST take
+   into account that the TCP sequence number could have wrapped between
+   the Marker and the header.
+
+   An FPDUPTR value of 0x0000 is a special case -- it is used when the
+   Marker falls exactly between FPDUs (between the preceding FPDU CRC
+   field and the next FPDU's ULPDU Length field).  In this case, the
+   Marker is considered to be contained in the following FPDU; the
+   Marker MUST be included in the CRC calculation of the FPDU following
+   the Marker (if CRCs are being generated or checked).  Thus, an
+   FPDUPTR value of 0x0000 means that immediately following the Marker
+   is an FPDU header (the ULPDU Length field).
+
+   Since all FPDUs are integral multiples of 4 octets, the bottom two
+   bits of the FPDUPTR as calculated by the sender are zero.  MPA
+   reserves these bits so they MUST be treated as zero for computation
+   at the receiver.
+
+   When Markers are enabled (see Section 7.1, Connection Setup), the MPA
+   Markers MUST be inserted immediately preceding the first FPDU of Full
+   Operation Phase, and at every 512th octet of the TCP octet stream
+   thereafter.  As a result, the first Marker has an FPDUPTR value of
+   0x0000.  If the first Marker begins at octet sequence number
+   SeqStart, then Markers are inserted such that the first octet of the
+   Marker is at octet sequence number SeqNum if the remainder of (SeqNum
+   - SeqStart) mod 512 is zero.  Note that SeqNum can wrap.
+
+   For example, if the TCP sequence number were used to calculate the
+   insertion point of the Marker, the starting TCP sequence number is
+   unlikely to be zero, and 512-octet multiples are unlikely to fall on
+   a modulo 512 of zero.  If the MPA connection is started at TCP
+   sequence number 11, then the 1st Marker will begin at 11, and
+   subsequent Markers will begin at 523, 1035, etc.
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 15]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   If an FPDU is large enough to contain multiple Markers, they MUST all
+   point to the same point in the TCP stream: the first octet of the
+   ULPDU Length field for the FPDU.
+
+   If a Marker interval contains multiple FPDUs (the FPDUs are small),
+   the Marker MUST point to the start of the ULPDU Length field for the
+   FPDU containing the Marker unless the Marker falls between FPDUs, in
+   which case the Marker MUST be zero.
+
+   The following example shows an FPDU containing a Marker.
+
+   0                   1                   2                   3
+   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |       ULPDU Length (0x0010)   |                               |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
+   |                                                               |
+   +                                                               +
+   |                         ULPDU (octets 0-9)                    |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |            (0x0000)           |        FPDU ptr (0x000C)      |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                        ULPDU (octets 10-15)                   |
+   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                               |          PAD (2 octets:0,0)   |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   |                              CRC                              |
+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+              Figure 4: Example FPDU Format with Marker
+
+   MPA Receivers MUST preserve ULPDU boundaries when passing data to
+   DDP.  MPA Receivers MUST pass the ULPDU data and the ULPDU Length to
+   DDP and not the Markers, headers, and CRC.
+
+4.4.  CRC Calculation
+
+   An MPA implementation MUST implement CRC support and MUST either:
+
+   (1)  always use CRCs; the MPA provider is not REQUIRED to support an
+        administrator's request that CRCs not be used.
+
+        or
+
+   (2a) only indicate a preference not to use CRCs on the explicit
+        request of the system administrator, via an interface not
+        defined in this spec.  The default configuration for a
+        connection MUST be to use CRCs.
+
+
+
+Culley, et al.              Standards Track                    [Page 16]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   (2b) disable CRC checking (and possibly generation) if both the local
+        and remote endpoints indicate preference not to use CRCs.
+
+   An administrative decision to have a host request CRC suppression
+   SHOULD NOT be made unless there is assurance that the TCP connection
+   involved provides protection from undetected errors that is at least
+   as strong as an end-to-end CRC32c.  End-to-end usage of an IPsec
+   cryptographic integrity check is among the ways to provide such
+   protection, and the use of channel bindings [NFSv4CHANNEL] by the ULP
+   can provide a high level of assurance that the IPsec protection scope
+   is end-to-end with respect to the ULP.
+
+   The process MUST be invisible to the ULP.
+
+   After receipt of an MPA startup declaration indicating that its peer
+   requires CRCs, an MPA instance MUST continue generating and checking
+   CRCs until the connection terminates.  If an MPA instance has
+   declared that it does not require CRCs, it MUST turn off CRC checking
+   immediately after receipt of an MPA mode declaration indicating that
+   its peer also does not require CRCs.  It MAY continue generating
+   CRCs.  See Section 7.1, Connection Setup, for details on the MPA
+   startup.
+
+   When sending an FPDU, the sender MUST include a CRC field.  When CRCs
+   are enabled, the CRC field in the MPA FPDU MUST be computed using the
+   CRC32c polynomial in the manner described in the iSCSI Protocol
+   [iSCSI] document for Header and Data Digests.
+
+   The fields which MUST be included in the CRC calculation when sending
+   an FPDU are as follows:
+
+   1)  If a Marker does not immediately precede the ULPDU Length field,
+       the CRC-32c is calculated from the first octet of the ULPDU
+       Length field, through all the ULPDU and Markers (if present), to
+       the last octet of the PAD (if present), inclusive.  If there is a
+       Marker immediately following the PAD, the Marker is included in
+       the CRC calculation for this FPDU.
+
+   2)  If a Marker immediately precedes the first octet of the ULPDU
+       Length field of the FPDU, (i.e., the Marker fell between FPDUs,
+       and thus is required to be included in the second FPDU), the
+       CRC-32c is calculated from the first octet of the Marker, through
+       the ULPDU Length header, through all the ULPDU and Markers (if
+       present), to the last octet of the PAD (if present), inclusive.
+
+   3)  After calculating the CRC-32c, the resultant value is placed into
+       the CRC field at the end of the FPDU.
+
+
+
+
+Culley, et al.              Standards Track                    [Page 17]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   When an FPDU is received, and CRC checking is enabled, the receiver
+   MUST first perform the following:
+
+   1)  Calculate the CRC of the incoming FPDU in the same fashion as
+       defined above.
+
+   2)  Verify that the calculated CRC-32c value is the same as the
+       received CRC-32c value found in the FPDU CRC field.  If not, the
+       receiver MUST treat the FPDU as an invalid FPDU.
+
+   The procedure for handling invalid FPDUs is covered in Section 8,
+   Error Semantics.
+
+   The following is an annotated hex dump of an example FPDU sent as the
+   first FPDU on the stream.  As such, it starts with a Marker.  The
+   FPDU contains a 42 octet ULPDU (an example DDP segment) which in turn
+   contains 24 octets of the contained ULPDU, which is a data load that
+   is all zeros.  The CRC32c has been correctly calculated and can be
+   used as a reference.  See the [DDP] and [RDMAP] specification for
+   definitions of the DDP Control field, Queue, MSN, MO, and Send Data.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 18]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+       Octet Contents  Annotation
+       Count
+
+       0000    00      Marker: Reserved
+       0001    00
+       0002    00      Marker: FPDUPTR
+       0003    00
+       0004    00      ULPDU Length
+       0005    2a
+       0006    41      DDP Control Field, Send with Last flag set
+       0007    43
+       0008    00      Reserved (DDP STag position with no STag)
+       0009    00
+       000a    00
+       000b    00
+       000c    00      DDP Queue = 0
+       000d    00
+       000e    00
+       000f    00
+       0010    00      DDP MSN = 1
+       0011    00
+       0012    00
+       0013    01
+       0014    00      DDP MO = 0
+       0015    00
+       0016    00
+       0017    00
+       0018    00      DDP Send Data (24 octets of zeros)
+       ...
+       002f    00
+       0030    52      CRC32c
+       0031    23
+       0032    99
+       0033    83
+
+                  Figure 5: Annotated Hex Dump of an FPDU
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 19]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+      The following is an example sent as the second FPDU of the stream
+      where the first FPDU (which is not shown here) had a length of 492
+      octets and was also a Send to Queue 0 with Last Flag set.  This
+      example contains a Marker.
+
+       Octet Contents  Annotation
+       Count
+
+       01ec    00      Length
+       01ed    2a
+       01ee    41      DDP Control Field: Send with Last Flag set
+       01ef    43
+       01f0    00      Reserved (DDP STag position with no STag)
+       01f1    00
+       01f2    00
+       01f3    00
+       01f4    00      DDP Queue = 0
+       01f5    00
+       01f6    00
+       01f7    00
+       01f8    00      DDP MSN = 2
+       01f9    00
+       01fa    00
+       01fb    02
+       01fc    00      DDP MO = 0
+       01fd    00
+       01fe    00
+       01ff    00
+       0200    00      Marker: Reserved
+       0201    00
+       0202    00      Marker: FPDUPTR
+       0203    14
+       0204    00      DDP Send Data (24 octets of zeros)
+       ...
+       021b    00
+       021c    84      CRC32c
+       021d    92
+       021e    58
+       021f    98
+
+            Figure 6: Annotated Hex Dump of an FPDU with Marker
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 20]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+4.5.  FPDU Size Considerations
+
+   MPA defines the Maximum Upper Layer Protocol Data Unit (MULPDU) as
+   the size of the largest ULPDU fitting in an FPDU.  For an empty TCP
+   Segment, MULPDU is EMSS minus the FPDU overhead (6 octets) minus
+   space for Markers and pad octets.
+
+       The maximum ULPDU Length for a single ULPDU when Markers are
+       present MUST be computed as:
+
+       MULPDU = EMSS - (6 + 4 * Ceiling(EMSS / 512) + EMSS mod 4)
+
+   The formula above accounts for the worst-case number of Markers.
+
+       The maximum ULPDU Length for a single ULPDU when Markers are NOT
+       present MUST be computed as:
+
+       MULPDU = EMSS - (6 + EMSS mod 4)
+
+   As a further optimization of the wire efficiency an MPA
+   implementation MAY dynamically adjust the MULPDU (see Section 5 for
+   latency and wire efficiency trade-offs).  When one or more FPDUs are
+   already packed into a TCP Segment, MULPDU MAY be reduced accordingly.
+
+   DDP SHOULD provide ULPDUs that are as large as possible, but less
+   than or equal to MULPDU.
+
+   If the TCP implementation needs to adjust EMSS to support MTU changes
+   or changing TCP options, the MULPDU value is changed accordingly.
+
+   In certain rare situations, the EMSS may shrink below 128 octets in
+   size.  If this occurs, the MPA on TCP sender MUST NOT shrink the
+   MULPDU below 128 octets and is not required to follow the
+   segmentation rules in Section 5.1 and Appendix A.
+
+   If one or more FPDUs are already packed into a TCP segment, such that
+   the remaining room is less than 128 octets, MPA MUST NOT provide a
+   MULPDU smaller than 128.  In this case, MPA would typically provide a
+   MULPDU for the next full sized segment, but may still pack the next
+   FPDU into the small remaining room, provide that the next FPDU is
+   small enough to fit.
+
+   The value 128 is chosen as to allow DDP designers room for the DDP
+   Header and some user data.
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 21]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+5.  MPA's interactions with TCP
+
+   The following sections describe MPA's interactions with TCP.  This
+   section discusses using a standard layered TCP stack with MPA
+   attached above a TCP socket.  Discussion of using an optimized MPA-
+   aware TCP with an MPA implementation that takes advantage of the
+   extra optimizations is done in Appendix A.
+
+                   +-----------------------------------+
+                   | +-----+       +-----------------+ |
+                   | | MPA |       | Other Protocols | |
+                   | +-----+       +-----------------+ |
+                   |    ||                  ||         |
+                   |  ----- socket API --------------  |
+                   |            ||                     |
+                   |         +-----+                   |
+                   |         | TCP |                   |
+                   |         +-----+                   |
+                   |            ||                     |
+                   |         +-----+                   |
+                   |         | IP  |                   |
+                   |         +-----+                   |
+                   +-----------------------------------+
+
+                   Figure 7: Fully Layered Implementation
+
+   The Fully layered implementation is described for completeness;
+   however, the user is cautioned that the reduced probability of FPDU
+   alignment when transmitting with this implementation will tend to
+   introduce a higher overhead at optimized receivers.  In addition, the
+   lack of out-of-order receive processing will significantly reduce the
+   value of DDP/MPA by imposing higher buffering and copying overhead in
+   the local receiver.
+
+5.1.  MPA transmitters with a standard layered TCP
+
+   MPA transmitters SHOULD calculate a MULPDU as described in Section
+   4.5.  If the TCP implementation allows EMSS to be determined by MPA,
+   that value should be used.  If the transmit side TCP implementation
+   is not able to report the EMSS, MPA SHOULD use the current MTU value
+   to establish a likely FPDU size, taking into account the various
+   expected header sizes.
+
+   MPA transmitters SHOULD also use whatever facilities the TCP stack
+   presents to cause the TCP transmitter to start TCP segments at FPDU
+   boundaries.  Multiple FPDUs MAY be packed into a single TCP segment
+   as determined by the EMSS calculation as long as they are entirely
+   contained in the TCP segment.
+
+
+
+Culley, et al.              Standards Track                    [Page 22]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   For example, passing FPDU buffers sized to the current EMSS to the
+   TCP socket and using the TCP_NODELAY socket option to disable the
+   Nagle [RFC896] algorithm will usually result in many of the segments
+   starting with an FPDU.
+
+   It is recognized that various effects can cause an FPDU Alignment to
+   be lost.  Following are a few of the effects:
+
+   *   ULPDUs that are smaller than the MULPDU.  If these are sent in a
+       continuous stream, FPDU Alignment will be lost.  Note that
+       careful use of a dynamic MULPDU can help in this case; the MULPDU
+       for future FPDUs can be adjusted to re-establish alignment with
+       the segments based on the current EMSS.
+
+   *   Sending enough data that the TCP receive window limit is reached.
+       TCP may send a smaller segment to exactly fill the receive
+       window.
+
+   *   Sending data when TCP is operating up against the congestion
+       window.  If TCP is not tracking the congestion window in
+       segments, it may transmit a smaller segment to exactly fill the
+       receive window.
+
+   *   Changes in EMSS due to varying TCP options, or changes in MTU.
+
+   If FPDU Alignment with TCP segments is lost for any reason, the
+   alignment is regained after a break in transmission where the TCP
+   send buffers are emptied.  Many usage models for DDP/MPA will include
+   such breaks.
+
+   MPA receivers are REQUIRED to be able to operate correctly even if
+   alignment is lost (see Section 6).
+
+5.2.  MPA receivers with a standard layered TCP
+
+   MPA receivers will get TCP data in the usual ordered stream.  The
+   receivers MUST identify FPDU boundaries by using the ULPDU_LENGTH
+   field, as described in Section 6.  Receivers MAY utilize markers to
+   check for FPDU boundary consistency, but they are NOT required to
+   examine the markers to determine the FPDU boundaries.
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 23]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+6.  MPA Receiver FPDU Identification
+
+   An MPA receiver MUST first verify the FPDU before passing the ULPDU
+   to DDP.  To do this, the receiver MUST:
+
+   *   locate the start of the FPDU unambiguously,
+
+   *   verify its CRC (if CRC checking is enabled).
+
+   If the above conditions are true, the MPA receiver passes the ULPDU
+   to DDP.
+
+   To detect the start of the FPDU unambiguously one of the following
+   MUST be used:
+
+   1:  In an ordered TCP stream, the ULPDU Length field in the current
+       FPDU when FPDU has a valid CRC, can be used to identify the
+       beginning of the next FPDU.
+
+   2:  For optimized MPA/TCP receivers that support out-of-order
+       reception of FPDUs (see Section 4.3, MPA Markers) a Marker can
+       always be used to locate the beginning of an FPDU (in FPDUs with
+       valid CRCs).  Since the location of the Marker is known in the
+       octet stream (sequence number space), the Marker can always be
+       found.
+
+   3:  Having found an FPDU by means of a Marker, an optimized MPA/TCP
+       receiver can find following contiguous FPDUs by using the ULPDU
+       Length fields (from FPDUs with valid CRCs) to establish the next
+       FPDU boundary.
+
+   The ULPDU Length field (see Section 4) MUST be used to determine if
+   the entire FPDU is present before forwarding the ULPDU to DDP.
+
+   CRC calculation is discussed in Section 4.4 above.
+
+7.  Connection Semantics
+
+7.1.  Connection Setup
+
+   MPA requires that the Consumer MUST activate MPA, and any TCP
+   enhancements for MPA, on a TCP half connection at the same location
+   in the octet stream at both the sender and the receiver.  This is
+   required in order for the Marker scheme to correctly locate the
+   Markers (if enabled) and to correctly locate the first FPDU.
+
+   MPA, and any TCP enhancements for MPA are enabled by the ULP in both
+   directions at once at an endpoint.
+
+
+
+Culley, et al.              Standards Track                    [Page 24]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   This can be accomplished several ways, and is left up to DDP's ULP:
+
+   *   DDP's ULP MAY require DDP on MPA startup immediately after TCP
+       connection setup.  This has the advantage that no streaming mode
+       negotiation is needed.  An example of such a protocol is shown in
+       Figure 10: Example Immediate Startup negotiation.
+
+       This may be accomplished by using a well-known port, or a service
+       locator protocol to locate an appropriate port on which DDP on
+       MPA is expected to operate.
+
+   *   DDP's ULP MAY negotiate the start of DDP on MPA sometime after a
+       normal TCP startup, using TCP streaming data exchanges on the
+       same connection.  The exchange establishes that DDP on MPA (as
+       well as other ULPs) will be used, and exactly locates the point
+       in the octet stream where MPA is to begin operation.  Note that
+       such a negotiation protocol is outside the scope of this
+       specification.  A simplified example of such a protocol is shown
+       in Figure 9: Example Delayed Startup negotiation on page 33.
+
+   An MPA endpoint operates in two distinct phases.
+
+   The Startup Phase is used to verify correct MPA setup, exchange CRC
+   and Marker configuration, and optionally pass Private Data between
+   endpoints prior to completing a DDP connection.  During this phase,
+   specifically formatted frames are exchanged as TCP byte streams
+   without using CRCs or Markers.  During this phase a DDP endpoint need
+   not be "bound" to the MPA connection.  In fact, the choice of DDP
+   endpoint and its operating parameters may not be known until the
+   Consumer supplied Private Data (if any) has been examined by the
+   Consumer.
+
+   The second distinct phase is Full Operation during which FPDUs are
+   sent using all the rules that pertain (CRCs, Markers, MULPDU
+   restrictions, etc.).  A DDP endpoint MUST be "bound" to the MPA
+   connection at entry to this phase.
+
+   When Private Data is passed between ULPs in the Startup Phase, the
+   ULP is responsible for interpreting that data, and then placing MPA
+   into Full Operation.
+
+   Note: The following text differentiates the two endpoints by calling
+       them Initiator and Responder.  This is quite arbitrary and is NOT
+       related to the TCP startup (SYN, SYN/ACK sequence).  The
+       Initiator is the side that sends first in the MPA startup
+       sequence (the MPA Request Frame).
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 25]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   Note: The possibility that both endpoints would be allowed to make a
+       connection at the same time, sometimes called an active/active
+       connection, was considered by the work group and rejected.  There
+       were several motivations for this decision.  One was that
+       applications needing this facility were few (none other than
+       theoretical at the time of this document).  Another was that the
+       facility created some implementation difficulties, particularly
+       with the "dual stack" designs described later on.  A last issue
+       was that dealing with rejected connections at startup would have
+       required at least an additional frame type, and more recovery
+       actions, complicating the protocol.  While none of these issues
+       was overwhelming, the group and implementers were not motivated
+       to do the work to resolve these issues.  The protocol includes a
+       method of detecting these active/active startup attempts so that
+       they can be rejected and an error reported.
+
+   The ULP is responsible for determining which side is Initiator or
+   Responder.  For client/server type ULPs, this is easy.  For peer-peer
+   ULPs (which might utilize a TCP style active/active startup), some
+   mechanism (not defined by this specification) must be established, or
+   some streaming mode data exchanged prior to MPA startup to determine
+   which side starts in Initiator and which starts in Responder MPA
+   mode.
+
+7.1.1  MPA Request and Reply Frame Format
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   0  |                                                               |
+      +         Key (16 bytes containing "MPA ID Req Frame")          +
+   4  |      (4D 50 41 20 49 44 20 52 65 71 20 46 72 61 6D 65)        |
+      +         Or  (16 bytes containing "MPA ID Rep Frame")          +
+   8  |      (4D 50 41 20 49 44 20 52 65 70 20 46 72 61 6D 65)        |
+      +                                                               +
+   12 |                                                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   16 |M|C|R| Res     |     Rev       |          PD_Length            |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                                                               |
+      ~                                                               ~
+      ~                   Private Data                                ~
+      |                                                               |
+      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+      |                               |
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+                     Figure 8: MPA Request/Reply Frame
+
+
+
+Culley, et al.              Standards Track                    [Page 26]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   Key: This field contains the "key" used to validate that the sender
+       is an MPA sender.  Initiator mode senders MUST set this field to
+       the fixed value "MPA ID Req Frame" or (in byte order) 4D 50 41 20
+       49 44 20 52 65 71 20 46 72 61 6D 65 (in hexadecimal).  Responder
+       mode receivers MUST check this field for the same value, and
+       close the connection and report an error locally if any other
+       value is detected.  Responder mode senders MUST set this field to
+       the fixed value "MPA ID Rep Frame" or (in byte order) 4D 50 41 20
+       49 44 20 52 65 70 20 46 72 61 6D 65 (in hexadecimal).  Initiator
+       mode receivers MUST check this field for the same value, and
+       close the connection and report an error locally if any other
+       value is detected.
+
+   M: This bit declares an endpoint's REQUIRED Marker usage.  When this
+       bit is '1' in an MPA Request Frame, the Initiator declares that
+       Markers are REQUIRED in FPDUs sent from the Responder.  When set
+       to '1' in an MPA Reply Frame, this bit declares that Markers are
+       REQUIRED in FPDUs sent from the Initiator.  When in a received
+       MPA Request Frame or MPA Reply Frame and the value is '0',
+       Markers MUST NOT be added to the data stream by that endpoint.
+       When '1' Markers MUST be added as described in Section 4.3, MPA
+       Markers.
+
+   C: This bit declares an endpoint's preferred CRC usage.  When this
+       field is '0' in the MPA Request Frame and the MPA Reply Frame,
+       CRCs MUST not be checked and need not be generated by either
+       endpoint.  When this bit is '1' in either the MPA Request Frame
+       or MPA Reply Frame, CRCs MUST be generated and checked by both
+       endpoints.  Note that even when not in use, the CRC field remains
+       present in the FPDU.  When CRCs are not in use, the CRC field
+       MUST be considered valid for FPDU checking regardless of its
+       contents.
+
+   R: This bit is set to zero, and not checked on reception in the MPA
+       Request Frame.  In the MPA Reply Frame, this bit is the Rejected
+       Connection bit, set by the Responders ULP to indicate acceptance
+       '0', or rejection '1', of the connection parameters provided in
+       the Private Data.
+
+   Res: This field is reserved for future use.  It MUST be set to zero
+       when sending, and not checked on reception.
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 27]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   Rev: This field contains the revision of MPA.  For this version of
+       the specification, senders MUST set this field to one.  MPA
+       receivers compliant with this version of the specification MUST
+       check this field.  If the MPA receiver cannot interoperate with
+       the received version, then it MUST close the connection and
+       report an error locally.  Otherwise, the MPA receiver should
+       report the received version to the ULP.
+
+   PD_Length: This field MUST contain the length in octets of the
+       Private Data field.  A value of zero indicates that there is no
+       Private Data field present at all.  If the receiver detects that
+       the PD_Length field does not match the length of the Private Data
+       field, or if the length of the Private Data field exceeds 512
+       octets, the receiver MUST close the connection and report an
+       error locally.  Otherwise, the MPA receiver should pass the
+       PD_Length value and Private Data to the ULP.
+
+   Private Data: This field may contain any value defined by ULPs or may
+       not be present.  The Private Data field MUST be between 0 and 512
+       octets in length.  ULPs define how to size, set, and validate
+       this field within these limits.  Private Data usage is further
+       discussed in Section 7.1.4.
+
+7.1.2.  Connection Startup Rules
+
+   The following rules apply to MPA connection Startup Phase:
+
+   1.  When MPA is started in the Initiator mode, the MPA implementation
+       MUST send a valid MPA Request Frame.  The MPA Request Frame MAY
+       include ULP-supplied Private Data.
+
+   2.  When MPA is started in the Responder mode, the MPA implementation
+       MUST wait until an MPA Request Frame is received and validated
+       before entering Full MPA/DDP Operation.
+
+       If the MPA Request Frame is improperly formatted, the
+       implementation MUST close the TCP connection and exit MPA.
+
+       If the MPA Request Frame is properly formatted but the Private
+       Data is not acceptable, the implementation SHOULD return an MPA
+       Reply Frame with the Rejected Connection bit set to '1'; the MPA
+       Reply Frame MAY include ULP-supplied Private Data; the
+       implementation MUST exit MPA, leaving the TCP connection open.
+       The ULP may close TCP or use the connection for other purposes.
+
+       If the MPA Request Frame is properly formatted and the Private
+       Data is acceptable, the implementation SHOULD return an MPA Reply
+       Frame with the Rejected Connection bit set to '0'; the MPA Reply
+
+
+
+Culley, et al.              Standards Track                    [Page 28]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+       Frame MAY include ULP-supplied Private Data; and the Responder
+       SHOULD prepare to interpret any data received as FPDUs and pass
+       any received ULPDUs to DDP.
+
+       Note: Since the receiver's ability to deal with Markers is
+           unknown until the Request and Reply Frames have been
+           received, sending FPDUs before this occurs is not possible.
+
+
+       Note: The requirement to wait on a Request Frame before sending a
+           Reply Frame is a design choice.  It makes for a well-ordered
+           sequence of events at each end, and avoids having to specify
+           how to deal with situations where both ends start at the same
+           time.
+
+   3.  MPA Initiator mode implementations MUST receive and validate an
+       MPA Reply Frame.
+
+       If the MPA Reply Frame is improperly formatted, the
+       implementation MUST close the TCP connection and exit MPA.
+
+       If the MPA Reply Frame is properly formatted but is the Private
+       Data is not acceptable, or if the Rejected Connection bit is set
+       to '1', the implementation MUST exit MPA, leaving the TCP
+       connection open.  The ULP may close TCP or use the connection for
+       other purposes.
+
+       If the MPA Reply Frame is properly formatted and the Private Data
+       is acceptable, and the Reject Connection bit is set to '0', the
+       implementation SHOULD enter Full MPA/DDP Operation Phase;
+       interpreting any received data as FPDUs and sending DDP ULPDUs as
+       FPDUs.
+
+   4.  MPA Responder mode implementations MUST receive and validate at
+       least one FPDU before sending any FPDUs or Markers.
+
+       Note: This requirement is present to allow the Initiator time to
+           get its receiver into Full Operation before an FPDU arrives,
+           avoiding potential race conditions at the Initiator.  This
+           was also subject to some debate in the work group before
+           rough consensus was reached.  Eliminating this requirement
+           would allow faster startup in some types of applications.
+           However, that would also make certain implementations
+           (particularly "dual stack") much harder.
+
+   5.  If a received "Key" does not match the expected value (see
+       Section 7.1.1, MPA Request and Reply Frame Format) the TCP/DDP
+       connection MUST be closed, and an error returned to the ULP.
+
+
+
+Culley, et al.              Standards Track                    [Page 29]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   6.  The received Private Data fields may be used by Consumers at
+       either end to further validate the connection and set up DDP or
+       other ULP parameters.  The Initiator ULP MAY close the
+       TCP/MPA/DDP connection as a result of validating the Private Data
+       fields.  The Responder SHOULD return an MPA Reply Frame with the
+       "Reject Connection" bit set to '1' if the validation of the
+       Private Data is not acceptable to the ULP.
+
+   7.  When the first FPDU is to be sent, then if Markers are enabled,
+       the first octets sent are the special Marker 0x00000000, followed
+       by the start of the FPDU (the FPDU's ULPDU Length field).  If
+       Markers are not enabled, the first octets sent are the start of
+       the FPDU (the FPDU's ULPDU Length field).
+
+   8.  MPA implementations MUST use the difference between the MPA
+       Request Frame and the MPA Reply Frame to check for incorrect
+       "Initiator/Initiator" startups.  Implementations SHOULD put a
+       timeout on waiting for the MPA Request Frame when started in
+       Responder mode, to detect incorrect "Responder/Responder"
+       startups.
+
+   9.  MPA implementations MUST validate the PD_Length field.  The
+       buffer that receives the Private Data field MUST be large enough
+       to receive that data; the amount of Private Data MUST not exceed
+       the PD_Length or the application buffer.  If any of the above
+       fails, the startup frame MUST be considered improperly formatted.
+
+   10. MPA implementations SHOULD implement a reasonable timeout while
+       waiting for the entire set of startup frames; this prevents
+       certain denial-of-service attacks.  ULPs SHOULD implement a
+       reasonable timeout while waiting for FPDUs, ULPDUs, and
+       application level messages to guard against application failures
+       and certain denial-of-service attacks.
+
+7.1.3.  Example Delayed Startup Sequence
+
+   A variety of startup sequences are possible when using MPA on TCP.
+   Following is an example of an MPA/DDP startup that occurs after TCP
+   has been running for a while and has exchanged some amount of
+   streaming data.  This example does not use any Private Data (an
+   example that does is shown later in Section 7.1.4.2, Example
+   Immediate Startup Using Private Data), although it is perfectly legal
+   to include the Private Data.  Note that since the example does not
+   use any Private Data, there are no ULP interactions shown between
+   receiving "startup frames" and putting MPA into Full Operation.
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 30]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+         Initiator                                 Responder
+
+  +---------------------------+
+  |ULP streaming mode         |
+  |  <Hello> request to       |
+  |  transition to DDP/MPA    |           +---------------------------+
+  |  mode (optional).         | --------> |ULP gets request;          |
+  +---------------------------+           |  enables MPA Responder    |
+                                          |  mode with last (optional)|
+                                          |  streaming mode           |
+                                          |  <Hello Ack> for MPA to   |
+                                          |  send.                    |
+  +---------------------------+           |MPA waits for incoming     |
+  |ULP receives streaming     | <-------- |  <MPA Request Frame>.     |
+  |  <Hello Ack>;             |           +---------------------------+
+  |Enters MPA Initiator mode; |
+  |MPA sends                  |
+  |  <MPA Request Frame>;     |
+  |MPA waits for incoming     |           +---------------------------+
+  |  <MPA Reply Frame>.       | - - - - > |MPA receives               |
+  +---------------------------+           |  <MPA Request Frame>.     |
+                                          |Consumer binds DDP to MPA; |
+                                          |MPA sends the              |
+                                          |  <MPA Reply Frame>.       |
+                                          |DDP/MPA enables FPDU       |
+  +---------------------------+           |  decoding, but does not   |
+  |MPA receives the           | < - - - - |  send any FPDUs.          |
+  |  <MPA Reply Frame>        |           +---------------------------+
+  |Consumer binds DDP to MPA; |
+  |DDP/MPA begins Full        |
+  |  Operation.               |
+  |MPA sends first FPDU (as   |           +---------------------------+
+  |  DDP ULPDUs become        | ========> |MPA receives first FPDU.   |
+  |  available).              |           |MPA sends first FPDU (as   |
+  +---------------------------+           |  DDP ULPDUs become        |
+                                  <====== |  available).              |
+                                          +---------------------------+
+
+              Figure 9: Example Delayed Startup Negotiation
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 31]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   An example Delayed Startup sequence is described below:
+
+       *   Active and passive sides start up a TCP connection in the
+           usual fashion, probably using sockets APIs.  They exchange
+           some amount of streaming mode data.  At some point, one side
+           (the MPA Initiator) sends streaming mode data that
+           effectively says "Hello, let's go into MPA/DDP mode".
+
+   *   When the remote side (the MPA Responder) gets this streaming mode
+       message, the Consumer would send a last streaming mode message
+       that effectively says "I acknowledge your Hello, and am now in
+       MPA Responder mode".  The exchange of these messages establishes
+       the exact point in the TCP stream where MPA is enabled.  The
+       Responding Consumer enables MPA in the Responder mode and waits
+       for the initial MPA startup message.
+
+       *   The Initiating Consumer would enable MPA startup in the
+           Initiator mode which then sends the MPA Request Frame.  It is
+           assumed that no Private Data messages are needed for this
+           example, although it is possible to do so.  The Initiating
+           MPA (and Consumer) would also wait for the MPA connection to
+           be accepted.
+
+   *   The Responding MPA would receive the initial MPA Request Frame
+       and would inform the Consumer that this message arrived.  The
+       Consumer can then accept the MPA/DDP connection or close the TCP
+       connection.
+
+   *   To accept the connection request, the Responding Consumer would
+       use an appropriate API to bind the TCP/MPA connections to a DDP
+       endpoint, thus enabling MPA/DDP into Full Operation.  In the
+       process of going to Full Operation, MPA sends the MPA Reply
+       Frame.  MPA/DDP waits for the first incoming FPDU before sending
+       any FPDUs.
+
+   *   If the initial TCP data was not a properly formatted MPA Request
+       Frame, MPA will close or reset the TCP connection immediately.
+
+       *   The Initiating MPA would receive the MPA Reply Frame and
+           would report this message to the Consumer.  The Consumer can
+           then accept the MPA/DDP connection, or close or reset the TCP
+           connection to abort the process.
+
+       *   On determining that the connection is acceptable, the
+           Initiating Consumer would use an appropriate API to bind the
+           TCP/MPA connections to a DDP endpoint thus enabling MPA/DDP
+           into Full Operation.  MPA/DDP would begin sending DDP
+           messages as MPA FPDUs.
+
+
+
+Culley, et al.              Standards Track                    [Page 32]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+7.1.4.  Use of Private Data
+
+   This section is advisory in nature, in that it suggests a method by
+   which a ULP can deal with pre-DDP connection information exchange.
+
+7.1.4.1.  Motivation
+
+   Prior RDMA protocols have been developed that provide Private Data
+   via out-of-band mechanisms.  As a result, many applications now
+   expect some form of Private Data to be available for application use
+   prior to setting up the DDP/RDMA connection.  Following are some
+   examples of the use of Private Data.
+
+   An RDMA endpoint (referred to as a Queue Pair, or QP, in InfiniBand
+   and the [VERBS-RDMA]) must be associated with a Protection Domain.
+   No receive operations may be posted to the endpoint before it is
+   associated with a Protection Domain.  Indeed under both the
+   InfiniBand and proposed RDMA/DDP verbs [VERBS-RDMA] an endpoint/QP is
+   created within a Protection Domain.
+
+   There are some applications where the choice of Protection Domain is
+   dependent upon the identity of the remote ULP client.  For example,
+   if a user session requires multiple connections, it is highly
+   desirable for all of those connections to use a single Protection
+   Domain.  Note: Use of Protection Domains is further discussed in
+   [RDMASEC].
+
+   InfiniBand, the DAT APIs [DAT-API], and the IT-API [IT-API] all
+   provide for the active-side ULP to provide Private Data when
+   requesting a connection.  This data is passed to the ULP to allow it
+   to determine whether to accept the connection, and if so with which
+   endpoint (and implicitly which Protection Domain).
+
+   The Private Data can also be used to ensure that both ends of the
+   connection have configured their RDMA endpoints compatibly on such
+   matters as the RDMA Read capacity (see [RDMAP]).  Further ULP-
+   specific uses are also presumed, such as establishing the identity of
+   the client.
+
+   Private Data is also allowed for when accepting the connection, to
+   allow completion of any negotiation on RDMA resources and for other
+   ULP reasons.
+
+   There are several potential ways to exchange this Private Data.  For
+   example, the InfiniBand specification includes a connection
+   management protocol that allows a small amount of Private Data to be
+   exchanged using datagrams before actually starting the RDMA
+   connection.
+
+
+
+Culley, et al.              Standards Track                    [Page 33]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   This document allows for small amounts of Private Data to be
+   exchanged as part of the MPA startup sequence.  The actual Private
+   Data fields are carried in the MPA Request Frame and the MPA Reply
+   Frame.
+
+   If larger amounts of Private Data or more negotiation is necessary,
+   TCP streaming mode messages may be exchanged prior to enabling MPA.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 34]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+7.1.4.2.  Example Immediate Startup Using Private Data
+
+          Initiator                                 Responder
+
+   +---------------------------+
+   |TCP SYN sent.              |           +--------------------------+
+   +---------------------------+ --------> |TCP gets SYN packet;      |
+   +---------------------------+           |  sends SYN-Ack.          |
+   |TCP gets SYN-Ack           | <-------- +--------------------------+
+   |  sends Ack.               |
+   +---------------------------+ --------> +--------------------------+
+   +---------------------------+           |Consumer enables MPA      |
+   |Consumer enables MPA       |           |Responder mode, waits for |
+   |Initiator mode with        |           |  <MPA Request frame>.    |
+   |Private Data; MPA sends    |           +--------------------------+
+   |  <MPA Request Frame>;     |
+   |MPA waits for incoming     |           +--------------------------+
+   |  <MPA Reply Frame>.       | - - - - > |MPA receives              |
+   +---------------------------+           |  <MPA Request Frame>.    |
+                                           |Consumer examines Private |
+                                           |Data, provides MPA with   |
+                                           |return Private Data,      |
+                                           |binds DDP to MPA, and     |
+                                           |enables MPA to send an    |
+                                           |  <MPA Reply Frame>.      |
+                                           |DDP/MPA enables FPDU      |
+   +---------------------------+           |decoding, but does not    |
+   |MPA receives the           | < - - - - |send any FPDUs.           |
+   |  <MPA Reply Frame>.       |           +--------------------------+
+   |Consumer examines Private  |
+   |Data, binds DDP to MPA,    |
+   |and enables DDP/MPA to     |
+   |begin Full Operation.      |
+   |MPA sends first FPDU (as   |           +--------------------------+
+   |DDP ULPDUs become          | ========> |MPA receives first FPDU.  |
+   |available).                |           |MPA sends first FPDU (as  |
+   +---------------------------+           |DDP ULPDUs become         |
+                                   <====== |available).               |
+                                           +--------------------------+
+
+             Figure 10: Example Immediate Startup Negotiation
+
+   Note: The exact order of when MPA is started in the TCP connection
+       sequence is implementation dependent; the above diagram shows one
+       possible sequence.  Also, the Initiator "Ack" to the Responder's
+       "SYN-Ack" may be combined into the same TCP segment containing
+       the MPA Request Frame (as is allowed by TCP RFCs).
+
+
+
+
+Culley, et al.              Standards Track                    [Page 35]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   The example immediate startup sequence is described below:
+
+   *   The passive side (Responding Consumer) would listen on the TCP
+       destination port, to indicate its readiness to accept a
+       connection.
+
+       *   The active side (Initiating Consumer) would request a
+           connection from a TCP endpoint (that expected to upgrade to
+           MPA/DDP/RDMA and expected the Private Data) to a destination
+           address and port.
+
+       *   The Initiating Consumer would initiate a TCP connection to
+           the destination port.  Acceptance/rejection of the connection
+           would proceed as per normal TCP connection establishment.
+
+   *   The passive side (Responding Consumer) would receive the TCP
+       connection request as usual allowing normal TCP gatekeepers, such
+       as INETD and TCPserver, to exercise their normal
+       safeguard/logging functions.  On acceptance of the TCP
+       connection, the Responding Consumer would enable MPA in the
+       Responder mode and wait for the initial MPA startup message.
+
+       *   The Initiating Consumer would enable MPA startup in the
+           Initiator mode to send an initial MPA Request Frame with its
+           included Private Data message to send.  The Initiating MPA
+           (and Consumer) would also wait for the MPA connection to be
+           accepted, and any returned Private Data.
+
+   *   The Responding MPA would receive the initial MPA Request Frame
+       with the Private Data message and would pass the Private Data
+       through to the Consumer.  The Consumer can then accept the
+       MPA/DDP connection, close the TCP connection, or reject the MPA
+       connection with a return message.
+
+   *   To accept the connection request, the Responding Consumer would
+       use an appropriate API to bind the TCP/MPA connections to a DDP
+       endpoint, thus enabling MPA/DDP into Full Operation.  In the
+       process of going to Full Operation, MPA sends the MPA Reply
+       Frame, which includes the Consumer-supplied Private Data
+       containing any appropriate Consumer response.  MPA/DDP waits for
+       the first incoming FPDU before sending any FPDUs.
+
+   *   If the initial TCP data was not a properly formatted MPA Request
+       Frame, MPA will close or reset the TCP connection immediately.
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 36]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   *   To reject the MPA connection request, the Responding Consumer
+       would send an MPA Reply Frame with any ULP-supplied Private Data
+       (with reason for rejection), with the "Rejected Connection" bit
+       set to '1', and may close the TCP connection.
+
+       *   The Initiating MPA would receive the MPA Reply Frame with the
+           Private Data message and would report this message to the
+           Consumer, including the supplied Private Data.
+
+           If the "Rejected Connection" bit is set to a '1', MPA will
+           close the TCP connection and exit.
+
+           If the "Rejected Connection" bit is set to a '0', and on
+           determining from the MPA Reply Frame Private Data that the
+           connection is acceptable, the Initiating Consumer would use
+           an appropriate API to bind the TCP/MPA connections to a DDP
+           endpoint thus enabling MPA/DDP into Full Operation.  MPA/DDP
+           would begin sending DDP messages as MPA FPDUs.
+
+7.1.5.  "Dual Stack" Implementations
+
+   MPA/DDP implementations are commonly expected to be implemented as
+   part of a "dual stack" architecture.  One stack is the traditional
+   TCP stack, usually with a sockets interface API (Application
+   Programming Interface).  The second stack is the MPA/DDP stack with
+   its own API, and potentially separate code or hardware to deal with
+   the MPA/DDP data.  Of course, implementations may vary, so the
+   following comments are of an advisory nature only.
+
+   The use of the two stacks offers advantages:
+
+       TCP connection setup is usually done with the TCP stack.  This
+       allows use of the usual naming and addressing mechanisms.  It
+       also means that any mechanisms used to "harden" the connection
+       setup against security threats are also used when starting
+       MPA/DDP.
+
+       Some applications may have been originally designed for TCP, but
+       are "enhanced" to utilize MPA/DDP after a negotiation reveals the
+       capability to do so.  The negotiation process takes place in
+       TCP's streaming mode, using the usual TCP APIs.
+
+       Some new applications, designed for RDMA or DDP, still need to
+       exchange some data prior to starting MPA/DDP.  This exchange can
+       be of arbitrary length or complexity, but often consists of only
+       a small amount of Private Data, perhaps only a single message.
+       Using the TCP streaming mode for this exchange allows this to be
+       done using well-understood methods.
+
+
+
+Culley, et al.              Standards Track                    [Page 37]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   The main disadvantage of using two stacks is the conversion of an
+   active TCP connection between them.  This process must be done with
+   care to prevent loss of data.
+
+   To avoid some of the problems when using a "dual stack" architecture,
+   the following additional restrictions may be required by the
+   implementation:
+
+   1.  Enabling the DDP/MPA stack SHOULD be done only when no incoming
+       stream data is expected.  This is typically managed by the ULP
+       protocol.  When following the recommended startup sequence, the
+       Responder side enters DDP/MPA mode, sends the last streaming mode
+       data, and then waits for the MPA Request Frame.  No additional
+       streaming mode data is expected.  The Initiator side ULP receives
+       the last streaming mode data, and then enters DDP/MPA mode.
+       Again, no additional streaming mode data is expected.
+
+   2.  The DDP/MPA MAY provide the ability to send a "last streaming
+       message" as part of its Responder DDP/MPA enable function.  This
+       allows the DDP/MPA stack to more easily manage the conversion to
+       DDP/MPA mode (and avoid problems with a very fast return of the
+       MPA Request Frame from the Initiator side).
+
+   Note: Regardless of the "stack" architecture used, TCP's rules MUST
+       be followed.  For example, if network data is lost, re-segmented,
+       or re-ordered, TCP MUST recover appropriately even when this
+       occurs while switching stacks.
+
+7.2.  Normal Connection Teardown
+
+   Each half connection of MPA terminates when DDP closes the
+   corresponding TCP half connection.
+
+   A mechanism SHOULD be provided by MPA to DDP for DDP to be made aware
+   that a graceful close of the TCP connection has been received by the
+   TCP (e.g., FIN is received).
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 38]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+8.  Error Semantics
+
+   The following errors MUST be detected by MPA and the codes SHOULD be
+   provided to DDP or other Consumer:
+
+   Code Error
+
+   1   TCP connection closed, terminated, or lost.  This includes lost
+       by timeout, too many retries, RST received, or FIN received.
+
+   2   Received MPA CRC does not match the calculated value for the
+       FPDU.
+
+   3   In the event that the CRC is valid, received MPA Marker (if
+       enabled) and ULPDU Length fields do not agree on the start of an
+       FPDU.  If the FPDU start determined from previous ULPDU Length
+       fields does not match with the MPA Marker position, MPA SHOULD
+       deliver an error to DDP.  It may not be possible to make this
+       check as a segment arrives, but the check SHOULD be made when a
+       gap creating an out-of-order sequence is closed and any time a
+       Marker points to an already identified FPDU.  It is OPTIONAL for
+       a receiver to check each Marker, if multiple Markers are present
+       in an FPDU, or if the segment is received in order.
+
+   4   Invalid MPA Request Frame or MPA Response Frame received.  In
+       this case, the TCP connection MUST be immediately closed.  DDP
+       and other ULPs should treat this similar to code 1, above.
+
+   When conditions 2 or 3 above are detected, an optimized MPA/TCP
+   implementation MAY choose to silently drop the TCP segment rather
+   than reporting the error to DDP.  In this case, the sending TCP will
+   retry the segment, usually correcting the error, unless the problem
+   was at the source.  In that case, the source will usually exceed the
+   number of retries and terminate the connection.
+
+   Once MPA delivers an error of any type, it MUST NOT pass or deliver
+   any additional FPDUs on that half connection.
+
+   For Error codes 2 and 3, MPA MUST NOT close the TCP connection
+   following a reported error.  Closing the connection is the
+   responsibility of DDP's ULP.
+
+       Note that since MPA will not Deliver any FPDUs on a half
+       connection following an error detected on the receive side of
+       that connection, DDP's ULP is expected to tear down the
+       connection.  This may not occur until after one or more last
+       messages are transmitted on the opposite half connection.  This
+       allows a diagnostic error message to be sent.
+
+
+
+Culley, et al.              Standards Track                    [Page 39]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+9.  Security Considerations
+
+   This section discusses the security considerations for MPA.
+
+9.1.  Protocol-Specific Security Considerations
+
+   The vulnerabilities of MPA to third-party attacks are no greater than
+   any other protocol running over TCP.  A third party, by sending
+   packets into the network that are delivered to an MPA receiver, could
+   launch a variety of attacks that take advantage of how MPA operates.
+   For example, a third party could send random packets that are valid
+   for TCP, but contain no FPDU headers.  An MPA receiver reports an
+   error to DDP when any packet arrives that cannot be validated as an
+   FPDU when properly located on an FPDU boundary.  A third party could
+   also send packets that are valid for TCP, MPA, and DDP, but do not
+   target valid buffers.  These types of attacks ultimately result in
+   loss of connection and thus become a type of DOS (Denial Of Service)
+   attack.  Communication security mechanisms such as IPsec [RFC2401,
+   RFC4301] may be used to prevent such attacks.
+
+   Independent of how MPA operates, a third party could use ICMP
+   messages to reduce the path MTU to such a small size that performance
+   would likewise be severely impacted.  Range checking on path MTU
+   sizes in ICMP packets may be used to prevent such attacks.
+
+   [RDMAP] and [DDP] are used to control, read, and write data buffers
+   over IP networks.  Therefore, the control and the data packets of
+   these protocols are vulnerable to the spoofing, tampering, and
+   information disclosure attacks listed below.  In addition, connection
+   to/from an unauthorized or unauthenticated endpoint is a potential
+   problem with most applications using RDMA, DDP, and MPA.
+
+9.1.1.  Spoofing
+
+   Spoofing attacks can be launched by the Remote Peer or by a network
+   based attacker.  A network-based spoofing attack applies to all
+   Remote Peers.  Because the MPA Stream requires a TCP Stream in the
+   ESTABLISHED state, certain types of traditional forms of wire attacks
+   do not apply -- an end-to-end handshake must have occurred to
+   establish the MPA Stream.  So, the only form of spoofing that applies
+   is one when a remote node can both send and receive packets.  Yet
+   even with this limitation the Stream is still exposed to the
+   following spoofing attacks.
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 40]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+9.1.1.1.  Impersonation
+
+   A network-based attacker can impersonate a legal MPA/DDP/RDMAP peer
+   (by spoofing a legal IP address) and establish an MPA/DDP/RDMAP
+   Stream with the victim.  End-to-end authentication (i.e., IPsec or
+   ULP authentication) provides protection against this attack.
+
+9.1.1.2.  Stream Hijacking
+
+   Stream hijacking happens when a network-based attacker follows the
+   Stream establishment phase, and waits until the authentication phase
+   (if such a phase exists) is completed successfully.  He can then
+   spoof the IP address and redirect the Stream from the victim to its
+   own machine.  For example, an attacker can wait until an iSCSI
+   authentication is completed successfully, and hijack the iSCSI
+   Stream.
+
+   The best protection against this form of attack is end-to-end
+   integrity protection and authentication, such as IPsec, to prevent
+   spoofing.  Another option is to provide physical security.
+   Discussion of physical security is out of scope for this document.
+
+9.1.1.3.  Man-in-the-Middle Attack
+
+   If a network-based attacker has the ability to delete, inject,
+   replay, or modify packets that will still be accepted by MPA (e.g.,
+   TCP sequence number is correct, FPDU is valid, etc.), then the Stream
+   can be exposed to a man-in-the-middle attack.  The attacker could
+   potentially use the services of [DDP] and [RDMAP] to read the
+   contents of the associated Data Buffer, to modify the contents of the
+   associated Data Buffer, or to disable further access to the buffer.
+   Other attacks on the connection setup sequence and even on TCP can be
+   used to cause denial of service.  The only countermeasure for this
+   form of attack is to either secure the MPA/DDP/RDMAP Stream (i.e.,
+   integrity protect) or attempt to provide physical security to prevent
+   man-in-the-middle type attacks.
+
+   The best protection against this form of attack is end-to-end
+   integrity protection and authentication, such as IPsec, to prevent
+   spoofing or tampering.  If Stream or session level authentication and
+   integrity protection are not used, then a man-in-the-middle attack
+   can occur, enabling spoofing and tampering.
+
+   Another approach is to restrict access to only the local subnet/link
+   and provide some mechanism to limit access, such as physical security
+   or 802.1.x.  This model is an extremely limited deployment scenario
+   and will not be further examined here.
+
+
+
+
+Culley, et al.              Standards Track                    [Page 41]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+9.1.2.  Eavesdropping
+
+   Generally speaking, Stream confidentiality protects against
+   eavesdropping.  Stream and/or session authentication and integrity
+   protection are a counter measurement against various spoofing and
+   tampering attacks.  The effectiveness of authentication and integrity
+   against a specific attack depend on whether the authentication is
+   machine-level authentication (as the one provided by IPsec) or ULP
+   authentication.
+
+9.2.  Introduction to Security Options
+
+   The following security services can be applied to an MPA/DDP/RDMAP
+   Stream:
+
+   1.  Session confidentiality - protects against eavesdropping.
+
+   2.  Per-packet data source authentication - protects against the
+       following spoofing attacks: network-based impersonation, Stream
+       hijacking, and man in the middle.
+
+   3.  Per-packet integrity - protects against tampering done by
+       network-based modification of FPDUs (indirectly affecting buffer
+       content through DDP services).
+
+   4.  Packet sequencing - protects against replay attacks, which is a
+       special case of the above tampering attack.
+
+   If an MPA/DDP/RDMAP Stream may be subject to impersonation attacks,
+   or Stream hijacking attacks, it is recommended that the Stream be
+   authenticated, integrity protected, and protected from replay
+   attacks.  It may use confidentiality protection to protect from
+   eavesdropping (in case the MPA/DDP/RDMAP Stream traverses a public
+   network).
+
+   IPsec is capable of providing the above security services for IP and
+   TCP traffic.
+
+   ULP protocols may be able to provide part of the above security
+   services.  See [NFSv4CHAN] for additional information on a promising
+   approach called "channel binding".  From [NFSv4CHAN]:
+
+       "The concept of channel bindings allows applications to prove
+       that the end-points of two secure channels at different network
+       layers are the same by binding authentication at one channel to
+       the session protection at the other channel.  The use of channel
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 42]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+       bindings allows applications to delegate session protection to
+       lower layers, which may significantly improve performance for
+       some applications."
+
+9.3.  Using IPsec with MPA
+
+   IPsec can be used to protect against the packet injection attacks
+   outlined above.  Because IPsec is designed to secure individual IP
+   packets, MPA can run above IPsec without change.  IPsec packets are
+   processed (e.g., integrity checked and decrypted) in the order they
+   are received, and an MPA receiver will process the decrypted FPDUs
+   contained in these packets in the same manner as FPDUs contained in
+   unsecured IP packets.
+
+   MPA implementations MUST implement IPsec as described in Section 9.4
+   below.  The use of IPsec is up to ULPs and administrators.
+
+9.4.  Requirements for IPsec Encapsulation of MPA/DDP
+
+   The IP Storage working group has spent significant time and effort to
+   define the normative IPsec requirements for IP storage [RFC3723].
+   Portions of that specification are applicable to a wide variety of
+   protocols, including the RDDP protocol suite.  In order not to
+   replicate this effort, an MPA on TCP implementation MUST follow the
+   requirements defined in RFC 3723, Sections 2.3 and 5, including the
+   associated normative references for those sections.
+
+   Additionally, since IPsec acceleration hardware may only be able to
+   handle a limited number of active Internet Key Exchange Protocol
+   (IKE) Phase 2 security associations (SAs), Phase 2 delete messages
+   MAY be sent for idle SAs, as a means of keeping the number of active
+   Phase 2 SAs to a minimum.  The receipt of an IKE Phase 2 delete
+   message MUST NOT be interpreted as a reason for tearing down a
+   DDP/RDMA Stream.  Rather, it is preferable to leave the Stream up,
+   and if additional traffic is sent on it, to bring up another IKE
+   Phase 2 SA to protect it.  This avoids the potential for continually
+   bringing Streams up and down.
+
+   The IPsec requirements for RDDP are based on the version of IPsec
+   specified in RFC 2401 [RFC2401] and related RFCs, as profiled by RFC
+   3723 [RFC3723], despite the existence of a newer version of IPsec
+   specified in RFC 4301 [RFC4301] and related RFCs.  One of the
+   important early applications of the RDDP protocols is their use with
+   iSCSI [iSER]; RDDP's IPsec requirements follow those of IPsec in
+   order to facilitate that usage by allowing a common profile of IPsec
+   to be used with iSCSI and the RDDP protocols.  In the future, RFC
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 43]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   3723 may be updated to the newer version of IPsec; the IPsec security
+   requirements of any such update should apply uniformly to iSCSI and
+   the RDDP protocols.
+
+   Note that there are serious security issues if IPsec is not
+   implemented end-to-end.  For example, if IPsec is implemented as a
+   tunnel in the middle of the network, any hosts between the peer and
+   the IPsec tunneling device can freely attack the unprotected Stream.
+
+10.  IANA Considerations
+
+   No IANA actions are required by this document.
+
+   If a well-known port is chosen as the mechanism to identify a DDP on
+   MPA on TCP, the well-known port must be registered with IANA.
+   Because the use of the port is DDP specific, registration of the port
+   with IANA is left to DDP.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 44]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+Appendix A.  Optimized MPA-Aware TCP Implementations
+
+   This appendix is for information only and is NOT part of the
+   standard.
+
+   This appendix covers some Optimized MPA-aware TCP implementation
+   guidance to implementers.  It is intended for those implementations
+   that want to send/receive as much traffic as possible in an aligned
+   and zero-copy fashion.
+
+                   +-----------------------------------+
+                   | +-----------+ +-----------------+ |
+                   | | Optimized | | Other Protocols | |
+                   | |  MPA/TCP  | +-----------------+ |
+                   | +-----------+        ||           |
+                   |         \\     --- socket API --- |
+                   |          \\          ||           |
+                   |           \\      +-----+         |
+                   |            \\     | TCP |         |
+                   |             \\    +-----+         |
+                   |              \\    //             |
+                   |             +-------+             |
+                   |             |  IP   |             |
+                   |             +-------+             |
+                   +-----------------------------------+
+
+                Figure 11: Optimized MPA/TCP Implementation
+
+   The diagram above shows a block diagram of a potential
+   implementation.  The network sub-system in the diagram can support
+   traditional sockets-based connections using the normal API as shown
+   on the right side of the diagram.  Connections for DDP/MPA/TCP are
+   run using the facilities shown on the left side of the diagram.
+
+   The DDP/MPA/TCP connections can be started using the facilities shown
+   on the left side using some suitable API, or they can be initiated
+   using the facilities shown on the right side and transitioned to the
+   left side at the point in the connection setup where MPA goes to
+   "Full MPA/DDP Operation Phase" as described in Section 7.1.2.
+
+   The optimized MPA/TCP implementations (left side of diagram and
+   described below) are only applicable to MPA.  All other TCP
+   applications continue to use the standard TCP stacks and interfaces
+   shown in the right side of the diagram.
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 45]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+A.1.  Optimized MPA/TCP Transmitters
+
+   The various TCP RFCs allow considerable choice in segmenting a TCP
+   stream.  In order to optimize FPDU recovery at the MPA receiver, an
+   optimized MPA/TCP implementation uses additional segmentation rules.
+
+   To provide optimum performance, an optimized MPA/TCP transmit side
+   implementation should be enabled to:
+
+   *   With an EMSS large enough to contain the FPDU(s), segment the
+       outgoing TCP stream such that the first octet of every TCP
+       segment begins with an FPDU.  Multiple FPDUs may be packed into a
+       single TCP segment as long as they are entirely contained in the
+       TCP segment.
+
+   *   Report the current EMSS from the TCP to the MPA transmit layer.
+
+   There are exceptions to the above rule.  Once an ULPDU is provided to
+   MPA, the MPA/TCP sender transmits it or fails the connection; it
+   cannot be repudiated.  As a result, during changes in MTU and EMSS,
+   or when TCP's Receive Window size (RWIN) becomes too small, it may be
+   necessary to send FPDUs that do not conform to the segmentation rule
+   above.
+
+   A possible, but less desirable, alternative is to use IP
+   fragmentation on accepted FPDUs to deal with MTU reductions or
+   extremely small EMSS.
+
+   Even when alignment with TCP segments is lost, the sender still
+   formats the FPDU according to FPDU format as shown in Figure 2.
+
+   On a retransmission, TCP does not necessarily preserve original TCP
+   segmentation boundaries.  This can lead to the loss of FPDU Alignment
+   and containment within a TCP segment during TCP retransmissions.  An
+   optimized MPA/TCP sender should try to preserve original TCP
+   segmentation boundaries on a retransmission.
+
+A.2.  Effects of Optimized MPA/TCP Segmentation
+
+   Optimized MPA/TCP senders will fill TCP segments to the EMSS with a
+   single FPDU when a DDP message is large enough.  Since the DDP
+   message may not exactly fit into TCP segments, a "message tail" often
+   occurs that results in an FPDU that is smaller than a single TCP
+   segment.  Additionally, some DDP messages may be considerably shorter
+   than the EMSS.  If a small FPDU is sent in a single TCP segment, the
+   result is a "short" TCP segment.
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 46]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   Applications expected to see strong advantages from Direct Data
+   Placement include transaction-based applications and throughput
+   applications.  Request/response protocols typically send one FPDU per
+   TCP segment and then wait for a response.  Under these conditions,
+   these "short" TCP segments are an appropriate and expected effect of
+   the segmentation.
+
+   Another possibility is that the application might be sending multiple
+   messages (FPDUs) to the same endpoint before waiting for a response.
+   In this case, the segmentation policy would tend to reduce the
+   available connection bandwidth by under-filling the TCP segments.
+
+   Standard TCP implementations often utilize the Nagle [RFC896]
+   algorithm to ensure that segments are filled to the EMSS whenever the
+   round-trip latency is large enough that the source stream can fully
+   fill segments before ACKs arrive.  The algorithm does this by
+   delaying the transmission of TCP segments until a ULP can fill a
+   segment, or until an ACK arrives from the far side.  The algorithm
+   thus allows for smaller segments when latencies are shorter to keep
+   the ULP's end-to-end latency to reasonable levels.
+
+   The Nagle algorithm is not mandatory to use [RFC1122].
+
+   When used with optimized MPA/TCP stacks, Nagle and similar algorithms
+   can result in the "packing" of multiple FPDUs into TCP segments.
+
+   If a "message tail", small DDP messages, or the start of a larger DDP
+   message are available, MPA may pack multiple FPDUs into TCP segments.
+   When this is done, the TCP segments can be more fully utilized, but,
+   due to the size constraints of FPDUs, segments may not be filled to
+   the EMSS.  A dynamic MULPDU that informs DDP of the size of the
+   remaining TCP segment space makes filling the TCP segment more
+   effective.
+
+       Note that MPA receivers do more processing of a TCP segment that
+       contains multiple FPDUs; this may affect the performance of some
+       receiver implementations.
+
+   It is up to the ULP to decide if Nagle is useful with DDP/MPA.  Note
+   that many of the applications expected to take advantage of MPA/DDP
+   prefer to avoid the extra delays caused by Nagle.  In such scenarios,
+   it is anticipated there will be minimal opportunity for packing at
+   the transmitter and receivers may choose to optimize their
+   performance for this anticipated behavior.
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 47]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   Therefore, the application is expected to set TCP parameters such
+   that it can trade off latency and wire efficiency.  Implementations
+   should provide a connection option that disables Nagle for MPA/TCP
+   similar to the way the TCP_NODELAY socket option is provided for a
+   traditional sockets interface.
+
+   When latency is not critical, application is expected to leave Nagle
+   enabled.  In this case, the TCP implementation may pack any available
+   FPDUs into TCP segments so that the segments are filled to the EMSS.
+   If the amount of data available is not enough to fill the TCP segment
+   when it is prepared for transmission, TCP can send the segment partly
+   filled, or use the Nagle algorithm to wait for the ULP to post more
+   data.
+
+A.3.  Optimized MPA/TCP Receivers
+
+   When an MPA receive implementation and the MPA-aware receive side TCP
+   implementation support handling out-of-order ULPDUs, the TCP receive
+   implementation performs the following functions:
+
+   1)  The implementation passes incoming TCP segments to MPA as soon as
+       they have been received and validated, even if not received in
+       order.  The TCP layer commits to keeping each segment before it
+       can be passed to the MPA.  This means that the segment must have
+       passed the TCP, IP, and lower layer data integrity validation
+       (i.e., checksum), must be in the receive window, must be part of
+       the same epoch (if timestamps are used to verify this), and must
+       have passed any other checks required by TCP RFCs.
+
+       This is not to imply that the data must be completely ordered
+       before use.  An implementation can accept out-of-order segments,
+       SACK them [RFC2018], and pass them to MPA immediately, before the
+       reception of the segments needed to fill in the gaps.  MPA
+       expects to utilize these segments when they are complete FPDUs or
+       can be combined into complete FPDUs to allow the passing of
+       ULPDUs to DDP when they arrive, independent of ordering.  DDP
+       uses the passed ULPDU to "place" the DDP segments (see [DDP] for
+       more details).
+
+       Since MPA performs a CRC calculation and other checks on received
+       FPDUs, the MPA/TCP implementation ensures that any TCP segments
+       that duplicate data already received and processed (as can happen
+       during TCP retries) do not overwrite already received and
+       processed FPDUs.  This avoids the possibility that duplicate data
+       may corrupt already validated FPDUs.
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 48]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   2)  The implementation provides a mechanism to indicate the ordering
+       of TCP segments as the sender transmitted them.  One possible
+       mechanism might be attaching the TCP sequence number to each
+       segment.
+
+   3)  The implementation also provides a mechanism to indicate when a
+       given TCP segment (and the prior TCP stream) is complete.  One
+       possible mechanism might be to utilize the leading (left) edge of
+       the TCP Receive Window.
+
+       MPA uses the ordering and completion indications to inform DDP
+       when a ULPDU is complete; MPA Delivers the FPDU to DDP.  DDP uses
+       the indications to "deliver" its messages to the DDP consumer
+       (see [DDP] for more details).
+
+       DDP on MPA utilizes the above two mechanisms to establish the
+       Delivery semantics that DDP's consumers agree to.  These
+       semantics are described fully in [DDP].  These include
+       requirements on DDP's consumer to respect ownership of buffers
+       prior to the time that DDP delivers them to the Consumer.
+
+   The use of SACK [RFC2018] significantly improves network utilization
+   and performance and is therefore recommended.  When combined with the
+   out-of-order passing of segments to MPA and DDP, significant
+   buffering and copying of received data can be avoided.
+
+A.4.  Re-Segmenting Middleboxes and Non-Optimized MPA/TCP Senders
+
+   Since MPA senders often start FPDUs on TCP segment boundaries, a
+   receiving optimized MPA/TCP implementation may be able to optimize
+   the reception of data in various ways.
+
+   However, MPA receivers MUST NOT depend on FPDU Alignment on TCP
+   segment boundaries.
+
+   Some MPA senders may be unable to conform to the sender requirements
+   because their implementation of TCP is not designed with MPA in mind.
+   Even for optimized MPA/TCP senders, the network may contain
+   "middleboxes" which modify the TCP stream by changing the
+   segmentation.  This is generally interoperable with TCP and its users
+   and MPA must be no exception.
+
+   The presence of Markers in MPA (when enabled) allows an optimized
+   MPA/TCP receiver to recover the FPDUs despite these obstacles,
+   although it may be necessary to utilize additional buffering at the
+   receiver to do so.
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 49]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   Some of the cases that a receiver may have to contend with are listed
+   below as a reminder to the implementer:
+
+   *   A single aligned and complete FPDU, either in order or out of
+       order:  This can be passed to DDP as soon as validated, and
+       Delivered when ordering is established.
+
+   *   Multiple FPDUs in a TCP segment, aligned and fully contained,
+       either in order or out of order:  These can be passed to DDP as
+       soon as validated, and Delivered when ordering is established.
+
+   *   Incomplete FPDU: The receiver should buffer until the remainder
+       of the FPDU arrives.  If the remainder of the FPDU is already
+       available, this can be passed to DDP as soon as validated, and
+       Delivered when ordering is established.
+
+   *   Unaligned FPDU start: The partial FPDU must be combined with its
+       preceding portion(s).  If the preceding parts are already
+       available, and the whole FPDU is present, this can be passed to
+       DDP as soon as validated, and Delivered when ordering is
+       established.  If the whole FPDU is not available, the receiver
+       should buffer until the remainder of the FPDU arrives.
+
+   *   Combinations of unaligned or incomplete FPDUs (and potentially
+       other complete FPDUs) in the same TCP segment:  If any FPDU is
+       present in its entirety, or can be completed with portions
+       already available, it can be passed to DDP as soon as validated,
+       and Delivered when ordering is established.
+
+A.5.  Receiver Implementation
+
+   Transport & Network Layer Reassembly Buffers:
+
+   The use of reassembly buffers (either TCP reassembly buffers or IP
+   fragmentation reassembly buffers) is implementation dependent.  When
+   MPA is enabled, reassembly buffers are needed if out-of-order packets
+   arrive and Markers are not enabled.  Buffers are also needed if FPDU
+   alignment is lost or if IP fragmentation occurs.  This is because the
+   incoming out-of-order segment may not contain enough information for
+   MPA to process all of the FPDU.  For cases where a re-segmenting
+   middlebox is present, or where the TCP sender is not optimized, the
+   presence of Markers significantly reduces the amount of buffering
+   needed.
+
+   Recovery from IP fragmentation is transparent to the MPA Consumers.
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 50]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+A.5.1  Network Layer Reassembly Buffers
+
+   The MPA/TCP implementation should set the IP Don't Fragment bit at
+   the IP layer.  Thus, upon a path MTU change, intermediate devices
+   drop the IP datagram if it is too large and reply with an ICMP
+   message that tells the source TCP that the path MTU has changed.
+   This causes TCP to emit segments conformant with the new path MTU
+   size.  Thus, IP fragments under most conditions should never occur at
+   the receiver.  But it is possible.
+
+   There are several options for implementation of network layer
+   reassembly buffers:
+
+   1.  drop any IP fragments, and reply with an ICMP message according
+       to [RFC792] (fragmentation needed and DF set) to tell the Remote
+       Peer to resize its TCP segment.
+
+   2.  support an IP reassembly buffer, but have it of limited size
+       (possibly the same size as the local link's MTU).  The end node
+       would normally never Advertise a path MTU larger than the local
+       link MTU.  It is recommended that a dropped IP fragment cause an
+       ICMP message to be generated according to RFC 792.
+
+   3.  multiple IP reassembly buffers, of effectively unlimited size.
+
+   4.  support an IP reassembly buffer for the largest IP datagram (64
+       KB).
+
+   5.  support for a large IP reassembly buffer that could span multiple
+       IP datagrams.
+
+   An implementation should support at least 2 or 3 above, to avoid
+   dropping packets that have traversed the entire fabric.
+
+   There is no end-to-end ACK for IP reassembly buffers, so there is no
+   flow control on the buffer.  The only end-to-end ACK is a TCP ACK,
+   which can only occur when a complete IP datagram is delivered to TCP.
+   Because of this, under worst case, pathological scenarios, the
+   largest IP reassembly buffer is the TCP receive window (to buffer
+   multiple IP datagrams that have all been fragmented).
+
+   Note that if the Remote Peer does not implement re-segmentation of
+   the data stream upon receiving the ICMP reply updating the path MTU,
+   it is possible to halt forward progress because the opposite peer
+   would continue to retransmit using a transport segment size that is
+   too large.  This deadlock scenario is no different than if the fabric
+   MTU (not last-hop MTU) was reduced after connection setup, and the
+   remote node's behavior is not compliant with [RFC1122].
+
+
+
+Culley, et al.              Standards Track                    [Page 51]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+A.5.2  TCP Reassembly Buffers
+
+   A TCP reassembly buffer is also needed.  TCP reassembly buffers are
+   needed if FPDU Alignment is lost when using TCP with MPA or when the
+   MPA FPDU spans multiple TCP segments.  Buffers are also needed if
+   Markers are disabled and out-of-order packets arrive.
+
+   Since lost FPDU Alignment often means that FPDUs are incomplete, an
+   MPA on TCP implementation must have a reassembly buffer large enough
+   to recover an FPDU that is less than or equal to the MTU of the
+   locally attached link (this should be the largest possible Advertised
+   TCP path MTU).  If the MTU is smaller than 140 octets, a buffer of at
+   least 140 octets long is needed to support the minimum FPDU size.
+   The 140 octets allow for the minimum MULPDU of 128, 2 octets of pad,
+   2 of ULPDU_Length, 4 of CRC, and space for a possible Marker.  As
+   usual, additional buffering is likely to provide better performance.
+
+   Note that if the TCP segments were not stored, it would be possible
+   to deadlock the MPA algorithm.  If the path MTU is reduced, FPDU
+   Alignment requires the source TCP to re-segment the data stream to
+   the new path MTU.  The source MPA will detect this condition and
+   reduce the MPA segment size, but any FPDUs already posted to the
+   source TCP will be re-segmented and lose FPDU Alignment.  If the
+   destination does not support a TCP reassembly buffer, these segments
+   can never be successfully transmitted and the protocol deadlocks.
+
+   When a complete FPDU is received, processing continues normally.
+
+Appendix B.  Analysis of MPA over TCP Operations
+
+   This appendix is for information only and is NOT part of the
+   standard.
+
+   This appendix is an analysis of MPA on TCP and why it is useful to
+   integrate MPA with TCP (with modifications to typical TCP
+   implementations) to reduce overall system buffering and overhead.
+
+   One of MPA's high-level goals is to provide enough information, when
+   combined with the Direct Data Placement Protocol [DDP], to enable
+   out-of-order placement of DDP payload into the final Upper Layer
+   Protocol (ULP) Buffer.  Note that DDP separates the act of placing
+   data into a ULP Buffer from that of notifying the ULP that the ULP
+   Buffer is available for use.  In DDP terminology, the former is
+   defined as "Placement", and the later is defined as "Delivery".  MPA
+   supports in-order Delivery of the data to the ULP, including support
+   for Direct Data Placement in the final ULP Buffer location when TCP
+   segments arrive out of order.  Effectively, the goal is to use the
+
+
+
+
+Culley, et al.              Standards Track                    [Page 52]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   pre-posted ULP Buffers as the TCP receive buffer, where the
+   reassembly of the ULP Protocol Data Unit (PDU) by TCP (with MPA and
+   DDP) is done in place, in the ULP Buffer, with no data copies.
+
+   This appendix walks through the advantages and disadvantages of the
+   TCP sender modifications proposed by MPA:
+
+   1) that MPA prefers that the TCP sender to do Header Alignment, where
+      a TCP segment should begin with an MPA Framing Protocol Data Unit
+      (FPDU) (if there is payload present).
+
+   2) that there be an integral number of FPDUs in a TCP segment (under
+      conditions where the path MTU is not changing).
+
+   This appendix concludes that the scaling advantages of FPDU Alignment
+   are strong, based primarily on fairly drastic TCP receive buffer
+   reduction requirements and simplified receive handling.  The analysis
+   also shows that there is little effect to TCP wire behavior.
+
+B.1.  Assumptions
+
+B.1.1  MPA Is Layered beneath DDP
+
+   MPA is an adaptation layer between DDP and TCP.  DDP requires
+   preservation of DDP segment boundaries and a CRC32c digest covering
+   the DDP header and data.  MPA adds these features to the TCP stream
+   so that DDP over TCP has the same basic properties as DDP over SCTP.
+
+B.1.2.  MPA Preserves DDP Message Framing
+
+   MPA was designed as a framing layer specifically for DDP and was not
+   intended as a general-purpose framing layer for any other ULP using
+   TCP.
+
+   A framing layer allows ULPs using it to receive indications from the
+   transport layer only when complete ULPDUs are present.  As a framing
+   layer, MPA is not aware of the content of the DDP PDU, only that it
+   has received and, if necessary, reassembled a complete PDU for
+   Delivery to the DDP.
+
+B.1.3.  The Size of the ULPDU Passed to MPA Is Less Than EMSS under
+        Normal Conditions
+
+   To make reception of a complete DDP PDU on every received segment
+   possible, DDP passes to MPA a PDU that is no larger than the EMSS of
+   the underlying fabric.  Each FPDU that MPA creates contains
+   sufficient information for the receiver to directly place the ULP
+   payload in the correct location in the correct receive buffer.
+
+
+
+Culley, et al.              Standards Track                    [Page 53]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   Edge cases when this condition does not occur are dealt with, but do
+   not need to be on the fast path.
+
+B.1.4.  Out-of-Order Placement but NO Out-of-Order Delivery
+
+   DDP receives complete DDP PDUs from MPA.  Each DDP PDU contains the
+   information necessary to place its ULP payload directly in the
+   correct location in host memory.
+
+   Because each DDP segment is self-describing, it is possible for DDP
+   segments received out of order to have their ULP payload placed
+   immediately in the ULP receive buffer.
+
+   Data delivery to the ULP is guaranteed to be in the order the data
+   was sent.  DDP only indicates data delivery to the ULP after TCP has
+   acknowledged the complete byte stream.
+
+B.2.  The Value of FPDU Alignment
+
+   Significant receiver optimizations can be achieved when Header
+   Alignment and complete FPDUs are the common case.  The optimizations
+   allow utilizing significantly fewer buffers on the receiver and less
+   computation per FPDU.  The net effect is the ability to build a
+   "flow-through" receiver that enables TCP-based solutions to scale to
+   10G and beyond in an economical way.  The optimizations are
+   especially relevant to hardware implementations of receivers that
+   process multiple protocol layers -- Data Link Layer (e.g., Ethernet),
+   Network and Transport Layer (e.g., TCP/IP), and even some ULP on top
+   of TCP (e.g., MPA/DDP).  As network speed increases, there is an
+   increasing desire to use a hardware-based receiver in order to
+   achieve an efficient high performance solution.
+
+   A TCP receiver, under worst-case conditions, has to allocate buffers
+   (BufferSizeTCP) whose capacities are a function of the bandwidth-
+   delay product.  Thus:
+
+       BufferSizeTCP = K * bandwidth [octets/second] * Delay [seconds].
+
+   Where bandwidth is the end-to-end bandwidth of the connection, delay
+   is the round-trip delay of the connection, and K is an
+   implementation-dependent constant.
+
+   Thus, BufferSizeTCP scales with the end-to-end bandwidth (10x more
+   buffers for a 10x increase in end-to-end bandwidth).  As this
+   buffering approach may scale poorly for hardware or software
+   implementations alike, several approaches allow reduction in the
+   amount of buffering required for high-speed TCP communication.
+
+
+
+
+Culley, et al.              Standards Track                    [Page 54]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   The MPA/DDP approach is to enable the ULP's Buffer to be used as the
+   TCP receive buffer.  If the application pre-posts a sufficient amount
+   of buffering, and each TCP segment has sufficient information to
+   place the payload into the right application buffer, when an out-of-
+   order TCP segment arrives it could potentially be placed directly in
+   the ULP Buffer.  However, placement can only be done when a complete
+   FPDU with the placement information is available to the receiver, and
+   the FPDU contents contain enough information to place the data into
+   the correct ULP Buffer (e.g., there is a DDP header available).
+
+   For the case when the FPDU is not aligned with the TCP segment, it
+   may take, on average, 2 TCP segments to assemble one FPDU.
+   Therefore, the receiver has to allocate BufferSizeNAF (Buffer Size,
+   Non-Aligned FPDU) octets:
+
+       BufferSizeNAF = K1* EMSS * number_of_connections + K2 * EMSS
+
+   Where K1 and K2 are implementation-dependent constants and EMSS is
+   the effective maximum segment size.
+
+   For example, a 1 GB/sec link with 10,000 connections and an EMSS of
+   1500 B would require 15 MB of memory.  Often the number of
+   connections used scales with the network speed, aggravating the
+   situation for higher speeds.
+
+   FPDU Alignment would allow the receiver to allocate BufferSizeAF
+   (Buffer Size, Aligned FPDU) octets:
+
+       BufferSizeAF = K2 * EMSS
+
+   for the same conditions.  An FPDU Aligned receiver may require memory
+   in the range of ~100s of KB -- which is feasible for an on-chip
+   memory and enables a "flow-through" design, in which the data flows
+   through the network interface card (NIC) and is placed directly in
+   the destination buffer.  Assuming most of the connections support
+   FPDU Alignment, the receiver buffers no longer scale with number of
+   connections.
+
+   Additional optimizations can be achieved in a balanced I/O sub-system
+   -- where the system interface of the network controller provides
+   ample bandwidth as compared with the network bandwidth.  For almost
+   twenty years this has been the case and the trend is expected to
+   continue.  While Ethernet speeds have scaled by 1000 (from 10
+   megabit/sec to 10 gigabit/sec), I/O bus bandwidth of volume CPU
+   architectures has scaled from ~2 MB/sec to ~2 GB/sec (PC-XT bus to
+   PCI-X DDR).  Under these conditions, the FPDU Alignment approach
+   allows BufferSizeAF to be indifferent to network speed.  It is
+   primarily a function of the local processing time for a given frame.
+
+
+
+Culley, et al.              Standards Track                    [Page 55]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   Thus, when the FPDU Alignment approach is used, receive buffering is
+   expected to scale gracefully (i.e., less than linear scaling) as
+   network speed is increased.
+
+B.2.1.  Impact of Lack of FPDU Alignment on the Receiver Computational
+        Load and Complexity
+
+   The receiver must perform IP and TCP processing, and then perform
+   FPDU CRC checks, before it can trust the FPDU header placement
+   information.  For simplicity of the description, the assumption is
+   that an FPDU is carried in no more than 2 TCP segments.  In reality,
+   with no FPDU Alignment, an FPDU can be carried by more than 2 TCP
+   segments (e.g., if the path MTU was reduced).
+
+   ----++-----------------------------++-----------------------++-----
+   +---||---------------+    +--------||--------+   +----------||----+
+   |   TCP Seg X-1      |    |     TCP Seg X    |   |  TCP Seg X+1   |
+   +---||---------------+    +--------||--------+   +----------||----+
+   ----++-----------------------------++-----------------------++-----
+                   FPDU #N-1                  FPDU #N
+
+     Figure 12: Non-Aligned FPDU Freely Placed in TCP Octet Stream
+
+   The receiver algorithm for processing TCP segments (e.g., TCP segment
+   #X in Figure 12) carrying non-aligned FPDUs (in order or out of
+   order) includes:
+
+   Data Link Layer processing (whole frame) -- typically including a CRC
+   calculation.
+
+       1.  Network Layer processing (assuming not an IP fragment, the
+           whole Data Link Layer frame contains one IP datagram.  IP
+           fragments should be reassembled in a local buffer.  This is
+           not a performance optimization goal.)
+
+       2.  Transport Layer processing -- TCP protocol processing, header
+           and checksum checks.
+
+           a.  Classify incoming TCP segment using the 5 tuple (IP SRC,
+               IP DST, TCP SRC Port, TCP DST Port, protocol).
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 56]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+       3.  Find FPDU message boundaries.
+
+           a.  Get MPA state information for the connection.
+
+               If the TCP segment is in order, use the receiver-managed
+               MPA state information to calculate where the previous
+               FPDU message (#N-1) ends in the current TCP segment X.
+               (previously, when the MPA receiver processed the first
+               part of FPDU #N-1, it calculated the number of bytes
+               remaining to complete FPDU #N-1 by using the MPA Length
+               field).
+
+                   Get the stored partial CRC for FPDU #N-1.
+
+                   Complete CRC calculation for FPDU #N-1 data (first
+                       portion of TCP segment #X).
+
+                   Check CRC calculation for FPDU #N-1.
+
+                   If no FPDU CRC errors, placement is allowed.
+
+                   Locate the local buffer for the first portion of
+                       FPDU#N-1, CopyData(local buffer of first portion
+                       of FPDU #N-1, host buffer address, length).
+
+                   Compute host buffer address for second portion of
+                       FPDU #N-1.
+
+                   CopyData (local buffer of second portion of FPDU #N-
+                       1, host buffer address for second portion,
+                       length).
+
+                   Calculate the octet offset into the TCP segment for
+                       the next FPDU #N.
+
+                   Start calculation of CRC for available data for FPDU.
+                       #N
+
+                   Store partial CRC results for FPDU #N.
+
+                   Store local buffer address of first portion of FPDU
+                       #N.
+
+                   No further action is possible on FPDU #N, before it
+                       is completely received.
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 57]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+               If the TCP segment is out of order, the receiver must
+               buffer the data until at least one complete FPDU is
+               received.  Typically, buffering for more than one TCP
+               segment per connection is required.  Use the MPA-based
+               Markers to calculate where FPDU boundaries are.
+
+                   When a complete FPDU is available, a similar
+                   procedure to the in-order algorithm above is used.
+                   There is additional complexity, though, because when
+                   the missing segment arrives, this TCP segment must be
+                   run through the CRC engine after the CRC is
+                   calculated for the missing segment.
+
+   If we assume FPDU Alignment, the following diagram and the algorithm
+   below apply.  Note that when using MPA, the receiver is assumed to
+   actively detect presence or loss of FPDU Alignment for every TCP
+   segment received.
+
+      +--------------------------+      +--------------------------+
+   +--|--------------------------+   +--|--------------------------+
+   |  |       TCP Seg X          |   |  |         TCP Seg X+1      |
+   +--|--------------------------+   +--|--------------------------+
+      +--------------------------+      +--------------------------+
+                FPDU #N                          FPDU #N+1
+
+      Figure 13: Aligned FPDU Placed Immediately after TCP Header
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 58]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   The receiver algorithm for FPDU Aligned frames (in order or out of
+   order) includes:
+
+       1)  Data Link Layer processing (whole frame) -- typically
+           including a CRC calculation.
+
+       2)  Network Layer processing (assuming not an IP fragment, the
+           whole Data Link Layer frame contains one IP datagram.  IP
+           fragments should be reassembled in a local buffer.  This is
+           not a performance optimization goal.)
+
+       3)  Transport Layer processing -- TCP protocol processing, header
+           and checksum checks.
+
+           a.  Classify incoming TCP segment using the 5 tuple (IP SRC,
+               IP DST, TCP SRC Port, TCP DST Port, protocol).
+
+       4)  Check for Header Alignment (described in detail in Section
+           6).  Assuming Header Alignment for the rest of the algorithm
+           below.
+
+           a.  If the header is not aligned, see the algorithm defined
+               in the prior section.
+
+       5)  If TCP segment is in order or out of order, the MPA header is
+           at the beginning of the current TCP payload.  Get the FPDU
+           length from the FPDU header.
+
+       6)  Calculate CRC over FPDU.
+
+       7)  Check CRC calculation for FPDU #N.
+
+       8)  If no FPDU CRC errors, placement is allowed.
+
+       9)  CopyData(TCP segment #X, host buffer address, length).
+
+       10) Loop to #5 until all the FPDUs in the TCP segment are
+           consumed in order to handle FPDU packing.
+
+   Implementation note: In both cases, the receiver has to classify the
+   incoming TCP segment and associate it with one of the flows it
+   maintains.  In the case of no FPDU Alignment, the receiver is forced
+   to classify incoming traffic before it can calculate the FPDU CRC.
+   In the case of FPDU Alignment, the operations order is left to the
+   implementer.
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 59]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   The FPDU Aligned receiver algorithm is significantly simpler.  There
+   is no need to locally buffer portions of FPDUs.  Accessing state
+   information is also substantially simplified -- the normal case does
+   not require retrieving information to find out where an FPDU starts
+   and ends or retrieval of a partial CRC before the CRC calculation can
+   commence.  This avoids adding internal latencies, having multiple
+   data passes through the CRC machine, or scheduling multiple commands
+   for moving the data to the host buffer.
+
+   The aligned FPDU approach is useful for in-order and out-of-order
+   reception.  The receiver can use the same mechanisms for data storage
+   in both cases, and only needs to account for when all the TCP
+   segments have arrived to enable Delivery.  The Header Alignment,
+   along with the high probability that at least one complete FPDU is
+   found with every TCP segment, allows the receiver to perform data
+   placement for out-of-order TCP segments with no need for intermediate
+   buffering.  Essentially, the TCP receive buffer has been eliminated
+   and TCP reassembly is done in place within the ULP Buffer.
+
+   In case FPDU Alignment is not found, the receiver should follow the
+   algorithm for non-aligned FPDU reception, which may be slower and
+   less efficient.
+
+B.2.2.  FPDU Alignment Effects on TCP Wire Protocol
+
+   In an optimized MPA/TCP implementation, TCP exposes its EMSS to MPA.
+   MPA uses the EMSS to calculate its MULPDU, which it then exposes to
+   DDP, its ULP.  DDP uses the MULPDU to segment its payload so that
+   each FPDU sent by MPA fits completely into one TCP segment.  This has
+   no impact on wire protocol, and exposing this information is already
+   supported on many TCP implementations, including all modern flavors
+   of BSD networking, through the TCP_MAXSEG socket option.
+
+   In the common case, the ULP (i.e., DDP over MPA) messages provided to
+   the TCP layer are segmented to MULPDU size.  It is assumed that the
+   ULP message size is bounded by MULPDU, such that a single ULP message
+   can be encapsulated in a single TCP segment.  Therefore, in the
+   common case, there is no increase in the number of TCP segments
+   emitted.  For smaller ULP messages, the sender can also apply
+   packing, i.e., the sender packs as many complete FPDUs as possible
+   into one TCP segment.  The requirement to always have a complete FPDU
+   may increase the number of TCP segments emitted.  Typically, a ULP
+   message size varies from a few bytes to multiple EMSSs (e.g., 64
+   Kbytes).  In some cases, the ULP may post more than one message at a
+   time for transmission, giving the sender an opportunity for packing.
+   In the case where more than one FPDU is available for transmission
+   and the FPDUs are encapsulated into a TCP segment and there is no
+   room in the TCP segment to include the next complete FPDU, another
+
+
+
+Culley, et al.              Standards Track                    [Page 60]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   TCP segment is sent.  In this corner case, some of the TCP segments
+   are not full size.  In the worst-case scenario, the ULP may choose an
+   FPDU size that is EMSS/2 +1 and has multiple messages available for
+   transmission.  For this poor choice of FPDU size, the average TCP
+   segment size is therefore about 1/2 of the EMSS and the number of TCP
+   segments emitted is approaching 2x of what is possible without the
+   requirement to encapsulate an integer number of complete FPDUs in
+   every TCP segment.  This is a dynamic situation that only lasts for
+   the duration where the sender ULP has multiple non-optimal messages
+   for transmission and this causes a minor impact on the wire
+   utilization.
+
+   However, it is not expected that requiring FPDU Alignment will have a
+   measurable impact on wire behavior of most applications.  Throughput
+   applications with large I/Os are expected to take full advantage of
+   the EMSS.  Another class of applications with many small outstanding
+   buffers (as compared to EMSS) is expected to use packing when
+   applicable.  Transaction-oriented applications are also optimal.
+
+   TCP retransmission is another area that can affect sender behavior.
+   TCP supports retransmission of the exact, originally transmitted
+   segment (see [RFC793], Sections 2.6 and 3.7 (under "Managing the
+   Window") and [RFC1122], Section 4.2.2.15).  In the unlikely event
+   that part of the original segment has been received and acknowledged
+   by the Remote Peer (e.g., a re-segmenting middlebox, as documented in
+   Appendix A.4, Re-Segmenting Middleboxes and Non-Optimized MPA/TCP
+   Senders), a better available bandwidth utilization may be possible by
+   retransmitting only the missing octets.  If an optimized MPA/TCP
+   retransmits complete FPDUs, there may be some marginal bandwidth
+   loss.
+
+   Another area where a change in the TCP segment number may have impact
+   is that of slow start and congestion avoidance.  Slow-start
+   exponential increase is measured in segments per second, as the
+   algorithm focuses on the overhead per segment at the source for
+   congestion that eventually results in dropped segments.  Slow-start
+   exponential bandwidth growth for optimized MPA/TCP is similar to any
+   TCP implementation.  Congestion avoidance allows for a linear growth
+   in available bandwidth when recovering after a packet drop.  Similar
+   to the analysis for slow start, optimized MPA/TCP doesn't change the
+   behavior of the algorithm.  Therefore, the average size of the
+   segment versus EMSS is not a major factor in the assessment of the
+   bandwidth growth for a sender.  Both slow start and congestion
+   avoidance for an optimized MPA/TCP will behave similarly to any TCP
+   sender and allow an optimized MPA/TCP to enjoy the theoretical
+   performance limits of the algorithms.
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 61]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   In summary, the ULP messages generated at the sender (e.g., the
+   amount of messages grouped for every transmission request) and
+   message size distribution has the most significant impact over the
+   number of TCP segments emitted.  The worst-case effect for certain
+   ULPs (with average message size of EMSS/2+1 to EMSS) is bounded by an
+   increase of up to 2x in the number of TCP segments and acknowledges.
+   In reality, the effect is expected to be marginal.
+
+Appendix C.  IETF Implementation Interoperability with RDMA Consortium
+             Protocols
+
+   This appendix is for information only and is NOT part of the
+   standard.
+
+   This appendix covers methods of making MPA implementations
+   interoperate with both IETF and RDMA Consortium versions of the
+   protocols.
+
+   The RDMA Consortium created early specifications of the MPA/DDP/RDMA
+   protocols, and some manufacturers created implementations of those
+   protocols before the IETF versions were finalized.  These protocols
+   are very similar to the IETF versions making it possible for
+   implementations to be created or modified to support either set of
+   specifications.
+
+   For those interested, the RDMA Consortium protocol documents (draft-
+   culley-iwarp-mpa-v1.0.pdf [RDMA-MPA], draft-shah-iwarp-ddp-v1.0.pdf
+   [RDMA-DDP], and draft-recio-iwarp-rdmac-v1.0.pdf [RDMA-RDMAC]) can be
+   obtained at http://www.rdmaconsortium.org/home.
+
+   In this section, implementations of MPA/DDP/RDMA that conform to the
+   RDMAC specifications are called RDMAC RNICs.  Implementations of
+   MPA/DDP/RDMA that conform to the IETF RFCs are called IETF RNICs.
+
+   Without the exchange of MPA Request/Reply Frames, there is no
+   standard mechanism for enabling RDMAC RNICs to interoperate with IETF
+   RNICs.  Even if a ULP uses a well-known port to start an IETF RNIC
+   immediately in RDMA mode (i.e., without exchanging the MPA
+   Request/Reply messages), there is no reason to believe an IETF RNIC
+   will interoperate with an RDMAC RNIC because of the differences in
+   the version number in the DDP and RDMAP headers on the wire.
+
+   Therefore, the ULP or other supporting entity at the RDMAC RNIC must
+   implement MPA Request/Reply Frames on behalf of the RNIC in order to
+   negotiate the connection parameters.  The following section describes
+   the results following the exchange of the MPA Request/Reply Frames
+   before the conversion from streaming to RDMA mode.
+
+
+
+
+Culley, et al.              Standards Track                    [Page 62]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+C.1.  Negotiated Parameters
+
+   Three types of RNICs are considered:
+
+   Upgraded RDMAC RNIC - an RNIC implementing the RDMAC protocols that
+   has a ULP or other supporting entity that exchanges the MPA
+   Request/Reply Frames in streaming mode before the conversion to RDMA
+   mode.
+
+   Non-permissive IETF RNIC - an RNIC implementing the IETF protocols
+   that is not capable of implementing the RDMAC protocols.  Such an
+   RNIC can only interoperate with other IETF RNICs.
+
+   Permissive IETF RNIC - an RNIC implementing the IETF protocols that
+   is capable of implementing the RDMAC protocols on a per-connection
+   basis.
+
+   The Permissive IETF RNIC is recommended for those implementers that
+   want maximum interoperability with other RNIC implementations.
+
+   The values used by these three RNIC types for the MPA, DDP, and RDMAP
+   versions as well as MPA Markers and CRC are summarized in Figure 14.
+
+    +----------------++-----------+-----------+-----------+-----------+
+    | RNIC TYPE      || DDP/RDMAP |    MPA    |    MPA    |    MPA    |
+    |                ||  Version  | Revision  |  Markers  |    CRC    |
+    +----------------++-----------+-----------+-----------+-----------+
+    +----------------++-----------+-----------+-----------+-----------+
+    | RDMAC          ||     0     |     0     |     1     |     1     |
+    |                ||           |           |           |           |
+    +----------------++-----------+-----------+-----------+-----------+
+    | IETF           ||     1     |     1     |  0 or 1   |  0 or 1   |
+    | Non-permissive ||           |           |           |           |
+    +----------------++-----------+-----------+-----------+-----------+
+    | IETF           ||  1 or 0   |  1 or 0   |  0 or 1   |  0 or 1   |
+    | permissive     ||           |           |           |           |
+    +----------------++-----------+-----------+-----------+-----------+
+
+           Figure 14: Connection Parameters for the RNIC Types
+            for MPA Markers and MPA CRC, enabled=1, disabled=0.
+
+   It is assumed there is no mixing of versions allowed between MPA,
+   DDP, and RDMAP.  The RNIC either generates the RDMAC protocols on the
+   wire (version is zero) or uses the IETF protocols (version is one).
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 63]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   During the exchange of the MPA Request/Reply Frames, each peer
+   provides its MPA Revision, Marker preference (M: 0=disabled,
+   1=enabled), and CRC preference.  The MPA Revision provided in the MPA
+   Request Frame and the MPA Reply Frame may differ.
+
+   From the information in the MPA Request/Reply Frames, each side sets
+   the Version field (V: 0=RDMAC, 1=IETF) of the DDP/RDMAP protocols as
+   well as the state of the Markers for each half connection.  Between
+   DDP and RDMAP, no mixing of versions is allowed.  Moreover, the DDP
+   and RDMAP version MUST be identical in the two directions.  The RNIC
+   either generates the RDMAC protocols on the wire (version is zero) or
+   uses the IETF protocols (version is one).
+
+   In the following sections, the figures do not discuss CRC negotiation
+   because there is no interoperability issue for CRCs.  Since the RDMAC
+   RNIC will always request CRC use, then, according to the IETF MPA
+   specification, both peers MUST generate and check CRCs.
+
+C.2.  RDMAC RNIC and Non-Permissive IETF RNIC
+
+   Figure 15 shows that a Non-permissive IETF RNIC cannot interoperate
+   with an RDMAC RNIC, despite the fact that both peers exchange MPA
+   Request/Reply Frames.  For a Non-permissive IETF RNIC, the MPA
+   negotiation has no effect on the DDP/RDMAP version and it is unable
+   to interoperate with the RDMAC RNIC.
+
+   The rows in the figure show the state of the Marker field in the MPA
+   Request Frame sent by the MPA Initiator.  The columns show the state
+   of the Marker field in the MPA Reply Frame sent by the MPA Responder.
+   Each type of RNIC is shown as an Initiator and a Responder.  The
+   connection results are shown in the lower right corner, at the
+   intersection of the different RNIC types, where V=0 is the RDMAC
+   DDP/RDMAP version, V=1 is the IETF DDP/RDMAC version, M=0 means MPA
+   Markers are disabled, and M=1 means MPA Markers are enabled.  The
+   negotiated Marker state is shown as X/Y, for the receive direction of
+   the Initiator/Responder.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 64]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+          +---------------------------++-----------------------+
+          |   MPA                     ||          MPA          |
+          | CONNECT                   ||       Responder       |
+          |   MODE  +-----------------++-------+---------------+
+          |         |   RNIC          || RDMAC |     IETF      |
+          |         |   TYPE          ||       | Non-permissive|
+          |         |          +------++-------+-------+-------+
+          |         |          |MARKER|| M=1   | M=0   |  M=1  |
+          +---------+----------+------++-------+-------+-------+
+          +---------+----------+------++-------+-------+-------+
+          |         |   RDMAC  | M=1  || V=0   | close | close |
+          |         |          |      || M=1/1 |       |       |
+          |         +----------+------++-------+-------+-------+
+          |   MPA   |          | M=0  || close | V=1   | V=1   |
+          |Initiator|   IETF   |      ||       | M=0/0 | M=0/1 |
+          |         |Non-perms.+------++-------+-------+-------+
+          |         |          | M=1  || close | V=1   | V=1   |
+          |         |          |      ||       | M=1/0 | M=1/1 |
+          +---------+----------+------++-------+-------+-------+
+
+           Figure 15: MPA Negotiation between an RDMAC RNIC and
+                      a Non-Permissive IETF RNIC
+
+C.2.1.  RDMAC RNIC Initiator
+
+   If the RDMAC RNIC is the MPA Initiator, its ULP sends an MPA Request
+   Frame with Rev field set to zero and the M and C bits set to one.
+   Because the Non-permissive IETF RNIC cannot dynamically downgrade the
+   version number it uses for DDP and RDMAP, it would send an MPA Reply
+   Frame with the Rev field equal to one and then gracefully close the
+   connection.
+
+C.2.2.  Non-Permissive IETF RNIC Initiator
+
+   If the Non-permissive IETF RNIC is the MPA Initiator, it sends an MPA
+   Request Frame with Rev field equal to one.  The ULP or supporting
+   entity for the RDMAC RNIC responds with an MPA Reply Frame that has
+   the Rev field equal to zero and the M bit set to one.  The Non-
+   permissive IETF RNIC will gracefully close the connection after it
+   reads the incompatible Rev field in the MPA Reply Frame.
+
+C.2.3.  RDMAC RNIC and Permissive IETF RNIC
+
+   Figure 16 shows that a Permissive IETF RNIC can interoperate with an
+   RDMAC RNIC regardless of its Marker preference.  The figure uses the
+   same format as shown with the Non-permissive IETF RNIC.
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 65]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+          +---------------------------++-----------------------+
+          |   MPA                     ||          MPA          |
+          | CONNECT                   ||       Responder       |
+          |   MODE  +-----------------++-------+---------------+
+          |         |   RNIC          || RDMAC |     IETF      |
+          |         |   TYPE          ||       |  Permissive   |
+          |         |          +------++-------+-------+-------+
+          |         |          |MARKER|| M=1   | M=0   | M=1   |
+          +---------+----------+------++-------+-------+-------+
+          +---------+----------+------++-------+-------+-------+
+          |         |   RDMAC  | M=1  || V=0   | N/A   | V=0   |
+          |         |          |      || M=1/1 |       | M=1/1 |
+          |         +----------+------++-------+-------+-------+
+          |   MPA   |          | M=0  || V=0   | V=1   | V=1   |
+          |Initiator|   IETF   |      || M=1/1 | M=0/0 | M=0/1 |
+          |         |Permissive+------++-------+-------+-------+
+          |         |          | M=1  || V=0   | V=1   | V=1   |
+          |         |          |      || M=1/1 | M=1/0 | M=1/1 |
+          +---------+----------+------++-------+-------+-------+
+
+           Figure 16: MPA Negotiation between an RDMAC RNIC and
+                         a Permissive IETF RNIC
+
+   A truly Permissive IETF RNIC will recognize an RDMAC RNIC from the
+   Rev field of the MPA Req/Rep Frames and then adjust its receive
+   Marker state and DDP/RDMAP version to accommodate the RDMAC RNIC.  As
+   a result, as an MPA Responder, the Permissive IETF RNIC will never
+   return an MPA Reply Frame with the M bit set to zero.  This case is
+   shown as a not applicable (N/A) in Figure 16.
+
+C.2.4.  RDMAC RNIC Initiator
+
+   When the RDMAC RNIC is the MPA Initiator, its ULP or other supporting
+   entity prepares an MPA Request message and sets the revision to zero
+   and the M bit and C bit to one.
+
+   The Permissive IETF Responder receives the MPA Request message and
+   checks the revision field.  Since it is capable of generating RDMAC
+   DDP/RDMAP headers, it sends an MPA Reply message with revision set to
+   zero and the M and C bits set to one.  The Responder must inform its
+   ULP that it is generating version zero DDP/RDMAP messages.
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 66]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+C.2.5  Permissive IETF RNIC Initiator
+
+   If the Permissive IETF RNIC is the MPA Initiator, it prepares the MPA
+   Request Frame setting the Rev field to one.  Regardless of the value
+   of the M bit in the MPA Request Frame, the ULP or other supporting
+   entity for the RDMAC RNIC will create an MPA Reply Frame with Rev
+   equal to zero and the M bit set to one.
+
+   When the Initiator reads the Rev field of the MPA Reply Frame and
+   finds that its peer is an RDMAC RNIC, it must inform its ULP that it
+   should generate version zero DDP/RDMAP messages and enable MPA
+   Markers and CRC.
+
+C.3.  Non-Permissive IETF RNIC and Permissive IETF RNIC
+
+   For completeness, Figure 17 below shows the results of MPA
+   negotiation between a Non-permissive IETF RNIC and a Permissive IETF
+   RNIC.  The important point from this figure is that an IETF RNIC
+   cannot detect whether its peer is a Permissive or Non-permissive
+   RNIC.
+
+      +---------------------------++-------------------------------+
+      |   MPA                     ||              MPA              |
+      | CONNECT                   ||            Responder          |
+      |   MODE  +-----------------++---------------+---------------+
+      |         |   RNIC          ||     IETF      |     IETF      |
+      |         |   TYPE          || Non-permissive|  Permissive   |
+      |         |          +------++-------+-------+-------+-------+
+      |         |          |MARKER|| M=0   | M=1   | M=0   | M=1   |
+      +---------+----------+------++-------+-------+-------+-------+
+      +---------+----------+------++-------+-------+-------+-------+
+      |         |          | M=0  || V=1   | V=1   | V=1   | V=1   |
+      |         |   IETF   |      || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
+      |         |Non-perms.+------++-------+-------+-------+-------+
+      |         |          | M=1  || V=1   | V=1   | V=1   | V=1   |
+      |         |          |      || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
+      |   MPA   +----------+------++-------+-------+-------+-------+
+      |Initiator|          | M=0  || V=1   | V=1   | V=1   | V=1   |
+      |         |   IETF   |      || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
+      |         |Permissive+------++-------+-------+-------+-------+
+      |         |          | M=1  || V=1   | V=1   | V=1   | V=1   |
+      |         |          |      || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
+      +---------+----------+------++-------+-------+-------+-------+
+
+    Figure 17: MPA negotiation between a Non-permissive IETF RNIC and a
+                           Permissive IETF RNIC.
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 67]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+Normative References
+
+   [iSCSI]      Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka, M.,
+                and E. Zeidner, "Internet Small Computer Systems
+                Interface (iSCSI)", RFC 3720, April 2004.
+
+   [RFC1191]    Mogul, J. and S. Deering, "Path MTU discovery", RFC
+                1191, November 1990.
+
+   [RFC2018]    Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
+                Selective Acknowledgment Options", RFC 2018, October
+                1996.
+
+   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
+                Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [RFC2401]    Kent, S. and R. Atkinson, "Security Architecture for the
+                Internet Protocol", RFC 2401, November 1998.
+
+   [RFC3723]    Aboba, B., Tseng, J., Walker, J., Rangan, V., and F.
+                Travostino, "Securing Block Storage Protocols over IP",
+                RFC 3723, April 2004.
+
+   [RFC793]     Postel, J., "Transmission Control Protocol", STD 7, RFC
+                793, September 1981.
+
+   [RDMASEC]    Pinkerton, J. and E. Deleganes, "Direct Data Placement
+                Protocol (DDP) / Remote Direct Memory Access Protocol
+                (RDMAP) Security", RFC 5042, October 2007.
+
+Informative References
+
+   [APPL]       Bestler, C. and L. Coene, "Applicability of Remote
+                Direct Memory Access Protocol (RDMA) and Direct Data
+                Placement (DDP)", RFC 5045, October 2007.
+
+   [CRCTCP]     Stone J., Partridge, C., "When the CRC and TCP checksum
+                disagree", ACM Sigcomm, Sept. 2000.
+
+   [DAT-API]    DAT Collaborative, "kDAPL (Kernel Direct Access
+                Programming Library) and uDAPL (User Direct Access
+                Programming Library)", Http://www.datcollaborative.org.
+
+   [DDP]        Shah, H., Pinkerton, J., Recio, R., and P. Culley,
+                "Direct Data Placement over Reliable Transports", RFC
+                5041, October 2007.
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 68]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   [iSER]       Ko, M., Chadalapaka, M., Hufferd, J., Elzur, U., Shah,
+                H., and P. Thaler, "Internet Small Computer System
+                Interface (iSCSI) Extensions for Remote Direct Memory
+                Access (RDMA)" RFC 5046, October 2007.
+
+   [IT-API]     The Open Group, "Interconnect Transport API (IT-API)"
+                Version 2.1, http://www.opengroup.org.
+
+   [NFSv4CHAN]  Williams, N., "On the Use of Channel Bindings to Secure
+                Channels", Work in Progress, June 2006.
+
+   [RDMA-DDP]   "Direct Data Placement over Reliable Transports (Version
+                1.0)", RDMA Consortium, October 2002,
+                <http://www.rdmaconsortium.org/home/draft-shah-iwarp-
+                ddp-v1.0.pdf>.
+
+   [RDMA-MPA]   "Marker PDU Aligned Framing for TCP Specification
+                (Version 1.0)", RDMA Consortium, October 2002,
+                <http://www.rdmaconsortium.org/home/draft-culley-iwarp-
+                mpa-v1.0.pdf>.
+
+   [RDMA-RDMAC] "An RDMA Protocol Specification (Version 1.0)", RDMA
+                Consortium, October 2002,
+                <http://www.rdmaconsortium.org/home/draft-recio-iwarp-
+                rdmac-v1.0.pdf>.
+
+   [RDMAP]      Recio, R., Culley, P., Garcia, D., Hilland, J., and B.
+                Metzler, "A Remote Direct Memory Access Protocol
+                Specification", RFC 5040, October 2007.
+
+   [RFC792]     Postel, J., "Internet Control Message Protocol", STD 5,
+                RFC 792, September 1981.
+
+   [RFC896]     Nagle, J., "Congestion control in IP/TCP internetworks",
+                RFC 896, January 1984.
+
+   [RFC1122]    Braden, R., "Requirements for Internet Hosts -
+                Communication Layers", STD 3, RFC 1122, October 1989.
+
+   [RFC4960]    Stewart, R., Ed., "Stream Control Transmission
+                Protocol", RFC 4960, September 2007.
+
+   [RFC4296]    Bailey, S. and T. Talpey, "The Architecture of Direct
+                Data Placement (DDP) and Remote Direct Memory Access
+                (RDMA) on Internet Protocols", RFC 4296, December 2005.
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 69]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   [RFC4297]    Romanow, A., Mogul, J., Talpey, T., and S. Bailey,
+                "Remote Direct Memory Access (RDMA) over IP Problem
+                Statement", RFC 4297, December 2005.
+
+   [RFC4301]    Kent, S. and K. Seo, "Security Architecture for the
+                Internet Protocol", RFC 4301, December 2005.
+
+   [VERBS-RMDA] "RDMA Protocol Verbs Specification", RDMA Consortium
+                standard, April 2003, <http://www.rdmaconsortium.org/
+                home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf>.
+
+Contributors
+
+   Dwight Barron
+   Hewlett-Packard Company
+   20555 SH 249
+   Houston, TX 77070-2698 USA
+   Phone: 281-514-2769
+   EMail: dwight.barron@hp.com
+
+   Jeff Chase
+   Department of Computer Science
+   Duke University
+   Durham, NC 27708-0129 USA
+   Phone: +1 919 660 6559
+   EMail: chase@cs.duke.edu
+
+   Ted Compton
+   EMC Corporation
+   Research Triangle Park, NC 27709 USA
+   Phone: 919-248-6075
+   EMail: compton_ted@emc.com
+
+   Dave Garcia
+   24100 Hutchinson Rd.
+   Los Gatos, CA  95033
+   Phone: 831 247 4464
+   EMail: Dave.Garcia@StanfordAlumni.org
+
+   Hari Ghadia
+   Gen10 Technology, Inc.
+   1501 W Shady Grove Road
+   Grand Prairie, TX 75050
+   Phone: (972) 301 3630
+   EMail: hghadia@gen10technology.com
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 70]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   Howard C. Herbert
+   Intel Corporation
+   MS CH7-404
+   5000 West Chandler Blvd.
+   Chandler, AZ 85226
+   Phone: 480-554-3116
+   EMail: howard.c.herbert@intel.com
+
+   Jeff Hilland
+   Hewlett-Packard Company
+   20555 SH 249
+   Houston, TX 77070-2698 USA
+   Phone: 281-514-9489
+   EMail: jeff.hilland@hp.com
+
+   Mike Ko
+   IBM
+   650 Harry Rd.
+   San Jose, CA 95120
+   Phone: (408) 927-2085
+   EMail: mako@us.ibm.com
+
+   Mike Krause
+   Hewlett-Packard Corporation, 43LN
+   19410 Homestead Road
+   Cupertino, CA 95014 USA
+   Phone: +1 (408) 447-3191
+   EMail: krause@cup.hp.com
+
+   Dave Minturn
+   Intel Corporation
+   MS JF1-210
+   5200 North East Elam Young Parkway
+   Hillsboro, Oregon  97124
+   Phone: 503-712-4106
+   EMail: dave.b.minturn@intel.com
+
+   Jim Pinkerton
+   Microsoft, Inc.
+   One Microsoft Way
+   Redmond, WA 98052 USA
+   EMail: jpink@microsoft.com
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 71]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+   Hemal Shah
+   Broadcom Corporation
+   5300 California Avenue
+   Irvine, CA 92617 USA
+   Phone: +1 (949) 926-6941
+   EMail: hemal@broadcom.com
+
+   Allyn Romanow
+   Cisco Systems
+   170 W Tasman Drive
+   San Jose, CA 95134 USA
+   Phone: +1 408 525 8836
+   EMail: allyn@cisco.com
+
+   Tom Talpey
+   Network Appliance
+   1601 Trapelo Road #16
+   Waltham, MA  02451 USA
+   Phone: +1 (781) 768-5329
+   EMail: thomas.talpey@netapp.com
+
+   Patricia Thaler
+   Broadcom
+   16215 Alton Parkway
+   Irvine, CA 92618
+   Phone: 916 570 2707
+   EMail: pthaler@broadcom.com
+
+   Jim Wendt
+   Hewlett Packard Corporation
+   8000 Foothills Boulevard MS 5668
+   Roseville, CA 95747-5668 USA
+   Phone: +1 916 785 5198
+   EMail: jim_wendt@hp.com
+
+   Jim Williams
+   Emulex Corporation
+   580 Main Street
+   Bolton, MA 01740 USA
+   Phone: +1 978 779 7224
+   EMail: jim.williams@emulex.com
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 72]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+Authors' Addresses
+
+   Paul R. Culley
+   Hewlett-Packard Company
+   20555 SH 249
+   Houston, TX 77070-2698 USA
+   Phone: 281-514-5543
+   EMail: paul.culley@hp.com
+
+   Uri Elzur
+   5300 California Avenue
+   Irvine, CA 92617, USA
+   Phone: 949.926.6432
+   EMail: uri@broadcom.com
+
+   Renato J Recio
+   IBM
+   Internal Zip 9043
+   11400 Burnett Road
+   Austin, Texas 78759
+   Phone: 512-838-3685
+   EMail: recio@us.ibm.com
+
+   Stephen Bailey
+   Sandburst Corporation
+   600 Federal Street
+   Andover, MA 01810 USA
+   Phone: +1 978 689 1614
+   EMail: steph@sandburst.com
+
+   John Carrier
+   Cray Inc.
+   411 First Avenue S, Suite 600
+   Seattle, WA 98104-2860
+   Phone: 206-701-2090
+   EMail: carrier@cray.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 73]
+
+RFC 5044                  MPA Framing for TCP               October 2007
+
+
+Full Copyright Statement
+
+   Copyright (C) The IETF Trust (2007).
+
+   This document is subject to the rights, licenses and restrictions
+   contained in BCP 78, and except as set forth therein, the authors
+   retain all their rights.
+
+   This document and the information contained herein are provided on an
+   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
+   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
+   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
+   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+   The IETF takes no position regarding the validity or scope of any
+   Intellectual Property Rights or other rights that might be claimed to
+   pertain to the implementation or use of the technology described in
+   this document or the extent to which any license under such rights
+   might or might not be available; nor does it represent that it has
+   made any independent effort to identify any such rights.  Information
+   on the procedures with respect to rights in RFC documents can be
+   found in BCP 78 and BCP 79.
+
+   Copies of IPR disclosures made to the IETF Secretariat and any
+   assurances of licenses to be made available, or the result of an
+   attempt made to obtain a general license or permission for the use of
+   such proprietary rights by implementers or users of this
+   specification can be obtained from the IETF on-line IPR repository at
+   http://www.ietf.org/ipr.
+
+   The IETF invites any interested party to bring to its attention any
+   copyrights, patents or patent applications, or other proprietary
+   rights that may cover technology that may be required to implement
+   this standard.  Please address the information to the IETF at
+   ietf-ipr@ietf.org.
+
+
+
+
+
+
+
+
+
+
+
+
+Culley, et al.              Standards Track                    [Page 74]
+