Network Working Group                                          P. Culley
Request for Comments: 5044                       Hewlett-Packard Company
Category: Standards Track                                       U. Elzur
                                                    Broadcom Corporation
                                                                R. Recio
                                                         IBM Corporation
                                                               S. Bailey
                                                   Sandburst Corporation
                                                              J. Carrier
                                                               Cray Inc.
                                                            October 2007


            Marker PDU Aligned Framing for TCP Specification

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Abstract

   Marker PDU Aligned Framing (MPA) is designed to work as an
   "adaptation layer" between TCP and the Direct Data Placement protocol
   (DDP) as described in RFC 5041.  It preserves the reliable, in-order
   delivery of TCP, while adding the preservation of higher-level
   protocol record boundaries that DDP requires.  MPA is fully compliant
   with applicable TCP RFCs and can be utilized with existing TCP
   implementations.  MPA also supports integrated implementations that
   combine TCP, MPA and DDP to reduce buffering requirements in the
   implementation and improve performance at the system level.

Culley, et al.              Standards Track                     [Page 1]

RFC 5044                  MPA Framing for TCP               October 2007

Table of Contents

   1. Introduction
      1.1. Motivation
      1.2. Protocol Overview
   2. Glossary
   3. MPA's Interactions with DDP
   4. MPA Full Operation Phase
      4.1. FPDU Format
      4.2. Marker Format
      4.3. MPA Markers
      4.4. CRC Calculation
      4.5. FPDU Size Considerations
   5. MPA's interactions with TCP
      5.1. MPA transmitters with a standard layered TCP
      5.2. MPA receivers with a standard layered TCP
   6. MPA Receiver FPDU Identification
   7. Connection Semantics
      7.1. Connection Setup
           7.1.1. MPA Request and Reply Frame Format
           7.1.2. Connection Startup Rules
           7.1.3. Example Delayed Startup Sequence
           7.1.4. Use of Private Data
                  7.1.4.1. Motivation
                  7.1.4.2. Example Immediate Startup Using
                           Private Data
           7.1.5. "Dual Stack" Implementations
      7.2. Normal Connection Teardown
   8. Error Semantics
   9. Security Considerations
      9.1. Protocol-Specific Security Considerations
           9.1.1. Spoofing
                  9.1.1.1. Impersonation
                  9.1.1.2. Stream Hijacking
                  9.1.1.3. Man-in-the-Middle Attack
           9.1.2. Eavesdropping
      9.2. Introduction to Security Options
      9.3. Using IPsec with MPA
      9.4. Requirements for IPsec Encapsulation of MPA/DDP
   10. IANA Considerations
   Appendix A. Optimized MPA-Aware TCP Implementations
      A.1. Optimized MPA/TCP Transmitters
      A.2. Effects of Optimized MPA/TCP Segmentation
      A.3. Optimized MPA/TCP Receivers
      A.4. Re-segmenting Middleboxes and Non-Optimized MPA/TCP
           Senders
      A.5. Receiver Implementation
           A.5.1. Network Layer Reassembly Buffers
           A.5.2. TCP Reassembly Buffers
   Appendix B. Analysis of MPA over TCP Operations
      B.1. Assumptions
           B.1.1. MPA Is Layered beneath DDP
           B.1.2. MPA Preserves DDP Message Framing
           B.1.3. The Size of the ULPDU Passed to MPA Is Less Than
                  EMSS Under Normal Conditions
           B.1.4. Out-of-Order Placement but NO Out-of-Order Delivery
      B.2. The Value of FPDU Alignment
           B.2.1. Impact of Lack of FPDU Alignment on the Receiver
                  Computational Load and Complexity
           B.2.2. FPDU Alignment Effects on TCP Wire Protocol
   Appendix C. IETF Implementation Interoperability with RDMA
               Consortium Protocols
      C.1. Negotiated Parameters
      C.2. RDMAC RNIC and Non-Permissive IETF RNIC
           C.2.1. RDMAC RNIC Initiator
           C.2.2. Non-Permissive IETF RNIC Initiator
           C.2.3. RDMAC RNIC and Permissive IETF RNIC
           C.2.4. RDMAC RNIC Initiator
           C.2.5. Permissive IETF RNIC Initiator
      C.3. Non-Permissive IETF RNIC and Permissive IETF RNIC
   Normative References
   Informative References
   Contributors

Table of Figures

   Figure 1: ULP MPA TCP Layering
   Figure 2: FPDU Format
   Figure 3: Marker Format
   Figure 4: Example FPDU Format with Marker
   Figure 5: Annotated Hex Dump of an FPDU
   Figure 6: Annotated Hex Dump of an FPDU with Marker
   Figure 7: Fully Layered Implementation
   Figure 8: MPA Request/Reply Frame
   Figure 9: Example Delayed Startup Negotiation
   Figure 10: Example Immediate Startup Negotiation
   Figure 11: Optimized MPA/TCP Implementation
   Figure 12: Non-Aligned FPDU Freely Placed in TCP Octet Stream
   Figure 13: Aligned FPDU Placed Immediately after TCP Header
   Figure 14: Connection Parameters for the RNIC Types
   Figure 15: MPA Negotiation between an RDMAC RNIC and a
              Non-Permissive IETF RNIC
   Figure 16: MPA Negotiation between an RDMAC RNIC and a Permissive
              IETF RNIC
   Figure 17: MPA Negotiation between a Non-Permissive IETF RNIC and
              a Permissive IETF RNIC

1.  Introduction

   This section discusses the reason for creating MPA on TCP and a
   general overview of the protocol.

1.1.  Motivation

   The Direct Data Placement protocol [DDP], when used with TCP
   [RFC793], requires a mechanism to detect record boundaries.  The DDP
   records are referred to as Upper Layer Protocol Data Units by this
   document.  The ability to locate the Upper Layer Protocol Data Unit
   (ULPDU) boundary is useful to a hardware network adapter that uses
   DDP to directly place the data in the application buffer based on the
   control information carried in the ULPDU header.  This may be done
   without requiring that the packets arrive in order.  Potential
   benefits of this capability are the avoidance of the memory copy
   overhead and a smaller memory requirement for handling out-of-order
   or dropped packets.

   Many approaches have been proposed for a generalized framing
   mechanism.  Some are probabilistic in nature and others are
   deterministic.  An example probabilistic approach is characterized by
   a detectable value embedded in the octet stream, with no method of
   preventing that value from appearing elsewhere within user data.  It
   is probabilistic because under some conditions the receiver may
   incorrectly interpret application data as the detectable value.
   Under these conditions, the protocol may fail with unacceptable
   frequency.  One deterministic approach is characterized by embedded
   controls at known locations in the octet stream.  Because the
   receiver can guarantee it will only examine the data stream at
   locations that are known to contain the embedded control, the
   protocol can never misinterpret application data as being embedded
   control data.  For unambiguous handling of an out-of-order packet, a
   deterministic approach is preferred.

   The MPA protocol provides a framing mechanism for DDP running over
   TCP using the deterministic approach.
   It allows the location of the
   ULPDU to be determined in the TCP stream even if the TCP segments
   arrive out of order.

1.2.  Protocol Overview

   The layering of PDUs with MPA is shown in Figure 1, below.

               +------------------+
               |     ULP client   |
               +------------------+  <- Consumer messages
               |        DDP       |
               +------------------+  <- ULPDUs
               |        MPA*      |
               +------------------+  <- FPDUs (containing ULPDUs)
               |        TCP*      |
               +------------------+  <- TCP Segments (containing FPDUs)
               |      IP etc.     |
               +------------------+
                * These may be fully layered or optimized together.

                       Figure 1: ULP MPA TCP Layering

   MPA is described as an extra layer above TCP and below DDP.  The
   operation sequence is:

   1.  A TCP connection is established by ULP action.  This is done
       using methods not described by this specification.  The ULP may
       exchange some amount of data in streaming mode prior to starting
       MPA, but is not required to do so.

   2.  The Consumer negotiates the use of DDP and MPA at both ends of a
       connection.  The mechanisms to do this are not described in this
       specification.  The negotiation may be done in streaming mode, or
       by some other mechanism (such as a pre-arranged port number).

   3.  The ULP activates MPA on each end in the Startup Phase, either as
       an Initiator or a Responder, as determined by the ULP.  This mode
       verifies the usage of MPA, specifies the use of CRC and Markers,
       and allows the ULP to communicate some additional data via a
       Private Data exchange.  See Section 7.1, Connection Setup, for
       more details on the startup process.

   4.  At the end of the Startup Phase, the ULP puts MPA (and DDP) into
       Full Operation and begins sending DDP data as further described
       below.  In this document, DDP data chunks are called ULPDUs.  For
       a description of the DDP data, see [DDP].

   Following is a description of data transfer when MPA is in Full
   Operation.

   1.  DDP determines the Maximum ULPDU (MULPDU) size by querying MPA
       for this value.  MPA derives this information from TCP or IP,
       when it is available, or chooses a reasonable value.

   2.  DDP creates ULPDUs of MULPDU size or smaller, and hands them to
       MPA at the sender.

   3.  MPA creates a Framed Protocol Data Unit (FPDU) by prepending a
       header, optionally inserting Markers, and appending a CRC field
       after the ULPDU and PAD (if any).  MPA delivers the FPDU to TCP.

   4.  The TCP sender puts the FPDUs into the TCP stream.  If the sender
       is optimized MPA/TCP, it segments the TCP stream in such a way
       that a TCP Segment boundary is also the boundary of an FPDU.  TCP
       then passes each segment to the IP layer for transmission.

   5.  The receiver may or may not be optimized.  If it is optimized
       MPA/TCP, it may separate passing the TCP payload to MPA from
       passing the TCP payload ordering information to MPA.  In either
       case, RFC-compliant TCP wire behavior is observed at both the
       sender and receiver.

   6.  The MPA receiver locates and assembles complete FPDUs within the
       stream, verifies their integrity, and removes MPA Markers (when
       present), ULPDU_Length, PAD, and the CRC field.

   7.  MPA then provides the complete ULPDUs to DDP.  MPA may also
       separate passing MPA payload to DDP from passing the MPA payload
       ordering information.

   A fully layered MPA on TCP is implemented as a data stream ULP for
   TCP and is therefore RFC compliant.

   An optimized DDP/MPA/TCP uses a TCP layer that potentially contains
   some additional behaviors as suggested in this document.  When
   DDP/MPA/TCP are cross-layer optimized, the behavior of TCP
   (especially sender segmentation) may change from that of the
   un-optimized implementation, but the changes are within the bounds
   permitted by the TCP RFC specifications, and will interoperate with
   an un-optimized TCP.  The additional behaviors are described in
   Appendix A and are not normative; they are described at a TCP
   interface layer as a convenience.  Implementations may achieve the
   described functionality using any method, including cross-layer
   optimizations between TCP, MPA, and DDP.

   An optimized DDP/MPA/TCP sender is able to segment the data stream
   such that TCP segments begin with FPDUs (FPDU Alignment).  This has
   significant advantages for receivers.  When segments arrive with
   aligned FPDUs, the receiver usually need not buffer any portion of
   the segment, allowing DDP to place it in its destination memory
   immediately, thus avoiding copies from intermediate buffers (DDP's
   reason for existence).

   An optimized DDP/MPA/TCP receiver allows a DDP on MPA implementation
   to locate the start of ULPDUs that may be received out of order.  It
   also allows the implementation to determine if the entire ULPDU has
   been received.  As a result, MPA can pass out-of-order ULPDUs to DDP
   for immediate use.  This enables a DDP on MPA implementation to save
   a significant amount of intermediate storage by placing the ULPDUs in
   the right locations in the application buffers when they arrive,
   rather than waiting until full ordering can be restored.

   The ability of a receiver to recover out-of-order ULPDUs is optional
   and declared to the transmitter during startup.  When the receiver
   declares that it does not support out-of-order recovery, the
   transmitter does not add to the data stream the control information
   needed for out-of-order recovery.

   If the receiver is fully layered, then MPA receives a strictly
   ordered stream of data and does not deal with out-of-order ULPDUs.
   In this case, MPA passes each ULPDU to DDP when the last bytes arrive
   from TCP, along with the indication that they are in order.

   MPA implementations that support recovery of out-of-order ULPDUs MUST
   support a mechanism to indicate the ordering of ULPDUs as the sender
   transmitted them and indicate when missing intermediate segments
   arrive.  These mechanisms allow DDP to reestablish record ordering
   and report Delivery of complete messages (groups of records).

   MPA also addresses enhanced data integrity.  Some users of TCP have
   noted that the TCP checksum is not as strong as could be desired (see
   [CRCTCP]).  Studies such as [CRCTCP] have shown that the TCP checksum
   indicates segments in error at a much higher rate than the underlying
   link characteristics would indicate.  With these higher error rates,
   the chance that an error will escape detection, when using only the
   TCP checksum for data integrity, becomes a concern.  A stronger
   integrity check can reduce the chance of data errors being missed.

   MPA includes a CRC check to increase the ULPDU data integrity to the
   level provided by other modern protocols, such as SCTP [RFC4960].  It
   is possible to disable this CRC check; however, CRCs MUST be enabled
   unless it is clear that the end-to-end connection through the network
   has data integrity at least as good as an MPA with CRC enabled (for
   example, when IPsec is implemented end to end).  DDP's ULP expects
   this level of data integrity and therefore the ULP does not have to
   provide its own duplicate data integrity and error recovery for lost
   data.

2.  Glossary

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   Consumer - the ULPs or applications that lie above MPA and DDP.  The
       Consumer is responsible for making TCP connections, starting MPA
       and DDP connections, and generally controlling operations.

   CRC - Cyclic Redundancy Check.

   Delivery - (Delivered, Delivers) - For MPA, Delivery is defined as
       the process of informing DDP that a particular PDU is ordered for
       use.  A PDU is Delivered in the exact order that it was sent by
       the original sender; MPA uses TCP's byte stream ordering to
       determine when Delivery is possible.  This is specifically
       different from "passing the PDU to DDP", which may generally
       occur in any order, while the order of Delivery is strictly
       defined.

   EMSS - Effective Maximum Segment Size.  EMSS is the smaller of the
       TCP maximum segment size (MSS) as defined in RFC 793 [RFC793],
       and the current path Maximum Transmission Unit (MTU) [RFC1191].

   FPDU - Framed Protocol Data Unit.  The unit of data created by an MPA
       sender.

   FPDU Alignment - The property that an FPDU is Header Aligned with the
       TCP segment, and the TCP segment includes an integer number of
       FPDUs.  A TCP segment with an FPDU Alignment allows immediate
       processing of the contained FPDUs without waiting on other TCP
       segments to arrive or combining with prior segments.

   FPDU Pointer (FPDUPTR) - This field of the Marker is used to indicate
       the beginning of an FPDU.

   Full Operation (Full Operation Phase) - After the completion of the
       Startup Phase, MPA begins exchanging FPDUs.

   Header Alignment - The property that a TCP segment begins with an
       FPDU.  The FPDU is Header Aligned when the FPDU header is exactly
       at the start of the TCP segment (right behind the TCP headers on
       the wire).

   Initiator - The endpoint of a connection that sends the MPA Request
       Frame, i.e., the first to actually send data (which may not be
       the one that sends the TCP SYN).

   Marker - A four-octet field that is placed in the MPA data stream at
       fixed octet intervals (every 512 octets).

   MPA-aware TCP - A TCP implementation that is aware of the receiver
       efficiencies of MPA FPDU Alignment and is capable of sending TCP
       segments that begin with an FPDU.

   MPA-enabled - MPA is enabled if the MPA protocol is visible on the
       wire.  When the sender is MPA-enabled, it is inserting framing
       and Markers.  When the receiver is MPA-enabled, it is
       interpreting framing and Markers.

   MPA Request Frame - Data sent from the MPA Initiator to the MPA
       Responder during the Startup Phase.

   MPA Reply Frame - Data sent from the MPA Responder to the MPA
       Initiator during the Startup Phase.

   MPA - Marker-based ULP PDU Aligned Framing for TCP protocol.  This
       document defines the MPA protocol.

   MULPDU - Maximum ULPDU.  The current maximum size of the record that
       is acceptable for DDP to pass to MPA for transmission.

   Node - A computing device attached to one or more links of a network.
       A Node in this context does not refer to a specific application
       or protocol instantiation running on the computer.  A Node may
       consist of one or more MPA on TCP devices installed in a host
       computer.

   PAD - A 1-3 octet group of zeros used to fill an FPDU to an exact
       modulo 4 size.

   PDU - Protocol Data Unit.

   Private Data - A block of data exchanged between MPA endpoints during
       initial connection setup.

   Protection Domain - An RDMA concept (see [VERBS-RDMA] and [RDMASEC])
       that ties use of various endpoint resources (memory access, etc.)
       to the specific RDMA/DDP/MPA connection.

   RDDP - A suite of protocols including MPA, [DDP], [RDMAP], an overall
       security document [RDMASEC], a problem statement [RFC4297], an
       architecture document [RFC4296], and an applicability document
       [APPL].

   RDMA - Remote Direct Memory Access; a protocol that uses DDP and MPA
       to enable applications to transfer data directly from memory
       buffers.  See [RDMAP].

   Remote Peer - The MPA protocol implementation on the opposite end of
       the connection.  Used to refer to the remote entity when
       describing protocol exchanges or other interactions between two
       Nodes.

   Responder - The connection endpoint that responds to an incoming MPA
       connection request (the MPA Request Frame).  This may not be the
       endpoint that awaited the TCP SYN.

   Startup Phase - The initial exchanges of an MPA connection that
       serve to more fully identify MPA endpoints to each other and
       pass connection-specific setup information to each other.

   ULP - Upper Layer Protocol.  The protocol layer above the protocol
       layer currently being referenced.  The ULP for MPA is DDP [DDP].

   ULPDU - Upper Layer Protocol Data Unit.  The data record defined by
       the layer above MPA (DDP).  ULPDU corresponds to DDP's DDP
       segment.

   ULPDU_Length - A field in the FPDU describing the length of the
       included ULPDU.

3.  MPA's Interactions with DDP

   DDP requires MPA to maintain DDP record boundaries from the sender to
   the receiver.  When using MPA on TCP to send data, DDP provides
   records (ULPDUs) to MPA.  MPA will use the reliable transmission
   abilities of TCP to transmit the data, and will insert appropriate
   additional information into the TCP stream to allow the MPA receiver
   to locate the record boundary information.

   As such, MPA accepts complete records (ULPDUs) from DDP at the sender
   and returns them to DDP at the receiver.

   MPA MUST encapsulate the ULPDU such that there is exactly one ULPDU
   contained in one FPDU.

   MPA over a standard TCP stack can usually provide FPDU Alignment with
   the TCP Header if the FPDU is equal to TCP's EMSS.  An optimized
   MPA/TCP stack can also maintain alignment as long as the FPDU is less
   than or equal to TCP's EMSS.  Since FPDU Alignment is generally
   desired by the receiver, DDP cooperates with MPA to ensure FPDUs'
   lengths do not exceed the EMSS under normal conditions.  This is done
   with the MULPDU mechanism.
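   The sketch below roughly illustrates how the MULPDU mechanism keeps
   FPDUs within the EMSS.  It computes an FPDU's length and the largest
   ULPDU that still fits, and it clamps the result to the 128..64768
   octet MULPDU range stated in this section.  It is not part of this
   specification, and several pieces are assumptions:

   o  All helper names are hypothetical.

   o  The Marker count is a worst-case bound of one 4-octet Marker per
      started 512-octet span.

   o  The header sizes used to re-derive the 64768-octet ceiling are
      20-octet IPv4 and TCP headers, each carrying up to 40 octets of
      options.

```python
# Hypothetical MPA sizing helpers; names and the Marker bound are assumptions.
HEADER_LEN = 2          # ULPDU_Length field
CRC_LEN = 4             # CRC field, present even when CRCs are disabled
MARKER_LEN = 4          # RESERVED + FPDUPTR
MARKER_INTERVAL = 512   # Markers recur every 512 octets of sequence space
MULPDU_MIN, MULPDU_MAX = 128, 64768


def pad_len(ulpdu_len: int) -> int:
    """PAD octets (0-3) that make ULPDU_Length + ULPDU + PAD a multiple of 4."""
    return -(HEADER_LEN + ulpdu_len) % 4


def worst_case_fpdu_len(ulpdu_len: int, markers: bool) -> int:
    """Upper bound on FPDU length: one Marker per started 512-octet span."""
    base = HEADER_LEN + ulpdu_len + pad_len(ulpdu_len) + CRC_LEN
    if not markers:
        return base
    count = 0
    while True:
        total = base + MARKER_LEN * count
        needed = -(-total // MARKER_INTERVAL)  # ceiling division
        if needed <= count:
            return total
        count = needed  # more Markers make the FPDU longer, so iterate


def largest_fitting_ulpdu(limit: int, markers: bool) -> int:
    """Largest ULPDU whose worst-case FPDU fits in `limit` octets (0 if none)."""
    for ulpdu_len in range(limit, 0, -1):
        if worst_case_fpdu_len(ulpdu_len, markers) <= limit:
            return ulpdu_len
    return 0


def mulpdu_for_emss(emss: int, markers: bool) -> int:
    """MULPDU for an EMSS, clamped to 128..64768.  With a very small EMSS
    the clamp wins and FPDU Alignment can no longer be kept."""
    return max(MULPDU_MIN, min(MULPDU_MAX, largest_fitting_ulpdu(emss, markers)))


def ulpdu_ceiling() -> int:
    """Re-derive 64768 from assumed worst-case IPv4 and TCP header sizes."""
    ip_datagram_max = 65535
    headers = (20 + 40) + (20 + 40)  # IPv4 hdr + options, TCP hdr + options
    fit = largest_fitting_ulpdu(ip_datagram_max - headers, markers=True)
    return fit // 128 * 128  # round down to a 128-octet boundary


print(mulpdu_for_emss(1460, markers=False))  # 1454
print(mulpdu_for_emss(1460, markers=True))   # 1442
print(ulpdu_ceiling())                       # 64768
```

   Under these assumptions, the largest ULPDU that fits in 65415 octets
   is 64894, which rounds down to 64768.  In this model, enabling
   Markers on a 1460-octet EMSS costs 12 octets of ULPDU space.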

   MPA MUST provide information to DDP on the current maximum size of
   the record that is acceptable to send (MULPDU).  DDP SHOULD limit
   each record size to MULPDU.  The range of MULPDU values MUST be
   between 128 octets and 64768 octets, inclusive.

   The sending DDP MUST NOT post a ULPDU larger than 64768 octets to
   MPA.  DDP MAY post a ULPDU of any size between one and 64768 octets;
   however, MPA is not REQUIRED to support a ULPDU Length that is
   greater than the current MULPDU.

   While the maximum theoretical length supported by the MPA header
   ULPDU_Length field is 65535 octets, TCP over IP limits an IP datagram
   to a maximum length of 65535 octets.  To enable MPA to support FPDU
   Alignment, the maximum size of the FPDU must fit within an IP
   datagram.  Thus, the ULPDU limit of 64768 octets was derived by
   taking the maximum IP datagram length, subtracting from it the
   maximum total length of the sum of the IPv4 header, TCP header, IPv4
   options, TCP options, and the worst-case MPA overhead, and then
   rounding the result down to a 128-octet boundary.

   Note that MULPDU will be significantly smaller than the theoretical
   maximum in most implementations for most circumstances, due to link
   MTUs, use of extra headers such as required for IPsec, etc.

   On receive, MPA MUST pass each ULPDU with its length to DDP when it
   has been validated.

   If an MPA implementation supports passing out-of-order ULPDUs to DDP,
   the MPA implementation SHOULD:

   *   Pass each ULPDU with its length to DDP as soon as it has been
       fully received and validated.

   *   Provide a mechanism to indicate the ordering of ULPDUs as the
       sender transmitted them.  One possible mechanism might be
       providing the TCP sequence number for each ULPDU.

   *   Provide a mechanism to indicate when a given ULPDU (and prior
       ULPDUs) are complete (Delivered to DDP).  One possible mechanism
       might be to allow DDP to see the current outgoing TCP ACK
       sequence number.

   *   Provide an indication to DDP that the TCP has closed or has begun
       to close the connection (e.g., received a FIN).

   MPA MUST provide the protocol version negotiated with its peer to
   DDP.  DDP will use this version to set the version in its header and
   to report the version to [RDMAP].

4.  MPA Full Operation Phase

   The following sections describe the main semantics of the Full
   Operation Phase of MPA.

4.1.  FPDU Format

   MPA senders create FPDUs out of ULPDUs.  The format of an FPDU shown
   below MUST be used for all MPA FPDUs.  For purposes of clarity,
   Markers are not shown in Figure 2.
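   The FPDU layout in Figure 2 below can be sketched in Python for the
   case where Markers are disabled, which matches the figure because it
   omits them too.  This is an illustrative sketch, not a reference
   implementation.  The names `encode_fpdu` and `decode_fpdu` are
   hypothetical.  CRC32c is computed bit by bit with the standard
   reflected Castagnoli polynomial.  The CRC is written as a big-endian
   32-bit value purely as an assumption; Section 4.4 governs how the CRC
   is actually computed and placed on the wire.

```python
import struct

CRC32C_POLY = 0x82F63B78   # reflected Castagnoli polynomial
MAX_ULPDU = 64768          # largest ULPDU DDP may post to MPA


def crc32c(data: bytes) -> int:
    """Bitwise (table-free) CRC32c; slow but easy to check."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def encode_fpdu(ulpdu: bytes) -> bytes:
    """ULPDU_Length + ULPDU + zero PAD + CRC (no Markers)."""
    if len(ulpdu) > MAX_ULPDU:
        raise ValueError("ULPDU exceeds 64768 octets")
    body = struct.pack("!H", len(ulpdu)) + ulpdu
    body += bytes(-len(body) % 4)          # PAD: 0-3 zero octets
    # Assumption: CRC stored big-endian; Section 4.4 defines the real rule.
    return body + struct.pack("!I", crc32c(body))


def decode_fpdu(fpdu: bytes, check_crc: bool = True) -> bytes:
    """Validate one Marker-free FPDU and return its ULPDU."""
    if len(fpdu) < 8 or len(fpdu) % 4:
        raise ValueError("FPDU length must be a multiple of 4, at least 8")
    (ulpdu_len,) = struct.unpack_from("!H", fpdu, 0)
    body_len = len(fpdu) - 4
    if 2 + ulpdu_len + (-(2 + ulpdu_len) % 4) != body_len:
        raise ValueError("ULPDU_Length does not match FPDU size")
    if check_crc:  # when CRCs are disabled the field MUST NOT be checked
        (crc,) = struct.unpack_from("!I", fpdu, body_len)
        if crc != crc32c(fpdu[:body_len]):
            raise ValueError("CRC mismatch")
    # PAD contents are ignored apart from their role in the CRC.
    return fpdu[2:2 + ulpdu_len]


frame = encode_fpdu(b"hello")
print(len(frame), frame[:2].hex())   # 12 0005
print(decode_fpdu(frame))            # b'hello'
```

   A 5-octet ULPDU gains a 2-octet header, 1 PAD octet, and a 4-octet
   CRC, for a 12-octet FPDU.  A single flipped bit anywhere in the FPDU
   is rejected when CRC checking is on.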
+ +       0                   1                   2                   3 +       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +      |          ULPDU_Length         |                               | +      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               + +      |                                                               | +      ~                                                               ~ +      ~                            ULPDU                              ~ +      |                                                               | +      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +      |                               |          PAD (0-3 octets)     | +      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +      |                             CRC                               | +      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +                           Figure 2: FPDU Format + +   ULPDU_Length: 16 bits (unsigned integer).  This is the number of +   octets of the contained ULPDU.  It does not include the length of the +   FPDU header itself, the pad, the CRC, or of any Markers that fall +   within the ULPDU.  The 16-bit ULPDU Length field is large enough to +   support the largest IP datagrams for IPv4 or IPv6. + +   PAD: The PAD field trails the ULPDU and contains between 0 and 3 +   octets of data.  The pad data MUST be set to zero by the sender and +   ignored by the receiver (except for CRC checking).  The length of the +   pad is set so as to make the size of the FPDU an integral multiple of +   four. + +   CRC: 32 bits.  When CRCs are enabled, this field contains a CRC32c +   check value, which is used to verify the entire contents of the FPDU, +   using CRC32c.  See Section 4.4, CRC Calculation.  
   When CRCs are not enabled, this field is still present, may contain
   any value, and MUST NOT be checked.

   The FPDU adds a minimum of 6 octets to the length of the ULPDU.  In
   addition, the total length of the FPDU will include the length of any
   Markers and from 0 to 3 pad octets added to round up the ULPDU size.

4.2.  Marker Format

   The format of a Marker MUST be as specified in Figure 3:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           RESERVED            |            FPDUPTR            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          Figure 3: Marker Format

   RESERVED: The Reserved field MUST be set to zero on transmit and
   ignored on receive (except for CRC calculation).

   FPDUPTR: The FPDU Pointer is a relative pointer, 16 bits long,
   interpreted as an unsigned integer that indicates the number of
   octets in the TCP stream from the beginning of the ULPDU Length field
   to the first octet of the entire Marker.  The least significant two
   bits MUST always be set to zero at the transmitter, and the receivers
   MUST always treat these as zero for calculations.

4.3.  MPA Markers

   MPA Markers are used to identify the start of FPDUs when packets are
   received out of order.  This is done by locating the Markers at fixed
   intervals in the data stream (which is correlated to the TCP sequence
   number) and using the Marker value to locate the preceding FPDU
   start.

   All MPA Markers are included in the containing FPDU CRC calculation
   (when both CRCs and Markers are in use).
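   As a non-normative illustration, the FPDU and Marker layouts of
   Sections 4.1 and 4.2 can be sketched in Python.  The sketch assumes
   network (big-endian) byte order for the ULPDU_Length and Marker
   fields, as the annotated hex dumps later in Section 4 show, and it
   leaves the CRC field zero-filled, which is only acceptable when CRCs
   are disabled (Section 4.4).  All function names are hypothetical.

```python
import struct


def pad_length(ulpdu_len: int) -> int:
    """PAD octets (0-3) so that the whole FPDU is a multiple of four octets.

    The FPDU is 2 (ULPDU_Length) + ULPDU + PAD + 4 (CRC), so it suffices to
    round 2 + ULPDU up to the next multiple of four.
    """
    return (-(2 + ulpdu_len)) % 4


def build_fpdu(ulpdu: bytes) -> bytes:
    """Frame one ULPDU as an FPDU without Markers (hypothetical helper)."""
    if len(ulpdu) > 0xFFFF:
        raise ValueError("ULPDU_Length is a 16-bit field")
    header = struct.pack("!H", len(ulpdu))  # counts only the ULPDU octets
    pad = bytes(pad_length(len(ulpdu)))     # PAD octets MUST be zero
    crc = bytes(4)                          # placeholder: valid only with CRCs off
    return header + ulpdu + pad + crc


def encode_marker(fpduptr: int) -> bytes:
    """RESERVED (zero) followed by FPDUPTR, both 16 bits."""
    if not 0 <= fpduptr <= 0xFFFF or fpduptr % 4:
        raise ValueError("FPDUPTR must be a 16-bit multiple of four")
    return struct.pack("!HH", 0, fpduptr)


def decode_marker(marker: bytes) -> int:
    """Return FPDUPTR, ignoring RESERVED and the two low-order bits."""
    _reserved, fpduptr = struct.unpack("!HH", marker)
    return fpduptr & 0xFFFC
```

   With a 16-octet ULPDU the sketch produces the 2 PAD octets and the
   24-octet (Marker-free) FPDU that Figure 4 depicts.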

   The MPA receiver's ability to locate out-of-order FPDUs and pass the
   ULPDUs to DDP is implementation dependent.  MPA/DDP allows those
   receivers that are able to deal with out-of-order FPDUs in this way
   to require the insertion of Markers in the data stream.  When the
   receiver cannot deal with out-of-order FPDUs in this way, it may
   disable the insertion of Markers at the sender.  All MPA senders MUST
   be able to generate Markers when their use is declared by the
   opposing receiver (see Section 7.1, Connection Setup).

   When Markers are enabled, MPA senders MUST insert a Marker into the
   data stream at a 512-octet periodic interval in the TCP Sequence
   Number Space.  The Marker contains a 16-bit unsigned integer referred
   to as the FPDUPTR (FPDU Pointer).

   If the FPDUPTR's value is non-zero, the FPDU Pointer is a 16-bit
   relative back-pointer.  FPDUPTR MUST contain the number of octets in
   the TCP stream from the beginning of the ULPDU Length field to the
   first octet of the Marker, unless the Marker falls between FPDUs.
   Thus, the location of the first octet of the previous FPDU header can
   be determined by subtracting the value of the given Marker from the
   current octet-stream sequence number (i.e., TCP sequence number) of
   the first octet of the Marker.  Note that this computation MUST take
   into account that the TCP sequence number could have wrapped between
   the Marker and the header.

   An FPDUPTR value of 0x0000 is a special case -- it is used when the
   Marker falls exactly between FPDUs (between the preceding FPDU CRC
   field and the next FPDU's ULPDU Length field).
   In this case, the Marker is considered to be contained in the
   following FPDU; the Marker MUST be included in the CRC calculation of
   the FPDU following the Marker (if CRCs are being generated or
   checked).  Thus, an FPDUPTR value of 0x0000 means that immediately
   following the Marker is an FPDU header (the ULPDU Length field).

   Since all FPDUs are integral multiples of 4 octets, the bottom two
   bits of the FPDUPTR as calculated by the sender are zero.  MPA
   reserves these bits so they MUST be treated as zero for computation
   at the receiver.

   When Markers are enabled (see Section 7.1, Connection Setup), the MPA
   Markers MUST be inserted immediately preceding the first FPDU of Full
   Operation Phase, and at every 512th octet of the TCP octet stream
   thereafter.  As a result, the first Marker has an FPDUPTR value of
   0x0000.  If the first Marker begins at octet sequence number
   SeqStart, then Markers are inserted such that the first octet of the
   Marker is at octet sequence number SeqNum if (SeqNum - SeqStart) mod
   512 is zero.  Note that SeqNum can wrap.

   For example, if the TCP sequence number is used to calculate the
   insertion point of the Marker, note that the starting TCP sequence
   number is unlikely to be zero, so Markers will generally not begin at
   sequence numbers that are multiples of 512.  If the MPA connection is
   started at TCP sequence number 11, then the first Marker will begin
   at 11, and subsequent Markers will begin at 523, 1035, etc.

   If an FPDU is large enough to contain multiple Markers, they MUST all
   point to the same point in the TCP stream: the first octet of the
   ULPDU Length field for the FPDU.
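   The Marker placement rule and the back-pointer arithmetic above can
   be sketched in Python as follows.  This is a non-normative
   illustration under stated assumptions: sequence numbers are plain
   integers reduced modulo 2^32, FPDUs are supplied already framed and
   back to back, and the helper names are hypothetical.
   frame_with_markers() works in stream offsets relative to SeqStart,
   so offset 0 carries the first Marker.

```python
import struct
from typing import Iterable

SEQ_MOD = 1 << 32       # TCP sequence numbers are 32 bits and wrap
MARKER_INTERVAL = 512   # octets between the starts of consecutive Markers


def is_marker_position(seq_num: int, seq_start: int) -> bool:
    """True if a Marker begins at seq_num: (SeqNum - SeqStart) mod 512 == 0."""
    return ((seq_num - seq_start) % SEQ_MOD) % MARKER_INTERVAL == 0


def fpdu_start_from_marker(marker_seq: int, fpduptr: int) -> int:
    """Sequence number of the ULPDU Length field that a Marker refers to."""
    fpduptr &= 0xFFFC  # the two low-order bits are treated as zero
    if fpduptr == 0:
        # The Marker fell between FPDUs; the header follows the 4-octet Marker.
        return (marker_seq + 4) % SEQ_MOD
    # Otherwise step back FPDUPTR octets, allowing for sequence wrap.
    return (marker_seq - fpduptr) % SEQ_MOD


def frame_with_markers(fpdus: Iterable[bytes]) -> bytes:
    """Interleave Markers into a stream of back-to-back FPDUs.

    Offsets are relative to SeqStart.  A Marker that falls exactly between
    two FPDUs gets FPDUPTR 0 and belongs to the following FPDU.
    """
    out = bytearray()
    for fpdu in fpdus:
        header_offset = len(out)
        for index, octet in enumerate(fpdu):
            if len(out) % MARKER_INTERVAL == 0:
                # FPDUPTR is 0 before the header, else the distance back to it.
                fpduptr = 0 if index == 0 else len(out) - header_offset
                out += struct.pack("!HH", 0, fpduptr)
                if index == 0:
                    header_offset = len(out)
            out.append(octet)
    return bytes(out)
```

   Feeding it a 488-octet FPDU (492 octets on the wire once the leading
   Marker is counted) followed by a 48-octet FPDU reproduces the layout
   of Figure 6: the Marker at offset 0x200 carries FPDUPTR 0x0014, and
   stepping back from it recovers the FPDU start at offset 0x1EC.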

   If a Marker interval contains multiple FPDUs (the FPDUs are small),
   the Marker MUST point to the start of the ULPDU Length field for the
   FPDU containing the Marker unless the Marker falls between FPDUs, in
   which case the Marker MUST be zero.

   The following example shows an FPDU containing a Marker.

   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       ULPDU Length (0x0010)   |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   +                                                               +
   |                         ULPDU (octets 0-9)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            (0x0000)           |        FPDU ptr (0x000C)      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        ULPDU (octets 10-15)                   |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |          PAD (2 octets:0,0)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              CRC                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 4: Example FPDU Format with Marker

   MPA Receivers MUST preserve ULPDU boundaries when passing data to
   DDP.  MPA Receivers MUST pass the ULPDU data and the ULPDU Length to
   DDP and not the Markers, headers, and CRC.

4.4.  CRC Calculation

   An MPA implementation MUST implement CRC support and MUST either:

   (1)  always use CRCs; the MPA provider is not REQUIRED to support an
        administrator's request that CRCs not be used.

        or

   (2a) only indicate a preference not to use CRCs on the explicit
        request of the system administrator, via an interface not
        defined in this spec.  The default configuration for a
        connection MUST be to use CRCs.

   (2b) disable CRC checking (and possibly generation) if both the local
        and remote endpoints indicate a preference not to use CRCs.

   An administrative decision to have a host request CRC suppression
   SHOULD NOT be made unless there is assurance that the TCP connection
   involved provides protection from undetected errors that is at least
   as strong as an end-to-end CRC32c.  End-to-end usage of an IPsec
   cryptographic integrity check is among the ways to provide such
   protection, and the use of channel bindings [NFSv4CHANNEL] by the ULP
   can provide a high level of assurance that the IPsec protection scope
   is end-to-end with respect to the ULP.

   The process MUST be invisible to the ULP.

   After receipt of an MPA startup declaration indicating that its peer
   requires CRCs, an MPA instance MUST continue generating and checking
   CRCs until the connection terminates.  If an MPA instance has
   declared that it does not require CRCs, it MUST turn off CRC checking
   immediately after receipt of an MPA mode declaration indicating that
   its peer also does not require CRCs.  It MAY continue generating
   CRCs.  See Section 7.1, Connection Setup, for details on the MPA
   startup.

   When sending an FPDU, the sender MUST include a CRC field.  When CRCs
   are enabled, the CRC field in the MPA FPDU MUST be computed using the
   CRC32c polynomial in the manner described in the iSCSI Protocol
   [iSCSI] document for Header and Data Digests.
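   For reference, CRC32c (the Castagnoli polynomial used for iSCSI
   digests) can be computed bit by bit as in the Python sketch below,
   using the bit-reflected polynomial 0x82F63B78, an initial value of
   0xFFFFFFFF, and a final XOR with 0xFFFFFFFF.  It is a minimal,
   unoptimized illustration; production code would normally use a
   table-driven or hardware-assisted implementation.  How the resulting
   32-bit value is laid out in the FPDU CRC field follows the iSCSI
   conventions cited above and is not shown here.  The function names
   are hypothetical.

```python
CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial 0x1EDC6F41, bit-reflected


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC32c; pass a previous result as crc to continue a running CRC."""
    crc ^= 0xFFFFFFFF                 # undo the final XOR (or apply the initial value)
    for octet in data:
        crc ^= octet
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF           # final complement


def crc_matches(covered: bytes, received: int) -> bool:
    """Receiver-side check: recompute over the covered octets and compare."""
    return crc32c(covered) == received
```

   The incremental form is convenient when the covered octets (any
   Marker immediately preceding the header, the ULPDU Length field, the
   ULPDU with embedded Markers, and the PAD) arrive in pieces.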

   The fields that MUST be included in the CRC calculation when sending
   an FPDU are as follows:

   1)  If a Marker does not immediately precede the ULPDU Length field,
       the CRC-32c is calculated from the first octet of the ULPDU
       Length field, through all the ULPDU and Markers (if present), to
       the last octet of the PAD (if present), inclusive.  If there is a
       Marker immediately following the PAD, the Marker is included in
       the CRC calculation for this FPDU.

   2)  If a Marker immediately precedes the first octet of the ULPDU
       Length field of the FPDU (i.e., the Marker fell between FPDUs,
       and thus is required to be included in the second FPDU), the
       CRC-32c is calculated from the first octet of the Marker, through
       the ULPDU Length header, through all the ULPDU and Markers (if
       present), to the last octet of the PAD (if present), inclusive.

   3)  After calculating the CRC-32c, the resultant value is placed into
       the CRC field at the end of the FPDU.

   When an FPDU is received and CRC checking is enabled, the receiver
   MUST first perform the following:

   1)  Calculate the CRC of the incoming FPDU in the same fashion as
       defined above.

   2)  Verify that the calculated CRC-32c value is the same as the
       received CRC-32c value found in the FPDU CRC field.  If not, the
       receiver MUST treat the FPDU as an invalid FPDU.

   The procedure for handling invalid FPDUs is covered in Section 8,
   Error Semantics.

   The following is an annotated hex dump of an example FPDU sent as the
   first FPDU on the stream.  As such, it starts with a Marker.
   The FPDU contains a 42-octet ULPDU (an example DDP segment), which in
   turn contains 24 octets of DDP payload (Send Data) that are all
   zeros.  The CRC32c has been correctly calculated and can be used as a
   reference.  See the [DDP] and [RDMAP] specifications for definitions
   of the DDP Control field, Queue, MSN, MO, and Send Data.

       Octet Contents  Annotation
       Count

       0000    00      Marker: Reserved
       0001    00
       0002    00      Marker: FPDUPTR
       0003    00
       0004    00      ULPDU Length
       0005    2a
       0006    41      DDP Control Field, Send with Last flag set
       0007    43
       0008    00      Reserved (DDP STag position with no STag)
       0009    00
       000a    00
       000b    00
       000c    00      DDP Queue = 0
       000d    00
       000e    00
       000f    00
       0010    00      DDP MSN = 1
       0011    00
       0012    00
       0013    01
       0014    00      DDP MO = 0
       0015    00
       0016    00
       0017    00
       0018    00      DDP Send Data (24 octets of zeros)
       ...
       002f    00
       0030    52      CRC32c
       0031    23
       0032    99
       0033    83

                  Figure 5: Annotated Hex Dump of an FPDU

   The following is an example sent as the second FPDU of the stream,
   where the first FPDU (which is not shown here) had a length of 492
   octets and was also a Send to Queue 0 with Last Flag set.  This
   example contains a Marker.

       Octet Contents  Annotation
       Count

       01ec    00      ULPDU Length
       01ed    2a
       01ee    41      DDP Control Field: Send with Last Flag set
       01ef    43
       01f0    00      Reserved (DDP STag position with no STag)
       01f1    00
       01f2    00
       01f3    00
       01f4    00      DDP Queue = 0
       01f5    00
       01f6    00
       01f7    00
       01f8    00      DDP MSN = 2
       01f9    00
       01fa    00
       01fb    02
       01fc    00      DDP MO = 0
       01fd    00
       01fe    00
       01ff    00
       0200    00      Marker: Reserved
       0201    00
       0202    00      Marker: FPDUPTR
       0203    14
       0204    00      DDP Send Data (24 octets of zeros)
       ...
       021b    00
       021c    84      CRC32c
       021d    92
       021e    58
       021f    98

            Figure 6: Annotated Hex Dump of an FPDU with Marker

4.5.  FPDU Size Considerations

   MPA defines the Maximum Upper Layer Protocol Data Unit (MULPDU) as
   the size of the largest ULPDU fitting in an FPDU.  For an empty TCP
   Segment, MULPDU is EMSS minus the FPDU overhead (6 octets) minus
   space for Markers and pad octets.

       The maximum ULPDU Length for a single ULPDU when Markers are
       present MUST be computed as:

       MULPDU = EMSS - (6 + 4 * Ceiling(EMSS / 512) + EMSS mod 4)

   The formula above accounts for the worst-case number of Markers.

       The maximum ULPDU Length for a single ULPDU when Markers are NOT
       present MUST be computed as:

       MULPDU = EMSS - (6 + EMSS mod 4)

   As a further optimization of the wire efficiency, an MPA
   implementation MAY dynamically adjust the MULPDU (see Section 5 for
   latency and wire efficiency trade-offs).
   When one or more FPDUs are already packed into a TCP Segment, MULPDU
   MAY be reduced accordingly.

   DDP SHOULD provide ULPDUs that are as large as possible, but less
   than or equal to MULPDU.

   If the TCP implementation needs to adjust EMSS to support MTU changes
   or changing TCP options, the MULPDU value is changed accordingly.

   In certain rare situations, the EMSS may shrink below 128 octets in
   size.  If this occurs, the MPA on TCP sender MUST NOT shrink the
   MULPDU below 128 octets and is not required to follow the
   segmentation rules in Section 5.1 and Appendix A.

   If one or more FPDUs are already packed into a TCP segment, such that
   the remaining room is less than 128 octets, MPA MUST NOT provide a
   MULPDU smaller than 128.  In this case, MPA would typically provide a
   MULPDU for the next full-sized segment, but may still pack the next
   FPDU into the small remaining room, provided that the next FPDU is
   small enough to fit.

   The value 128 is chosen so as to allow DDP designers room for the DDP
   Header and some user data.

5.  MPA's interactions with TCP

   The following sections describe MPA's interactions with TCP.  This
   section discusses using a standard layered TCP stack with MPA
   attached above a TCP socket.  Discussion of using an optimized MPA-
   aware TCP with an MPA implementation that takes advantage of the
   extra optimizations is done in Appendix A.
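   Returning briefly to Section 4.5, the two MULPDU formulas and the
   128-octet lower bound can be sketched as a small Python function.  It
   is illustrative only: EMSS is assumed to be an integer obtained from
   TCP (or estimated from the MTU, as Section 5.1 allows), and the
   function and constant names are hypothetical.

```python
FPDU_OVERHEAD = 6     # 2-octet ULPDU_Length + 4-octet CRC
MARKER_SIZE = 4       # RESERVED + FPDUPTR
MARKER_INTERVAL = 512
MIN_MULPDU = 128      # MPA MUST NOT shrink MULPDU below this


def mulpdu(emss: int, markers_enabled: bool) -> int:
    """Largest ULPDU that fits in one FPDU for the given EMSS."""
    overhead = FPDU_OVERHEAD + emss % 4
    if markers_enabled:
        # Ceiling(EMSS / 512) Markers: the worst case for EMSS octets.
        markers = (emss + MARKER_INTERVAL - 1) // MARKER_INTERVAL
        overhead += MARKER_SIZE * markers
    return max(emss - overhead, MIN_MULPDU)
```

   For an EMSS of 1460 octets, the sketch yields a MULPDU of 1442 octets
   with Markers and 1454 octets without them.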

                   +-----------------------------------+
                   | +-----+       +-----------------+ |
                   | | MPA |       | Other Protocols | |
                   | +-----+       +-----------------+ |
                   |    ||                  ||         |
                   |  ----- socket API --------------  |
                   |            ||                     |
                   |         +-----+                   |
                   |         | TCP |                   |
                   |         +-----+                   |
                   |            ||                     |
                   |         +-----+                   |
                   |         | IP  |                   |
                   |         +-----+                   |
                   +-----------------------------------+

                   Figure 7: Fully Layered Implementation

   The fully layered implementation is described for completeness;
   however, the user is cautioned that the reduced probability of FPDU
   alignment when transmitting with this implementation will tend to
   introduce a higher overhead at optimized receivers.  In addition, the
   lack of out-of-order receive processing will significantly reduce the
   value of DDP/MPA by imposing higher buffering and copying overhead in
   the local receiver.

5.1.  MPA transmitters with a standard layered TCP

   MPA transmitters SHOULD calculate a MULPDU as described in Section
   4.5.  If the TCP implementation allows EMSS to be determined by MPA,
   that value should be used.  If the transmit-side TCP implementation
   is not able to report the EMSS, MPA SHOULD use the current MTU value
   to establish a likely FPDU size, taking into account the various
   expected header sizes.

   MPA transmitters SHOULD also use whatever facilities the TCP stack
   presents to cause the TCP transmitter to start TCP segments at FPDU
   boundaries.
   Multiple FPDUs MAY be packed into a single TCP segment as determined
   by the EMSS calculation as long as they are entirely contained in the
   TCP segment.

   For example, passing FPDU buffers sized to the current EMSS to the
   TCP socket and using the TCP_NODELAY socket option to disable the
   Nagle [RFC896] algorithm will usually result in many of the segments
   starting with an FPDU.

   It is recognized that various effects can cause FPDU Alignment to be
   lost.  Following are a few of the effects:

   *   ULPDUs that are smaller than the MULPDU.  If these are sent in a
       continuous stream, FPDU Alignment will be lost.  Note that
       careful use of a dynamic MULPDU can help in this case; the MULPDU
       for future FPDUs can be adjusted to re-establish alignment with
       the segments based on the current EMSS.

   *   Sending enough data that the TCP receive window limit is reached.
       TCP may send a smaller segment to exactly fill the receive
       window.

   *   Sending data when TCP is operating up against the congestion
       window.  If TCP is not tracking the congestion window in
       segments, it may transmit a smaller segment to exactly fill the
       congestion window.

   *   Changes in EMSS due to varying TCP options, or changes in MTU.

   If FPDU Alignment with TCP segments is lost for any reason, the
   alignment is regained after a break in transmission where the TCP
   send buffers are emptied.  Many usage models for DDP/MPA will include
   such breaks.

   MPA receivers are REQUIRED to be able to operate correctly even if
   alignment is lost (see Section 6).

5.2.  MPA receivers with a standard layered TCP

   MPA receivers will get TCP data in the usual ordered stream.
   The receivers MUST identify FPDU boundaries by using the ULPDU_Length
   field, as described in Section 6.  Receivers MAY utilize Markers to
   check for FPDU boundary consistency, but they are NOT required to
   examine the Markers to determine the FPDU boundaries.

6.  MPA Receiver FPDU Identification

   An MPA receiver MUST first verify the FPDU before passing the ULPDU
   to DDP.  To do this, the receiver MUST:

   *   locate the start of the FPDU unambiguously, and

   *   verify its CRC (if CRC checking is enabled).

   If both conditions are met, the MPA receiver passes the ULPDU to
   DDP.

   To detect the start of the FPDU unambiguously, one of the following
   MUST be used:

   1:  In an ordered TCP stream, the ULPDU Length field in the current
       FPDU, when that FPDU has a valid CRC, can be used to identify the
       beginning of the next FPDU.

   2:  For optimized MPA/TCP receivers that support out-of-order
       reception of FPDUs (see Section 4.3, MPA Markers), a Marker can
       always be used to locate the beginning of an FPDU (in FPDUs with
       valid CRCs).  Since the location of the Marker is known in the
       octet stream (sequence number space), the Marker can always be
       found.

   3:  Having found an FPDU by means of a Marker, an optimized MPA/TCP
       receiver can find following contiguous FPDUs by using the ULPDU
       Length fields (from FPDUs with valid CRCs) to establish the next
       FPDU boundary.

   The ULPDU Length field (see Section 4) MUST be used to determine if
   the entire FPDU is present before forwarding the ULPDU to DDP.

   CRC calculation is discussed in Section 4.4 above.

7.  Connection Semantics

7.1.  Connection Setup

   MPA requires that the Consumer MUST activate MPA, and any TCP
   enhancements for MPA, on a TCP half connection at the same location
   in the octet stream at both the sender and the receiver.  This is
   required in order for the Marker scheme to correctly locate the
   Markers (if enabled) and to correctly locate the first FPDU.

   MPA, and any TCP enhancements for MPA, are enabled by the ULP in both
   directions at once at an endpoint.

   This can be accomplished in several ways, and is left up to DDP's
   ULP:

   *   DDP's ULP MAY require DDP on MPA startup immediately after TCP
       connection setup.  This has the advantage that no streaming mode
       negotiation is needed.  An example of such a protocol is shown in
       Figure 10: Example Immediate Startup negotiation.

       This may be accomplished by using a well-known port, or a service
       locator protocol to locate an appropriate port on which DDP on
       MPA is expected to operate.

   *   DDP's ULP MAY negotiate the start of DDP on MPA sometime after a
       normal TCP startup, using TCP streaming data exchanges on the
       same connection.  The exchange establishes that DDP on MPA (as
       well as other ULPs) will be used, and exactly locates the point
       in the octet stream where MPA is to begin operation.  Note that
       such a negotiation protocol is outside the scope of this
       specification.  A simplified example of such a protocol is shown
       in Figure 9: Example Delayed Startup negotiation.

   An MPA endpoint operates in two distinct phases.

   The Startup Phase is used to verify correct MPA setup, exchange CRC
   and Marker configuration, and optionally pass Private Data between
   endpoints prior to completing a DDP connection.
   During this phase, specifically formatted frames are exchanged as TCP
   byte streams without using CRCs or Markers.  During this phase, a DDP
   endpoint need not be "bound" to the MPA connection.  In fact, the
   choice of DDP endpoint and its operating parameters may not be known
   until the Consumer-supplied Private Data (if any) has been examined
   by the Consumer.

   The second distinct phase is Full Operation, during which FPDUs are
   sent using all the rules that pertain (CRCs, Markers, MULPDU
   restrictions, etc.).  A DDP endpoint MUST be "bound" to the MPA
   connection at entry to this phase.

   When Private Data is passed between ULPs in the Startup Phase, the
   ULP is responsible for interpreting that data, and then placing MPA
   into Full Operation.

   Note: The following text differentiates the two endpoints by calling
       them Initiator and Responder.  This is quite arbitrary and is NOT
       related to the TCP startup (SYN, SYN/ACK sequence).  The
       Initiator is the side that sends first in the MPA startup
       sequence (the MPA Request Frame).

   Note: The possibility that both endpoints would be allowed to make a
       connection at the same time, sometimes called an active/active
       connection, was considered by the work group and rejected.  There
       were several motivations for this decision.  One was that
       applications needing this facility were few (none other than
       theoretical at the time of this document).  Another was that the
       facility created some implementation difficulties, particularly
       with the "dual stack" designs described later on.
       A last issue was that dealing with rejected connections at
       startup would have required at least an additional frame type,
       and more recovery actions, complicating the protocol.  While none
       of these issues was overwhelming, the group and implementers were
       not motivated to do the work to resolve these issues.  The
       protocol includes a method of detecting these active/active
       startup attempts so that they can be rejected and an error
       reported.

   The ULP is responsible for determining which side is Initiator or
   Responder.  For client/server type ULPs, this is easy.  For peer-peer
   ULPs (which might utilize a TCP style active/active startup), some
   mechanism (not defined by this specification) must be established, or
   some streaming mode data exchanged prior to MPA startup to determine
   which side starts in Initiator and which starts in Responder MPA
   mode.

7.1.1.  MPA Request and Reply Frame Format

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   0  |                                                               |
      +         Key (16 bytes containing "MPA ID Req Frame")          +
   4  |      (4D 50 41 20 49 44 20 52 65 71 20 46 72 61 6D 65)        |
      +         Or  (16 bytes containing "MPA ID Rep Frame")          +
   8  |      (4D 50 41 20 49 44 20 52 65 70 20 46 72 61 6D 65)        |
      +                                                               +
   12 |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   16 |M|C|R| Res     |     Rev       |          PD_Length            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      ~                                                               ~
      ~                   Private Data                                ~
      |                                                               |
      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 8: MPA Request/Reply Frame

   Key: This field contains the "key" used to validate that the sender
       is an MPA sender.  Initiator mode senders MUST set this field to
       the fixed value "MPA ID Req Frame" or (in byte order) 4D 50 41 20
       49 44 20 52 65 71 20 46 72 61 6D 65 (in hexadecimal).  Responder
       mode receivers MUST check this field for the same value, and
       close the connection and report an error locally if any other
       value is detected.  Responder mode senders MUST set this field to
       the fixed value "MPA ID Rep Frame" or (in byte order) 4D 50 41 20
       49 44 20 52 65 70 20 46 72 61 6D 65 (in hexadecimal).  Initiator
       mode receivers MUST check this field for the same value, and
       close the connection and report an error locally if any other
       value is detected.

   M: This bit declares an endpoint's REQUIRED Marker usage.  When this
       bit is '1' in an MPA Request Frame, the Initiator declares that
       Markers are REQUIRED in FPDUs sent from the Responder.  When set
       to '1' in an MPA Reply Frame, this bit declares that Markers are
       REQUIRED in FPDUs sent from the Initiator.  When in a received
       MPA Request Frame or MPA Reply Frame and the value is '0',
       Markers MUST NOT be added to the data stream by that endpoint.
       When '1', Markers MUST be added as described in Section 4.3, MPA
       Markers.

   C: This bit declares an endpoint's preferred CRC usage.  When this
       bit is '0' in the MPA Request Frame and the MPA Reply Frame,
       CRCs MUST NOT be checked and need not be generated by either
       endpoint.  When this bit is '1' in either the MPA Request Frame
       or MPA Reply Frame, CRCs MUST be generated and checked by both
       endpoints.  Note that even when not in use, the CRC field remains
       present in the FPDU.  When CRCs are not in use, the CRC field
       MUST be considered valid for FPDU checking regardless of its
       contents.

   R: In the MPA Request Frame, this bit is set to zero and not checked
       on reception.  In the MPA Reply Frame, this bit is the Rejected
       Connection bit, set by the Responder's ULP to indicate acceptance
       ('0') or rejection ('1') of the connection parameters provided in
       the Private Data.

   Res: This field is reserved for future use.  It MUST be set to zero
       when sending, and not checked on reception.

   Rev: This field contains the revision of MPA.  For this version of
       the specification, senders MUST set this field to one.  MPA
       receivers compliant with this version of the specification MUST
       check this field.  If the MPA receiver cannot interoperate with
       the received version, then it MUST close the connection and
       report an error locally.  Otherwise, the MPA receiver should
       report the received version to the ULP.

   PD_Length: This field MUST contain the length in octets of the
       Private Data field.  A value of zero indicates that there is no
       Private Data field present at all.
       If the receiver detects that the PD_Length field does not match
       the length of the Private Data field, or if the length of the
       Private Data field exceeds 512 octets, the receiver MUST close
       the connection and report an error locally.  Otherwise, the MPA
       receiver should pass the PD_Length value and Private Data to the
       ULP.

   Private Data: This field may contain any value defined by ULPs or may
       not be present.  The Private Data field MUST be between 0 and 512
       octets in length.  ULPs define how to size, set, and validate
       this field within these limits.  Private Data usage is further
       discussed in Section 7.1.4.

7.1.2.  Connection Startup Rules

   The following rules apply to the MPA connection Startup Phase:

   1.  When MPA is started in the Initiator mode, the MPA implementation
       MUST send a valid MPA Request Frame.  The MPA Request Frame MAY
       include ULP-supplied Private Data.

   2.  When MPA is started in the Responder mode, the MPA implementation
       MUST wait until an MPA Request Frame is received and validated
       before entering Full MPA/DDP Operation.

       If the MPA Request Frame is improperly formatted, the
       implementation MUST close the TCP connection and exit MPA.

       If the MPA Request Frame is properly formatted but the Private
       Data is not acceptable, the implementation SHOULD return an MPA
       Reply Frame with the Rejected Connection bit set to '1'; the MPA
       Reply Frame MAY include ULP-supplied Private Data; the
       implementation MUST exit MPA, leaving the TCP connection open.
       The ULP may close TCP or use the connection for other purposes.

       If the MPA Request Frame is properly formatted and the Private
       Data is acceptable, the implementation SHOULD return an MPA Reply
       Frame with the Rejected Connection bit set to '0'; the MPA Reply
       Frame MAY include ULP-supplied Private Data; and the Responder
       SHOULD prepare to interpret any data received as FPDUs and pass
       any received ULPDUs to DDP.

       Note: Since the receiver's ability to deal with Markers is
           unknown until the Request and Reply Frames have been
           received, sending FPDUs before this occurs is not possible.

       Note: The requirement to wait on a Request Frame before sending a
           Reply Frame is a design choice.  It makes for a well-ordered
           sequence of events at each end, and avoids having to specify
           how to deal with situations where both ends start at the same
           time.

   3.  MPA Initiator mode implementations MUST receive and validate an
       MPA Reply Frame.

       If the MPA Reply Frame is improperly formatted, the
       implementation MUST close the TCP connection and exit MPA.

       If the MPA Reply Frame is properly formatted but the Private Data
       is not acceptable, or if the Rejected Connection bit is set to
       '1', the implementation MUST exit MPA, leaving the TCP connection
       open.  The ULP may close TCP or use the connection for other
       purposes.

       If the MPA Reply Frame is properly formatted, the Private Data is
       acceptable, and the Rejected Connection bit is set to '0', the
       implementation SHOULD enter the Full MPA/DDP Operation Phase,
       interpreting any received data as FPDUs and sending DDP ULPDUs as
       FPDUs.

   4.  MPA Responder mode implementations MUST receive and validate at
       least one FPDU before sending any FPDUs or Markers.

       Note: This requirement is present to allow the Initiator time to
           get its receiver into Full Operation before an FPDU arrives,
           avoiding potential race conditions at the Initiator.  This
           requirement was the subject of some debate in the working
           group before rough consensus was reached.  Eliminating it
           would allow faster startup in some types of applications.
           However, that would also make certain implementations
           (particularly "dual stack") much harder.

   5.  If a received "Key" does not match the expected value (see
       Section 7.1.1, MPA Request and Reply Frame Format), the TCP/DDP
       connection MUST be closed, and an error returned to the ULP.

   6.  The received Private Data fields may be used by Consumers at
       either end to further validate the connection and set up DDP or
       other ULP parameters.  The Initiator ULP MAY close the
       TCP/MPA/DDP connection as a result of validating the Private Data
       fields.  The Responder SHOULD return an MPA Reply Frame with the
       "Rejected Connection" bit set to '1' if the validation of the
       Private Data is not acceptable to the ULP.

   7.  When the first FPDU is to be sent, then if Markers are enabled,
       the first octets sent are the special Marker 0x00000000, followed
       by the start of the FPDU (the FPDU's ULPDU Length field).  If
       Markers are not enabled, the first octets sent are the start of
       the FPDU (the FPDU's ULPDU Length field).

   8.  MPA implementations MUST use the difference between the MPA
       Request Frame and the MPA Reply Frame to check for incorrect
       "Initiator/Initiator" startups.
       Implementations SHOULD apply a timeout while waiting for the MPA
       Request Frame when started in Responder mode, to detect incorrect
       "Responder/Responder" startups.

   9.  MPA implementations MUST validate the PD_Length field.  The
       buffer that receives the Private Data field MUST be large enough
       to receive that data; the amount of Private Data MUST NOT exceed
       the PD_Length or the application buffer.  If any of the above
       fails, the startup frame MUST be considered improperly formatted.

   10. MPA implementations SHOULD implement a reasonable timeout while
       waiting for the entire set of startup frames; this prevents
       certain denial-of-service attacks.  ULPs SHOULD implement a
       reasonable timeout while waiting for FPDUs, ULPDUs, and
       application-level messages to guard against application failures
       and certain denial-of-service attacks.

7.1.3.  Example Delayed Startup Sequence

   A variety of startup sequences are possible when using MPA on TCP.
   Following is an example of an MPA/DDP startup that occurs after TCP
   has been running for a while and has exchanged some amount of
   streaming data.  This example does not use any Private Data (an
   example that does is shown later in Section 7.1.4.2, Example
   Immediate Startup Using Private Data), although it is perfectly legal
   to include the Private Data.  Note that since the example does not
   use any Private Data, there are no ULP interactions shown between
   receiving "startup frames" and putting MPA into Full Operation.
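   The example in Figure 9 leaves the Responder's validation step
   implicit.  The following Python sketch shows roughly what that step
   involves.  It maps an already-parsed MPA Request Frame to one of the
   three Responder outcomes in rule 2, using the Key check of rule 5,
   the PD_Length check of rule 9, and the Rev check of Section 7.1.1.
   The `RequestFrame` type, the `ulp_accepts` callback, and the
   assumption that only revision 1 is supported are hypothetical
   simplifications.  They are not part of this specification.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable

REQ_KEY = b"MPA ID Req Frame"  # 4D 50 41 20 49 44 20 52 65 71 20 46 72 61 6D 65
MAX_PRIVATE_DATA = 512         # upper bound on Private Data, in octets
SUPPORTED_REV = 1              # assumption: this receiver only speaks revision 1


@dataclass
class RequestFrame:
    """Hypothetical, already-parsed view of a received MPA Request Frame."""
    key: bytes
    rev: int
    pd_length: int
    private_data: bytes


class ResponderAction(Enum):
    CLOSE_TCP = auto()        # improperly formatted: close TCP and exit MPA
    REJECT_AND_EXIT = auto()  # send Reply with R = '1', exit MPA, leave TCP open
    ACCEPT = auto()           # send Reply with R = '0', decode FPDUs, but send
                              # none until a valid FPDU arrives (rule 4)


def responder_action(frame: RequestFrame,
                     ulp_accepts: Callable[[bytes], bool]) -> ResponderAction:
    # Rule 5: a Key that does not match the expected value closes the connection.
    if frame.key != REQ_KEY:
        return ResponderAction.CLOSE_TCP
    # Rev: a receiver that cannot interoperate with this revision closes.
    if frame.rev != SUPPORTED_REV:
        return ResponderAction.CLOSE_TCP
    # Rule 9: PD_Length must match the Private Data received and stay within
    # the 512-octet limit, or the frame is treated as improperly formatted.
    if (frame.pd_length > MAX_PRIVATE_DATA
            or frame.pd_length != len(frame.private_data)):
        return ResponderAction.CLOSE_TCP
    # Rule 2: the frame is well formed, but the ULP rejects the Private Data.
    if not ulp_accepts(frame.private_data):
        return ResponderAction.REJECT_AND_EXIT
    return ResponderAction.ACCEPT
```

   For example, `responder_action(RequestFrame(REQ_KEY, 1, 0, b""),
   lambda pd: True)` yields `ResponderAction.ACCEPT`.  The timeouts of
   rules 8 and 10 and the actual transmission of the Reply Frame are
   left out to keep the sketch small.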

         Initiator                                 Responder

  +---------------------------+
  |ULP streaming mode         |
  |  <Hello> request to       |
  |  transition to DDP/MPA    |           +---------------------------+
  |  mode (optional).         | --------> |ULP gets request;          |
  +---------------------------+           |  enables MPA Responder    |
                                          |  mode with last (optional)|
                                          |  streaming mode           |
                                          |  <Hello Ack> for MPA to   |
                                          |  send.                    |
  +---------------------------+           |MPA waits for incoming     |
  |ULP receives streaming     | <-------- |  <MPA Request Frame>.     |
  |  <Hello Ack>;             |           +---------------------------+
  |Enters MPA Initiator mode; |
  |MPA sends                  |
  |  <MPA Request Frame>;     |
  |MPA waits for incoming     |           +---------------------------+
  |  <MPA Reply Frame>.       | - - - - > |MPA receives               |
  +---------------------------+           |  <MPA Request Frame>.     |
                                          |Consumer binds DDP to MPA; |
                                          |MPA sends the              |
                                          |  <MPA Reply Frame>.       |
                                          |DDP/MPA enables FPDU       |
  +---------------------------+           |  decoding, but does not   |
  |MPA receives the           | < - - - - |  send any FPDUs.          |
  |  <MPA Reply Frame>        |           +---------------------------+
  |Consumer binds DDP to MPA; |
  |DDP/MPA begins Full        |
  |  Operation.               |
  |MPA sends first FPDU (as   |           +---------------------------+
  |  DDP ULPDUs become        | ========> |MPA receives first FPDU.   |
  |  available).              |           |MPA sends first FPDU (as   |
  +---------------------------+           |  DDP ULPDUs become        |
                                  <====== |  available).              |
                                          +---------------------------+

              Figure 9: Example Delayed Startup Negotiation

   An example Delayed Startup sequence is described below:

       *   Active and passive sides start up a TCP connection in the
           usual fashion, probably using sockets APIs.  They exchange
           some amount of streaming mode data.  At some point, one side
           (the MPA Initiator) sends streaming mode data that
           effectively says "Hello, let's go into MPA/DDP mode".

   *   When the remote side (the MPA Responder) gets this streaming mode
       message, the Consumer would send a last streaming mode message
       that effectively says "I acknowledge your Hello, and am now in
       MPA Responder mode".  The exchange of these messages establishes
       the exact point in the TCP stream where MPA is enabled.  The
       Responding Consumer enables MPA in the Responder mode and waits
       for the initial MPA startup message.

       *   The Initiating Consumer would enable MPA startup in the
           Initiator mode, which then sends the MPA Request Frame.  It
           is assumed that no Private Data messages are needed for this
           example, although Private Data could be included.  The
           Initiating MPA (and Consumer) would also wait for the MPA
           connection to be accepted.

   *   The Responding MPA would receive the initial MPA Request Frame
       and would inform the Consumer that this message arrived.  The
       Consumer can then accept the MPA/DDP connection or close the TCP
       connection.

   *   To accept the connection request, the Responding Consumer would
       use an appropriate API to bind the TCP/MPA connections to a DDP
       endpoint, thus bringing MPA/DDP into Full Operation.  In the
       process of going to Full Operation, MPA sends the MPA Reply
       Frame.  MPA/DDP waits for the first incoming FPDU before sending
       any FPDUs.

   *   If the initial TCP data was not a properly formatted MPA Request
       Frame, MPA will close or reset the TCP connection immediately.

       *   The Initiating MPA would receive the MPA Reply Frame and
           would report this message to the Consumer.  The Consumer can
           then accept the MPA/DDP connection, or close or reset the TCP
           connection to abort the process.

       *   On determining that the connection is acceptable, the
           Initiating Consumer would use an appropriate API to bind the
           TCP/MPA connections to a DDP endpoint, thus bringing MPA/DDP
           into Full Operation.  MPA/DDP would begin sending DDP
           messages as MPA FPDUs.

7.1.4.  Use of Private Data

   This section is advisory in nature, in that it suggests a method by
   which a ULP can deal with pre-DDP connection information exchange.

7.1.4.1.  Motivation

   Prior RDMA protocols have been developed that provide Private Data
   via out-of-band mechanisms.  As a result, many applications now
   expect some form of Private Data to be available for application use
   prior to setting up the DDP/RDMA connection.
   Following are some examples of the use of Private Data.

   An RDMA endpoint (referred to as a Queue Pair, or QP, in InfiniBand
   and in [VERBS-RDMA]) must be associated with a Protection Domain.
   No receive operations may be posted to the endpoint before it is
   associated with a Protection Domain.  Indeed, under both the
   InfiniBand and proposed RDMA/DDP verbs [VERBS-RDMA], an endpoint/QP
   is created within a Protection Domain.

   There are some applications where the choice of Protection Domain is
   dependent upon the identity of the remote ULP client.  For example,
   if a user session requires multiple connections, it is highly
   desirable for all of those connections to use a single Protection
   Domain.  Note: Use of Protection Domains is further discussed in
   [RDMASEC].

   InfiniBand, the DAT APIs [DAT-API], and the IT-API [IT-API] all
   provide for the active-side ULP to provide Private Data when
   requesting a connection.  This data is passed to the ULP to allow it
   to determine whether to accept the connection, and if so with which
   endpoint (and implicitly which Protection Domain).

   The Private Data can also be used to ensure that both ends of the
   connection have configured their RDMA endpoints compatibly on such
   matters as the RDMA Read capacity (see [RDMAP]).  Further
   ULP-specific uses are also presumed, such as establishing the
   identity of the client.

   Private Data is also allowed when accepting the connection, to allow
   completion of any negotiation on RDMA resources and for other ULP
   reasons.

   There are several potential ways to exchange this Private Data.  For
   example, the InfiniBand specification includes a connection
   management protocol that allows a small amount of Private Data to be
   exchanged using datagrams before actually starting the RDMA
   connection.

   This document allows for small amounts of Private Data to be
   exchanged as part of the MPA startup sequence.  The actual Private
   Data fields are carried in the MPA Request Frame and the MPA Reply
   Frame.

   If larger amounts of Private Data or more negotiation are necessary,
   TCP streaming mode messages may be exchanged prior to enabling MPA.

7.1.4.2.  Example Immediate Startup Using Private Data

          Initiator                                 Responder

   +---------------------------+
   |TCP SYN sent.              |           +--------------------------+
   +---------------------------+ --------> |TCP gets SYN packet;      |
   +---------------------------+           |  sends SYN-Ack.          |
   |TCP gets SYN-Ack           | <-------- +--------------------------+
   |  sends Ack.               |
   +---------------------------+ --------> +--------------------------+
   +---------------------------+           |Consumer enables MPA      |
   |Consumer enables MPA       |           |Responder mode, waits for |
   |Initiator mode with        |           |  <MPA Request Frame>.    |
   |Private Data; MPA sends    |           +--------------------------+
   |  <MPA Request Frame>;     |
   |MPA waits for incoming     |           +--------------------------+
   |  <MPA Reply Frame>.       | - - - - > |MPA receives              |
   +---------------------------+           |  <MPA Request Frame>.    |
                                           |Consumer examines Private |
                                           |Data, provides MPA with   |
                                           |return Private Data,      |
                                           |binds DDP to MPA, and     |
                                           |enables MPA to send an    |
                                           |  <MPA Reply Frame>.      |
                                           |DDP/MPA enables FPDU      |
   +---------------------------+           |decoding, but does not    |
   |MPA receives the           | < - - - - |send any FPDUs.           |
   |  <MPA Reply Frame>.       |           +--------------------------+
   |Consumer examines Private  |
   |Data, binds DDP to MPA,    |
   |and enables DDP/MPA to     |
   |begin Full Operation.      |
   |MPA sends first FPDU (as   |           +--------------------------+
   |DDP ULPDUs become          | ========> |MPA receives first FPDU.  |
   |available).                |           |MPA sends first FPDU (as  |
   +---------------------------+           |DDP ULPDUs become         |
                                   <====== |available).               |
                                           +--------------------------+

             Figure 10: Example Immediate Startup Negotiation

   Note: The exact point at which MPA is started in the TCP connection
       sequence is implementation dependent; the above diagram shows one
       possible sequence.  Also, the Initiator "Ack" to the Responder's
       "SYN-Ack" may be combined into the same TCP segment containing
       the MPA Request Frame (as is allowed by TCP RFCs).
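   In this exchange, the Private Data travels in the MPA Request and
   Reply Frames of Section 7.1.1.  A minimal Python sketch of building
   and checking those frames follows.  It makes several assumptions.
   The header of Figure 8 is taken to be a 16-octet Key, then one octet
   carrying the M, C, and R bits followed by the five Res bits (most
   significant bit first), then an 8-bit Rev, then a 16-bit PD_Length,
   all in network byte order.  The buffer passed to `parse_frame` is
   assumed to hold exactly one complete startup frame.  The function
   names and the dictionary result are hypothetical.

```python
import struct

REQ_KEY = b"MPA ID Req Frame"
REP_KEY = b"MPA ID Rep Frame"
M_BIT, C_BIT, R_BIT = 0x80, 0x40, 0x20  # assumed positions in the flags octet
HEADER = struct.Struct("!16sBBH")       # Key, flags (M|C|R|Res), Rev, PD_Length
MAX_PRIVATE_DATA = 512


def build_frame(key: bytes, markers: bool, crc: bool,
                private_data: bytes = b"", rejected: bool = False) -> bytes:
    """Encode an MPA Request (REQ_KEY) or Reply (REP_KEY) Frame."""
    if len(private_data) > MAX_PRIVATE_DATA:
        raise ValueError("Private Data is limited to 512 octets")
    flags = (M_BIT if markers else 0) | (C_BIT if crc else 0)
    if key == REP_KEY and rejected:
        flags |= R_BIT  # R is the Rejected Connection bit only in a Reply
    # Res bits stay zero; Rev is one for this version of the specification.
    return HEADER.pack(key, flags, 1, len(private_data)) + private_data


def parse_frame(frame: bytes, expected_key: bytes) -> dict:
    """Decode one startup frame; ValueError means 'close and report'."""
    if len(frame) < HEADER.size:
        raise ValueError("truncated startup frame")
    key, flags, rev, pd_length = HEADER.unpack_from(frame)
    if key != expected_key:
        raise ValueError("Key mismatch")
    if rev != 1:
        raise ValueError("unsupported MPA revision")
    private_data = frame[HEADER.size:]
    if pd_length > MAX_PRIVATE_DATA or pd_length != len(private_data):
        raise ValueError("PD_Length does not match the Private Data")
    return {
        "markers": bool(flags & M_BIT),
        "crc": bool(flags & C_BIT),
        # R is not checked on reception in a Request Frame; Res is ignored.
        "rejected": expected_key == REP_KEY and bool(flags & R_BIT),
        "private_data": private_data,
    }
```

   For instance, `parse_frame(build_frame(REQ_KEY, True, False, b"hi"),
   REQ_KEY)` reports the Markers flag set, the CRC flag clear, and
   `b"hi"` as the Private Data.  Every validation failure raises
   `ValueError`.  Closing the connection and reporting the error locally
   are left to the caller.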

   The example immediate startup sequence is described below:

   *   The passive side (Responding Consumer) would listen on the TCP
       destination port to indicate its readiness to accept a
       connection.

       *   The active side (Initiating Consumer) would request a
           connection from a TCP endpoint (one that is expected to
           upgrade to MPA/DDP/RDMA and that expects the Private Data)
           to a destination address and port.

       *   The Initiating Consumer would initiate a TCP connection to
           the destination port.  Acceptance/rejection of the connection
           would proceed as per normal TCP connection establishment.

   *   The passive side (Responding Consumer) would receive the TCP
       connection request as usual, allowing normal TCP gatekeepers,
       such as INETD and TCPserver, to exercise their normal
       safeguard/logging functions.  On acceptance of the TCP
       connection, the Responding Consumer would enable MPA in the
       Responder mode and wait for the initial MPA startup message.

       *   The Initiating Consumer would enable MPA startup in the
           Initiator mode to send an initial MPA Request Frame that
           includes its Private Data message.  The Initiating MPA (and
           Consumer) would also wait for the MPA connection to be
           accepted and for any returned Private Data.

   *   The Responding MPA would receive the initial MPA Request Frame
       with the Private Data message and would pass the Private Data
       through to the Consumer.  The Consumer can then accept the
       MPA/DDP connection, close the TCP connection, or reject the MPA
       connection with a return message.

   *   To accept the connection request, the Responding Consumer would
       use an appropriate API to bind the TCP/MPA connections to a DDP
       endpoint, thus bringing MPA/DDP into Full Operation.  In the
       process of going to Full Operation, MPA sends the MPA Reply
       Frame, which includes the Consumer-supplied Private Data
       containing any appropriate Consumer response.  MPA/DDP waits for
       the first incoming FPDU before sending any FPDUs.

   *   If the initial TCP data was not a properly formatted MPA Request
       Frame, MPA will close or reset the TCP connection immediately.

   *   To reject the MPA connection request, the Responding Consumer
       would send an MPA Reply Frame with any ULP-supplied Private Data
       (with the reason for rejection), with the "Rejected Connection"
       bit set to '1', and may close the TCP connection.

       *   The Initiating MPA would receive the MPA Reply Frame with the
           Private Data message and would report this message to the
           Consumer, including the supplied Private Data.

           If the "Rejected Connection" bit is set to '1', MPA will
           close the TCP connection and exit.

           If the "Rejected Connection" bit is set to '0', and on
           determining from the MPA Reply Frame Private Data that the
           connection is acceptable, the Initiating Consumer would use
           an appropriate API to bind the TCP/MPA connections to a DDP
           endpoint, thus bringing MPA/DDP into Full Operation.  MPA/DDP
           would begin sending DDP messages as MPA FPDUs.

7.1.5.  "Dual Stack" Implementations

   MPA/DDP implementations are commonly expected to be implemented as
   part of a "dual stack" architecture.
   One stack is the traditional TCP stack, usually with a sockets API
   (Application Programming Interface).  The second stack is the
   MPA/DDP stack with its own API, and potentially separate code or
   hardware to deal with the MPA/DDP data.  Of course, implementations
   may vary, so the following comments are of an advisory nature only.

   The use of the two stacks offers advantages:

       TCP connection setup is usually done with the TCP stack.  This
       allows use of the usual naming and addressing mechanisms.  It
       also means that any mechanisms used to "harden" the connection
       setup against security threats are also used when starting
       MPA/DDP.

       Some applications may have been originally designed for TCP but
       are "enhanced" to utilize MPA/DDP after a negotiation reveals the
       capability to do so.  The negotiation process takes place in
       TCP's streaming mode, using the usual TCP APIs.

       Some new applications, designed for RDMA or DDP, still need to
       exchange some data prior to starting MPA/DDP.  This exchange can
       be of arbitrary length or complexity, but often consists of only
       a small amount of Private Data, perhaps only a single message.
       Using the TCP streaming mode for this exchange allows this to be
       done using well-understood methods.

   The main disadvantage of using two stacks is the conversion of an
   active TCP connection between them.  This process must be done with
   care to prevent loss of data.

   To avoid some of the problems when using a "dual stack" architecture,
   the following additional restrictions may be required by the
   implementation:

   1.  Enabling the DDP/MPA stack SHOULD be done only when no incoming
       stream data is expected.
       This is typically managed by the ULP.  When following the
       recommended startup sequence, the Responder side enters DDP/MPA
       mode, sends the last streaming mode data, and then waits for the
       MPA Request Frame.  No additional streaming mode data is
       expected.  The Initiator side ULP receives the last streaming
       mode data, and then enters DDP/MPA mode.  Again, no additional
       streaming mode data is expected.

   2.  The DDP/MPA stack MAY provide the ability to send a "last
       streaming message" as part of its Responder DDP/MPA enable
       function.  This allows the DDP/MPA stack to more easily manage
       the conversion to DDP/MPA mode (and avoid problems with a very
       fast return of the MPA Request Frame from the Initiator side).

   Note: Regardless of the "stack" architecture used, TCP's rules MUST
       be followed.  For example, if network data is lost, re-segmented,
       or re-ordered, TCP MUST recover appropriately even when this
       occurs while switching stacks.

7.2.  Normal Connection Teardown

   Each half connection of MPA terminates when DDP closes the
   corresponding TCP half connection.

   A mechanism SHOULD be provided by MPA to DDP for DDP to be made aware
   that a graceful close of the TCP connection has been received by the
   TCP (e.g., FIN is received).

8.  Error Semantics

   The following errors MUST be detected by MPA and the codes SHOULD be
   provided to DDP or another Consumer:

   Code Error

   1   TCP connection closed, terminated, or lost.  This includes loss
       by timeout, too many retries, RST received, or FIN received.

   2   Received MPA CRC does not match the calculated value for the
       FPDU.

   3   Although the CRC is valid, the received MPA Marker (if enabled)
       and ULPDU Length fields do not agree on the start of an FPDU.  If
       the FPDU start determined from previous ULPDU Length fields does
       not match the MPA Marker position, MPA SHOULD deliver an error to
       DDP.  It may not be possible to make this check as a segment
       arrives, but the check SHOULD be made when a gap creating an
       out-of-order sequence is closed and any time a Marker points to
       an already identified FPDU.  It is OPTIONAL for a receiver to
       check each Marker, if multiple Markers are present in an FPDU, or
       if the segment is received in order.

   4   Invalid MPA Request Frame or MPA Reply Frame received.  In this
       case, the TCP connection MUST be immediately closed.  DDP and
       other ULPs should treat this similarly to code 1, above.

   When conditions 2 or 3 above are detected, an optimized MPA/TCP
   implementation MAY choose to silently drop the TCP segment rather
   than reporting the error to DDP.  In this case, the sending TCP will
   retry the segment, usually correcting the error, unless the problem
   was at the source.  In that case, the source will usually exceed the
   number of retries and terminate the connection.

   Once MPA delivers an error of any type, it MUST NOT pass or deliver
   any additional FPDUs on that half connection.

   For Error codes 2 and 3, MPA MUST NOT close the TCP connection
   following a reported error.  Closing the connection is the
   responsibility of DDP's ULP.

       Note that since MPA will not Deliver any FPDUs on a half
       connection following an error detected on the receive side of
       that connection, DDP's ULP is expected to tear down the
       connection.  This may not occur until after one or more last
       messages are transmitted on the opposite half connection.
This +       allows a diagnostic error message to be sent. + + + +Culley, et al.              Standards Track                    [Page 39] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +9.  Security Considerations + +   This section discusses the security considerations for MPA. + +9.1.  Protocol-Specific Security Considerations + +   The vulnerabilities of MPA to third-party attacks are no greater than +   any other protocol running over TCP.  A third party, by sending +   packets into the network that are delivered to an MPA receiver, could +   launch a variety of attacks that take advantage of how MPA operates. +   For example, a third party could send random packets that are valid +   for TCP, but contain no FPDU headers.  An MPA receiver reports an +   error to DDP when any packet arrives that cannot be validated as an +   FPDU when properly located on an FPDU boundary.  A third party could +   also send packets that are valid for TCP, MPA, and DDP, but do not +   target valid buffers.  These types of attacks ultimately result in +   loss of connection and thus become a type of DOS (Denial Of Service) +   attack.  Communication security mechanisms such as IPsec [RFC2401, +   RFC4301] may be used to prevent such attacks. + +   Independent of how MPA operates, a third party could use ICMP +   messages to reduce the path MTU to such a small size that performance +   would likewise be severely impacted.  Range checking on path MTU +   sizes in ICMP packets may be used to prevent such attacks. + +   [RDMAP] and [DDP] are used to control, read, and write data buffers +   over IP networks.  Therefore, the control and the data packets of +   these protocols are vulnerable to the spoofing, tampering, and +   information disclosure attacks listed below.  In addition, connection +   to/from an unauthorized or unauthenticated endpoint is a potential +   problem with most applications using RDMA, DDP, and MPA. + +9.1.1.  
Spoofing + +   Spoofing attacks can be launched by the Remote Peer or by a network +   based attacker.  A network-based spoofing attack applies to all +   Remote Peers.  Because the MPA Stream requires a TCP Stream in the +   ESTABLISHED state, certain types of traditional forms of wire attacks +   do not apply -- an end-to-end handshake must have occurred to +   establish the MPA Stream.  So, the only form of spoofing that applies +   is one when a remote node can both send and receive packets.  Yet +   even with this limitation the Stream is still exposed to the +   following spoofing attacks. + + + + + + + + +Culley, et al.              Standards Track                    [Page 40] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +9.1.1.1.  Impersonation + +   A network-based attacker can impersonate a legal MPA/DDP/RDMAP peer +   (by spoofing a legal IP address) and establish an MPA/DDP/RDMAP +   Stream with the victim.  End-to-end authentication (i.e., IPsec or +   ULP authentication) provides protection against this attack. + +9.1.1.2.  Stream Hijacking + +   Stream hijacking happens when a network-based attacker follows the +   Stream establishment phase, and waits until the authentication phase +   (if such a phase exists) is completed successfully.  He can then +   spoof the IP address and redirect the Stream from the victim to its +   own machine.  For example, an attacker can wait until an iSCSI +   authentication is completed successfully, and hijack the iSCSI +   Stream. + +   The best protection against this form of attack is end-to-end +   integrity protection and authentication, such as IPsec, to prevent +   spoofing.  Another option is to provide physical security. +   Discussion of physical security is out of scope for this document. + +9.1.1.3.  
Man-in-the-Middle Attack + +   If a network-based attacker has the ability to delete, inject, +   replay, or modify packets that will still be accepted by MPA (e.g., +   TCP sequence number is correct, FPDU is valid, etc.), then the Stream +   can be exposed to a man-in-the-middle attack.  The attacker could +   potentially use the services of [DDP] and [RDMAP] to read the +   contents of the associated Data Buffer, to modify the contents of the +   associated Data Buffer, or to disable further access to the buffer. +   Other attacks on the connection setup sequence and even on TCP can be +   used to cause denial of service.  The only countermeasures for this +   form of attack are either to secure the MPA/DDP/RDMAP Stream (i.e., +   to integrity-protect it) or to attempt to provide physical security +   to prevent man-in-the-middle type attacks. + +   The best protection against this form of attack is end-to-end +   integrity protection and authentication, such as IPsec, to prevent +   spoofing or tampering.  If Stream- or session-level authentication +   and integrity protection are not used, then a man-in-the-middle +   attack can occur, enabling spoofing and tampering. + +   Another approach is to restrict access to only the local subnet/link +   and provide some mechanism to limit access, such as physical security +   or 802.1X.  This model is an extremely limited deployment scenario +   and will not be further examined here. + +9.1.2.  Eavesdropping + +   Generally speaking, Stream confidentiality protects against +   eavesdropping.  Stream and/or session authentication and integrity +   protection are countermeasures against various spoofing and +   tampering attacks.  
The effectiveness of authentication and integrity +   against a specific attack depends on whether the authentication is +   machine-level authentication (such as that provided by IPsec) or ULP +   authentication. + +9.2.  Introduction to Security Options + +   The following security services can be applied to an MPA/DDP/RDMAP +   Stream: + +   1.  Session confidentiality - protects against eavesdropping. + +   2.  Per-packet data source authentication - protects against the +       following spoofing attacks: network-based impersonation, Stream +       hijacking, and man-in-the-middle. + +   3.  Per-packet integrity - protects against tampering done by +       network-based modification of FPDUs (indirectly affecting buffer +       content through DDP services). + +   4.  Packet sequencing - protects against replay attacks, which are a +       special case of the above tampering attack. + +   If an MPA/DDP/RDMAP Stream may be subject to impersonation attacks +   or Stream hijacking attacks, it is recommended that the Stream be +   authenticated, integrity-protected, and protected from replay +   attacks.  It may use confidentiality protection to protect from +   eavesdropping (in case the MPA/DDP/RDMAP Stream traverses a public +   network). + +   IPsec is capable of providing the above security services for IP and +   TCP traffic. + +   ULP protocols may be able to provide part of the above security +   services.  See [NFSv4CHAN] for additional information on a promising +   approach called "channel binding".  From [NFSv4CHAN]: + +       "The concept of channel bindings allows applications to prove +       that the end-points of two secure channels at different network +       layers are the same by binding authentication at one channel to +       the session protection at the other channel.  The use of channel +       bindings allows applications to delegate session protection to +       lower layers, which may significantly improve performance for +       some applications." + +9.3.  Using IPsec with MPA + +   IPsec can be used to protect against the packet injection attacks +   outlined above.  Because IPsec is designed to secure individual IP +   packets, MPA can run above IPsec without change.  IPsec packets are +   processed (e.g., integrity-checked and decrypted) in the order they +   are received, and an MPA receiver will process the decrypted FPDUs +   contained in these packets in the same manner as FPDUs contained in +   unsecured IP packets. + +   MPA implementations MUST implement IPsec as described in Section 9.4 +   below.  The use of IPsec is up to ULPs and administrators. + +9.4.  Requirements for IPsec Encapsulation of MPA/DDP + +   The IP Storage working group has spent significant time and effort to +   define the normative IPsec requirements for IP storage [RFC3723]. +   Portions of that specification are applicable to a wide variety of +   protocols, including the RDDP protocol suite.  In order not to +   replicate this effort, an MPA on TCP implementation MUST follow the +   requirements defined in RFC 3723, Sections 2.3 and 5, including the +   associated normative references for those sections. + +   Additionally, since IPsec acceleration hardware may only be able to +   handle a limited number of active Internet Key Exchange Protocol +   (IKE) Phase 2 security associations (SAs), Phase 2 delete messages +   MAY be sent for idle SAs, as a means of keeping the number of active +   Phase 2 SAs to a minimum.  The receipt of an IKE Phase 2 delete +   message MUST NOT be interpreted as a reason for tearing down a +   DDP/RDMA Stream.  
Rather, it is preferable to leave the Stream up, +   and if additional traffic is sent on it, to bring up another IKE +   Phase 2 SA to protect it.  This avoids the potential for continually +   bringing Streams up and down. + +   The IPsec requirements for RDDP are based on the version of IPsec +   specified in RFC 2401 [RFC2401] and related RFCs, as profiled by RFC +   3723 [RFC3723], despite the existence of a newer version of IPsec +   specified in RFC 4301 [RFC4301] and related RFCs.  One of the +   important early applications of the RDDP protocols is their use with +   iSCSI [iSER]; RDDP's IPsec requirements follow those of iSCSI in +   order to facilitate that usage by allowing a common profile of IPsec +   to be used with iSCSI and the RDDP protocols.  In the future, RFC +   3723 may be updated to the newer version of IPsec; the IPsec security +   requirements of any such update should apply uniformly to iSCSI and +   the RDDP protocols. + +   Note that there are serious security issues if IPsec is not +   implemented end-to-end.  For example, if IPsec is implemented as a +   tunnel in the middle of the network, any hosts between the peer and +   the IPsec tunneling device can freely attack the unprotected Stream. + +10.  IANA Considerations + +   No IANA actions are required by this document. + +   If a well-known port is chosen as the mechanism to identify DDP on +   MPA on TCP, the well-known port must be registered with IANA. +   Because the use of the port is DDP-specific, registration of the port +   with IANA is left to DDP. + +Appendix A.  
Optimized MPA-Aware TCP Implementations + +   This appendix is for information only and is NOT part of the +   standard. + +   This appendix covers some Optimized MPA-aware TCP implementation +   guidance to implementers.  It is intended for those implementations +   that want to send/receive as much traffic as possible in an aligned +   and zero-copy fashion. + +                   +-----------------------------------+ +                   | +-----------+ +-----------------+ | +                   | | Optimized | | Other Protocols | | +                   | |  MPA/TCP  | +-----------------+ | +                   | +-----------+        ||           | +                   |         \\     --- socket API --- | +                   |          \\          ||           | +                   |           \\      +-----+         | +                   |            \\     | TCP |         | +                   |             \\    +-----+         | +                   |              \\    //             | +                   |             +-------+             | +                   |             |  IP   |             | +                   |             +-------+             | +                   +-----------------------------------+ + +                Figure 11: Optimized MPA/TCP Implementation + +   The diagram above shows a block diagram of a potential +   implementation.  The network sub-system in the diagram can support +   traditional sockets-based connections using the normal API as shown +   on the right side of the diagram.  Connections for DDP/MPA/TCP are +   run using the facilities shown on the left side of the diagram. + +   The DDP/MPA/TCP connections can be started using the facilities shown +   on the left side using some suitable API, or they can be initiated +   using the facilities shown on the right side and transitioned to the +   left side at the point in the connection setup where MPA goes to +   "Full MPA/DDP Operation Phase" as described in Section 7.1.2. 
+ +   The optimized MPA/TCP implementations (left side of diagram and +   described below) are only applicable to MPA.  All other TCP +   applications continue to use the standard TCP stacks and interfaces +   shown on the right side of the diagram. + +A.1.  Optimized MPA/TCP Transmitters + +   The various TCP RFCs allow considerable choice in segmenting a TCP +   stream.  In order to optimize FPDU recovery at the MPA receiver, an +   optimized MPA/TCP implementation uses additional segmentation rules. + +   To provide optimum performance, an optimized MPA/TCP transmit-side +   implementation should be enabled to: + +   *   With an EMSS large enough to contain the FPDU(s), segment the +       outgoing TCP stream such that the first octet of every TCP +       segment begins with an FPDU.  Multiple FPDUs may be packed into a +       single TCP segment as long as they are entirely contained in the +       TCP segment. + +   *   Report the current EMSS from the TCP to the MPA transmit layer. + +   There are exceptions to the above rule.  Once a ULPDU is provided to +   MPA, the MPA/TCP sender transmits it or fails the connection; it +   cannot be repudiated.  As a result, during changes in MTU and EMSS, +   or when TCP's Receive Window size (RWIN) becomes too small, it may be +   necessary to send FPDUs that do not conform to the segmentation rule +   above. + +   A possible, but less desirable, alternative is to use IP +   fragmentation on accepted FPDUs to deal with MTU reductions or +   extremely small EMSS. + +   Even when alignment with TCP segments is lost, the sender still +   formats the FPDU according to the FPDU format shown in Figure 2. + +   On a retransmission, TCP does not necessarily preserve original TCP +   segmentation boundaries.  
This can lead to the loss of FPDU Alignment +   and containment within a TCP segment during TCP retransmissions.  An +   optimized MPA/TCP sender should try to preserve original TCP +   segmentation boundaries on a retransmission. + +A.2.  Effects of Optimized MPA/TCP Segmentation + +   Optimized MPA/TCP senders will fill TCP segments to the EMSS with a +   single FPDU when a DDP message is large enough.  Since the DDP +   message may not exactly fit into TCP segments, a "message tail" often +   occurs that results in an FPDU that is smaller than a single TCP +   segment.  Additionally, some DDP messages may be considerably shorter +   than the EMSS.  If a small FPDU is sent in a single TCP segment, the +   result is a "short" TCP segment. + +   Applications expected to see strong advantages from Direct Data +   Placement include transaction-based applications and throughput +   applications.  Request/response protocols typically send one FPDU per +   TCP segment and then wait for a response.  Under these conditions, +   these "short" TCP segments are an appropriate and expected effect of +   the segmentation. + +   Another possibility is that the application might be sending multiple +   messages (FPDUs) to the same endpoint before waiting for a response. +   In this case, the segmentation policy would tend to reduce the +   available connection bandwidth by under-filling the TCP segments. + +   Standard TCP implementations often utilize the Nagle [RFC896] +   algorithm to ensure that segments are filled to the EMSS whenever the +   round-trip latency is large enough that the source stream can fully +   fill segments before ACKs arrive.  The algorithm does this by +   delaying the transmission of TCP segments until a ULP can fill a +   segment, or until an ACK arrives from the far side.  
The algorithm +   thus allows for smaller segments when latencies are shorter, to keep +   the ULP's end-to-end latency to reasonable levels. + +   Use of the Nagle algorithm is not mandatory [RFC1122]. + +   When used with optimized MPA/TCP stacks, Nagle and similar algorithms +   can result in the "packing" of multiple FPDUs into TCP segments. + +   If a "message tail", small DDP messages, or the start of a larger DDP +   message are available, MPA may pack multiple FPDUs into TCP segments. +   When this is done, the TCP segments can be more fully utilized, but, +   due to the size constraints of FPDUs, segments may not be filled to +   the EMSS.  A dynamic MULPDU that informs DDP of the size of the +   remaining TCP segment space makes filling the TCP segment more +   effective. + +       Note that MPA receivers do more processing of a TCP segment that +       contains multiple FPDUs; this may affect the performance of some +       receiver implementations. + +   It is up to the ULP to decide if Nagle is useful with DDP/MPA.  Note +   that many of the applications expected to take advantage of MPA/DDP +   prefer to avoid the extra delays caused by Nagle.  In such scenarios, +   it is anticipated there will be minimal opportunity for packing at +   the transmitter, and receivers may choose to optimize their +   performance for this anticipated behavior. + +   Therefore, the application is expected to set TCP parameters such +   that it can trade off latency and wire efficiency.  Implementations +   should provide a connection option that disables Nagle for MPA/TCP +   similar to the way the TCP_NODELAY socket option is provided for a +   traditional sockets interface. + +   When latency is not critical, the application is expected to leave +   Nagle enabled.  
In this case, the TCP implementation may pack any available +   FPDUs into TCP segments so that the segments are filled to the EMSS. +   If the amount of data available is not enough to fill the TCP segment +   when it is prepared for transmission, TCP can send the segment partly +   filled, or use the Nagle algorithm to wait for the ULP to post more +   data. + +A.3.  Optimized MPA/TCP Receivers + +   When an MPA receive implementation and the MPA-aware receive side TCP +   implementation support handling out-of-order ULPDUs, the TCP receive +   implementation performs the following functions: + +   1)  The implementation passes incoming TCP segments to MPA as soon as +       they have been received and validated, even if not received in +       order.  The TCP layer commits to keeping each segment before it +       can be passed to the MPA.  This means that the segment must have +       passed the TCP, IP, and lower layer data integrity validation +       (i.e., checksum), must be in the receive window, must be part of +       the same epoch (if timestamps are used to verify this), and must +       have passed any other checks required by TCP RFCs. + +       This is not to imply that the data must be completely ordered +       before use.  An implementation can accept out-of-order segments, +       SACK them [RFC2018], and pass them to MPA immediately, before the +       reception of the segments needed to fill in the gaps.  MPA +       expects to utilize these segments when they are complete FPDUs or +       can be combined into complete FPDUs to allow the passing of +       ULPDUs to DDP when they arrive, independent of ordering.  DDP +       uses the passed ULPDU to "place" the DDP segments (see [DDP] for +       more details). 
+ +       Since MPA performs a CRC calculation and other checks on received +       FPDUs, the MPA/TCP implementation ensures that any TCP segments +       that duplicate data already received and processed (as can happen +       during TCP retries) do not overwrite already received and +       processed FPDUs.  This avoids the possibility that duplicate data +       may corrupt already validated FPDUs. + +   2)  The implementation provides a mechanism to indicate the ordering +       of TCP segments as the sender transmitted them.  One possible +       mechanism might be attaching the TCP sequence number to each +       segment. + +   3)  The implementation also provides a mechanism to indicate when a +       given TCP segment (and the prior TCP stream) is complete.  One +       possible mechanism might be to utilize the leading (left) edge of +       the TCP Receive Window. + +       MPA uses the ordering and completion indications to inform DDP +       when a ULPDU is complete; MPA Delivers the FPDU to DDP.  DDP uses +       the indications to "deliver" its messages to the DDP consumer +       (see [DDP] for more details). + +       DDP on MPA utilizes the above two mechanisms to establish the +       Delivery semantics that DDP's consumers agree to.  These +       semantics are described fully in [DDP].  These include +       requirements on DDP's consumer to respect ownership of buffers +       prior to the time that DDP delivers them to the Consumer. + +   The use of SACK [RFC2018] significantly improves network utilization +   and performance and is therefore recommended.  When combined with the +   out-of-order passing of segments to MPA and DDP, significant +   buffering and copying of received data can be avoided. + +A.4.  
Re-Segmenting Middleboxes and Non-Optimized MPA/TCP Senders + +   Since MPA senders often start FPDUs on TCP segment boundaries, a +   receiving optimized MPA/TCP implementation may be able to optimize +   the reception of data in various ways. + +   However, MPA receivers MUST NOT depend on FPDU Alignment on TCP +   segment boundaries. + +   Some MPA senders may be unable to conform to the sender requirements +   because their implementation of TCP is not designed with MPA in mind. +   Even for optimized MPA/TCP senders, the network may contain +   "middleboxes" which modify the TCP stream by changing the +   segmentation.  This is generally interoperable with TCP and its +   users, and MPA must be no exception. + +   The presence of Markers in MPA (when enabled) allows an optimized +   MPA/TCP receiver to recover the FPDUs despite these obstacles, +   although it may be necessary to utilize additional buffering at the +   receiver to do so. + +   Some of the cases that a receiver may have to contend with are listed +   below as a reminder to the implementer: + +   *   A single aligned and complete FPDU, either in order or out of +       order:  This can be passed to DDP as soon as validated, and +       Delivered when ordering is established. + +   *   Multiple FPDUs in a TCP segment, aligned and fully contained, +       either in order or out of order:  These can be passed to DDP as +       soon as validated, and Delivered when ordering is established. + +   *   Incomplete FPDU: The receiver should buffer until the remainder +       of the FPDU arrives.  If the remainder of the FPDU is already +       available, this can be passed to DDP as soon as validated, and +       Delivered when ordering is established.
+ +   *   Unaligned FPDU start: The partial FPDU must be combined with its +       preceding portion(s).  If the preceding parts are already +       available, and the whole FPDU is present, this can be passed to +       DDP as soon as validated, and Delivered when ordering is +       established.  If the whole FPDU is not available, the receiver +       should buffer until the remainder of the FPDU arrives. + +   *   Combinations of unaligned or incomplete FPDUs (and potentially +       other complete FPDUs) in the same TCP segment:  If any FPDU is +       present in its entirety, or can be completed with portions +       already available, it can be passed to DDP as soon as validated, +       and Delivered when ordering is established. + +A.5.  Receiver Implementation + +   Transport & Network Layer Reassembly Buffers: + +   The use of reassembly buffers (either TCP reassembly buffers or IP +   fragmentation reassembly buffers) is implementation-dependent.  When +   MPA is enabled, reassembly buffers are needed if out-of-order packets +   arrive and Markers are not enabled.  Buffers are also needed if FPDU +   alignment is lost or if IP fragmentation occurs.  This is because an +   incoming out-of-order segment may not contain enough information for +   MPA to process the entire FPDU.  For cases where a re-segmenting +   middlebox is present, or where the TCP sender is not optimized, the +   presence of Markers significantly reduces the amount of buffering +   needed. + +   Recovery from IP fragmentation is transparent to the MPA Consumers. + +A.5.1.  Network Layer Reassembly Buffers + +   The MPA/TCP implementation should set the IP Don't Fragment bit at +   the IP layer.  
Thus, upon a path MTU change, intermediate devices +   drop the IP datagram if it is too large and reply with an ICMP +   message that tells the source TCP that the path MTU has changed. +   This causes TCP to emit segments conformant with the new path MTU +   size.  Thus, under most conditions, IP fragments should not occur at +   the receiver, but they remain possible. + +   There are several options for implementation of network layer +   reassembly buffers: + +   1.  drop any IP fragments, and reply with an ICMP message according +       to [RFC792] (fragmentation needed and DF set) to tell the Remote +       Peer to resize its TCP segment. + +   2.  support an IP reassembly buffer, but of limited size (possibly +       the same size as the local link's MTU).  The end node would +       normally never Advertise a path MTU larger than the local link +       MTU.  It is recommended that a dropped IP fragment cause an ICMP +       message to be generated according to RFC 792. + +   3.  support multiple IP reassembly buffers, of effectively unlimited +       size. + +   4.  support an IP reassembly buffer for the largest IP datagram +       (64 KB). + +   5.  support a large IP reassembly buffer that could span multiple IP +       datagrams. + +   An implementation should support at least option 2 or 3 above, to +   avoid dropping packets that have traversed the entire fabric. + +   There is no end-to-end ACK for IP reassembly buffers, so there is no +   flow control on the buffer.  The only end-to-end ACK is a TCP ACK, +   which can only occur when a complete IP datagram is delivered to TCP. +   Because of this, in worst-case, pathological scenarios, the largest +   IP reassembly buffer needed is the size of the TCP receive window (to +   buffer multiple IP datagrams that have all been fragmented).
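Option 2 above can be sketched as a single fixed-size buffer that refuses any fragment landing beyond the local link MTU. In the following C sketch, `struct reasm_buf`, `reasm_store`, and the verdict names are hypothetical; only the ICMP type and code values (Destination Unreachable, "fragmentation needed and DF set") come from RFC 792. It is a minimal illustration, not a complete IP reassembler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ICMP Destination Unreachable, code "fragmentation needed and DF set"
 * (RFC 792); the caller would build and send this message on a drop. */
#define ICMP_DEST_UNREACHABLE 3
#define ICMP_FRAG_NEEDED_DF_SET 4

enum frag_verdict { FRAG_STORED, FRAG_DROPPED_SEND_ICMP };

/* Hypothetical bounded reassembly buffer (option 2): its capacity is the
 * local link MTU, because this node never Advertises a larger path MTU. */
struct reasm_buf {
    unsigned char *data;
    size_t capacity;
};

/* Store one fragment's payload at its offset in the datagram being
 * reassembled, or report that it must be dropped.  Hole tracking,
 * timeouts, and IP header handling are omitted from this sketch. */
static enum frag_verdict reasm_store(struct reasm_buf *rb, size_t offset,
                                     const unsigned char *payload, size_t len)
{
    assert(rb != NULL && rb->data != NULL && payload != NULL);
    /* Written this way so that offset + len cannot overflow. */
    if (offset > rb->capacity || len > rb->capacity - offset)
        return FRAG_DROPPED_SEND_ICMP;
    memcpy(rb->data + offset, payload, len);
    return FRAG_STORED;
}
```

Because the buffer never grows, a fragment that would need more space than the local link MTU is dropped and reported, which is the trade-off option 2 accepts in exchange for bounded memory.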
+ +   Note that if the Remote Peer does not implement re-segmentation of +   the data stream upon receiving the ICMP reply updating the path MTU, +   it is possible to halt forward progress because the opposite peer +   would continue to retransmit using a transport segment size that is +   too large.  This deadlock scenario is no different from the case in +   which the fabric MTU (not last-hop MTU) is reduced after connection +   setup and the remote node's behavior is not compliant with +   [RFC1122]. + +A.5.2.  TCP Reassembly Buffers + +   A TCP reassembly buffer is also needed.  TCP reassembly buffers are +   needed if FPDU Alignment is lost when using TCP with MPA or when the +   MPA FPDU spans multiple TCP segments.  Buffers are also needed if +   Markers are disabled and out-of-order packets arrive. + +   Since lost FPDU Alignment often means that FPDUs are incomplete, an +   MPA on TCP implementation must have a reassembly buffer large enough +   to recover an FPDU that is less than or equal to the MTU of the +   locally attached link (this should be the largest possible Advertised +   TCP path MTU).  If the MTU is smaller than 140 octets, a buffer at +   least 140 octets long is needed to support the minimum FPDU size. +   The 140 octets allow for the minimum MULPDU of 128 octets, 2 octets +   of pad, 2 of ULPDU_Length, 4 of CRC, and 4 for a possible Marker.  As +   usual, additional buffering is likely to provide better performance. + +   Note that if the TCP segments were not stored, it would be possible +   to deadlock the MPA algorithm.  If the path MTU is reduced, FPDU +   Alignment requires the source TCP to re-segment the data stream to +   the new path MTU.  
The source MPA will detect this condition and +   reduce the MPA segment size, but any FPDUs already posted to the +   source TCP will be re-segmented and lose FPDU Alignment.  If the +   destination does not support a TCP reassembly buffer, these segments +   can never be successfully transmitted, and the protocol deadlocks. + +   When a complete FPDU is received, processing continues normally. + +Appendix B.  Analysis of MPA over TCP Operations + +   This appendix is for information only and is NOT part of the +   standard. + +   This appendix is an analysis of MPA on TCP and why it is useful to +   integrate MPA with TCP (with modifications to typical TCP +   implementations) to reduce overall system buffering and overhead. + +   One of MPA's high-level goals is to provide enough information, when +   combined with the Direct Data Placement Protocol [DDP], to enable +   out-of-order placement of DDP payload into the final Upper Layer +   Protocol (ULP) Buffer.  Note that DDP separates the act of placing +   data into a ULP Buffer from that of notifying the ULP that the ULP +   Buffer is available for use.  In DDP terminology, the former is +   defined as "Placement", and the latter is defined as "Delivery".  MPA +   supports in-order Delivery of the data to the ULP, including support +   for Direct Data Placement in the final ULP Buffer location when TCP +   segments arrive out of order.  Effectively, the goal is to use the +   pre-posted ULP Buffers as the TCP receive buffer, where the +   reassembly of the ULP Protocol Data Unit (PDU) by TCP (with MPA and +   DDP) is done in place, in the ULP Buffer, with no data copies.
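The split between Placement and Delivery can be illustrated with a toy receiver. In the following C sketch, `struct ulp_rx`, `place`, and `deliver` are hypothetical names, the buffer is tiny, and a per-octet map stands in for the TCP ordering and completion indications that a real MPA/TCP receiver would use (Appendix A.3). It is a sketch of the idea under those assumptions, not an implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ULP_BUF_LEN 64 /* illustrative size only */

/* Hypothetical receive state: a pre-posted ULP Buffer plus a per-octet
 * "placed" map.  Real implementations would track ranges, not octets. */
struct ulp_rx {
    unsigned char buf[ULP_BUF_LEN];
    bool placed[ULP_BUF_LEN];
    size_t delivered; /* octets already Delivered, in order */
};

/* Placement: copy a self-describing segment straight into its final
 * location in the ULP Buffer, regardless of arrival order. */
static void place(struct ulp_rx *rx, size_t offset, const void *data, size_t len)
{
    assert(offset <= ULP_BUF_LEN && len <= ULP_BUF_LEN - offset);
    memcpy(rx->buf + offset, data, len);
    for (size_t i = offset; i < offset + len; i++)
        rx->placed[i] = true;
}

/* Delivery: advance only over the contiguous prefix that has been placed,
 * so the ULP sees data strictly in the order it was sent.  Returns the
 * number of newly Delivered octets. */
static size_t deliver(struct ulp_rx *rx)
{
    size_t start = rx->delivered;
    while (rx->delivered < ULP_BUF_LEN && rx->placed[rx->delivered])
        rx->delivered++;
    return rx->delivered - start;
}
```

For example, placing 5 octets at offset 16 first Delivers nothing, because octets 0 to 15 are still missing; once those arrive, a single `deliver` call releases all 21 octets at once, and no data was copied through an intermediate reassembly buffer.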
+ +   This appendix walks through the advantages and disadvantages of the +   TCP sender modifications proposed by MPA: + +   1) that the TCP sender preferably do Header Alignment, where a TCP +      segment should begin with an MPA Framing Protocol Data Unit (FPDU) +      (if there is payload present). + +   2) that there be an integral number of FPDUs in a TCP segment (under +      conditions where the path MTU is not changing). + +   This appendix concludes that the scaling advantages of FPDU Alignment +   are strong, based primarily on a fairly drastic reduction in TCP +   receive buffer requirements and on simplified receive handling.  The +   analysis also shows that there is little effect on TCP wire behavior. + +B.1.  Assumptions + +B.1.1.  MPA Is Layered beneath DDP + +   MPA is an adaptation layer between DDP and TCP.  DDP requires +   preservation of DDP segment boundaries and a CRC32c digest covering +   the DDP header and data.  MPA adds these features to the TCP stream +   so that DDP over TCP has the same basic properties as DDP over SCTP. + +B.1.2.  MPA Preserves DDP Message Framing + +   MPA was designed as a framing layer specifically for DDP and was not +   intended as a general-purpose framing layer for any other ULP using +   TCP. + +   A framing layer allows ULPs using it to receive indications from the +   transport layer only when complete ULPDUs are present.  As a framing +   layer, MPA is not aware of the content of the DDP PDU, only that it +   has received and, if necessary, reassembled a complete PDU for +   Delivery to the DDP. + +B.1.3.  The Size of the ULPDU Passed to MPA Is Less Than EMSS under +        Normal Conditions + +   To make reception of a complete DDP PDU on every received segment +   possible, DDP passes to MPA a PDU that is no larger than the EMSS of +   the underlying fabric.  
Each FPDU that MPA creates contains +   sufficient information for the receiver to directly place the ULP +   payload in the correct location in the correct receive buffer. + +   Edge cases in which this condition does not hold are handled, but +   need not be on the fast path. + +B.1.4.  Out-of-Order Placement but NO Out-of-Order Delivery + +   DDP receives complete DDP PDUs from MPA.  Each DDP PDU contains the +   information necessary to place its ULP payload directly in the +   correct location in host memory. + +   Because each DDP segment is self-describing, it is possible for DDP +   segments received out of order to have their ULP payload placed +   immediately in the ULP receive buffer. + +   Data delivery to the ULP is guaranteed to be in the order the data +   was sent.  DDP only indicates data delivery to the ULP after TCP has +   acknowledged the complete byte stream. + +B.2.  The Value of FPDU Alignment + +   Significant receiver optimizations can be achieved when Header +   Alignment and complete FPDUs are the common case.  The optimizations +   allow utilizing significantly fewer buffers on the receiver and less +   computation per FPDU.  The net effect is the ability to build a +   "flow-through" receiver that enables TCP-based solutions to scale to +   10G and beyond in an economical way.  The optimizations are +   especially relevant to hardware implementations of receivers that +   process multiple protocol layers -- Data Link Layer (e.g., Ethernet), +   Network and Transport Layer (e.g., TCP/IP), and even some ULP on top +   of TCP (e.g., MPA/DDP).  As network speed increases, there is an +   increasing desire to use a hardware-based receiver in order to +   achieve an efficient, high-performance solution.
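To see why Header Alignment simplifies the receiver, consider a segment that begins with an FPDU: the receiver can walk it FPDU by FPDU using only each ULPDU_Length field, with no state carried over from earlier segments. The following C sketch assumes Markers are disabled and uses the FPDU layout of Figure 2 (a 2-octet ULPDU_Length, the ULPDU, 0 to 3 octets of pad so that the FPDU is a multiple of four octets, and a 4-octet CRC). The function names are hypothetical, and CRC checking is deliberately omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total FPDU size for a given ULPDU_Length, assuming Markers are disabled:
 * 2-octet ULPDU_Length + ULPDU + pad to a 4-octet boundary + 4-octet CRC. */
static size_t fpdu_size(uint16_t ulpdu_len)
{
    size_t unpadded = 2u + ulpdu_len;
    size_t padded = (unpadded + 3u) & ~(size_t)3u;
    return padded + 4u;
}

/* Walk a header-aligned TCP segment and count the FPDUs it fully contains.
 * *tail_octets receives the size of any trailing partial FPDU, which the
 * receiver would have to buffer until the rest arrives. */
static size_t count_complete_fpdus(const uint8_t *seg, size_t seg_len,
                                   size_t *tail_octets)
{
    size_t pos = 0, count = 0;
    assert(seg != NULL && tail_octets != NULL);
    while (seg_len - pos >= 2) {
        /* ULPDU_Length is carried in network byte order. */
        uint16_t ulpdu_len = (uint16_t)((seg[pos] << 8) | seg[pos + 1]);
        size_t total = fpdu_size(ulpdu_len);
        if (total > seg_len - pos)
            break; /* this FPDU continues in a later segment */
        pos += total;
        count++;
    }
    *tail_octets = seg_len - pos;
    return count;
}
```

With a 128-octet ULPDU, `fpdu_size` gives 136 octets; adding one 4-octet Marker yields the 140-octet minimum buffer discussed in Appendix A.5.2. When every segment is aligned, `*tail_octets` stays zero and nothing needs to be buffered across segments.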
+ +   A TCP receiver, under worst-case conditions, has to allocate buffers +   (BufferSizeTCP) whose capacities are a function of the +   bandwidth-delay product.  Thus: + +       BufferSizeTCP = K * bandwidth [octets/second] * delay [seconds] + +   where bandwidth is the end-to-end bandwidth of the connection, delay +   is the round-trip delay of the connection, and K is an +   implementation-dependent constant. + +   BufferSizeTCP therefore scales with the end-to-end bandwidth (10x +   more buffers for a 10x increase in end-to-end bandwidth).  Because +   this buffering approach may scale poorly for hardware and software +   implementations alike, several approaches reduce the amount of +   buffering required for high-speed TCP communication. + + + + +Culley, et al.              Standards Track                    [Page 54] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +   The MPA/DDP approach is to enable the ULP's Buffer to be used as the +   TCP receive buffer.  If the application pre-posts a sufficient amount +   of buffering, and each TCP segment has sufficient information to +   place the payload into the right application buffer, then an +   out-of-order TCP segment could potentially be placed directly in the +   ULP Buffer when it arrives.  However, placement can only be done when +   a complete FPDU with the placement information is available to the +   receiver, and the FPDU contents contain enough information to place +   the data into the correct ULP Buffer (e.g., there is a DDP header +   available). + +   When the FPDU is not aligned with the TCP segment, it may take, on +   average, 2 TCP segments to assemble one FPDU.  Therefore, the +   receiver has to allocate BufferSizeNAF (Buffer Size, Non-Aligned +   FPDU) octets: + +       BufferSizeNAF = K1 * EMSS * number_of_connections + K2 * EMSS + +   where K1 and K2 are implementation-dependent constants and EMSS is +   the effective maximum segment size.
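   The three buffer-sizing formulas in this appendix (BufferSizeTCP, BufferSizeNAF, and the aligned BufferSizeAF defined next) can be compared with a short calculation.  The following is a minimal Python sketch for illustration only, not part of this specification: the function names are hypothetical, and K, K1, and K2 are passed in as parameters because the text leaves them implementation-dependent.  The example assumes K1 = 1 and K2 = 0, which reproduces the 15 MB figure for 10,000 connections with a 1500-octet EMSS given in the example that follows.

```python
# Sketch of the Appendix B.2 buffer-sizing formulas. Function names are
# hypothetical; K, K1, and K2 are implementation-dependent constants, so
# any values passed in are assumptions.


def buffer_size_tcp(k: float, bandwidth: float, delay: float) -> float:
    """BufferSizeTCP = K * bandwidth [octets/s] * delay [s]."""
    return k * bandwidth * delay


def buffer_size_naf(k1: float, k2: float, emss: int, number_of_connections: int) -> float:
    """BufferSizeNAF = K1 * EMSS * number_of_connections + K2 * EMSS."""
    return k1 * emss * number_of_connections + k2 * emss


def buffer_size_af(k2: float, emss: int) -> float:
    """BufferSizeAF = K2 * EMSS; no dependence on the connection count."""
    return k2 * emss


if __name__ == "__main__":
    # Assumed constants K1 = 1 and K2 = 0, as in the 15 MB example.
    naf = buffer_size_naf(k1=1, k2=0, emss=1500, number_of_connections=10_000)
    print(f"Non-aligned: {naf / 1e6:.0f} MB")          # Non-aligned: 15 MB
    # Doubling the connections doubles the non-aligned requirement...
    print(buffer_size_naf(1, 0, 1500, 20_000) / naf)  # 2.0
    # ...while the aligned requirement (assumed K2 = 1) stays at one EMSS.
    print(buffer_size_af(k2=1, emss=1500))            # 1500
```

   The contrast is the point of the sketch: BufferSizeNAF grows linearly with number_of_connections, while BufferSizeAF does not depend on it at all.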
+ +   For example, a 1 GB/sec link with 10,000 connections and an EMSS of +   1500 B would require 15 MB of memory (10,000 x 1500 B, taking K1 = 1 +   and neglecting the K2 term).  Often the number of connections used +   scales with the network speed, aggravating the situation for higher +   speeds. + +   FPDU Alignment would allow the receiver to allocate BufferSizeAF +   (Buffer Size, Aligned FPDU) octets: + +       BufferSizeAF = K2 * EMSS + +   for the same conditions.  An FPDU Aligned receiver may require memory +   in the range of hundreds of KB, which is feasible for on-chip memory +   and enables a "flow-through" design, in which the data flows through +   the network interface card (NIC) and is placed directly in the +   destination buffer.  Assuming most of the connections support FPDU +   Alignment, the receiver buffers no longer scale with the number of +   connections. + +   Additional optimizations can be achieved in a balanced I/O sub-system +   -- where the system interface of the network controller provides +   ample bandwidth as compared with the network bandwidth.  For almost +   twenty years this has been the case, and the trend is expected to +   continue.  While Ethernet speeds have scaled by a factor of 1000 +   (from 10 megabit/sec to 10 gigabit/sec), I/O bus bandwidth of volume +   CPU architectures has scaled from ~2 MB/sec to ~2 GB/sec (PC-XT bus +   to PCI-X DDR).  Under these conditions, the FPDU Alignment approach +   makes BufferSizeAF independent of network speed; it is primarily a +   function of the local processing time for a given frame. + + + +Culley, et al.              Standards Track                    [Page 55] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +   Thus, when the FPDU Alignment approach is used, receive buffering is +   expected to scale gracefully (i.e., sublinearly) as network speed +   increases. + +B.2.1.  
Impact of Lack of FPDU Alignment on the Receiver Computational +        Load and Complexity + +   The receiver must perform IP and TCP processing, and then perform +   FPDU CRC checks, before it can trust the FPDU header placement +   information.  For simplicity of the description, the assumption is +   that an FPDU is carried in no more than 2 TCP segments.  In reality, +   with no FPDU Alignment, an FPDU can be carried by more than 2 TCP +   segments (e.g., if the path MTU was reduced). + +   ----++-----------------------------++-----------------------++----- +   +---||---------------+    +--------||--------+   +----------||----+ +   |   TCP Seg X-1      |    |     TCP Seg X    |   |  TCP Seg X+1   | +   +---||---------------+    +--------||--------+   +----------||----+ +   ----++-----------------------------++-----------------------++----- +                   FPDU #N-1                  FPDU #N + +     Figure 12: Non-Aligned FPDU Freely Placed in TCP Octet Stream + +   The receiver algorithm for processing TCP segments (e.g., TCP segment +   #X in Figure 12) carrying non-aligned FPDUs (in order or out of +   order) includes: + +       1.  Data Link Layer processing (whole frame) -- typically +           including a CRC calculation. + +       2.  Network Layer processing (assuming the packet is not an IP +           fragment, the whole Data Link Layer frame contains one IP +           datagram.  IP fragments should be reassembled in a local +           buffer.  This is not a performance optimization goal.) + +       3.  Transport Layer processing -- TCP protocol processing, header +           and checksum checks. + +           a.  Classify the incoming TCP segment using the 5-tuple (IP +               SRC, IP DST, TCP SRC Port, TCP DST Port, protocol). + + + + + + + + + + + +Culley, et al.              Standards Track                    [Page 56] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +       4.  Find FPDU message boundaries. + +           a.  
Get MPA state information for the connection. + +               If the TCP segment is in order, use the receiver-managed +               MPA state information to calculate where the previous +               FPDU message (#N-1) ends in the current TCP segment X. +               (Previously, when the MPA receiver processed the first +               part of FPDU #N-1, it calculated the number of bytes +               remaining to complete FPDU #N-1 by using the MPA Length +               field.) + +                   Get the stored partial CRC for FPDU #N-1. + +                   Complete CRC calculation for FPDU #N-1 data (first +                       portion of TCP segment #X). + +                   Check CRC calculation for FPDU #N-1. + +                   If there are no FPDU CRC errors, placement is +                       allowed. + +                   Locate the local buffer for the first portion of +                       FPDU #N-1, CopyData(local buffer of first portion +                       of FPDU #N-1, host buffer address, length). + +                   Compute host buffer address for second portion of +                       FPDU #N-1. + +                   CopyData(local buffer of second portion of +                       FPDU #N-1, host buffer address for second +                       portion, length). + +                   Calculate the octet offset into the TCP segment for +                       the next FPDU #N. + +                   Start calculation of CRC for available data for +                       FPDU #N. + +                   Store partial CRC results for FPDU #N. + +                   Store local buffer address of first portion of FPDU +                       #N. + +                   No further action is possible on FPDU #N before it +                       is completely received. + + + + + + +Culley, et al.              
Standards Track                    [Page 57] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +               If the TCP segment is out of order, the receiver must +               buffer the data until at least one complete FPDU is +               received.  Typically, buffering for more than one TCP +               segment per connection is required.  Use the MPA-based +               Markers to calculate where FPDU boundaries are. + +                   When a complete FPDU is available, a similar +                   procedure to the in-order algorithm above is used. +                   There is additional complexity, though, because when +                   the missing segment arrives, this TCP segment must be +                   run through the CRC engine after the CRC is +                   calculated for the missing segment. + +   If we assume FPDU Alignment, the following diagram and the algorithm +   below apply.  Note that when using MPA, the receiver is assumed to +   actively detect presence or loss of FPDU Alignment for every TCP +   segment received. + +      +--------------------------+      +--------------------------+ +   +--|--------------------------+   +--|--------------------------+ +   |  |       TCP Seg X          |   |  |         TCP Seg X+1      | +   +--|--------------------------+   +--|--------------------------+ +      +--------------------------+      +--------------------------+ +                FPDU #N                          FPDU #N+1 + +      Figure 13: Aligned FPDU Placed Immediately after TCP Header + + + + + + + + + + + + + + + + + + + + + + + + + +Culley, et al.              Standards Track                    [Page 58] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +   The receiver algorithm for FPDU Aligned frames (in order or out of +   order) includes: + +       1)  Data Link Layer processing (whole frame) -- typically +           including a CRC calculation. 
+ +       2)  Network Layer processing (assuming the packet is not an IP +           fragment, the whole Data Link Layer frame contains one IP +           datagram.  IP fragments should be reassembled in a local +           buffer.  This is not a performance optimization goal.) + +       3)  Transport Layer processing -- TCP protocol processing, header +           and checksum checks. + +           a.  Classify the incoming TCP segment using the 5-tuple (IP +               SRC, IP DST, TCP SRC Port, TCP DST Port, protocol). + +       4)  Check for Header Alignment (described in detail in Section +           6).  The rest of the algorithm below assumes Header +           Alignment. + +           a.  If the header is not aligned, use the algorithm for +               non-aligned FPDUs described above. + +       5)  Whether the TCP segment is in order or out of order, the MPA +           header is at the beginning of the current TCP payload.  Get +           the FPDU length from the FPDU header. + +       6)  Calculate the CRC over FPDU #N. + +       7)  Check CRC calculation for FPDU #N. + +       8)  If there are no FPDU CRC errors, placement is allowed. + +       9)  CopyData(TCP segment #X, host buffer address, length). + +       10) Loop to step 5 until all the FPDUs in the TCP segment are +           consumed; this handles FPDU packing. + +   Implementation note: In both cases, the receiver has to classify the +   incoming TCP segment and associate it with one of the flows it +   maintains.  In the case of no FPDU Alignment, the receiver is forced +   to classify incoming traffic before it can calculate the FPDU CRC. +   In the case of FPDU Alignment, the order of operations is left to the +   implementer. + + + + + + +Culley, et al.              Standards Track                    [Page 59] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +   The FPDU Aligned receiver algorithm is significantly simpler.  There +   is no need to locally buffer portions of FPDUs.  
Accessing state +   information is also substantially simplified -- the normal case does +   not require retrieving information to find out where an FPDU starts +   and ends, or retrieving a partial CRC before the CRC calculation can +   commence.  This avoids adding internal latencies, having multiple +   data passes through the CRC machine, or scheduling multiple commands +   for moving the data to the host buffer. + +   The aligned FPDU approach is useful for both in-order and +   out-of-order reception.  The receiver can use the same mechanisms for +   data storage in both cases, and only needs to track when all the TCP +   segments have arrived so that Delivery can be enabled.  The Header +   Alignment, along with the high probability that at least one complete +   FPDU is found with every TCP segment, allows the receiver to perform +   data placement for out-of-order TCP segments with no need for +   intermediate buffering.  Essentially, the TCP receive buffer has been +   eliminated and TCP reassembly is done in place within the ULP Buffer. + +   If FPDU Alignment is not found, the receiver should follow the +   algorithm for non-aligned FPDU reception, which may be slower and +   less efficient. + +B.2.2.  FPDU Alignment Effects on TCP Wire Protocol + +   In an optimized MPA/TCP implementation, TCP exposes its EMSS to MPA. +   MPA uses the EMSS to calculate its MULPDU, which it then exposes to +   DDP, its ULP.  DDP uses the MULPDU to segment its payload so that +   each FPDU sent by MPA fits completely into one TCP segment.  This has +   no impact on the wire protocol, and exposing this information is +   already supported by many TCP implementations, including all modern +   flavors of BSD networking, through the TCP_MAXSEG socket option. + +   In the common case, the ULP (i.e., DDP over MPA) messages provided to +   the TCP layer are segmented to MULPDU size.  
It is assumed that the +   ULP message size is bounded by MULPDU, such that a single ULP message +   can be encapsulated in a single TCP segment.  Therefore, in the +   common case, there is no increase in the number of TCP segments +   emitted.  For smaller ULP messages, the sender can also apply +   packing, i.e., the sender packs as many complete FPDUs as possible +   into one TCP segment.  The requirement to always have a complete FPDU +   may increase the number of TCP segments emitted.  Typically, a ULP +   message size varies from a few bytes to multiple EMSSs (e.g., 64 +   Kbytes).  In some cases, the ULP may post more than one message at a +   time for transmission, giving the sender an opportunity for packing. +   When more than one FPDU is available for transmission and there is +   no room left in the current TCP segment for the next complete FPDU, +   another + + + +Culley, et al.              Standards Track                    [Page 60] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +   TCP segment is sent.  In this corner case, some of the TCP segments +   are not full size.  In the worst-case scenario, the ULP may choose an +   FPDU size of EMSS/2 + 1 octets and have multiple messages available +   for transmission.  With this poor choice of FPDU size, only one FPDU +   fits in each TCP segment, so the average TCP segment size is about +   half the EMSS and the number of TCP segments emitted approaches twice +   what is possible without the requirement to encapsulate an integer +   number of complete FPDUs in every TCP segment.  This is a dynamic +   situation that lasts only while the sender ULP has multiple +   non-optimally sized messages queued for transmission, and it has only +   a minor impact on wire utilization. + +   However, requiring FPDU Alignment is not expected to have a +   measurable impact on the wire behavior of most applications.  
Throughput +   applications with large I/Os are expected to take full advantage of +   the EMSS.  Another class of applications, with many small outstanding +   buffers (compared with the EMSS), is expected to use packing when +   applicable.  Transaction-oriented applications are also expected to +   behave optimally. + +   TCP retransmission is another area that can affect sender behavior. +   TCP supports retransmission of the exact, originally transmitted +   segment (see [RFC793], Sections 2.6 and 3.7 (under "Managing the +   Window") and [RFC1122], Section 4.2.2.15).  In the unlikely event +   that part of the original segment has been received and acknowledged +   by the Remote Peer (e.g., because of a re-segmenting middlebox, as +   documented in Appendix A.4, Re-Segmenting Middleboxes and +   Non-Optimized MPA/TCP Senders), better utilization of the available +   bandwidth may be possible by retransmitting only the missing octets. +   If an optimized MPA/TCP retransmits complete FPDUs, there may be some +   marginal bandwidth loss. + +   Another area where a change in the number of TCP segments may have an +   impact is that of slow start and congestion avoidance.  Slow-start +   exponential increase is measured in segments per second, as the +   algorithm focuses on the overhead per segment at the source for +   congestion that eventually results in dropped segments.  Slow-start +   exponential bandwidth growth for optimized MPA/TCP is similar to that +   of any TCP implementation.  Congestion avoidance allows for linear +   growth in available bandwidth when recovering after a packet drop. +   As with slow start, optimized MPA/TCP does not change the behavior of +   the algorithm.  Therefore, the average segment size relative to the +   EMSS is not a major factor in assessing the bandwidth growth of a +   sender.  
Both slow start and congestion +   avoidance for an optimized MPA/TCP will behave similarly to any TCP +   sender and allow an optimized MPA/TCP to enjoy the theoretical +   performance limits of the algorithms. + + + + + +Culley, et al.              Standards Track                    [Page 61] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +   In summary, the way the sender's ULP generates messages (e.g., the +   number of messages grouped into each transmission request) and the +   message size distribution have the most significant impact on the +   number of TCP segments emitted.  The worst-case effect for certain +   ULPs (with an average message size of EMSS/2 + 1 to EMSS) is bounded +   by an increase of up to 2x in the number of TCP segments and +   acknowledgments.  In reality, the effect is expected to be marginal. + +Appendix C.  IETF Implementation Interoperability with RDMA Consortium +             Protocols + +   This appendix is for information only and is NOT part of the +   standard. + +   This appendix covers methods of making MPA implementations +   interoperate with both IETF and RDMA Consortium versions of the +   protocols. + +   The RDMA Consortium created early specifications of the MPA/DDP/RDMA +   protocols, and some manufacturers created implementations of those +   protocols before the IETF versions were finalized.  These protocols +   are very similar to the IETF versions, making it possible for +   implementations to be created or modified to support either set of +   specifications. + +   For those interested, the RDMA Consortium protocol documents +   (draft-culley-iwarp-mpa-v1.0.pdf [RDMA-MPA], +   draft-shah-iwarp-ddp-v1.0.pdf [RDMA-DDP], and +   draft-recio-iwarp-rdmac-v1.0.pdf [RDMA-RDMAC]) can be obtained at +   http://www.rdmaconsortium.org/home. + +   In this section, implementations of MPA/DDP/RDMA that conform to the +   RDMAC specifications are called RDMAC RNICs.  
Implementations of +   MPA/DDP/RDMA that conform to the IETF RFCs are called IETF RNICs. + +   Without the exchange of MPA Request/Reply Frames, there is no +   standard mechanism for enabling RDMAC RNICs to interoperate with IETF +   RNICs.  Even if a ULP uses a well-known port to start an IETF RNIC +   immediately in RDMA mode (i.e., without exchanging the MPA +   Request/Reply messages), there is no reason to expect an IETF RNIC to +   interoperate with an RDMAC RNIC, because the DDP and RDMAP headers on +   the wire carry different version numbers. + +   Therefore, the ULP or other supporting entity at the RDMAC RNIC must +   implement MPA Request/Reply Frames on behalf of the RNIC in order to +   negotiate the connection parameters.  The following sections describe +   the connection parameters that result from exchanging the MPA +   Request/Reply Frames before the conversion from streaming to RDMA +   mode. + + + + +Culley, et al.              Standards Track                    [Page 62] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +C.1.  Negotiated Parameters + +   Three types of RNICs are considered: + +   Upgraded RDMAC RNIC - an RNIC implementing the RDMAC protocols that +   has a ULP or other supporting entity that exchanges the MPA +   Request/Reply Frames in streaming mode before the conversion to RDMA +   mode. + +   Non-permissive IETF RNIC - an RNIC implementing the IETF protocols +   that is not capable of implementing the RDMAC protocols.  Such an +   RNIC can only interoperate with other IETF RNICs. + +   Permissive IETF RNIC - an RNIC implementing the IETF protocols that +   is capable of implementing the RDMAC protocols on a per-connection +   basis. + +   The Permissive IETF RNIC is recommended for implementers who want +   maximum interoperability with other RNIC implementations.
+ +   The values used by these three RNIC types for the MPA, DDP, and RDMAP +   versions, as well as for MPA Markers and CRC, are summarized in +   Figure 14. + +    +----------------++-----------+-----------+-----------+-----------+ +    | RNIC TYPE      || DDP/RDMAP |    MPA    |    MPA    |    MPA    | +    |                ||  Version  | Revision  |  Markers  |    CRC    | +    +----------------++-----------+-----------+-----------+-----------+ +    +----------------++-----------+-----------+-----------+-----------+ +    | RDMAC          ||     0     |     0     |     1     |     1     | +    |                ||           |           |           |           | +    +----------------++-----------+-----------+-----------+-----------+ +    | IETF           ||     1     |     1     |  0 or 1   |  0 or 1   | +    | Non-permissive ||           |           |           |           | +    +----------------++-----------+-----------+-----------+-----------+ +    | IETF           ||  1 or 0   |  1 or 0   |  0 or 1   |  0 or 1   | +    | permissive     ||           |           |           |           | +    +----------------++-----------+-----------+-----------+-----------+ + +           Figure 14: Connection Parameters for the RNIC Types +            for MPA Markers and MPA CRC, enabled=1, disabled=0. + +   It is assumed that no mixing of versions is allowed between MPA, +   DDP, and RDMAP.  The RNIC either generates the RDMAC protocols on the +   wire (version is zero) or uses the IETF protocols (version is one). + + + + + + + +Culley, et al.              Standards Track                    [Page 63] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +   During the exchange of the MPA Request/Reply Frames, each peer +   provides its MPA Revision, Marker preference (M: 0=disabled, +   1=enabled), and CRC preference.  The MPA Revision provided in the MPA +   Request Frame and the MPA Reply Frame may differ.
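   As a reading aid, Figure 14 and the no-mixing assumption can be expressed as a small validity check.  This is a minimal Python sketch, not part of this specification; the dictionary layout and the function name params_allowed are hypothetical, and the sketch interprets "no mixing" as requiring the MPA Revision to equal the DDP/RDMAP version, as in every row of the figure.

```python
# Hypothetical transcription of Figure 14 (enabled=1, disabled=0).
FIGURE_14 = {
    "RDMAC": {"version": {0}, "revision": {0}, "markers": {1}, "crc": {1}},
    "IETF non-permissive": {"version": {1}, "revision": {1},
                            "markers": {0, 1}, "crc": {0, 1}},
    "IETF permissive": {"version": {0, 1}, "revision": {0, 1},
                        "markers": {0, 1}, "crc": {0, 1}},
}


def params_allowed(rnic_type: str, version: int, revision: int,
                   markers: int, crc: int) -> bool:
    """Return True if Figure 14 permits this combination for rnic_type."""
    row = FIGURE_14[rnic_type]
    # Assumed reading of "no mixing": the MPA Revision must match the
    # DDP/RDMAP version (0 = RDMAC protocols, 1 = IETF protocols).
    if version != revision:
        return False
    return (version in row["version"]
            and revision in row["revision"]
            and markers in row["markers"]
            and crc in row["crc"])
```

   For example, params_allowed("IETF permissive", 0, 0, 1, 1) is True (a Permissive IETF RNIC running the RDMAC protocols), while params_allowed("IETF non-permissive", 0, 0, 1, 1) is False, because a Non-permissive IETF RNIC supports only version 1.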
+ +   From the information in the MPA Request/Reply Frames, each side sets +   the Version field (V: 0=RDMAC, 1=IETF) of the DDP/RDMAP protocols as +   well as the state of the Markers for each half-connection.  Between +   DDP and RDMAP, no mixing of versions is allowed.  Moreover, the DDP +   and RDMAP version MUST be identical in the two directions.  The RNIC +   either generates the RDMAC protocols on the wire (version is zero) or +   uses the IETF protocols (version is one). + +   In the following sections, the figures do not discuss CRC negotiation +   because there is no interoperability issue for CRCs.  Because the +   RDMAC RNIC always requests CRC use, both peers MUST generate and +   check CRCs, according to the IETF MPA specification. + +C.2.  RDMAC RNIC and Non-Permissive IETF RNIC + +   Figure 15 shows that a Non-permissive IETF RNIC cannot interoperate +   with an RDMAC RNIC, even though both peers exchange MPA Request/Reply +   Frames.  For a Non-permissive IETF RNIC, the MPA negotiation has no +   effect on the DDP/RDMAP version, so it cannot adapt to the RDMAC +   RNIC. + +   The rows in the figure show the state of the Marker field in the MPA +   Request Frame sent by the MPA Initiator.  The columns show the state +   of the Marker field in the MPA Reply Frame sent by the MPA Responder. +   Each type of RNIC is shown as an Initiator and a Responder.  The +   connection results are shown in the lower right corner, at the +   intersection of the different RNIC types, where V=0 is the RDMAC +   DDP/RDMAP version, V=1 is the IETF DDP/RDMAP version, M=0 means MPA +   Markers are disabled, and M=1 means MPA Markers are enabled.  The +   negotiated Marker state is shown as X/Y, for the receive direction of +   the Initiator/Responder. + + + + + + + + + + + + + + + +Culley, et al.              
Standards Track                    [Page 64] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +          +---------------------------++-----------------------+ +          |   MPA                     ||          MPA          | +          | CONNECT                   ||       Responder       | +          |   MODE  +-----------------++-------+---------------+ +          |         |   RNIC          || RDMAC |     IETF      | +          |         |   TYPE          ||       | Non-permissive| +          |         |          +------++-------+-------+-------+ +          |         |          |MARKER|| M=1   | M=0   |  M=1  | +          +---------+----------+------++-------+-------+-------+ +          +---------+----------+------++-------+-------+-------+ +          |         |   RDMAC  | M=1  || V=0   | close | close | +          |         |          |      || M=1/1 |       |       | +          |         +----------+------++-------+-------+-------+ +          |   MPA   |          | M=0  || close | V=1   | V=1   | +          |Initiator|   IETF   |      ||       | M=0/0 | M=0/1 | +          |         |Non-perms.+------++-------+-------+-------+ +          |         |          | M=1  || close | V=1   | V=1   | +          |         |          |      ||       | M=1/0 | M=1/1 | +          +---------+----------+------++-------+-------+-------+ + +           Figure 15: MPA Negotiation between an RDMAC RNIC and +                      a Non-Permissive IETF RNIC + +C.2.1.  RDMAC RNIC Initiator + +   If the RDMAC RNIC is the MPA Initiator, its ULP sends an MPA Request +   Frame with Rev field set to zero and the M and C bits set to one. +   Because the Non-permissive IETF RNIC cannot dynamically downgrade the +   version number it uses for DDP and RDMAP, it would send an MPA Reply +   Frame with the Rev field equal to one and then gracefully close the +   connection. + +C.2.2.  
Non-Permissive IETF RNIC Initiator + +   If the Non-permissive IETF RNIC is the MPA Initiator, it sends an MPA +   Request Frame with Rev field equal to one.  The ULP or supporting +   entity for the RDMAC RNIC responds with an MPA Reply Frame that has +   the Rev field equal to zero and the M bit set to one.  The Non- +   permissive IETF RNIC will gracefully close the connection after it +   reads the incompatible Rev field in the MPA Reply Frame. + +C.2.3.  RDMAC RNIC and Permissive IETF RNIC + +   Figure 16 shows that a Permissive IETF RNIC can interoperate with an +   RDMAC RNIC regardless of its Marker preference.  The figure uses the +   same format as shown with the Non-permissive IETF RNIC. + + + + + +Culley, et al.              Standards Track                    [Page 65] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +          +---------------------------++-----------------------+ +          |   MPA                     ||          MPA          | +          | CONNECT                   ||       Responder       | +          |   MODE  +-----------------++-------+---------------+ +          |         |   RNIC          || RDMAC |     IETF      | +          |         |   TYPE          ||       |  Permissive   | +          |         |          +------++-------+-------+-------+ +          |         |          |MARKER|| M=1   | M=0   | M=1   | +          +---------+----------+------++-------+-------+-------+ +          +---------+----------+------++-------+-------+-------+ +          |         |   RDMAC  | M=1  || V=0   | N/A   | V=0   | +          |         |          |      || M=1/1 |       | M=1/1 | +          |         +----------+------++-------+-------+-------+ +          |   MPA   |          | M=0  || V=0   | V=1   | V=1   | +          |Initiator|   IETF   |      || M=1/1 | M=0/0 | M=0/1 | +          |         |Permissive+------++-------+-------+-------+ +          |         |          | M=1  || V=0   | V=1   | V=1   | +     
     |         |          |      || M=1/1 | M=1/0 | M=1/1 | +          +---------+----------+------++-------+-------+-------+ + +           Figure 16: MPA Negotiation between an RDMAC RNIC and +                         a Permissive IETF RNIC + +   A truly Permissive IETF RNIC will recognize an RDMAC RNIC from the +   Rev field of the MPA Req/Rep Frames and then adjust its receive +   Marker state and DDP/RDMAP version to accommodate the RDMAC RNIC.  As +   a result, as an MPA Responder, the Permissive IETF RNIC will never +   return an MPA Reply Frame with the M bit set to zero.  This case is +   shown as not applicable (N/A) in Figure 16. + +C.2.4.  RDMAC RNIC Initiator + +   When the RDMAC RNIC is the MPA Initiator, its ULP or other supporting +   entity prepares an MPA Request message and sets the revision to zero +   and the M bit and C bit to one. + +   The Permissive IETF Responder receives the MPA Request message and +   checks the revision field.  Since it is capable of generating RDMAC +   DDP/RDMAP headers, it sends an MPA Reply message with revision set to +   zero and the M and C bits set to one.  The Responder must inform its +   ULP that it is generating version zero DDP/RDMAP messages. + + + + + + + + + + +Culley, et al.              Standards Track                    [Page 66] + +RFC 5044                  MPA Framing for TCP               October 2007 + + +C.2.5.  Permissive IETF RNIC Initiator + +   If the Permissive IETF RNIC is the MPA Initiator, it prepares the MPA +   Request Frame, setting the Rev field to one.  Regardless of the value +   of the M bit in the MPA Request Frame, the ULP or other supporting +   entity for the RDMAC RNIC will create an MPA Reply Frame with Rev +   equal to zero and the M bit set to one.
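   The connection results in Figures 15 and 16 (and in Figure 17 below) follow a small set of rules, summarized here as a minimal Python sketch.  It is illustrative only and not part of this specification: the names negotiate and Outcome are hypothetical, the Marker arguments are the M bits sent in the MPA Request and Reply Frames, and the N/A cell of Figure 16 is mapped to V=0, M=1/1 on the assumption that the Permissive Responder adjusts to the RDMAC RNIC as described above.

```python
from dataclasses import dataclass
from typing import Optional

RDMAC = "RDMAC"
NON_PERMISSIVE = "IETF non-permissive"
PERMISSIVE = "IETF permissive"
IETF_TYPES = {NON_PERMISSIVE, PERMISSIVE}


@dataclass(frozen=True)
class Outcome:
    closed: bool                   # True if the connection is gracefully closed
    version: Optional[int] = None  # DDP/RDMAP version: 0 = RDMAC, 1 = IETF
    initiator_rx_markers: Optional[int] = None  # X in "M=X/Y"
    responder_rx_markers: Optional[int] = None  # Y in "M=X/Y"


def negotiate(init_type: str, init_m: int, resp_type: str, resp_m: int) -> Outcome:
    """Predict the Figure 15/16/17 result for an Initiator/Responder pair."""
    if RDMAC in (init_type, resp_type) and NON_PERMISSIVE in (init_type, resp_type):
        # A Non-permissive IETF RNIC cannot switch to version 0, so it closes.
        return Outcome(closed=True)
    if init_type in IETF_TYPES and resp_type in IETF_TYPES:
        # Two IETF RNICs: V=1, and each side receives Markers only if its
        # own Request/Reply Frame asked for them.
        return Outcome(False, 1, init_m, resp_m)
    # RDMAC with RDMAC, or RDMAC with a Permissive IETF RNIC:
    # version 0 with Markers enabled in both directions.
    return Outcome(False, 0, 1, 1)
```

   For instance, negotiate(PERMISSIVE, 0, RDMAC, 1) yields version 0 with Markers in both directions, matching the V=0, M=1/1 cell of Figure 16, while any pairing of an RDMAC RNIC with a Non-permissive IETF RNIC yields a closed connection, as in Figure 15.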
+
+   When the Initiator reads the Rev field of the MPA Reply Frame and
+   finds that its peer is an RDMAC RNIC, it must inform its ULP that it
+   should generate version zero DDP/RDMAP messages and enable MPA
+   Markers and CRC.
+
+C.3.  Non-Permissive IETF RNIC and Permissive IETF RNIC
+
+   For completeness, Figure 17 below shows the results of MPA
+   negotiation between a Non-permissive IETF RNIC and a Permissive IETF
+   RNIC.  The important point from this figure is that an IETF RNIC
+   cannot detect whether its peer is a Permissive or Non-permissive
+   RNIC.
+
+      +---------------------------++-------------------------------+
+      |   MPA                     ||              MPA              |
+      | CONNECT                   ||            Responder          |
+      |   MODE  +-----------------++---------------+---------------+
+      |         |   RNIC          ||     IETF      |     IETF      |
+      |         |   TYPE          || Non-permissive|  Permissive   |
+      |         |          +------++-------+-------+-------+-------+
+      |         |          |MARKER|| M=0   | M=1   | M=0   | M=1   |
+      +---------+----------+------++-------+-------+-------+-------+
+      +---------+----------+------++-------+-------+-------+-------+
+      |         |          | M=0  || V=1   | V=1   | V=1   | V=1   |
+      |         |   IETF   |      || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
+      |         |Non-perms.+------++-------+-------+-------+-------+
+      |         |          | M=1  || V=1   | V=1   | V=1   | V=1   |
+      |         |          |      || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
+      |   MPA   +----------+------++-------+-------+-------+-------+
+      |Initiator|          | M=0  || V=1   | V=1   | V=1   | V=1   |
+      |         |   IETF   |      || M=0/0 | M=0/1 | M=0/0 | M=0/1 |
+      |         |Permissive+------++-------+-------+-------+-------+
+      |         |          | M=1  || V=1   | V=1   | V=1   | V=1   |
+      |         |          |      || M=1/0 | M=1/1 | M=1/0 | M=1/1 |
+      +---------+----------+------++-------+-------+-------+-------+
+
+    Figure 17: MPA negotiation between a Non-permissive IETF RNIC and a
+                           Permissive IETF RNIC.
+
+Normative References
+
+   [iSCSI]      Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka, M.,
+                and E. Zeidner, "Internet Small Computer Systems
+                Interface (iSCSI)", RFC 3720, April 2004.
+
+   [RFC1191]    Mogul, J. and S. Deering, "Path MTU discovery", RFC
+                1191, November 1990.
+
+   [RFC2018]    Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
+                Selective Acknowledgment Options", RFC 2018, October
+                1996.
+
+   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
+                Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [RFC2401]    Kent, S. and R. Atkinson, "Security Architecture for the
+                Internet Protocol", RFC 2401, November 1998.
+
+   [RFC3723]    Aboba, B., Tseng, J., Walker, J., Rangan, V., and F.
+                Travostino, "Securing Block Storage Protocols over IP",
+                RFC 3723, April 2004.
+
+   [RFC793]     Postel, J., "Transmission Control Protocol", STD 7, RFC
+                793, September 1981.
+
+   [RDMASEC]    Pinkerton, J. and E. Deleganes, "Direct Data Placement
+                Protocol (DDP) / Remote Direct Memory Access Protocol
+                (RDMAP) Security", RFC 5042, October 2007.
+
+Informative References
+
+   [APPL]       Bestler, C. and L. Coene, "Applicability of Remote
+                Direct Memory Access Protocol (RDMA) and Direct Data
+                Placement (DDP)", RFC 5045, October 2007.
+
+   [CRCTCP]     Stone, J. and C. Partridge, "When the CRC and TCP
+                checksum disagree", ACM SIGCOMM, September 2000.
+
+   [DAT-API]    DAT Collaborative, "kDAPL (Kernel Direct Access
+                Programming Library) and uDAPL (User Direct Access
+                Programming Library)", http://www.datcollaborative.org.
+
+   [DDP]        Shah, H., Pinkerton, J., Recio, R., and P. Culley,
+                "Direct Data Placement over Reliable Transports", RFC
+                5041, October 2007.
+
+   [iSER]       Ko, M., Chadalapaka, M., Hufferd, J., Elzur, U., Shah,
+                H., and P. Thaler, "Internet Small Computer System
+                Interface (iSCSI) Extensions for Remote Direct Memory
+                Access (RDMA)", RFC 5046, October 2007.
+
+   [IT-API]     The Open Group, "Interconnect Transport API (IT-API)",
+                Version 2.1, http://www.opengroup.org.
+
+   [NFSv4CHAN]  Williams, N., "On the Use of Channel Bindings to Secure
+                Channels", Work in Progress, June 2006.
+
+   [RDMA-DDP]   "Direct Data Placement over Reliable Transports (Version
+                1.0)", RDMA Consortium, October 2002,
+                <http://www.rdmaconsortium.org/home/draft-shah-iwarp-
+                ddp-v1.0.pdf>.
+
+   [RDMA-MPA]   "Marker PDU Aligned Framing for TCP Specification
+                (Version 1.0)", RDMA Consortium, October 2002,
+                <http://www.rdmaconsortium.org/home/draft-culley-iwarp-
+                mpa-v1.0.pdf>.
+
+   [RDMA-RDMAC] "An RDMA Protocol Specification (Version 1.0)", RDMA
+                Consortium, October 2002,
+                <http://www.rdmaconsortium.org/home/draft-recio-iwarp-
+                rdmac-v1.0.pdf>.
+
+   [RDMAP]      Recio, R., Culley, P., Garcia, D., Hilland, J., and B.
+                Metzler, "A Remote Direct Memory Access Protocol
+                Specification", RFC 5040, October 2007.
+
+   [RFC792]     Postel, J., "Internet Control Message Protocol", STD 5,
+                RFC 792, September 1981.
+
+   [RFC896]     Nagle, J., "Congestion control in IP/TCP internetworks",
+                RFC 896, January 1984.
+
+   [RFC1122]    Braden, R., "Requirements for Internet Hosts -
+                Communication Layers", STD 3, RFC 1122, October 1989.
+
+   [RFC4960]    Stewart, R., Ed., "Stream Control Transmission
+                Protocol", RFC 4960, September 2007.
+
+   [RFC4296]    Bailey, S. and T. Talpey, "The Architecture of Direct
+                Data Placement (DDP) and Remote Direct Memory Access
+                (RDMA) on Internet Protocols", RFC 4296, December 2005.
+
+   [RFC4297]    Romanow, A., Mogul, J., Talpey, T., and S. Bailey,
+                "Remote Direct Memory Access (RDMA) over IP Problem
+                Statement", RFC 4297, December 2005.
+
+   [RFC4301]    Kent, S. and K. Seo, "Security Architecture for the
+                Internet Protocol", RFC 4301, December 2005.
+
+   [VERBS-RMDA] "RDMA Protocol Verbs Specification", RDMA Consortium
+                standard, April 2003, <http://www.rdmaconsortium.org/
+                home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf>.
+
+Contributors
+
+   Dwight Barron
+   Hewlett-Packard Company
+   20555 SH 249
+   Houston, TX 77070-2698 USA
+   Phone: 281-514-2769
+   EMail: dwight.barron@hp.com
+
+   Jeff Chase
+   Department of Computer Science
+   Duke University
+   Durham, NC 27708-0129 USA
+   Phone: +1 919 660 6559
+   EMail: chase@cs.duke.edu
+
+   Ted Compton
+   EMC Corporation
+   Research Triangle Park, NC 27709 USA
+   Phone: 919-248-6075
+   EMail: compton_ted@emc.com
+
+   Dave Garcia
+   24100 Hutchinson Rd.
+   Los Gatos, CA  95033
+   Phone: 831 247 4464
+   EMail: Dave.Garcia@StanfordAlumni.org
+
+   Hari Ghadia
+   Gen10 Technology, Inc.
+   1501 W Shady Grove Road
+   Grand Prairie, TX 75050
+   Phone: (972) 301 3630
+   EMail: hghadia@gen10technology.com
+
+   Howard C. Herbert
+   Intel Corporation
+   MS CH7-404
+   5000 West Chandler Blvd.
+   Chandler, AZ 85226
+   Phone: 480-554-3116
+   EMail: howard.c.herbert@intel.com
+
+   Jeff Hilland
+   Hewlett-Packard Company
+   20555 SH 249
+   Houston, TX 77070-2698 USA
+   Phone: 281-514-9489
+   EMail: jeff.hilland@hp.com
+
+   Mike Ko
+   IBM
+   650 Harry Rd.
+   San Jose, CA 95120
+   Phone: (408) 927-2085
+   EMail: mako@us.ibm.com
+
+   Mike Krause
+   Hewlett-Packard Corporation, 43LN
+   19410 Homestead Road
+   Cupertino, CA 95014 USA
+   Phone: +1 (408) 447-3191
+   EMail: krause@cup.hp.com
+
+   Dave Minturn
+   Intel Corporation
+   MS JF1-210
+   5200 North East Elam Young Parkway
+   Hillsboro, Oregon  97124
+   Phone: 503-712-4106
+   EMail: dave.b.minturn@intel.com
+
+   Jim Pinkerton
+   Microsoft, Inc.
+   One Microsoft Way
+   Redmond, WA 98052 USA
+   EMail: jpink@microsoft.com
+
+   Hemal Shah
+   Broadcom Corporation
+   5300 California Avenue
+   Irvine, CA 92617 USA
+   Phone: +1 (949) 926-6941
+   EMail: hemal@broadcom.com
+
+   Allyn Romanow
+   Cisco Systems
+   170 W Tasman Drive
+   San Jose, CA 95134 USA
+   Phone: +1 408 525 8836
+   EMail: allyn@cisco.com
+
+   Tom Talpey
+   Network Appliance
+   1601 Trapelo Road #16
+   Waltham, MA  02451 USA
+   Phone: +1 (781) 768-5329
+   EMail: thomas.talpey@netapp.com
+
+   Patricia Thaler
+   Broadcom
+   16215 Alton Parkway
+   Irvine, CA 92618
+   Phone: 916 570 2707
+   EMail: pthaler@broadcom.com
+
+   Jim Wendt
+   Hewlett Packard Corporation
+   8000 Foothills Boulevard MS 5668
+   Roseville, CA 95747-5668 USA
+   Phone: +1 916 785 5198
+   EMail: jim_wendt@hp.com
+
+   Jim Williams
+   Emulex Corporation
+   580 Main Street
+   Bolton, MA 01740 USA
+   Phone: +1 978 779 7224
+   EMail: jim.williams@emulex.com
+
+Authors' Addresses
+
+   Paul R. Culley
+   Hewlett-Packard Company
+   20555 SH 249
+   Houston, TX 77070-2698 USA
+   Phone: 281-514-5543
+   EMail: paul.culley@hp.com
+
+   Uri Elzur
+   5300 California Avenue
+   Irvine, CA 92617, USA
+   Phone: 949.926.6432
+   EMail: uri@broadcom.com
+
+   Renato J Recio
+   IBM
+   Internal Zip 9043
+   11400 Burnett Road
+   Austin, Texas 78759
+   Phone: 512-838-3685
+   EMail: recio@us.ibm.com
+
+   Stephen Bailey
+   Sandburst Corporation
+   600 Federal Street
+   Andover, MA 01810 USA
+   Phone: +1 978 689 1614
+   EMail: steph@sandburst.com
+
+   John Carrier
+   Cray Inc.
+   411 First Avenue S, Suite 600
+   Seattle, WA 98104-2860
+   Phone: 206-701-2090
+   EMail: carrier@cray.com
+
+Full Copyright Statement
+
+   Copyright (C) The IETF Trust (2007).
+
+   This document is subject to the rights, licenses and restrictions
+   contained in BCP 78, and except as set forth therein, the authors
+   retain all their rights.
+
+   This document and the information contained herein are provided on an
+   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
+   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
+   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
+   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+   The IETF takes no position regarding the validity or scope of any
+   Intellectual Property Rights or other rights that might be claimed to
+   pertain to the implementation or use of the technology described in
+   this document or the extent to which any license under such rights
+   might or might not be available; nor does it represent that it has
+   made any independent effort to identify any such rights.  Information
+   on the procedures with respect to rights in RFC documents can be
+   found in BCP 78 and BCP 79.
+
+   Copies of IPR disclosures made to the IETF Secretariat and any
+   assurances of licenses to be made available, or the result of an
+   attempt made to obtain a general license or permission for the use of
+   such proprietary rights by implementers or users of this
+   specification can be obtained from the IETF on-line IPR repository at
+   http://www.ietf.org/ipr.
+
+   The IETF invites any interested party to bring to its attention any
+   copyrights, patents or patent applications, or other proprietary
+   rights that may cover technology that may be required to implement
+   this standard.  Please address the information to the IETF at
+   ietf-ipr@ietf.org.