author Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit 4bfd864f10b68b71482b35c818559068ef8d5797
tree e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc6581.txt
parent ea76e11061bda059ae9f9ad130a9895cc85607db
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc6581.txt')
-rw-r--r-- doc/rfc/rfc6581.txt 1403
1 files changed, 1403 insertions, 0 deletions
diff --git a/doc/rfc/rfc6581.txt b/doc/rfc/rfc6581.txt
new file mode 100644
index 0000000..6a6ddd0
--- /dev/null
+++ b/doc/rfc/rfc6581.txt
@@ -0,0 +1,1403 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF) A. Kanevsky, Ed.
+Request for Comments: 6581 Dell Inc.
+Updates: 5043, 5044 C. Bestler, Ed.
+Category: Standards Track Nexenta Systems
+ISSN: 2070-1721 R. Sharp
+ Intel
+ S. Wise
+ Open Grid Computing
+ April 2012
+
+
+ Enhanced Remote Direct Memory Access (RDMA)
+ Connection Establishment
+
+Abstract
+
+ This document updates RFC 5043 and RFC 5044 by extending Marker
+ Protocol Data Unit (PDU) Aligned Framing (MPA) negotiation for Remote
+ Direct Memory Access (RDMA) connection establishment. The first
+ enhancement extends RFC 5044, enabling peer-to-peer connection
+ establishment over MPA / Transmission Control Protocol (TCP). The
+ second enhancement extends both RFC 5043 and RFC 5044, by providing
+ an option for standardized exchange of RDMA-layer connection
+ configuration.
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by
+ the Internet Engineering Steering Group (IESG). Further
+ information on Internet Standards is available in Section 2 of
+ RFC 5741.
+
+ Information about the current status of this document, any
+ errata, and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc6581.
+
+
+
+
+
+
+
+
+
+
+
+
+Kanevsky, et al. Standards Track [Page 1]
+
+RFC 6581 Enhanced RDMA Connection Establishment April 2012
+
+
+Copyright Notice
+
+ Copyright (c) 2012 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction ....................................................3
+ 1.1. Summary of Changes Affecting RFC 5044 ......................4
+ 1.2. Summary of Changes Affecting RFC 5043 ......................4
+ 2. Requirements Language ...........................................4
+ 3. Definitions .....................................................4
+ 4. Motivations .....................................................7
+ 4.1. Standardization of RDMA Read Parameter Configuration .......7
+ 4.2. Enabling MPA Mode ..........................................9
+ 4.3. Lack of Explicit RTR in MPA Request/Reply Exchange ........10
+ 4.4. Limitations on ULP Workaround .............................11
+ 4.4.1. Transport-Neutral APIs .............................11
+ 4.4.2. Work/Completion Queue Accounting ...................11
+ 4.4.3. Host-based Implementation of MPA Fencing ...........12
+ 5. Enhanced MPA Connection Establishment ..........................13
+ 6. Enhanced MPA Request/Reply Frames ..............................14
+ 7. Enhanced SCTP Session Control Chunks ...........................15
+ 8. MPA Error Reporting ............................................16
+ 9. Enhanced RDMA Connection Establishment Data ....................17
+ 9.1. IRD and ORD Negotiation ...................................18
+ 9.2. Peer-to-Peer Connection Negotiation .......................20
+ 9.3. Enhanced Connection Negotiation Flow ......................21
+ 10. Interoperability ..............................................21
+ 11. IANA Considerations ...........................................22
+ 12. Security Considerations .......................................23
+ 13. Acknowledgements ..............................................23
+ 14. References ....................................................23
+ 14.1. Normative References .....................................23
+ 14.2. Informative References ...................................24
+
+1. Introduction
+
+ When used over the Transmission Control Protocol (TCP), the current
+ Remote Direct Data Placement (RDDP) [RFC5041] suite of protocols
+ relies on the MPA [RFC5044] protocol both for connection
+ establishment and for markers for TCP layering.
+
+ A typical model for establishing an RDMA connection has the following
+ steps:
+
+ o The passive side (responder) Upper Layer Protocol (ULP) listens
+ for connection requests.
+
+ o The active side (initiator) ULP submits a connection request using
+ an RDMA endpoint, the desired destination, and the parameters to
+ be used for the connection. Those parameters include both RDMA-
+ layer characteristics, such as the number of simultaneous RDMA
+ Read Requests to be allowed, and application-specific data.
+
+ o The passive side ULP receives a connection request that includes
+ the identity of the active side and the requested connection
+ characteristics. The passive side ULP uses this information to
+ decide whether to accept the connection, and if it is to be
+ accepted, how to create and/or configure the local RDMA endpoint.
+
+ o If accepting, the responder submits its acceptance of the
+ connection request, which in turn generates the accept message to
+ the initiator. This responder accept operation includes the RDMA
+ endpoint to be used and the connection characteristics (both the
+ RDMA configuration and any application-specific Private Data to be
+ transferred to the initiator).
+
+ o The active side receives confirmation that the connection has been
+ accepted, what the configured connection characteristics are, and
+ any application-supplied Private Data.
+
+ Currently, MPA supports only a client-server model for connection
+ establishment, forcing peer-to-peer applications to interact as
+ though they had a client-server relationship. In addition,
+ negotiation of some parameters specific to the Remote Direct Memory
+ Access Protocol (RDMAP) [RFC5040] is left to the ULP. Providing an
+ optional ULP-independent format for exchanging these parameters
+ would benefit transport-neutral RDMA applications.
+
+1.1. Summary of Changes Affecting RFC 5044
+
+ This document enhances the MPA connection setup protocol [RFC5044].
+ First, it adds exchange and negotiation of the parameters necessary
+ to support RDMA Read Requests. Second, it adds a message that serves
+ as a Ready to Receive (RTR) indication from the initiator to the
+ responder as the last message of connection establishment and adds
+ negotiation of which type of message to use for carrying the RTR
+ indication into MPA Request/Reply Frames.
+
+ RTR indications are optional and are carried by existing RDMA message
+ types, specifically a zero-length FULPDU Send message, a zero-length
+ RDMA Read message, or a zero-length RDMA Write message. Whether an
+ RTR indication is used, and which type of RDMA message carries it,
+ is negotiated by control flags in the enhanced RDMA connection
+ establishment data specified by this document (see Section 9). RDMA
+ implementations are often tightly integrated with application
+ libraries and hardware; the flexibility to use more than one type of
+ RDMA message therefore lets implementations choose message types
+ that are less disruptive to the implementation structure. When an
+ RTR indication is used, and MPA connection setup negotiation
+ indicates support for multiple RDMA message types as RTR indications
+ by both the initiator and responder, the initiator selects one of the
+ supported RDMA message types as the RTR indication at the initiator's
+ sole discretion.
+
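+ The selection rule above can be sketched in Python. This is a
+ minimal illustration, not part of the protocol: RtrType and
+ choose_rtr are hypothetical names, and the actual control-flag
+ encoding is the one defined in Section 9, which this sketch does not
+ reproduce.
+
```python
from enum import Flag
from typing import Optional, Sequence


class RtrType(Flag):
    """RDMA message types that can carry an RTR indication (hypothetical names)."""
    NONE = 0
    ZERO_LENGTH_SEND = 1
    ZERO_LENGTH_READ = 2
    ZERO_LENGTH_WRITE = 4


def choose_rtr(initiator: RtrType, responder: RtrType,
               preference: Sequence[RtrType]) -> Optional[RtrType]:
    """Return the RTR message type the initiator will use, or None.

    Only types supported by both sides are candidates; among those, the
    initiator decides according to its own preference order.
    """
    common = initiator & responder
    for candidate in preference:
        if candidate.value != 0 and candidate in common:
            return candidate
    return None
```
+
+ The preference order is a local policy choice; the only constraint
+ taken from the text is that the chosen type is supported by both
+ the initiator and the responder.
+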
+1.2. Summary of Changes Affecting RFC 5043
+
+ This document enhances [RFC5043] by adding new Enhanced Session
+ Control Chunks that extend the currently defined Chunks with the
+ addition of Inbound RDMA Read Queue Depth (IRD) and Outbound RDMA
+ Read Queue Depth (ORD) negotiation.
+
+2. Requirements Language
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in [RFC2119].
+
+3. Definitions
+
+ Active Side: See Initiator.
+
+ Consumer: The ULPs or applications that lie above MPA and Direct
+ Data Placement (DDP). The Consumer is responsible for making TCP
+ or Stream Control Transmission Protocol (SCTP) connections,
+ starting MPA and DDP connections, and generally controlling
+ operations. See [RFC5044] and [RFC5043].
+
+ CRC: Cyclic Redundancy Check
+
+ Completion Queue (CQ): A Consumer-accessible queue where the RDMA
+ device reports completions of Work Requests. A Consumer is able
+ to reap completions from a CQ without requiring per-transaction
+ support from the kernel or other privileged entity. See [RDMAC].
+
+ Completion Queue Entry (CQE): Transport- and device-specific
+ representation of a Work Completion. A CQ holds CQEs. See
+ [RDMAC].
+
+ FULPDU: Framed Upper Layer Protocol PDU. See FPDU of [RFC5044].
+
+ Inbound RDMA Read Request Queue (IRRQ): A queue that is associated
+ with an RDMA connection that tracks active incoming simultaneous
+ RDMA Read Request Messages. See [RDMAC].
+
+ Inbound RDMA Read Queue Depth (IRD): The maximum number of incoming
+ simultaneous RDMA Read Request Messages an RDMA connection can
+ handle. See [RDMAC].
+
+ Initiator: The endpoint of a connection that sends the MPA Request
+ Frame. The initiator is the active side of the connection
+ establishment. See [RFC5044].
+
+ IRD: See Inbound RDMA Read Queue Depth.
+
+ MPA Fencing: MPA responder connection establishment logic that
+ ensures that no ULP messages will be transferred until the
+ initiator's first message has been received.
+
+ MPA Request Frame: Data sent from the MPA initiator to the MPA
+ responder during the Startup Phase. See [RFC5044].
+
+ MPA Reply Frame: Data sent from the MPA responder to the MPA
+ initiator during the Startup Phase. See [RFC5044].
+
+ ORD: See Outbound RDMA Read Queue Depth.
+
+ Outbound RDMA Read Queue Depth (ORD): The maximum number of
+ simultaneous RDMA Read Requests that can be issued for the RDMA
+ connection. This should be less than or equal to the peer's IRD.
+ See [RDMAC].
+
+ Passive Side: See Responder.
+
+ Private Data: A block of data exchanged between MPA endpoints during
+ initial connection setup. See [RFC5044].
+
+ Queue Pair (QP): A Queue Pair is the set of Work Queues associated
+ exclusively with a single Endpoint (first defined in [VIA]). The
+ Send Queue (SQ), Receive Queue (RQ), and Inbound RDMA Read Request
+ Queue (IRRQ) are considered to be part of the Queue Pair. The
+ potentially shared Completion Queue (CQ) and Shared Receive Queue
+ (SRQ) are not. See [RDMAC].
+
+ Remote Peer: The MPA protocol implementation on the opposite end of
+ the connection. Used to refer to the remote entity when
+ describing protocol exchanges or other interactions between two
+ nodes. See [RFC5044].
+
+ Responder: The connection endpoint that responds to an incoming MPA
+ connection request (the MPA Request Frame). The responder is the
+ passive side of the connection establishment. See [RFC5044].
+
+ Ready to Receive (RTR): RTR is an indication provided by the last
+ connection establishment message sent from the initiator to the
+ responder. An RTR indicates that the initiator is ready to
+ receive messages and that connection establishment is completed.
+
+ Startup Phase: The initial exchanges of an MPA connection that
+ serve to more fully identify MPA endpoints to each other and pass
+ connection-specific setup information to each other. See
+ [RFC5044].
+
+ Shared Receive Queue (SRQ): A shared pool of Receive Work Requests
+ posted by the Consumer that can be allocated by multiple RDMA
+ endpoints (QP). See [RDMAC].
+
+ Tagged (DDP) Message: A DDP Message that targets a Tagged Buffer
+ that is explicitly advertised to the Remote Peer through exchange
+ of an STag (memory handle), offset in the memory region identified
+ by STag, and length [RFC5040].
+
+ Untagged (DDP) Message: A DDP Message that targets an Untagged
+ Buffer associated with a queue specified by the Queue Number (QN)
+ [RFC5040].
+
+ Work Queue: An element of a QP that allows user-space applications
+ to submit Work Requests directly to network hardware (first
+ defined in [VIA]). Specific Work Queues include the Send Queue
+ (SQ) for transmit requests, Receive Queue (RQ) for receive
+ requests specific to a single endpoint, and Shared Receive Queues
+ (SRQs) for receive requests that can be allocated by one or more
+ endpoints. See [RDMAC].
+
+ Work Queue Element (WQE): Transport- and device-specific
+ representation of a Work Request. See [RDMAC].
+
+ Work Request: An elementary object used by Consumers to enqueue a
+ requested operation (WQEs) onto a Work Queue. See [RDMAC].
+
+4. Motivations
+
+ This document has two goals. The first is to extend support from
+ the current client-server model for RDMA connection setup to a
+ peer-to-peer model. The second is to add negotiation of the RDMA
+ Read Queue size for both sides of an RDMA connection.
+
+4.1. Standardization of RDMA Read Parameter Configuration
+
+ Most RDMA applications are developed using a transport-neutral
+ Application Programming Interface (API) to access RDMA services based
+ on a "Queue Pair" paradigm as originally defined by the Virtual
+ Interface Architecture [VIA], refined by the Direct Access
+ Programming Library [DAPL], and most commonly deployed with the
+ OpenFabrics API [OFA].
+
+ These transport-neutral APIs seek to provide a common set of RDMA
+ services whether the underlying transport is, for example, RDDP over
+ MPA, RDDP over SCTP, or InfiniBand.
+
+ The common model for establishing an RDMA connection has the
+ following steps:
+
+ o The passive side ULP listens for connection requests.
+
+ o The active side ULP submits a connection request using an RDMA
+ endpoint ("Queue Pair"), the desired destination, and the
+ parameters to be used for the connection. Those parameters
+ include both RDMA-layer characteristics, such as the number of
+ simultaneous RDMA Read Requests to be allowed, and application-
+ specific data (typically referred to as "Private Data").
+
+ o The passive side ULP receives a connection request, which includes
+ the identity of the active side and the requested connection
+ characteristics. The passive side ULP uses this information to
+ decide whether to accept the connection, and if it is to be
+ accepted, how to create and/or configure the RDMA endpoint.
+
+ o If accepting, the passive side ULP submits its acceptance of the
+ connection request. This local accept operation includes the RDMA
+ endpoint to be used and the connection characteristics (both the
+ RDMA configuration and any application-specific Private Data to be
+ returned).
+
+ o The active side receives confirmation that the connection has been
+ accepted, what the configured connection characteristics are, and
+ any application-supplied Private Data.
+
+ As currently defined, DDP connection establishment requires the ULP
+ to encode the RDMA configuration in the application-specific Private
+ Data. This results in undesirable duplication of logic to cover RDMA
+ characteristics of both InfiniBand and RDDP for each ULP, and to
+ specify for InfiniBand and RDDP the extraction of the RDMA
+ characteristics for each ULP.
+
+ Both RDDP and InfiniBand support an initial Private Data exchange;
+ therefore, a standard definition of the RDMA characteristics within
+ the Private Data section would enable common connection establishment
+ APIs to format the RDMA characteristics based on the same API
+ information used when establishing either protocol to form the
+ connection. The application would then only have to indicate that it
+ was using this standard format to enable common connection
+ establishment procedures to apply common code to properly parse these
+ fields and configure the RDMA endpoints accordingly. Exchange of
+ parameters necessary to perform RDMA Read operations is a common
+ usage of the initial Private Data exchange.
+
+ One of the RDMA operations that is defined in [RDMAC] is an RDMA
+ Read. RDMA Read operations are performed using an untagged message
+ sent from a Queue Pair (QP) on the local endpoint to a QP on the
+ remote endpoint, targeting the Inbound RDMA Read Request Queue
+ (IRRQ, QN=1) associated with the connection. RDMA Read responses
+ transfer data associated with each
+ RDMA Read Request from the remote endpoint to the local endpoint
+ using tagged messages. An inbound RDMA Read Request remains on the
+ IRRQ from the time that it is received until the time that the last
+ tagged message associated with the RDMA request is acknowledged. The
+ IRRQ is associated with a QP but is not a Work Queue. Instead, the
+ IRRQ is a stand-alone queue that is used to manage RDMA Read Requests
+ associated with a QP. See [RDMAC], Section 6 for more information
+ regarding QPs and IRRQ. One of the characteristics that must be
+ configured for a QP is the size of the IRRQ. This parameter is
+ called the Inbound RDMA Read Queue Depth (IRD). Another
+ characteristic of a QP that must be configured is a local limit on
+ the number of simultaneous outbound RDMA Read Requests based on the
+ size of the remote endpoint QP's IRRQ. This parameter is called the
+ Outbound RDMA Read Queue Depth (ORD). ORD is used to limit the
+ number of simultaneous RDMA Read Requests such that the local
+ endpoint does not overrun the remote endpoint's IRRQ depth or IRD.
+ Note that outbound RDMA Reads are submitted to a QP's Send Queue at
+ the local peer, not to a separate outbound RDMA Read Request queue on
+ the local peer. The local endpoint uses ORD to strictly limit
+ simultaneous Read Requests so that IRRQ overruns do not occur at the
+ remote endpoint.
+
+ Determination of the values of the ORD and IRD is left to the ULP by
+ the current RDDP suite of protocols and also by [RDMAC]. Since this
+ negotiation of ORD and IRD is typical, it is desirable to provide a
+ common mechanism as described in this document.
+
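+ The relationship between ORD and the peer's IRD can be sketched in
+ Python as follows. This is an illustrative model only:
+ ReadRequestLimiter and its methods are hypothetical names, and a
+ real RDMA device enforces the limit in its Send Queue processing
+ rather than in application code.
+
```python
class ReadRequestLimiter:
    """Caps outstanding RDMA Read Requests at the local ORD (illustrative)."""

    def __init__(self, desired_ord: int, peer_ird: int) -> None:
        # ORD should be less than or equal to the peer's IRD, so that the
        # remote IRRQ is never overrun.
        self.ord = min(desired_ord, peer_ird)
        self.outstanding = 0

    def try_issue(self) -> bool:
        """Account for one new Read Request; False means the caller must wait."""
        if self.outstanding >= self.ord:
            return False
        self.outstanding += 1
        return True

    def on_read_complete(self) -> None:
        """Release one slot once the Read Response has fully arrived."""
        if self.outstanding == 0:
            raise RuntimeError("no outstanding RDMA Read Request")
        self.outstanding -= 1
```
+
+ With a desired ORD of 8 and a peer IRD of 2, the limiter admits two
+ simultaneous Read Requests and holds back a third until one of them
+ completes.
+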
+4.2. Enabling MPA Mode
+
+ MPA defines encoding of DDP Segments in Framed Upper Layer Protocol
+ PDUs (FULPDUs). Generation of FULPDUs requires the ability to
+ periodically insert MPA Markers and to generate the MPA CRC-32c for
+ each frame. Reception may require parsing/removing the markers after
+ using them to identify MPA Frame boundaries and validation of the
+ MPA-CRC32c.
+
+ A major design objective for MPA was to ensure that the resulting TCP
+ stream would be fully compliant for any and all TCP-aware
+ middleboxes. The challenge is that while only some TCP payload
+ streams are a valid stream of MPA FULPDUs, any sequence of bytes is a
+ valid TCP payload stream. The determination that a given stream is
+ in a specific MPA mode cannot be made at the MPA or TCP layer.
+ Therefore, enabling of MPA mode is handled by the ULP.
+
+ The MPA protocol can be viewed as having two parts:
+
+ o a specification of generation and reception of MPA FULPDUs. This
+ is unchanged by enhanced RDMA connection establishment.
+
+ o a pre-MPA exchange of messages to enable a specific MPA mode for
+ the TCP connection. Enhanced RDMA connection establishment
+ extends this protocol with two new features.
+
+ In typical implementations, generation and reception of MPA FULPDUs
+ is handled by hardware. The exchange of the MPA Request and Reply
+ Frames is then handled by host software. As will be explained, this
+ implementation split impedes applications that are not compatible
+ with the client-server assumptions in the current MPA Request/Reply
+ exchange.
+
+4.3. Lack of Explicit RTR in MPA Request/Reply Exchange
+
+ The exchange of MPA Request and Reply messages to place a TCP
+ connection in MPA mode is specified in [RFC5044]. This protocol
+ provides many benefits to the design of MPA FULPDU hardware:
+
+ o The ULP is responsible for specifying the exact MPA Mode (Markers
+ enabled or disabled, CRC-32c enabled or suppressed) and the point
+ in the TCP streams (inbound and outbound) where MPA Frames will
+ begin.
+
+ o Before the first MPA Frame is transmitted, all pre-MPA mode TCP
+ payloads will have been acknowledged by the peer. Therefore, it
+ is never necessary to generate a retransmission that mixes pre-MPA
+ and MPA payload.
+
+ o Before MPA reception is enabled, all incoming pre-MPA mode TCP
+ payloads will have been acknowledged. Therefore, the host will
+ never receive a TCP segment that mixes pre-MPA and MPA payload.
+
+ The limitation of the current MPA Request/Reply exchange is that it
+ does not define a Ready to Receive (RTR) indication that the active
+ side would send so that the passive side knows that the last
+ non-MPA payload (the MPA Reply) has been received.
+
+ Instead, the role of an RTR indication is piggybacked on the first
+ MPA FULPDU sent by the active side. This is actually a valuable
+ optimization for all applications that fit the classic client-server
+ model. The client only initiates the connection when it has a
+ request to send to the server, and the server has nothing to send
+ until it has received and processed the client request.
+
+ Even applications where the server sends some configuration data
+ immediately can easily send the same information as application
+ Private Data in the MPA Reply. So the currently defined exchange
+ works for almost all applications.
+
+ Many peer-to-peer applications, especially those involving cluster
+ calculations (frequently using Message Passing Interface (MPI)
+ [UsingMPI] or [RDS]), have no natural client or server roles ([PPMPI]
+ [OpenMP]). Typically, one member of the cluster is arbitrarily
+ selected to initiate the connection when the distributed task is
+ launched, while the other accepts it. At startup time, however,
+ there is no way to predict which node will have the first message to
+ actually send. Immediately establishing the connections is valuable
+ because it reduces latency once results are ready to transmit and it
+ validates connectivity throughout the cluster.
+
+ The lack of an explicit RTR indication in the MPA Request/Reply
+ exchange forces all applications to have a first message from the
+ connection initiator, whether or not this matches the application
+ communication model.
+
+4.4. Limitations on ULP Workaround
+
+ The requirement that the RDMA connection initiator sends the first
+ message does not appear to be onerous on first examination. The
+ natural question is why the application layer would not simply
+ generate a dummy message when there is no other message to submit.
+
+ There are three factors that make this workaround unsuitable for many
+ peer-to-peer applications:
+
+ o Transport-Neutral APIs.
+
+ o Work/Completion Queue Accounting.
+
+ o Host-based implementation of MPA Fencing.
+
+4.4.1. Transport-Neutral APIs
+
+ Many of these applications access RDMA services using a transport-
+ neutral API such as [DAPL] or [OFA]. Only RDDP over TCP [RFC5044]
+ has a first message requirement. Other RDMA transports, including
+ RDDP over SCTP (see [RFC5043]) and InfiniBand (see [IBTA]), do not.
+
+ Application or middleware communications can be expressed as
+ transport-neutral RDMA operations, allowing lower software layers to
+ translate to transport and device specifics. Having a distinct extra
+ message that is required only for one transport undermines the
+ application's goal of being transport neutral.
+
+4.4.2. Work/Completion Queue Accounting
+
+ RDMA local APIs conventionally use Work Queues to submit requests
+ (Work Queue elements or WQEs) and to asynchronously receive
+ completions (in Completion Queues or CQs).
+
+ Each Work Request can generate a Completion Queue Entry (CQE).
+ Completions for successful transmit Work Requests are frequently
+ suppressed, but the CQ capacity must account for the possibility that
+ each will complete in error. A CQ can receive completions from
+ multiple Work Queues.
+
+ CQs are defined to allow hardware RDMA implementations to generate
+ CQEs directly to a user-space-mapped buffer. This enables a user-
+ space RDMA Consumer to reap completions without requiring kernel
+ intervention.
+
+ A hardware RDMA implementation cannot reasonably wait for an
+ available slot in the CQ. The queue must be sized such that an
+ overflow will not occur. When an overflow does occur, it is
+ considered a catastrophic error and will typically require tearing
+ down all RDMA connections using that CQ.
+
+ This style of interface is very efficient, but places a burden on the
+ application to properly size each CQ to match the Work Queues that
+ feed it.
+
+ While the format of both WQEs and CQEs is transport and device
+ dependent, a transport-neutral API can deal with WQEs and CQEs as
+ abstract transport- and device-neutral objects. Therefore, the
+ number of WQEs and CQEs required for an application can be transport
+ and device neutral.
+
+ The capacity of the Work Queues and CQs can be calculated in an
+ abstract transport- and device-neutral fashion. If a dummy operation
+ approach is used, it would require lower layers to know the usage
+ model, and would disrupt the calculations by inserting a dummy
+ "operation" Work Request and filtering out the matching completion.
+ The lower layer does not know the usage model on which the queue
+ sizes are built, nor does it know how frequently an insertion will be
+ required.
+
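+ The accounting argument can be made concrete with a small Python
+ sketch. The function name and the queue depths are hypothetical;
+ the only rule taken from the text is that the CQ must have room for
+ a completion from every Work Request that can be outstanding on the
+ Work Queues feeding it.
+
```python
from typing import Iterable


def required_cq_capacity(work_queue_depths: Iterable[int]) -> int:
    """Worst-case CQE count: every posted Work Request may complete in error."""
    depths = list(work_queue_depths)
    if any(depth < 0 for depth in depths):
        raise ValueError("queue depths must be non-negative")
    return sum(depths)


# Hypothetical ULP budget: a Send Queue of 64 and a Receive Queue of 128
# feeding one CQ.
ulp_budget = required_cq_capacity([64, 128])

# A lower layer that injects one dummy Send needs a slot that the ULP
# never budgeted for.
with_dummy = required_cq_capacity([64 + 1, 128])
```
+
+ The difference between the two results is the extra slot that a
+ dummy operation would consume without the ULP's knowledge.
+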
+4.4.3. Host-based Implementation of MPA Fencing
+
+ Many hardware implementations of RDDP using MPA/TCP do not handle the
+ MPA Request/Reply exchange in hardware; rather, it is handled by
+ the host processor in software. With such designs, it is common for
+ MPA Fencing to be implemented in the user-space, device-specific
+ library (commonly referred to as a 'User Verbs' library or module).
+
+ When the generation and reception of MPA FULPDUs are already
+ dedicated to hardware, a Work Completion can only be generated by an
+ untagged message, because arrival of a message for a tagged buffer
+ does not necessarily generate a completion and takes place without
+ any interaction with the ULP [RFC5040].
+
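+ A host-based MPA Fence on the responder might be sketched in Python
+ as follows. This is one possible approach (holding outbound
+ messages in a queue) under assumed names; MpaFence and its methods
+ are hypothetical and do not correspond to any particular User Verbs
+ library.
+
```python
from collections import deque
from typing import Callable, Deque


class MpaFence:
    """Holds responder ULP messages until the initiator's first message arrives."""

    def __init__(self, transmit: Callable[[bytes], None]) -> None:
        self._transmit = transmit
        self._pending: Deque[bytes] = deque()
        self._open = False

    def send(self, message: bytes) -> None:
        """Transmit immediately if the fence is open, otherwise queue."""
        if self._open:
            self._transmit(message)
        else:
            self._pending.append(message)

    def on_first_initiator_message(self) -> None:
        """Open the fence and flush queued messages in their original order."""
        self._open = True
        while self._pending:
            self._transmit(self._pending.popleft())
```
+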
+5. Enhanced MPA Connection Establishment
+
+ Below we provide an overview of Enhanced Connection Setup. The goal
+ is to allow standard negotiation of the ORD/IRD setting on both sides
+ of the RDMA connection and/or to negotiate the initial data transfer
+ operation by the initiator when the existing 'client sends first'
+ rule does not match application requirements.
+
+ The RDMA connection initiator sends an MPA Request, as specified in
+ [RFC5044]; the new format defined here allows for:
+
+ o Standardized negotiation of ORD and IRD.
+
+ o Negotiation of RTR functionality and the RDMA message type to use
+ as the RTR indication.
+
+ The RDMA connection responder processes the MPA Request and generates
+ an MPA Reply, as specified in [RFC5044]; the new format completes the
+ negotiation.
+
+ The local interface needs to provide a way for a ULP to request the
+ use of an explicit RTR indication on a per-application or
+ per-connection basis when one will be required. Piggybacking the
+ RTR on a client's first message is a valuable optimization for
+ most connections.
+
+ The RDMA connection initiator MUST NOT allow any later FULPDUs to be
+ transmitted before the RTR indication. One method to achieve this is
+ to delay notifying the ULP that the RDMA connection has been
+ established until after any required RTR indication has been
+ transmitted.
+
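+ The delayed-notification method described above can be sketched in
+ Python. InitiatorSetup, send_rtr, and notify_ulp are hypothetical
+ names for illustration; the sketch shows only the ordering, not
+ the transmission of the RTR indication itself.
+
```python
from typing import Callable


class InitiatorSetup:
    """Orders the RTR indication ahead of ULP notification (illustrative)."""

    def __init__(self, send_rtr: Callable[[], None],
                 notify_ulp: Callable[[], None], rtr_required: bool) -> None:
        self._send_rtr = send_rtr
        self._notify_ulp = notify_ulp
        self._rtr_required = rtr_required
        self.established = False

    def on_mpa_reply_accepted(self) -> None:
        """Called once an accepting MPA Reply has been processed."""
        if self._rtr_required:
            # The RTR indication goes out before the ULP can post any FULPDU.
            self._send_rtr()
        self.established = True
        self._notify_ulp()
```
+
+ Because the ULP learns of the connection only after send_rtr has
+ run, it cannot submit a FULPDU that would precede the RTR
+ indication.
+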
+ All MPA exchanges are performed via TCP prior to RDMA establishment,
+ and are therefore signaled via TCP and not via RDMA completion.
+
+6. Enhanced MPA Request/Reply Frames
+
+ Enhanced RDMA connection establishment uses an alternate format for
+ MPA Requests and Replies as follows:
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ 0 | |
+ + Key (16 bytes containing "MPA ID Req Frame") +
+ 4 | (4D 50 41 20 49 44 20 52 65 71 20 46 72 61 6D 65) |
+ + Or (16 bytes containing "MPA ID Rep Frame") +
+ 8 | (4D 50 41 20 49 44 20 52 65 70 20 46 72 61 6D 65) |
+ + +
+ 12 | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ 16 |M|C|R|S| Res | Rev | PD_Length |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ ~ ~
+ ~ Private Data ~
+ | |
+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Key: Unchanged from [RFC5044].
+
+ M: Unchanged from [RFC5044].
+
+ C: Unchanged from [RFC5044].
+
+ R: Unchanged from [RFC5044].
+
+ S: One, if the Private Data begins with the enhanced RDMA connection
+ establishment data; zero otherwise.
+
+ Res: One bit smaller than in [RFC5044]; otherwise unchanged. In
+ [RFC5044], the 'Res' field, in which the newly defined 'S' bit
+ resides, is reserved for future use. [RFC5044] specifies that
+ 'Res' MUST be set to zero when sending and MUST NOT be checked on
+ reception, which makes use of the 'S' bit backwards compatible with
+ the original MPA Frame format. When the 'S' bit is set to zero, no
+ additional Private Data is used for enhanced RDMA connection
+ establishment; therefore, the resulting MPA Request and Reply
+ Frames are identical to the unenhanced protocol.
+
+ Rev: This field contains the revision of MPA. To use any enhanced
+ connection establishment feature, this MUST be set to two or
+ higher. If no enhanced connection establishment features are
+ desired, it MAY be set to one. A host accepting MPA connections
+ MUST continue to accept MPA Requests with version one, even if it
+ supports version two.
+
+ PD_Length: Unchanged from [RFC5044]. This is the total length of
+ the Private Data field, including the enhanced RDMA connection
+ establishment data, if present.
+
+ Private Data: Unchanged from [RFC5044]. However, if the 'S' flag is
+ set, Private Data MUST begin with enhanced RDMA connection
+ establishment data (see Section 9).
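
The frame layout above can be sketched in code. The following is a minimal illustration, not a reference implementation: the names build_mpa_frame and parse_mpa_frame and the FLAG_* constants are hypothetical, and the sketch assumes that bit 0 of the diagram is the most significant bit of the byte that follows the Key.

```python
import struct

MPA_REQ_KEY = b"MPA ID Req Frame"
MPA_REP_KEY = b"MPA ID Rep Frame"

# Flag bits in the byte that follows the 16-byte Key.
# Assumption: bit 0 in the diagram is the most significant bit.
FLAG_M = 0x80
FLAG_C = 0x40
FLAG_R = 0x20
FLAG_S = 0x10  # Private Data begins with the enhanced establishment data


def build_mpa_frame(key, flags, rev, private_data):
    """Return Key + flags + Rev + PD_Length + Private Data as bytes."""
    if len(key) != 16:
        raise ValueError("Key must be exactly 16 bytes")
    if flags & FLAG_S and rev < 2:
        raise ValueError("the S bit requires Rev of two or higher")
    if flags & FLAG_S and len(private_data) < 4:
        raise ValueError("S bit set but Private Data is shorter than 4 bytes")
    # The low four bits are 'Res' and are always sent as zero.
    header = struct.pack("!16sBBH", key, flags & 0xF0, rev, len(private_data))
    return header + private_data


def parse_mpa_frame(frame):
    """Split a frame into its fields; 'Res' is not checked on reception."""
    key, flags, rev, pd_length = struct.unpack("!16sBBH", frame[:20])
    private_data = frame[20:20 + pd_length]
    if len(private_data) != pd_length:
        raise ValueError("frame is shorter than PD_Length indicates")
    return {
        "key": key,
        "M": bool(flags & FLAG_M),
        "C": bool(flags & FLAG_C),
        "R": bool(flags & FLAG_R),
        "S": bool(flags & FLAG_S),
        "rev": rev,
        "private_data": private_data,
    }
```

PD_Length is computed from the whole Private Data buffer, so the 4 bytes of enhanced RDMA connection establishment data are counted when the S bit is set.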
+
+7. Enhanced SCTP Session Control Chunks
+
+ Enhanced RDMA connection establishment uses the first 32 bits of the
+ Private Data field for IRD and ORD negotiation in the "DDP Stream
+ Session Initiate" and "DDP Stream Session Accept" SCTP Session
+ Control Chunks.
+
+ The type of the SCTP Session Control Chunk is defined by a Function
+ Code (see [RFC4960]). [RFC5043] already defines codes for 'DDP
+ Stream Session Initiate' and 'DDP Stream Session Accept', which are
+ equivalent to an MPA Request Frame and an accepting MPA Reply Frame.
+
+ Enhanced RDMA connection establishment requires three additional
+ function codes listed below:
+
+ Enhanced DDP Stream Session Initiate: 0x0005
+
+ Enhanced DDP Stream Session Accept: 0x0006
+
+ Enhanced DDP Stream Session Reject: 0x0007
+
+ The Enhanced Reject function code MUST be used to indicate rejection
+ of enhanced DDP stream session for a configuration that would have
+ been accepted for unenhanced DDP stream session negotiation.
+
+ The enhanced DDP stream session establishment follows the same rules
+ as the standard DDP stream session establishment as defined in
+ [RFC5043]. ULP-supplied Private Data MUST be included for Enhanced
+ DDP Stream Session Initiate, Enhanced DDP Stream Session Accept, and
+ Enhanced DDP Stream Session Reject messages, and MUST follow the
+ enhanced RDMA connection establishment data in the Enhanced DDP
+ Stream Session Initiate and the Enhanced DDP Stream Session Accept
+ messages.
+
+ Private Data length MUST NOT exceed 512 bytes in any message,
+ including enhanced RDMA connection establishment data.
+
+ Private Data MUST NOT be included in the DDP Stream Session TERM
+ message.
+
+ Received Enhanced DDP Stream Session Control messages SHOULD be
+ reported to the ULP. If reported, any supplied Private Data MUST be
+ available for the ULP to examine. For example, a received Enhanced
+ DDP Stream Session Control message is not reported to the ULP if
+ none of the requested RTR indication types are supported by the
+ receiver. In this case, the Provider MAY generate a reject reply
+ message indicating which RTR indication types it supports.
+
+ The enhanced DDP stream management MUST use the DDP stream session
+ termination function code to terminate a stream established using
+ enhanced DDP stream session function codes.
+
+ [RFC5043] already supports either side sending the first DDP Message
+ since the Payload Protocol Identifier (PPID) already distinguishes
+ between Session Establishment and DDP Segments. The enhanced RDMA
+ connection establishment provides the ULP a transport-independent way
+ to support the peer-to-peer model.
+
+ The following additional Legal Sequences of DDP Stream Session
+ messages are defined:
+
+ o Enhanced Active/Passive Session Accepted: as with Section 6.2 of
+ [RFC5043], but with the enhanced function codes as defined in this
+ document.
+
+ o Enhanced Active/Passive Session Rejected: as with Section 6.3 of
+ [RFC5043], but with the enhanced function codes as defined in this
+ document.
+
+ o Enhanced Active/Passive Session Non-ULP Rejected: as with Section
+ 6.4 of [RFC5043], but with the enhanced function codes as defined
+ in this document.
+
+8. MPA Error Reporting
+
+ The RDMA connection establishment protocol is layered upon the
+ protocols defined in [RFC5040] and [RFC5041]. Any enhanced RDMA
+ connection establishment error generates an MPA termination message
+ to a peer. [RFC5040] defines a triplet of protocol layers, error
+ types, and error codes for error specification. MPA negotiation for
+ RDMA connection establishment uses the following layer and error type
+ for MPA error reporting:
+
+ Layer: 0x2 - LLP
+ Error Type: 0x0 - MPA
+
+ While [RFC5044] defines four error codes, [RFC5043] does not define
+ any. Enhanced RDMA connection establishment extends the error codes
+ defined in [RFC5044] by adding three new error codes. Thus, enhanced
+ RDMA connection establishment is backward compatible with both
+ [RFC5043] and [RFC5044].
+
+ The following error codes are defined for enhanced RDMA connection
+ establishment negotiation:
+
+      Error Code          Description
+      --------------------------------------------------------
+      0x05                Local catastrophic
+      0x06                Insufficient IRD resources
+      0x07                No matching RTR option
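
As a small illustration, the layer, error type, and the three new error codes can be kept together as constants. This is only a sketch: the names EnhancedMpaError and term_triplet are hypothetical, and the encoding of the triplet inside a TERM message is deliberately left out because it is defined elsewhere.

```python
from enum import IntEnum

MPA_ERROR_LAYER = 0x2  # LLP
MPA_ERROR_TYPE = 0x0   # MPA


class EnhancedMpaError(IntEnum):
    """Error codes added for enhanced RDMA connection establishment."""
    LOCAL_CATASTROPHIC = 0x05
    INSUFFICIENT_IRD_RESOURCES = 0x06
    NO_MATCHING_RTR_OPTION = 0x07


def term_triplet(code):
    """Return (layer, error type, error code) for a TERM message.

    Raises ValueError if the code is not one of the three listed above.
    """
    return (MPA_ERROR_LAYER, MPA_ERROR_TYPE, int(EnhancedMpaError(code)))
```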
+
+9. Enhanced RDMA Connection Establishment Data
+
+ Enhanced RDMA connection establishment places the following 32 bits
+ at the beginning of the Private Data field of the MPA Request and
+ Reply Frames or the "DDP Stream Session Initiate" and "DDP Stream
+ Session Accept" SCTP Session Control Chunks. ULP-specified Private
+ Data follows this field. The maximum amount of ULP-specified Private
+ Data is therefore reduced by 4 bytes. Note that this field MUST be
+ sent in network byte order, with the IRD and ORD encoded as 14-bit
+ unsigned integers.
+
+       0                   1                   2                   3
+       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+   0  |A|B|         IRD               |C|D|          ORD              |
+   4  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ IRD: Inbound RDMA Read Queue Depth.
+
+ ORD: Outbound RDMA Read Queue Depth.
+
+ A: Control Flag for connection model.
+
+ B: Control Flag for use of a zero-length FULPDU (Send) RTR
+ indication.
+
+ C: Control Flag for use of a zero-length RDMA Write RTR indication.
+
+ D: Control Flag for use of a zero-length RDMA Read RTR indication.
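
The 32-bit field described above could be encoded and decoded roughly as follows. This is a sketch under stated assumptions: the function names pack_conn_data and unpack_conn_data are hypothetical, and each 16-bit half is treated as two flag bits in the most significant positions followed by a 14-bit unsigned integer, sent in network byte order.

```python
import struct

IRD_ORD_MAX = 0x3FFF  # largest value that fits in a 14-bit field


def pack_conn_data(a, b, ird, c, d, ord_):
    """Encode flags A-D plus IRD and ORD into 4 bytes, network byte order."""
    for name, value in (("IRD", ird), ("ORD", ord_)):
        if not 0 <= value <= IRD_ORD_MAX:
            raise ValueError(f"{name} does not fit in 14 bits")
    first = (int(bool(a)) << 15) | (int(bool(b)) << 14) | ird
    second = (int(bool(c)) << 15) | (int(bool(d)) << 14) | ord_
    return struct.pack("!HH", first, second)


def unpack_conn_data(private_data):
    """Decode the first 4 bytes of Private Data; the rest is ULP data."""
    first, second = struct.unpack("!HH", private_data[:4])
    return {
        "A": bool(first & 0x8000),
        "B": bool(first & 0x4000),
        "IRD": first & IRD_ORD_MAX,
        "C": bool(second & 0x8000),
        "D": bool(second & 0x4000),
        "ORD": second & IRD_ORD_MAX,
        "ulp_data": private_data[4:],
    }
```

For example, a peer-to-peer request with IRD 16, ORD 8, and only the RDMA Write RTR indication requested would be produced by pack_conn_data(True, False, 16, True, False, 8).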
+
+9.1. IRD and ORD Negotiation
+
+ The IRD and ORD are used for negotiation of Inbound RDMA Read Request
+ Queue depths for both endpoints of the RDMA connection. The IRD is
+ used to configure the depth of the Inbound RDMA Read Request Queue
+ (IRRQ) on each endpoint. ORD is used to limit the number of
+ simultaneous outbound RDMA Read Requests allowed at any given point
+ in time in order to avoid IRRQ overruns at the remote endpoint. In
+ order to describe the negotiation of both local endpoint and remote
+ endpoint ORD and IRD values, four terms are defined:
+
+ Initiator IRD: The IRD value sent in the MPA Request or "DDP Stream
+ Session Initiate" SCTP Session Control Chunk. This is the value
+ of the initiator's IRD at the time of the MPA Request generation.
+ The responder sets its local ORD value to this value or less. The
+ initiator IRD is the maximum number of simultaneous inbound RDMA
+ Read Requests that the initiator can support for the requested
+ connection.
+
+ Initiator ORD: The ORD value in the MPA Request or "DDP Stream
+ Session Initiate" SCTP Session Control Chunk. This is the initial
+ value of the initiator's ORD at the time of the MPA Request
+ generation and also a request to the responder to support a
+ responder IRD of at least this value. The initiator ORD is the
+ maximum number of simultaneous outbound RDMA Read operations that
+ the initiator desires the responder to support for the requested
+ connection.
+
+ Responder IRD: The IRD value returned in the MPA Reply or "DDP
+ Stream Session Accept" SCTP Session Control Chunk. This is the
+ actual value that the responder sets for its local IRD. This
+ value is greater than or equal to the initiator ORD for successful
+ negotiations. The responder IRD is the maximum number of
+ simultaneous inbound RDMA Read Requests that the responder
+ actually can support for the requested connection.
+
+ Responder ORD: The ORD value returned in the MPA Reply or "DDP
+ Stream Session Accept" SCTP Session Control Chunk. This is the
+ actual value that the responder used for ORD and is less than or
+ equal to the initiator IRD for successful negotiations. The
+ responder ORD is the maximum number of simultaneous outbound RDMA
+ Read operations that the responder will allow for the requested
+ connection.
+
+ After a successful negotiation, these parameters satisfy the
+ following relationships:
+
+ initiator ORD <= responder IRD
+
+ responder ORD <= initiator IRD
+
+ The responder and initiator MUST pass the peer's provided IRD and ORD
+ values to the ULP, in addition to using the values as calculated by
+ the preceding rules.
+
+ The responder ORD SHOULD be set to a value less than or equal to the
+ initiator IRD. If the initiator ORD is insufficient to support the
+ selected connection model, the responder IRD MAY be increased; for
+ example, if the initiator ORD is 0 (RDMA Reads will not be used by
+ the ULP) and the responder supports use of a zero-length RDMA Read
+ RTR indication, then the responder IRD can be set to 1. The
+ responder MUST set its ORD at most to the initiator IRD. The
+ responder MAY reject the connection request if the initiator IRD is
+ not sufficient for the ULP-required ORD, specifying the required ORD
+ in the responder ORD field of the MPA Reject Frame. In that case,
+ the TERM message MUST contain Layer 2, Error Type 0, Error Code 6.
+
+ Upon receiving the MPA Accept Frame from the responder, the initiator
+ MUST set its IRD at least to the responder ORD and its ORD at most to
+ the responder IRD. If the initiator does not have sufficient
+ resources for the required IRD, it MUST send a TERM message to the
+ responder indicating insufficient resources and terminate the
+ connection. The TERM message MUST contain Layer 2, Error Type 0,
+ Error Code 6.
+
+ The initiator MUST pass the responder-provided IRD and ORD to the ULP
+ for both MPA Accept and Reject messages. The initiator ULP can
+ decide its course of action. For example, the initiator ULP may
+ terminate the established connection and renegotiate the responder
+ ORD.
+
+ An all ones value (0x3FFF) indicates that automatic negotiation of
+ the IRD or ORD is not desired, and that the ULP will be responsible
+ for it. The responder MUST respond to an initiator ORD value of
+ 0x3FFF by leaving its local endpoint IRD value unchanged and setting
+ the IRD to 0x3FFF in its reply message. The initiator MUST leave its
+ local endpoint ORD value unchanged upon receiving a responder IRD
+ value of 0x3FFF. The responder MUST respond to an initiator IRD
+ value of 0x3FFF by leaving its local endpoint ORD value unchanged,
+ and setting ORD to 0x3FFF in its reply message. The initiator MUST
+ leave its local endpoint IRD value unchanged upon receiving a
+ responder ORD value of 0x3FFF.
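
The negotiation rules of this section can be illustrated with a short sketch. It is not a definitive implementation: the function names and parameters are hypothetical, local_ird stands for both the responder's current IRD and the largest IRD it can configure, and raising an error when that value is below the initiator ORD is an assumption of this sketch. The optional rejection for an insufficient initiator IRD and the increase of the responder IRD for a zero-length RDMA Read RTR indication are omitted.

```python
NO_NEGOTIATION = 0x3FFF  # all ones: the ULP handles this value itself


def responder_negotiate(initiator_ird, initiator_ord, local_ird, local_ord):
    """Return (reply_ird, reply_ord, new_local_ird, new_local_ord)."""
    if initiator_ord == NO_NEGOTIATION:
        # Leave the local IRD unchanged and echo 0x3FFF in the reply.
        reply_ird, new_ird = NO_NEGOTIATION, local_ird
    elif local_ird < initiator_ord:
        # Assumption of this sketch: treat this as a failed negotiation.
        raise ValueError("cannot provide the IRD requested by the initiator")
    else:
        # Smallest value satisfying initiator ORD <= responder IRD.
        reply_ird = new_ird = initiator_ord

    if initiator_ird == NO_NEGOTIATION:
        # Leave the local ORD unchanged and echo 0x3FFF in the reply.
        reply_ord, new_ord = NO_NEGOTIATION, local_ord
    else:
        # The responder ORD is at most the initiator IRD.
        reply_ord = new_ord = min(local_ord, initiator_ird)
    return reply_ird, reply_ord, new_ird, new_ord


def initiator_apply_reply(responder_ird, responder_ord, local_ird, local_ord):
    """Return (new_local_ird, new_local_ord) after an MPA Accept Frame."""
    if responder_ird == NO_NEGOTIATION:
        new_ord = local_ord
    else:
        new_ord = min(local_ord, responder_ird)   # ORD at most responder IRD
    if responder_ord == NO_NEGOTIATION:
        new_ird = local_ird
    else:
        new_ird = max(local_ird, responder_ord)   # IRD at least responder ORD
    return new_ird, new_ord
```

With an initiator offering IRD 8 and ORD 4 to a responder that has local_ird 16 and local_ord 32, the reply carries IRD 4 and ORD 8, which satisfies both relationships listed above.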
+
+9.2. Peer-to-Peer Connection Negotiation
+
+ Control Flag A value 1 indicates that a peer-to-peer connection model
+ is being performed, and value 0 indicates a client-server model.
+ Control Flag B value 1 indicates that a zero-length FULPDU (Send) RTR
+ indication is requested for the initiator and supported by the
+ responder, respectively, 0 otherwise. Control Flag C value 1
+ indicates that a zero-length RDMA Write RTR indication is requested
+ for the initiator and supported by the responder, respectively, 0
+ otherwise. Control Flag D value 1 indicates that a zero-length RDMA
+ Read RTR indication is requested for the initiator and supported by
+ the responder, respectively, 0 otherwise. The initiator MUST set
+ Control Flag A to 1 for the peer-to-peer model. If Control Flag A
+ is set to 1, the initiator MUST set Control Flags B, C, and D to 1
+ for each of the options it supports.
+
+ The responder MUST support at least one RTR indication option if it
+ supports Enhanced RDMA connection establishment. If Control Flag A
+ is 1 in the MPA Request message, then the responder MUST set Control
+ Flag A to 1 in the MPA reply message. For each initiator-supported
+ RTR indication option, the responder SHOULD set the corresponding
+ Control Flag if the responder can support that option in an MPA
+ reply. The responder is not required to specify all RTR indication
+ options it supports. The responder MUST set at least one RTR
+ indication option if it supports more than one initiator-specified
+ RTR indication option. The responder MAY include additional RTR
+ indication options it supports, even if the initiator did not
+ request them. If the responder does
+ not support any of the initiator-specified RTR indication options,
+ then the responder MUST set at least one RTR indication type option
+ it supports.
+
+ Upon receiving the MPA Accept Frame with Control Flag A set to 1, the
+ initiator MUST generate one of the negotiated RTR indications. If
+ the initiator is not able to generate any of the responder-supported
+ RTR indications, then it MUST send a TERM message to the responder
+ indicating failure to negotiate a mutually compatible connection
+ model or RTR option, and terminate the connection. The TERM
+ message MUST contain Layer 2, Error Type 0, Error Code 7. The ULP
+ can negotiate a ULP-level RTR indication when a Provider-level RTR
+ indication cannot be negotiated.
+
+ The initiator MUST set Control Flag A to 0 for the client-server
+ model. The responder MUST set Control Flag A to 0 if Control Flag A
+ is 0 in the request. If Control Flag A is set to 0, then Control
+ Flags B, C, and D MUST also be set to 0. On reception, if Control
+ Flag A is set to 0, then Control Flags B, C, and D MUST be ignored.
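
The RTR option matching for the peer-to-peer model (Control Flag A set to 1) might be sketched as below. The Rtr type and both function names are hypothetical, and the preference order used by the initiator is an assumption of this sketch, since no order is prescribed here.

```python
from enum import Flag, auto


class Rtr(Flag):
    """RTR indication options carried in Control Flags B, C, and D."""
    NONE = 0
    SEND = auto()    # Control Flag B: zero-length FULPDU (Send)
    WRITE = auto()   # Control Flag C: zero-length RDMA Write
    READ = auto()    # Control Flag D: zero-length RDMA Read


def responder_rtr_reply(requested, supported):
    """Return the RTR flags a responder sets in its reply."""
    if supported == Rtr.NONE:
        raise ValueError("an enhanced responder supports at least one option")
    common = requested & supported
    # With no common option, still advertise what the responder supports.
    return common if common else supported


def initiator_choose_rtr(own, reply,
                         preference=(Rtr.SEND, Rtr.WRITE, Rtr.READ)):
    """Pick one negotiated option, or None if there is no match.

    None means the caller sends a TERM message with Layer 2,
    Error Type 0, Error Code 7 and terminates the connection.
    """
    for option in preference:
        if option & own & reply:
            return option
    return None
```

Here the responder echoes every common option, which is one permitted choice; it could also echo a subset or add options the initiator did not request.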
+
+9.3. Enhanced Connection Negotiation Flow
+
+ The RTR indication type and ORD/IRD negotiation proceeds in the
+ following order:
+
+ initiator (MPA Request) --> The initiator sets Control Flag A to 1
+ to indicate the peer-to-peer connection model and sets its initial
+ IRD/ORD on the local endpoint of the connection. The initiator
+ also sets Control Flags B, C, and D to 1 for each RTR indication
+ option it supports.
+
+ responder (MPA Reply) <-- The responder matches the initiator's
+ Control Flag A value and sets ORD/IRD to its local endpoint values
+ based upon the initiator's initial ORD/IRD values and the number
+ of simultaneous RDMA Read Requests required by the ULP. The
+ responder sets Control Flags B, C, and D to 1 for each RTR
+ indication option it supports for the peer-to-peer connection
+ model. The IRD/ORD carried in the reply are the responder's
+ actual values.
+
+ initiator (First RDMA Message) --> After the initiator modifies its
+ ORD/IRD to match the responder's values as stated above, the
+ initiator sends the first message of the negotiated RTR indication
+ option. If no matching RTR indication option exists, then the
+ initiator sends a TERM message.
+
+ Before resetting the connection, the initiator or responder MUST
+ generate a TERM message that contains Layer 2, Error Type 0, Error
+ Code 5 when it encounters any local error for which no specific
+ Error Code is defined in Section 8.
+
+10. Interoperability
+
+ The initiator requests enhanced RDMA connection establishment by
+ sending an enhanced RDMA establishment request; an enhanced responder
+ is REQUIRED to respond with an enhanced RDMA connection establishment
+ response, whereas an unenhanced responder treats the enhanced request
+ as incorrectly formatted and closes the TCP connection. All
+ responders are REQUIRED to issue unenhanced RDMA connection
+ establishment responses in response to unenhanced RDMA connection
+ establishment requests.
+
+ The initiator MUST NOT use the enhanced RDMA connection establishment
+ formats or function codes when no enhanced functionality is desired.
+
+ The responder MUST continue to accept unenhanced connection requests.
+
+ There are three initiator/responder cases that involve enhanced MPA:
+ both the initiator and responder, only the responder, and only the
+ initiator. The enhanced MPA Frame is defined by field 'S' set to 1.
+
+ Enhanced MPA initiator and responder: If the responder receives an
+ enhanced MPA message, it MUST respond with an enhanced MPA
+ message.
+
+ Enhanced MPA responder only: If the responder receives an unenhanced
+ MPA message ('S' is set to 0), it MUST respond with an unenhanced
+ MPA message.
+
+ Enhanced MPA initiator only: If the responder receives an enhanced
+ MPA message and it does not support enhanced RDMA connection
+ establishment, it MUST close the TCP connection and exit MPA.
+ From a standard RDMA connection establishment point of view, the
+ enhanced MPA Frame is improperly formatted as stated in [RFC5044].
+ Thus, both the initiator and responder report TCP connection
+ termination to an application locally. In this case, the
+ initiator MAY attempt to establish an RDMA connection using the
+ unenhanced MPA protocol as defined in [RFC5044] if this protocol
+ is compatible with the application, and let the ULP deal with ORD
+ and IRD and peer-to-peer negotiations.
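
The three cases above reduce to a small decision on each side. The sketch below only illustrates that decision; the function names and the returned strings are hypothetical labels, not protocol values.

```python
def responder_action(request_s_bit, responder_is_enhanced):
    """Decide how a responder answers an incoming MPA Request."""
    if not request_s_bit:
        # Unenhanced requests always get an unenhanced reply.
        return "reply_unenhanced"
    if responder_is_enhanced:
        return "reply_enhanced"
    # An unenhanced responder sees an improperly formatted frame.
    return "close_tcp"


def initiator_may_retry_unenhanced(sent_enhanced, ulp_accepts_unenhanced):
    """After the TCP connection was closed, decide whether a retry with
    the unenhanced MPA protocol is an option for the initiator."""
    return sent_enhanced and ulp_accepts_unenhanced
```

When the retry is taken, ORD, IRD, and peer-to-peer negotiation are left to the ULP, as the last case describes.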
+
+ A note for potential future enhancements for connection establishment
+ negotiation: It is possible to further extend formatting of Private
+ Data of the MPA Request and Reply Frames and to use other bits from
+ the "Res" field to indicate additional Private Data formatting.
+
+11. IANA Considerations
+
+ IANA has added the following entries to the "SCTP Function Codes for
+ DDP Session Control" registry created by Section 3.5 of [RFC6580]:
+
+ 0x0005, Enhanced DDP Stream Session Initiate, [RFC6581]
+
+ 0x0006, Enhanced DDP Stream Session Accept, [RFC6581]
+
+ 0x0007, Enhanced DDP Stream Session Reject, [RFC6581]
+
+ IANA has added the following entries to the "MPA Errors" registry
+ created by Section 3.3 of [RFC6580]:
+
+ 0x2/0x0/0x05 - MPA Error / Local catastrophic error, [RFC6581]
+
+ 0x2/0x0/0x06 - MPA Error / Insufficient IRD resources, [RFC6581]
+
+ 0x2/0x0/0x07 - MPA Error / No matching RTR option, [RFC6581]
+
+12. Security Considerations
+
+ The security considerations from RFC 5044 and RFC 5043 apply and the
+ changes in this document do not introduce new security
+ considerations. However, it is recommended that implementations do
+ sanity checking for the input parameters, including ORD, IRD, and the
+ control flags used for RTR indication option negotiation.
+
+13. Acknowledgements
+
+ The authors wish to thank Sean Hefty, Dave Minturn, Tom Talpey, David
+ Black, and David Harrington for their valuable contributions and
+ reviews of this document.
+
+14. References
+
+14.1. Normative References
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+ [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC
+ 4960, September 2007.
+
+ [RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
+ Garcia, "A Remote Direct Memory Access Protocol
+ Specification", RFC 5040, October 2007.
+
+ [RFC5041] Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct
+ Data Placement over Reliable Transports", RFC 5041,
+ October 2007.
+
+ [RFC5043] Bestler, C. and R. Stewart, "Stream Control Transmission
+ Protocol (SCTP) Direct Data Placement (DDP) Adaptation",
+ RFC 5043, October 2007.
+
+ [RFC5044] Culley, P., Elzur, U., Recio, R., Bailey, S., and J.
+ Carrier, "Marker PDU Aligned Framing for TCP
+ Specification", RFC 5044, October 2007.
+
+ [RFC6580] Ko, M. and D. Black, "IANA Registries for the Remote
+ Direct Data Placement (RDDP) Protocols", RFC 6580, April
+ 2012.
+
+14.2. Informative References
+
+ [DAPL] "Direct Access Programming Library",
+ <http://www.datcollaborative.org/uDAPL_doc_062102.pdf>.
+
+ [IBTA] "InfiniBand Architecture Specification Release 1.2.1",
+ <http://www.infinibandta.org>.
+
+ [OFA] "OFA verbs & APIs", <http://www.openfabrics.org/>.
+
+ [OpenMP] McGraw-Hill, "Parallel Programming in C with MPI and
+ OpenMP", 2003.
+
+ [PPMPI] Morgan Kaufmann Publishers Inc., "Parallel Programming
+ with MPI", 2008.
+
+ [RDMAC] "RDMA Protocol Verbs Specification (Version 1.0)",
+ <http://www.rdmaconsortium.org/home/
+ draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf>.
+
+ [RDS] Open Fabrics Association, "Reliable Datagram Socket",
+ 2008,
+ <http://www.openfabrics.org/archives/spring2008sonoma>.
+
+ [UsingMPI] MIT Press, "Using MPI-2: Advanced Features of the Message
+ Passing Interface", 1999.
+
+ [VIA] Cameron, Don and Greg Regnier, "Virtual Interface
+ Architecture", Intel, April 2002.
+
+Authors' Addresses
+
+ Arkady Kanevsky (editor)
+ Dell Inc.
+ One Dell Way, MS PS2-47
+ Round Rock, TX 78682
+ USA
+
+ Phone: +1-512-728-0000
+ EMail: arkady.kanevsky@gmail.com
+
+
+ Caitlin Bestler (editor)
+ Nexenta Systems
+ 555 E El Camino Real #104
+ Sunnyvale, CA 94087
+ USA
+
+ Phone: +1-949-528-3085
+ EMail: Caitlin.Bestler@nexenta.com
+
+
+ Robert Sharp
+ Intel
+ LAD High Performance Message Passing, Mailstop: AN1-WTR1
+ 1501 South Mopac, Suite 400
+ Austin, TX 78746
+ USA
+
+ Phone: +1-512-493-3242
+ EMail: robert.o.sharp@intel.com
+
+
+ Steve Wise
+ Open Grid Computing
+ 4030 Braker Lane STE 130
+ Austin, TX 78759
+ USA
+
+ Phone: +1-512-343-9196 x101
+ EMail: swise@opengridcomputing.com