diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc5040.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc5040.txt')
-rw-r--r-- | doc/rfc/rfc5040.txt | 3699 |
1 files changed, 3699 insertions, 0 deletions
diff --git a/doc/rfc/rfc5040.txt b/doc/rfc/rfc5040.txt new file mode 100644 index 0000000..41f5d17 --- /dev/null +++ b/doc/rfc/rfc5040.txt @@ -0,0 +1,3699 @@ + + + + + + +Network Working Group R. Recio +Request for Comments: 5040 B. Metzler +Category: Standards Track IBM Corporation + P. Culley + J. Hilland + Hewlett-Packard Company + D. Garcia + October 2007 + + + A Remote Direct Memory Access Protocol Specification + +Status of This Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Abstract + + This document defines a Remote Direct Memory Access Protocol (RDMAP) + that operates over the Direct Data Placement Protocol (DDP protocol). + RDMAP provides read and write services directly to applications and + enables data to be transferred directly into Upper Layer Protocol + (ULP) Buffers without intermediate data copies. It also enables a + kernel bypass implementation. + + + + + + + + + + + + + + + + + + + + + + + +Recio, et al. Standards Track [Page 1] + +RFC 5040 RDMA Protocol Specification October 2007 + + +Table of Contents + + 1. Introduction ....................................................4 + 1.1. Architectural Goals ........................................4 + 1.2. Protocol Overview ..........................................5 + 1.3. RDMAP Layering .............................................7 + 2. Glossary ........................................................8 + 2.1. General ....................................................8 + 2.2. LLP .......................................................10 + 2.3. Direct Data Placement (DDP) ...............................11 + 2.4. Remote Direct Memory Access (RDMA) ........................13 + 3. ULP and Transport Attributes ...................................15 + 3.1. Transport Requirements and Assumptions ....................15 + 3.2. RDMAP Interactions with the ULP ...........................16 + 4. Header Format ..................................................19 + 4.1. RDMAP Control and Invalidate STag Field ...................20 + 4.2. RDMA Message Definitions ..................................23 + 4.3. RDMA Write Header .........................................24 + 4.4. RDMA Read Request Header ..................................24 + 4.5. RDMA Read Response Header .................................26 + 4.6. Send Header and Send with Solicited Event Header ..........26 + 4.7. Send with Invalidate Header and Send with SE and + Invalidate Header .........................................26 + 4.8. Terminate Header ..........................................26 + 5. Data Transfer ..................................................32 + 5.1. RDMA Write Message ........................................32 + 5.2. RDMA Read Operation .......................................33 + 5.2.1. RDMA Read Request Message ..........................33 + 5.2.2. RDMA Read Response Message .........................35 + 5.3. Send Message Type .........................................36 + 5.4. Terminate Message .........................................37 + 5.5. Ordering and Completions ..................................38 + 6. RDMAP Stream Management ........................................41 + 6.1. Stream Initialization .....................................41 + 6.2. Stream Teardown ...........................................42 + 6.2.1. RDMAP Abortive Termination .........................43 + 7. RDMAP Error Management .........................................43 + 7.1. RDMAP Error Surfacing .....................................44 + 7.2. Errors Detected at the Remote Peer on Incoming + RDMA Messages .............................................45 + 8. Security Considerations ........................................46 + 8.1. Summary of RDMAP-Specific Security Requirements ...........46 + 8.1.1. RDMAP (RNIC) Requirements ..........................47 + 8.1.2. Privileged Resource Manager Requirements ...........48 + 8.2. Security Services for RDMAP ...............................49 + 8.2.1. Available Security Services ........................49 + 8.2.2. Requirements for IPsec Services for RDMAP ..........50 + 9. IANA Considerations ............................................51 + + + +Recio, et al. Standards Track [Page 2] + +RFC 5040 RDMA Protocol Specification October 2007 + + + 10. References ....................................................52 + 10.1. Normative References .....................................52 + 10.2. Informative References ...................................53 + Appendix A. DDP Segment Formats for RDMA Messages .................54 + A.1. DDP Segment for RDMA Write ................................54 + A.2. DDP Segment for RDMA Read Request .........................55 + A.3. DDP Segment for RDMA Read Response ........................56 + A.4. DDP Segment for Send and Send with Solicited Event ........56 + A.5. DDP Segment for Send with Invalidate and Send with SE and + Invalidate ................................................57 + A.6. DDP Segment for Terminate .................................58 + Appendix B. Ordering and Completion Table .........................59 + Appendix C. Contributors ..........................................61 + +Table of Figures + + Figure 1: RDMAP Layering ...........................................7 + Figure 2: Example of MPA, DDP, and RDMAP Header Alignment over TCP .8 + Figure 3: DDP Control, RDMAP Control, and Invalidate STag Fields ..20 + Figure 4: RDMA Usage of DDP Fields ................................22 + Figure 5: RDMA Message Definitions ................................23 + Figure 6: RDMA Read Request Header Format .........................24 + Figure 7: Terminate Header Format .................................27 + Figure 8: Terminate Control Field .................................27 + Figure 9: Terminate Control Field Values ..........................29 + Figure 10: Error Type to RDMA Message Mapping .....................32 + Figure 11: RDMA Write, DDP Segment Format .........................54 + Figure 12: RDMA Read Request, DDP Segment Format ..................55 + Figure 13: RDMA Read Response, DDP Segment Format .................56 + Figure 14: Send and Send with Solicited Event, DDP Segment Format .56 + Figure 15: Send with Invalidate and Send with SE and Invalidate, + DDP Segment Format .....................................57 + Figure 16: Terminate, DDP Segment Format ..........................58 + Figure 17: Operation Ordering .....................................59 + + + + + + + + + + + + + + + + + +Recio, et al. Standards Track [Page 3] + +RFC 5040 RDMA Protocol Specification October 2007 + + +1. Introduction + + Today, communications over TCP/IP typically require copy operations, + which add latency and consume significant CPU and memory resources. + The Remote Direct Memory Access Protocol (RDMAP) enables removal of + data copy operations and enables reduction in latencies by allowing a + local application to read or write data on a remote computer's memory + with minimal demands on memory bus bandwidth and CPU processing + overhead, while preserving memory protection semantics. + + RDMAP is layered on top of Direct Data Placement (DDP) and uses the + two buffer models available from DDP. DDP-related terminology is + discussed in Section 2.3. As RDMAP builds on DDP, the reader is + advised to become familiar with [DDP]. + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in RFC 2119 [RFC2119]. + +1.1. Architectural Goals + + RDMAP has been designed with the following high-level architectural + goals: + + * Provide a data transfer operation that allows a Local Peer to + transfer up to 2^32 - 1 octets directly into a previously + Advertised Buffer (i.e., Tagged Buffer) located at a Remote Peer + without requiring a copy operation. This is referred to as the + RDMA Write data transfer operation. + + * Provide a data transfer operation that allows a Local Peer to + retrieve up to 2^32 - 1 octets directly from a previously + Advertised Buffer (i.e., Tagged Buffer) located at a Remote Peer + without requiring a copy operation. This is referred to as the + RDMA Read data transfer operation. + + * Provide a data transfer operation that allows a Local Peer to send + up to 2^32 - 1 octets directly into a buffer located at a Remote + Peer that has not been explicitly Advertised. This is referred to + as the Send (Send with Invalidate, Send with Solicited Event, and + Send with Solicited Event and Invalidate) data transfer operation. + + * Enable the local ULP to use the Send Operation Type (includes + Send, Send with Invalidate, Send with Solicited Event, and Send + with Solicited Event and Invalidate) to signal to the remote ULP + the Completion of all previous Messages initiated by the local + ULP. + + + + +Recio, et al. Standards Track [Page 4] + +RFC 5040 RDMA Protocol Specification October 2007 + + + * Provide for all operations on a single RDMAP Stream to be reliably + transmitted in the order that they were submitted. + + * Provide RDMAP capabilities independently for each Stream when the + LLP supports multiple data Streams within an LLP connection. + +1.2. Protocol Overview + + RDMAP provides seven data transfer operations. Except for the RDMA + Read operation, each operation generates exactly one RDMA Message. + Following is a brief overview of the RDMA Operations and RDMA + Messages: + + 1. Send - A Send operation uses a Send Message to transfer data from + the Data Source into a buffer that has not been explicitly + Advertised by the Data Sink. The Send Message uses the DDP + Untagged Buffer Model to transfer the ULP Message into the Data + Sink's Untagged Buffer. + + 2. Send with Invalidate - A Send with Invalidate operation uses a + Send with Invalidate Message to transfer data from the Data + Source into a buffer that has not been explicitly Advertised by + the Data Sink. The Send with Invalidate Message includes all + functionality of the Send Message, with one addition: an STag + field is included in the Send with Invalidate Message. After the + message has been Placed and Delivered at the Data Sink, the + Remote Peer's buffer identified by the STag can no longer be + accessed remotely until the Remote Peer's ULP re-enables access + and Advertises the buffer. + + 3. Send with Solicited Event (Send with SE) - A Send with Solicited + Event operation uses a Send with Solicited Event Message to + transfer data from the Data Source into an Untagged Buffer at the + Data Sink. The Send with Solicited Event Message is similar to + the Send Message, with one addition: when the Send with Solicited + Event Message has been Placed and Delivered, an Event may be + generated at the recipient, if the recipient is configured to + generate such an Event. + + 4. Send with Solicited Event and Invalidate (Send with SE and + Invalidate) - A Send with Solicited Event and Invalidate + operation uses a Send with Solicited Event and Invalidate Message + to transfer data from the Data Source into a buffer that has not + been explicitly Advertised by the Data Sink. The Send with + Solicited Event and Invalidate Message is similar to the Send + with Invalidate Message, with one addition: when the Send with + + + + + +Recio, et al. Standards Track [Page 5] + +RFC 5040 RDMA Protocol Specification October 2007 + + + Solicited Event and Invalidate Message has been Placed and + Delivered, an Event may be generated at the recipient, if the + recipient is configured to generate such an Event. + + 5. Remote Direct Memory Access Write - An RDMA Write operation uses + an RDMA Write Message to transfer data from the Data Source to a + previously Advertised Buffer at the Data Sink. + + The ULP at the Remote Peer, which in this case is the Data Sink, + enables the Data Sink Tagged Buffer for access and Advertises the + buffer's size (length), location (Tagged Offset), and Steering + Tag (STag) to the Data Source through a ULP-specific mechanism. + The ULP at the Local Peer, which in this case is the Data Source, + initiates the RDMA Write operation. The RDMA Write Message uses + the DDP Tagged Buffer Model to transfer the ULP Message into the + Data Sink's Tagged Buffer. Note: the STag associated with the + Tagged Buffer remains valid until the ULP at the Remote Peer + invalidates it or the ULP at the Local Peer invalidates it + through a Send with Invalidate or Send with Solicited Event and + Invalidate. + + 6. Remote Direct Memory Access Read - The RDMA Read operation + transfers data to a Tagged Buffer at the Local Peer, which in + this case is the Data Sink, from a Tagged Buffer at the Remote + Peer, which in this case is the Data Source. The ULP at the Data + Source enables the Data Source Tagged Buffer for access and + Advertises the buffer's size (length), location (Tagged Offset), + and Steering Tag (STag) to the Data Sink through a ULP-specific + mechanism. The ULP at the Data Sink enables the Data Sink Tagged + Buffer for access and initiates the RDMA Read operation. The + RDMA Read operation consists of a single RDMA Read Request + Message and a single RDMA Read Response Message, and the latter + may be segmented into multiple DDP Segments. + + The RDMA Read Request Message uses the DDP Untagged Buffer Model + to Deliver the STag, starting Tagged Offset, and length for both + the Data Source and Data Sink Tagged Buffers to the Remote Peer's + RDMA Read Request Queue. + + The RDMA Read Response Message uses the DDP Tagged Buffer Model + to Deliver the Data Source's Tagged Buffer to the Data Sink, + without any involvement from the ULP at the Data Source. + + Note: the Data Source STag associated with the Tagged Buffer + remains valid until the ULP at the Data Source invalidates it or + the ULP at the Data Sink invalidates it through a Send with + + + + + +Recio, et al. Standards Track [Page 6] + +RFC 5040 RDMA Protocol Specification October 2007 + + + Invalidate or Send with Solicited Event and Invalidate. The Data + Sink STag associated with the Tagged Buffer remains valid until + the ULP at the Data Sink invalidates it. + + 7. Terminate - A Terminate operation uses a Terminate Message to + transfer to the Remote Peer information associated with an error + that occurred at the Local Peer. The Terminate Message uses the + DDP Untagged Buffer Model to transfer the Message into the Data + Sink's Untagged Buffer. + +1.3. RDMAP Layering + + RDMAP is dependent on DDP, subject to the requirements defined in + Section 3.1, "Transport Requirements and Assumptions". Figure 1, + "RDMAP Layering", depicts the relationship between Upper Layer + Protocols (ULPs), RDMAP, DDP protocol, the framing layer, and the + transport. For LLP protocol definitions of each LLP, see [MPA], + [TCP], and [SCTP]. + + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + | Upper Layer Protocol (ULP) | + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + | RDMAP | + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + | DDP protocol | + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | | + | MPA | | + | | | + +-+-+-+-+-+-+-+-+-+ SCTP | + | | | + | TCP | | + | | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 1: RDMAP Layering + + If RDMAP is layered over DDP/MPA/TCP, then the respective headers and + ULP Payload are arranged as follows (Note: For clarity, MPA header + and CRC fields are included but MPA markers are not shown): + + + + + +Recio, et al. Standards Track [Page 7] + +RFC 5040 RDMA Protocol Specification October 2007 + + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + // TCP Header // + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | MPA Header | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + | | + // DDP Header // + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + // RDMA Header // + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + // ULP Payload // + // (shown with no pad bytes) // + // // + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | MPA CRC | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 2: Example of MPA, DDP, and RDMAP Header Alignment over TCP + +2. Glossary + +2.1. General + + Advertisement (Advertised, Advertise, Advertisements, Advertises) - + the act of informing a Remote Peer that a local RDMA Buffer is + available to it. A Node makes available an RDMA Buffer for + incoming RDMA Read or RDMA Write access by informing its RDMA/DDP + peer of the Tagged Buffer identifiers (STag, base address, and + buffer length). This Advertisement of Tagged Buffer information + is not defined by RDMA/DDP and is left to the ULP. A typical + method would be for the Local Peer to embed the Tagged Buffer's + Steering Tag, base address, and length in a Send Message destined + for the Remote Peer. + + Completion - Refer to "RDMA Completion" in Section 2.4. + + Completed - See "RDMA Completion" in Section 2.4. + + Complete - See "RDMA Completion" in Section 2.4. + + + +Recio, et al. Standards Track [Page 8] + +RFC 5040 RDMA Protocol Specification October 2007 + + + Completes - See "RDMA Completion" in Section 2.4. + + Data Sink - The peer receiving a data payload. Note that the Data + Sink can be required to both send and receive RDMA/DDP Messages + to transfer a data payload. + + Data Source - The peer sending a data payload. Note that the Data + Source can be required to both send and receive RDMA/DDP Messages + to transfer a data payload. + + Data Delivery (Delivery, Delivered, Delivers) - Delivery is defined + as the process of informing the ULP or consumer that a particular + Message is available for use. This is specifically different + from "Placement", which may generally occur in any order, while + the order of "Delivery" is strictly defined. See "Data + Placement" in Section 2.3. + + Delivery - See Data Delivery in Section 2.1. + + Delivered - See Data Delivery in Section 2.1. + + Delivers - See Data Delivery in Section 2.1. + + Fabric - The collection of links, switches, and routers that connect + a set of Nodes with RDMA/DDP protocol implementations. + + Fence (Fenced, Fences) - To block the current RDMA Operation from + executing until prior RDMA Operations have Completed. + + iWARP - A suite of wire protocols comprised of RDMAP, DDP, and MPA. + The iWARP protocol suite may be layered above TCP, SCTP, or other + transport protocols. + + Local Peer - The RDMA/DDP protocol implementation on the local end of + the connection. Used to refer to the local entity when + describing a protocol exchange or other interaction between two + Nodes. + + Node - A computing device attached to one or more links of a Fabric + (network). A Node in this context does not refer to a specific + application or protocol instantiation running on the computer. A + Node may consist of one or more RNICs installed in a host + computer. + + Placement - See "Data Placement" in Section 2.3. + + Placed - See "Data Placement" in Section 2.3. + + + + +Recio, et al. Standards Track [Page 9] + +RFC 5040 RDMA Protocol Specification October 2007 + + + Places - See "Data Placement" in Section 2.3. + + Remote Peer - The RDMA/DDP protocol implementation on the opposite + end of the connection. Used to refer to the remote entity when + describing protocol exchanges or other interactions between two + Nodes. + + RNIC - RDMA Network Interface Controller. In this context, this + would be a network I/O adapter or embedded controller with iWARP + and Verbs functionality. + + RNIC Interface (RI) - The presentation of the RNIC to the Verbs + Consumer as implemented through the combination of the RNIC and + the RNIC driver. + + Termination - See "RDMAP Abortive Termination" in Section 2.4. + + Terminated - See "RDMAP Abortive Termination" in Section 2.4. + + Terminate - See "RDMAP Abortive Termination" in Section 2.4. + + Terminates - See "RDMAP Abortive Termination" in Section 2.4. + + ULP - Upper Layer Protocol. The protocol layer above the one + currently being referenced. The ULP for RDMA/DDP is expected to + be an OS, Application, adaptation layer, or proprietary device. + The RDMA/DDP documents do not specify a ULP -- they provide a set + of semantics that allow a ULP to be designed to utilize RDMA/DDP. + + ULP Payload - The ULP data that is contained within a single protocol + segment or packet (e.g., a DDP Segment). + + Verbs - An abstract description of the functionality of an RNIC + Interface. The OS may expose some or all of this functionality + via one or more APIs to applications. The OS will also use some + of the functionality to manage the RNIC Interface. + +2.2. LLP + + LLP - Lower Layer Protocol. The protocol layer beneath the protocol + layer currently being referenced. For example, for DDP, the LLP + is SCTP, MPA, or other transport protocols. For RDMA, the LLP is + DDP. + + LLP Connection - Corresponds to an LLP transport-level connection + between the peer LLP layers on two Nodes. + + + + + +Recio, et al. Standards Track [Page 10] + +RFC 5040 RDMA Protocol Specification October 2007 + + + LLP Stream - Corresponds to a single LLP transport-level Stream + between the peer LLP layers on two Nodes. One or more LLP + Streams may map to a single transport-level LLP connection. For + transport protocols that support multiple Streams per connection + (e.g., SCTP), an LLP Stream corresponds to one transport-level + Stream. + + MULPDU - Maximum ULPDU. The current maximum size of the record that + is acceptable for DDP to pass to the LLP for transmission. + + ULPDU - Upper Layer Protocol Data Unit. The data record defined by + the layer above MPA. + +2.3. Direct Data Placement (DDP) + + Data Placement (Placement, Placed, Places) - For DDP, this term is + specifically used to indicate the process of writing to a data + buffer by a DDP implementation. DDP Segments carry Placement + information, which may be used by the receiving DDP + implementation to perform Data Placement of the DDP Segment ULP + Payload. See "Data Delivery". + + DDP Abortive Teardown - The act of closing a DDP Stream without + attempting to Complete in-progress and pending DDP Messages. + + DDP Graceful Teardown - The act of closing a DDP Stream such that all + in-progress and pending DDP Messages are allowed to Complete + successfully. + + DDP Control Field - A fixed 16-bit field in the DDP Header. The DDP + Control Field contains an 8-bit field whose contents are reserved + for use by the ULP. + + DDP Header - The header present in all DDP segments. The DDP Header + contains control and Placement fields that are used to define the + final Placement location for the ULP Payload carried in a DDP + Segment. + + DDP Message - A ULP-defined unit of data interchange, which is + subdivided into one or more DDP segments. This segmentation may + occur for a variety of reasons, including segmentation to respect + the maximum segment size of the underlying transport protocol. + + DDP Segment - The smallest unit of data transfer for the DDP + protocol. It includes a DDP Header and ULP Payload (if present). + A DDP Segment should be sized to fit within the underlying + transport protocol MULPDU. + + + + +Recio, et al. Standards Track [Page 11] + +RFC 5040 RDMA Protocol Specification October 2007 + + + DDP Stream - A sequence of DDP Messages whose ordering is defined by + the LLP. For SCTP, a DDP Stream maps directly to an SCTP Stream. + For MPA, a DDP Stream maps directly to a TCP connection, and a + single DDP Stream is supported. Note that DDP has no ordering + guarantees between DDP Streams. + + Direct Data Placement - A mechanism whereby ULP data contained within + DDP Segments may be Placed directly into its final destination in + memory without processing of the ULP. This may occur even when + the DDP Segments arrive out of order. Out-of-order Placement + support may require the Data Sink to implement the LLP and DDP as + one functional block. + + Direct Data Placement Protocol (DDP) - Also, a wire protocol that + supports Direct Data Placement by associating explicit memory + buffer placement information with the LLP payload units. + + Message Offset (MO) - For the DDP Untagged Buffer Model, specifies + the offset, in bytes, from the start of a DDP Message. + + Message Sequence Number (MSN) - For the DDP Untagged Buffer Model, + specifies a sequence number that is increasing with each DDP + Message. + + Queue Number (QN) - For the DDP Untagged Buffer Model, identifies a + destination Data Sink queue for a DDP Segment. + + Steering Tag - An identifier of a Tagged Buffer on a Node, valid as + defined within a protocol specification. + + STag - Steering Tag + + Tagged Buffer - A buffer that is explicitly Advertised to the Remote + Peer through exchange of an STag, Tagged Offset, and length. + + Tagged Buffer Model - A DDP data transfer model used to transfer + Tagged Buffers from the Local Peer to the Remote Peer. + + Tagged DDP Message - A DDP Message that targets a Tagged Buffer. + + Tagged Offset (TO) - The offset within a Tagged Buffer on a Node. + + Untagged Buffer - A buffer that is not explicitly Advertised to the + Remote Peer. Untagged Buffers support one of the two available + data transfer mechanisms called the Untagged Buffer Model. An + Untagged Buffer is used to send asynchronous control messages to + the Remote Peer for RDMA Read, Send, and Terminate requests. + Untagged Buffers handle Untagged DDP Messages. + + + +Recio, et al. Standards Track [Page 12] + +RFC 5040 RDMA Protocol Specification October 2007 + + + Untagged Buffer Model - A DDP data transfer model used to transfer + Untagged Buffers from the Local Peer to the Remote Peer. + + Untagged DDP Message - A DDP Message that targets an Untagged Buffer. + +2.4. Remote Direct Memory Access (RDMA) + + Completion Queues (CQs) - Logical components of the RNIC Interface + that conceptually represent how an RNIC notifies the ULP about + the completion of the transmission of data, or the completion of + the reception of data; see [RDMASEC]. + + Event - An indication provided by the RDMAP layer to the ULP to + indicate a Completion or other condition requiring immediate + attention. + + Invalidate STag - A mechanism used to prevent the Remote Peer from + reusing a previous explicitly Advertised STag, until the Local + Peer makes it available through a subsequent explicit + Advertisement. The STag cannot be accessed remotely until it is + explicitly Advertised again. + + RDMA Completion (Completion, Completed, Complete, Completes) - For + RDMA, Completion is defined as the process of informing the ULP + that a particular RDMA Operation has performed all functions + specified for the RDMA Operations, including Placement and + Delivery. The Completion semantic of each RDMA Operation is + distinctly defined. + + RDMA Message - A data transfer mechanism used to fulfill an RDMA + Operation. + + RDMA Operation - A sequence of RDMA Messages, including control + Messages, to transfer data from a Data Source to a Data Sink. + The following RDMA Operations are defined: RDMA Writes, RDMA + Read, Send, Send with Invalidate, Send with Solicited Event, Send + with Solicited Event and Invalidate, and Terminate. + + RDMA Protocol (RDMAP) - A wire protocol that supports RDMA Operations + to transfer ULP data between a Local Peer and the Remote Peer. + + RDMAP Abortive Termination (Termination, Terminated, Terminate, + Terminates) - The act of closing an RDMAP Stream without + attempting to Complete in-progress and pending RDMA Operations. + + RDMAP Graceful Termination - The act of closing an RDMAP Stream such + that all in-progress and pending RDMA Operations are allowed to + Complete successfully. + + + +Recio, et al. Standards Track [Page 13] + +RFC 5040 RDMA Protocol Specification October 2007 + + + RDMA Read - An RDMA Operation used by the Data Sink to transfer the + contents of a source RDMA buffer from the Remote Peer to the + Local Peer. An RDMA Read operation consists of a single RDMA + Read Request Message and a single RDMA Read Response Message. + + RDMA Read Request - An RDMA Message used by the Data Sink to request + the Data Source to transfer the contents of an RDMA buffer. The + RDMA Read Request Message describes both the Data Source and Data + Sink RDMA buffers. + + RDMA Read Request Queue - The queue used for processing RDMA Read + Requests. The RDMA Read Request Queue has a DDP Queue Number of + 1. + + RDMA Read Response - An RDMA Message used by the Data Source to + transfer the contents of an RDMA buffer to the Data Sink, in + response to an RDMA Read Request. The RDMA Read Response Message + only describes the data sink RDMA buffer. + + RDMAP Stream - An association between a pair of RDMAP + implementations, possibly on different Nodes, which transfer ULP + data using RDMA Operations. There may be multiple RDMAP Streams + on a single Node. An RDMAP Stream maps directly to a single DDP + Stream. + + RDMA Write - An RDMA Operation that transfers the contents of a + source RDMA Buffer from the Local Peer to a destination RDMA + Buffer at the Remote Peer using RDMA. The RDMA Write Message + only describes the Data Sink RDMA buffer. + + Remote Direct Memory Access (RDMA) - A method of accessing memory on + a remote system in which the local system specifies the remote + location of the data to be transferred. Employing an RNIC in the + remote system allows the access to take place without + interrupting the processing of the CPU(s) on the system. + + Send - An RDMA Operation that transfers the contents of a ULP Buffer + from the Local Peer to an Untagged Buffer at the Remote Peer. + + Send Message Type - A Send Message, Send with Invalidate Message, + Send with Solicited Event Message, or Send with Solicited Event + and Invalidate Message. + + Send Operation Type - A Send Operation, Send with Invalidate + Operation, Send with Solicited Event Operation, or Send with + Solicited Event and Invalidate Operation. + + + + + +Recio, et al. Standards Track [Page 14] + +RFC 5040 RDMA Protocol Specification October 2007 + + + Solicited Event (SE) - A facility by which an RDMA Operation sender + may cause an Event to be generated at the recipient, if the + recipient is configured to generate such an Event, when a Send + with Solicited Event Message or Send with Solicited Event and + Invalidate Message is received. Note: The Local Peer's ULP can + use the Solicited Event mechanism to ensure that Messages + designated as important to the ULP are handled in an expeditious + manner by the Remote Peer's ULP. The ULP at the Local Peer can + indicate a given Send Message Type is important by using the Send + with Solicited Event Message or Send with Solicited Event and + Invalidate Message. The ULP at the Remote Peer can choose to + only be notified when valid Send with Solicited Event Messages + and/or Send with Solicited Event and Invalidate Messages arrive + and handle other valid incoming Send Messages or Send with + Invalidate Messages at its leisure. + + Terminate - An RDMA Message used by a Node to pass an error + indication to the peer Node on an RDMAP Stream. This operation + is for RDMAP use only. + + ULP Buffer - A buffer owned above the RDMAP layer and Advertised to + the RDMAP layer either as a Tagged Buffer or an Untagged ULP + Buffer. + + ULP Message - The ULP data that is handed to a specific protocol + layer for transmission. Data boundaries are preserved as they + are transmitted through iWARP. + +3. ULP and Transport Attributes + +3.1. Transport Requirements and Assumptions + + RDMAP MUST be layered on top of the Direct Data Placement Protocol + [DDP]. + + RDMAP requires the following DDP support: + + * RDMAP uses three queues for Untagged Buffers: + + * Queue Number 0 (used by RDMAP for Send, Send with Invalidate, + Send with Solicited Event, and Send with Solicited Event and + Invalidate operations). + + * Queue Number 1 (used by RDMAP for RDMA Read operations). + + * Queue Number 2 (used by RDMAP for Terminate operations). + + * DDP maps a single RDMA Message to a single DDP Message. + + + +Recio, et al. Standards Track [Page 15] + +RFC 5040 RDMA Protocol Specification October 2007 + + + * DDP uses the STag and Tagged Offset provided by the RDMAP for + Tagged Buffer Messages (i.e., RDMA Write and RDMA Read Response). + + * When the DDP layer Delivers an Untagged DDP Message to the RDMAP + layer, DDP provides the length of the DDP Message. This ensures + that RDMAP does not have to carry a length field in its header. + + * When the RDMAP layer provides an RDMA Message to the DDP layer, + DDP must insert the RsvdULP field value provided by the RDMAP + layer into the associated DDP Message. + + * When the DDP layer Delivers a DDP Message to the RDMAP layer, DDP + provides the RsvdULP field. + + * The RsvdULP field must be 1 octet for DDP Tagged Messages and 5 + octets for DDP Untagged Messages. + + * DDP propagates to RDMAP all operation or protection errors (used + by RDMAP Terminate) and, when appropriate, the DDP Header fields + of the DDP Segment that encountered the error. + + * If an RDMA Operation is aborted by DDP or a lower layer, the + contents of the Data Sink buffers associated with the operation + are considered indeterminate. + + * DDP, in conjunction with the lower layers, provides reliable, in- + order Delivery. + +3.2. RDMAP Interactions with the ULP + + RDMAP provides the ULP with access to the following RDMA Operations + as defined in this specification: + + * Send + + * Send with Solicited Event + + * Send with Invalidate + + * Send with Solicited Event and Invalidate + + * RDMA Write + + * RDMA Read + + + + + + + +Recio, et al. Standards Track [Page 16] + +RFC 5040 RDMA Protocol Specification October 2007 + + + For Send Operation Types, the following are the interactions between + the RDMAP layer and the ULP: + + * At the Data Source: + + * The ULP passes to the RDMAP layer the following: + + * ULP Message Length + + * ULP Message + + * An indication of the Send Operation Type, where the valid + types are: Send, Send with Solicited Event, Send with + Invalidate, or Send with Solicited Event and Invalidate. + + * An Invalidate STag, if the Send Operation Type was Send with + Invalidate or Send with Solicited Event and Invalidate. + + * When the Send Operation Type Completes, an indication of the + Completion results. + + * At the Data Sink: + + * If the Send Operation Type Completed successfully, the RDMAP + layer passes the following information to the ULP Layer: + + * ULP Message Length + + * ULP Message + + * An Event, if the Data Sink is configured to generate an + Event. + + * An Invalidated STag, if the Send Operation Type was Send + with Invalidate or Send with Solicited Event and Invalidate. + + * If the Send Operation Type Completed in error, the Data Sink + RDMAP layer will pass up the corresponding error information to + the Data Sink ULP and send a Terminate Message to the Data + Source RDMAP layer. The Data Source RDMAP layer will then pass + up the Terminate Message to the ULP. + + For RDMA Write operations, the following are the interactions between + the RDMAP layer and the ULP: + + * At the Data Source: + + * The ULP passes to the RDMAP layer the following: + + + +Recio, et al. Standards Track [Page 17] + +RFC 5040 RDMA Protocol Specification October 2007 + + + * ULP Message Length + + * ULP Message + + * Data Sink STag + + * Data Sink Tagged Offset + + * When the RDMA Write operation Completes, an indication of + the Completion results. + + * At the Data Sink: + + * If the RDMA Write completed successfully, the RDMAP layer does + not Deliver the RDMA Write to the ULP. It does Place the ULP + Message transferred through the RDMA Write Message into the ULP + Buffer. + + * If the RDMA Write completed in error, the Data Sink RDMAP layer + will pass up the corresponding error information to the Data + Sink ULP and send a Terminate Message to the Data Source RDMAP + layer. The Data Source RDMAP layer will then pass up the + Terminate Message to the ULP. + + For RDMA Read operations, the following are the interactions between + the RDMAP layer and the ULP: + + * At the Data Sink: + + * The ULP passes to the RDMAP layer the following: + + * ULP Message Length + + * Data Source STag + + * Data Sink STag + + * Data Source Tagged Offset + + * Data Sink Tagged Offset + + * When the RDMA Read operation Completes, an indication of the + Completion results. + + * At the Data Source: + + * If no error occurred while processing the RDMA Read Request, + the Data Source will not pass up any information to the ULP. + + + +Recio, et al. Standards Track [Page 18] + +RFC 5040 RDMA Protocol Specification October 2007 + + + * If an error occurred while processing the RDMA Read Request, + the Data Source RDMAP layer will pass up the corresponding + error information to the Data Source ULP and send a Terminate + Message to the Data Sink RDMAP layer. The Data Sink RDMAP + layer will then pass up the Terminate Message to the ULP. + + For STags made available to the RDMAP layer, following are the + interactions between the RDMAP layer and the ULP: + + * If the ULP enables an STag, the ULP passes the following to the + RDMAP layer: + + * STag; + + * range of Tagged Offsets that are associated with a given STag; + + * remote access rights (read, write, or read and write) + associated with a given, valid STag; and + + * association between a given STag and a given RDMAP Stream. + + * If the ULP disables an STag, the ULP passes to the RDMAP layer the + STag. + + If an error occurs at the RDMAP layer, the RDMAP layer may pass back + error information (e.g., the content of a Terminate Message) to the + ULP. + +4. Header Format + + The control information of RDMA Messages is included in DDP + protocol-defined header fields, with the following exceptions: + + * The first octet reserved for ULP usage on all DDP Messages in the + DDP Protocol (i.e., the RsvdULP Field) is used by RDMAP to carry + the RDMA Message Opcode and the RDMAP version. This octet is + known as the RDMAP Control Field in this specification. For Send + with Invalidate and Send with Solicited Event and Invalidate, + RDMAP uses the second through fifth octets, provided by DDP on + Untagged DDP Messages, to carry the STag that will be Invalidated. + + * The RDMA Message length is passed by the RDMAP layer to the DDP + layer on all outbound transfers. + + * For RDMA Read Request Messages, the RDMA Read Message Size is + included in the RDMA Read Request Header. + + + + + +Recio, et al. Standards Track [Page 19] + +RFC 5040 RDMA Protocol Specification October 2007 + + + * The RDMA Message length is passed to the RDMAP layer by the DDP + layer on inbound Untagged Buffer transfers. + + * Two RDMA Messages carry additional RDMAP headers. The RDMA Read + Request carries the Data Sink and Data Source buffer descriptions, + including buffer length. The Terminate carries additional + information associated with the error that caused the Terminate. + +4.1. RDMAP Control and Invalidate STag Field + + The version of RDMAP defined by this specification uses all 8 bits of + the RDMAP Control Field. The first octet reserved for ULP use in the + DDP Protocol MUST be used by the RDMAP to carry the RDMAP Control + Field. The ordering of the bits in the first octet MUST be as + defined in Figure 3, "DDP Control, RDMAP Control, and Invalidate STag + Fields". For Send with Invalidate and Send with Solicited Event and + Invalidate, the second through fifth octets of the DDP RsvdULP field + MUST be used by RDMAP to carry the Invalidate STag. Figure 3 depicts + the format of the DDP Control and RDMAP Control fields. (Note: In + Figure 3, the DDP Header is offset by 16 bits to accommodate the MPA + header defined in [MPA]. The MPA header is only present if DDP is + layered on top of MPA.) + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |T|L| Resrv | DV| RV|Rsv| Opcode| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Invalidate STag | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 3: DDP Control, RDMAP Control, and Invalidate STag Fields + + All RDMA Messages handed by the RDMAP layer to the DDP layer MUST + define the value of the Tagged flag in the DDP Header. Figure 4, + "RDMA Usage of DDP Fields", MUST be used to define the value of the + Tagged flag that is handed to the DDP layer for each RDMA Message. + + Figure 4 defines the value of the RDMA Opcode field that MUST be used + for each RDMA Message. + + Figure 4 defines when the STag, Queue Number, and Tagged Offset + fields MUST be provided for each RDMA Message. + + + + + + + + +Recio, et al. Standards Track [Page 20] + +RFC 5040 RDMA Protocol Specification October 2007 + + + For this version of the RDMAP, all RDMA Messages MUST have: + + * Bits 24-25; RDMA Version field: 01b for an RNIC that complies with + this RDMA protocol specification. 00b for an RNIC that complies + with the RDMA Consortium's RDMA protocol specification. Both + version numbers are valid. Interoperability is dependent on MPA + protocol version negotiation (e.g., MPA marker and MPA CRC). + + * Bits 26-27; Reserved. MUST be set to zero by sender, ignored by + the receiver. + + * Bits 28-31; OpCode field: see Figure 4. + + * Bits 32-63; Invalidate STag. However, this field is only valid + for Send with Invalidate and Send with Solicited Event and + Invalidate Messages (see Figure 4). + + For Send, Send with Solicited Event, RDMA Read Request, and + Terminate, the Invalidate STag field MUST be set to zero on + transmit and ignored by the receiver. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Recio, et al. Standards Track [Page 21] + +RFC 5040 RDMA Protocol Specification October 2007 + + + -------+-----------+-------+------+-------+-----------+-------------- + RDMA | Message | Tagged| STag | Queue | Invalidate| Message + Message| Type | Flag | and | Number| STag | Length + OpCode | | | TO | | | Communicated + | | | | | | between DDP + | | | | | | and RDMAP + -------+-----------+-------+------+-------+-----------+-------------- + 0000b | RDMA Write| 1 | Valid| N/A | N/A | Yes + | | | | | | + -------+-----------+-------+------+-------+-----------+-------------- + 0001b | RDMA Read | 0 | N/A | 1 | N/A | Yes + | Request | | | | | + -------+-----------+-------+------+-------+-----------+-------------- + 0010b | RDMA Read | 1 | Valid| N/A | N/A | Yes + | Response | | | | | + -------+-----------+-------+------+-------+-----------+-------------- + 0011b | Send | 0 | N/A | 0 | N/A | Yes + | | | | | | + -------+-----------+-------+------+-------+-----------+-------------- + 0100b | Send with | 0 | N/A | 0 | Valid | Yes + | Invalidate| | | | | + -------+-----------+-------+------+-------+-----------+-------------- + 0101b | Send with | 0 | N/A | 0 | N/A | Yes + | SE | | | | | + -------+-----------+-------+------+-------+-----------+-------------- + 0110b | Send with | 0 | N/A | 0 | Valid | Yes + | SE and | | | | | + | Invalidate| | | | | + -------+-----------+-------+------+-------+-----------+-------------- + 0111b | Terminate | 0 | N/A | 2 | N/A | Yes + | | | | | | + -------+-----------+-------+------+-------+-----------+-------------- + 1000b | | + to | Reserved | Not Specified + 1111b | | + -------+-----------+------------------------------------------------- + + Figure 4: RDMA Usage of DDP Fields + + Note: N/A means Not Applicable. + + + + + + + + + + + +Recio, et al. Standards Track [Page 22] + +RFC 5040 RDMA Protocol Specification October 2007 + + +4.2. RDMA Message Definitions + + The following figure defines which RDMA Headers MUST be used on each + RDMA Message and which RDMA Messages are allowed to carry ULP + Payload: + + -------+-----------+-------------------+------------------------- + RDMA | Message | RDMA Header Used | ULP Message allowed in + Message| Type | | the RDMA Message + OpCode | | | + | | | + -------+-----------+-------------------+------------------------- + 0000b | RDMA Write| None | Yes + | | | + -------+-----------+-------------------+------------------------- + 0001b | RDMA Read | RDMA Read Request | No + | Request | Header | + -------+-----------+-------------------+------------------------- + 0010b | RDMA Read | None | Yes + | Response | | + -------+-----------+-------------------+------------------------- + 0011b | Send | None | Yes + | | | + -------+-----------+-------------------+------------------------- + 0100b | Send with | None | Yes + | Invalidate| | + -------+-----------+-------------------+------------------------- + 0101b | Send with | None | Yes + | SE | | + -------+-----------+-------------------+------------------------- + 0110b | Send with | None | Yes + | SE and | | + | Invalidate| | + -------+-----------+-------------------+------------------------- + 0111b | Terminate | Terminate Header | No + | | | + -------+-----------+-------------------+------------------------- + 1000b | | + to | Reserved | Not Specified + 1111b | | + -------+-----------+-------------------+------------------------- + + Figure 5: RDMA Message Definitions + + + + + + + + +Recio, et al. Standards Track [Page 23] + +RFC 5040 RDMA Protocol Specification October 2007 + + +4.3. RDMA Write Header + + The RDMA Write Message does not include an RDMAP header. The RDMAP + layer passes to the DDP layer an RDMAP Control Field. The RDMA Write + Message is fully described by the DDP Headers of the DDP Segments + associated with the Message. + + See Appendix A for a description of the DDP Segment format associated + with RDMA Write Messages. + +4.4. RDMA Read Request Header + + The RDMA Read Request Message carries an RDMA Read Request Header + that describes the Data Sink and Data Source Buffers used by the RDMA + Read operation. The RDMA Read Request Header immediately follows the + DDP header. The RDMAP layer passes to the DDP layer an RDMAP Control + Field. The following figure depicts the RDMA Read Request Header + that MUST be used for all RDMA Read Request Messages: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Data Sink STag (SinkSTag) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + + Data Sink Tagged Offset (SinkTO) + + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | RDMA Read Message Size (RDMARDSZ) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Data Source STag (SrcSTag) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + + Data Source Tagged Offset (SrcTO) + + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 6: RDMA Read Request Header Format + + Data Sink Steering Tag: 32 bits. + + The Data Sink Steering Tag identifies the Data Sink's Tagged + Buffer. This field MUST be copied, without interpretation, + from the RDMA Read Request into the corresponding RDMA Read + Response; this field allows the Data Sink to place the + returning data. The STag is associated with the RDMAP Stream + through a mechanism that is outside the scope of the RDMAP + specification. + + + +Recio, et al. Standards Track [Page 24] + +RFC 5040 RDMA Protocol Specification October 2007 + + + Data Sink Tagged Offset: 64 bits. + + The Data Sink Tagged Offset specifies the starting offset, in + octets, from the base of the Data Sink's Tagged Buffer, where + the data is to be written by the Data Source. This field is + copied from the RDMA Read Request into the corresponding RDMA + Read Response and allows the Data Sink to place the returning + data. The Data Sink Tagged Offset MAY start at an arbitrary + offset. + + The Data Sink STag and Data Sink Tagged Offset fields + describe the buffer to which the RDMA Read data is written. + + Note: the DDP layer protects against a wrap of the Data Sink + Tagged Offset. + + RDMA Read Message Size: 32 bits. + + The RDMA Read Message Size is the amount of data, in octets, + read from the Data Source. A single RDMA Read Request + Message can retrieve from 0 to 2^32-1 data octets from the + Data Source. + + Data Source Steering Tag: 32 bits. + + The Data Source Steering Tag identifies the Data Source's + Tagged Buffer. The STag is associated with the RDMAP Stream + through a mechanism that is outside the scope of the RDMAP + specification. + + Data Source Tagged Offset: 64 bits. + + The Tagged Offset specifies the starting offset, in octets, + that is to be read from the Data Source's Tagged Buffer. The + Data Source Tagged Offset MAY start at an arbitrary offset. + + The Data Source STag and Data Source Tagged Offset fields + describe the buffer from which the RDMA Read data is read. + + See Section 7.2, "Errors Detected at the Remote Peer on Incoming RDMA + Messages", for a description of error checking required upon + processing of an RDMA Read Request at the Data Source. + + + + + + + + + +Recio, et al. Standards Track [Page 25] + +RFC 5040 RDMA Protocol Specification October 2007 + + +4.5. RDMA Read Response Header + + The RDMA Read Response Message does not include an RDMAP header. The + RDMAP layer passes to the DDP layer an RDMAP Control Field. The RDMA + Read Response Message is fully described by the DDP Headers of the + DDP Segments associated with the Message. + + See Appendix A for a description of the DDP Segment format associated + with RDMA Read Response Messages. + +4.6. Send Header and Send with Solicited Event Header + + The Send and Send with Solicited Event Messages do not include an + RDMAP header. The RDMAP layer passes to the DDP layer an RDMAP + Control Field. The Send and Send with Solicited Event Messages are + fully described by the DDP Headers of the DDP Segments associated + with the Messages. + + See Appendix A for a description of the DDP Segment format associated + with Send and Send with Solicited Event Messages. + +4.7. Send with Invalidate Header and Send with SE and Invalidate Header + + The Send with Invalidate and Send with Solicited Event and Invalidate + Messages do not include an RDMAP header. The RDMAP layer passes to + the DDP layer an RDMAP Control Field and the Invalidate STag field + (see section 4.1 RDMAP Control and Invalidate STag Field). The Send + with Invalidate and Send with Solicited Event and Invalidate Messages + are fully described by the DDP Headers of the DDP Segments associated + with the Messages. + + See Appendix A for a description of the DDP Segment format associated + with Send and Send with Solicited Event Messages. + +4.8. Terminate Header + + The Terminate Message carries a Terminate Header that contains + additional information associated with the cause of the Terminate. + The Terminate Header immediately follows the DDP header. The RDMAP + layer passes to the DDP layer an RDMAP Control Field. The following + figure depicts a Terminate Header that MUST be used for the Terminate + Message: + + + + + + + + + +Recio, et al. Standards Track [Page 26] + +RFC 5040 RDMA Protocol Specification October 2007 + + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Terminate Control | Reserved | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP Segment Length (if any) | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + | | + // // + | Terminated DDP Header (if any) | + + + + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + // // + | Terminated RDMA Header (if any) | + + + + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 7: Terminate Header Format + + Terminate Control: 19 bits. + + The Terminate Control field MUST have the format defined in + Figure 8 below. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Layer | EType | Error Code |HdrCt| + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 8: Terminate Control Field + + * Figure 9, "Terminate Control Field Values", defines the valid + values that MUST be used for this field. + + * Layer: 4 bits. + + Identifies the layer that encountered the error. + + * EType (RDMA Error Type): 4 bits. + + Identifies the type of error that caused the Terminate. When + the error is detected at the RDMAP layer, the RDMAP layer + inserts the Error Type into this field. When the error is + detected at an LLP layer, an LLP layer creates the Error Type + + + +Recio, et al. Standards Track [Page 27] + +RFC 5040 RDMA Protocol Specification October 2007 + + + and the DDP layer passes it up to the RDMAP layer, and the + RDMAP layer inserts it into this field. + + * Error Code: 8 bits. + + This field identifies the specific error that caused the + Terminate. When the error is detected at the RDMAP layer, the + RDMAP layer creates the Error Code. When the error is detected + at an LLP layer, the LLP layer creates the Error Code, the DDP + layer passes it up to the RDMAP layer, and the RDMAP layer + inserts it into this field. + + * HdrCt: 3 bits. + + Header control bits: + + * M: bit 16. DDP Segment Length valid. See Figure 10 for + when this bit SHOULD be set. + + * D: bit 17. DDP Header Included. See Figure 10 for when + this bit SHOULD be set. + + * R: bit 18. RDMAP Header Included. See Figure 10 for when + this bit SHOULD be set. + + + + + + + + + + + + + + + + + + + + + + + + + + + +Recio, et al. Standards Track [Page 28] + +RFC 5040 RDMA Protocol Specification October 2007 + + + -------+-----------+-------+-------------+------+-------------------- + Layer | Layer | Error | Error Type | Error| Error Code Name + | Name | Type | Name | Code | + -------+-----------+-------+-------------+------+-------------------- + | | 0000b | Local | None | None - This error + | | | Catastrophic| | type does not have + | | | Error | | an error code. Any + | | | | | value in this field + | | | | | is acceptable. + | +-------+-------------+------+-------------------- + | | | | 00X | Invalid STag + | | | +------+-------------------- + | | | | 01X | Base or bounds + | | | | | violation + | | | Remote +------+-------------------- + | | 0001b | Protection | 02X | Access rights + | | | Error | | violation + | | | +------+-------------------- + 0000b | RDMA | | | 03X | STag not associated + | | | | | with RDMAP Stream + | | | +------+-------------------- + | | | | 04X | TO wrap + | | | +------+-------------------- + | | | | 09X | STag cannot be + | | | | | Invalidated + | | | +------+-------------------- + | | | | FFX | Unspecified Error + | +-------+-------------+------+-------------------- + | | | | 05X | Invalid RDMAP + | | | | | version + | | | +------+-------------------- + | | | | 06X | Unexpected OpCode + | | | Remote +------+-------------------- + | | 0010b | Operation | 07X | Catastrophic error, + | | | Error | | localized to RDMAP + | | | | | Stream + | | | +------+-------------------- + | | | | 08X | Catastrophic error, + | | | | | global + | | | +------+-------------------- + | | | | 09X | STag cannot be + | | | | | Invalidated + | | | +------+-------------------- + | | | | FFX | Unspecified Error + + + + + + + +Recio, et al. Standards Track [Page 29] + +RFC 5040 RDMA Protocol Specification October 2007 + + + -------+-----------+-------+-------------+------+-------------------- + 0001b | DDP | See DDP Specification [DDP] for a description of + | | the values and names. + -------+-----------+-------+----------------------------------------- + 0010b | LLP | For MPA, see MPA Specification [MPA] for a + |(e.g., MPA)| description of the values and names. + -------+-----------+-------+----------------------------------------- + + Figure 9: Terminate Control Field Values + + Reserved: 13 bits. This field MUST be set to zero on transmit, + ignored on receive. + + DDP Segment Length: 16 bits + + The length handed up by the DDP layer when the error was + detected. It MUST be valid if the M bit is set. It MUST be + present when the D bit is set. + + Terminated DDP Header: 112 bits for Tagged Messages and 144 bits + for Untagged Messages. + + The DDP Header of the incoming Message that is associated + with the Terminate. The DDP Header is not present if the + Terminate Error Type is a Local Catastrophic Error. It MUST + be present if the D bit is set. + + Terminated RDMA Header: 224 bits. + + The Terminated RDMA Header is only sent back if the terminate + is associated with an RDMA Read Request Message. It MUST be + present if the R bit is set. + + If the terminate occurs before the first RDMA Read Request + byte is processed, the original RDMA Read Request Header is + sent back. + + If the terminate occurs after the first RDMA Read Request + byte is processed, the RDMA Read Request Header is updated to + reflect the current location of the RDMA Read operation that + is in process: + + * Data Sink STag = Data Sink STag originally sent in the + RDMA Read Request. + + + + + + + +Recio, et al. Standards Track [Page 30] + +RFC 5040 RDMA Protocol Specification October 2007 + + + * Data Sink Tagged Offset = Current offset into the Data + Sink Tagged Buffer. For example, if the RDMA Read + Request was terminated after 2048 octets were sent, + then the Data Sink Tagged Offset = the original Data + Sink Tagged Offset + 2048. + + * Data Message size = Number of bytes left to transfer. + + * Data Source STag = Data Source STag in the RDMA Read + Request. + + * Data Source Tagged Offset = Current offset into the + Data Source Tagged Buffer. For example, if the RDMA + Read Request was terminated after 2048 octets were + sent, then the Data Source Tagged Offset = the + original Data Source Tagged Offset + 2048. + + Note: if a given LLP does not define any termination codes for the + RDMAP Termination message to use, then none would be used for that + LLP. + + Figure 10, "Error Type to RDMA Message Mapping", maps layer name and + error types to each RDMA Message type: + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Recio, et al. Standards Track [Page 31] + +RFC 5040 RDMA Protocol Specification October 2007 + + + ---------+-------------+------------+------------+----------------- + Layer | Error Type | Terminate | Terminate | What type of + Name | Name | Includes | Includes | RDMA Message can + | | DDP Header | RDMA Header| cause the error + | | and DDP | | + | | Segment | | + | | Length | | + ---------+-------------+------------+------------+----------------- + | Local | No | No | Any + | Catastrophic| | | + | Error | | | + +-------------+------------+------------+----------------- + | Remote | Yes, if | Yes | Only RDMA Read + RDMA | Protection | possible | | Request, Send + | Error | | | with Invalidate, + | | | | and Send with SE + | | | | and Invalidate + +-------------+------------+------------+----------------- + | Remote | Yes, if | No | Any + | Operation | possible | | + | Error | | | + ---------+-------------+------------+------------+----------------- + DDP | See DDP Spec| Yes | No | Any + | [DDP] | | | + ---------+-------------+------------+------------+----------------- + LLP | See LLP Spec| No | No | Any + | (e.g., MPA) | | | + + Figure 10: Error Type to RDMA Message Mapping + +5. Data Transfer + +5.1. RDMA Write Message + + An RDMA Write is used by the Data Source to transfer data to a + previously Advertised Tagged Buffer at the Data Sink. The RDMA Write + Message has the following semantics: + + * An RDMA Write Message MUST reference a Tagged Buffer. That is, + the Data Source RDMAP layer MUST request that the DDP layer mark + the Message as Tagged. + + * A valid RDMA Write Message MUST NOT be delivered to the Data + Sink's ULP (i.e., it is placed by the DDP layer). + + * At the Remote Peer, when an invalid RDMA Write Message is + delivered to the Remote Peer's RDMAP layer, an error is surfaced + (see Section 7.1, "RDMAP Error Surfacing"). + + + +Recio, et al. Standards Track [Page 32] + +RFC 5040 RDMA Protocol Specification October 2007 + + + * The Tagged Offset of a Tagged Buffer MAY start at a non-zero + value. + + * An RDMA Write Message MAY target all or part of a previously + Advertised Buffer. + + * The RDMAP does not define how the buffer(s) are used by an + outbound RDMA Write or how they are addressed. For example, an + implementation of RDMA may choose to allow a gather-list of non- + contiguous data blocks to be the source of an RDMA Write. In this + case, the data blocks would be combined by the Data Source and + sent as a single RDMA Write Message to the Data Sink. + + * The Data Source RDMAP layer MUST issue RDMA Write Messages to the + DDP layer in the order they were submitted by the ULP. + + * At the Data Source, a subsequent Send (Send with Invalidate, Send + with Solicited Event, or Send with Solicited Event and Invalidate) + Message MAY be used to signal Delivery of previous RDMA Write + Messages to the Data Sink, if the ULP chooses to signal Delivery + in this fashion. + + * If the Local Peer wishes to write to multiple Tagged Buffers on + the Remote Peer, the Local Peer MUST use multiple RDMA Write + Messages. That is, a single RDMA Write Message can only write to + one remote Tagged Buffer. + + * The Data Source MAY issue a zero-length RDMA Write Message. + +5.2. RDMA Read Operation + + The RDMA Read operation MUST consist of a single RDMA Read Request + Message and a single RDMA Read Response Message. + +5.2.1. RDMA Read Request Message + + An RDMA Read Request is used by the Data Sink to transfer data from a + previously Advertised Tagged Buffer at the Data Source to a Tagged + Buffer at the Data Sink. The RDMA Read Request Message has the + following semantics: + + * An RDMA Read Request Message MUST reference an Untagged Buffer. + That is, the Local Peer's RDMAP layer MUST request that the DDP + mark the Message as Untagged. + + * One RDMA Read Request Message MUST consume one Untagged Buffer. + + + + + +Recio, et al. Standards Track [Page 33] + +RFC 5040 RDMA Protocol Specification October 2007 + + + * The Remote Peer's RDMAP layer MUST process an RDMA Read Request + Message. A valid RDMA Read Request Message MUST NOT be delivered + to the Data Sink's ULP (i.e., it is processed by the RDMAP layer). + + * At the Remote Peer, when an invalid RDMA Read Request Message is + delivered to the Remote Peer's RDMAP layer, an error is surfaced + (see Section 7.1, "RDMAP Error Surfacing"). + + * An RDMA Read Request Message MUST reference the RDMA Read Request + Queue. That is, the Local Peer's RDMAP layer MUST request that + the DDP layer set the Queue Number field to one. + + * The Local Peer MUST pass to the DDP layer RDMA Read Request + Messages in the order they were submitted by the ULP. + + * The Remote Peer MUST process the RDMA Read Request Messages in the + order they were sent. + + * If the Local Peer wishes to read from multiple Tagged Buffers on + the Remote Peer, the Local Peer MUST use multiple RDMA Read + Request Messages. That is, a single RDMA Read Request Message + MUST only read from one remote Tagged Buffer. + + * AN RDMA Read Request Message MAY target all or part of a + previously Advertised Buffer. + + * If the Data Source receives a valid RDMA Read Request Message, it + MUST respond with a valid RDMA Read Response Message. + + * The Data Sink MAY issue a zero-length RDMA Read Request Message by + setting the RDMA Read Message Size field to zero in the RDMA Read + Request Header. + + * If the Data Source receives a non-zero-length RDMA Read Message + Size, the Data Source RDMAP MUST validate the Data Source STag and + Data Source Tagged Offset contained in the RDMA Read Request + Header. + + * If the Data Source receives an RDMA Read Request Header with the + RDMA Read Message Size set to zero, the Data Source RDMAP: + + * MUST NOT validate the Data Source STag and Data Source Tagged + Offset contained in the RDMA Read Request Header, and + + * MUST respond with a zero-length RDMA Read Response Message. + + + + + + +Recio, et al. Standards Track [Page 34] + +RFC 5040 RDMA Protocol Specification October 2007 + + +5.2.2. RDMA Read Response Message + + The RDMA Read Response Message uses the DDP Tagged Buffer Model to + Deliver the contents of a previously requested Data Source Tagged + Buffer to the Data Sink, without any involvement from the ULP at the + Remote Peer. The RDMA Read Response Message has the following + semantics: + + * The RDMA Read Response Message for the associated RDMA Read + Request Message travels in the opposite direction. + + * An RDMA Read Response Message MUST reference a Tagged Buffer. + That is, the Data Source RDMAP layer MUST request that the DDP + mark the Message as Tagged. + + * The Data Source MUST ensure that a sufficient number of Untagged + Buffers are available on the RDMA Read Request Queue (Queue with + DDP Queue Number 1) to support the maximum number of RDMA Read + Requests negotiated by the ULP. + + * The RDMAP layer MUST Deliver the RDMA Read Response Message to the + ULP. + + * At the Remote Peer, when an invalid RDMA Read Response Message is + delivered to the Remote Peer's RDMAP layer, an error is surfaced + (see Section 7.1, "RDMAP Error Surfacing"). + + * The Tagged Offset of a Tagged Buffer MAY start at a non-zero + value. + + * The Data Source RDMAP layer MUST pass RDMA Read Response Messages + to the DDP layer, in the order that the RDMA Read Request Messages + were received by the RDMAP layer, at the Data Source. + + * The Data Sink MAY validate that the STag, Tagged Offset, and + length of the RDMA Read Response Message are the same as the STag, + Tagged Offset, and length included in the corresponding RDMA Read + Request Message. + + * A single RDMA Read Response Message MUST write to one remote + Tagged Buffer. If the Data Sink wishes to read multiple Tagged + Buffers, the Data Sink can use multiple RDMA Read Request + Messages. + + + + + + + + +Recio, et al. Standards Track [Page 35] + +RFC 5040 RDMA Protocol Specification October 2007 + + +5.3. Send Message Type + + The Send Message Type uses the DDP Untagged Buffer Model to transfer + data from the Data Source into an Untagged Buffer at the Data Sink. + + * A Send Message Type MUST reference an Untagged Buffer. That is, + the Local Peer's RDMAP layer MUST request that the DDP layer mark + the Message as Untagged. + + * One Send Message Type MUST consume one Untagged Buffer. + + * The ULP Message sent using a Send Message Type MAY be less than + or equal to the size of the consumed Untagged Buffer. The + RDMAP layer communicates to the ULP the size of the data + written into the Untagged Buffer. + + * If the ULP Message sent via Send Message Type is larger than + the Data Sink's Untagged Buffer, it is an error (see Section + 9.1, "RDMAP Error Surfacing"). + + * At the Remote Peer, the Send Message Type MUST be Delivered to the + Remote Peer's ULP in the order they were sent. + + * After the Send with Solicited Event or Send with Solicited Event + and Invalidate Message is Delivered to the ULP, the RDMAP MAY + generate an Event, if the Data Sink is configured to generate such + an Event. + + * At the Remote Peer, when an invalid Send Message Type is Delivered + to the Remote Peer's RDMAP layer, an error is surfaced (see + Section 7.1, "RDMAP Error Surfacing"). + + * The RDMAP does not specify the structure of the buffer(s) used by + an outbound RDMA Write nor does it specify how the buffer(s) are + addressed. For example, an implementation of RDMA may choose to + allow a gather-list of non-contiguous data blocks to be the source + of a Send Message Type. In this case, the data blocks would be + combined by the Data Source and sent as a single Send Message Type + to the Data Sink. + + * For a Send Message Type, the Local Peer's RDMAP layer MUST request + that the DDP layer set the Queue Number field to zero. + + * The Local Peer MUST issue Send Message Type Messages in the order + they were submitted by the ULP. + + + + + + +Recio, et al. Standards Track [Page 36] + +RFC 5040 RDMA Protocol Specification October 2007 + + + * The Data Source MAY pass a zero-length Send Message Type. A + zero-length Send Message Type MUST consume an Untagged Buffer at + the Data Sink. A Send with Invalidate or Send with Solicited + Event and Invalidate Message MUST reference an STag. That is, the + Local Peer's RDMAP layer MUST pass the RDMA control field and the + STag that will be Invalidated to the DDP layer. + + * When the Send with Invalidate and Send with Solicited Event and + Invalidate Message are Delivered to the Remote Peer's RDMAP layer, + the RDMAP layer MUST: + + * Verify the STag that is associated with the RDMAP Stream; and + + * Invalidate the STag if it is associated with the RDMAP Stream; + or issue a Terminate Message with the STag Cannot be + Invalidated Terminate Error Code, if the STag is not associated + with the RDMAP Stream. + +5.4. Terminate Message + + The Terminate Message uses the DDP Untagged Buffer Model to + transfer-error-related information from the Data Source into an + Untagged Buffer at the Data Sink and then ceases all further + communications on the underlying DDP Stream. The Terminate Message + has the following semantics: + + * A Terminate Message MUST reference an Untagged Buffer. That is, + the Local Peer's RDMAP layer MUST request that the DDP layer mark + the Message as Untagged. + + * A Terminate Message references the Terminate Queue. That is, the + Local Peer's RDMAP layer MUST request that the DDP layer set the + Queue Number field to two. + + * One Terminate Message MUST consume one Untagged Buffer. + + * On a single RDMAP Stream, the RDMAP layer MUST guarantee placement + of a single Terminate Message. + + * A Terminate Message MUST be Delivered to the Remote Peer's RDMAP + layer. The RDMAP layer MUST Deliver the Terminate Message to the + ULP. + + * At the Remote Peer, when an invalid Terminate Message is delivered + to the Remote Peer's RDMAP layer, an error is surfaced (see + Section 7.1 "RDMAP Error Surfacing"). + + + + + +Recio, et al. Standards Track [Page 37] + +RFC 5040 RDMA Protocol Specification October 2007 + + + * The RDMAP layer Completes in error all ULP operations that have + not been provided to the DDP layer. + + * After sending a Terminate Message on an RDMAP Stream, the Local + Peer MUST NOT send any more Messages on that specific RDMAP + Stream. + + * After receiving a Terminate Message on an RDMAP Stream, the Remote + Peer MAY stop sending Messages on that specific RDMAP Stream. + +5.5. Ordering and Completions + + It is important to understand the difference between Placement and + Delivery ordering since RDMAP provides quite different semantics for + the two. + + Note that many current protocols, both as used in the Internet and + elsewhere, assume that data is both Placed and Delivered in order. + Taking advantage of this fact allowed applications to take a variety + of shortcuts. For RDMAP, many of these shortcuts are no longer safe + to use, and could cause application failure. + + The following rules apply to implementations of the RDMAP protocol. + Note that in these rules, Send includes Send, Send with Invalidate, + Send with Solicited Event, and Send with Solicited Event and + Invalidate: + + 1. RDMAP does not provide ordering among Messages on different RDMAP + Streams. + + 2. RDMAP does not provide ordering between operations that are + generated from the two ends of an RDMAP Stream. + + 3. RDMA Messages that use Tagged and Untagged Buffers MAY be Placed + in any order. If an application uses overlapping buffers (points + different Messages or portions of a single Message at the same + buffer), then it is possible that the last incoming write to the + Data Sink buffer will not be the last outgoing data sent from the + Data Source. + + 4. For a Send operation, the contents of an Untagged Buffer at the + Data Sink MAY be indeterminate until the Send is Delivered to the + ULP at the Data Sink. + + 5. For an RDMA Write operation, the contents of the Tagged Buffer at + the Data Sink MAY be indeterminate until a subsequent Send is + Delivered to the ULP at the Data Sink. + + + + +Recio, et al. Standards Track [Page 38] + +RFC 5040 RDMA Protocol Specification October 2007 + + + 6. For an RDMA Read operation, the contents of the Tagged Buffer at + the Data Sink MAY be indeterminate until the RDMA Read Response + Message has been Delivered at the Local Peer. + + Statements 4, 5, and 6 imply "no peeking" at the data to see if it is + done. It is possible for some data to arrive before logically + earlier data does, and peeking may cause unpredictable application + failure. + + 7. If the ULP or Application modifies the contents of Tagged or + Untagged Buffers, which are being modified by an RDMA Operation + while the RDMAP is processing the RDMA Operation, the state of + the Buffers is indeterminate. + + 8. If the ULP or Application modifies the contents of Tagged or + Untagged Buffers, which are read by an RDMA Operation while the + RDMAP is processing the RDMA Operation, the results of the read + are indeterminate. + + 9. The Completion of an RDMA Write or Send Operation at the Local + Peer does not guarantee that the ULP Message has yet reached the + Remote Peer ULP Buffer or been examined by the Remote ULP. + + 10. Send Messages MUST be Delivered to the ULP at the Remote Peer + after they are Delivered to RDMAP by DDP and in the order that + they were Delivered to RDMAP. + + Note that DDP ordering rules ensure that this will be the same + order that they were submitted at the Local Peer and that any + prior RDMA Writes have been submitted for ordered Placement at + the Remote Peer. This means that when the ULP sees the Delivery + of the Send, the memory buffers targeted by any preceding RDMA + Writes and Sends are available to be accessed locally or remotely + as authorized. If the ULP overlaps its buffers for different + operations, the data from the RDMA Write or Send may be + overwritten by subsequent RDMA Operations before the ULP receives + and processes the Delivery. + + 11. RDMA Read Response Messages MUST be Delivered to the ULP at the + Remote Peer after they are Delivered to RDMAP by DDP and in the + order that the they were Delivered to RDMAP. + + DDP ordering rules ensure that this will be the same order that + they were submitted at the Local Peer. This means that when the + ULP sees the Delivery of the RDMA Read Response, the memory + buffers targeted by the RDMA Read Response are available to be + accessed locally or remotely as authorized. If the ULP overlaps + + + + +Recio, et al. Standards Track [Page 39] + +RFC 5040 RDMA Protocol Specification October 2007 + + + its buffers for different operations, the data from the RDMA Read + Response may be overwritten by subsequent RDMA Operations before + the ULP receives and processes the Delivery. + + 12. RDMA Read Request Messages, including zero-length RDMA Read + Requests, MUST NOT start processing at the Remote Peer until they + have been Delivered to RDMAP by DDP. + + Note: the ULP is assured that data written can be read back. For + example, if + + a) an RDMA Read Request is issued by the local peer, + b) the Request targets the same ULP Buffer as a preceding Send + or RDMA Write (in the same direction as the RDMA Read + Request), and + c) there are no other sources of update for the ULP Buffer, + + then the Remote Peer will send back the data written by the Send + or RDMA Write. That is, for this example, the ULP Buffer is + Advertised for use on a series of RDMA Messages, is only valid on + the RDMAP Stream for which it is Advertised, and is not locally + updated while the series of RDMAP Messages are performed. For + this example, order rule (12) assures that subsequent local or + remote accesses to the ULP Buffer contain the data written by the + Send or RDMA Write. + + RDMA Read Response Messages MAY be generated at the Remote Peer + after subsequent RDMA Write Messages or Send Messages have been + Placed or Delivered. Therefore, when an application does an RDMA + Read Request followed by an RDMA Write (or Send) to the same + buffer, it may get the data from the later RDMA Write (or Send) + in the RDMA Read Response Message, even though the operations + completed in order at the Local Peer. If this behavior is not + desired, the Local Peer ULP must Fence the later RDMA write (or + Send) by withholding the RDMA Write Message until all outstanding + RDMA Read Responses have been Delivered. + + 13. The RDMAP layer MUST submit RDMA Messages to the DDP layer in the + order the RDMA Operations are submitted to the RDMAP layer by the + ULP. + + 14. A Send or RDMA Write Message MUST NOT be considered Complete at + the Local Peer (Data Source) until it has been successfully + completed at the DDP layer. + + 15. RDMA Operations MUST be Completed at the Local Peer in the order + that they were submitted by the ULP. + + + + +Recio, et al. Standards Track [Page 40] + +RFC 5040 RDMA Protocol Specification October 2007 + + + 16. At the Data Sink, an incoming Send Message MUST be Delivered to + the ULP only after the DDP Message has been Delivered to the + RDMAP layer by the DDP layer. + + 17. RDMA Read Response Message processing at the Remote Peer (reading + the specified Tagged Buffer) MUST be started only after the RDMA + Read Request Message has been Delivered by the DDP layer (thus, + all previous RDMA Messages have been properly submitted for + ordered Placement). + + 18. Send Messages MAY be Completed at the Remote Peer (Data Sink) + before prior incoming RDMA Read Request Messages have completed + their response processing. + + 19. An RDMA Read operation MUST NOT be Completed at the Local Peer + until the DDP layer Delivers the associated incoming RDMA Read + Response Message. + + 20. If more than one outstanding RDMA Read Request Messages are + supported by both peers, the RDMA Read Response Messages MUST be + submitted to the DDP layer on the Remote Peer in the order the + RDMA Read Request Messages were Delivered by DDP, but the actual + read of the buffer contents MAY take place in any order at the + Remote Peer. + + This simplifies Local Peer Completion processing for RDMA Reads + in that a Delivered RDMA Read Response MUST be sufficient to + Complete the RDMA Read operation. + +6. RDMAP Stream Management + + RDMAP Stream management consists of RDMAP Stream Initialization and + RDMAP Stream Termination. + +6.1. Stream Initialization + + RDMAP Stream initialization occurs after the LLP Stream has been + created (e.g., for DDP/MPA over TCP, the first TCP Segment after the + SYN, SYN/ACK exchange). The ULP is responsible for transitioning the + LLP Stream into RDMA-enabled mode. The switch to RDMA mode typically + occurs sometime after LLP Stream setup. Once in RDMA enabled mode, + an implementation MUST send only RDMA Messages across the transport + Stream until the RDMAP Stream is torn down. + + For each direction of an RDMAP Stream: + + * For a given RDMAP Stream, the number of outstanding RDMA Read + Requests is limited per RDMAP Stream direction. + + + +Recio, et al. Standards Track [Page 41] + +RFC 5040 RDMA Protocol Specification October 2007 + + + * It is the ULP's responsibility to set the maximum number of + outstanding, inbound RDMA Read Requests per RDMAP Stream + direction. + + * The RDMAP layer MUST provide the maximum number of outstanding, + inbound RDMA Read Requests per RDMAP Stream direction that were + negotiated between the ULP and the Local Peer's RDMAP layer. The + negotiation mechanism is outside the scope of this specification. + + * It is the ULP's responsibility to set the maximum number of + outstanding, outbound RDMA Read Requests per RDMAP Stream + direction. + + * The RDMAP layer MUST provide the maximum number of outstanding, + outbound RDMA Read Requests for the RDMAP Stream direction that + were negotiated between the ULP and the Local Peer's RDMAP layer. + The negotiation mechanism is outside the scope of this + specification. + + * The Local Peer's ULP is responsible for negotiating with the + Remote Peer's ULP the maximum number of outstanding RDMA Read + Requests for the RDMAP Stream direction. It is recommended that + the ULP set the maximum number of outstanding, inbound RDMA Read + Requests equal to the maximum number of outstanding, outbound RDMA + Read Requests for a given RDMAP Stream direction. + + * For outbound RDMA Read Requests, the RDMAP layer MUST NOT exceed + the maximum number of outstanding, outbound RDMA Read Requests + that were negotiated between the ULP and the Local Peer's RDMAP + layer. + + * For inbound RDMA Read Requests, the RDMAP layer MUST NOT exceed + the maximum number of outstanding, inbound RDMA Read Requests that + were negotiated between the ULP and the Local Peer's RDMAP layer. + +6.2. Stream Teardown + + There are three methods for terminating an RDMAP Stream: ULP Graceful + Termination, RDMAP Abortive Termination, and LLP Abortive + Termination. + + The ULP is responsible for performing ULP Graceful Termination. + After a ULP Graceful Termination, either side of the Stream can + initiate LLP Graceful Termination, using the graceful termination + mechanism provided by the LLP. + + + + + + +Recio, et al. Standards Track [Page 42] + +RFC 5040 RDMA Protocol Specification October 2007 + + + RDMAP Abortive Termination allows the RDMAP to issue a Terminate + Message describing the reason the RDMAP Stream was terminated. The + next section (6.2.1, "RDMAP Abortive Termination") describes the + RDMAP Abortive Termination in detail. + + LLP Abortive Termination results due to an LLP error and causes the + RDMAP Stream to be torn down midstream, without an RDMAP Terminate + Message. While this last method is highly undesirable, it is + possible, and the ULP should take this into consideration. + +6.2.1. RDMAP Abortive Termination + + RDMAP defines a Terminate operation that SHOULD be invoked when + either an RDMAP error is encountered or an LLP error is surfaced to + the RDMAP layer by the LLP. + + It is not always possible to send the Terminate Message. For + example, certain LLP errors may occur that cause the LLP Stream to be + torn down a) before RDMAP is aware of the error, b) before RDMAP is + able to send the Terminate Message, or c) after RDMAP has posted the + Terminate Message to the LLP, but it has not yet been transmitted by + the LLP. + + Note that an RDMAP Abortive Termination may entail loss of data. In + general, when a Terminate Message is received, it is impossible to + tell for sure what unacknowledged RDMA Messages were Completed + successfully at the Remote Peer. Thus, the state of all outstanding + RDMA Messages is indeterminate, and the Messages SHOULD be considered + Completed in error. + + When a peer sends or receives a Terminate Message, it MAY immediately + tear down the LLP Stream. The peer SHOULD perform a graceful LLP + teardown to ensure the Terminate Message is successfully Delivered. + + See Section 4.8, "Terminate Header", for a description of the + Terminate Message and its contents. See Section 5.4, "Terminate + Message", for a description of the Terminate Message semantics. + +7. RDMAP Error Management + + The RDMAP protocol does not have RDMAP- or DDP-layer error recovery + operations built in. If everything is working, the LLP guarantees + will ensure that the Messages are arriving at the destination. + + If errors are detected at the RDMAP or DDP layer, then the RDMAP, + DDP, and LLP Streams are Abortively Terminated (see Section 4.8, + "Terminate Header"). + + + + +Recio, et al. Standards Track [Page 43] + +RFC 5040 RDMA Protocol Specification October 2007 + + + In general, poor implementations or improper ULP programming cause + the errors detected at the RDMAP and DDP layers. In these cases, + returning a diagnostic termination error Message and closing the + RDMAP Stream is far simpler than attempting to maintain the RDMAP + Stream, particularly when the cause of the error is not known. + + If an LLP does not support teardown of a Stream independent of other + Streams, and an RDMAP error results in the Termination of a specific + Stream, then the LLP MUST label the Stream as an erroneous Stream and + MUST NOT allow any further data transfer on that Stream after RDMAP + requests the Stream to be torn down. + + For a specific LLP connection, when all Streams are either gracefully + torn down or are labeled as erroneous Streams, the LLP connection + MUST be torn down. + + Since errors are detected at the Remote Peer (possibly long) after + RDMA Messages are passed to the DDP and the LLP at the Local Peer and + after the RDMA Operations conveyed by the Messages are Completed, the + sender cannot easily determine which of its Messages have been + received. (RDMA Reads are an exception to this rule.) + + For a list of errors returned to the Remote Peer as a result of an + Abortive Termination, see Section 4.8, "Terminate Header". + +7.1. RDMAP Error Surfacing + + If an error occurs at the Local Peer, the RDMAP layer MUST attempt to + inform the local ULP that the error has occurred. + + The Local Peer MUST send a Terminate Message for each of the + following cases: + + 1. For errors detected while creating RDMA Write, Send, Send with + Invalidate, Send with Solicited Event, Send with Solicited Event + and Invalidate, or RDMA Read Requests, or other reasons not + directly associated with an incoming Message, the Terminate + Message and Error code are sent instead of the request. In this + case, the Error Type and Error Code fields are included in the + Terminate Message, but the Terminated DDP Header and Terminated + RDMA Header fields are set to zero. + + 2. For errors detected on an incoming RDMA Write, Send, Send with + Invalidate, Send with Solicited Event, Send with Solicited Event + and Invalidate, or Read Response Message (after the Message has + been Delivered by DDP), the Terminate Message is sent at the + earliest possible opportunity, preferably in the next outgoing + RDMA Message. In this case, the Error Type, Error Code, ULP PDU + + + +Recio, et al. Standards Track [Page 44] + +RFC 5040 RDMA Protocol Specification October 2007 + + + Length, and Terminated DDP Header fields are included in the + Terminate Message, but the Terminated RDMA Header field is set to + zero. + + 3. For errors detected on an incoming RDMA Read Request Message + (after the Message has been Delivered by DDP), the Terminate + Message is sent at the earliest possible opportunity, preferably + in the next outgoing RDMA Message. In this case, the Error Type, + Error Code, ULP PDU Length, Terminated DDP Header, and Terminated + RDMA Header fields are included in the Terminate Message. + + 4. If more than one error is detected on incoming RDMA Messages, + before the Terminate Message can be sent, then the first RDMA + Message (and its associated DDP Segment) that experienced an + error MUST be captured by the Terminate Message, in accordance + with rules 2 and 3 above. + +7.2. Errors Detected at the Remote Peer on Incoming RDMA Messages + + On incoming RDMA Writes, RDMA Read Response, Sends, Send with + Invalidate, Send with Solicited Event, Send with Solicited Event and + Invalidate, and Terminate Messages, the following must be validated: + + 1. The DDP layer MUST validate all DDP Segment fields. + + 2. The RDMA OpCode MUST be valid. + + 3. The RDMA Version MUST be valid. + + Additionally, on incoming Send with Invalidate and Send with + Solicited Event and Invalidate Messages, the following must also + be validated: + + 4. The Invalidate STag MUST be valid. + + 5. The STag MUST be associated to this RDMAP Stream. + + On incoming RDMA Request Messages, the following must be validated: + + 1. The DDP layer MUST validate all Untagged DDP Segment fields. + + 2. The RDMA OpCode MUST be valid. + + 3. The RDMA Version MUST be valid. + + 4. For non-zero length RDMA Read Request Messages: + + a. The Data Source STag MUST be valid. + + + +Recio, et al. Standards Track [Page 45] + +RFC 5040 RDMA Protocol Specification October 2007 + + + b. The Data Source STag MUST be associated to this RDMAP Stream. + + c. The Data Source Tagged Offset MUST fall in the range of legal + offsets associated with the Data Source STag. + + d. The sum of the Data Source Tagged Offset and the RDMA Read + Message Size MUST fall in the range of legal offsets + associated with the Data Source STag. + + e. The sum of the Data Source Tagged Offset and the RDMA Read + Message Size MUST NOT cause the Data Source Tagged Offset to + wrap. + +8. Security Considerations + + This section references the resources that discuss protocol- specific + security considerations and implications of using RDMAP with existing + security services. A detailed analysis of the security issues around + implementation and use of the RDMAP can be found in [RDMASEC]. + + [RDMASEC] introduces the RDMA reference model and discusses how the + resources of this model are vulnerable to attacks and the types of + attack these vulnerabilities are subject to. It also details the + levels of Trust available in this peer-to-peer model and how this + defines the nature of resource sharing. + + The IPsec requirements for RDDP are based on the version of IPsec + specified in RFC 2401 [RFC2401] and related RFCs, as profiled by RFC + 3723 [RFC3723], despite the existence of a newer version of IPsec + specified in RFC 4301 [RFC4301] and related RFCs [RFC4303], + [RFC4306], [RFC4835]. One of the important early applications of the + RDDP protocols is their use with iSCSI [iSER]; RDDP's IPsec + requirements follow those of IPsec in order to facilitate that usage + by allowing a common profile of IPsec to be used with iSCSI and the + RDDP protocols. In the future, RFC 3723 may be updated to the newer + version of IPsec, and the IPsec security requirements of any such + update should apply uniformly to iSCSI and the RDDP protocols. + +8.1. Summary of RDMAP-Specific Security Requirements + + [RDMASEC] defines the security requirements for the implementation of + the components of the RDMA reference model, namely the RDMA enabled + NIC (RNIC) and the Privileged Resource Manager. An RDMAP + implementation conforming to this specification MUST conform to these + requirements. + + + + + + +Recio, et al. Standards Track [Page 46] + +RFC 5040 RDMA Protocol Specification October 2007 + + +8.1.1. RDMAP (RNIC) Requirements + + RDMAP provides several countermeasures for all types of attacks as + introduced in [RDMASEC]. In the following, this specification lists + all security requirements that MUST be implemented by the RNIC. A + more detailed discussion of RNIC security requirements can be found + in Section 5 of [RDMASEC]. + + 1. An RNIC MUST ensure that a specific Stream in a specific + Protection Domain cannot access an STag in a different Protection + Domain. + + 2. An RNIC MUST ensure that if an STag is limited in scope to a + single Stream, no other Stream can use the STag. + + 3. An RNIC MUST ensure that a Remote Peer is not able to access + memory outside of the buffer specified when the STag was enabled + for remote access. + + 4. An RNIC MUST provide a mechanism for the ULP to establish and + revoke the association of a ULP Buffer to an STag and TO range. + + 5. An RNIC MUST provide a mechanism for the ULP to establish and + revoke read, write, or read and write access to the ULP Buffer + referenced by an STag. + + 6. An RNIC MUST ensure that the network interface can no longer + modify an Advertised Buffer after the ULP revokes remote access + rights for an STag. + + 7. An RNIC MUST ensure that a Remote Peer is not able to invalidate + an STag enabled for remote access, if the STag is shared on + multiple streams. + + 8. An RNIC MUST choose the value of STags in a way difficult to + predict. It is RECOMMENDED to sparsely populate them over the + full available range. + + 9. An RNIC MUST NOT enable sharing a Completion Queue (CQ) across + ULPs that do not share partial mutual trust. + + 10. An RNIC MUST ensure that if a CQ overflows, any Streams that do + not use the CQ MUST remain unaffected. + + 11. An RNIC implementation SHOULD provide a mechanism to cap the + number of outstanding RDMA Read Requests. + + + + + +Recio, et al. Standards Track [Page 47] + +RFC 5040 RDMA Protocol Specification October 2007 + + + 12. An RNIC MUST NOT enable firmware to be loaded on the RNIC + directly from an untrusted Local Peer or Remote Peer, unless the + Peer is properly authenticated*, and the update is done via a + secure protocol, such as IPsec. + + * by a mechanism outside the scope of this specification. The + mechanism presumably entails authenticating that the remote ULP + has the right to perform the update. + +8.1.2. Privileged Resource Manager Requirements + + With RDMAP, all reservations of local resources are initiated from + local ULPs. To protect from local attacks including unfair resource + distribution and gaining unauthorized access to RNIC resources, a + Privileged Resource Manager (PRM) must be implemented, which manages + all local resource allocation. Note that the PRM must not be + provided as an independent component, and its functionality can also + be implemented as part of the privileged ULP or as part of the RNIC + itself. + + A PRM implementation must meet the following security requirements (a + more detailed discussion of PRM security requirements can be found in + Section 5 of [RDMASEC]): + + 1. All Non-Privileged ULP interactions with the RNIC Engine that + could affect other ULPs MUST be done using the Resource Manager + as a proxy. + + 2. All ULP resource allocation requests for scarce resources MUST + also be done using a Privileged Resource Manager. + + 3. The Privileged Resource Manager MUST NOT assume that different + ULPs share Partial Mutual Trust unless there is a mechanism to + ensure that the ULPs do indeed share partial mutual trust. + + 4. If Non-Privileged ULPs are supported, the Privileged Resource + Manager MUST verify that the Non-Privileged ULP has the right to + access a specific Data Buffer before allowing an STag for which + the ULP has access rights to be associated with a specific Data + Buffer. + + 5. The Privileged Resource Manager MUST control the allocation of CQ + entries. + + 6. The Privileged Resource Manager SHOULD prevent a Local Peer from + allocating more than its fair share of resources. + + + + + +Recio, et al. Standards Track [Page 48] + +RFC 5040 RDMA Protocol Specification October 2007 + + + 7. RDMA Read Request Queue resource consumption MUST be controlled + by the Privileged Resource Manager such that RDMAP/DDP Streams + that do not share Partial Mutual Trust do not share RDMA Read + Request Queue resources. + + 8. If an RNIC provides the ability to share receive buffers across + multiple Streams, the combination of the RNIC and the Privileged + Resource Manager MUST be able to detect if the Remote Peer is + attempting to consume more than its fair share of resources so + that the Local Peer can apply countermeasures to detect and + prevent the attack. + +8.2. Security Services for RDMAP + + RDMAP is using IP-based network services to control, read, and write + data buffers over the network. Therefore, all exchanged control and + data packets are vulnerable to spoofing, tampering, and information + disclosure attacks. + + RDMAP Streams that are subject to impersonation attacks or Stream + hijacking attacks can be authenticated, have their integrity + protected, and be protected from replay attacks. Furthermore, + confidentiality protection can be used to protect from eavesdropping. + +8.2.1. Available Security Services + + The IPsec protocol suite [RFC2401] defines strong countermeasures to + protect an IP stream from those attacks. Several levels of + protection can guarantee session confidentiality, per-packet source + authentication, per-packet integrity, and correct packet sequencing. + + RDMAP security may also profit from SSL or TLS security services + provided for TCP-based ULPs [RFC4346]. Used underneath RDMAP, these + security services also provide for stream authentication, data + integrity, and confidentiality. As discussed in [RDMASEC], + limitations on the maximum packet length to be carried over the + network and potentially inefficient out-of-order packet processing at + the data sink make SSL and TLS less appropriate for RDMAP than IPsec. + + If SSL is layered on top of RDMAP, SSL does not protect the RDMAP + headers. Thus, a man-in-the-middle attack can still occur by + modifying the RDMAP header to incorrectly place the data into the + wrong buffer, thus effectively corrupting the data stream. + + By remaining independent of ULP and LLP security protocols, RDMAP + will benefit from continuing improvements at those layers. Users are + provided flexibility to adapt to their specific security requirements + and the ability to adapt to future security challenges. Given this, + + + +Recio, et al. Standards Track [Page 49] + +RFC 5040 RDMA Protocol Specification October 2007 + + + the vulnerabilities of RDMAP to active third-party interference are + no greater than any other protocol running over an LLP such as TCP or + SCTP. + +8.2.2. Requirements for IPsec Services for RDMAP + + Because IPsec is designed to secure arbitrary IP packet streams, + including streams where packets are lost, RDMAP can run on top of + IPsec without any change. IPsec packets are processed (e.g., + integrity checked and possibly decrypted) in the order they are + received, and an RDMAP Data Sink will process the decrypted RDMA + Messages contained in these packets in the same manner as RDMA + Messages contained in unsecured IP packets. + + The IP Storage working group has defined the normative IPsec + requirements for IP Storage [RFC3723]. Portions of this + specification are applicable to the RDMAP. In particular, a + compliant implementation of IPsec services for RDMAP MUST meet the + requirements as outlined in Section 2.3 of [RFC3723]. Without + replicating the detailed discussion in [RFC3723], this includes the + following requirements: + + 1. The implementation MUST support IPsec ESP [RFC2406], as well as + the replay protection mechanisms of IPsec. When ESP is utilized, + per-packet data origin authentication, integrity, and replay + protection MUST be used. + + 2. It MUST support ESP in tunnel mode and MAY implement ESP in + transport mode. + + 3. It MUST support IKE [RFC2409] for peer authentication, + negotiation of security associations, and key management, using + the IPsec DOI [RFC2407]. + + 4. It MUST NOT interpret the receipt of a IKE Phase 2 delete message + as a reason for tearing down the RDMAP stream. Since IPsec + acceleration hardware may only be able to handle a limited number + of active IKE Phase 2 SAs, idle SAs may be dynamically brought + down, and a new SA be brought up again, if activity resumes. + + 5. It MUST support peer authentication using a pre-shared key, and + MAY support certificate-based peer authentication using digital + signatures. Peer authentication using the public key encryption + methods [RFC2409] SHOULD NOT be used. + + + + + + + +Recio, et al. Standards Track [Page 50] + +RFC 5040 RDMA Protocol Specification October 2007 + + + 6. It MUST support IKE Main Mode and SHOULD support Aggressive Mode. + IKE Main Mode with pre-shared key authentication SHOULD NOT be + used when either of the peers uses a dynamically assigned IP + address. + + 7. When digital signatures are used to achieve authentication, + either IKE Main Mode or IKE Aggressive Mode MAY be used. In + these cases, an IKE negotiator SHOULD use IKE Certificate Request + Payload(s) to specify the certificate authority (or authorities) + that are trusted in accordance with its local policy. IKE + negotiators SHOULD check the pertinent Certificate Revocation + List (CRL) before accepting a PKI certificate for use in IKE's + authentication procedures. + + 8. Access to locally stored secret information (pre-shared or + private key for digital signing) must be suitably restricted, + since compromise of the secret information nullifies the security + properties of the IKE/IPsec protocols. + + 9. It MUST follow the guidelines of Section 2.3.4 of [RFC3723] on + the setting of IKE parameters to achieve a high level of + interoperability without requiring extensive configuration. + + Furthermore, implementation and deployment of the IPsec services for + RDDP should follow the Security Considerations outlined in Section 5 + of [RFC3723]. + +9. IANA Considerations + + This document requests no direct action from IANA. The following + consideration is listed here as commentary. + + If RDMAP was enabled a priori for a ULP by connecting to a well-known + port, this well-known port would be registered for the RDMAP with + IANA. The registration of the well-known port will be the + responsibility of the ULP specification. + + + + + + + + + + + + + + + +Recio, et al. Standards Track [Page 51] + +RFC 5040 RDMA Protocol Specification October 2007 + + +10. References + +10.1. Normative References + + [DDP] Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct + Data Placement over Reliable Transports", RFC 5041, October + 2007. + + [iSER] Ko, M., Chadalapaka, M., Hufferd, J., Elzur, U., Shah, H., + and P. Thaler, "Internet Small Computer System Interface + (iSCSI) Extensions for Remote Direct Memory Access (RDMA)" + RFC 5046, October 2007. + + [MPA] Culley, P., Elzur, U., Recio, R., Bailey, S., and J. + Carrier, "Marker PDU Aligned Framing for TCP + Specification", RFC 5044, October 2007. + + [RDMASEC] Pinkerton, J. and E. Deleganes, "Direct Data Placement + Protocol (DDP) / Remote Direct Memory Access Protocol + (RDMAP) Security", RFC 5042, October 2007. + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + + [RFC2406] Kent, S. and R. Atkinson, "IP Encapsulating Security + Payload (ESP)", RFC 2406, November 1998. + + [RFC2407] Piper, D., "The Internet IP Security Domain of + Interpretation of ISAKMP", RFC 2407, November 1998. + + [RFC2409] Harkins, D. and D. Carrel, "The Internet Key Exchange + (IKE)", RFC 2409, November 1998. + + [RFC3723] Aboba, B., Tseng, J., Walker, J., Rangan, V., and F. + Travostino, "Securing Block Storage Protocols over IP", RFC + 3723, April 2004. + + [RFC2401] Kent, S. and R. Atkinson, "Security Architecture for the + Internet Protocol", RFC 2401, November 1998. + + [SCTP] Stewart, R., Ed., "Stream Control Transmission Protocol", + RFC 4960, September 2007. + + [TCP] Postel, J., "Transmission Control Protocol", STD 7, RFC + 793, September 1981. + + + + + + +Recio, et al. Standards Track [Page 52] + +RFC 5040 RDMA Protocol Specification October 2007 + + +10.2. Informative References + + [RFC4301] Kent, S. and K. Seo, "Security Architecture for the + Internet Protocol", RFC 4301, December 2005. + + [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", RFC + 4303, December 2005. + + [RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", RFC + 4306, December 2005. + + [RFC4346] Dierks, T. and E. Rescorla, "The TLS Protocol Version 1.1", + RFC 4346, April 2006. + + [RFC4835] Manral, V., "Cryptographic Algorithm Implementation + Requirements for Encapsulating Security Payload (ESP) and + Authentication Header (AH)", RFC 4835, April 2007. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Recio, et al. Standards Track [Page 53] + +RFC 5040 RDMA Protocol Specification October 2007 + + +Appendix A. DDP Segment Formats for RDMA Messages + + This appendix is for information only and is NOT part of the + standard. It simply depicts the DDP Segment format for the various + RDMA Messages. + +A.1. DDP Segment for RDMA Write + + The following figure depicts an RDMA Write, DDP Segment: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP Control | RDMA Control | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Data Sink STag | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Data Sink Tagged Offset | + + + + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | RDMA Write ULP Payload | + // // + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 11: RDMA Write, DDP Segment Format + + + + + + + + + + + + + + + + + + + + + + + + +Recio, et al. Standards Track [Page 54] + +RFC 5040 RDMA Protocol Specification October 2007 + + +A.2. DDP Segment for RDMA Read Request + + The following figure depicts an RDMA Read Request, DDP Segment: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP Control | RDMA Control | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Reserved (Not Used) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP (RDMA Read Request) Queue Number | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP (RDMA Read Request) Message Sequence Number | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP (RDMA Read Request) Message Offset | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Data Sink STag (SinkSTag) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + + Data Sink Tagged Offset (SinkTO) + + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | RDMA Read Message Size (RDMARDSZ) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Data Source STag (SrcSTag) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + + Data Source Tagged Offset (SrcTO) + + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 12: RDMA Read Request, DDP Segment format + + + + + + + + + + + + + + + + + + +Recio, et al. Standards Track [Page 55] + +RFC 5040 RDMA Protocol Specification October 2007 + + +A.3. DDP Segment for RDMA Read Response + + The following figure depicts an RDMA Read Response, DDP Segment: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP Control | RDMA Control | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Data Sink STag | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Data Sink Tagged Offset | + + + + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | RDMA Read Response ULP Payload | + // // + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 13: RDMA Read Response, DDP Segment Format + +A.4. DDP Segment for Send and Send with Solicited Event + + The following figure depicts a Send and Send with Solicited + Request, DDP Segment: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP Control | RDMA Control | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Reserved (Not Used) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | (Send) Queue Number | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | (Send) Message Sequence Number | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | (Send) Message Offset | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Send ULP Payload | + // // + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 14: Send and Send with Solicited Event, DDP Segment Format + + + + + +Recio, et al. Standards Track [Page 56] + +RFC 5040 RDMA Protocol Specification October 2007 + + +A.5. DDP Segment for Send with Invalidate and Send with SE and + Invalidate + + The following figure depicts a Send with Invalidate and Send with + Solicited and Invalidate Request, DDP Segment: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP Control | RDMA Control | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Invalidate STag | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | (Send) Queue Number | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | (Send) Message Sequence Number | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | (Send) Message Offset | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Send ULP Payload | + // // + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 15: Send with Invalidate and Send with SE and Invalidate, + DDP Segment Format + + + + + + + + + + + + + + + + + + + + + + + + + +Recio, et al. Standards Track [Page 57] + +RFC 5040 RDMA Protocol Specification October 2007 + + +A.6. DDP Segment for Terminate + + The following figure depicts a Terminate, DDP Segment: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP Control | RDMA Control | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Reserved (Not Used) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP (Terminate) Queue Number | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP (Terminate) Message Sequence Number | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP (Terminate) Message Offset | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Terminate Control | Reserved | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DDP Segment Length (if any) | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + | | + + + + | Terminated DDP Header (if any) | + + + + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + // // + | Terminated RDMA Header (if any) | + + + + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 16: Terminate, DDP Segment Format + + + + + + + + + + + + + + + + +Recio, et al. Standards Track [Page 58] + +RFC 5040 RDMA Protocol Specification October 2007 + + +Appendix B. Ordering and Completion Table + + The following table summarizes the ordering relationships that are + defined in Section 5.5, "Ordering and Completions", from the + standpoint of the local peer issuing the two Operations. Note that + in the table that follows, Send includes Send, Send with Invalidate, + Send with Solicited Event, and Send with Solicited Event and + Invalidate. + + ------+-------+----------------+----------------+---------------- + First | Later | Placement | Placement | Ordering + Op | Op | guarantee at | guarantee at | guarantee at + | | Remote Peer | Local Peer | Remote Peer + | | | | + ------+-------+----------------+----------------+---------------- + Send | Send | No placement | Not applicable | Completed in + | | guarantee. If | | order. + | | guarantee is | | + | | necessary, see | | + | | footnote 1. | | + ------+-------+----------------+----------------+---------------- + Send | RDMA | No placement | Not applicable | Not applicable + | Write | guarantee. If | | + | | guarantee is | | + | | necessary, see | | + | | footnote 1. | | + ------+-------+----------------+----------------+---------------- + Send | RDMA | No placement | RDMA Read | RDMA Read + | Read | guarantee | Response | Response + | | between Send | Payload will | Message will + | | Payload and | not be placed | not be + | | RDMA Read | at the local | generated until + | | Request Header | peer until the | Send has been + | | | Send Payload is| Completed + | | | placed at the | + | | | Remote Peer | + ------+-------+----------------+----------------+---------------- + RDMA | Send | No placement | Not applicable | Not applicable + Write | | guarantee. If | | + | | guarantee is | | + | | necessary, see | | + | | footnote 1. | | + + + + + + + + + +Recio, et al. Standards Track [Page 59] + +RFC 5040 RDMA Protocol Specification October 2007 + + + ------+-------+----------------+----------------+---------------- + RDMA | RDMA | No placement | Not applicable | Not applicable + Write | Write | guarantee. If | | + | | guarantee is | | + | | necessary, see | | + | | footnote 1. | | + ------+-------+----------------+----------------+---------------- + RDMA | RDMA | No placement | RDMA Read | Not applicable + Write | Read | guarantee | Response | + | | between RDMA | Payload will | + | | Write Payload | not be placed | + | | and RDMA Read | at the local | + | | Request Header | peer until the | + | | | RDMA Write | + | | | Payload is | + | | | placed at the | + | | | Remote Peer | + ------+-------+----------------+----------------+---------------- + RDMA | Send | No placement | Send Payload | Not applicable + Read | | guarantee | may be placed | + | | between RDMA | at the remote | + | | Read Request | peer before the| + | | Header and Send| RDMA Read | + | | payload | Response is | + | | | generated. | + | | | If guarantee is| + | | | necessary, see | + | | | footnote 2. | + ------+-------+----------------+----------------+---------------- + RDMA | RDMA | No placement | RDMA Write | Not applicable + Read | Write | guarantee | Payload may be | + | | between RDMA | placed at the | + | | Read Request | Remote Peer | + | | Header and RDMA| before the RDMA| + | | Write payload | Read Response | + | | | is generated. | + | | | If guarantee is| + | | | necessary, see | + | | | footnote 2. | + + + + + + + + + + + + +Recio, et al. Standards Track [Page 60] + +RFC 5040 RDMA Protocol Specification October 2007 + + + ------+-------+----------------+----------------+---------------- + RDMA | RDMA | No placement | No placement | Second RDMA + Read | Read | guarantee of | guarantee of | Read Response + | | the two RDMA | the two RDMA | will not be + | | Read Request | Read Response | generated until + | | Headers | Payloads. | first RDMA Read + | | Additionally, | | Response is + | | there is no | | generated. + | | guarantee that | | + | | the Tagged | | + | | Buffers | | + | | referenced in | | + | | the RDMA Read | | + | | will be read in| | + | | order | | + + Figure 17: Operation Ordering + + Footnote 1: If the guarantee is necessary, a ULP may insert an RDMA + Read operation and wait for it to complete to act as a Fence. + + Footnote 2: If the guarantee is necessary, a ULP may wait for the + RDMA Read operation to complete before performing the Send. + +Appendix C. Contributors + + Dwight Barron + Hewlett-Packard Company + 20555 SH 249 + Houston, TX 77070-2698 USA + Phone: 281-514-2769 + EMail: dwight.barron@hp.com + + + Caitlin Bestler + Broadcom Corporation + 16215 Alton Parkway + Irvine, CA 92619-7013 USA + Phone: 949-926-6383 + EMail: caitlinb@broadcom.com + + + John Carrier + Cray, Inc. + 411 First Avenue S, Suite 600 + Seattle, WA 98104-2860 USA + Phone: 206-701-2090 + EMail: carrier@cray.com + + + +Recio, et al. Standards Track [Page 61] + +RFC 5040 RDMA Protocol Specification October 2007 + + + Ted Compton + EMC Corporation + Research Triangle Park, NC 27709 USA + Phone: 919-248-6075 + EMail: compton_ted@emc.com + + + Uri Elzur + Broadcom Corporation + 16215 Alton Parkway + Irvine, California 92619-7013 USA + Phone: +1 (949) 585-6432 + EMail: Uri@Broadcom.com + + + Hari Ghadia + Gen10 Technology, Inc. + 1501 W Shady Grove Road + Grand Prairie, TX 75050 + Phone: (972) 301 3630 + EMail: hghadia@gen10technology.com + + + Howard C. Herbert + Intel Corporation + MS CH7-404 + 5000 West Chandler Blvd. + Chandler, Arizona 85226 + Phone: 480-554-3116 + EMail: howard.c.herbert@intel.com + + + Mike Ko + IBM + 650 Harry Rd. + San Jose, CA 95120 + Phone: (408) 927-2085 + EMail: mako@us.ibm.com + + + Mike Krause + Hewlett-Packard Company + 43LN + 19410 Homestead Road + Cupertino, CA 95014 USA + Phone: 408-447-3191 + EMail: krause@cup.hp.com + + + + +Recio, et al. Standards Track [Page 62] + +RFC 5040 RDMA Protocol Specification October 2007 + + + Dave Minturn + Intel Corporation + MS JF1-210 + 5200 North East Elam Young Parkway + Hillsboro, Oregon 97124 + Phone: 503-712-4106 + EMail: dave.b.minturn@intel.com + + + Mike Penna + Broadcom Corporation + 16215 Alton Parkway + Irvine, California 92619-7013 USA + Phone: +1 (949) 926-7149 + EMail: MPenna@Broadcom.com + + + Jim Pinkerton + Microsoft, Inc. + One Microsoft Way + Redmond, WA 98052 USA + EMail: jpink@microsoft.com + + + Hemal Shah + Broadcom Corporation + 5300 California Avenue + Irvine, CA 92617 USA + Phone: +1 (949) 926-6941 + EMail: hemal@broadcom.com + + + Allyn Romanow + Cisco Systems + 170 W Tasman Drive + San Jose, CA 95134 USA + Phone: +1 408 525 8836 + EMail: allyn@cisco.com + + + Tom Talpey + Network Appliance + 1601 Trapelo Road #16 + Waltham, MA 02451 USA + Phone: +1 (781) 768-5329 + EMail: thomas.talpey@netapp.com + + + + + +Recio, et al. Standards Track [Page 63] + +RFC 5040 RDMA Protocol Specification October 2007 + + + Patricia Thaler + Broadcom Corporation + 16215 Alton Parkway + Irvine, CA 92619-7013 USA + Phone: +1-916-570-2707 + EMail: pthaler@broadcom.com + + + Jim Wendt + Hewlett-Packard Company + 8000 Foothills Boulevard MS 5668 + Roseville, CA 95747-5668 USA + Phone: +1 916 785 5198 + EMail: jim_wendt@hp.com + + + Madeline Vega + IBM + 11400 Burnet Rd. Bld.45-2L-007 + Austin, TX 78758 USA + Phone: 512-838-7739 + EMail: mvega1@us.ibm.com + + + Claudia Salzberg + IBM + 11501 Burnet Rd. Bld.902-5B-014 + Austin, TX 78758 USA + Phone: 512-838-5156 + EMail: salzberg@us.ibm.com + + + + + + + + + + + + + + + + + + + + + +Recio, et al. Standards Track [Page 64] + +RFC 5040 RDMA Protocol Specification October 2007 + + +Authors' Addresses + + Renato J. Recio + IBM Corp. + 11501 Burnett Road + Austin, TX 78758 USA + Phone: 512-838-3685 + EMail: recio@us.ibm.com + + + Bernard Metzler + IBM Research GmbH + Zurich Research Laboratory + Saeumerstrasse 4 + CH-8803 Rueschlikon, Switzerland + Phone: +41 44 724 8605 + EMail: bmt@zurich.ibm.com + + + Paul R. Culley + Hewlett-Packard Company + 20555 SH 249 + Houston, TX 77070-2698 USA + Phone: 281-514-5543 + EMail: paul.culley@hp.com + + + Jeff Hilland + Hewlett-Packard Company + 20555 SH 249 + Houston, TX 77070-2698 USA + Phone: 281-514-9489 + EMail: jeff.hilland@hp.com + + + Dave Garcia + 24100 Hutchinson Rd. + Los Gatos, CA 95033 USA + Phone: +1 (831) 247-4464 + Email: Dave.Garcia@StanfordAlumni.org + + + + + + + + + + + +Recio, et al. Standards Track [Page 65] + +RFC 5040 RDMA Protocol Specification October 2007 + + +Full Copyright Statement + + Copyright (C) The IETF Trust (2007). + + This document is subject to the rights, licenses and restrictions + contained in BCP 78, and except as set forth therein, the authors + retain all their rights. + + This document and the information contained herein are provided on an + "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS + OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND + THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS + OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF + THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED + WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Intellectual Property + + The IETF takes no position regarding the validity or scope of any + Intellectual Property Rights or other rights that might be claimed to + pertain to the implementation or use of the technology described in + this document or the extent to which any license under such rights + might or might not be available; nor does it represent that it has + made any independent effort to identify any such rights. Information + on the procedures with respect to rights in RFC documents can be + found in BCP 78 and BCP 79. + + Copies of IPR disclosures made to the IETF Secretariat and any + assurances of licenses to be made available, or the result of an + attempt made to obtain a general license or permission for the use of + such proprietary rights by implementers or users of this + specification can be obtained from the IETF on-line IPR repository at + http://www.ietf.org/ipr. + + The IETF invites any interested party to bring to its attention any + copyrights, patents or patent applications, or other proprietary + rights that may cover technology that may be required to implement + this standard. Please address the information to the IETF at + ietf-ipr@ietf.org. + + + + + + + + + + + + +Recio, et al. Standards Track [Page 66] + |