diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc5045.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc5045.txt')
-rw-r--r-- | doc/rfc/rfc5045.txt | 1235 |
1 files changed, 1235 insertions, 0 deletions
diff --git a/doc/rfc/rfc5045.txt b/doc/rfc/rfc5045.txt new file mode 100644 index 0000000..0f04887 --- /dev/null +++ b/doc/rfc/rfc5045.txt @@ -0,0 +1,1235 @@ + + + + + + +Network Working Group C. Bestler, Ed. +Request for Comments: 5045 Neterion +Category: Informational L. Coene + Nokia Siemens Networks + October 2007 + + + Applicability of Remote Direct Memory Access Protocol (RDMA) + and Direct Data Placement Protocol (DDP) + +Status of This Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Abstract + + This document describes the applicability of Remote Direct Memory + Access Protocol (RDMAP) and the Direct Data Placement Protocol (DDP). + It compares and contrasts the different transport options over IP + that DDP can use, provides guidance to ULP developers on choosing + between available transports and/or how to be indifferent to the + specific transport layer used, compares use of DDP with direct use of + the supporting transports, and compares DDP over IP transports with + non-IP transports that support RDMA functionality. + + + + + + + + + + + + + + + + + + + + + + + + + +Bestler & Coene Informational [Page 1] + +RFC 5045 RDMA/DDP Applicability October 2007 + + +Table of Contents + + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 + 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 + 3. Direct Placement . . . . . . . . . . . . . . . . . . . . . . . 5 + 3.1. Direct Placement Using Only the LLP . . . . . . . . . . . 5 + 3.2. Fewer Required ULP Interactions . . . . . . . . . . . . . 6 + 4. Tagged Messages . . . . . . . . . . . . . . . . . . . . . . . 6 + 4.1. Order-Independent Reception . . . . . . . . . . . . . . . 7 + 4.2. Reduced ULP Notifications . . . . . . . . . . . . . . . . 7 + 4.3. Simplified ULP Exchanges . . . . . . . . . . . . . . . . . 8 + 4.4. Order-Independent Sending . . . . . . . . . . . . . . . . 9 + 4.5. Untagged Messages and Tagged Buffers as ULP Credits . . . 10 + 5. RDMA Read . . . . . . . . . . . . . . . . . . . . . . . . . . 12 + 6. LLP Comparisons . . . . . . . . . . . . . . . . . . . . . . . 13 + 6.1. Multistreaming Implications . . . . . . . . . . . . . . . 13 + 6.2. Out-of-Order Reception Implications . . . . . . . . . . . 13 + 6.3. Header and Marker Overhead . . . . . . . . . . . . . . . . 13 + 6.4. Middlebox Support . . . . . . . . . . . . . . . . . . . . 14 + 6.5. Processing Overhead . . . . . . . . . . . . . . . . . . . 14 + 6.6. Data Integrity Implications . . . . . . . . . . . . . . . 14 + 6.6.1. MPA/TCP Specifics . . . . . . . . . . . . . . . . . . 15 + 6.6.2. SCTP Specifics . . . . . . . . . . . . . . . . . . . . 15 + 6.7. Non-IP Transports . . . . . . . . . . . . . . . . . . . . 15 + 6.7.1. No RDMA-Layer Ack . . . . . . . . . . . . . . . . . . 16 + 6.8. Other IP Transports . . . . . . . . . . . . . . . . . . . 16 + 6.9. LLP-Independent Session Establishment . . . . . . . . . . 17 + 6.9.1. RDMA-Only Session Establishment . . . . . . . . . . . 17 + 6.9.2. RDMA-Conditional Session Establishment . . . . . . . . 18 + 7. Local Interface Implications . . . . . . . . . . . . . . . . . 18 + 8. Security Considerations . . . . . . . . . . . . . . . . . . . 19 + 8.1. Connection/Association Setup . . . . . . . . . . . . . . . 19 + 8.2. Tagged Buffer Exposure . . . . . . . . . . . . . . . . . . 19 + 8.3. Impact of Encrypted Transports . . . . . . . . . . . . . . 19 + 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 19 + 9.1. Normative References . . . . . . . . . . . . . . . . . . . 19 + 9.2. Informative References . . . . . . . . . . . . . . . . . . 19 + + + + + + + + + + + + + + +Bestler & Coene Informational [Page 2] + +RFC 5045 RDMA/DDP Applicability October 2007 + + +1. Introduction + + Remote Direct Memory Access Protocol (RDMAP) [RFC5040] and Direct + Data Placement (DDP) [RFC5041] work together to provide application- + independent efficient placement of application payload directly into + buffers specified by the Upper Layer Protocol (ULP). + + The DDP protocol is responsible for direct placement of received + payload into ULP-specified buffers. The RDMAP protocol provides + completion notifications to the ULP and support for Data-Sink- + initiated fetch of Advertised Buffers (RDMA Reads). + + DDP and RDMAP are both application-independent protocols that allow + the ULP to perform remote direct data placement. DDP can use + multiple standard IP transports including SCTP and TCP. + + By clarifying the situations where the functionality of these + protocols is applicable, this document can guide implementers and + application and protocol designers in selecting which protocols to + use. + + The applicability of RDMAP/DDP is driven by their unique + capabilities: + + o This document will discuss when common data placement procedures + are of more benefit to applications than application-specific + solutions built on top of direct use of the underlying transport. + + o DDP supports both Untagged and Tagged Buffers. Tagged Buffers + allow the Data Sink ULP to be indifferent to what order (or in + what messages) the Data Source sent the data, or in what order + packets are received. Typically, tagged data can be used for + payload transfer, while untagged is best used for control + messages. However each upper-layer protocol can determine the + optimal use of Tagged and Untagged Messages for itself. This + document will discuss when Data Source flexibility is of benefit + to applications. + + o RDMAP consolidates ULP notifications, thereby minimizing the + number of required ULP interactions. + + o RDMAP defines RDMA Reads, which allow remote access to Advertised + Buffers. This document will review the advantages of using RDMA + Reads as contrasted to alternate solutions. + + A more comprehensive introduction to the RDMAP and DDP protocols + and discussion of their security considerations can be found in + [RFC5042]. + + + +Bestler & Coene Informational [Page 3] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + Some non-IP transports, such as InfiniBand, directly integrate RDMA + features. This document will review the applicability of providing + RDMA services over ubiquitous IP transports instead of over + customized transport protocols. Due to the fact that DDP is defined + cleanly as a layer over existing IP transports, DDP has simpler + ordering rules than some prior RDMA protocols. This may have some + implications for application designers. + + The full capabilities of DDP and RDMAP can only be fully realized by + applications that are designed to exploit them. The coexistence of + RDMAP/DDP-aware local interfaces with traditional socket interfaces + will also be explored. + + Finally, DDP support is defined for at least two IP transports: SCTP + [RFC5043] and TCP [RFC5044]. The rationale for supporting both + transports is reviewed, as well as when each would be the appropriate + selection. + +2. Definitions + + Advertisement - the act of informing a Remote Peer that a local RDMA + Buffer is available to it. A Node makes available an RDMA Buffer + for incoming RDMA Read or RDMA Write access by informing its RDMA/ + DDP peer of the Tagged Buffer identifiers (STag, base address, and + buffer length). This Advertisement of Tagged Buffer information + is not defined by RDMA/DDP and is left to the ULP. A typical + method would be for the Local Peer to embed the Tagged Buffer's + Steering Tag, base address, and length in a Send Message destined + for the Remote Peer. + + Data Sink - The peer receiving a data payload. Note that the Data + Sink can be required to both send and receive RDMA/DDP Messages to + transfer a data payload. + + Data Source - The peer sending a data payload. Note that the Data + Source can be required to both send and receive RDMA/DDP Messages + to transfer a data payload. + + Lower Layer Protocol (LLP) - The transport protocol that provides + services to DDP. This is an IP transport with any required + adaptation layer. Adaptation layers are defined for SCTP and TCP. + + Steering Tag (STag) - An identifier of a Tagged Buffer on a Node, + valid as defined within a protocol specification. + + + + + + + +Bestler & Coene Informational [Page 4] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + Tagged Message - A DDP message that is directed to a ULP-specified + buffer based upon imbedded addressing information. In the + immediate sense, the destination buffer is specified by the + message sender. The message receiver is given no independent + indication that a Tagged Message has been received. + + Untagged Message - A DDP message that is directed to a ULP-specified + buffer based upon a Message Sequence Number being matched with a + receiver-supplied buffer. The destination buffer is specified by + the message receiver. The message receiver is notified by some + mechanism that an Untagged Message has been received. + + Upper Layer Protocol (ULP) - The direct user of RDMAP/DDP services. + In addition to protocols such as iSER [RFC5046] and NFSv4 over + RDMA [NFSDIRECT], the ULP may be embedded in an application or a + middleware layer, as is often the case for the Sockets Direct + Protocol (SDP) and Remote Procedure Call (RPC) protocols. + +3. Direct Placement + + Direct Data Placement optimizes the placement of ULP Payload into the + correct destination buffers, typically eliminating intermediate + copying. Placement is enabled without regard to order of arrival, + order of transmission, or requirement of per-placement interaction + with the ULP. + + RDMAP minimizes the required ULP interactions. This capability is + most valuable for applications that require multiple transport layer + packets for each required ULP interaction. + +3.1. Direct Placement Using Only the LLP + + Direct data placement can be achieved without RDMA. Pre-posting of + receive buffers could allow a non-RDMA network stack to place data + directly to user buffers. + + The degree to which DDP optimizes depends on which transport it is + being compared with, and on the nature of the local interface. + Without RDMAP/DDP, pre-posting buffers require the receiving side to + accurately predict the required buffers and their sizes. This is not + feasible for all ULPs. By contrast, DDP only requires the ULP to + predict the sequence and size of incoming Untagged Messages. + + An application that could predict incoming messages and required + nothing more than direct placement into buffers might be able to do + so with a properly designed local interface to native SCTP or TCP + (without RDMA). This is easier using native SCTP because the + + + + +Bestler & Coene Informational [Page 5] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + application would only have to predict the sequence of messages and + the maximum size of each message, not the exact size. + + The main benefit of DDP for such an application would be that pre- + posting of receive buffers is a mandated local interface capability, + and that predictions can always be made on a per-message basis (not + per byte). + + The Lower Layer Protocol, LLP, can also be used directly if ULP- + specific knowledge is built into the protocol stack to allow "parse + and place" handling of received packets. Such a solution either + requires interaction with the ULP or the protocol stack's knowledge + of ULP-specific syntax rules. + + DDP achieves the benefits of directly placing incoming payload + without requiring tight coupling between the ULP and the protocol + stack. However, "parse and place" capabilities can certainly provide + equivalent services to a limited number of ULPs. + +3.2. Fewer Required ULP Interactions + + While reducing the number of required ULP interactions is in itself + desirable, it is critical for high-speed connections. The burst + packet rate for a high-speed interface could easily exceed the host + system's ability to switch ULP contexts. + + Content access applications are important examples of applications + that require high bandwidth and can transfer a significant amount of + content between required ULP interactions. These applications + include file access protocols (NAS), storage access (SAN), database + access, and other application-specific forms of content access such + as HTTP, XML, and email. + +4. Tagged Messages + + This section covers the major benefits from the use of Tagged + Messages. + + A more critical advantage of DDP is the ability of the Data Source to + use Tagged Buffers. Tagging messages allows the Data Source to + choose the ordering and packetization of its payload deliveries. + With direct data placement based solely upon pre-posted receives, the + packetization and delivery of payload must be agreed by the ULP peers + in advance. + + The Upper Layer Protocol can allocate content between Untagged and/or + Tagged Messages to maximize the potential optimizations. Placing + content within an Untagged Message can deliver the content in the + + + +Bestler & Coene Informational [Page 6] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + same packet that signals completion to the receiver. This can + improve latency. It can even eliminate round trips. But it requires + making larger anonymous buffers to be available. + + Some examples of data that typically belongs in the Untagged Message + would include: + + short fixed-size control data that is inherently part of the + control message. This is especially true when the data is a + required part of the control message. + + relatively short payload that is almost always needed, especially + when its inclusion would eliminate a round-trip to fetch the data. + Examples would include the initial data on a write request and + Advertisements of Tagged Buffers. + + Tagged Messages standardize direct placement of data without per- + packet interaction with the upper layers. Even if there is an upper- + layer protocol encoding of what is being transferred, as is common + with middleware solutions, this information is not understood at the + application-independent layers. The directions on where to place the + incoming data cannot be accessed without switching to the ULP first. + DDP provides a standardized 'packing list', which can be interpreted + without requiring ULP interaction. Indeed, it is designed to be + implementable in hardware. + +4.1. Order-Independent Reception + + Tagged Messages are directed to a buffer based on an included + Steering Tag. Additionally, no notice is provided to the ULP for + each individual Tagged Message's arrival. Together these allow + Tagged Messages received out of order to be processed without + intermediate buffering or additional notifications to the ULP. + +4.2. Reduced ULP Notifications + + RDMAP offers both Tagged and Untagged Messages. No receiving-side + ULP interactions are required for Tagged Messages. By optimally + dividing traffic between Tagged and Untagged Messages, the ULP can + limit the number of events that must be dealt with at the ULP layer. + This typically reduces the number of context switches required and + improves performance. + + RDMAP further reduces required ULP interactions, consolidating + completion notifications of Tagged Messages with the completion + notification of a trailing Untagged Message. For most ULPs, this + radically reduces the number of ULP required interactions even + further. + + + +Bestler & Coene Informational [Page 7] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + While RDMAP consolidation of notices is beneficial to most + applications, it may be detrimental to some applications that benefit + from streamed delivery to enable ULP processing of received data as + promptly as possible. A ULP that uses RDMAP cannot begin processing + any portion of an exchange until it receives notification that the + entire exchange has been placed. An "exchange" here is a set of zero + or more Tagged Messages and a single terminating Untagged Message. + An application that would prefer to begin work on the received + payload as soon as possible, no matter what order it arrived in, + might prefer to work directly with the LLP. RDMAP is optimized for + applications that are more concerned when the entire exchange is + complete. + + An application that benefits from being able to begin processing of + each received packet as quickly as possible may find RDMAP interferes + with that goal. + + Such an application might be able to retain most of the benefits of + RDMAP by using the DDP layer directly. However, in addition to + taking on the responsibilities of the RDMAP layer, the application + would likely have more difficulty finding support for a DDP-only API. + Many hardware implementations may choose to tightly couple RDMAP and + DDP, and might not provide an API directly to DDP services. + + These features minimize the required interactions with the ULP. This + can be extremely beneficial for applications that use multiple + transport layer packets to accomplish what is a single ULP + interaction. + +4.3. Simplified ULP Exchanges + + The notification rules for Tagged Messages allows ULPs to create + multi-message "exchanges" consisting of zero or more Tagged Messages + that represent a single step in the ULP interaction. The receiving + ULP is notified that the Untagged Message has arrived, and implicitly + notified of any associated Tagged Messages. + + If a ULP cannot effectively use Tagged Messages, it would derive + little benefit from use of RDMAP/DDP by comparison to direct use of + SCTP. But, while Tagged Buffers are the justification for RDMAP/DDP, + Untagged Buffers are still necessary. Without Untagged Buffers, the + only method to exchange buffer Advertisements would require out-of- + band communications. Most RDMA-aware ULPs use Untagged Buffers for + requests and responses. Buffer Advertisements are typically done + within these Untagged Messages. + + More importantly, there would be no reliable method for the upper- + layer peers to synchronize. The absence of any guarantees about + + + +Bestler & Coene Informational [Page 8] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + ordering within or between Tagged Messages is fundamental to allowing + the DDP layer to optimize transfer of tagged payload. + + Therefore, no ULP can be defined entirely in terms of Tagged + Messages. Eventually, a notification that confirms delivery must be + generated from the RDMAP/DDP layer. + + Limiting use of Untagged Buffers to requests and responses by moving + all bulk data using tagged transfers can greatly simplify the amount + of prediction that the Data Sink must perform in pre-posting receive + buffers. For example, a typical RDMA-enabled interaction would + consist of the following: + + 1. Client sends transaction request to server as an Untagged + Message. + + 2. This message includes buffer Advertisements for the buffers where + the results are to be placed. + + 3. The server sends multiple Tagged Messages to the Advertised + buffers. + + 4. The server sends transaction reply as an Untagged Message to the + client. + + 5. Client receives single notification, indicating completion of the + interaction. + + With this type of exchange, the pacing and required size of Untagged + Buffers are highly predictable. The variability of response sizes is + absorbed by tagged transfers. + +4.4. Order-Independent Sending + + Use of Tagged Messages is especially applicable when the Data Sink + does not know the actual size, structure, or location of the content + it is requesting (or updating). + + For example, suppose the Data Sink ULP needs to fetch four related + pieces of data into four separate buffers. With SCTP, the Data Sink + ULP could receive four messages into four separate buffers, only + having to predict the maximum size of each. However, it would have + to dictate the order in which the Data Source supplied the separate + pieces. If the Data Source found it advantageous to fetch them in a + different order, it would have to use intermediate buffering to re- + order the pieces into the expected order even though the application + only required that all four be delivered and did not truly have an + ordering requirement. + + + +Bestler & Coene Informational [Page 9] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + Techniques, such as RAID striping and mirroring, represent this same + problem, but one step further. What appears to be a single resource + to the Data Sink is actually stored in separate locations by the Data + Source. Non RDMA protocols would either require the Data Source to + fetch the material in the desired order or force the Data Source to + use its own holding buffers to assemble an image of the destination + buffer. + + While sometimes referred to as a "buffer-to-buffer" solution, RDMA + more fundamentally enables remote buffer access. The ULP is free to + work with larger remote buffers than it has locally. This reduces + buffering requirements and the number of times the data must be + copied in an end-to-end transfer. + + There are numerous reasons why the Data Sink would not know the true + order or location of the requested data. It could be different for + each client, different records selected and/or different sort orders, + as well as RAID striping, file fragmentation, volume fragmentation, + volume mirroring, and server-side dynamic compositing of content + (such as server-side includes for HTTP). + + In all of these cases, the Data Source is free to assemble the + desired data in the Data Sink's buffer in whatever order the + component data becomes available to it. It is not constrained on + ordering. It does not have to assemble an image in its own memory + before creating it in the Data Sink's buffers. + + Note that while DDP enables use of Tagged Messages for bulk transfer, + there are some application scenarios where Untagged Messages would + still be used for bulk transfer. For example, a file server may not + expose its own memory to its clients. A client wishing to write may + Advertise a buffer upon which the server will issue RDMA Reads. + However, when performing a small write, it may be preferable to + include the data in the Untagged Message rather than incurring an + additional round trip with the RDMA Read and its response. + + Generally, the best use of an Untagged Message is to synchronize and + to deliver data that is naturally tied to the same message as the + synchronization. For initial data transfers, this has the additional + benefit of avoiding the need to Advertise specific Tagged Buffers for + indefinite time periods. Instead, anonymous buffers can be used for + initial data reception. Because anonymous buffers do not need to be + tied to specific messages in advance, this can be a major benefit. + +4.5. Untagged Messages and Tagged Buffers as ULP Credits + + The handling of end-to-end buffer credits differs considerably with + DDP than when the ULP directly uses either TCP or SCTP. + + + +Bestler & Coene Informational [Page 10] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + With both TCP and SCTP, buffer credits are based upon the receiver + granting transmit permission based on the total number of bytes. + These credits reflect system buffering resources and/or simple flow + control. They do not represent ULP resources. + + DDP defines no standard flow control, but presumes the existence of a + ULP mechanism. The presumed mechanism is that the Data Sink ULP has + issued credits to the Data Source, allowing the Data Source to send a + specific number of Untagged Messages. + + The ULP peers must ensure that the sender is aware of the maximum + size that can be sent to any specific target buffer. One method of + doing so is to use a standard size for all Untagged Buffers within a + given connection. For example, a ULP may specify an initial Untagged + Buffer size to be used immediately after session establishment, and + then optionally specify mechanisms for negotiating changes. + + Tagged Buffers are ULP resources Advertised directly from ULP to ULP. + A DDP put to a known Tagged Buffer is constrained only by transport + level flow control, not by available system buffering. + + Either Tagged or Untagged Buffers allows bypassing of system buffer + resources. Use of Tagged Buffers additionally allows the Data Source + to choose in what order to exercise the credits. + + To the extent allowed by the ULP, Tagged Buffers are also divisible + resources. The Data Sink can Advertise a single 100 KB buffer, and + then receive notifications from its peer that it had written 50 KB, + 20 KB, and 30 KB to that buffer in three successive transactions. + + ULP management of Tagged Buffer resources, independent of transport + and DDP layer credits, is an additional benefit of RDMA protocols. + Large bulk transfers cannot be blocked by limited general-purpose + buffering capacity. Applications can flow control based upon higher + level abstractions, such as number of outstanding requests, + independent of the amount of data that must be transferred. + + However, use of system buffering, as offered by direct use of the + underlying transports, can be preferable under certain circumstances. + + One example would be when the number of target ULP Buffers is + sufficiently large, and the rate at which any writes arrive is + sufficiently low, that pinning all the target ULP Buffers in memory + would be undesirable. The maximum transfer rate, and hence the + maximum amount of system buffering required, may be more stable and + predictable than the total ULP Buffer exposure. + + + + + +Bestler & Coene Informational [Page 11] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + Another example would be when the Data Sink wishes to receive a + stream of data at a predictable rate, but does not know in advance + what the size of each data packet will be. This is common from + streaming media that has been encoded with a variable bit rate. With + DDP, the Data Sink would either have to use Untagged Buffers large + enough for the largest packet, or Advertise a circular buffer. If, + for security or other reasons, the Data Sink did not want the size of + its buffer to be publicly known, using the underlying SCTP transport + directly may be preferable because of its byte-oriented credits. + +5. RDMA Read + + RDMA Reads are a further service provided by RDMAP. RDMA Reads allow + the Data Sink to fetch exactly the portion of the peer ULP Buffer + required on a "just in time" basis. This can be done without + requiring per-fetch support from the Data Source ULP. + + Storage servers may wish to limit the maximum write buffer allocated + to any single session. The storage server may be a very minimal + layer between the client and the disk storage media, or the server + may merely wish to limit the total resources that would be required + if all clients could push the entire payload they wished written at + their own convenience. + + In either case, there is little benefit in transferring data from the + Data Source far in advance of when it will be written to the + persistent storage media. RDMA Reads allow the Storage Server to + fetch the payload on a "just in time" basis. In this fashion, a + relatively small number of block-sized buffers can be used to execute + a single transaction that specified writing a large file, or a + Storage Server with numerous clients can fetch buffers from the + individual clients in the order that is most convenient to the + server. + + This same capability can be used when the desired portion of the + Advertised Buffer is not known in advance. For example, the + Advertised Buffer could contain performance statistics. The Data + Sink could request the portions of the data it required, without + requiring an interaction with the Data Source ULP. + + This is applicable for many applications that publish semi-volatile + data that does not require transactional validity checking (i.e., + authorized users have read access to the entire set of data). It is + less applicable when there are ULP consistency checks that must be + performed upon the data. Such applications would be better served by + having the client send a request, and having the server use RDMA + Writes to publish the requested data. Neither RDMAP nor DDP provide + mechanisms for bundling multiple disjoint updates into an atomic + + + +Bestler & Coene Informational [Page 12] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + operation. Therefore, use of an Advertised Buffer as a data resource + is subject to the same caveats as any randomly updated data resource, + such as flat files, that do not enforce their own consistency. + +6. LLP Comparisons + + Normally, the choice of underlying IP transport is irrelevant to the + ULP. RDMAP and DDP provides the same services over either. There + may be performance impacts of the choice, however. It is the + responsibility of the ULP to determine which IP transport is best + suited to its needs. + + SCTP provides for preservation of message boundaries. Each DDP + Segment will be delivered within a single SCTP packet. The + equivalent services are only available with TCP through the use of + the MPA (Marker PDU Alignment) adaptation layer. + +6.1. Multistreaming Implications + + SCTP also provides multi-streaming. When the same pair of hosts have + need for multiple DDP streams, this can be a major advantage. A + single SCTP association carries multiple DDP streams, consolidating + connection setup, congestion control, and acknowledgements. + + Completions are controlled by the DDP Source Sequence Number (DDP- + SSN) on a per-stream basis. Therefore, combining multiple DDP + Streams into a single SCTP association cannot result in a dropped + packet carrying data for one stream delaying completions on others. + +6.2. Out-of-Order Reception Implications + + The use of unordered Data Chunks with SCTP guarantees that the DDP + layer will be able to perform placements when IP datagrams are + received out of order. + + Placement of out-of-order DDP Segments carried over MPA/TCP is not + guaranteed, but certainly allowed. The ability of the MPA receiver + to process out-of-order DDP Segments may be impaired when alignment + of TCP segments and MPA FPDUs is lost. Using SCTP, each DDP Segment + is encoded in a single Data Chunk and never spread over multiple IP + datagrams. + +6.3. Header and Marker Overhead + + MPA and TCP headers together are smaller than the headers used by + SCTP and its adaptation layer. However, this advantage can be + reduced by the insertion of MPA markers. The difference in ULP + Payload per IP Datagram is not likely to be a significant factor. + + + +Bestler & Coene Informational [Page 13] + +RFC 5045 RDMA/DDP Applicability October 2007 + + +6.4. Middlebox Support + + Even with the MPA adaptation layer, DDP traffic carried over MPA/TCP + will appear to all network middleboxes as a normal TCP connection. + In many environments, there may be a requirement to use only TCP + connections to satisfy existing network elements and/or to facilitate + monitoring and control of connections. While SCTP is certainly just + as monitorable and controllable as TCP, there is no guarantee that + the network management infrastructure has the required support for + both. + +6.5. Processing Overhead + + A DDP stream delivered via MPA/TCP will require more processing + effort than one delivered over SCTP. However, this extra work may be + justified for many deployments where full SCTP support is unavailable + in the endpoints of the network, or where middleboxes impair the + usability of SCTP. + +6.6. Data Integrity Implications + + Both the SCTP [RFC4960] and MPA/TCP [RFC5044] adaptation provide end- + to-end CRC32c protection against data accidental corruption, or its + equivalent. + + A ULP that requires a greater degree of protection may add its own. + However, DDP and RDMAP headers will only be guaranteed to have the + equivalent of end-to-end CRC32c protection. A ULP that requires data + integrity checking more thorough than an end-to-end CRC32c should + first invalidate all STags that reference a buffer before applying + its own integrity check. + + CRC32c only provides protection against random corruption. To + protect against unauthorized alteration or forging of data packets, + security methods must be applied. The RDMA security document + [RFC5042] specifies usage of RFC 2406 [RFC2406] for both adaptation + layers. As stated in [RFC5042], note that the IPsec requirements for + RDDP are based on the version of IPsec specified in RFC 2401 + [RFC2401] and related RFCs, as profiled by RFC 3723 [RFC3723], + despite the existence of a newer version of IPsec specified in RFC + 4301 [RFC4301] and related RFCs. + + + + + + + + + + +Bestler & Coene Informational [Page 14] + +RFC 5045 RDMA/DDP Applicability October 2007 + + +6.6.1. MPA/TCP Specifics + + It is mandatory for MPA/TCP implementations to implement CRC32c, but + it is not mandatory to use the CRC32c during an RDMA connection. The + activating or deactivating of the CRC in MPA/TCP is an administrative + configuration operation at the local and remote end. The + administration of the CRC (ON/OFF) is invisible to the ULP. + + Applications should assume that disabling CRC32c will only be used + when the end-to-end protection is at least as effective as a + transport layer CRC32c. Applications should not use additional + integrity checks based solely on the possibility that CRC32c could be + disabled without equivalent integrity checks at a lower level. + + CRC32c must not be disabled unless equivalent or better end-to-end + integrity protection is provided. + + If the CRC is active/used for one direction/end, then the use of the + CRC is mandatory in both directions/ends. + + If both ends have been configured not to use the CRC, then this is + allowed as long as an equivalent protection (comparable to or better + than CRC) from undetected errors on the connection is provided. + +6.6.2. SCTP Specifics + + SCTP provides CRC32c protection automatically. The adaptation to + SCTP provides for no option to suppress SCTP CRC32c protection. + +6.7. Non-IP Transports + + DDP is defined to operate over ubiquitous IP transports such as SCTP + and TCP. This enables a new DDP-enabled node to be added anywhere to + an IP network. No DDP-specific support from middleboxes is required. + + There are non-IP transport fabric offering RDMA capabilities. + Because these capabilities are integrated with the transport protocol + they have some technical advantages when compared to RDMA over IP. + For example, fencing of RDMA Operations can be based upon transport + level acks. Because DDP is cleanly layered over an IP transport, any + explicit RDMA layer ack must be separate from the transport layer + ack. + + There may be deployments where the benefits of RDMA/transport + integration outweigh the benefits of being on an IP network. + + + + + + +Bestler & Coene Informational [Page 15] + +RFC 5045 RDMA/DDP Applicability October 2007 + + +6.7.1. No RDMA-Layer Ack + + DDP does not provide for its own acknowledgements. The only form of + ack provided at the RDMAP layer is an RDMA Read Response. DDP and + RDMAP rely almost entirely upon other layers for flow control and + pacing. The LLP is relied upon to guarantee delivery and avoid + network congestion, and ULP-level acking is relied upon for ULP + pacing and to avoid ULP Buffer overruns. + + Previous RDMA protocols, such as InfiniBand, have been able to use + their integration with the transport layer to provide stronger + ordering guarantees. It is important that application designers that + require such guarantees provide them through ULP interaction. + + Specifically: + + There is no ability for a local interface to "fence" outbound + messages to guarantee that prior Tagged Messages have been placed + prior to sending a Tagged Message. The only guarantees available + from the other side would be an RDMA Read Response (coming from + the RDMAP layer) or a response from the ULP layer. Remember that + the normal ordering rules only guarantee when the Data Sink ULP + will be notified of Untagged Messages; it does not control when + data is placed into receive buffers. + + Re-use of Tagged Buffers must be done with extreme care. The fact + that an Untagged Message indicates that all prior Tagged Messages + have been placed does not guarantee that no later Tagged Message + has. The best strategy is to change only the state of any given + Advertised Buffers with Untagged Messages. + + As covered elsewhere in this document, flow control of Untagged + Messages is the responsibility of the ULP. + +6.8. Other IP Transports + + Both TCP and SCTP provide DDP with reliable transport with TCP- + friendly rate control. Currently, DDP is defined to work over + reliable transports and implicitly relies upon some form of rate + control. + + DDP is fully compatible with a non-reliable protocol. Out-of-order + placement is obviously not dependent on whether the other DDP + Segments ever actually arrive. + + However, RDMAP requires the LLP to provide reliable service. An + alternate completion handling protocol would be required if DDP were + to be deployed over an unreliable IP transport. + + + +Bestler & Coene Informational [Page 16] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + As noted in the prior section on Tagged Buffers as ULP credits, + neither RDMAP nor DDP provides any flow control for Tagged Messages. + If no transport layer flow control is provided, an RDMAP/DDP + application would be limited only by the link layer rate, almost + inevitably resulting in severe network congestion. + + RDMAP encourages applications to be ignorant of the underlying + transport path MTU. The ULP is only notified when all messages + ending in a single Untagged Message have completed. The ULP is not + aware of the granularity or ordering of the underlying message. This + approach assumes that the ULP is only interested in the complete set + of messages, and has no use for a subset of them. + +6.9. LLP-Independent Session Establishment + + For an RDMAP/DDP application, the transport services provided by a + pair of SCTP streams and by a TCP connection both provide the same + service (reliable delivery of DDP Segments between two connected + RDMAP/DDP endpoints). + +6.9.1. RDMA-Only Session Establishment + + It is also possible to allow for transport-neutral establishment of + RDMAP/DDP sessions between endpoints. Combined, these two features + would allow most applications to be unconcerned as to which LLP was + actually in use. + + Specifically, the procedures for DDP Stream Session establishment + discussed in section 3 of the SCTP mapping, and section 13.3 of the + MPA/TCP mapping, both allow for the exchange of ULP-specific data + ("Private Data") before enabling the exchange of DDP Segments. This + delay can allow for proper selection and/or configuration of the + endpoints based upon the exchanged data. For example, each DDP + Stream Session associated with a single client session might be + assigned to the same DDP Protection Domain. + + To be transport neutral, the applications should exchange Private + Data as part of session establishment messages to determine how the + RDMA endpoints are to be configured. One side must be the Initiator, + and the other, the Responder. + + With SCTP, a pair of SCTP streams can be used for successive sessions + while the SCTP association remains open. With MPA/TCP, each + connection can be used for, at most, one session. However, the same + source/destination pair of ports can be re-used for a subsequent TCP + connection, as allowed by TCP. + + + + + +Bestler & Coene Informational [Page 17] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + Both SCTP and MPA limit the private data size to a maximum of 512 + bytes. + + MPA/TCP requires the end of the TCP connection that initiated the + conversion to MPA mode to send the first DDP Segment. SCTP does not + have this requirement. ULPs that wish to be transport neutral should + require the initiating end to send the first message. A zero-length + RDMA Write can be used for this purpose if the ULP logic itself does + naturally support this restriction. + +6.9.2. RDMA-Conditional Session Establishment + + It is sometimes desirable for the active side of a session to connect + with the passive side before knowing whether the passive side + supports RDMA. + + This style of session establishment can be supported with either TCP + or SCTP, but not as transparently as for RDMA-only sessions. Pre- + existing non-RDMA servers are also far more likely to be using TCP + than SCTP. + + With TCP, a normal TCP connection is established. It is then used by + the ULP to determine whether or not to convert to MPA mode and use + RDMA. This will typically be integral with other session- + establishment negotiations. + + With SCTP, the establishment of an association tests whether RDMA is + supported. If not supported, the application simply requests the + association without the RDMA adaptation indication. + + One key difference is that with SCTP the determination as to whether + the peer can support RDMA is made before the transport layer + association/connection is established, while with TCP the established + connection itself is used to determine whether RDMA is supported. + +7. Local Interface Implications + + Full utilization of DDP and RDMAP capabilities requires a local + interface that explicitly requests these services. Protocols such as + Sockets Direct Protocol (SDP) can allow applications to keep their + traditional byte-stream or message-stream interface and still enjoy + many of the benefits of the optimized wire level protocols. + + + + + + + + + +Bestler & Coene Informational [Page 18] + +RFC 5045 RDMA/DDP Applicability October 2007 + + +8. Security Considerations + + RDMA security considerations are discussed in the RDMA security + document [RFC5042]. This document will only deal with the more + usage-oriented aspects, and where there are implications in the + choice of underlying transport. + +8.1. Connection/Association Setup + + Both the SCTP and TCP adaptations allow for existing procedures to be + followed for the establishment of the SCTP association or TCP + connection. Use of DDP does not impair the use of any security + measures to filter, validate, and/or log the remote end of an + association/connection. + +8.2. Tagged Buffer Exposure + + DDP only exposes ULP memory to the extent explicitly allowed by ULP + actions. These include posting of receive operations and enabling of + Steering Tags. + + Neither RDMAP nor DDP places requirements on how ULPs Advertise + Buffers. A ULP may use a single Steering Tag for multiple buffer + Advertisements. However, the ULP should be aware that enforcement on + STag usage is likely limited to the overall range that is enabled. + If the Remote Peer writes into the 'wrong' Advertised Buffer, neither + the DDP nor the RDMAP layer will be aware of this. Nor is there any + report to the ULP on how the Remote Peer specifically used Tagged + Buffers. + + Unless the ULP peers have an adequate basis for mutual trust, the + receiving ULP might be well advised to use a distinct STag for each + interaction, and to invalidate it after each use, or to require its + peer to use the RDMAP option to invalidate the STag with its + responding Untagged Message. + +8.3. Impact of Encrypted Transports + + While DDP is cleanly layered over the LLP, its maximum benefit may be + limited when the LLP Stream is secured with a streaming cypher, such + as Transport Layer Security (TLS) [RFC4346]. If the LLP must decrypt + in order, it cannot provide out-of-order DDP Segments to the DDP + layer for placement purposes. IPsec [RFC2401] tunnel mode encrypts + entire IP Datagrams. IPsec transport mode encrypts TCP Segments or + SCTP packets, as does use of Datagram TLS (DTLS) [RFC4347] over UDP + beneath TCP or SCTP. Neither IPsec nor this use of DTLS precludes + providing out-of-order DDP Segments to the DDP layer for placement. + + + + +Bestler & Coene Informational [Page 19] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + Note that end-to-end use of cryptographic integrity protection may + allow suppression of MPA CRC generation and checking under certain + circumstances. This is one example where the LLP may be judged to + have "or equivalent" protection to an end-to-end CRC32c. + +9. References + +9.1. Normative References + + [RFC2401] Kent, S. and R. Atkinson, "Security Architecture for the + Internet Protocol", RFC 2401, November 1998. + + [RFC2406] Kent, S. and R. Atkinson, "IP Encapsulating Security + Payload (ESP)", RFC 2406, November 1998. + + [RFC4960] Stewart, R., "Stream Control Transmission Protocol", + RFC 4960, September 2007. + + [RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D. + Garcia, "A Remote Direct Memory Access Protocol + Specification", RFC 5040, October 2007. + + [RFC5041] Shah, H., Pinkerton, J., Recio, R., and P. Culley, + "Direct Data Placement over Reliable Transports", + RFC 5041, October 2007. + + [RFC5042] Pinkerton, J. and E. Deleganes, "DDP/RDMAP Security", + RFC 5042, October 2007. + + [RFC5043] Bestler, C. and R. Stewart, "Stream Control Transmission + Protocol (SCTP) Direct Data Placement (DDP) Adaptation", + RFC 5043, October 2007. + + [RFC5044] Culley, P., Elzur, U., Recio, R., Bailey, S., and J. + Carrier, "Marker PDU Aligned Framing for TCP + Specification", RFC 5044, October 2007. + +9.2. Informative References + + [NFSDIRECT] Talpey, T., Callaghan, B., and I. Property, "NFS Direct + Data Placement", Work in Progress, June 2007. + + [RFC3723] Aboba, B., Tseng, J., Walker, J., Rangan, V., and F. + Travostino, "Securing Block Storage Protocols over IP", + RFC 3723, April 2004. + + [RFC4301] Kent, S. and K. Seo, "Security Architecture for the + Internet Protocol", RFC 4301, December 2005. + + + +Bestler & Coene Informational [Page 20] + +RFC 5045 RDMA/DDP Applicability October 2007 + + + [RFC4346] Dierks, T. and E. Rescorla, "The Transport Layer + Security (TLS) Protocol Version 1.1", RFC 4346, + April 2006. + + [RFC4347] Rescorla, E. and N. Modadugu, "Datagram Transport Layer + Security", RFC 4347, April 2006. + + [RFC5046] Ko, M., Chadalapaka, M., Elzur, U., Shah, H., and P. + Thaler, "Internet Small Computer System Interface + (iSCSI) Extensions for Remote Direct Memory Access + (RDMA)", RFC 5046, October 2007. + +Authors' Addresses + + Caitlin Bestler (editor) + Neterion + 20230 Stevens Creek Blvd. + Suite C + Cupertino, CA 95014 + USA + + Phone: 408-366-4639 + EMail: caitlin.bestler@neterion.com + + + Lode Coene + Nokia Siemens Networks + Atealaan 26 + Herentals 2200 + Belgium + + Phone: +32-14-252081 + EMail: lode.coene@nsn.com + + + + + + + + + + + + + + + + + + +Bestler & Coene Informational [Page 21] + +RFC 5045 RDMA/DDP Applicability October 2007 + + +Full Copyright Statement + + Copyright (C) The IETF Trust (2007). + + This document is subject to the rights, licenses and restrictions + contained in BCP 78, and except as set forth therein, the authors + retain all their rights. + + This document and the information contained herein are provided on an + "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS + OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND + THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS + OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF + THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED + WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Intellectual Property + + The IETF takes no position regarding the validity or scope of any + Intellectual Property Rights or other rights that might be claimed to + pertain to the implementation or use of the technology described in + this document or the extent to which any license under such rights + might or might not be available; nor does it represent that it has + made any independent effort to identify any such rights. Information + on the procedures with respect to rights in RFC documents can be + found in BCP 78 and BCP 79. + + Copies of IPR disclosures made to the IETF Secretariat and any + assurances of licenses to be made available, or the result of an + attempt made to obtain a general license or permission for the use of + such proprietary rights by implementers or users of this + specification can be obtained from the IETF on-line IPR repository at + http://www.ietf.org/ipr. + + The IETF invites any interested party to bring to its attention any + copyrights, patents or patent applications, or other proprietary + rights that may cover technology that may be required to implement + this standard. Please address the information to the IETF at + ietf-ipr@ietf.org. + + + + + + + + + + + + +Bestler & Coene Informational [Page 22] + |