diff options
Diffstat (limited to 'doc/rfc/rfc4296.txt')
-rw-r--r-- | doc/rfc/rfc4296.txt | 1235 |
1 files changed, 1235 insertions, 0 deletions
diff --git a/doc/rfc/rfc4296.txt b/doc/rfc/rfc4296.txt new file mode 100644 index 0000000..465b88f --- /dev/null +++ b/doc/rfc/rfc4296.txt @@ -0,0 +1,1235 @@ + + + + + + +Network Working Group S. Bailey +Request for Comments: 4296 Sandburst +Category: Informational T. Talpey + NetApp + December 2005 + + + The Architecture of Direct Data Placement (DDP) + and Remote Direct Memory Access (RDMA) on Internet Protocols + +Status of This Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2005). + +Abstract + + This document defines an abstract architecture for Direct Data + Placement (DDP) and Remote Direct Memory Access (RDMA) protocols to + run on Internet Protocol-suite transports. This architecture does + not necessarily reflect the proper way to implement such protocols, + but is, rather, a descriptive tool for defining and understanding the + protocols. DDP allows the efficient placement of data into buffers + designated by Upper Layer Protocols (e.g., RDMA). RDMA provides the + semantics to enable Remote Direct Memory Access between peers in a + way consistent with application requirements. + + + + + + + + + + + + + + + + + + + + +Bailey & Talpey Informational [Page 1] + +RFC 4296 DDP and RDMA Architecture December 2005 + + +Table of Contents + + 1. Introduction ....................................................2 + 1.1. Terminology ................................................2 + 1.2. DDP and RDMA Protocols .....................................3 + 2. Architecture ....................................................4 + 2.1. Direct Data Placement (DDP) Protocol Architecture ..........4 + 2.1.1. Transport Operations ................................6 + 2.1.2. DDP Operations ......................................7 + 2.1.3. Transport Characteristics in DDP ...................10 + 2.2. Remote Direct Memory Access (RDMA) Protocol Architecture ..12 + 2.2.1. RDMA Operations ....................................14 + 2.2.2. Transport Characteristics in RDMA ..................16 + 3. Security Considerations ........................................17 + 3.1. Security Services .........................................18 + 3.2. Error Considerations ......................................19 + 4. Acknowledgements ...............................................19 + 5. Informative References .........................................20 + +1. Introduction + + This document defines an abstract architecture for Direct Data + Placement (DDP) and Remote Direct Memory Access (RDMA) protocols to + run on Internet Protocol-suite transports. This architecture does + not necessarily reflect the proper way to implement such protocols, + but is, rather, a descriptive tool for defining and understanding the + protocols. This document uses C language notation as a shorthand to + describe the architectural elements of DDP and RDMA protocols. The + choice of C notation is not intended to describe concrete protocols + or programming interfaces. + + The first part of the document describes the architecture of DDP + protocols, including what assumptions are made about the transports + on which DDP is built. The second part describes the architecture of + RDMA protocols layered on top of DDP. + +1.1. Terminology + + Before introducing the protocols, certain definitions will be useful + to guide discussion: + + o Placement - writing to a data buffer. + + o Operation - a protocol message, or sequence of messages, which + provide an architectural semantic, such as reading or writing of + a data buffer. + + + + + +Bailey & Talpey Informational [Page 2] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + o Delivery - informing any Upper Layer or application that a + particular message is available for use. Therefore, delivery + may be viewed as the "control" signal associated with a unit of + data. Note that the order of delivery is defined more strictly + than it is for placement. + + o Completion - informing any Upper Layer or application that a + particular operation has finished. A completion, for instance, + may require the delivery of several messages, or it may also + reflect that some local processing has finished. + + o Data Sink - the peer on which any placement occurs. + + o Data Source - the peer from which the placed data originates. + + o Steering Tag - a "handle" used to identify the buffer that is + the target of placement. A "tagged" message is one that + references such a handle. + + o RDMA Write - an Operation that places data from a local data + buffer to a remote data buffer specified by a Steering Tag. + + o RDMA Read - an Operation that places data to a local data buffer + specified by a Steering Tag from a remote data buffer specified + by another Steering Tag. + + o Send - an Operation that places data from a local data buffer to + a remote data buffer of the data sink's choice. Therefore, + sends are "untagged". + +1.2. DDP and RDMA Protocols + + The goal of the DDP protocol is to allow the efficient placement of + data into buffers designated by protocols layered above DDP (e.g., + RDMA). This is described in detail in [ROM]. Efficiency may be + characterized by the minimization of the number of transfers of the + data over the receiver's system buses. + + The goal of the RDMA protocol is to provide the semantics to enable + Remote Direct Memory Access between peers in a way consistent with + application requirements. The RDMA protocol provides facilities + immediately useful to existing and future networking, storage, and + other application protocols. [FCVI, IB, MYR, SDP, SRVNET, VI] + + The DDP and RDMA protocols work together to achieve their respective + goals. DDP provides facilities to safely steer payloads to specific + buffers at the Data Sink. RDMA provides facilities to Upper Layers + for identifying these buffers, controlling the transfer of data + + + +Bailey & Talpey Informational [Page 3] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + between peers' buffers, supporting authorized bidirectional transfer + between buffers, and signalling completion. Upper Layer Protocols + that do not require the features of RDMA may be layered directly on + top of DDP. + + The DDP and RDMA protocols are transport independent. The following + figure shows the relationship between RDMA, DDP, Upper Layer + Protocols, and Transport. + + +--------------------------------------------------+ + | Upper Layer Protocol | + +---------+------------+---------------------------+ + | | | RDMA | + | | +---------------------------+ + | | DDP | + | +----------------------------------------+ + | Transport | + +--------------------------------------------------+ + +2. Architecture + + The Architecture section is presented in two parts: Direct Data + Placement Protocol architecture and Remote Direct Memory Access + Protocol architecture. + +2.1. Direct Data Placement (DDP) Protocol Architecture + + The central idea of general-purpose DDP is that a data sender will + supplement the data it sends with placement information that allows + the receiver's network interface to place the data directly at its + final destination without any copying. DDP can be used to steer + received data to its final destination, without requiring layer- + specific behavior for each different layer. Data sent with such DDP + information is said to be `tagged'. + + The central components of the DDP architecture are the `buffer', + which is an object with beginning and ending addresses, and a method + (set()), which sets the value of an octet at an address. In many + cases, a buffer corresponds directly to a portion of host user + memory. However, DDP does not depend on this; a buffer could be a + disk file, or anything else that can be viewed as an addressable + collection of octets. Abstractly, a buffer provides the interface: + + typedef struct { + const address_t start; + const address_t end; + void set(address_t a, data_t v); + } ddp_buffer_t; + + + +Bailey & Talpey Informational [Page 4] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + address_t + + a reference to local memory + + data_t + + an octet data value. + + The protocol layering and in-line data flow of DDP is: + + DDP Client Protocol + (e.g., RDMA or Upper Layer Protocol) + | ^ + untagged messages | | untagged message delivery + tagged messages | | tagged message delivery + v | + DDP+---> data placement + ^ + | transport messages + v + Transport + (e.g., SCTP, DCCP, framed TCP) + ^ + | IP datagrams + v + . . . + + In addition to in-line data flow, the client protocol registers + buffers with DDP, and DDP performs buffer update (set()) operations + as a result of receiving tagged messages. + + DDP messages may be split into multiple, smaller DDP messages, each + in a separate transport message. However, if the transport is + unreliable or unordered, messages split across transport messages may + or may not provide useful behavior, in the same way as splitting + arbitrary Upper Layer messages across unreliable or unordered + transport messages may or may not provide useful behavior. In other + words, the same considerations apply to building client protocols on + different types of transports with or without the use of DDP. + + + + + + + + + + + + +Bailey & Talpey Informational [Page 5] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + A DDP message split across transport messages looks like: + + DDP message: Transport messages: + + stag=s, offset=o, message 1: + notify=y, id=i |type=ddp | + message= |stag=s | + |aabbccddee|-------. |offset=o | + ~ ... ~----. \ |notify=n | + |vvwwxxyyzz|-. \ \ |id=? | + | \ `--->|aabbccddee| + | \ ~ ... ~ + | +----->|iijjkkllmm| + | | + + | message 2: + \ | |type=ddp | + \ | |stag=s | + \ + |offset=o+n| + \ \ |notify=y | + \ \ |id=i | + \ `-->|nnooppqqrr| + \ ~ ... ~ + `---->|vvwwxxyyzz| + + Although this picture suggests that DDP information is carried in- + line with the message payload, components of the DDP information may + also be in transport-specific fields, or derived from transport- + specific control information if the transport permits. + +2.1.1. Transport Operations + + For the purposes of this architecture, the transport provides: + + void xpt_send(socket_t s, message_t m); + message_t xpt_recv(socket_t s); + msize_t xpt_max_msize(socket_t s); + + socket_t + + a transport address, including IP addresses, ports and other + transport-specific identifiers. + + message_t + + a string of octets. + + + + + + +Bailey & Talpey Informational [Page 6] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + msize_t (scalar) + + a message size. + + xpt_send(socket_t s, message_t m) + + send a transport message. + + xpt_recv(socket_t s) + + receive a transport message. + + xpt_max_msize(socket_t s) + + get the current maximum transport message size. Corresponds, + roughly, to the current path Maximum Transfer Unit (PMTU), + adjusted by underlying protocol overheads. + + Real implementations of xpt_send() and xpt_recv() typically return + error indications, but that is not relevant to this architecture. + +2.1.2. DDP Operations + + The DDP layer provides: + + void ddp_send(socket_t s, message_t m); + void ddp_send_ddp(socket_t s, message_t m, ddp_addr_t d, + ddp_notify_t n); + void ddp_post_recv(socket_t s, bdesc_t b); + ddp_ind_t ddp_recv(socket_t s); + bdesc_t ddp_register(socket_t s, ddp_buffer_t b); + void ddp_deregister(bhand_t bh); + msizes_t ddp_max_msizes(socket_t s); + + ddp_addr_t + + the buffer address portion of a tagged message: + + typedef struct { + stag_t stag; + address_t offset; + } ddp_addr_t; + + stag_t (scalar) + + a Steering Tag. A stag_t identifies the destination buffer for + tagged messages. stag_ts are generated when the buffer is + registered, communicated to the sender by some client protocol + + + +Bailey & Talpey Informational [Page 7] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + convention and inserted in DDP messages. stag_t values in this + DDP architecture are assumed to be completely opaque to the + client protocol, and implementation-dependent. However, + particular implementations, such as DDP on a multicast transport + (see below), may provide the buffer holder some control in + selecting stag_ts. + + ddp_notify_t + + the notification portion of a DDP message, used to signal + that the message represents the final fragment of a + multi-segmented DDP message: + + typedef struct { + boolean_t notify; + ddp_msg_id_t i; + } ddp_notify_t; + + ddp_msg_id_t (scalar) + + a DDP message identifier. msg_id_ts are chosen by the DDP + message receiver (buffer holder), communicated to the sender by + some client protocol convention and inserted in DDP messages. + Whether a message reception indication is requested for a DDP + message is a matter of client protocol convention. Unlike + stag_ts, the structure of msg_id_ts is opaque to DDP, and + therefore, it is completely in the hands of the client protocol. + + bdesc_t + + a description of a registered buffer: + + typedef struct { + bhand_t bh; + ddp_addr_t a; + } bdesc_t; + + `a.offset' is the starting offset of the registered buffer, + which may have no relationship to the `start' or `end' addresses + of that buffer. However, particular implementations, such as + DDP on a multicast transport (see below), may allow some client + protocol control over the starting offset. + + bhand_t + + an opaque buffer handle used to deregister a buffer. + + + + + +Bailey & Talpey Informational [Page 8] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + recv_message_t + + a description of a completed untagged receive buffer: + + typedef struct { + bdesc_t b; + length_t l; + } recv_message_t; + + ddp_ind_t + + an untagged message, a tagged message reception indication, or a + tagged message reception error: + + typedef union { + recv_message_t m; + ddp_msg_id_t i; + ddp_err_t e; + } ddp_ind_t; + + ddp_err_t + + indicates an error while receiving a tagged message, typically + `offset' out of bounds, or `stag' is not registered to the + socket. + + msizes_t + + The maximum untagged and tagged messages that fit in a single + transport message: + + typedef struct { + msize_t max_untagged; + msize_t max_tagged; + } msizes_t; + + ddp_send(socket_t s, message_t m) + + send an untagged message. + + ddp_send_ddp(socket_t s, message_t m, ddp_addr_t d, ddp_notify_t n) + + send a tagged message to remote buffer address d. + + + + + + + + +Bailey & Talpey Informational [Page 9] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + ddp_post_recv(socket_t s, bdesc_t b) + + post a registered buffer to accept a single received untagged + message. Each buffer is returned to the caller in a ddp_recv() + untagged message reception indication, in the order in which it + was posted. The same buffer may be enabled on multiple sockets; + receipt of an untagged message into the buffer from any of these + sockets unposts the buffer from all sockets. + + ddp_recv(socket_t s) + + get the next received untagged message, tagged message reception + indication, or tagged message error. + + ddp_register(socket_t s, ddp_buffer_t b) + + register a buffer for DDP on a socket. The same buffer may be + registered multiple times on the same or different sockets. The + same buffer registered on different sockets may result in a + common registration. Different buffers may also refer to + portions of the same underlying addressable object (buffer + aliasing). + + ddp_deregister(bhand_t bh) + + remove a registration from a buffer. + + ddp_max_msizes(socket_t s) + + get the current maximum untagged and tagged message sizes that + will fit in a single transport message. + +2.1.3. Transport Characteristics in DDP + + Certain characteristics of the transport on which DDP is mapped + determine the nature of the service provided to client protocols. + Fundamentally, the characteristics of the transport will not be + changed by the presence of DDP. The choice of transport is therefore + driven not by DDP, but by the requirements of the Upper Layer, and + employing the DDP service. + + Specifically, transports are: + + o reliable or unreliable, + + o ordered or unordered, + + o single source or multisource, + + + +Bailey & Talpey Informational [Page 10] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + o single destination or multidestination (multicast or anycast). + + Some transports support several combinations of these + characteristics. For example, SCTP [SCTP] is reliable, single + source, single destination (point-to-point) and supports both ordered + and unordered modes. + + DDP messages carried by transport are framed for processing by the + receiver, and may be further protected for integrity or privacy in + accordance with the transport capabilities. DDP does not provide + such functions. + + In general, transport characteristics equally affect transport and + DDP message delivery. However, there are several issues specific to + DDP messages. + + A key component of DDP is how the following operations on the + receiving side are ordered among themselves, and how they relate to + corresponding operations on the sending side: + + o set()s, + + o untagged message reception indications, and + + o tagged message reception indications. + + These relationships depend upon the characteristics of the underlying + transport in a way that is defined by the DDP protocol. For example, + if the transport is unreliable and unordered, the DDP protocol might + specify that the client protocol is subject to the consequences of + transport messages being lost or duplicated, rather than requiring + that different characteristics be presented to the client protocol. + + Buffer access must be implemented consistently across endpoint IP + addresses on transports allowing multiple IP addresses per endpoint, + for example, SCTP. In particular, the Steering Tag must be + consistently scoped and must address the same buffer across all IP + address associations belonging to the endpoint. Additionally, + operation ordering relationships across IP addresses within an + association (set(), get(), etc.) depend on the underlying transport. + If the above consistency relationships cannot be maintained by a + transport endpoint, then the endpoint is unsuitable for a DDP + connection. + + Multidestination data delivery is a transport characteristic that may + require specific consideration in a DDP protocol. As mentioned + above, the basic DDP model assumes that buffer address values + returned by ddp_register() are opaque to the client protocol, and can + + + +Bailey & Talpey Informational [Page 11] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + be implementation dependent. The most natural way to map DDP to a + multidestination transport is to require that all receivers produce + the same buffer address when registering a multidestination + destination buffer. Restriction of the DDP model to accommodate + multiple destinations involves engineering tradeoffs comparable to + those of providing non-DDP multidestination transport capability. + + A registered buffer is identified within DDP by its stag_t, which in + turn is associated with a socket. Therefore, this registration + grants a capability to the DDP peer, and the socket (using the + underlying properties of its chosen transport and possible security) + identifies the peer and authenticates the stag_t. + + The same buffer may be enabled by ddp_post_recv() on multiple + sockets. In this case any ddp_recv() untagged message reception + indication may be provided on a different socket from that on which + the buffer was posted. Such indications are not ordered among + multiple DDP sockets. + + When multiple sockets reference an untagged message reception buffer, + local interfaces are responsible for managing the mechanisms of + allocating posted buffers to received untagged messages, the handling + of received untagged messages when no buffer is available, and of + resource management among multiple sockets. Where underprovisioning + of buffers on multiple sockets is allowed, mechanisms should be + provided to manage buffer consumption on a per-socket or group of + related sockets basis. + + Architecturally, therefore, DDP is a flexible and general paradigm + that may be applied to any variety of transports. Implementations of + DDP may, however, adapt themselves to these differences in ways + appropriate to each transport. In all cases, the layering of DDP + must continue to express the transport's underlying characteristics. + +2.2. Remote Direct Memory Access (RDMA) Protocol Architecture + + Remote Direct Memory Access (RDMA) extends the capabilities of DDP + with two primary functions. + + First, it adds the ability to read from buffers registered to a + socket (RDMA Read). This allows a client protocol to perform + arbitrary, bidirectional data movement without involving the remote + client. When RDMA is implemented in hardware, arbitrary data + movement can be performed without involving the remote host CPU at + all. + + + + + + +Bailey & Talpey Informational [Page 12] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + In addition, RDMA specifies a transport-independent untagged message + service (Send) with characteristics that are both very efficient to + implement in hardware, and convenient for client protocols. + + The RDMA architecture is patterned after the traditional model for + device programming, where the client requests an operation using + Send-like actions (programmed I/O), the server performs the necessary + data transfers for the operation (DMA reads and writes), and notifies + the client of completion. The programmed I/O+DMA model efficiently + supports a high degree of concurrency and flexibility for both the + client and server, even when operations have a wide range of + intrinsic latencies. + + RDMA is layered as a client protocol on top of DDP: + + Client Protocol + | ^ + Sends | | Send reception indications + RDMA Read Requests | | RDMA Read Completion indications + RDMA Writes | | RDMA Write Completion indications + v | + RDMA + | ^ + untagged messages | | untagged message delivery + tagged messages | | tagged message delivery + v | + DDP+---> data placement + ^ + | transport messages + v + . . . + + In addition to in-line data flow, read (get()) and update (set()) + operations are performed on buffers registered with RDMA as a result + of RDMA Read Requests and RDMA Writes, respectively. + + An RDMA `buffer' extends a DDP buffer with a get() operation that + retrieves the value of the octet at address `a': + + typedef struct { + const address_t start; + const address_t end; + void set(address_t a, data_t v); + data_t get(address_t a); + } rdma_buffer_t; + + + + + + +Bailey & Talpey Informational [Page 13] + +RFC 4296 DDP and RDMA Architecture December 2005 + + +2.2.1. RDMA Operations + + The RDMA layer provides: + + void rdma_send(socket_t s, message_t m); + void rdma_write(socket_t s, message_t m, ddp_addr_t d, + rdma_notify_t n); + void rdma_read(socket_t s, ddp_addr_t s, ddp_addr_t d); + void rdma_post_recv(socket_t s, bdesc_t b); + rdma_ind_t rdma_recv(socket_t s); + bdesc_t rdma_register(socket_t s, rdma_buffer_t b, + bmode_t mode); + void rdma_deregister(bhand_t bh); + msizes_t rdma_max_msizes(socket_t s); + + Although, for clarity, these data transfer interfaces are + synchronous, rdma_read() and possibly rdma_send() (in the presence of + Send flow control) can require an arbitrary amount of time to + complete. To express the full concurrency and interleaving of RDMA + data transfer, these interfaces should also be reentrant. For + example, a client protocol may perform an rdma_send(), while an + rdma_read() operation is in progress. + + rdma_notify_t + + RDMA Write notification information, used to signal that the + message represents the final fragment of a multi-segmented RDMA + message: + + typedef struct { + boolean_t notify; + rdma_write_id_t i; + } rdma_notify_t; + + identical in function to ddp_notify_t, except that the type + rdma_write_id_t may not be equivalent to ddp_msg_id_t. + + rdma_write_id_t (scalar) + + an RDMA Write identifier. + + + + + + + + + + + +Bailey & Talpey Informational [Page 14] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + rdma_ind_t + + a Send message, or an RDMA error: + + typedef union { + recv_message_t m; + rdma_err_t e; + } rdma_ind_t; + + rdma_err_t + + an RDMA protocol error indication. RDMA errors include buffer + addressing errors corresponding to ddp_err_ts, and buffer + protection violations (e.g., RDMA Writing a buffer only + registered for reading). + + bmode_t + + buffer registration mode (permissions). Any combination of + permitting RDMA Read (BMODE_READ) and RDMA Write (BMODE_WRITE) + operations. + + rdma_send(socket_t s, message_t m) + + send a message, delivering it to the next untagged RDMA buffer + at the remote peer. + + rdma_write(socket_t s, message_t m, ddp_addr_t d, rdma_notify_t n) + + RDMA Write to remote buffer address d. + + rdma_read(socket_t s, ddp_addr_t s, length_t l, ddp_addr_t d) + + RDMA Read l octets from remote buffer address s to local buffer + address d. + + rdma_post_recv(socket_t s, bdesc_t b) + + post a registered buffer to accept a single Send message, to be + filled and returned in-order to a subsequent caller of + rdma_recv(). As with DDP, buffers may be enabled on multiple + sockets, in which case ordering guarantees are relaxed. Also as + with DDP, local interfaces must manage the mechanisms of + allocation and management of buffers posted to multiple sockets. + + + + + + + +Bailey & Talpey Informational [Page 15] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + rdma_recv(socket_t s); + + get the next received Send message, RDMA Write completion + identifier, or RDMA error. + + rdma_register(socket_t s, rdma_buffer_t b, bmode_t mode) + + register a buffer for RDMA on a socket (for read access, write + access or both). As with DDP, the same buffer may be registered + multiple times on the same or different sockets, and different + buffers may refer to portions of the same underlying addressable + object. + + rdma_deregister(bhand_t bh) + + remove a registration from a buffer. + + rdma_max_msizes(socket_t s) + + get the current maximum Send (max_untagged) and RDMA Read or + Write (max_tagged) operations that will fit in a single + transport message. The values returned by rdma_max_msizes() are + closely related to the values returned by ddp_max_msizes(), but + may not be equal. + +2.2.2. Transport Characteristics in RDMA + + As with DDP, RDMA can be used on transports with a variety of + different characteristics that manifest themselves directly in the + service provided by RDMA. Also, as with DDP, the fundamental + characteristics of the transport will not be changed by the presence + of RDMA. + + Like DDP, an RDMA protocol must specify how: + + o set()s, + + o get()s, + + o Send messages, and + + o RDMA Read completions + + are ordered among themselves and how they relate to corresponding + operations on the remote peer(s). These relationships are likely to + be a function of the underlying transport characteristics. + + + + + +Bailey & Talpey Informational [Page 16] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + There are some additional characteristics of RDMA that may translate + poorly to unreliable or multipoint transports due to attendant + complexities in managing endpoint state: + + o Send flow control + + o RDMA Read + + These difficulties can be overcome by placing restrictions on the + service provided by RDMA. However, many RDMA clients, especially + those that separate data transfer and application logic concerns, are + likely to depend upon capabilities only provided by RDMA on a point- + to-point, reliable transport. In other words, many potential Upper + Layers, which might avail themselves of RDMA services, are naturally + already biased toward these transport classes. + +3. Security Considerations + + Fundamentally, the DDP and RDMA protocols themselves should not + introduce additional vulnerabilities. They are intermediate + protocols and so should not perform or require functions such as + authorization, which are the domain of Upper Layers. However, the + DDP and RDMA protocols should allow mapping by strict Upper Layers + that are not permissive of new vulnerabilities; DDP and RDMAP + implementations should be prohibited from `cutting corners' that + create new vulnerabilities. Implementations must ensure that only + `supplied' resources (i.e., buffers) can be manipulated by DDP or + RDMAP messages. + + System integrity must be maintained in any RDMA solution. Mechanisms + must be specified to prevent RDMA or DDP operations from impairing + system integrity. For example, threats can include potential buffer + reuse or buffer overflow, and are not merely a security issue. Even + trusted peers must not be allowed to damage local integrity. Any DDP + and RDMA protocol must address the issue of giving end-systems and + applications the capabilities to offer protection from such + compromises. + + Because a Steering Tag exports access to a buffer, one critical + aspect of security is the scope of this access. It must be possible + to individually control specific attributes of the access provided by + a Steering Tag on the endpoint (socket) on which it was registered, + including remote read access, remote write access, and others that + might be identified. DDP and RDMA specifications must provide both + implementation requirements relevant to this issue, and guidelines to + assist implementors in making the appropriate design decisions. + + + + + +Bailey & Talpey Informational [Page 17] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + For example, it must not be possible for DDP to enable evasion of + buffer consistency checks at the recipient. The DDP and RDMA + specifications must allow the recipient to rely on its consistent + buffer contents by explicitly controlling peer access to buffer + regions at appropriate times. + + The use of DDP and RDMA on a transport connection may interact with + any security mechanism, and vice-versa. For example, if the security + mechanism is implemented above the transport layer, the DDP and RDMA + headers may not be protected. Therefore, such a layering may be + inappropriate, depending on requirements. + +3.1. Security Services + + The following end-to-end security services protect DDP and RDMAP + operation streams: + + o Authentication of the data source, to protect against peer + impersonation, stream hijacking, and man-in-the-middle attacks + exploiting capabilities offered by the RDMA implementation. + + Peer connections that do not pass authentication and + authorization checks must not be permitted to begin processing + in RDMA mode with an inappropriate endpoint. Once associated, + peer accesses to buffer regions must be authenticated and made + subject to authorization checks in the context of the + association and endpoint (socket) on which they are to be + performed, prior to any transfer operation or data being + accessed. The RDMA protocols must ensure that these region + protections be under strict application control. + + o Integrity, to protect against modification of the control + content and buffer content. + + While integrity is of concern to any transport, it is + important for the DDP and RDMAP protocols that the RDMA + control information carried in each operation be protected, in + order to direct the payloads appropriately. + + o Sequencing, to protect against replay attacks (a special case + of the above modifications). + + o Confidentiality, to protect the stream from eavesdropping. + + IPsec, operating to secure the connection on a packet-by-packet + basis, is a natural fit to securing RDMA placement, which operates in + conjunction with transport. Because RDMA enables an implementation + to avoid buffering, it is preferable to perform all applicable + + + +Bailey & Talpey Informational [Page 18] + +RFC 4296 DDP and RDMA Architecture December 2005 + + + security protection prior to processing of each segment by the + transport and RDMA layers. Such a layering enables the most + efficient secure RDMA implementation. + + The TLS record protocol, on the other hand, is layered on top of + reliable transports and cannot provide such security assurance until + an entire record is available, which may require the buffering and/or + assembly of several distinct messages prior to TLS processing. This + defers RDMA processing and introduces overheads that RDMA is designed + to avoid. In addition, TLS length restrictions on records themselves + impose additional buffering and processing for long operations that + must span multiple records. TLS therefore is viewed as potentially a + less natural fit for protecting the RDMA protocols. + + Any DDP and RDMAP specification must provide the means to satisfy the + above security service requirements. + + IPsec is sufficient to provide the required security services to the + DDP and RDMAP protocols, while enabling efficient implementations. + +3.2. Error Considerations + + Resource issues leading to denial-of-service attacks, overwrites and + other concurrent operations, the ordering of completions as required + by the RDMA protocol, and the granularity of transfer are all within + the required scope of any security analysis of RDMA and DDP. + + The RDMA operations require checking of what is essentially user + information, explicitly including addressing information and + operation type (read or write), and implicitly including protection + and attributes. The semantics associated with each class of error + resulting from possible failure of such checks must be clearly + defined, and the expected action to be taken by the protocols in each + case must be specified. + + In some cases, this will result in a catastrophic error on the RDMA + association; however, in others, a local or remote error may be + signalled. Certain of these errors may require consideration of + abstract local semantics. The result of the error on the RDMA + association must be carefully specified so as to provide useful + behavior, while not constraining the implementation. + +4. Acknowledgements + + The authors wish to acknowledge the valuable contributions of Caitlin + Bestler, David Black, Jeff Mogul, and Allyn Romanow. + + + + + +Bailey & Talpey Informational [Page 19] + +RFC 4296 DDP and RDMA Architecture December 2005 + + +5. Informative References + + [FCVI] ANSI Technical Committee T11, "Fibre Channel Standard + Virtual Interface Architecture Mapping", ANSI/NCITS 357- + 2001, March 2001, available from + http://www.t11.org/t11/stat.nsf/fcproj. + + [IB] InfiniBand Trade Association, "InfiniBand Architecture + Specification Volumes 1 and 2", Release 1.1, November 2002, + available from http://www.infinibandta.org/specs. + + [MYR] VMEbus International Trade Association, "Myrinet on VME + Protocol Specification", ANSI/VITA 26-1998, August 1998, + available from http://www.myri.com/open-specs. + + [ROM] Romanow, A., Mogul, J., Talpey, T., and S. Bailey, "Remote + Direct Memory Access (RDMA) over IP Problem Statement", RFC + 4297, December 2005. + + [SCTP] Stewart, R., Xie, Q., Morneault, K., Sharp, C., + Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M., Zhang, + L., and V. Paxson, "Stream Control Transmission Protocol", + RFC 2960, October 2000. + + [SDP] InfiniBand Trade Association, "Sockets Direct Protocol + v1.0", Annex A of InfiniBand Architecture Specification + Volume 1, Release 1.1, November 2002, available from + http://www.infinibandta.org/specs. + + [SRVNET] R. Horst, "TNet: A reliable system area network", IEEE + Micro, pp. 37-45, February 1995. + + [VI] D. Cameron and G. Regnier, "The Virtual Interface + Architecture", ISBN 0971288704, Intel Press, April 2002, + more info at http://www.intel.com/intelpress/via/. + + + + + + + + + + + + + + + + +Bailey & Talpey Informational [Page 20] + +RFC 4296 DDP and RDMA Architecture December 2005 + + +Authors' Addresses + + Stephen Bailey + Sandburst Corporation + 600 Federal Street + Andover, MA 01810 USA + USA + + Phone: +1 978 689 1614 + EMail: steph@sandburst.com + + + Tom Talpey + Network Appliance + 1601 Trapelo Road + Waltham, MA 02451 USA + + Phone: +1 781 768 5329 + EMail: thomas.talpey@netapp.com + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Bailey & Talpey Informational [Page 21] + +RFC 4296 DDP and RDMA Architecture December 2005 + + +Full Copyright Statement + + Copyright (C) The Internet Society (2005). + + This document is subject to the rights, licenses and restrictions + contained in BCP 78, and except as set forth therein, the authors + retain all their rights. + + This document and the information contained herein are provided on an + "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS + OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET + ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, + INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE + INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED + WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Intellectual Property + + The IETF takes no position regarding the validity or scope of any + Intellectual Property Rights or other rights that might be claimed to + pertain to the implementation or use of the technology described in + this document or the extent to which any license under such rights + might or might not be available; nor does it represent that it has + made any independent effort to identify any such rights. Information + on the procedures with respect to rights in RFC documents can be + found in BCP 78 and BCP 79. + + Copies of IPR disclosures made to the IETF Secretariat and any + assurances of licenses to be made available, or the result of an + attempt made to obtain a general license or permission for the use of + such proprietary rights by implementers or users of this + specification can be obtained from the IETF on-line IPR repository at + http://www.ietf.org/ipr. + + The IETF invites any interested party to bring to its attention any + copyrights, patents or patent applications, or other proprietary + rights that may cover technology that may be required to implement + this standard. Please address the information to the IETF at ietf- + ipr@ietf.org. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + +Bailey & Talpey Informational [Page 22] + |