1 files changed, 1235 insertions, 0 deletions
diff --git a/doc/rfc/rfc4296.txt b/doc/rfc/rfc4296.txt
new file mode 100644
index 0000000..465b88f
--- /dev/null
+++ b/doc/rfc/rfc4296.txt
@@ -0,0 +1,1235 @@
+
+
+
+
+
+
+Network Working Group                                          S. Bailey
+Request for Comments: 4296                                     Sandburst
+Category: Informational                                        T. Talpey
+                                                                  NetApp
+                                                           December 2005
+
+
+ The Architecture of Direct Data Placement (DDP)
+ and Remote Direct Memory Access (RDMA) on Internet Protocols
+
+Status of This Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2005).
+
+Abstract
+
+ This document defines an abstract architecture for Direct Data
+ Placement (DDP) and Remote Direct Memory Access (RDMA) protocols to
+ run on Internet Protocol-suite transports. This architecture does
+ not necessarily reflect the proper way to implement such protocols,
+ but is, rather, a descriptive tool for defining and understanding the
+ protocols. DDP allows the efficient placement of data into buffers
+ designated by Upper Layer Protocols (e.g., RDMA). RDMA provides the
+ semantics to enable Remote Direct Memory Access between peers in a
+ way consistent with application requirements.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Bailey & Talpey Informational [Page 1]
+
+RFC 4296 DDP and RDMA Architecture December 2005
+
+
+Table of Contents
+
+ 1. Introduction ....................................................2
+ 1.1. Terminology ................................................2
+ 1.2. DDP and RDMA Protocols .....................................3
+ 2. Architecture ....................................................4
+ 2.1. Direct Data Placement (DDP) Protocol Architecture ..........4
+ 2.1.1. Transport Operations ................................6
+ 2.1.2. DDP Operations ......................................7
+ 2.1.3. Transport Characteristics in DDP ...................10
+ 2.2. Remote Direct Memory Access (RDMA) Protocol Architecture ..12
+ 2.2.1. RDMA Operations ....................................14
+ 2.2.2. Transport Characteristics in RDMA ..................16
+ 3. Security Considerations ........................................17
+ 3.1. Security Services .........................................18
+ 3.2. Error Considerations ......................................19
+ 4. Acknowledgements ...............................................19
+ 5. Informative References .........................................20
+
+1. Introduction
+
+ This document defines an abstract architecture for Direct Data
+ Placement (DDP) and Remote Direct Memory Access (RDMA) protocols to
+ run on Internet Protocol-suite transports. This architecture does
+ not necessarily reflect the proper way to implement such protocols,
+ but is, rather, a descriptive tool for defining and understanding the
+ protocols. This document uses C language notation as a shorthand to
+ describe the architectural elements of DDP and RDMA protocols. The
+ choice of C notation is not intended to describe concrete protocols
+ or programming interfaces.
+
+ The first part of the document describes the architecture of DDP
+ protocols, including what assumptions are made about the transports
+ on which DDP is built. The second part describes the architecture of
+ RDMA protocols layered on top of DDP.
+
+1.1. Terminology
+
+ Before introducing the protocols, certain definitions will be useful
+ to guide discussion:
+
+ o Placement - writing to a data buffer.
+
+ o Operation - a protocol message, or sequence of messages, that
+ provides an architectural semantic, such as reading or writing
+ a data buffer.
+
+ o Delivery - informing any Upper Layer or application that a
+ particular message is available for use. Therefore, delivery
+ may be viewed as the "control" signal associated with a unit of
+ data. Note that the order of delivery is defined more strictly
+ than it is for placement.
+
+ o Completion - informing any Upper Layer or application that a
+ particular operation has finished. A completion, for instance,
+ may require the delivery of several messages, or it may also
+ reflect that some local processing has finished.
+
+ o Data Sink - the peer on which any placement occurs.
+
+ o Data Source - the peer from which the placed data originates.
+
+ o Steering Tag - a "handle" used to identify the buffer that is
+ the target of placement. A "tagged" message is one that
+ references such a handle.
+
+ o RDMA Write - an Operation that places data from a local data
+ buffer to a remote data buffer specified by a Steering Tag.
+
+ o RDMA Read - an Operation that places data to a local data buffer
+ specified by a Steering Tag from a remote data buffer specified
+ by another Steering Tag.
+
+ o Send - an Operation that places data from a local data buffer to
+ a remote data buffer of the data sink's choice. Therefore,
+ sends are "untagged".
+
+1.2. DDP and RDMA Protocols
+
+ The goal of the DDP protocol is to allow the efficient placement of
+ data into buffers designated by protocols layered above DDP (e.g.,
+ RDMA). This is described in detail in [ROM]. Efficiency may be
+ characterized by the minimization of the number of transfers of the
+ data over the receiver's system buses.
+
+ The goal of the RDMA protocol is to provide the semantics to enable
+ Remote Direct Memory Access between peers in a way consistent with
+ application requirements. The RDMA protocol provides facilities
+ immediately useful to existing and future networking, storage, and
+ other application protocols [FCVI, IB, MYR, SDP, SRVNET, VI].
+
+ The DDP and RDMA protocols work together to achieve their respective
+ goals. DDP provides facilities to safely steer payloads to specific
+ buffers at the Data Sink. RDMA provides facilities to Upper Layers
+ for identifying these buffers, controlling the transfer of data
+ between peers' buffers, supporting authorized bidirectional transfer
+ between buffers, and signalling completion. Upper Layer Protocols
+ that do not require the features of RDMA may be layered directly on
+ top of DDP.
+
+ The DDP and RDMA protocols are transport independent. The following
+ figure shows the relationship between RDMA, DDP, Upper Layer
+ Protocols, and Transport.
+
+      +--------------------------------------------------+
+      |               Upper Layer Protocol               |
+      +---------+------------+---------------------------+
+      |         |            |            RDMA           |
+      |         |            +---------------------------+
+      |         |                    DDP                 |
+      |         +----------------------------------------+
+      |                   Transport                      |
+      +--------------------------------------------------+
+
+2. Architecture
+
+ The Architecture section is presented in two parts: Direct Data
+ Placement Protocol architecture and Remote Direct Memory Access
+ Protocol architecture.
+
+2.1. Direct Data Placement (DDP) Protocol Architecture
+
+ The central idea of general-purpose DDP is that a data sender will
+ supplement the data it sends with placement information that allows
+ the receiver's network interface to place the data directly at its
+ final destination without any copying. DDP can be used to steer
+ received data to its final destination, without requiring layer-
+ specific behavior for each different layer. Data sent with such DDP
+ information is said to be `tagged'.
+
+ The central components of the DDP architecture are the `buffer',
+ which is an object with beginning and ending addresses, and a method
+ (set()), which sets the value of an octet at an address. In many
+ cases, a buffer corresponds directly to a portion of host user
+ memory. However, DDP does not depend on this; a buffer could be a
+ disk file, or anything else that can be viewed as an addressable
+ collection of octets. Abstractly, a buffer provides the interface:
+
+ typedef struct {
+ const address_t start;
+ const address_t end;
+ void set(address_t a, data_t v);
+ } ddp_buffer_t;
+
+ address_t
+
+ a reference to local memory
+
+ data_t
+
+ an octet data value.
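The buffer abstraction above can be approximated in real C, which has no member functions, by pairing a function pointer with a context pointer. The following is a minimal sketch under stated assumptions, not an interface defined by this document: `address_t` is modeled as an octet index, `end` is taken to be one past the last valid address, and the names `mem_set`, `mem_buffer`, and `ctx` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef size_t  address_t;   /* assumption: an address is an octet index */
typedef uint8_t data_t;      /* one octet */

/* C has no methods, so set() becomes a function pointer plus a context. */
typedef struct ddp_buffer {
    address_t start;         /* first valid address */
    address_t end;           /* one past the last valid address (assumed) */
    void (*set)(struct ddp_buffer *self, address_t a, data_t v);
    void *ctx;               /* backing object; here, host memory */
} ddp_buffer_t;

/* set() for a buffer backed by ordinary host memory. */
static void mem_set(ddp_buffer_t *self, address_t a, data_t v)
{
    assert(a >= self->start && a < self->end);
    ((data_t *)self->ctx)[a - self->start] = v;
}

/* Wrap a caller-owned array of len octets as a buffer starting at start. */
static ddp_buffer_t mem_buffer(data_t *mem, address_t start, size_t len)
{
    ddp_buffer_t b = { start, start + len, mem_set, mem };
    return b;
}
```

A buffer backed by a disk file or a device would supply a different `set` function and `ctx` while presenting the same structure to DDP.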
+
+ The protocol layering and in-line data flow of DDP are:
+
+                     DDP Client Protocol
+             (e.g., RDMA or Upper Layer Protocol)
+                            |    ^
+        untagged messages   |    |   untagged message delivery
+          tagged messages   |    |   tagged message delivery
+                            v    |
+                             DDP+---> data placement
+                              ^
+                              | transport messages
+                              v
+                          Transport
+                (e.g., SCTP, DCCP, framed TCP)
+                              ^
+                              | IP datagrams
+                              v
+                            . . .
+
+ In addition to in-line data flow, the client protocol registers
+ buffers with DDP, and DDP performs buffer update (set()) operations
+ as a result of receiving tagged messages.
+
+ DDP messages may be split into multiple, smaller DDP messages, each
+ in a separate transport message. However, if the transport is
+ unreliable or unordered, messages split across transport messages may
+ or may not provide useful behavior, in the same way as splitting
+ arbitrary Upper Layer messages across unreliable or unordered
+ transport messages may or may not provide useful behavior. In other
+ words, the same considerations apply to building client protocols on
+ different types of transports with or without the use of DDP.
+
+ A DDP message split across transport messages looks like:
+
+      DDP message:                 Transport messages:
+
+        stag=s, offset=o,          message 1:
+        notify=y, id=i               |type=ddp  |
+        message=                     |stag=s    |
+          |aabbccddee|-------.       |offset=o  |
+          ~   ...    ~----.   \      |notify=n  |
+          |vvwwxxyyzz|-.   \   \     |id=?      |
+                       |    \   `--->|aabbccddee|
+                       |     \       ~   ...    ~
+                       |      +----->|iijjkkllmm|
+                       |      |
+                       +      |    message 2:
+                        \     |      |type=ddp  |
+                         \    |      |stag=s    |
+                          \   +      |offset=o+n|
+                           \   \     |notify=y  |
+                            \   \    |id=i      |
+                             \   `-->|nnooppqqrr|
+                              \      ~   ...    ~
+                               `---->|vvwwxxyyzz|
+
+ Although this picture suggests that DDP information is carried in-
+ line with the message payload, components of the DDP information may
+ also be in transport-specific fields, or derived from transport-
+ specific control information if the transport permits.
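The splitting shown in the figure can be sketched as a function that computes one header per transport message. This is only an illustration under assumptions: the `ddp_frag_t` layout, the function name `ddp_split`, and the use of 0 as a placeholder id on non-final fragments are hypothetical, and a concrete DDP protocol may carry this information differently. Each fragment keeps the same Steering Tag, the offset advances by the octets already sent, and only the final fragment carries the notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t stag;      /* Steering Tag, copied to every fragment */
    size_t   offset;    /* placement offset of this fragment */
    bool     notify;    /* true only on the final fragment */
    uint32_t id;        /* message id; meaningful only when notify is set */
    size_t   len;       /* payload octets carried by this fragment */
} ddp_frag_t;

/*
 * Split a tagged message of len octets into fragments of at most
 * max_payload octets. Writes up to cap headers into out and returns
 * the number of fragments the message needs.
 */
static size_t ddp_split(uint32_t stag, size_t offset, size_t len,
                        bool notify, uint32_t id, size_t max_payload,
                        ddp_frag_t *out, size_t cap)
{
    size_t n = 0, done = 0;
    assert(max_payload > 0);
    do {
        size_t left = len - done;
        size_t chunk = left < max_payload ? left : max_payload;
        bool last = (done + chunk == len);
        if (n < cap) {
            out[n].stag = stag;
            out[n].offset = offset + done;
            out[n].notify = last && notify;
            out[n].id = last ? id : 0;   /* placeholder until the end */
            out[n].len = chunk;
        }
        n++;
        done += chunk;
    } while (done < len);
    return n;
}
```

For a 25-octet message placed at offset 100 with a 10-octet payload limit, this yields fragments at offsets 100, 110, and 120, with the notification set only on the third.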
+
+2.1.1. Transport Operations
+
+ For the purposes of this architecture, the transport provides:
+
+ void xpt_send(socket_t s, message_t m);
+ message_t xpt_recv(socket_t s);
+ msize_t xpt_max_msize(socket_t s);
+
+ socket_t
+
+ a transport address, including IP addresses, ports and other
+ transport-specific identifiers.
+
+ message_t
+
+ a string of octets.
+
+ msize_t (scalar)
+
+ a message size.
+
+ xpt_send(socket_t s, message_t m)
+
+ send a transport message.
+
+ xpt_recv(socket_t s)
+
+ receive a transport message.
+
+ xpt_max_msize(socket_t s)
+
+ get the current maximum transport message size. Corresponds,
+ roughly, to the current path Maximum Transfer Unit (PMTU),
+ adjusted by underlying protocol overheads.
+
+ Real implementations of xpt_send() and xpt_recv() typically return
+ error indications, but that is not relevant to this architecture.
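To experiment with the layers above the transport, the three transport operations can be stood in for by an in-memory loopback queue. The sketch below is an assumption-laden toy, not a transport mapping: `loop_socket_t` replaces `socket_t`, the size limit is a fixed constant rather than a PMTU-derived value, and, unlike the architectural signatures, the functions return error indications as real implementations typically do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define XPT_MAX_MSIZE 64   /* assumed fixed limit; a real one tracks the PMTU */
#define XPT_QUEUE_LEN 8    /* assumed queue depth */

typedef struct {
    size_t        len;
    unsigned char data[XPT_MAX_MSIZE];
} message_t;

/* Stand-in for socket_t: a FIFO that loops messages back to the caller. */
typedef struct {
    message_t q[XPT_QUEUE_LEN];
    size_t    head, count;
} loop_socket_t;

static size_t xpt_max_msize(const loop_socket_t *s)
{
    (void)s;
    return XPT_MAX_MSIZE;
}

/* Returns 0 on success, -1 if the message is too large or the queue is full. */
static int xpt_send(loop_socket_t *s, const void *buf, size_t len)
{
    if (len > xpt_max_msize(s) || s->count == XPT_QUEUE_LEN)
        return -1;
    message_t *m = &s->q[(s->head + s->count) % XPT_QUEUE_LEN];
    m->len = len;
    memcpy(m->data, buf, len);
    s->count++;
    return 0;
}

/* Returns 0 and fills *out with the oldest message, or -1 if none is queued. */
static int xpt_recv(loop_socket_t *s, message_t *out)
{
    assert(out != NULL);
    if (s->count == 0)
        return -1;
    *out = s->q[s->head];
    s->head = (s->head + 1) % XPT_QUEUE_LEN;
    s->count--;
    return 0;
}
```

This toy happens to be reliable and ordered; an unreliable or unordered transport could be modeled by dropping or reordering queue entries.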
+
+2.1.2. DDP Operations
+
+ The DDP layer provides:
+
+ void ddp_send(socket_t s, message_t m);
+ void ddp_send_ddp(socket_t s, message_t m, ddp_addr_t d,
+ ddp_notify_t n);
+ void ddp_post_recv(socket_t s, bdesc_t b);
+ ddp_ind_t ddp_recv(socket_t s);
+ bdesc_t ddp_register(socket_t s, ddp_buffer_t b);
+ void ddp_deregister(bhand_t bh);
+ msizes_t ddp_max_msizes(socket_t s);
+
+ ddp_addr_t
+
+ the buffer address portion of a tagged message:
+
+ typedef struct {
+ stag_t stag;
+ address_t offset;
+ } ddp_addr_t;
+
+ stag_t (scalar)
+
+ a Steering Tag. A stag_t identifies the destination buffer for
+ tagged messages. stag_ts are generated when the buffer is
+ registered, communicated to the sender by some client protocol
+ convention and inserted in DDP messages. stag_t values in this
+ DDP architecture are assumed to be completely opaque to the
+ client protocol, and implementation-dependent. However,
+ particular implementations, such as DDP on a multicast transport
+ (see below), may provide the buffer holder some control in
+ selecting stag_ts.
+
+ ddp_notify_t
+
+ the notification portion of a DDP message, used to signal
+ that the message represents the final fragment of a
+ multi-segmented DDP message:
+
+ typedef struct {
+ boolean_t notify;
+ ddp_msg_id_t i;
+ } ddp_notify_t;
+
+ ddp_msg_id_t (scalar)
+
+ a DDP message identifier. ddp_msg_id_ts are chosen by the DDP
+ message receiver (buffer holder), communicated to the sender by
+ some client protocol convention, and inserted in DDP messages.
+ Whether a message reception indication is requested for a DDP
+ message is a matter of client protocol convention. Unlike
+ stag_ts, the structure of ddp_msg_id_ts is opaque to DDP and is
+ therefore completely in the hands of the client protocol.
+
+ bdesc_t
+
+ a description of a registered buffer:
+
+ typedef struct {
+ bhand_t bh;
+ ddp_addr_t a;
+ } bdesc_t;
+
+ `a.offset' is the starting offset of the registered buffer,
+ which may have no relationship to the `start' or `end' addresses
+ of that buffer. However, particular implementations, such as
+ DDP on a multicast transport (see below), may allow some client
+ protocol control over the starting offset.
+
+ bhand_t
+
+ an opaque buffer handle used to deregister a buffer.
+
+ recv_message_t
+
+ a description of a completed untagged receive buffer:
+
+ typedef struct {
+ bdesc_t b;
+ length_t l;
+ } recv_message_t;
+
+ ddp_ind_t
+
+ an untagged message, a tagged message reception indication, or a
+ tagged message reception error:
+
+ typedef union {
+ recv_message_t m;
+ ddp_msg_id_t i;
+ ddp_err_t e;
+ } ddp_ind_t;
+
+ ddp_err_t
+
+ indicates an error while receiving a tagged message, typically
+ `offset' out of bounds, or `stag' is not registered to the
+ socket.
+
+ msizes_t
+
+ The maximum untagged and tagged message sizes that fit in a
+ single transport message:
+
+ typedef struct {
+ msize_t max_untagged;
+ msize_t max_tagged;
+ } msizes_t;
+
+ ddp_send(socket_t s, message_t m)
+
+ send an untagged message.
+
+ ddp_send_ddp(socket_t s, message_t m, ddp_addr_t d, ddp_notify_t n)
+
+ send a tagged message to remote buffer address d.
+
+ ddp_post_recv(socket_t s, bdesc_t b)
+
+ post a registered buffer to accept a single received untagged
+ message. Each buffer is returned to the caller in a ddp_recv()
+ untagged message reception indication, in the order in which it
+ was posted. The same buffer may be enabled on multiple sockets;
+ receipt of an untagged message into the buffer from any of these
+ sockets unposts the buffer from all sockets.
+
+ ddp_recv(socket_t s)
+
+ get the next received untagged message, tagged message reception
+ indication, or tagged message error.
+
+ ddp_register(socket_t s, ddp_buffer_t b)
+
+ register a buffer for DDP on a socket. The same buffer may be
+ registered multiple times on the same or different sockets. The
+ same buffer registered on different sockets may result in a
+ common registration. Different buffers may also refer to
+ portions of the same underlying addressable object (buffer
+ aliasing).
+
+ ddp_deregister(bhand_t bh)
+
+ remove a registration from a buffer.
+
+ ddp_max_msizes(socket_t s)
+
+ get the current maximum untagged and tagged message sizes that
+ will fit in a single transport message.
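The ordering rule stated for ddp_post_recv(), where posted buffers are consumed by untagged messages in the order they were posted, amounts to a FIFO of receive buffers. The following single-socket sketch assumes hypothetical names (`recv_queue_t`, `post_recv`, `deliver_untagged`) and an arbitrary queue depth. It also assumes that a message too large for the oldest buffer leaves that buffer posted, a policy this document leaves to the local interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_POSTED 16   /* assumed queue depth */

typedef struct {
    unsigned char *mem;   /* caller-owned receive buffer */
    size_t         cap;   /* its capacity in octets */
} posted_buf_t;

typedef struct {
    posted_buf_t q[MAX_POSTED];
    size_t       head, count;
} recv_queue_t;

/* Models ddp_post_recv(): append a buffer; returns -1 if the queue is full. */
static int post_recv(recv_queue_t *rq, unsigned char *mem, size_t cap)
{
    if (rq->count == MAX_POSTED)
        return -1;
    posted_buf_t *p = &rq->q[(rq->head + rq->count) % MAX_POSTED];
    p->mem = mem;
    p->cap = cap;
    rq->count++;
    return 0;
}

/*
 * Called when an untagged message arrives: consume the oldest posted
 * buffer and copy the message into it. Returns the filled buffer, or
 * NULL (leaving the queue unchanged) if nothing is posted or the
 * message does not fit.
 */
static unsigned char *deliver_untagged(recv_queue_t *rq, const void *msg,
                                       size_t len, size_t *out_len)
{
    assert(out_len != NULL);
    if (rq->count == 0 || len > rq->q[rq->head].cap)
        return NULL;
    posted_buf_t p = rq->q[rq->head];
    rq->head = (rq->head + 1) % MAX_POSTED;
    rq->count--;
    memcpy(p.mem, msg, len);
    *out_len = len;
    return p.mem;
}
```

The returned pointer and length play the role of the recv_message_t that ddp_recv() hands back to the client protocol.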
+
+2.1.3. Transport Characteristics in DDP
+
+ Certain characteristics of the transport on which DDP is mapped
+ determine the nature of the service provided to client protocols.
+ Fundamentally, the characteristics of the transport will not be
+ changed by the presence of DDP. The choice of transport is therefore
+ driven not by DDP, but by the requirements of the Upper Layer
+ employing the DDP service.
+
+ Specifically, transports are:
+
+ o reliable or unreliable,
+
+ o ordered or unordered,
+
+ o single source or multisource,
+
+ o single destination or multidestination (multicast or anycast).
+
+ Some transports support several combinations of these
+ characteristics. For example, SCTP [SCTP] is reliable, single
+ source, single destination (point-to-point) and supports both ordered
+ and unordered modes.
+
+ DDP messages carried by the transport are framed for processing by the
+ receiver, and may be further protected for integrity or privacy in
+ accordance with the transport capabilities. DDP does not provide
+ such functions.
+
+ In general, transport characteristics equally affect transport and
+ DDP message delivery. However, there are several issues specific to
+ DDP messages.
+
+ A key component of DDP is how the following operations on the
+ receiving side are ordered among themselves, and how they relate to
+ corresponding operations on the sending side:
+
+ o set()s,
+
+ o untagged message reception indications, and
+
+ o tagged message reception indications.
+
+ These relationships depend upon the characteristics of the underlying
+ transport in a way that is defined by the DDP protocol. For example,
+ if the transport is unreliable and unordered, the DDP protocol might
+ specify that the client protocol is subject to the consequences of
+ transport messages being lost or duplicated, rather than requiring
+ that different characteristics be presented to the client protocol.
+
+ Buffer access must be implemented consistently across endpoint IP
+ addresses on transports allowing multiple IP addresses per endpoint,
+ for example, SCTP. In particular, the Steering Tag must be
+ consistently scoped and must address the same buffer across all IP
+ address associations belonging to the endpoint. Additionally,
+ operation ordering relationships across IP addresses within an
+ association (set(), get(), etc.) depend on the underlying transport.
+ If the above consistency relationships cannot be maintained by a
+ transport endpoint, then the endpoint is unsuitable for a DDP
+ connection.
+
+ Multidestination data delivery is a transport characteristic that may
+ require specific consideration in a DDP protocol. As mentioned
+ above, the basic DDP model assumes that buffer address values
+ returned by ddp_register() are opaque to the client protocol, and can
+ be implementation dependent. The most natural way to map DDP to a
+ multidestination transport is to require that all receivers produce
+ the same buffer address when registering a multidestination
+ destination buffer. Restriction of the DDP model to accommodate
+ multiple destinations involves engineering tradeoffs comparable to
+ those of providing non-DDP multidestination transport capability.
+
+ A registered buffer is identified within DDP by its stag_t, which in
+ turn is associated with a socket. Therefore, this registration
+ grants a capability to the DDP peer, and the socket (using the
+ underlying properties of its chosen transport and possible security)
+ identifies the peer and authenticates the stag_t.
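Because a stag_t is a capability scoped to a socket, a receiver would plausibly validate every tagged message against both the socket it arrived on and the registered range before any set() takes place. The sketch below illustrates one such check under assumptions: the registration table, the field names, the integer socket identifier, and the two error codes are hypothetical, chosen to mirror the two ddp_err_t cases named earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { DDP_OK = 0, DDP_ERR_STAG, DDP_ERR_BOUNDS } ddp_err_t;

typedef struct {
    uint32_t       stag;      /* Steering Tag handed to the peer */
    int            socket;    /* socket the registration belongs to */
    size_t         offset;    /* starting offset exported with the stag */
    unsigned char *mem;       /* backing memory */
    size_t         len;       /* registered length in octets */
} ddp_reg_t;

/*
 * Place len octets arriving on socket at (stag, offset).
 * Nothing is written unless the whole range passes both checks.
 */
static ddp_err_t ddp_place(const ddp_reg_t *regs, size_t nregs, int socket,
                           uint32_t stag, size_t offset,
                           const void *data, size_t len)
{
    for (size_t i = 0; i < nregs; i++) {
        const ddp_reg_t *r = &regs[i];
        if (r->stag != stag || r->socket != socket)
            continue;   /* a stag is only valid on its own socket */
        /* Compare via subtraction so offset + len cannot overflow. */
        if (offset < r->offset || offset - r->offset > r->len ||
            len > r->len - (offset - r->offset))
            return DDP_ERR_BOUNDS;
        memcpy(r->mem + (offset - r->offset), data, len);
        return DDP_OK;
    }
    return DDP_ERR_STAG;
}
```

Presenting a valid stag on a different socket is reported the same way as an unknown stag, so a peer learns nothing about registrations that belong to other sockets.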
+
+ The same buffer may be enabled by ddp_post_recv() on multiple
+ sockets. In this case any ddp_recv() untagged message reception
+ indication may be provided on a different socket from that on which
+ the buffer was posted. Such indications are not ordered among
+ multiple DDP sockets.
+
+ When multiple sockets reference an untagged message reception buffer,
+ local interfaces are responsible for managing the mechanisms of
+ allocating posted buffers to received untagged messages, the handling
+ of received untagged messages when no buffer is available, and of
+ resource management among multiple sockets. Where underprovisioning
+ of buffers on multiple sockets is allowed, mechanisms should be
+ provided to manage buffer consumption on a per-socket or group of
+ related sockets basis.
+
+ Architecturally, therefore, DDP is a flexible and general paradigm
+ that may be applied to any variety of transports. Implementations of
+ DDP may, however, adapt themselves to these differences in ways
+ appropriate to each transport. In all cases, the layering of DDP
+ must continue to express the transport's underlying characteristics.
+
+2.2. Remote Direct Memory Access (RDMA) Protocol Architecture
+
+ Remote Direct Memory Access (RDMA) extends the capabilities of DDP
+ with two primary functions.
+
+ First, it adds the ability to read from buffers registered to a
+ socket (RDMA Read). This allows a client protocol to perform
+ arbitrary, bidirectional data movement without involving the remote
+ client. When RDMA is implemented in hardware, arbitrary data
+ movement can be performed without involving the remote host CPU at
+ all.
+
+ In addition, RDMA specifies a transport-independent untagged message
+ service (Send) with characteristics that are both very efficient to
+ implement in hardware, and convenient for client protocols.
+
+ The RDMA architecture is patterned after the traditional model for
+ device programming, where the client requests an operation using
+ Send-like actions (programmed I/O), the server performs the necessary
+ data transfers for the operation (DMA reads and writes), and notifies
+ the client of completion. The programmed I/O+DMA model efficiently
+ supports a high degree of concurrency and flexibility for both the
+ client and server, even when operations have a wide range of
+ intrinsic latencies.
+
+ RDMA is layered as a client protocol on top of DDP:
+
+                       Client Protocol
+                            |    ^
+                    Sends   |    |   Send reception indications
+       RDMA Read Requests   |    |   RDMA Read Completion indications
+              RDMA Writes   |    |   RDMA Write Completion indications
+                            v    |
+                             RDMA
+                            |    ^
+        untagged messages   |    |   untagged message delivery
+          tagged messages   |    |   tagged message delivery
+                            v    |
+                             DDP+---> data placement
+                              ^
+                              | transport messages
+                              v
+                            . . .
+
+ In addition to in-line data flow, read (get()) and update (set())
+ operations are performed on buffers registered with RDMA as a result
+ of RDMA Read Requests and RDMA Writes, respectively.
+
+ An RDMA `buffer' extends a DDP buffer with a get() operation that
+ retrieves the value of the octet at address `a':
+
+ typedef struct {
+ const address_t start;
+ const address_t end;
+ void set(address_t a, data_t v);
+ data_t get(address_t a);
+ } rdma_buffer_t;
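The effect of an RDMA Read on the two buffers involved can be modeled locally as a loop of get() calls on the source buffer feeding set() calls on the sink buffer. This is a conceptual sketch only: it ignores the network entirely, and the function-pointer encoding, the half-open `[start, end)` range, and the names `mem_get`, `mem_set`, and `rdma_read_model` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef size_t  address_t;   /* assumption: an address is an octet index */
typedef uint8_t data_t;      /* one octet */

typedef struct rdma_buffer {
    address_t start, end;    /* valid addresses are [start, end) (assumed) */
    void   (*set)(struct rdma_buffer *self, address_t a, data_t v);
    data_t (*get)(const struct rdma_buffer *self, address_t a);
    data_t *mem;             /* backing host memory */
} rdma_buffer_t;

static void mem_set(rdma_buffer_t *self, address_t a, data_t v)
{
    assert(a >= self->start && a < self->end);
    self->mem[a - self->start] = v;
}

static data_t mem_get(const rdma_buffer_t *self, address_t a)
{
    assert(a >= self->start && a < self->end);
    return self->mem[a - self->start];
}

/*
 * Model of an RDMA Read: get() l octets from the source buffer at s
 * and set() them into the sink buffer at d. Returns 0 on success, or
 * -1 without touching the sink if either range is out of bounds.
 */
static int rdma_read_model(const rdma_buffer_t *src, address_t s, size_t l,
                           rdma_buffer_t *sink, address_t d)
{
    if (s < src->start || s > src->end || l > src->end - s ||
        d < sink->start || d > sink->end || l > sink->end - d)
        return -1;
    for (size_t i = 0; i < l; i++)
        sink->set(sink, d + i, src->get(src, s + i));
    return 0;
}
```

In a real exchange the get() calls happen at the Data Source and the set() calls at the Data Sink, with the octets carried between them in tagged DDP messages.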
+
+2.2.1. RDMA Operations
+
+ The RDMA layer provides:
+
+ void rdma_send(socket_t s, message_t m);
+ void rdma_write(socket_t s, message_t m, ddp_addr_t d,
+ rdma_notify_t n);
+ void rdma_read(socket_t s, ddp_addr_t s, length_t l, ddp_addr_t d);
+ void rdma_post_recv(socket_t s, bdesc_t b);
+ rdma_ind_t rdma_recv(socket_t s);
+ bdesc_t rdma_register(socket_t s, rdma_buffer_t b,
+ bmode_t mode);
+ void rdma_deregister(bhand_t bh);
+ msizes_t rdma_max_msizes(socket_t s);
+
+ Although, for clarity, these data transfer interfaces are
+ synchronous, rdma_read() and possibly rdma_send() (in the presence of
+ Send flow control) can require an arbitrary amount of time to
+ complete. To express the full concurrency and interleaving of RDMA
+ data transfer, these interfaces should also be reentrant. For
+ example, a client protocol may perform an rdma_send(), while an
+ rdma_read() operation is in progress.
+
+ rdma_notify_t
+
+ RDMA Write notification information, used to signal that the
+ message represents the final fragment of a multi-segmented RDMA
+ message:
+
+ typedef struct {
+ boolean_t notify;
+ rdma_write_id_t i;
+ } rdma_notify_t;
+
+ identical in function to ddp_notify_t, except that the type
+ rdma_write_id_t may not be equivalent to ddp_msg_id_t.
+
+ rdma_write_id_t (scalar)
+
+ an RDMA Write identifier.
+
+ rdma_ind_t
+
+ a Send message, or an RDMA error:
+
+ typedef union {
+ recv_message_t m;
+ rdma_err_t e;
+ } rdma_ind_t;
+
+ rdma_err_t
+
+ an RDMA protocol error indication. RDMA errors include buffer
+ addressing errors corresponding to ddp_err_ts, and buffer
+ protection violations (e.g., RDMA Writing a buffer only
+ registered for reading).
+
+ bmode_t
+
+ buffer registration mode (permissions). Any combination of
+ permitting RDMA Read (BMODE_READ) and RDMA Write (BMODE_WRITE)
+ operations.
+
+ rdma_send(socket_t s, message_t m)
+
+ send a message, delivering it to the next untagged RDMA buffer
+ at the remote peer.
+
+ rdma_write(socket_t s, message_t m, ddp_addr_t d, rdma_notify_t n)
+
+ RDMA Write to remote buffer address d.
+
+ rdma_read(socket_t s, ddp_addr_t s, length_t l, ddp_addr_t d)
+
+ RDMA Read l octets from remote buffer address s to local buffer
+ address d.
+
+ rdma_post_recv(socket_t s, bdesc_t b)
+
+ post a registered buffer to accept a single Send message, to be
+ filled and returned in-order to a subsequent caller of
+ rdma_recv(). As with DDP, buffers may be enabled on multiple
+ sockets, in which case ordering guarantees are relaxed. Also as
+ with DDP, local interfaces must manage the mechanisms of
+ allocation and management of buffers posted to multiple sockets.
+
+ rdma_recv(socket_t s)
+
+ get the next received Send message, RDMA Write completion
+ identifier, or RDMA error.
+
+ rdma_register(socket_t s, rdma_buffer_t b, bmode_t mode)
+
+ register a buffer for RDMA on a socket (for read access, write
+ access or both). As with DDP, the same buffer may be registered
+ multiple times on the same or different sockets, and different
+ buffers may refer to portions of the same underlying addressable
+ object.
+
+ rdma_deregister(bhand_t bh)
+
+ remove a registration from a buffer.
+
+ rdma_max_msizes(socket_t s)
+
+ get the current maximum Send (max_untagged) and RDMA Read or
+ Write (max_tagged) operation sizes that will fit in a single
+ transport message. The values returned by rdma_max_msizes() are
+ closely related to the values returned by ddp_max_msizes(), but
+ may not be equal.
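One way the values of rdma_max_msizes() could be "closely related" to those of ddp_max_msizes() without being equal is that each layer subtracts its own header overhead from what the layer below offers. The sketch below illustrates that arithmetic only. The three header sizes are invented for the example and do not come from any DDP or RDMA specification, and `ddp_sizes`, `rdma_sizes`, and `sub_or_zero` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t msize_t;

typedef struct {
    msize_t max_untagged;
    msize_t max_tagged;
} msizes_t;

/* Hypothetical per-message header overheads, in octets. */
enum {
    DDP_UNTAGGED_HDR = 8,
    DDP_TAGGED_HDR   = 16,
    RDMA_HDR         = 4
};

/* Subtract without wrapping below zero. */
static msize_t sub_or_zero(msize_t a, msize_t b)
{
    return a > b ? a - b : 0;
}

/* DDP payload limits for a transport message of at most xpt_max octets. */
static msizes_t ddp_sizes(msize_t xpt_max)
{
    msizes_t m = { sub_or_zero(xpt_max, DDP_UNTAGGED_HDR),
                   sub_or_zero(xpt_max, DDP_TAGGED_HDR) };
    return m;
}

/* RDMA limits: the DDP limits less whatever header RDMA adds. */
static msizes_t rdma_sizes(msize_t xpt_max)
{
    msizes_t d = ddp_sizes(xpt_max);
    msizes_t m = { sub_or_zero(d.max_untagged, RDMA_HDR),
                   sub_or_zero(d.max_tagged, RDMA_HDR) };
    return m;
}
```

With these made-up overheads, a 1460-octet transport limit gives DDP limits of 1452 and 1444 octets and RDMA limits of 1448 and 1440 octets for untagged and tagged messages, respectively.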
+
+2.2.2. Transport Characteristics in RDMA
+
+ As with DDP, RDMA can be used on transports with a variety of
+ different characteristics that manifest themselves directly in the
+ service provided by RDMA. Also, as with DDP, the fundamental
+ characteristics of the transport will not be changed by the presence
+ of RDMA.
+
+ Like DDP, an RDMA protocol must specify how:
+
+ o set()s,
+
+ o get()s,
+
+ o Send messages, and
+
+ o RDMA Read completions
+
+ are ordered among themselves and how they relate to corresponding
+ operations on the remote peer(s). These relationships are likely to
+ be a function of the underlying transport characteristics.
+
+ There are some additional characteristics of RDMA that may translate
+ poorly to unreliable or multipoint transports due to attendant
+ complexities in managing endpoint state:
+
+ o Send flow control
+
+ o RDMA Read
+
+ These difficulties can be overcome by placing restrictions on the
+ service provided by RDMA. However, many RDMA clients, especially
+ those that separate data transfer and application logic concerns, are
+ likely to depend upon capabilities only provided by RDMA on a point-
+ to-point, reliable transport. In other words, many potential Upper
+ Layers, which might avail themselves of RDMA services, are naturally
+ already biased toward these transport classes.
+
+3. Security Considerations
+
+ Fundamentally, the DDP and RDMA protocols themselves should not
+ introduce additional vulnerabilities. They are intermediate
+ protocols and so should not perform or require functions such as
+ authorization, which are the domain of Upper Layers. However, the
+ DDP and RDMA protocols should allow mapping by strict Upper Layers
+ that are not permissive of new vulnerabilities; DDP and RDMAP
+ implementations should be prohibited from `cutting corners' that
+ create new vulnerabilities. Implementations must ensure that only
+ `supplied' resources (i.e., buffers) can be manipulated by DDP or
+ RDMAP messages.
+
+ System integrity must be maintained in any RDMA solution. Mechanisms
+ must be specified to prevent RDMA or DDP operations from impairing
+ system integrity. For example, threats can include potential buffer
+ reuse or buffer overflow, and are not merely a security issue. Even
+ trusted peers must not be allowed to damage local integrity. Any DDP
+ and RDMA protocol must address the issue of giving end-systems and
+ applications the capabilities to offer protection from such
+ compromises.
+
+ Because a Steering Tag exports access to a buffer, one critical
+ aspect of security is the scope of this access. It must be possible
+ to individually control specific attributes of the access provided by
+ a Steering Tag on the endpoint (socket) on which it was registered,
+ including remote read access, remote write access, and others that
+ might be identified. DDP and RDMA specifications must provide both
+ implementation requirements relevant to this issue, and guidelines to
+ assist implementors in making the appropriate design decisions.
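Individually controllable access attributes map naturally onto a bit mask checked on every remote operation. The sketch below assumes a hypothetical `rdma_grant_t` record, an integer socket identifier, and hypothetical error codes. BMODE_READ and BMODE_WRITE are the mode names used earlier in this document, but their numeric values here are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    BMODE_READ  = 1 << 0,   /* peer may RDMA Read the buffer */
    BMODE_WRITE = 1 << 1    /* peer may RDMA Write the buffer */
} bmode_t;

typedef enum { RDMA_OK = 0, RDMA_ERR_STAG, RDMA_ERR_ACCESS } rdma_err_t;

typedef struct {
    uint32_t stag;     /* Steering Tag */
    int      socket;   /* endpoint on which the stag was registered */
    unsigned mode;     /* bitwise OR of bmode_t values */
} rdma_grant_t;

/* Check that want (one bmode_t bit) is allowed for stag on this socket. */
static rdma_err_t rdma_check_access(const rdma_grant_t *g, int socket,
                                    uint32_t stag, bmode_t want)
{
    if (g->stag != stag || g->socket != socket)
        return RDMA_ERR_STAG;      /* wrong tag or wrong endpoint */
    if ((g->mode & (unsigned)want) == 0)
        return RDMA_ERR_ACCESS;    /* protection violation */
    return RDMA_OK;
}
```

Further attributes, if a specification identifies any, would become additional bits in the same mask.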
+
+ For example, it must not be possible for DDP to enable evasion of
+ buffer consistency checks at the recipient. The DDP and RDMA
+ specifications must allow the recipient to rely on its consistent
+ buffer contents by explicitly controlling peer access to buffer
+ regions at appropriate times.
+
+ The use of DDP and RDMA on a transport connection may interact with
+ any security mechanism, and vice-versa. For example, if the security
+ mechanism is implemented above the transport layer, the DDP and RDMA
+ headers may not be protected. Therefore, such a layering may be
+ inappropriate, depending on requirements.
+
+3.1. Security Services
+
+ The following end-to-end security services protect DDP and RDMAP
+ operation streams:
+
+ o Authentication of the data source, to protect against peer
+ impersonation, stream hijacking, and man-in-the-middle attacks
+ exploiting capabilities offered by the RDMA implementation.
+
+ Peer connections that do not pass authentication and
+ authorization checks must not be permitted to begin processing
+ in RDMA mode with an inappropriate endpoint. Once associated,
+ peer accesses to buffer regions must be authenticated and made
+ subject to authorization checks in the context of the
+ association and endpoint (socket) on which they are to be
+ performed, prior to any transfer operation or data being
+ accessed. The RDMA protocols must ensure that these region
+ protections are under strict application control.
+
+ o Integrity, to protect against modification of the control
+ content and buffer content.
+
+ While integrity is of concern to any transport, it is
+ important for the DDP and RDMAP protocols that the RDMA
+ control information carried in each operation be protected, in
+ order to direct the payloads appropriately.
+
+ o Sequencing, to protect against replay attacks (a special case
+ of the above modifications).
+
+ o Confidentiality, to protect the stream from eavesdropping.
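+
+ As an illustration of the authentication and authorization service
+ listed above, the following Python sketch checks each peer access
+ against a table of exposed regions before any data is touched. The
+ names Region, RegionTable, and check_access, and the use of numeric
+ association identifiers and tags, are assumptions made for this
+ example; this document does not define them.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Region:
    """A buffer region exposed to one peer (hypothetical model)."""
    association_id: int      # association the region was exposed on
    length: int              # size of the region in bytes
    allowed_ops: Set[str]    # subset of {"read", "write"}
    enabled: bool = True     # the application may revoke access at any time


@dataclass
class RegionTable:
    """Maps a tag (an assumed integer handle) to an exposed region."""
    regions: Dict[int, Region] = field(default_factory=dict)

    def check_access(self, association_id: int, tag: int, op: str,
                     offset: int, length: int) -> bool:
        """Return True only if every check passes; no data is accessed here."""
        region = self.regions.get(tag)
        if region is None or not region.enabled:
            return False     # unknown or revoked region
        if region.association_id != association_id:
            return False     # region belongs to a different association
        if op not in region.allowed_ops:
            return False     # operation type not permitted on this region
        if offset < 0 or length < 0 or offset + length > region.length:
            return False     # access falls outside the region bounds
        return True
```

+ In this sketch the application keeps control of the protections by
+ editing the table, for example by clearing the enabled flag once it
+ needs the buffer contents to remain stable.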
+
+ IPsec, operating to secure the connection on a packet-by-packet
+ basis, is a natural fit for securing RDMA placement, which operates in
+ conjunction with transport. Because RDMA enables an implementation
+ to avoid buffering, it is preferable to perform all applicable
+ security protection prior to processing of each segment by the
+ transport and RDMA layers. Such a layering enables the most
+ efficient secure RDMA implementation.
+
+ The TLS record protocol, on the other hand, is layered on top of
+ reliable transports and cannot provide such security assurance until
+ an entire record is available, which may require the buffering and/or
+ assembly of several distinct messages prior to TLS processing. This
+ defers RDMA processing and introduces overheads that RDMA is designed
+ to avoid. In addition, TLS length restrictions on records themselves
+ impose additional buffering and processing for long operations that
+ must span multiple records. TLS is therefore potentially a less
+ natural fit for protecting the RDMA protocols.
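+
+ The effect of the record length restriction can be estimated with a
+ short Python sketch. It assumes the TLS limit of 2^14 bytes of
+ plaintext per record; the function name records_needed is
+ hypothetical, and record overhead such as headers and MACs is
+ ignored.

```python
# Maximum plaintext carried by one TLS record, in bytes (2^14).
TLS_MAX_PLAINTEXT = 2 ** 14


def records_needed(operation_length: int) -> int:
    """Return how many TLS records a payload of the given size must span."""
    if operation_length < 0:
        raise ValueError("operation_length must be non-negative")
    # Ceiling division without floating point.
    return -(-operation_length // TLS_MAX_PLAINTEXT)
```

+ Under these assumptions, a single 1 MiB RDMA operation spans 64
+ records, each of which must be received in full and verified before
+ any of its payload can be placed.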
+
+ Any DDP and RDMAP specification must provide the means to satisfy the
+ above security service requirements.
+
+ IPsec is sufficient to provide the required security services to the
+ DDP and RDMAP protocols, while enabling efficient implementations.
+
+3.2. Error Considerations
+
+ Resource issues leading to denial-of-service attacks, overwrites and
+ other concurrent operations, the ordering of completions as required
+ by the RDMA protocol, and the granularity of transfer are all within
+ the required scope of any security analysis of RDMA and DDP.
+
+ The RDMA operations require checking of what is essentially user
+ information, explicitly including addressing information and
+ operation type (read or write), and implicitly including protection
+ and attributes. The semantics associated with each class of error
+ resulting from possible failure of such checks must be clearly
+ defined, and the expected action to be taken by the protocols in each
+ case must be specified.
+
+ In some cases, this will result in a catastrophic error on the RDMA
+ association; however, in others, a local or remote error may be
+ signalled. Certain of these errors may require consideration of
+ abstract local semantics. The result of the error on the RDMA
+ association must be carefully specified so as to provide useful
+ behavior, while not constraining the implementation.
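+
+ One way to picture the requirement that each class of error have a
+ specified action is a lookup table, sketched below in Python. The
+ error class names and the pairing of each class with an action are
+ purely illustrative assumptions; this document does not assign them.

```python
from enum import Enum


class Action(Enum):
    LOCAL_ERROR = "signal a local error"
    REMOTE_ERROR = "signal a remote error"
    TERMINATE = "catastrophic error on the association"


# Hypothetical policy: every class of failed check maps to exactly one
# action. The pairings below are examples, not requirements.
POLICY = {
    "invalid_address": Action.REMOTE_ERROR,
    "invalid_operation_type": Action.REMOTE_ERROR,
    "protection_violation": Action.TERMINATE,
    "local_resource_exhausted": Action.LOCAL_ERROR,
}


def action_for(error_class: str) -> Action:
    """Look up the action; an unlisted class is treated as catastrophic."""
    return POLICY.get(error_class, Action.TERMINATE)
```

+ Treating unlisted classes as catastrophic is a design choice of the
+ sketch, made so that no failed check is left without a defined
+ outcome.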
+
+4. Acknowledgements
+
+ The authors wish to acknowledge the valuable contributions of Caitlin
+ Bestler, David Black, Jeff Mogul, and Allyn Romanow.
+
+5. Informative References
+
+ [FCVI] ANSI Technical Committee T11, "Fibre Channel Standard
+ Virtual Interface Architecture Mapping", ANSI/NCITS
+ 357-2001, March 2001, available from
+ http://www.t11.org/t11/stat.nsf/fcproj.
+
+ [IB] InfiniBand Trade Association, "InfiniBand Architecture
+ Specification Volumes 1 and 2", Release 1.1, November 2002,
+ available from http://www.infinibandta.org/specs.
+
+ [MYR] VMEbus International Trade Association, "Myrinet on VME
+ Protocol Specification", ANSI/VITA 26-1998, August 1998,
+ available from http://www.myri.com/open-specs.
+
+ [ROM] Romanow, A., Mogul, J., Talpey, T., and S. Bailey, "Remote
+ Direct Memory Access (RDMA) over IP Problem Statement", RFC
+ 4297, December 2005.
+
+ [SCTP] Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
+ Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M., Zhang,
+ L., and V. Paxson, "Stream Control Transmission Protocol",
+ RFC 2960, October 2000.
+
+ [SDP] InfiniBand Trade Association, "Sockets Direct Protocol
+ v1.0", Annex A of InfiniBand Architecture Specification
+ Volume 1, Release 1.1, November 2002, available from
+ http://www.infinibandta.org/specs.
+
+ [SRVNET] R. Horst, "TNet: A reliable system area network", IEEE
+ Micro, pp. 37-45, February 1995.
+
+ [VI] D. Cameron and G. Regnier, "The Virtual Interface
+ Architecture", ISBN 0971288704, Intel Press, April 2002,
+ more info at http://www.intel.com/intelpress/via/.
+
+Authors' Addresses
+
+ Stephen Bailey
+ Sandburst Corporation
+ 600 Federal Street
+ Andover, MA 01810 USA
+
+ Phone: +1 978 689 1614
+ EMail: steph@sandburst.com
+
+
+ Tom Talpey
+ Network Appliance
+ 1601 Trapelo Road
+ Waltham, MA 02451 USA
+
+ Phone: +1 781 768 5329
+ EMail: thomas.talpey@netapp.com
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2005).
+
+ This document is subject to the rights, licenses and restrictions
+ contained in BCP 78, and except as set forth therein, the authors
+ retain all their rights.
+
+ This document and the information contained herein are provided on an
+ "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+ OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+ ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+ INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+ INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+ WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+ The IETF takes no position regarding the validity or scope of any
+ Intellectual Property Rights or other rights that might be claimed to
+ pertain to the implementation or use of the technology described in
+ this document or the extent to which any license under such rights
+ might or might not be available; nor does it represent that it has
+ made any independent effort to identify any such rights. Information
+ on the procedures with respect to rights in RFC documents can be
+ found in BCP 78 and BCP 79.
+
+ Copies of IPR disclosures made to the IETF Secretariat and any
+ assurances of licenses to be made available, or the result of an
+ attempt made to obtain a general license or permission for the use of
+ such proprietary rights by implementers or users of this
+ specification can be obtained from the IETF on-line IPR repository at
+ http://www.ietf.org/ipr.
+
+ The IETF invites any interested party to bring to its attention any
+ copyrights, patents or patent applications, or other proprietary
+ rights that may cover technology that may be required to implement
+ this standard. Please address the information to the IETF at
+ ietf-ipr@ietf.org.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is currently provided by the
+ Internet Society.
+