From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc4391.txt | 1179 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1179 insertions(+) create mode 100644 doc/rfc/rfc4391.txt (limited to 'doc/rfc/rfc4391.txt') diff --git a/doc/rfc/rfc4391.txt b/doc/rfc/rfc4391.txt new file mode 100644 index 0000000..13a412b --- /dev/null +++ b/doc/rfc/rfc4391.txt @@ -0,0 +1,1179 @@ + + + + + + +Network Working Group J. Chu +Request for Comments: 4391 Sun Microsystems +Category: Standards Track V. Kashyap + IBM + April 2006 + + + Transmission of IP over InfiniBand (IPoIB) + + +Status of This Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (C) The Internet Society (2006). + +Abstract + + This document specifies a method for encapsulating and transmitting + IPv4/IPv6 and Address Resolution Protocol (ARP) packets over + InfiniBand (IB). It describes the link-layer address to be used when + resolving the IP addresses in IP over InfiniBand (IPoIB) subnets. + The document also describes the mapping from IP multicast addresses + to InfiniBand multicast addresses. In addition, this document + defines the setup and configuration of IPoIB links. + +Table of Contents + + 1. Introduction ....................................................2 + 2. IP over UD Mode .................................................2 + 3. InfiniBand Datalink .............................................3 + 4. Multicast Mapping ...............................................3 + 4.1. 
Broadcast-GID Parameters ...................................5 + 5. Setting Up an IPoIB Link ........................................6 + 6. Frame Format ....................................................6 + 7. Maximum Transmission Unit .......................................8 + 8. IPv6 Stateless Autoconfiguration ................................8 + 8.1. IPv6 Link-Local Address ....................................9 + 9. Address Mapping - Unicast .......................................9 + 9.1. Link Information ...........................................9 + 9.1.1. Link-Layer Address/Hardware Address ................11 + 9.1.2. Auxiliary Link Information .........................12 + + + +Chu & Kashyap Standards Track [Page 1] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + + 9.2. Address Resolution in IPv4 Subnets ........................13 + 9.3. Address Resolution in IPv6 Subnets ........................14 + 9.4. Cautionary Note on QPN Caching ............................14 + 10. Sending and Receiving IP Multicast Packets ....................14 + 11. IP Multicast Routing ..........................................16 + 12. New Types of Vulnerability in IB Multicast ....................17 + 13. Security Considerations .......................................17 + 14. IANA Considerations ...........................................18 + 15. Acknowledgements ..............................................18 + 16. References ....................................................18 + 16.1. Normative References .....................................18 + 16.2. Informative References ...................................19 + +1. Introduction + + The InfiniBand specification [IBTA] can be found at + http://www.infinibandta.org. The document [RFC4392] provides a short + overview of InfiniBand architecture (IBA) along with considerations + for specifying IP over InfiniBand networks. + + IBA defines multiple modes of transport over which IP may be + implemented. 
The Unreliable Datagram (UD) transport mode best + matches the needs of IP and the need for universality as described in + [RFC4392]. + + This document specifies IPoIB over IB's UD mode. The implementation + of IP subnets over IB's other transport mechanisms is out of scope of + this document. + + This document describes the necessary steps required in order to lay + out an IP network on top of an IB network. It describes all the + elements of an IPoIB link, how to configure its associated + attributes, and how to set up basic broadcast and multicast services + for it. + + It further describes IP address resolution and the encapsulation of + IP and Address Resolution Protocol (ARP) packets in InfiniBand frame. + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in RFC 2119 [RFC2119]. + +2. IP over UD Mode + + The unreliable datagram mode of communication is supported by all IB + elements be they IB routers, Host Channel Adapters (HCAs), or Target + Channel Adapters (TCAs). In addition to being the only universal + transmission method, it supports multicasting, partitioning, and a + + + +Chu & Kashyap Standards Track [Page 2] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + + 32-bit Cyclic Redundancy Check (CRC) [IBTA]. Though multicasting + support is optional in IB fabrics, IPoIB architecture requires the + participating components to support it. + + All IPoIB implementations MUST support IP over the UD transport mode + of IBA. + +3. InfiniBand Datalink + + An IB subnet is formed by a network of IB nodes interconnected either + directly or via IB switches. IB subnets may be connected using IB + routers to form a fabric made of multiple IB subnets. Nodes residing + in different IB subnets can communicate directly with one another + through IB routers at the IB network layer. 
Multiple IP subnets may + be overlaid over this IB network. + + An IP subnet is configured over a communication facility or medium + over which nodes can communicate at the "link" layer [IPV6]. For + example, an ethernet segment is a link formed by interconnected + switches/hubs/bridges. The segment is therefore defined by the + physical topology of the network. This is not the case with IPoIB. + IPoIB subnets are built over an abstract "link". The link is defined + by its members and common characteristics such as the P_Key, link + MTU, and the Q_Key. + + Any two ports using UD communication mode in an IB fabric can + communicate only if they are in the same partition (i.e., have the + same P_Key and the same Q_Key) [RFC4392]. The link MTU provides a + limit to the size of the payload that may be used. The packet + transmission and routing within the IB fabric are also affected by + additional parameters such as the traffic class (TClass), hop limit + (HopLimit), service level (SL), and the flow label (FlowLabel) + [RFC4392]. The determination and use of these values for IPoIB + communication are described in the following sections. + +4. Multicast Mapping + + IB identifies multicast groups by the Multicast Global Identifiers + (MGIDs), which follow the same rules as IPv6 multicast addresses. + Hence the MGIDs follow the same rules regarding the transient + addresses and scope bits albeit in the context of the IB fabric. The + resultant address therefore resembles IPv6 multicast addresses. The + documents [IBTA, RFC4392] give a detailed description of IB + multicast. + + The IPoIB multicast mapping is depicted in figure 1. The same + mapping function is used for both IPv4 and IPv6 except for the IPoIB + signature field. + + + +Chu & Kashyap Standards Track [Page 3] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + + Unless explicitly stated, all addresses and fields in the protocol + headers in this document are stored in the network byte order. 
+ + | 8 | 4 | 4 | 16 bits | 16 bits | 80 bits | + +--------+----+----+-----------------+---------+-------------------+ + |11111111|0001|scop|<IPoIB signature>|< P_Key >| group ID | + +--------+----+----+-----------------+---------+-------------------+ + + Figure 1 + + Since an MGID allocated for transporting IP multicast datagrams is + considered only a transient link-layer multicast address [RFC4392], + all IB MGIDs allocated for IPoIB purpose MUST set T-flag to 1 [IBTA]. + + A special signature is embedded to identify the MGID for IPoIB use + only. For IPv4 over IB, the signature MUST be "0x401B". For IPv6 + over IB, the signature MUST be "0x601B". + + The IP multicast address is used together with a given IPoIB link + P_Key to form the MGID of the IB multicast group. For IPv6 the lower + 80-bit of the group ID is used directly in the lower 80-bit of the + MGID. For IPv4, the group ID is only 28-bit long, and is placed + directly in the lower 28 bits of the MGID. The rest of the group ID + bits in the MGID are filled with 0. + + E.g., on an IPoIB link that is fully contained within a single IB + subnet with a P_Key of 0x8000, the MGIDs for the all-router multicast + group with group ID 2 [AARCH, IGMP3] are: + + FF12:401B:8000::2, for IPv4 in compressed format, and + FF12:601B:8000::2, for IPv6 in compressed format. + + A special case exists for the IPv4 limited broadcast address + "255.255.255.255" [HOSTS]. The address SHALL be mapped to the + "broadcast-GID", which is defined as follows: + + | 8 | 4 | 4 | 16 bits | 16 bits | 48 bits | 32 bits | + +--------+----+----+----------------+---------+----------+---------+ + |11111111|0001|scop|0100000000011011|< P_Key >|00.......0|<all 1's>| + +--------+----+----+----------------+---------+----------+---------+ + + Figure 2 + + All MGIDs used in the IPoIB subnet MUST use the same scop bits as in + the corresponding broadcast-GID. + + + + + + +Chu & Kashyap Standards Track [Page 4] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + +4.1. 
Broadcast-GID Parameters + + The broadcast-GID is set up with the following attributes: + + 1. P_Key + + A "Full Membership" P_Key (high-order bit is set to 1) MUST be + used so that all members may communicate with one another. + + 2. Q_Key + + It is RECOMMENDED that a controlled Q_Key be used with the + high-order bit set. This is to prevent non-privileged + software from fabricating and sending out bogus IP datagrams. + + 3. IB MTU + + The value assigned to the broadcast-GID must not be greater + than any physical link MTU spanned by the IPoIB subnet. + + The following attributes are required in multicast transmissions and + also in unicast transmissions if an IPoIB link covers more than a + single IB subnet. + + 4. Other parameters + + The selection of TClass, FlowLabel, and HopLimit values is + implementation dependent. But it must take into account the + topology of IB subnets comprising the IPoIB link in order to + allow successful communication between any two nodes in the + same IPoIB link. + + An SL also needs to be assigned to the broadcast-GID. This SL + is used in all multicast communication in the subnet. + + The broadcast-GID's scope bits need to be set based on whether + the IPoIB link is confined within an IB subnet or the IPoIB + link spans multiple IB subnets. A default of local-subnet + scope (i.e., 0x2) is RECOMMENDED. A node might determine the + scope bits to use by interactively searching for a broadcast- + GID of ever greater scope by first starting with the local- + scope. Or, an implementation might include the scope bits as + a configuration parameter. + + + + + + + + +Chu & Kashyap Standards Track [Page 5] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + +5. Setting Up an IPoIB Link + + The broadcast-GID, as defined in the previous section, MUST be set up + for an IPoIB subnet to be formed. Every IPoIB interface MUST + "FullMember" join the IB multicast group defined by the broadcast- + GID. 
This multicast group will henceforth be referred to as the + broadcast group. The join operation returns the MTU, the Q_Key, and + other parameters associated with the broadcast group. The node then + associates the parameters received as a result of the join operation + with its IPoIB interface. The broadcast group also serves to provide + a link-layer broadcast service for protocols like ARP, net-directed, + subnet-directed, and all-subnets-directed broadcasts in IPv4 over IB + networks. + + The join operation is successful only if the Subnet Manager (SM) + determines that the joining node can support the MTU registered with + the broadcast group [RFC4392] ensuring support for a common link MTU. + The SM also ensures that all the nodes joining the broadcast-GID have + paths to one another and can therefore send and receive unicast + packets. It further ensures that all the nodes do indeed form a + multicast tree that allows packets sent from any member to be + replicated to every other member. Thus, the IPoIB link is formed by + the IPoIB nodes joining the broadcast group. There is no physical + demarcation of the IPoIB link other than that determined by the + broadcast group membership. + + The P_Key is a configuration parameter that must be known before the + broadcast-GID can be formed. For a node to join a partition, one of + its ports must be assigned the relevant P_Key by the SM [RFC4392]. + + The method of creation of the broadcast group and the + assignment/choice of its parameters are up to the implementation + and/or the administrator of the IPoIB subnet. The broadcast group + may be created by the first IPoIB node to be initialized, or it can + be created administratively before the IPoIB subnet is set up. It is + RECOMMENDED that the creation and deletion of the broadcast group be + under administrative control. 
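As an illustration of how the broadcast-GID can be formed once the P_Key is known, the following is a minimal Python sketch of the MGID layout given in section 4 (figures 1 and 2). The function names (`ipv4_mgid`, `ipv6_mgid`, `broadcast_gid`) and the helper `_build_mgid` are hypothetical and do not come from this document or from any IB library; the default scope of 0x2 follows the local-subnet recommendation in section 4.1.

```python
import ipaddress

# IPoIB signatures from section 4.
IPV4_SIGNATURE = 0x401B
IPV6_SIGNATURE = 0x601B
LOCAL_SUBNET_SCOPE = 0x2  # RECOMMENDED default scope (section 4.1)


def _build_mgid(scope, signature, p_key, group_id):
    """Assemble the 128-bit MGID of figure 1 (hypothetical helper)."""
    value = (
        (0xFF << 120)                      # 8 bits: 11111111
        | (0x1 << 116)                     # 4 bits: flags, T-flag set to 1
        | ((scope & 0xF) << 112)           # 4 bits: scop
        | ((signature & 0xFFFF) << 96)     # 16 bits: IPoIB signature
        | ((p_key & 0xFFFF) << 80)         # 16 bits: P_Key
        | (group_id & ((1 << 80) - 1))     # 80 bits: group ID
    )
    return ipaddress.IPv6Address(value)


def ipv4_mgid(group, p_key, scope=LOCAL_SUBNET_SCOPE):
    """Map an IPv4 multicast address: lower 28 bits become the group ID."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError("not an IPv4 multicast address")
    return _build_mgid(scope, IPV4_SIGNATURE, p_key, int(addr) & 0x0FFFFFFF)


def ipv6_mgid(group, p_key, scope=LOCAL_SUBNET_SCOPE):
    """Map an IPv6 multicast address: lower 80 bits become the group ID."""
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError("not an IPv6 multicast address")
    return _build_mgid(scope, IPV6_SIGNATURE, p_key, int(addr))


def broadcast_gid(p_key, scope=LOCAL_SUBNET_SCOPE):
    """Form the broadcast-GID of figure 2 (low 32 bits all ones)."""
    if not p_key & 0x8000:
        raise ValueError("broadcast-GID needs a Full Membership P_Key")
    return _build_mgid(scope, IPV4_SIGNATURE, p_key, 0xFFFFFFFF)
```

With a P_Key of 0x8000, `ipv4_mgid("224.0.0.2", 0x8000)` and `ipv6_mgid("ff02::2", 0x8000)` reproduce the two all-router MGIDs given as examples in section 4. The sketch only computes the identifier; creating or joining the group through the SA is outside its scope.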
+ + InfiniBand multicast management, which includes the creation, + joining, and leaving of IB multicast groups by IB nodes, is described + in [RFC4392]. + +6. Frame Format + + All IP and ARP datagrams transported over InfiniBand are prefixed by + a 4-octet encapsulation header as illustrated below. + + + + + +Chu & Kashyap Standards Track [Page 6] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | | + | Type | Reserved | + | | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 3 + + The "Reserved" field MUST be set to zero on send and ignored on + receive unless specified differently in a future document. + + The "Type" field SHALL indicate the encapsulated protocol as per the + following table. + + +----------+-------------+ + | Type | Protocol | + |------------------------| + | 0x800 | IPv4 | + |------------------------| + | 0x806 | ARP | + |------------------------| + | 0x8035 | RARP | + |------------------------| + | 0x86DD | IPv6 | + +------------------------+ + + Table 1 + + These values are taken from the "ETHER TYPE" numbers assigned by + Internet Assigned Numbers Authority (IANA) [IANA]. Other network + protocols, identified by different values of "ETHER TYPE", may use + the encapsulation format defined herein, but such use is outside of + the scope of this document. + + |<------ IB Frame headers -------->|<- Payload ->|<- IB trailers ->| + +-------+------+---------+---------+-------------+---------+-------+ + |Local | |Base |Datagram | 4-octet | | | + |Routing| GRH* |Transport|Extended | header |Invariant|Variant| + |Header |Header|Header |Transport| + | CRC | CRC | + | | | |Header | IP/ARP | | | + +-------+------+---------+---------+-------------+---------+-------+ + + Figure 4 + + Figure 4 depicts the IB frame encapsulating an IP/ARP datagram. 
The + InfiniBand specification requires the use of Global Routing Header + + + +Chu & Kashyap Standards Track [Page 7] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + + (GRH) [RFC4392] when multicasting or when an InfiniBand packet + traverses from one IB subnet to another through an IB router. Its + use is optional when used for unicast transmission between nodes + within an IB subnet. The IPoIB implementation MUST be able to handle + packets received with or without the use of GRH. + +7. Maximum Transmission Unit + + IB MTU: The IB components, that is, IB links, switches, Channel + Adapters (CAs), and IB routers, may support maximum payloads of + 256, 512, 1024, 2048, or 4096 octets. The maximum IB payload + supported by the IB components in any IB path is the IB MTU for + the path. + + IPoIB-Link MTU: The IPoIB-link MTU is the MTU value associated with + the broadcast group. The IPoIB-link MTU can be set to any value + up to the smallest IB MTU supported by the IB components + comprising the IPoIB link. + + In order to reduce problems with fragmentation and path-MTU + discovery, this document requires that all IPoIB implementations + support an MTU of 2044 octets, that is, a 2048-octet IPoIB-link MTU + minus the 4-octet encapsulation overhead. Larger and smaller MTUs + MAY be supported subject to other existing MTU requirements [IPV6], + but the default configuration must support an MTU of 2044 octets. + +8. IPv6 Stateless Autoconfiguration + + IB architecture associates an EUI-64 identifier termed the Globally + Unique Identifier (GUID) [RFC4392, IBTA] with each port. The Local + Identifier (LID) is unique within an IB subnet only. + + The interface identifier may be chosen from the following: + + 1) The EUI-64-compliant GUID assigned by the manufacturer. + + 2) If the IPoIB subnet is fully contained within an IB subnet, any + of the unique 16-bit LIDs of the port associated with the IPoIB + interface. 
+ + The LID values of a port may change after a reboot/power-cycle + of the IB node. Therefore, if a persistent value is desired, + it would be prudent not to use the LID to form the interface + identifier. + + On the other hand, the LID provides an identifier that can be + used to create a more anonymous IPv6 address since the LID is + not globally unique and is subject to change over time. + + + +Chu & Kashyap Standards Track [Page 8] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + + It is RECOMMENDED that the link-local address be constructed from the + port's EUI-64 identifier as given below. + + [AARCH] requires that the interface identifier be created in the + "Modified EUI-64" format when derived from an EUI-64 identifier. + [IBTA] is unclear if the GUID should use IEEE EUI-64 format or the + "Modified EUI-64" format. Therefore, when creating an interface + identifier from the GUID, an implementation MUST do the following: + + => Determine if the GUID is a modified EUI-64 identifier ("u" bit + is toggled) as defined by [AARCH] + + => If the GUID is a modified EUI-64 identifier, then the "u" bit + MUST NOT be toggled when creating the interface identifier + + => If the GUID is an unmodified EUI-64 identifier, then the "u" + bit MUST be toggled in compliance with [AARCH] + +8.1. IPv6 Link-Local Address + + The IPv6 link-local address for an IPoIB interface is formed as + described in [AARCH] using the interface identifier as described in + the previous section. + +9. Address Mapping - Unicast + + Address resolution in IPv4 subnets is accomplished through Address + Resolution Protocol (ARP) [ARP]. It is accomplished in IPv6 subnets + using the Neighbor Discovery protocol [DISC]. + +9.1. 
Link Information + + An InfiniBand packet over the UD mode includes multiple headers such + as the LRH (local route header), GRH (global route header), BTH (base + transport header), DETH (datagram extended transport header) as + depicted in figure 4 and specified in the InfiniBand architecture + [IBTA]. All these headers comprise the link-layer in an IPoIB link. + + The parameters needed in these IBA headers constitute the link-layer + information that needs to be determined before an IP packet may be + transmitted across the IPoIB link. + + + + + + + + + + +Chu & Kashyap Standards Track [Page 9] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + + The parameters that need to be determined are as follows: + + a) LID + + The LID is always needed. A packet always includes the LRH + that is targeted at the remote node's LID, or an IB router's + LID to get to the remote node in another IB subnet. + + b) Global Identifier (GID) + + The GID is not needed when exchanging information within an IB + subnet though it may be included in any packet. It is an + absolute necessity when transmitting across the IB subnet since + the IB routers use the GID to correctly forward the packets. + The source and destination GIDs are fields included in the GRH. + + The GID, if formed using the GUID, can be used to unambiguously + identify an endpoint. + + c) Queue Pair Number (QPN) + + Every unicast UD communication is always directed to a + particular queue pair (QP) at the peer. + + d) Q_Key + + A Q_Key is associated with each Unreliable Datagram QPN. The + received packets must contain a Q_Key that matches the QP's + Q_Key to be accepted. + + e) P_Key + + A successful communication between two IB nodes using UD mode + can occur only if the two nodes have compatible P_Keys. This + is referred to as being in the same partition [IBTA]. + + f) SL + + Every IBA packet contains an SL value. A path in IBA is + defined by the three-tuple (source LID, destination LID, SL). 
+ The SL in turn is mapped to a virtual lane (VL) at every CA and + switch that sends or forwards the packet [RFC4392]. Multiple SLs + may be used between two endpoints to provide for load + balancing. SLs may be used for providing a Quality of Service + (QoS) infrastructure, or may be used to avoid deadlocks in the + IBA fabric. + + + + + +Chu & Kashyap Standards Track [Page 10] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + + Another auxiliary piece of information, not included in the IBA + headers, is the following: + + g) Path rate + + IBA defines multiple link speeds. A higher-speed transmitter + can swamp switches and the CAs. To avoid such congestion, + every source transmitting at greater than 1x speeds is required + to determine the "path rate" before the data may be transmitted + [IBTA]. + +9.1.1. Link-Layer Address/Hardware Address + + Though the list of information required for a successful transmittal + of an IPoIB packet is large, not all the information need be + determined during the IP address resolution process. + + The 20-octet IPoIB link-layer address used in the source/target + link-layer address option in IPv6 and the "hardware address" in + IPv4/ARP has the same format. + + The format is as described below: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Reserved | Queue Pair Number | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | | + + + + | | + + GID + + | | + + + + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 5 + + a) Reserved Flags + + These 8 bits are reserved for future use. These bits MUST be + set to zero on send and ignored on receive unless specified + differently in a future document. 
+ + + + + + + +Chu & Kashyap Standards Track [Page 11] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + + b) QPN + + Every unicast communication in IB architecture is directed to a + specific QP [IBTA]. This QP number is included in the link + description. All IP communication to the relevant IPoIB + interface MUST be directed to this QPN. In the case of IPv4 + subnets, the Address Resolution Protocol (ARP) reply packets + are also directed to the same QPN. + + The choice of the QPN for IP/ARP communication is up to the + implementation. + + c) GID + + This is one of the GIDs of the port associated with the IPoIB + interface [IBTA]. IB associates multiple GIDs with a port. It + is RECOMMENDED that the GID formed by the combination of the IB + subnet prefix and the port's "Port GUID" [IBTA] be included in + the link-layer/hardware address. + +9.1.2. Auxiliary Link Information + + The rest of the parameters are determined as follows: + + a) LID + + The method of determining the peer's LID is not defined in this + document. It is up to the implementation to use any of the + IBA-approved methods to determine the destination LID. One + such method is to use the GID determined during the address + resolution, to retrieve the associated LID from the IB routing + infrastructure or the Subnet Administrator (SA). + + It is the responsibility of the administrator to ensure that + the IB subnet(s) have unicast connectivity between the IPoIB + nodes. The GID exchanged between two endpoints in a multicast + message (ARP/ND) does not guarantee the existence of a unicast + path between the two. + + There may be multiple LIDs, and hence paths, between the + endpoints. The criteria for selection of the LIDs are beyond + the scope of this document. 
+ + + + + + + + + +Chu & Kashyap Standards Track [Page 12] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + + b) Q_Key + + The Q_Key received on joining the broadcast group MUST be used + for all IPoIB communication over the particular IPoIB link. + + c) P_Key + + The P_Key to be used in the IP subnet is not discovered but is + a configuration parameter. + + d) SL + + The method of determining the SL is not defined in this + document. The SL is determined by any of the IBA-approved + methods. + + e) Path rate + + The implementation must leverage IB methods to determine the + path rate as required. + +9.2. Address Resolution in IPv4 Subnets + + The ARP packet header is as defined in [ARP]. The hardware type is + set to 32 (decimal) as specified by IANA [IANA]. The rest of the + fields are used as per [ARP]. + + 16 bits: hardware type + 16 bits: protocol + 8 bits: length of hardware address + 8 bits: length of protocol address + 16 bits: ARP operation + + The remaining fields in the packet hold the sender/target hardware + and protocol addresses. + + [ sender hardware address ] + [ sender protocol address ] + [ target hardware address ] + [ target protocol address ] + + The hardware address included in the ARP packet will be as specified + in section 9.1.1 and depicted in figure 5. + + The length of the hardware address used in ARP packet header + therefore is 20. + + + + + +Chu & Kashyap Standards Track [Page 13] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + +9.3. Address Resolution in IPv6 Subnets + + The Source/Target Link-layer address option is used in Router + Solicit, Router advertisements, Redirect, Neighbor Solicitation, and + Neighbor Advertisement messages when such messages are transmitted on + InfiniBand networks. 
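The 20-octet link-layer address of section 9.1.1, which serves as the ARP hardware address in section 9.2 and as the link-layer address carried in this option, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the GID is passed in IPv6 textual form purely for convenience, and the protocol type 0x0800 and protocol address length 4 in the ARP header are the usual values for ARP over IPv4 rather than values stated in this document.

```python
import ipaddress
import struct

ARP_HRD_INFINIBAND = 32   # hardware type assigned by IANA (section 9.2)
IPOIB_HW_ADDR_LEN = 20    # length of the IPoIB link-layer address


def pack_link_layer_address(qpn, gid):
    """Build the address of figure 5: 8 reserved bits, 24-bit QPN, 128-bit GID."""
    if not 0 <= qpn < (1 << 24):
        raise ValueError("QPN must fit in 24 bits")
    # Packing the QPN as a 32-bit big-endian integer leaves the top
    # 8 reserved bits at zero, as required on send.
    return struct.pack("!I", qpn) + ipaddress.IPv6Address(gid).packed


def unpack_link_layer_address(addr):
    """Return (qpn, gid); the reserved bits are ignored on receive."""
    if len(addr) != IPOIB_HW_ADDR_LEN:
        raise ValueError("IPoIB link-layer address must be 20 octets")
    (word,) = struct.unpack("!I", addr[:4])
    return word & 0xFFFFFF, ipaddress.IPv6Address(addr[4:])


def arp_fixed_header(operation):
    """Fixed part of the ARP header for IPv4 over IPoIB (assumed IPv4 values)."""
    return struct.pack(
        "!HHBBH",
        ARP_HRD_INFINIBAND,   # hardware type
        0x0800,               # protocol type: IPv4 (assumption, from ARP usage)
        IPOIB_HW_ADDR_LEN,    # length of hardware address
        4,                    # length of protocol address (IPv4)
        operation,            # ARP operation, e.g. 1 for request
    )
```

Because the QPN may change across reboots or interface resets, a receiver would typically treat the unpacked `(qpn, gid)` pair as cacheable only for a limited time, in line with the cautionary note in section 9.4.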
+ + The source/target address option is specified as follows: + + Type: + Source Link-layer address 1 + Target Link-layer address 2 + + Length: 3 + + Link-layer address: + + The link-layer address is as specified in section 9.1.1 and + depicted in figure 5. + + [DISC] specifies the length of source/target option in + number of 8-octets as indicated by a length of '3' above. + Since the IPoIB link-layer address is only 20 octets long, + two octets of zero MUST be prepended to fill the total + option length of 24 octets. + +9.4. Cautionary Note on QPN Caching + + The link-layer address for IPoIB includes the QPN, which might not be + constant across reboots or even across network interface resets. + Cached QPN entries, such as in static ARP entries or in Reverse + Address Resolution Protocol (RARP) servers, will only work if the + implementation(s) using these options ensure that the QPN associated + with an interface is invariant across reboots/network resets. + + It is RECOMMENDED that implementations revalidate ARP caches + periodically due to the aforementioned QPN-induced volatility of + IPoIB link-layer addresses. + +10. Sending and Receiving IP Multicast Packets + + Multicast in InfiniBand differs in a number of ways from multicast in + ethernet. This adds some complexity to an IPoIB implementation when + supporting IP multicast over IB. + + A) An IB multicast group must be explicitly created through the SA + before it can be used. + + + + +Chu & Kashyap Standards Track [Page 14] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + + This implies that in order to send a packet destined for an IP + multicast address, the IPoIB implementation must check with the + SA on the outbound link first for a "MCMemberRecord" that + matches the MGID. If one does exist, the Multicast Local + Identifier (MLID) associated with the multicast group is used + as the Destination Local Identifier (DLID) for the packet. + Otherwise, it implies no member exists on the local link. 
If + the scope of the IP multicast group is beyond link-local, the + packet must be sent to the on-link routers through the use of + the all-router multicast group or the broadcast group. This is + to allow local routers to forward the packet to multicast + listeners on remote networks. The all-router multicast group + is preferred over the broadcast group for better efficiency. + If the all-router multicast group does not exist, the sender + can assume that there are no routers on the local link; hence + the packet can be safely dropped. + + B) A multicast sender must join the target multicast group before + outgoing multicast messages from it can be successfully routed. + The "SendOnlyNonMember" join is different from the regular + "FullMember" join in two aspects. First, both types of joins + enable multicast packets to be routed FROM the local port, but + only the "FullMember" join causes multicast packets to be + routed TO the port. Second, the sender port of a + "SendOnlyNonMember" join will not be counted as a member of the + multicast group for purposes of group creation and deletion. + + The following code snippet demonstrates the steps in a typical + implementation when processing an egress multicast packet. + + if the egress port is already a "SendOnlyNonMember", or a + "FullMember" + => send the packet + + else if the target multicast group exists + => do "SendOnlyNonMember" join + => send the packet + + else if scope > link-local AND the all-router multicast group exists + => send the packet to all routers + + else + => drop the packet + + Implementations should cache the information about the existence of + an IB multicast group, its MLID and other attributes. This is to + avoid expensive SA calls on every outgoing multicast packet. 
Senders + MUST subscribe to the multicast group create and delete traps in + + + +Chu & Kashyap Standards Track [Page 15] + +RFC 4391 IP over InfiniBand (IPoIB) April 2006 + + + order to monitor the status of specific IB multicast groups. For + example, multicast packets directed to the all-router multicast group + due to a lack of listener on the local subnet must be forwarded to + the right multicast group if the group is created later. This + happens when a listener shows up on the local subnet. + + A node joining an IP multicast group must first construct an MGID + according to the rule described in section 4 above. Once the correct + MGID is calculated, the node must call the SA of the outbound link to + attempt a "FullMember" join of the IB multicast group corresponding + to the MGID. If the IB multicast group does not already exist, one + must be created first with the IPoIB link MTU. The MGID MUST use the + same P_Key, Q_Key, SL, MTU, and HopLimit as those used in the + broadcast-GID. The rest of attributes SHOULD follow the values used + in the broadcast-GID as well. + + The join request will cause the local port to be added to the + multicast group. It also enables the SM to program IB switches and + routers with the new multicast information to ensure the correct + forwarding of multicast packets for the group. + + When a node leaves an IP multicast group, it SHOULD make a + "FullMember" leave request to the SA. This gives the SM an + opportunity to update relevant forwarding information, to delete an + IB multicast group if the local port is the last FullMember to leave, + and to free up the MLID allocated for it. The specific algorithm is + implementation-dependent and is out of the scope of this document. + + Note that for an IPoIB link that spans more than one IB subnet + connected by IB routers, an adequate multicast forwarding support at + the IB level is required for multicast packets to reach listeners on + a remote IB subnet. 
The specific mechanism for this is beyond the + scope of IPoIB. + +11. IP Multicast Routing + + IP multicast routing requires each interface over which the router is + operating to be configured to listen to all link-layer multicast + addresses generated by IP [IPMULT, IP6MLD]. For an Ethernet + interface, this is often achieved by turning on the promiscuous + multicast mode on the interface. + + IBA does not provide any hardware support for promiscuous multicast + mode. Fortunately, a promiscuous multicast mode can be emulated in + the software running on a router through the following steps: + + A) Obtain a list of all active IB multicast groups from the local + SA. + + B) Make a "NonMember" join request to the SA for every group that + has a signature in its MGID matching the one for either IPv4 or + IPv6. + + C) Subscribe to the IB multicast group creation events using a + wildcarded MGID so that the router can "NonMember" join all IB + multicast groups created subsequently for IPv4 or IPv6. + + The "NonMember" join has the same effect as a "FullMember" join + except that the former will not be counted as a member of the + multicast group for purposes of group creation or deletion. That is, + when the last "FullMember" leaves a multicast group, the group can be + safely deleted by the SA without regard to any "NonMember" routers. + +12. New Types of Vulnerability in IB Multicast + + Many IB multicast functions are subject to failures due to a number + of possible resource constraints. These include the creation of IB + multicast groups, the join calls ("SendOnlyNonMember", "FullMember", + and "NonMember"), and the attaching of a QP to a multicast group. + + In general, the occurrence of these failure conditions is highly + implementation-dependent, and is believed to be rare.
Usually, a + failed multicast operation at the IB level can be propagated back to + the IP level, causing the original operation to fail and the + initiator of the operation to be notified. But some IB multicast + functions are not tied to any foreground operation, making their + failures hard to detect. For example, if an IP multicast router + attempts to "NonMember" join a newly created multicast group in the + local subnet, but the join call fails, packet forwarding for that + particular multicast group will likely fail silently, that is, + without the attention of local multicast senders. This type of + problem can add more vulnerability to the already unreliable IP + multicast operations. + + Implementations SHOULD log error messages upon any failure from an IB + multicast operation. Network administrators should be aware of this + vulnerability, and preserve enough multicast resources at the points + where IP multicast will be used heavily. For example, HCAs with + ample multicast resources should be used at any IP multicast router. + +13. Security Considerations + + This document specifies IP transmission over a multicast network. + Any network of this kind is vulnerable to a sender claiming another's + identity and forging traffic or eavesdropping. It is the + responsibility of the higher layers or applications to implement + suitable countermeasures if this is a problem. + + Successful transmission of IP packets depends on the correct setup of + the IPoIB link, creation of the broadcast-GID, creation of the QP and + its attachment to the broadcast-GID, and the correct determination of + various link parameters such as the LID, service level, and path + rate. These operations, many of which involve interactions with the + SM/SA, MUST be protected by the underlying operating system.
This is + to prevent malicious, non-privileged software from hijacking + important resources and configurations. + + Controlled Q_Keys SHOULD be used in all transmissions. This is to + prevent non-privileged software from fabricating IP datagrams. + +14. IANA Considerations + + To support ARP over InfiniBand, a value for the Address Resolution + Parameter "Number Hardware Type (hrd)" is required. IANA has + assigned the number "32" to indicate InfiniBand [IANA_ARP]. + + Future uses of the reserved bits in the frame format (Figure 3) and + link-layer address (Figure 5) MUST be published as RFCs. This + document requires that the reserved bits be set to zero on send and + ignored on receive. + +15. Acknowledgements + + The authors would like to thank Bruce Beukema, David Brean, Dan + Cassiday, Aditya Dube, Yaron Haviv, Michael Krause, Thomas Narten, + Erik Nordmark, Greg Pfister, Jim Pinkerton, Renato Recio, Kevin + Reilly, Kanoj Sarcar, Satya Sharma, Madhu Talluri, and David L. + Stevens for their suggestions and many clarifications on the IBA + specification. + +16. References + +16.1. Normative References + + [AARCH] Hinden, R. and S. Deering, "Internet Protocol Version 6 + (IPv6) Addressing Architecture", RFC 3513, April 2003. + + [ARP] Plummer, David C., "Ethernet Address Resolution + Protocol: Or converting network protocol addresses to + 48.bit Ethernet address for transmission on Ethernet + hardware", STD 37, RFC 826, November 1982. + + [DISC] Narten, T., Nordmark, E., and W. Simpson, "Neighbor + Discovery for IP Version 6 (IPv6)", RFC 2461, December + 1998.
+ + [IANA] Internet Assigned Numbers Authority, URL + http://www.iana.org + + [IANA_ARP] URL http://www.iana.org/assignments/arp-parameters + + [IBTA] InfiniBand Architecture Specification, URL + http://www.infinibandta.org/specs + + [RFC4392] Kashyap, V., "IP over InfiniBand (IPoIB) Architecture", + RFC 4392, April 2006. + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + +16.2. Informative References + + [HOSTS] Braden, R., "Requirements for Internet Hosts - + Communication Layers", STD 3, RFC 1122, October 1989. + + [IGMP3] Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A. + Thyagarajan, "Internet Group Management Protocol, + Version 3", RFC 3376, October 2002. + + [IP6MLD] Deering, S., Fenner, W., and B. Haberman, "Multicast + Listener Discovery (MLD) for IPv6", RFC 2710, October + 1999. + + [IPMULT] Deering, S., "Host extensions for IP multicasting", STD + 5, RFC 1112, August 1989. + + [IPV6] Deering, S. and R. Hinden, "Internet Protocol, Version 6 + (IPv6) Specification", RFC 2460, December 1998. + +Authors' Addresses + + H.K. Jerry Chu + 17 Network Circle, UMPK17-201 + Menlo Park, CA 94025 + USA + + Phone: +1 650 786 5146 + EMail: jerry.chu@sun.com + + + Vivek Kashyap + 15350, SW Koll Parkway + Beaverton, OR 97006 + USA + + Phone: +1 503 578 3422 + EMail: vivk@us.ibm.com + +Full Copyright Statement + + Copyright (C) The Internet Society (2006).
+ + This document is subject to the rights, licenses and restrictions + contained in BCP 78, and except as set forth therein, the authors + retain all their rights. + + This document and the information contained herein are provided on an + "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS + OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET + ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, + INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE + INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED + WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Intellectual Property + + The IETF takes no position regarding the validity or scope of any + Intellectual Property Rights or other rights that might be claimed to + pertain to the implementation or use of the technology described in + this document or the extent to which any license under such rights + might or might not be available; nor does it represent that it has + made any independent effort to identify any such rights. Information + on the procedures with respect to rights in RFC documents can be + found in BCP 78 and BCP 79. + + Copies of IPR disclosures made to the IETF Secretariat and any + assurances of licenses to be made available, or the result of an + attempt made to obtain a general license or permission for the use of + such proprietary rights by implementers or users of this + specification can be obtained from the IETF on-line IPR repository at + http://www.ietf.org/ipr. + + The IETF invites any interested party to bring to its attention any + copyrights, patents or patent applications, or other proprietary + rights that may cover technology that may be required to implement + this standard. Please address the information to the IETF at + ietf-ipr@ietf.org. + +Acknowledgement + + Funding for the RFC Editor function is provided by the IETF + Administrative Support Activity (IASA). 