Diffstat (limited to 'doc/rfc/rfc9136.txt')
-rw-r--r-- | doc/rfc/rfc9136.txt | 1710 |
1 files changed, 1710 insertions, 0 deletions
diff --git a/doc/rfc/rfc9136.txt b/doc/rfc/rfc9136.txt
new file mode 100644
index 0000000..6953dc3
--- /dev/null
+++ b/doc/rfc/rfc9136.txt
@@ -0,0 +1,1710 @@
+
+
+
+
+Internet Engineering Task Force (IETF)                   J. Rabadan, Ed.
+Request for Comments: 9136                                 W. Henderickx
+Category: Standards Track                                          Nokia
+ISSN: 2070-1721                                                 J. Drake
+                                                                  W. Lin
+                                                                 Juniper
+                                                              A. Sajassi
+                                                                   Cisco
+                                                            October 2021
+
+
+             IP Prefix Advertisement in Ethernet VPN (EVPN)
+
+Abstract
+
+   The BGP MPLS-based Ethernet VPN (EVPN) (RFC 7432) mechanism provides
+   a flexible control plane that allows intra-subnet connectivity in an
+   MPLS and/or Network Virtualization Overlay (NVO) (RFC 7365) network.
+   In some networks, there is also a need for dynamic and efficient
+   inter-subnet connectivity across Tenant Systems and end devices that
+   can be physical or virtual and do not necessarily participate in
+   dynamic routing protocols. This document defines a new EVPN route
+   type for the advertisement of IP prefixes and explains some use-case
+   examples where this new route type is used.
+
+Status of This Memo
+
+   This is an Internet Standards Track document.
+
+   This document is a product of the Internet Engineering Task Force
+   (IETF). It represents the consensus of the IETF community. It has
+   received public review and has been approved for publication by the
+   Internet Engineering Steering Group (IESG). Further information on
+   Internet Standards is available in Section 2 of RFC 7841.
+
+   Information about the current status of this document, any errata,
+   and how to provide feedback on it may be obtained at
+   https://www.rfc-editor.org/info/rfc9136.
+
+Copyright Notice
+
+   Copyright (c) 2021 IETF Trust and the persons identified as the
+   document authors. All rights reserved.
+
+   This document is subject to BCP 78 and the IETF Trust's Legal
+   Provisions Relating to IETF Documents
+   (https://trustee.ietf.org/license-info) in effect on the date of
+   publication of this document. Please review these documents
+   carefully, as they describe your rights and restrictions with respect
+   to this document. Code Components extracted from this document must
+   include Simplified BSD License text as described in Section 4.e of
+   the Trust Legal Provisions and are provided without warranty as
+   described in the Simplified BSD License.
+
+Table of Contents
+
+   1. Introduction
+     1.1. Terminology
+   2. Problem Statement
+     2.1. Inter-Subnet Connectivity Requirements in Data Centers
+     2.2. The Need for the EVPN IP Prefix Route
+   3. The BGP EVPN IP Prefix Route
+     3.1. IP Prefix Route Encoding
+     3.2. Overlay Indexes and Recursive Lookup Resolution
+   4. Overlay Index Use Cases
+     4.1. TS IP Address Overlay Index Use Case
+     4.2. Floating IP Overlay Index Use Case
+     4.3. Bump-in-the-Wire Use Case
+     4.4. IP-VRF-to-IP-VRF Model
+       4.4.1. Interface-less IP-VRF-to-IP-VRF Model
+       4.4.2. Interface-ful IP-VRF-to-IP-VRF with SBD IRB
+       4.4.3. Interface-ful IP-VRF-to-IP-VRF with Unnumbered SBD IRB
+   5. Security Considerations
+   6. IANA Considerations
+   7. References
+     7.1. Normative References
+     7.2. Informative References
+   Acknowledgments
+   Contributors
+   Authors' Addresses
+
+1. Introduction
+
+   [RFC7365] provides a framework for Data Center (DC) Network
+   Virtualization over Layer 3 and specifies that the Network
+   Virtualization Edge (NVE) devices must provide Layer 2 and Layer 3
+   virtualized network services in multi-tenant DCs. [RFC8365]
+   discusses the use of EVPN as the technology of choice to provide
+   Layer 2 or intra-subnet services in these DCs. This document, along
+   with [RFC9135], specifies the use of EVPN for Layer 3 or inter-subnet
+   connectivity services.
+
+   [RFC9135] defines some fairly common inter-subnet forwarding
+   scenarios where Tenant Systems (TSs) can exchange packets with TSs
+   located in remote subnets. In order to achieve this, [RFC9135]
+   describes how Media Access Control (MAC) and IPs encoded in TS RT-2
+   routes are not only used to populate MAC Virtual Routing and
+   Forwarding (MAC-VRF) and overlay Address Resolution Protocol (ARP)
+   tables but also IP-VRF tables with the encoded TS host routes (/32 or
+   /128). In some cases, EVPN may advertise IP prefixes and therefore
+   provide aggregation in the IP-VRF tables, as opposed to propagating
+   individual host routes. This document complements the scenarios
+   described in [RFC9135] and defines how EVPN may be used to advertise
+   IP prefixes. Interoperability between EVPN and Layer 3 Virtual
+   Private Network (VPN) [RFC4364] IP Prefix routes is out of the scope
+   of this document.
+
+   Section 2.1 describes the inter-subnet connectivity requirements in
+   DCs. Section 2.2 explains why a new EVPN route type is required for
+   IP prefix advertisements. Sections 3, 4, and 5 will describe this
+   route type and how it is used in some specific use cases.
+
+1.1. Terminology
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+   "OPTIONAL" in this document are to be interpreted as described in BCP
+   14 [RFC2119] [RFC8174] when, and only when, they appear in all
+   capitals, as shown here.
+
+   AC:       Attachment Circuit
+
+   ARP:      Address Resolution Protocol
+
+   BD:       Broadcast Domain. As per [RFC7432], an EVI consists of a
+             single BD or multiple BDs. In case of VLAN-bundle and
+             VLAN-based service models (see [RFC7432]), a BD is
+             equivalent to an EVI. In case of a VLAN-aware bundle
+             service model, an EVI contains multiple BDs. Also, in this
+             document, "BD" and "subnet" are equivalent terms.
+
+   BD Route Target: Refers to the broadcast-domain-assigned Route
+             Target [RFC4364]. In case of a VLAN-aware bundle service
+             model, all the BD instances in the MAC-VRF share the same
+             Route Target.
+
+   BT:       Bridge Table. The instantiation of a BD in a MAC-VRF, as
+             per [RFC7432].
+
+   CE:       Customer Edge
+
+   DA:       Destination Address
+
+   DGW:      Data Center Gateway
+
+   Ethernet A-D Route: Ethernet Auto-Discovery (A-D) route, as per
+             [RFC7432].
+
+   Ethernet NVO Tunnel: Refers to Network Virtualization Overlay
+             tunnels with Ethernet payload. Examples of this type of
+             tunnel are VXLAN or GENEVE.
+
+   EVI:      EVPN Instance spanning the NVE/PE devices that are
+             participating on that EVPN, as per [RFC7432].
+
+   EVPN:     Ethernet VPN, as per [RFC7432].
+
+   GENEVE:   Generic Network Virtualization Encapsulation, as per
+             [RFC8926].
+
+   GRE:      Generic Routing Encapsulation
+
+   GW IP:    Gateway IP address
+
+   IPL:      IP Prefix Length
+
+   IP NVO Tunnel: Refers to Network Virtualization Overlay tunnels with
+             IP payload (no MAC header in the payload).
+
+   IP-VRF:   A Virtual Routing and Forwarding table for IP routes on an
+             NVE/PE. The IP routes could be populated by EVPN and IP-
+             VPN address families. An IP-VRF is also an instantiation
+             of a Layer 3 VPN in an NVE/PE.
+
+   IRB:      Integrated Routing and Bridging interface. It connects an
+             IP-VRF to a BD (or subnet).
+
+   MAC:      Media Access Control
+
+   MAC-VRF:  A Virtual Routing and Forwarding table for MAC addresses on
+             an NVE/PE, as per [RFC7432]. A MAC-VRF is also an
+             instantiation of an EVI in an NVE/PE.
+
+   ML:       MAC Address Length
+
+   ND:       Neighbor Discovery
+
+   NVE:      Network Virtualization Edge
+
+   NVO:      Network Virtualization Overlay
+
+   PE:       Provider Edge
+
+   RT-2:     EVPN Route Type 2, i.e., MAC/IP Advertisement route, as
+             defined in [RFC7432].
+
+   RT-5:     EVPN Route Type 5, i.e., IP Prefix route, as defined in
+             Section 3.
+
+   SBD:      Supplementary Broadcast Domain. A BD that does not have
+             any ACs, only IRB interfaces, and is used to provide
+             connectivity among all the IP-VRFs of the tenant. The SBD
+             is only required in IP-VRF-to-IP-VRF use cases (see
+             Section 4.4).
+
+   SN:       Subnet
+
+   TS:       Tenant System
+
+   VA:       Virtual Appliance
+
+   VM:       Virtual Machine
+
+   VNI:      Virtual Network Identifier. As in [RFC8365], the term is
+             used as a representation of a 24-bit NVO instance
+             identifier, with the understanding that "VNI" will refer to
+             a VXLAN Network Identifier in VXLAN, or a Virtual Network
+             Identifier in GENEVE, etc., unless it is stated otherwise.
+
+   VSID:     Virtual Subnet Identifier
+
+   VTEP:     VXLAN Termination End Point, as per [RFC7348].
+
+   VXLAN:    Virtual eXtensible Local Area Network, as per [RFC7348].
+
+   This document also assumes familiarity with the terminology of
+   [RFC7365], [RFC7432], and [RFC8365].
+
+2. Problem Statement
+
+   This section describes the inter-subnet connectivity requirements in
+   DCs and why a specific route type to advertise IP prefixes is needed.
+
+2.1. Inter-Subnet Connectivity Requirements in Data Centers
+
+   [RFC7432] is used as the control plane for an NVO solution in DCs,
+   where NVE devices can be located in hypervisors or Top-of-Rack (ToR)
+   switches, as described in [RFC8365].
+
+   The following considerations apply to TSs that are physical or
+   virtual systems identified by MAC (and possibly IP addresses) and are
+   connected to BDs by Attachment Circuits:
+
+   *  The Tenant Systems may be VMs that generate traffic from their own
+      MAC and IP.
+
+   *  The Tenant Systems may be VA entities that forward traffic to/from
+      IP addresses of different end devices sitting behind them.
+
+      -  These VAs can be firewalls, load balancers, NAT devices, other
+         appliances, or virtual gateways with virtual routing instances.
+
+      -  These VAs do not necessarily participate in dynamic routing
+         protocols and hence rely on the EVPN NVEs to advertise the
+         routes on their behalf.
+
+      -  In all these cases, the VA will forward traffic to other TSs
+         using its own source MAC, but the source IP will be the one
+         associated with the end device sitting behind the VA or a
+         translated IP address (part of a public NAT pool) if the VA is
+         performing NAT.
+
+      -  Note that the same IP address and endpoint could exist behind
+         two of these TSs. One example of this would be certain
+         appliance resiliency mechanisms, where a virtual IP or floating
+         IP can be owned by one of the two VAs running the resiliency
+         protocol (the Master VA). The Virtual Router Redundancy
+         Protocol (VRRP) [RFC5798] is one particular example of this.
+         Another example is multihomed subnets, i.e., the same subnet is
+         connected to two VAs.
+
+      -  Although these VAs provide IP connectivity to VMs and the
+         subnets behind them, they do not always have their own IP
+         interface connected to the EVPN NVE; Layer 2 firewalls are
+         examples of VAs not supporting IP interfaces.
+
+   Figure 1 illustrates some of the examples described above.
+
+                      NVE1
+                  +-----------+
+         TS1(VM)--|  (BD-10)  |-----+
+           M1/IP1 +-----------+     |               DGW1
+                                +---------+    +-------------+
+                                |         |----|  (BD-10)    |
+   SN1---+            NVE2      |         |    |    IRB1\    |
+         |        +-----------+ |         |    |     (IP-VRF)|---+
+   SN2---TS2(VA)--|  (BD-10)  |-|         |    +-------------+  _|_
+         | M2/IP2 +-----------+ | VXLAN/  |                    (   )
+   IP4---+  <-+                 | GENEVE  |         DGW2      ( WAN )
+              |                 |         |    +-------------+ (___)
+            vIP23 (floating)    |         |----|  (BD-10)    |   |
+              |                 +---------+    |    IRB2\    |   |
+   SN1---+  <-+       NVE3        |  |  |      |     (IP-VRF)|---+
+         | M3/IP3 +-----------+   |  |  |      +-------------+
+   SN3---TS3(VA)--|  (BD-10)  |---+  |  |
+         |        +-----------+      |  |
+   IP5---+                           |  |
+                                     |  |
+                     NVE4            |  |     NVE5             +--SN5
+            +---------------------+  |  | +-----------+        |
+   IP6------|  (BD-1)             |  |  +-|  (BD-10)  |--TS4(VA)--SN6
+            |       \             |  |    +-----------+        |
+            |    (IP-VRF)         |--+                  ESI4   +--SN7
+            |      /   \IRB3      |
+        |---|  (BD-2)  (BD-10)    |
+     SN4|   +---------------------+
+
+
+   Note:
+   ESI4 = Ethernet Segment Identifier 4
+
+                  Figure 1: DC Inter-subnet Use Cases
+
+   Where:
+
+   NVE1, NVE2, NVE3, NVE4, NVE5, DGW1, and DGW2 share the same BD for a
+   particular tenant. BD-10 is comprised of the collection of BD
+   instances defined in all the NVEs. All the hosts connected to BD-10
+   belong to the same IP subnet. The hosts connected to BD-10 are
+   listed below:
+
+   *  TS1 is a VM that generates/receives traffic to/from IP1, where IP1
+      belongs to the BD-10 subnet.
+
+   *  TS2 and TS3 are VAs that send/receive traffic to/from the subnets
+      and hosts sitting behind them (SN1, SN2, SN3, IP4, and IP5).
+      Their IP addresses (IP2 and IP3) belong to the BD-10 subnet, and
+      they can also generate/receive traffic. When these VAs receive
+      packets destined to their own MAC addresses (M2 and M3), they will
+      route the packets to the proper subnet or host. These VAs do not
+      support routing protocols to advertise the subnets connected to
+      them and can move to a different server and NVE when the cloud
+      management system decides to do so. These VAs may also support
+      redundancy mechanisms for some subnets, similar to VRRP, where a
+      floating IP is owned by the Master VA and only the Master VA
+      forwards traffic to a given subnet. For example, vIP23 in
+      Figure 1 is a floating IP that can be owned by TS2 or TS3
+      depending on which system is the Master. Only the Master will
+      forward traffic to SN1.
+
+   *  Integrated Routing and Bridging interfaces IRB1, IRB2, and IRB3
+      have their own IP addresses that belong to the BD-10 subnet too.
+      These IRB interfaces connect the BD-10 subnet to Virtual Routing
+      and Forwarding (IP-VRF) instances that can route the traffic to
+      other subnets for the same tenant (within the DC or at the other
+      end of the WAN).
+
+   *  TS4 is a Layer 2 VA that provides connectivity to subnets SN5,
+      SN6, and SN7 but does not have an IP address itself in the BD-10.
+      TS4 is connected to a port on NVE5 that is assigned to Ethernet
+      Segment Identifier 4 (ESI4).
+
+   For a BD to which an ingress NVE is attached, "Overlay Index" is
+   defined as an identifier that the ingress EVPN NVE requires in order
+   to forward packets to a subnet or host in a remote subnet. As an
+   example, vIP23 (Figure 1) is an Overlay Index that any NVE attached
+   to BD-10 needs to know in order to forward packets to SN1. The IRB3
+   IP address is an Overlay Index required to get to SN4, and ESI4 is an
+   Overlay Index needed to forward traffic to SN5. In other words, the
+   Overlay Index is a next hop in the overlay address space that can be
+   an IP address, a MAC address, or an ESI. When advertised along with
+   an IP prefix, the Overlay Index requires a recursive resolution to
+   find out the egress NVE to which the EVPN packets need to be sent.
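The indirection described in the paragraph above can be sketched in a few lines of Python. This is only an illustration, not part of the RFC: the two dictionaries, the `IRB3-IP` key, and the function name `egress_nve` are hypothetical, and the initial ownership of vIP23 by TS2 (behind NVE2) is an assumption. The prefix and Overlay Index names come from the Figure 1 examples.

```python
from typing import Optional

# Prefix -> Overlay Index, following the Figure 1 examples
# (vIP23 for SN1, the IRB3 IP address for SN4, ESI4 for SN5).
PREFIX_TO_OVERLAY_INDEX = {"SN1": "vIP23", "SN4": "IRB3-IP", "SN5": "ESI4"}

# Overlay Index -> egress NVE. Assumption: TS2 (behind NVE2) currently
# owns the floating IP vIP23; IRB3 lives on NVE4 and ESI4 on NVE5.
overlay_index_to_egress = {"vIP23": "NVE2", "IRB3-IP": "NVE4", "ESI4": "NVE5"}


def egress_nve(prefix: str) -> Optional[str]:
    """Resolve a prefix recursively: prefix -> Overlay Index -> egress NVE."""
    overlay_index = PREFIX_TO_OVERLAY_INDEX.get(prefix)
    if overlay_index is None:
        return None  # no route for this prefix
    # The second lookup is the recursive step; it may fail until the
    # route that resolves the Overlay Index has been learned.
    return overlay_index_to_egress.get(overlay_index)
```

If the floating IP moves to TS3, only the single `vIP23` entry in `overlay_index_to_egress` changes to `NVE3`; the prefix table is left untouched, which is the property the Overlay Index is meant to provide.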
+
+   All the DC use cases in Figure 1 require inter-subnet forwarding;
+   therefore, the individual host routes and subnets:
+
+   a)  must be advertised from the NVEs (since VAs and VMs do not
+       participate in dynamic routing protocols) and
+
+   b)  may be associated with an Overlay Index that can be a VA IP
+       address, a floating IP address, a MAC address, or an ESI. The
+       Overlay Index is further discussed in Section 3.2.
+
+2.2. The Need for the EVPN IP Prefix Route
+
+   [RFC7432] defines a MAC/IP Advertisement route (also referred to as
+   "RT-2") where a MAC address can be advertised together with an IP
+   address length and IP address (IP). While a variable IP address
+   length might have been used to indicate the presence of an IP prefix
+   in a route type 2, there are several specific use cases in which
+   using this route type to deliver IP prefixes is not suitable.
+
+   One example of such use cases is the "floating IP" example described
+   in Section 2.1. In this example, it is necessary to decouple the
+   advertisement of the prefixes from the advertisement of a MAC address
+   of either M2 or M3; otherwise, the solution gets highly inefficient
+   and does not scale.
+
+   For example, if 1,000 prefixes are advertised from M2 (using RT-2)
+   and the floating IP owner changes from M2 to M3, 1,000 routes would
+   be withdrawn by M2 and readvertised by M3. However, if a separate
+   route type is used, 1,000 routes can be advertised as associated with
+   the floating IP address (vIP23), and only one RT-2 can be used for
+   advertising the ownership of the floating IP, i.e., vIP23 and M2 in
+   the route type 2. When the floating IP owner changes from M2 to M3,
+   a single RT-2 withdrawal/update is required to indicate the change.
+   The remote DGW will not change any of the 1,000 prefixes associated
+   with vIP23 but will only update the ARP resolution entry for vIP23
+   (now pointing at M3).
+
+   An EVPN route (type 5) for the advertisement of IP prefixes is
+   described in this document. This new route type has a differentiated
+   role from the RT-2 route and addresses the inter-subnet connectivity
+   scenarios for DCs (or NVO-based networks in general) described in
+   this document. Using this new RT-5, an IP prefix may be advertised
+   along with an Overlay Index, which can be a GW IP address, a MAC, or
+   an ESI. The IP prefix may also be advertised without an Overlay
+   Index, in which case the BGP next hop will point at the egress NVE,
+   Area Border Router (ABR), or ASBR, and the MAC in the EVPN Router's
+   MAC Extended Community will provide the inner MAC destination address
+   to be used. As discussed throughout the document, the EVPN RT-2 does
+   not meet the requirements for all the DC use cases; therefore, this
+   EVPN route type 5 is required.
+
+   The EVPN route type 5 decouples the IP prefix advertisements from the
+   MAC/IP Advertisement routes in EVPN. Hence:
+
+   a)  The clean and clear advertisements of IPv4 or IPv6 prefixes in a
+       Network Layer Reachability Information (NLRI) message without MAC
+       addresses are allowed.
+
+   b)  Since the route type is different from the MAC/IP Advertisement
+       route, the current procedures described in [RFC7432] do not need
+       to be modified.
+
+   c)  A flexible implementation is allowed where the prefix can be
+       linked to different types of Overlay/Underlay Indexes: overlay IP
+       addresses, overlay MAC addresses, overlay ESIs, underlay BGP next
+       hops, etc.
+
+   d)  An EVPN implementation not requiring IP prefixes can simply
+       discard them by looking at the route type value.
+
+   The following sections describe how EVPN is extended with a route
+   type for the advertisement of IP prefixes and how this route is used
+   to address the inter-subnet connectivity requirements existing in the
+   DC.
+
+3. The BGP EVPN IP Prefix Route
+
+   The BGP EVPN NLRI as defined in [RFC7432] is shown below:
+
+                  +-----------------------------------+
+                  |    Route Type (1 octet)           |
+                  +-----------------------------------+
+                  |     Length (1 octet)              |
+                  +-----------------------------------+
+                  | Route Type specific (variable)    |
+                  +-----------------------------------+
+
+                        Figure 2: BGP EVPN NLRI
+
+   This document defines an additional route type (RT-5) in the IANA
+   "EVPN Route Types" registry [EVPNRouteTypes] to be used for the
+   advertisement of EVPN routes using IP prefixes:
+
+   Value: 5
+   Description: IP Prefix
+
+   According to Section 5.4 of [RFC7606], a node that doesn't recognize
+   the route type 5 (RT-5) will ignore it. Therefore, an NVE following
+   this document can still be attached to a BD where an NVE ignoring RT-
+   5s is attached. Regular procedures described in [RFC7432] would
+   apply in that case for both NVEs. In case two or more NVEs are
+   attached to different BDs of the same tenant, they MUST support the
+   RT-5 for the proper inter-subnet forwarding operation of the tenant.
+
+   The detailed encoding of this route and associated procedures are
+   described in the following sections.
+
+3.1. IP Prefix Route Encoding
+
+   An IP Prefix route type for IPv4 has the Length field set to 34 and
+   consists of the following fields:
+
+               +---------------------------------------+
+               |      RD (8 octets)                    |
+               +---------------------------------------+
+               |Ethernet Segment Identifier (10 octets)|
+               +---------------------------------------+
+               |  Ethernet Tag ID (4 octets)           |
+               +---------------------------------------+
+               |  IP Prefix Length (1 octet, 0 to 32)  |
+               +---------------------------------------+
+               |  IP Prefix (4 octets)                 |
+               +---------------------------------------+
+               |  GW IP Address (4 octets)             |
+               +---------------------------------------+
+               |  MPLS Label (3 octets)                |
+               +---------------------------------------+
+
+              Figure 3: EVPN IP Prefix Route NLRI for IPv4
+
+   An IP Prefix route type for IPv6 has the Length field set to 58 and
+   consists of the following fields:
+
+               +---------------------------------------+
+               |      RD (8 octets)                    |
+               +---------------------------------------+
+               |Ethernet Segment Identifier (10 octets)|
+               +---------------------------------------+
+               |  Ethernet Tag ID (4 octets)           |
+               +---------------------------------------+
+               |  IP Prefix Length (1 octet, 0 to 128) |
+               +---------------------------------------+
+               |  IP Prefix (16 octets)                |
+               +---------------------------------------+
+               |  GW IP Address (16 octets)            |
+               +---------------------------------------+
+               |  MPLS Label (3 octets)                |
+               +---------------------------------------+
+
+              Figure 4: EVPN IP Prefix Route NLRI for IPv6
+
+   Where:
+
+   *  The Length field of the BGP EVPN NLRI for an EVPN IP Prefix route
+      MUST be either 34 (if IPv4 addresses are carried) or 58 (if IPv6
+      addresses are carried). The IP prefix and gateway IP address MUST
+      be from the same IP address family.
+
+   *  The Route Distinguisher (RD) and Ethernet Tag ID MUST be used as
+      defined in [RFC7432] and [RFC8365]. In particular, the RD is
+      unique per MAC-VRF (or IP-VRF). The MPLS Label field is set to
+      either an MPLS label or a VNI, as described in [RFC8365] for other
+      EVPN route types.
+
+   *  The Ethernet Segment Identifier MUST be a non-zero 10-octet
+      identifier if the ESI is used as an Overlay Index (see the
+      definition of "Overlay Index" in Section 3.2). It MUST be all
+      bytes zero otherwise. The ESI format is described in [RFC7432].
+
+   *  The IP prefix length can be set to a value between 0 and 32 (bits)
+      for IPv4 and between 0 and 128 for IPv6, and it specifies the
+      number of bits in the prefix. The value MUST NOT be greater than
+      128.
+
+   *  The IP prefix is a 4- or 16-octet field (IPv4 or IPv6).
+
+   *  The GW IP Address field is a 4- or 16-octet field (IPv4 or IPv6)
+      and will encode a valid IP address as an Overlay Index for the IP
+      prefixes. The GW IP field MUST be all bytes zero if it is not
+      used as an Overlay Index. Refer to Section 3.2 for the definition
+      and use of the Overlay Index.
+
+   *  The MPLS Label field is encoded as 3 octets, where the high-order
+      20 bits contain the label value, as per [RFC7432]. When sending,
+      the label value SHOULD be zero if a recursive resolution based on
+      an Overlay Index is used. If the received MPLS label value is
+      zero, the route MUST contain an Overlay Index, and the ingress
+      NVE/PE MUST perform a recursive resolution to find the egress NVE/
+      PE. If the received label is zero and the route does not contain
+      an Overlay Index, it MUST be "treat as withdraw" [RFC7606].
+
+   The RD, Ethernet Tag ID, IP prefix length, and IP prefix are part of
+   the route key used by BGP to compare routes. The rest of the fields
+   are not part of the route key.
+
+   An IP Prefix route MAY be sent along with an EVPN Router's MAC
+   Extended Community (defined in [RFC9135]) to carry the MAC address
+   that is used as the Overlay Index. Note that the MAC address may be
+   that of a TS.
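As an illustration of the IPv4 layout in Figure 3, the following Python sketch packs the fields into the 34-octet route-type-specific part and prepends the Route Type and Length octets from Figure 2. It is a minimal sketch under stated assumptions, not an implementation taken from the RFC: the function name `encode_rt5_ipv4` and its parameters are hypothetical, the RD and ESI are passed as raw bytes, and the label argument is treated as a 20-bit MPLS label whose low-order 4 bits in the 3-octet field are left at zero (a VNI would use the field differently, as per [RFC8365]).

```python
import ipaddress
import struct

RT5_ROUTE_TYPE = 5    # "IP Prefix" in the EVPN Route Types registry
RT5_IPV4_LENGTH = 34  # 8 + 10 + 4 + 1 + 4 + 4 + 3 octets


def encode_rt5_ipv4(rd: bytes, esi: bytes, eth_tag: int,
                    prefix: str, gw_ip: str, label: int) -> bytes:
    """Pack an IPv4 RT-5 NLRI (hypothetical helper, illustration only)."""
    if len(rd) != 8 or len(esi) != 10:
        raise ValueError("RD must be 8 octets and ESI 10 octets")
    if not 0 <= label < (1 << 20):
        raise ValueError("label value must fit in 20 bits")
    net = ipaddress.IPv4Network(prefix)
    gw = ipaddress.IPv4Address(gw_ip)
    # ESI and GW IP must not both be non-zero at the same time.
    if any(esi) and int(gw) != 0:
        raise ValueError("ESI and GW IP cannot both be non-zero")
    # The label value occupies the high-order 20 bits of the 3-octet field.
    label_field = (label << 4).to_bytes(3, "big")
    body = (rd + esi
            + struct.pack("!IB", eth_tag, net.prefixlen)
            + net.network_address.packed
            + gw.packed
            + label_field)
    if len(body) != RT5_IPV4_LENGTH:
        raise AssertionError("unexpected RT-5 IPv4 body length")
    return bytes([RT5_ROUTE_TYPE, RT5_IPV4_LENGTH]) + body
```

A call such as `encode_rt5_ipv4(bytes(8), bytes(10), 0, "192.0.2.0/24", "198.51.100.1", 0)` (documentation addresses chosen only for the example) yields 36 octets: the two NLRI header octets followed by the 34-octet body, with the GW IP acting as the Overlay Index and a zero label.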
+
+   As described in Section 3.2, certain data combinations in a received
+   route would imply a treat-as-withdraw handling of the route
+   [RFC7606].
+
+3.2. Overlay Indexes and Recursive Lookup Resolution
+
+   RT-5 routes support recursive lookup resolution through the use of
+   Overlay Indexes as follows:
+
+   *  An Overlay Index can be an ESI or IP address in the address space
+      of the tenant or MAC address, and it is used by an NVE as the next
+      hop for a given IP prefix. An Overlay Index always needs a
+      recursive route resolution on the NVE/PE that installs the RT-5
+      into one of its IP-VRFs so that the NVE knows to which egress NVE/
+      PE it needs to forward the packets. It is important to note that
+      recursive resolution of the Overlay Index applies upon
+      installation into an IP-VRF and not upon BGP propagation (for
+      instance, on an ASBR). Also, as a result of the recursive
+      resolution, the egress NVE/PE is not necessarily the same NVE that
+      originated the RT-5.
+
+   *  The Overlay Index is indicated along with the RT-5 in the ESI
+      field, GW IP field, or EVPN Router's MAC Extended Community,
+      depending on whether the IP prefix next hop is an ESI, an IP
+      address, or a MAC address in the tenant space. The Overlay Index
+      for a given IP prefix is set by local policy at the NVE that
+      originates an RT-5 for that IP prefix (typically managed by the
+      cloud management system).
+
+   *  In order to enable the recursive lookup resolution at the ingress
+      NVE, an NVE that is a possible egress NVE for a given Overlay
+      Index must originate a route advertising itself as the BGP next
+      hop on the path to the system denoted by the Overlay Index. For
+      instance:
+
+      -  If an NVE receives an RT-5 that specifies an Overlay Index, the
+         NVE cannot use the RT-5 in its IP-VRF unless (or until) it can
+         recursively resolve the Overlay Index.
+
+      -  If the RT-5 specifies an ESI as the Overlay Index, a recursive
+         resolution can only be done if the NVE has received and
+         installed an RT-1 (auto-discovery per EVI) route specifying
+         that ESI.
+
+      -  If the RT-5 specifies a GW IP address as the Overlay Index, a
+         recursive resolution can only be done if the NVE has received
+         and installed an RT-2 (MAC/IP Advertisement route) specifying
+         that IP address in the IP Address field of its NLRI.
+
+      -  If the RT-5 specifies a MAC address as the Overlay Index, a
+         recursive resolution can only be done if the NVE has received
+         and installed an RT-2 (MAC/IP Advertisement route) specifying
+         that MAC address in the MAC Address field of its NLRI.
+
+      Note that the RT-1 or RT-2 routes needed for the recursive
+      resolution may arrive before or after the given RT-5 route.
+
+   *  Irrespective of the recursive resolution, if there is no IGP or
+      BGP route to the BGP next hop of an RT-5, BGP MUST NOT install the
+      RT-5 even if the Overlay Index can be resolved.
+
+   *  The ESI and GW IP fields may both be zero at the same time.
+      However, they MUST NOT both be non-zero at the same time. A route
+      containing a non-zero GW IP and a non-zero ESI (at the same time)
+      SHOULD be treat as withdraw [RFC7606].
+
+   *  If either the ESI or the GW IP are non-zero, then the non-zero one
+      is the Overlay Index, regardless of whether the EVPN Router's MAC
+      Extended Community is present or the value of the label. In case
+      the GW IP is the Overlay Index (hence, ESI is zero), the EVPN
+      Router's MAC Extended Community is ignored if present.
+
+   *  A route where ESI, GW IP, MAC, and Label are all zero at the same
+      time SHOULD be treat as withdraw.
+
+   The indirection provided by the Overlay Index and its recursive
+   lookup resolution is required to achieve fast convergence in case of
+   a failure of the object represented by the Overlay Index (see the
+   example described in Section 2.2).
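The selection rules in the list above, combined with the field combinations that Table 1 enumerates, can be sketched as a small Python function. This is a hedged illustration only: the function name `select_overlay_index`, its boolean inputs, the `prefer_mac` flag (standing in for the receiver's local policy), and the string return values are all hypothetical choices made for the example.

```python
def select_overlay_index(esi_nonzero: bool, gw_ip_nonzero: bool,
                         mac_present: bool, label: int,
                         prefer_mac: bool = False) -> str:
    """Return "ESI", "GW IP", "MAC", "None", or "treat-as-withdraw".

    mac_present means a valid EVPN Router's MAC Extended Community was
    received with the RT-5; prefer_mac models receiver local policy.
    """
    if esi_nonzero and gw_ip_nonzero:
        return "treat-as-withdraw"  # both non-zero is not allowed
    if esi_nonzero:
        return "ESI"                # label and MAC do not matter
    if gw_ip_nonzero:
        return "GW IP"              # Router's MAC community is ignored
    if mac_present:
        if label == 0:
            return "MAC"            # zero label requires an Overlay Index
        return "MAC" if prefer_mac else "None"
    if label != 0:
        return "None"               # next hop is the RT-5 BGP next hop
    return "treat-as-withdraw"      # ESI, GW IP, MAC, and label all zero
```

When the result is "None", no recursive resolution takes place and the RT-5's BGP next hop is used directly; in every other non-withdraw case the NVE still has to resolve the selected Overlay Index before the prefix can be used.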
+
+   Table 1 shows the different RT-5 field combinations allowed by this
+   specification and what Overlay Index must be used by the receiving
+   NVE/PE in each case. Cases where there is no Overlay Index are
+   indicated as "None" in Table 1. If there is no Overlay Index, the
+   receiving NVE/PE will not perform any recursive resolution, and the
+   actual next hop is given by the RT-5's BGP next hop.
+
+   +==========+==========+==========+============+===============+
+   | ESI      | GW IP    | MAC*     | Label      | Overlay Index |
+   +==========+==========+==========+============+===============+
+   | Non-Zero | Zero     | Zero     | Don't Care | ESI           |
+   +----------+----------+----------+------------+---------------+
+   | Non-Zero | Zero     | Non-Zero | Don't Care | ESI           |
+   +----------+----------+----------+------------+---------------+
+   | Zero     | Non-Zero | Zero     | Don't Care | GW IP         |
+   +----------+----------+----------+------------+---------------+
+   | Zero     | Zero     | Non-Zero | Zero       | MAC           |
+   +----------+----------+----------+------------+---------------+
+   | Zero     | Zero     | Non-Zero | Non-Zero   | MAC or None** |
+   +----------+----------+----------+------------+---------------+
+   | Zero     | Zero     | Zero     | Non-Zero   | None***       |
+   +----------+----------+----------+------------+---------------+
+
+            Table 1: RT-5 Fields and Indicated Overlay Index
+
+   Table Notes:
+
+   *    MAC with "Zero" value means no EVPN Router's MAC Extended
+        Community is present along with the RT-5. "Non-Zero" indicates
+        that the extended community is present and carries a valid MAC
+        address. The encoding of a MAC address MUST be the 6-octet MAC
+        address specified by [IEEE-802.1Q]. Examples of invalid MAC
+        addresses are broadcast or multicast MAC addresses. The route
+        MUST be treat as withdraw in case of an invalid MAC address.
+        The presence of the EVPN Router's MAC Extended Community alone
+        is not enough to indicate the use of the MAC address as the
+        Overlay Index since the extended community can be used for
+        other purposes.
+
+   **   In this case, the Overlay Index may be the RT-5's MAC address
+        or "None", depending on the local policy of the receiving NVE/
+        PE. Note that the advertising NVE/PE that sets the Overlay
+        Index SHOULD advertise an RT-2 for the MAC Overlay Index if
+        there are receiving NVE/PEs configured to use the MAC as the
+        Overlay Index. This case in Table 1 is used in the IP-VRF-to-
+        IP-VRF implementations described in Sections 4.4.1 and 4.4.3.
+        The support of a MAC Overlay Index in this model is OPTIONAL.
+
+   ***  The Overlay Index is "None". This is a special case used for
+        IP-VRF-to-IP-VRF where the NVE/PEs are connected by IP NVO
+        tunnels as opposed to Ethernet NVO tunnels.
+
+   If the combination of ESI, GW IP, MAC, and Label in the receiving
+   RT-5 is different than the combinations shown in Table 1, the router
+   will process the route as per the rules described at the beginning of
+   this section (Section 3.2).
+
+   Table 2 shows the different inter-subnet use cases described in this
+   document and the corresponding coding of the Overlay Index in the
+   route type 5 (RT-5).
+
+   +=========+=====================+===========================+
+   | Section | Use Case            | Overlay Index in the RT-5 |
+   +=========+=====================+===========================+
+   | 4.1     | TS IP address       | GW IP                     |
+   +---------+---------------------+---------------------------+
+   | 4.2     | Floating IP address | GW IP                     |
+   +---------+---------------------+---------------------------+
+   | 4.3     | "Bump-in-the-wire"  | ESI or MAC                |
+   +---------+---------------------+---------------------------+
+   | 4.4     | IP-VRF-to-IP-VRF    | GW IP, MAC, or None       |
+   +---------+---------------------+---------------------------+
+
+          Table 2: Use Cases and Overlay Indexes for Recursive
+                               Resolution
+
+   The above use cases are representative of the different Overlay
+   Indexes supported by the RT-5 (GW IP, ESI, MAC, or None).
+
+4. Overlay Index Use Cases
+
+   This section describes some use cases for the Overlay Index types
+   used with the IP Prefix route. Although the examples use IPv4
+   prefixes and subnets, the descriptions of the RT-5 are valid for the
+   same cases with IPv6, except that IP Prefixes, IPL, and GW IP are
+   replaced by the corresponding IPv6 values.
+
+4.1. TS IP Address Overlay Index Use Case
+
+   Figure 5 illustrates an example of inter-subnet forwarding for
+   subnets sitting behind VAs (on TS2 and TS3).
+
+   IP4---+            NVE2                          DGW1
+         |        +-----------+ +---------+    +-------------+
+   SN2---TS2(VA)--|  (BD-10)  |-|         |----|  (BD-10)    |
+         | M2/IP2 +-----------+ |         |    |    IRB1\    |
+    -+---+                      |         |    |     (IP-VRF)|---+
+     |                          |         |    +-------------+  _|_
+    SN1                         | VXLAN/  |                    (   )
+     |                          | GENEVE  |         DGW2      ( WAN )
+    -+---+            NVE3      |         |    +-------------+ (___)
+         | M3/IP3 +-----------+ |         |----|  (BD-10)    |   |
+   SN3---TS3(VA)--|  (BD-10)  |-|         |    |    IRB2\    |   |
+         |        +-----------+ +---------+    |     (IP-VRF)|---+
+   IP5---+                                     +-------------+
+
+                    Figure 5: TS IP Address Use Case
+
+   An example of inter-subnet forwarding between subnet SN1, which uses
+   a 24-bit IP prefix (written as SN1/24 in the future), and a subnet
+   sitting in the WAN is described below. NVE2, NVE3, DGW1, and DGW2
+   are running BGP EVPN. TS2 and TS3 do not participate in dynamic
+   routing protocols, and they only have a static route to forward the
+   traffic to the WAN. SN1/24 is dual-homed to NVE2 and NVE3.
+
+   In this case, a GW IP is used as an Overlay Index. Although a
+   different Overlay Index type could have been used, this use case
+   assumes that the operator knows the VA's IP addresses beforehand,
+   whereas the VA's MAC address is unknown and the VA's ESI is zero.
+   Because of this, the GW IP is the suitable Overlay Index to be used
+   with the RT-5s. The NVEs know the GW IP to be used for a given
+   prefix by policy.
+ + (1) NVE2 advertises the following BGP routes on behalf of TS2: + + * Route type 2 (MAC/IP Advertisement route) containing: ML = 48 + (MAC address length), M = M2 (MAC address), IPL = 32 (IP + prefix length), IP = IP2, and BGP Encapsulation Extended + Community [RFC9012] with the corresponding tunnel type. The + MAC and IP addresses may be learned via ARP snooping. + + * Route type 5 (IP Prefix route) containing: IPL = 24, IP = + SN1, ESI = 0, and GW IP address = IP2. The prefix and GW IP + are learned by policy. + + (2) Similarly, NVE3 advertises the following BGP routes on behalf of + TS3: + + * Route type 2 (MAC/IP Advertisement route) containing: ML = + 48, M = M3, IPL = 32, IP = IP3 (and BGP Encapsulation + Extended Community). + + * Route type 5 (IP Prefix route) containing: IPL = 24, IP = + SN1, ESI = 0, and GW IP address = IP3. + + (3) DGW1 and DGW2 import both received routes based on the Route + Targets: + + * Based on the BD-10 Route Target in DGW1 and DGW2, the MAC/IP + Advertisement route is imported, and M2 is added to the BD-10 + along with its corresponding tunnel information. For + instance, if VXLAN is used, the VTEP will be derived from the + MAC/IP Advertisement route BGP next hop and VNI from the MPLS + Label1 field. M2/IP2 is added to the ARP table. Similarly, + M3 is added to BD-10, and M3/IP3 is added to the ARP table. + + * Based on the BD-10 Route Target in DGW1 and DGW2, the IP + Prefix route is also imported, and SN1/24 is added to the IP- + VRF with Overlay Index IP2 pointing at the local BD-10. In + this example, it is assumed that the RT-5 from NVE2 is + preferred over the RT-5 from NVE3. If both routes were + equally preferable and ECMP enabled, SN1/24 would also be + added to the routing table with Overlay Index IP3. + + (4) When DGW1 receives a packet from the WAN with destination IPx, + where IPx belongs to SN1/24: + + * A destination IP lookup is performed on the DGW1 IP-VRF + table, and Overlay Index = IP2 is found. 
Since IP2 is an + Overlay Index, a recursive route resolution is required for + IP2. + + * IP2 is resolved to M2 in the ARP table, and M2 is resolved to + the tunnel information given by the BD FIB (e.g., remote VTEP + and VNI for the VXLAN case). + + * The IP packet destined to IPx is encapsulated with: + + - Inner source MAC = IRB1 MAC. + + - Inner destination MAC = M2. + + - Tunnel information provided by the BD (VNI, VTEP IPs, and + MACs for the VXLAN case). + + (5) When the packet arrives at NVE2: + + * Based on the tunnel information (VNI for the VXLAN case), the + BD-10 context is identified for a MAC lookup. + + * Encapsulation is stripped off and, based on a MAC lookup + (assuming MAC forwarding on the egress NVE), the packet is + forwarded to TS2, where it will be properly routed. + + (6) Should TS2 move from NVE2 to NVE3, MAC Mobility procedures will + be applied to the MAC route M2/IP2, as defined in [RFC7432]. + Route type 5 prefixes are not subject to MAC Mobility + procedures; hence, no changes in the DGW IP-VRF table will occur + for TS2 mobility -- i.e., all the prefixes will still be + pointing at IP2 as the Overlay Index. There is an indirection + for, e.g., SN1/24, which still points at Overlay Index IP2 in + the routing table, but IP2 will be simply resolved to a + different tunnel based on the outcome of the MAC Mobility + procedures for the MAC/IP Advertisement route M2/IP2. + + Note that in the opposite direction, TS2 will send traffic based on + its static-route next-hop information (IRB1 and/or IRB2), and regular + EVPN procedures will be applied. + +4.2. Floating IP Overlay Index Use Case + + Sometimes TSs work in active/standby mode where an upstream floating + IP owned by the active TS is used as the Overlay Index to get to some + subnets behind the TS. This redundancy mode, already introduced in + Sections 2.1 and 2.2, is illustrated in Figure 6. 
+ + NVE2 DGW1 + +-----------+ +---------+ +-------------+ + +---TS2(VA)--| (BD-10) |-| |----| (BD-10) | + | M2/IP2 +-----------+ | | | IRB1\ | + | <-+ | | | (IP-VRF)|---+ + | | | | +-------------+ _|_ + SN1 vIP23 (floating) | VXLAN/ | ( ) + | | | GENEVE | DGW2 ( WAN ) + | <-+ NVE3 | | +-------------+ (___) + | M3/IP3 +-----------+ | |----| (BD-10) | | + +---TS3(VA)--| (BD-10) |-| | | IRB2\ | | + +-----------+ +---------+ | (IP-VRF)|---+ + +-------------+ + + Figure 6: Floating IP Overlay Index for Redundant TS + + In this use case, a GW IP is used as an Overlay Index for the same + reasons as in Section 4.1. However, this GW IP is a floating IP that + belongs to the active TS. Assuming TS2 is the active TS and owns + vIP23: + + (1) NVE2 advertises the following BGP routes for TS2: + + * Route type 2 (MAC/IP Advertisement route) containing: ML = + 48, M = M2, IPL = 32, and IP = vIP23 (as well as BGP + Encapsulation Extended Community). The MAC and IP addresses + may be learned via ARP snooping. + + * Route type 5 (IP Prefix route) containing: IPL = 24, IP = + SN1, ESI = 0, and GW IP address = vIP23. The prefix and GW + IP are learned by policy. + + (2) NVE3 advertises the following BGP route for TS3 (it does not + advertise an RT-2 for M3/vIP23): + + * Route type 5 (IP Prefix route) containing: IPL = 24, IP = + SN1, ESI = 0, and GW IP address = vIP23. The prefix and GW + IP are learned by policy. + + (3) DGW1 and DGW2 import both received routes based on the Route + Target: + + * M2 is added to the BD-10 FIB along with its corresponding + tunnel information. For the VXLAN use case, the VTEP will be + derived from the MAC/IP Advertisement route BGP next hop and + VNI from the VNI field. M2/vIP23 is added to the ARP table. + + * SN1/24 is added to the IP-VRF in DGW1 and DGW2 with Overlay + Index vIP23 pointing at M2 in the local BD-10. 
+ + (4) When DGW1 receives a packet from the WAN with destination IPx, + where IPx belongs to SN1/24: + + * A destination IP lookup is performed on the DGW1 IP-VRF + table, and Overlay Index = vIP23 is found. Since vIP23 is an + Overlay Index, a recursive route resolution for vIP23 is + required. + + * vIP23 is resolved to M2 in the ARP table, and M2 is resolved + to the tunnel information given by the BD (remote VTEP and + VNI for the VXLAN case). + + * The IP packet destined to IPx is encapsulated with: + + - Inner source MAC = IRB1 MAC. + + - Inner destination MAC = M2. + + - Tunnel information provided by the BD FIB (VNI, VTEP IPs, + and MACs for the VXLAN case). + + (5) When the packet arrives at NVE2: + + * Based on the tunnel information (VNI for the VXLAN case), the + BD-10 context is identified for a MAC lookup. + + * Encapsulation is stripped off and, based on a MAC lookup + (assuming MAC forwarding on the egress NVE), the packet is + forwarded to TS2, where it will be properly routed. + + (6) When the redundancy protocol running between TS2 and TS3 + appoints TS3 as the new active TS for SN1, TS3 will now own the + floating vIP23 and will signal this new ownership using a + gratuitous ARP REPLY message (explained in [RFC5227]) or + similar. Upon receiving the new owner's notification, NVE3 will + issue a route type 2 for M3/vIP23, and NVE2 will withdraw the + RT-2 for M2/vIP23. DGW1 and DGW2 will update their ARP tables + with the new MAC resolving the floating IP. No changes are made + in the IP-VRF table. + +4.3. Bump-in-the-Wire Use Case + + Figure 7 illustrates an example of inter-subnet forwarding for an IP + Prefix route that carries subnet SN1. In this use case, TS2 and TS3 + are Layer 2 VA devices without any IP addresses that can be included + as an Overlay Index in the GW IP field of the IP Prefix route. Their + MAC addresses are M2 and M3, respectively, and are connected to BD- + 10. 
Note that IRB1 and IRB2 (in DGW1 and DGW2, respectively) have IP + addresses in a subnet different than SN1. + + NVE2 DGW1 + M2 +-----------+ +---------+ +-------------+ + +---TS2(VA)--| (BD-10) |-| |----| (BD-10) | + | ESI23 +-----------+ | | | IRB1\ | + | + | | | (IP-VRF)|---+ + | | | | +-------------+ _|_ + SN1 | | VXLAN/ | ( ) + | | | GENEVE | DGW2 ( WAN ) + | + NVE3 | | +-------------+ (___) + | ESI23 +-----------+ | |----| (BD-10) | | + +---TS3(VA)--| (BD-10) |-| | | IRB2\ | | + M3 +-----------+ +---------+ | (IP-VRF)|---+ + +-------------+ + + Figure 7: Bump-in-the-Wire Use Case + + Since TS2 and TS3 cannot participate in any dynamic routing protocol + and neither has an IP address assigned, there are two potential + Overlay Index types that can be used when advertising SN1: + + a) an ESI, i.e., ESI23, that can be provisioned on the attachment + ports of NVE2 and NVE3, as shown in Figure 7 or + + b) the VA's MAC address, which can be added to NVE2 and NVE3 by + policy. + + The advantage of using an ESI as the Overlay Index as opposed to the + VA's MAC address is that the forwarding to the egress NVE can be done + purely based on the state of the AC in the Ethernet segment (notified + by the Ethernet A-D per EVI route), and all the EVPN multihoming + redundancy mechanisms can be reused. For instance, the mass + withdrawal mechanism described in [RFC7432] for fast failure + detection and propagation can be used. It is assumed per this + section that an ESI Overlay Index is used in this use case, but this + use case does not preclude the use of the VA's MAC address as an + Overlay Index. If a MAC is used as the Overlay Index, the control + plane must follow the procedures described in Section 4.4.3. 
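When an ESI is the Overlay Index, the egress NVE is determined by which NVE currently advertises the Ethernet A-D per EVI route for that ESI, and the inner destination MAC comes from the EVPN Router's MAC Extended Community of the selected RT-5. The Python sketch below models that selection in a simplified way. The `Rt1` and `Rt5` records, the `resolve_esi` function, and the string labels are hypothetical; a real implementation would also apply BGP best-path rules that are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rt5:
    prefix: str
    esi: str
    next_hop: str      # advertising NVE
    router_mac: str    # MAC from the EVPN Router's MAC Extended Community


@dataclass(frozen=True)
class Rt1:
    esi: str
    next_hop: str      # NVE whose AC in the Ethernet segment is active
    vni: int


def resolve_esi(prefix, rt5_routes, rt1_routes):
    """Pick the RT-5 whose advertiser also advertises an RT-1 for the ESI."""
    active = {(r.esi, r.next_hop): r for r in rt1_routes}
    for rt5 in rt5_routes:
        if rt5.prefix != prefix:
            continue
        rt1 = active.get((rt5.esi, rt5.next_hop))
        if rt1 is not None:
            return {
                "inner_dst_mac": rt5.router_mac,
                "vtep": rt1.next_hop,
                "vni": rt1.vni,
            }
    return None  # no Ethernet A-D route: the Overlay Index is unresolved


rt5_routes = [
    Rt5("SN1/24", "ESI23", "NVE2", "M2"),
    Rt5("SN1/24", "ESI23", "NVE3", "M3"),
]
active_on_nve2 = resolve_esi("SN1/24", rt5_routes, [Rt1("ESI23", "NVE2", 10)])
active_on_nve3 = resolve_esi("SN1/24", rt5_routes, [Rt1("ESI23", "NVE3", 10)])
```

Withdrawing the Ethernet A-D route from one NVE and advertising it from the other is enough to move traffic. Neither RT-5 needs to change.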
+ + The model supports VA redundancy in a similar way to the one + described in Section 4.2 for the floating IP Overlay Index use case, + except that it uses the EVPN Ethernet A-D per EVI route instead of + the MAC advertisement route to advertise the location of the Overlay + Index. The procedure is explained below: + + (1) Assuming TS2 is the active TS in ESI23, NVE2 advertises the + following BGP routes: + + * Route type 1 (Ethernet A-D route for BD-10) containing: ESI = + ESI23 and the corresponding tunnel information (VNI field), + as well as the BGP Encapsulation Extended Community as per + [RFC8365]. + + * Route type 5 (IP Prefix route) containing: IPL = 24, IP = + SN1, ESI = ESI23, and GW IP address = 0. The EVPN Router's + MAC Extended Community defined in [RFC9135] is added and + carries the MAC address (M2) associated with the TS behind + which SN1 sits. M2 may be learned by policy; however, the + MAC in the Extended Community is preferred if sent with the + route. + + (2) NVE3 advertises the following BGP route for TS3 (no Ethernet + A-D per EVI route is advertised): + + * Route type 5 (IP Prefix route) containing: IPL = 24, IP = + SN1, ESI = ESI23, and GW IP address = 0. The EVPN Router's + MAC Extended Community is added and carries the MAC address (M3) + associated with the TS behind which SN1 sits. M3 may be + learned by policy; however, the MAC in the Extended Community + is preferred if sent with the route. + + (3) DGW1 and DGW2 import the received routes based on the Route + Target: + + * The tunnel information to get to ESI23 is installed in DGW1 + and DGW2. For the VXLAN use case, the VTEP will be derived + from the Ethernet A-D route BGP next hop and VNI from the + VNI/VSID field (see [RFC8365]). + + * The RT-5 coming from the NVE that advertised the RT-1 is + selected, and SN1/24 is added to the IP-VRF in DGW1 and DGW2 + with Overlay Index ESI23 and MAC = M2.
+ + (4) When DGW1 receives a packet from the WAN with destination IPx, + where IPx belongs to SN1/24: + + * A destination IP lookup is performed on the DGW1 IP-VRF + table, and Overlay Index = ESI23 is found. Since ESI23 is an + Overlay Index, a recursive route resolution is required to + find the egress NVE where ESI23 resides. + + * The IP packet destined to IPx is encapsulated with: + + - Inner source MAC = IRB1 MAC. + + - Inner destination MAC = M2 (this MAC will be obtained from + the EVPN Router's MAC Extended Community received along + with the RT-5 for SN1). Note that the EVPN Router's MAC + Extended Community is used in this case to carry the TS's + MAC address, as opposed to the MAC address of the NVE/PE. + + - Tunnel information for the NVO tunnel is provided by the + Ethernet A-D route per EVI for ESI23 (VNI and VTEP IP for + the VXLAN case). + + (5) When the packet arrives at NVE2: + + * Based on the tunnel demultiplexer information (VNI for the + VXLAN case), the BD-10 context is identified for a MAC lookup + (assuming a MAC-based disposition model [RFC7432]), or the + VNI may directly identify the egress interface (for an MPLS- + based disposition model, which in this context is a VNI-based + disposition model). + + * Encapsulation is stripped off and, based on a MAC lookup + (assuming MAC forwarding on the egress NVE) or a VNI lookup + (in case of VNI forwarding), the packet is forwarded to TS2, + where it will be forwarded to SN1. + + (6) If the redundancy protocol running between TS2 and TS3 follows + an active/standby model and there is a failure, TS3 is appointed + as the new active TS for SN1. TS3 will now own the connectivity + to SN1 and will signal this new ownership. Upon receiving the + new owner's notification, NVE3's AC will become active and issue + a route type 1 for ESI23, whereas NVE2 will withdraw its + Ethernet A-D route for ESI23. DGW1 and DGW2 will update their + tunnel information to resolve ESI23. 
The inner destination MAC + will be changed to M3. + +4.4. IP-VRF-to-IP-VRF Model + + This use case is similar to the scenario described in Section 9.1 of + [RFC9135]; however, the new requirement here is the advertisement of + IP prefixes as opposed to only host routes. + + In the examples described in Sections 4.1, 4.2, and 4.3, the BD + instance can connect IRB interfaces and any other Tenant Systems + connected to it. EVPN provides connectivity for: + + 1. Traffic destined to the IRB or TS IP interfaces, as well as + + 2. Traffic destined to IP subnets sitting behind the TS, e.g., SN1 + or SN2. + + In order to provide connectivity for (1), MAC/IP Advertisement routes + (RT-2) are needed so that IRB or TS MACs and IPs can be distributed. + Connectivity type (2) is accomplished by the exchange of IP Prefix + routes (RT-5) for IPs and subnets sitting behind certain Overlay + Indexes, e.g., GW IP, ESI, or TS MAC. + + In some cases, IP Prefix routes may be advertised for subnets and IPs + sitting behind an IRB. This use case is referred to as the "IP-VRF- + to-IP-VRF" model. + + [RFC9135] defines an asymmetric IRB model and a symmetric IRB model + based on the required lookups at the ingress and egress NVE. The + asymmetric model requires an IP lookup and a MAC lookup at the + ingress NVE, whereas only a MAC lookup is needed at the egress NVE; + the symmetric model requires IP and MAC lookups at both the ingress + and egress NVE. From that perspective, the IP-VRF-to-IP-VRF use case + described in this section is a symmetric IRB model. + + Note that in an IP-VRF-to-IP-VRF scenario, out of the many subnets + that a tenant may have, it may be the case that only a few are + attached to a given IP-VRF of the NVE/PE. In order to provide inter- + subnet connectivity among the set of NVE/PEs where the tenant is + connected, a new SBD is created on all of them if a recursive + resolution is needed. 
This SBD is instantiated as a regular BD (with + no ACs) in each NVE/PE and has an IRB interface that connects the SBD + to the IP-VRF. The IRB interface's IP or MAC address is used as the + Overlay Index for a recursive resolution. + + Depending on the existence and characteristics of the SBD and IRB + interfaces for the IP-VRFs, there are three different IP-VRF-to-IP- + VRF scenarios identified and described in this document: + + 1. Interface-less model: no SBD and no Overlay Indexes required. + + 2. Interface-ful with an SBD IRB model: requires SBD as well as GW + IP addresses as Overlay Indexes. + + 3. Interface-ful with an unnumbered SBD IRB model: requires SBD as + well as MAC addresses as Overlay Indexes. + + Inter-subnet IP multicast is outside the scope of this document. + +4.4.1. Interface-less IP-VRF-to-IP-VRF Model + + Figure 8 depicts the Interface-less IP-VRF-to-IP-VRF model. + + NVE1(M1) + +------------+ + IP1+----| (BD-1) | DGW1(M3) + | \ | +---------+ +--------+ + | (IP-VRF)|----| |-|(IP-VRF)|----+ + | / | | | +--------+ | + +---| (BD-2) | | | _+_ + | +------------+ | | ( ) + SN1| | VXLAN/ | ( WAN )--H1 + | NVE2(M2) | GENEVE/| (___) + | +------------+ | MPLS | + + +---| (BD-2) | | | DGW2(M4) | + | \ | | | +--------+ | + | (IP-VRF)|----| |-|(IP-VRF)|----+ + | / | +---------+ +--------+ + SN2+----| (BD-3) | + +------------+ + + Figure 8: Interface-less IP-VRF-to-IP-VRF Model + + In this case: + + a) The NVEs and DGWs must provide connectivity between hosts in SN1, + SN2, and IP1 and hosts sitting at the other end of the WAN -- for + example, H1. It is assumed that the DGWs import/export IP and/or + VPN-IP routes to/from the WAN. + + b) The IP-VRF instances in the NVE/DGWs are directly connected + through NVO tunnels, and no IRBs and/or BD instances are + instantiated to connect the IP-VRFs. + + c) The solution must provide Layer 3 connectivity among the IP-VRFs + for Ethernet NVO tunnels -- for instance, VXLAN or GENEVE. 
+ + d) The solution may provide Layer 3 connectivity among the IP-VRFs + for IP NVO tunnels -- for example, GENEVE (with IP payload). + + In order to meet the above requirements, the EVPN route type 5 will + be used to advertise the IP prefixes, along with the EVPN Router's + MAC Extended Community as defined in [RFC9135] if the advertising + NVE/DGW uses Ethernet NVO tunnels. Each NVE/DGW will advertise an + RT-5 for each of its prefixes with the following fields: + + * RD as per [RFC7432]. + + * Ethernet Tag ID = 0. + + * IP prefix length and IP address, as explained in the previous + sections. + + * GW IP address = 0. + + * ESI = 0. + + * MPLS label or VNI corresponding to the IP-VRF. + + Each RT-5 will be sent with a Route Target identifying the tenant + (IP-VRF) and may be sent with two BGP extended communities: + + * The first one is the BGP Encapsulation Extended Community, as per + [RFC9012], identifying the tunnel type. + + * The second one is the EVPN Router's MAC Extended Community, as per + [RFC9135], containing the MAC address associated with the NVE + advertising the route. This MAC address identifies the NVE/DGW + and MAY be reused for all the IP-VRFs in the NVE. The EVPN + Router's MAC Extended Community must be sent if the route is + associated with an Ethernet NVO tunnel -- for instance, VXLAN. If + the route is associated with an IP NVO tunnel -- for instance, + GENEVE with an IP payload -- the EVPN Router's MAC Extended + Community should not be sent. + + The following example illustrates the procedure to advertise and + forward packets to SN1/24 (IPv4 prefix advertised from NVE1): + + (1) NVE1 advertises the following BGP route: + + * Route type 5 (IP Prefix route) containing: + + - IPL = 24, IP = SN1, Label = 10. + + - GW IP = set to 0. + + - BGP Encapsulation Extended Community [RFC9012]. + + - EVPN Router's MAC Extended Community that contains M1. + + - Route Target identifying the tenant (IP-VRF). 
+ + (2) DGW1 imports the received routes from NVE1: + + * DGW1 installs SN1/24 in the IP-VRF identified by the RT-5 + Route Target. + + * Since GW IP = ESI = 0, the label is a non-zero value, and the + local policy indicates this interface-less model, DGW1 will + use the label and next hop of the RT-5, as well as the MAC + address conveyed in the EVPN Router's MAC Extended Community + (as the inner destination MAC address), to set up the + forwarding state and later encapsulate the routed IP packets. + + (3) When DGW1 receives a packet from the WAN with destination IPx, + where IPx belongs to SN1/24: + + * A destination IP lookup is performed on the DGW1 IP-VRF + table. The lookup yields SN1/24. + + * Since the RT-5 for SN1/24 had a GW IP = ESI = 0, a non-zero + label, and a next hop, and since the model is interface-less, + DGW1 will not need a recursive lookup to resolve the route. + + * The IP packet destined to IPx is encapsulated with: inner + source MAC = DGW1 MAC, inner destination MAC = M1, outer + source IP (tunnel source IP) = DGW1 IP, and outer destination + IP (tunnel destination IP) = NVE1 IP. The inner source and + inner destination MAC addresses are not needed if IP NVO + tunnels are used. + + (4) When the packet arrives at NVE1: + + * NVE1 will identify the IP-VRF for an IP lookup based on the + label (the inner destination MAC is not needed to identify + the IP-VRF). + + * An IP lookup is performed in the routing context, where SN1 + turns out to be a local subnet associated with BD-2. A + subsequent lookup in the ARP table and the BD FIB will + provide the forwarding information for the packet in BD-2. + + The model described above is called an "interface-less" model since + the IP-VRFs are connected directly through tunnels, and they don't + require those tunnels to be terminated in SBDs instead, as in + Sections 4.4.2 or 4.4.3. + +4.4.2.
Interface-ful IP-VRF-to-IP-VRF with SBD IRB + + Figure 9 depicts the Interface-ful IP-VRF-to-IP-VRF with SBD IRB + model. + + NVE1 + +------------+ DGW1 + IP10+---+(BD-1) | +---------------+ +------------+ + | \ | | | | | + |(IP-VRF)-(SBD)| |(SBD)-(IP-VRF)|-----+ + | / IRB(M1/IP1) IRB(M3/IP3) | | + +---+(BD-2) | | | +------------+ _+_ + | +------------+ | | ( ) + SN1| | VXLAN/ | ( WAN )--H1 + | NVE2 | GENEVE/ | (___) + | +------------+ | MPLS | DGW2 + + +---+(BD-2) | | | +------------+ | + | \ | | | | | | + |(IP-VRF)-(SBD)| |(SBD)-(IP-VRF)|-----+ + | / IRB(M2/IP2) IRB(M4/IP4) | + SN2+----+(BD-3) | +---------------+ +------------+ + +------------+ + + Figure 9: Interface-ful with SBD IRB Model + + In this model: + + a) As in Section 4.4.1, the NVEs and DGWs must provide connectivity + between hosts in SN1, SN2, and IP10 and in hosts sitting at the + other end of the WAN. + + b) However, the NVE/DGWs are now connected through Ethernet NVO + tunnels terminated in the SBD instance. The IP-VRFs use IRB + interfaces for their connectivity to the SBD. + + c) Each SBD IRB has an IP and a MAC address, where the IP address + must be reachable from other NVEs or DGWs. + + d) The SBD is attached to all the NVE/DGWs in the tenant domain BDs. + + e) The solution must provide Layer 3 connectivity for Ethernet NVO + tunnels -- for instance, VXLAN or GENEVE (with Ethernet payload). + + EVPN type 5 routes will be used to advertise the IP prefixes, whereas + EVPN RT-2 routes will advertise the MAC/IP addresses of each SBD IRB + interface. Each NVE/DGW will advertise an RT-5 for each of its + prefixes with the following fields: + + * RD as per [RFC7432]. + + * Ethernet Tag ID = 0. + + * IP prefix length and IP address, as explained in the previous + sections. + + * GW IP address = IRB-IP of the SBD (this is the Overlay Index that + will be used for the recursive route resolution). + + * ESI = 0. 
+ + * Label value should be zero since the RT-5 route requires a + recursive lookup resolution to an RT-2 route. It is ignored on + reception, and the MPLS label or VNI from the RT-2's MPLS Label1 + field is used when forwarding packets. + + Each RT-5 will be sent with a Route Target identifying the tenant + (IP-VRF). The EVPN Router's MAC Extended Community should not be + sent in this case. + + The following example illustrates the procedure to advertise and + forward packets to SN1/24 (IPv4 prefix advertised from NVE1): + + (1) NVE1 advertises the following BGP routes: + + * Route type 5 (IP Prefix route) containing: + + - IPL = 24, IP = SN1, Label = SHOULD be set to 0. + + - GW IP = IP1 (SBD IRB's IP). + + - Route Target identifying the tenant (IP-VRF). + + * Route type 2 (MAC/IP Advertisement route for the SBD IRB) + containing: + + - ML = 48, M = M1, IPL = 32, IP = IP1, Label = 10. + + - A BGP Encapsulation Extended Community [RFC9012]. + + - Route Target identifying the SBD. This Route Target may + be the same as the one used with the RT-5. + + (2) DGW1 imports the received routes from NVE1: + + * DGW1 installs SN1/24 in the IP-VRF identified by the RT-5 + Route Target. + + - Since GW IP is different from zero, the GW IP (IP1) will + be used as the Overlay Index for the recursive route + resolution to the RT-2 carrying IP1. + + (3) When DGW1 receives a packet from the WAN with destination IPx, + where IPx belongs to SN1/24: + + * A destination IP lookup is performed on the DGW1 IP-VRF + table. The lookup yields SN1/24, which is associated with + the Overlay Index IP1. The forwarding information is derived + from the RT-2 received for IP1. + + * The IP packet destined to IPx is encapsulated with: inner + source MAC = M3, inner destination MAC = M1, outer source IP + (source VTEP) = DGW1 IP, and outer destination IP + (destination VTEP) = NVE1 IP. 
+ + (4) When the packet arrives at NVE1: + + * NVE1 will identify the IP-VRF for an IP lookup based on the + label and the inner MAC DA. + + * An IP lookup is performed in the routing context, where SN1 + turns out to be a local subnet associated with BD-2. A + subsequent lookup in the ARP table and the BD FIB will + provide the forwarding information for the packet in BD-2. + + The model described above is called an "interface-ful with SBD IRB" + model because the tunnels connecting the DGWs and NVEs need to be + terminated into the SBD. The SBD is connected to the IP-VRFs via SBD + IRB interfaces, and that allows the recursive resolution of RT-5s to + GW IP addresses. + +4.4.3. Interface-ful IP-VRF-to-IP-VRF with Unnumbered SBD IRB + + Figure 10 depicts the Interface-ful IP-VRF-to-IP-VRF with unnumbered + SBD IRB model. Note that this model is similar to the one described + in Section 4.4.2, only without IP addresses on the SBD IRB + interfaces. + + NVE1 + +------------+ DGW1 + IP1+----+(BD-1) | +---------------+ +------------+ + | \ | | | | | + |(IP-VRF)-(SBD)| (SBD)-(IP-VRF) |-----+ + | / IRB(M1)| | IRB(M3) | | + +---+(BD-2) | | | +------------+ _+_ + | +------------+ | | ( ) + SN1| | VXLAN/ | ( WAN )--H1 + | NVE2 | GENEVE/ | (___) + | +------------+ | MPLS | DGW2 + + +---+(BD-2) | | | +------------+ | + | \ | | | | | | + |(IP-VRF)-(SBD)| (SBD)-(IP-VRF) |-----+ + | / IRB(M2)| | IRB(M4) | + SN2+----+(BD-3) | +---------------+ +------------+ + +------------+ + + Figure 10: Interface-ful with Unnumbered SBD IRB Model + + In this model: + + a) As in Sections 4.4.1 and 4.4.2, the NVEs and DGWs must provide + connectivity between hosts in SN1, SN2, and IP1 and in hosts + sitting at the other end of the WAN. + + b) As in Section 4.4.2, the NVE/DGWs are connected through Ethernet + NVO tunnels terminated in the SBD instance. The IP-VRFs use IRB + interfaces for their connectivity to the SBD. 
+ + c) However, each SBD IRB has a MAC address only and no IP address + (which is why the model refers to an "unnumbered" SBD IRB). In + this model, there is no need to have IP reachability to the SBD + IRB interfaces themselves, and there is a requirement to limit + the number of IP addresses used. + + d) As in Section 4.4.2, the SBD is composed of all the NVE/DGW BDs + of the tenant that need inter-subnet forwarding. + + e) As in Section 4.4.2, the solution must provide Layer 3 + connectivity for Ethernet NVO tunnels -- for instance, VXLAN or + GENEVE (with Ethernet payload). + + This model will also make use of the RT-5 recursive resolution. EVPN + type 5 routes will advertise the IP prefixes along with the EVPN + Router's MAC Extended Community used for the recursive lookup, + whereas EVPN RT-2 routes will advertise the MAC addresses of each SBD + IRB interface (this time without an IP). + + Each NVE/DGW will advertise an RT-5 for each of its prefixes with the + same fields as described in Section 4.4.2, except: + + * GW IP address = set to 0. + + Each RT-5 will be sent with a Route Target identifying the tenant + (IP-VRF) and the EVPN Router's MAC Extended Community containing the + MAC address associated with the SBD IRB interface. This MAC address + may be reused for all the IP-VRFs in the NVE. + + The example is similar to the one in Section 4.4.2: + + (1) NVE1 advertises the following BGP routes: + + * Route type 5 (IP Prefix route) containing the same values as + in the example in Section 4.4.2, except: + + - GW IP = SHOULD be set to 0. + + - EVPN Router's MAC Extended Community containing M1 (this + will be used for the recursive lookup to an RT-2). + + * Route type 2 (MAC route for the SBD IRB) with the same values + as in Section 4.4.2, except: + + - ML = 48, M = M1, IPL = 0, Label = 10. + + (2) DGW1 imports the received routes from NVE1: + + * DGW1 installs SN1/24 in the IP-VRF identified by the RT-5 + Route Target. 
+ + - The MAC contained in the EVPN Router's MAC Extended + Community sent along with the RT-5 (M1) will be used as + the Overlay Index for the recursive route resolution to + the RT-2 carrying M1. + + (3) When DGW1 receives a packet from the WAN with destination IPx, + where IPx belongs to SN1/24: + + * A destination IP lookup is performed on the DGW1 IP-VRF + table. The lookup yields SN1/24, which is associated with + the Overlay Index M1. The forwarding information is derived + from the RT-2 received for M1. + + * The IP packet destined to IPx is encapsulated with: inner + source MAC = M3, inner destination MAC = M1, outer source IP + (source VTEP) = DGW1 IP, and outer destination IP + (destination VTEP) = NVE1 IP. + + (4) When the packet arrives at NVE1: + + * NVE1 will identify the IP-VRF for an IP lookup based on the + label and the inner MAC DA. + + * An IP lookup is performed in the routing context, where SN1 + turns out to be a local subnet associated with BD-2. A + subsequent lookup in the ARP table and the BD FIB will + provide the forwarding information for the packet in BD-2. + + The model described above is called an "interface-ful with unnumbered + SBD IRB" model because, as in Section 4.4.2, the tunnels are + terminated in the SBD, but the SBD IRB interfaces have no IP + address. + +5. Security Considerations + + This document provides a set of procedures to achieve inter-subnet + forwarding across NVEs or PEs attached to a group of BDs that belong + to the same tenant (or VPN). The security considerations discussed + in [RFC7432] apply to the intra-subnet forwarding or communication + within each of those BDs. In addition, the security considerations + in [RFC4364] should also be understood, since this document and + [RFC4364] may be used in similar applications. + + Contrary to [RFC4364], this document does not describe PE/CE route + distribution techniques but rather considers the CEs as TSs or VAs + that do not run dynamic routing protocols.
This can be considered a + security advantage, since dynamic routing protocols can be blocked on + the NVE/PE ACs, not allowing the tenant to interact with the + infrastructure's dynamic routing protocols. + + In this document, the RT-5 may use a regular BGP next hop for its + resolution or an Overlay Index that requires a recursive resolution + to a different EVPN route (an RT-2 or an RT-1). In the latter case, + it is worth noting that any action that ends up filtering or + modifying the RT-2 or RT-1 routes used to convey the Overlay Indexes + will modify the resolution of the RT-5 and therefore the forwarding + of packets to the remote subnet. + +6. IANA Considerations + + IANA has registered value 5 in the "EVPN Route Types" registry + [EVPNRouteTypes] defined by [RFC7432] as follows: + + +=======+=============+===========+ + | Value | Description | Reference | + +=======+=============+===========+ + | 5 | IP Prefix | RFC 9136 | + +-------+-------------+-----------+ + + Table 3 + +7. References + +7.1. Normative References + + [EVPNRouteTypes] + IANA, "EVPN Route Types", + <https://www.iana.org/assignments/evpn>. + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + <https://www.rfc-editor.org/info/rfc2119>. + + [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., + Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based + Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February + 2015, <https://www.rfc-editor.org/info/rfc7432>. + + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC + 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, + May 2017, <https://www.rfc-editor.org/info/rfc8174>. + + [RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., + Uttaro, J., and W. 
+              Henderickx, "A Network Virtualization Overlay Solution
+              Using Ethernet VPN (EVPN)", RFC 8365, DOI 10.17487/RFC8365,
+              March 2018, <https://www.rfc-editor.org/info/rfc8365>.
+
+   [RFC9012]  Patel, K., Van de Velde, G., Sangli, S., and J. Scudder,
+              "The BGP Tunnel Encapsulation Attribute", RFC 9012,
+              DOI 10.17487/RFC9012, April 2021,
+              <https://www.rfc-editor.org/info/rfc9012>.
+
+   [RFC9135]  Sajassi, A., Salam, S., Thoria, S., Drake, J., and J.
+              Rabadan, "Integrated Routing and Bridging in Ethernet VPN
+              (EVPN)", RFC 9135, DOI 10.17487/RFC9135, October 2021,
+              <https://www.rfc-editor.org/info/rfc9135>.
+
+7.2. Informative References
+
+   [IEEE-802.1Q]
+              IEEE, "IEEE Standard for Local and Metropolitan Area
+              Networks -- Bridges and Bridged Networks",
+              DOI 10.1109/IEEESTD.2018.8403927, IEEE Std 802.1Q, July
+              2018,
+              <https://standards.ieee.org/standard/802_1Q-2018.html>.
+
+   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
+              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
+              2006, <https://www.rfc-editor.org/info/rfc4364>.
+
+   [RFC5227]  Cheshire, S., "IPv4 Address Conflict Detection", RFC 5227,
+              DOI 10.17487/RFC5227, July 2008,
+              <https://www.rfc-editor.org/info/rfc5227>.
+
+   [RFC5798]  Nadas, S., Ed., "Virtual Router Redundancy Protocol (VRRP)
+              Version 3 for IPv4 and IPv6", RFC 5798,
+              DOI 10.17487/RFC5798, March 2010,
+              <https://www.rfc-editor.org/info/rfc5798>.
+
+   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
+              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
+              eXtensible Local Area Network (VXLAN): A Framework for
+              Overlaying Virtualized Layer 2 Networks over Layer 3
+              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014,
+              <https://www.rfc-editor.org/info/rfc7348>.
+
+   [RFC7365]  Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y.
+              Rekhter, "Framework for Data Center (DC) Network
+              Virtualization", RFC 7365, DOI 10.17487/RFC7365, October
+              2014, <https://www.rfc-editor.org/info/rfc7365>.
+
+   [RFC7606]  Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
+              Patel, "Revised Error Handling for BGP UPDATE Messages",
+              RFC 7606, DOI 10.17487/RFC7606, August 2015,
+              <https://www.rfc-editor.org/info/rfc7606>.
+
+   [RFC8926]  Gross, J., Ed., Ganga, I., Ed., and T. Sridhar, Ed.,
+              "Geneve: Generic Network Virtualization Encapsulation",
+              RFC 8926, DOI 10.17487/RFC8926, November 2020,
+              <https://www.rfc-editor.org/info/rfc8926>.
+
+Acknowledgments
+
+   The authors would like to thank Mukul Katiyar, Jeffrey Zhang, and
+   Alex Nichol for their valuable feedback and contributions. Tony
+   Przygienda and Thomas Morin also helped improve this document with
+   their feedback. Special thanks to Eric Rosen for his detailed
+   review, which really helped improve the readability and clarify the
+   concepts. We also thank Alvaro Retana for his thorough review.
+
+Contributors
+
+   In addition to the authors listed on the front page, the following
+   coauthors have also contributed to this document:
+
+   Senthil Sathappan
+   Florin Balus
+   Aldrin Isaac
+   Senad Palislamovic
+   Samir Thoria
+
+Authors' Addresses
+
+   Jorge Rabadan (editor)
+   Nokia
+   777 E. Middlefield Road
+   Mountain View, CA 94043
+   United States of America
+
+   Email: jorge.rabadan@nokia.com
+
+
+   Wim Henderickx
+   Nokia
+
+   Email: wim.henderickx@nokia.com
+
+
+   John Drake
+   Juniper
+
+   Email: jdrake@juniper.net
+
+
+   Wen Lin
+   Juniper
+
+   Email: wlin@juniper.net
+
+
+   Ali Sajassi
+   Cisco
+
+   Email: sajassi@cisco.com