Diffstat (limited to 'doc/rfc/rfc8014.txt')
-rw-r--r-- | doc/rfc/rfc8014.txt | 1963
1 files changed, 1963 insertions, 0 deletions
diff --git a/doc/rfc/rfc8014.txt b/doc/rfc/rfc8014.txt new file mode 100644 index 0000000..aa89ee2 --- /dev/null +++ b/doc/rfc/rfc8014.txt @@ -0,0 +1,1963 @@ + + + + + + +Internet Engineering Task Force (IETF) D. Black +Request for Comments: 8014 Dell EMC +Category: Informational J. Hudson +ISSN: 2070-1721 L. Kreeger + M. Lasserre + Independent + T. Narten + IBM + December 2016 + + + An Architecture for + Data-Center Network Virtualization over Layer 3 (NVO3) + +Abstract + + This document presents a high-level overview architecture for + building data-center Network Virtualization over Layer 3 (NVO3) + networks. The architecture is given at a high level, showing the + major components of an overall system. An important goal is to + divide the space into individual smaller components that can be + implemented independently with clear inter-component interfaces and + interactions. It should be possible to build and implement + individual components in isolation and have them interoperate with + other independently implemented components. That way, implementers + have flexibility in implementing individual components and can + optimize and innovate within their respective components without + requiring changes to other components. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for informational purposes. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Not all documents + approved by the IESG are a candidate for any level of Internet + Standard; see Section 2 of RFC 7841. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc8014. + + + + + + + +Black, et al. Informational [Page 1] + +RFC 8014 NVO3 Architecture December 2016 + + +Copyright Notice + + Copyright (c) 2016 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Black, et al. Informational [Page 2] + +RFC 8014 NVO3 Architecture December 2016 + + +Table of Contents + + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 + 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 + 3. Background . . . . . . . . . . . . . . . . . . . . . . . . . 5 + 3.1. VN Service (L2 and L3) . . . . . . . . . . . . . . . . . 7 + 3.1.1. VLAN Tags in L2 Service . . . . . . . . . . . . . . . 8 + 3.1.2. Packet Lifetime Considerations . . . . . . . . . . . 8 + 3.2. Network Virtualization Edge (NVE) Background . . . . . . 9 + 3.3. Network Virtualization Authority (NVA) Background . . . . 10 + 3.4. VM Orchestration Systems . . . . . . . . . . . . . . . . 11 + 4. Network Virtualization Edge (NVE) . . . . . . . . . . . . 
. . 12 + 4.1. NVE Co-located with Server Hypervisor . . . . . . . . . . 12 + 4.2. Split-NVE . . . . . . . . . . . . . . . . . . . . . . . . 13 + 4.2.1. Tenant VLAN Handling in Split-NVE Case . . . . . . . 14 + 4.3. NVE State . . . . . . . . . . . . . . . . . . . . . . . . 14 + 4.4. Multihoming of NVEs . . . . . . . . . . . . . . . . . . . 15 + 4.5. Virtual Access Point (VAP) . . . . . . . . . . . . . . . 16 + 5. Tenant System Types . . . . . . . . . . . . . . . . . . . . . 16 + 5.1. Overlay-Aware Network Service Appliances . . . . . . . . 16 + 5.2. Bare Metal Servers . . . . . . . . . . . . . . . . . . . 17 + 5.3. Gateways . . . . . . . . . . . . . . . . . . . . . . . . 17 + 5.3.1. Gateway Taxonomy . . . . . . . . . . . . . . . . . . 18 + 5.3.1.1. L2 Gateways (Bridging) . . . . . . . . . . . . . 18 + 5.3.1.2. L3 Gateways (Only IP Packets) . . . . . . . . . . 18 + 5.4. Distributed Inter-VN Gateways . . . . . . . . . . . . . . 19 + 5.5. ARP and Neighbor Discovery . . . . . . . . . . . . . . . 20 + 6. NVE-NVE Interaction . . . . . . . . . . . . . . . . . . . . . 20 + 7. Network Virtualization Authority (NVA) . . . . . . . . . . . 21 + 7.1. How an NVA Obtains Information . . . . . . . . . . . . . 21 + 7.2. Internal NVA Architecture . . . . . . . . . . . . . . . . 22 + 7.3. NVA External Interface . . . . . . . . . . . . . . . . . 22 + 8. NVE-NVA Protocol . . . . . . . . . . . . . . . . . . . . . . 24 + 8.1. NVE-NVA Interaction Models . . . . . . . . . . . . . . . 24 + 8.2. Direct NVE-NVA Protocol . . . . . . . . . . . . . . . . . 25 + 8.3. Propagating Information Between NVEs and NVAs . . . . . . 25 + 9. Federated NVAs . . . . . . . . . . . . . . . . . . . . . . . 26 + 9.1. Inter-NVA Peering . . . . . . . . . . . . . . . . . . . . 29 + 10. Control Protocol Work Areas . . . . . . . . . . . . . . . . . 29 + 11. NVO3 Data-Plane Encapsulation . . . . . . . . . . . . . . . . 29 + 12. Operations, Administration, and Maintenance (OAM) . . . . . . 30 + 13. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 + 14. Security Considerations . . . . . . . . . . . . . . . . . . . 31 + 15. Informative References . . . . . . . . . . . . . . . . . . . 32 + Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 34 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 35 + + + + + +Black, et al. Informational [Page 3] + +RFC 8014 NVO3 Architecture December 2016 + + +1. Introduction + + This document presents a high-level architecture for building data- + center Network Virtualization over Layer 3 (NVO3) networks. The + architecture is given at a high level, which shows the major + components of an overall system. An important goal is to divide the + space into smaller individual components that can be implemented + independently with clear inter-component interfaces and interactions. + It should be possible to build and implement individual components in + isolation and have them interoperate with other independently + implemented components. That way, implementers have flexibility in + implementing individual components and can optimize and innovate + within their respective components without requiring changes to other + components. + + The motivation for overlay networks is given in "Problem Statement: + Overlays for Network Virtualization" [RFC7364]. "Framework for Data + Center (DC) Network Virtualization" [RFC7365] provides a framework + for discussing overlay networks generally and the various components + that must work together in building such systems. 
This document + differs from the framework document in that it doesn't attempt to + cover all possible approaches within the general design space. + Rather, it describes one particular approach that the NVO3 WG has + focused on. + +2. Terminology + + This document uses the same terminology as [RFC7365]. In addition, + the following terms are used: + + NV Domain: A Network Virtualization Domain is an administrative + construct that defines a Network Virtualization Authority (NVA), + the set of Network Virtualization Edges (NVEs) associated with + that NVA, and the set of virtual networks the NVA manages and + supports. NVEs are associated with a (logically centralized) NVA, + and an NVE supports communication for any of the virtual networks + in the domain. + + NV Region: A region over which information about a set of virtual + networks is shared. The degenerate case of a single NV Domain + corresponds to an NV Region corresponding to that domain. The + more interesting case occurs when two or more NV Domains share + information about part or all of a set of virtual networks that + they manage. Two NVAs share information about particular virtual + networks for the purpose of supporting connectivity between + tenants located in different NV Domains. NVAs can share + information about an entire NV Domain, or just individual virtual + networks. + + + +Black, et al. Informational [Page 4] + +RFC 8014 NVO3 Architecture December 2016 + + + Tenant System Interface (TSI): The interface to a Virtual Network + (VN) as presented to a Tenant System (TS, see [RFC7365]). The TSI + logically connects to the NVE via a Virtual Access Point (VAP). + To the Tenant System, the TSI is like a Network Interface Card + (NIC); the TSI presents itself to a Tenant System as a normal + network interface. + + VLAN: Unless stated otherwise, the terms "VLAN" and "VLAN Tag" are + used in this document to denote a Customer VLAN (C-VLAN) + [IEEE.802.1Q]; the terms are used interchangeably to improve + readability. + +3. Background + + Overlay networks are an approach for providing network virtualization + services to a set of Tenant Systems (TSs) [RFC7365]. With overlays, + data traffic between tenants is tunneled across the underlying data + center's IP network. The use of tunnels provides a number of + benefits by decoupling the network as viewed by tenants from the + underlying physical network across which they communicate. + Additional discussion of some NVO3 use cases can be found in + [USECASES]. + + Tenant Systems connect to Virtual Networks (VNs), with each VN having + associated attributes defining properties of the network (such as the + set of members that connect to it). Tenant Systems connected to a + virtual network typically communicate freely with other Tenant + Systems on the same VN, but communication between Tenant Systems on + one VN and those external to the VN (whether on another VN or + connected to the Internet) is carefully controlled and governed by + policy. The NVO3 architecture does not impose any restrictions to + the application of policy controls even within a VN. + + A Network Virtualization Edge (NVE) [RFC7365] is the entity that + implements the overlay functionality. An NVE resides at the boundary + between a Tenant System and the overlay network as shown in Figure 1. + An NVE creates and maintains local state about each VN for which it + is providing service on behalf of a Tenant System. + + + + + + + + + + + + + +Black, et al. 
Informational [Page 5] + +RFC 8014 NVO3 Architecture December 2016 + + + +--------+ +--------+ + | Tenant +--+ +----| Tenant | + | System | | (') | System | + +--------+ | ................ ( ) +--------+ + | +-+--+ . . +--+-+ (_) + | | NVE|--. .--| NVE| | + +--| | . . | |---+ + +-+--+ . . +--+-+ + / . . + / . L3 Overlay . +--+-++--------+ + +--------+ / . Network . | NVE|| Tenant | + | Tenant +--+ . .- -| || System | + | System | . . +--+-++--------+ + +--------+ ................ + | + +----+ + | NVE| + | | + +----+ + | + | + ===================== + | | + +--------+ +--------+ + | Tenant | | Tenant | + | System | | System | + +--------+ +--------+ + + + Figure 1: NVO3 Generic Reference Model + + The following subsections describe key aspects of an overlay system + in more detail. Section 3.1 describes the service model (Ethernet + vs. IP) provided to Tenant Systems. Section 3.2 describes NVEs in + more detail. Section 3.3 introduces the Network Virtualization + Authority, from which NVEs obtain information about virtual networks. + Section 3.4 provides background on Virtual Machine (VM) orchestration + systems and their use of virtual networks. + + + + + + + + + + + + + +Black, et al. Informational [Page 6] + +RFC 8014 NVO3 Architecture December 2016 + + +3.1. VN Service (L2 and L3) + + A VN provides either Layer 2 (L2) or Layer 3 (L3) service to + connected tenants. For L2 service, VNs transport Ethernet frames, + and a Tenant System is provided with a service that is analogous to + being connected to a specific L2 C-VLAN. L2 broadcast frames are + generally delivered to all (and multicast frames delivered to a + subset of) the other Tenant Systems on the VN. To a Tenant System, + it appears as if they are connected to a regular L2 Ethernet link. + Within the NVO3 architecture, tenant frames are tunneled to remote + NVEs based on the Media Access Control (MAC) addresses of the frame + headers as originated by the Tenant System. On the underlay, NVO3 + packets are forwarded between NVEs based on the outer addresses of + tunneled packets. + + For L3 service, VNs are routed networks that transport IP datagrams, + and a Tenant System is provided with a service that supports only IP + traffic. Within the NVO3 architecture, tenant frames are tunneled to + remote NVEs based on the IP addresses of the packet originated by the + Tenant System; any L2 destination addresses provided by Tenant + Systems are effectively ignored by the NVEs and overlay network. For + L3 service, the Tenant System will be configured with an IP subnet + that is effectively a point-to-point link, i.e., having only the + Tenant System and a next-hop router address on it. + + L2 service is intended for systems that need native L2 Ethernet + service and the ability to run protocols directly over Ethernet + (i.e., not based on IP). L3 service is intended for systems in which + all the traffic can safely be assumed to be IP. It is important to + note that whether or not an NVO3 network provides L2 or L3 service to + a Tenant System, the Tenant System does not generally need to be + aware of the distinction. In both cases, the virtual network + presents itself to the Tenant System as an L2 Ethernet interface. An + Ethernet interface is used in both cases simply as a widely supported + interface type that essentially all Tenant Systems already support. + Consequently, no special software is needed on Tenant Systems to use + an L3 vs. an L2 overlay service. + + NVO3 can also provide a combined L2 and L3 service to tenants. 
A + combined service provides L2 service for intra-VN communication but + also provides L3 service for L3 traffic entering or leaving the VN. + Architecturally, the handling of a combined L2/L3 service within the + NVO3 architecture is intended to match what is commonly done today in + non-overlay environments by devices providing a combined bridge/ + router service. With combined service, the virtual network itself + retains the semantics of L2 service, and all traffic is processed + according to its L2 semantics. In addition, however, traffic + requiring IP processing is also processed at the IP level. + + + +Black, et al. Informational [Page 7] + +RFC 8014 NVO3 Architecture December 2016 + + + The IP processing for a combined service can be implemented on a + standalone device attached to the virtual network (e.g., an IP + router) or implemented locally on the NVE (see Section 5.4 on + Distributed Inter-VN Gateways). For unicast traffic, NVE + implementation of a combined service may result in a packet being + delivered to another Tenant System attached to the same NVE (on + either the same or a different VN), tunneled to a remote NVE, or even + forwarded outside the NV Domain. For multicast or broadcast packets, + the combination of NVE L2 and L3 processing may result in copies of + the packet receiving both L2 and L3 treatments to realize delivery to + all of the destinations involved. This distributed NVE + implementation of IP routing results in the same network delivery + behavior as if the L2 processing of the packet included delivery of + the packet to an IP router attached to the L2 VN as a Tenant System, + with the router having additional network attachments to other + networks, either virtual or not. + +3.1.1. VLAN Tags in L2 Service + + An NVO3 L2 virtual network service may include encapsulated L2 VLAN + tags provided by a Tenant System but does not use encapsulated tags + in deciding where and how to forward traffic. Such VLAN tags can be + passed through so that Tenant Systems that send or expect to receive + them can be supported as appropriate. + + The processing of VLAN tags that an NVE receives from a TS is + controlled by settings associated with the VAP. Just as in the case + with ports on Ethernet switches, a number of settings are possible. + For example, Customer VLAN Tags (C-TAGs) can be passed through + transparently, could always be stripped upon receipt from a Tenant + System, could be compared against a list of explicitly configured + tags, etc. + + Note that there are additional considerations when VLAN tags are used + to identify both the VN and a Tenant System VLAN within that VN, as + described in Section 4.2.1. + +3.1.2. Packet Lifetime Considerations + + For L3 service, Tenant Systems should expect the IPv4 Time to Live + (TTL) or IPv6 Hop Limit in the packets they send to be decremented by + at least 1. For L2 service, neither the TTL nor the Hop Limit (when + the packet is IP) is modified. The underlay network manages TTLs and + Hop Limits in the outer IP encapsulation -- the values in these + fields could be independent from or related to the values in the same + fields of tenant IP packets. + + + + + +Black, et al. Informational [Page 8] + +RFC 8014 NVO3 Architecture December 2016 + + +3.2. Network Virtualization Edge (NVE) Background + + Tenant Systems connect to NVEs via a Tenant System Interface (TSI). + The TSI logically connects to the NVE via a Virtual Access Point + (VAP), and each VAP is associated with one VN as shown in Figure 2. 
+ To the Tenant System, the TSI is like a NIC; the TSI presents itself + to a Tenant System as a normal network interface. On the NVE side, a + VAP is a logical network port (virtual or physical) into a specific + virtual network. Note that two different Tenant Systems (and TSIs) + attached to a common NVE can share a VAP (e.g., TS1 and TS2 in + Figure 2) so long as they connect to the same VN. + + | Data-Center Network (IP) | + | | + +-----------------------------------------+ + | | + | Tunnel Overlay | + +------------+---------+ +---------+------------+ + | +----------+-------+ | | +-------+----------+ | + | | Overlay Module | | | | Overlay Module | | + | +---------+--------+ | | +---------+--------+ | + | | | | | | + NVE1 | | | | | | NVE2 + | +--------+-------+ | | +--------+-------+ | + | | VNI1 VNI2 | | | | VNI1 VNI2 | | + | +-+----------+---+ | | +-+-----------+--+ | + | | VAP1 | VAP2 | | | VAP1 | VAP2| + +----+----------+------+ +----+-----------+-----+ + | | | | + |\ | | | + | \ | | /| + -------+--\-------+-------------------+---------/-+------- + | \ | Tenant | / | + TSI1 |TSI2\ | TSI3 TSI1 TSI2/ TSI3 + +---+ +---+ +---+ +---+ +---+ +---+ + |TS1| |TS2| |TS3| |TS4| |TS5| |TS6| + +---+ +---+ +---+ +---+ +---+ +---+ + + Figure 2: NVE Reference Model + + The Overlay Module performs the actual encapsulation and + decapsulation of tunneled packets. The NVE maintains state about the + virtual networks it is a part of so that it can provide the Overlay + Module with information such as the destination address of the NVE to + tunnel a packet to and the Context ID that should be placed in the + encapsulation header to identify the virtual network that a tunneled + packet belongs to. + + + + +Black, et al. Informational [Page 9] + +RFC 8014 NVO3 Architecture December 2016 + + + On the side facing the data-center network, the NVE sends and + receives native IP traffic. When ingressing traffic from a Tenant + System, the NVE identifies the egress NVE to which the packet should + be sent, adds an overlay encapsulation header, and sends the packet + on the underlay network. When receiving traffic from a remote NVE, + an NVE strips off the encapsulation header and delivers the + (original) packet to the appropriate Tenant System. When the source + and destination Tenant System are on the same NVE, no encapsulation + is needed and the NVE forwards traffic directly. + + Conceptually, the NVE is a single entity implementing the NVO3 + functionality. In practice, there are a number of different + implementation scenarios, as described in detail in Section 4. + +3.3. Network Virtualization Authority (NVA) Background + + Address dissemination refers to the process of learning, building, + and distributing the mapping/forwarding information that NVEs need in + order to tunnel traffic to each other on behalf of communicating + Tenant Systems. For example, in order to send traffic to a remote + Tenant System, the sending NVE must know the destination NVE for that + Tenant System. + + One way to build and maintain mapping tables is to use learning, as + 802.1 bridges do [IEEE.802.1Q]. When forwarding traffic to multicast + or unknown unicast destinations, an NVE could simply flood traffic. + While flooding works, it can lead to traffic hot spots and to + problems in larger networks (e.g., excessive amounts of flooded + traffic). + + Alternatively, to reduce the scope of where flooding must take place, + or to eliminate it all together, NVEs can make use of a Network + Virtualization Authority (NVA). 
An NVA is the entity that provides + address mapping and other information to NVEs. NVEs interact with an + NVA to obtain any required address-mapping information they need in + order to properly forward traffic on behalf of tenants. The term + "NVA" refers to the overall system, without regard to its scope or + how it is implemented. NVAs provide a service, and NVEs access that + service via an NVE-NVA protocol as discussed in Section 8. + + Even when an NVA is present, Ethernet bridge MAC address learning + could be used as a fallback mechanism, should the NVA be unable to + provide an answer or for other reasons. This document does not + consider flooding approaches in detail, as there are a number of + benefits in using an approach that depends on the presence of an NVA. + + For the rest of this document, it is assumed that an NVA exists and + will be used. NVAs are discussed in more detail in Section 7. + + + +Black, et al. Informational [Page 10] + +RFC 8014 NVO3 Architecture December 2016 + + +3.4. VM Orchestration Systems + + VM orchestration systems manage server virtualization across a set of + servers. Although VM management is a separate topic from network + virtualization, the two areas are closely related. Managing the + creation, placement, and movement of VMs also involves creating, + attaching to, and detaching from virtual networks. A number of + existing VM orchestration systems have incorporated aspects of + virtual network management into their systems. + + Note also that although this section uses the terms "VM" and + "hypervisor" throughout, the same issues apply to other + virtualization approaches, including Linux Containers (LXC), BSD + Jails, Network Service Appliances as discussed in Section 5.1, etc. + From an NVO3 perspective, it should be assumed that where the + document uses the term "VM" and "hypervisor", the intention is that + the discussion also applies to other systems, where, e.g., the host + operating system plays the role of the hypervisor in supporting + virtualization, and a container plays the equivalent role as a VM. + + When a new VM image is started, the VM orchestration system + determines where the VM should be placed, interacts with the + hypervisor on the target server to load and start the VM, and + controls when a VM should be shut down or migrated elsewhere. VM + orchestration systems also have knowledge about how a VM should + connect to a network, possibly including the name of the virtual + network to which a VM is to connect. The VM orchestration system can + pass such information to the hypervisor when a VM is instantiated. + VM orchestration systems have significant (and sometimes global) + knowledge over the domain they manage. They typically know on what + servers a VM is running, and metadata associated with VM images can + be useful from a network virtualization perspective. For example, + the metadata may include the addresses (MAC and IP) the VMs will use + and the name(s) of the virtual network(s) they connect to. + + VM orchestration systems run a protocol with an agent running on the + hypervisor of the servers they manage. That protocol can also carry + information about what virtual network a VM is associated with. When + the orchestrator instantiates a VM on a hypervisor, the hypervisor + interacts with the NVE in order to attach the VM to the virtual + networks it has access to. In general, the hypervisor will need to + communicate significant VM state changes to the NVE. 
In the reverse + direction, the NVE may need to communicate network connectivity + information back to the hypervisor. Examples of deployed VM + orchestration systems include VMware's vCenter Server, Microsoft's + System Center Virtual Machine Manager, and systems based on OpenStack + and its associated plugins (e.g., Nova and Neutron). Each can pass + information about what virtual networks a VM connects to down to the + + + +Black, et al. Informational [Page 11] + +RFC 8014 NVO3 Architecture December 2016 + + + hypervisor. The protocol used between the VM orchestration system + and hypervisors is generally proprietary. + + It should be noted that VM orchestration systems may not have direct + access to all networking-related information a VM uses. For example, + a VM may make use of additional IP or MAC addresses that the VM + management system is not aware of. + +4. Network Virtualization Edge (NVE) + + As introduced in Section 3.2, an NVE is the entity that implements + the overlay functionality. This section describes NVEs in more + detail. An NVE will have two external interfaces: + + Facing the Tenant System: On the side facing the Tenant System, an + NVE interacts with the hypervisor (or equivalent entity) to + provide the NVO3 service. An NVE will need to be notified when a + Tenant System "attaches" to a virtual network (so it can validate + the request and set up any state needed to send and receive + traffic on behalf of the Tenant System on that VN). Likewise, an + NVE will need to be informed when the Tenant System "detaches" + from the virtual network so that it can reclaim state and + resources appropriately. + + Facing the Data-Center Network: On the side facing the data-center + network, an NVE interfaces with the data-center underlay network, + sending and receiving tunneled packets to and from the underlay. + The NVE may also run a control protocol with other entities on the + network, such as the Network Virtualization Authority. + +4.1. NVE Co-located with Server Hypervisor + + When server virtualization is used, the entire NVE functionality will + typically be implemented as part of the hypervisor and/or virtual + switch on the server. In such cases, the Tenant System interacts + with the hypervisor, and the hypervisor interacts with the NVE. + Because the interaction between the hypervisor and NVE is implemented + entirely in software on the server, there is no "on-the-wire" + protocol between Tenant Systems (or the hypervisor) and the NVE that + needs to be standardized. While there may be APIs between the NVE + and hypervisor to support necessary interaction, the details of such + APIs are not in scope for the NVO3 WG at the time of publication of + this memo. + + Implementing NVE functionality entirely on a server has the + disadvantage that server CPU resources must be spent implementing the + NVO3 functionality. Experimentation with overlay approaches and + previous experience with TCP and checksum adapter offloads suggest + + + +Black, et al. Informational [Page 12] + +RFC 8014 NVO3 Architecture December 2016 + + + that offloading certain NVE operations (e.g., encapsulation and + decapsulation operations) onto the physical network adapter can + produce performance advantages. As has been done with checksum and/ + or TCP server offload and other optimization approaches, there may be + benefits to offloading common operations onto adapters where + possible. 
Just as important, the addition of an overlay header can + disable existing adapter offload capabilities that are generally not + prepared to handle the addition of a new header or other operations + associated with an NVE. + + While the exact details of how to split the implementation of + specific NVE functionality between a server and its network adapters + are an implementation matter and outside the scope of IETF + standardization, the NVO3 architecture should be cognizant of and + support such separation. Ideally, it may even be possible to bypass + the hypervisor completely on critical data-path operations so that + packets between a Tenant System and its VN can be sent and received + without having the hypervisor involved in each individual packet + operation. + +4.2. Split-NVE + + Another possible scenario leads to the need for a split-NVE + implementation. An NVE running on a server (e.g., within a + hypervisor) could support NVO3 service towards the tenant but not + perform all NVE functions (e.g., encapsulation) directly on the + server; some of the actual NVO3 functionality could be implemented on + (i.e., offloaded to) an adjacent switch to which the server is + attached. While one could imagine a number of link types between a + server and the NVE, one simple deployment scenario would involve a + server and NVE separated by a simple L2 Ethernet link. A more + complicated scenario would have the server and NVE separated by a + bridged access network, such as when the NVE resides on a Top of Rack + (ToR) switch, with an embedded switch residing between servers and + the ToR switch. + + For the split-NVE case, protocols will be needed that allow the + hypervisor and NVE to negotiate and set up the necessary state so + that traffic sent across the access link between a server and the NVE + can be associated with the correct virtual network instance. + Specifically, on the access link, traffic belonging to a specific + Tenant System would be tagged with a specific VLAN C-TAG that + identifies which specific NVO3 virtual network instance it connects + to. The hypervisor-NVE protocol would negotiate which VLAN C-TAG to + use for a particular virtual network instance. More details of the + protocol requirements for functionality between hypervisors and NVEs + can be found in [NVE-NVA]. + + + + +Black, et al. Informational [Page 13] + +RFC 8014 NVO3 Architecture December 2016 + + +4.2.1. Tenant VLAN Handling in Split-NVE Case + + Preserving tenant VLAN tags across an NVO3 VN, as described in + Section 3.1.1, poses additional complications in the split-NVE case. + The portion of the NVE that performs the encapsulation function needs + access to the specific VLAN tags that the Tenant System is using in + order to include them in the encapsulated packet. When an NVE is + implemented entirely within the hypervisor, the NVE has access to the + complete original packet (including any VLAN tags) sent by the + tenant. In the split-NVE case, however, the VLAN tag used between + the hypervisor and offloaded portions of the NVE normally only + identifies the specific VN that traffic belongs to. 
In order to + allow a tenant to preserve VLAN information from end to end between + Tenant Systems in the split-NVE case, additional mechanisms would be + needed (e.g., carry an additional VLAN tag by carrying both a C-TAG + and a Service VLAN Tag (S-TAG) as specified in [IEEE.802.1Q] where + the C-TAG identifies the tenant VLAN end to end and the S-TAG + identifies the VN locally between each Tenant System and the + corresponding NVE). + +4.3. NVE State + + NVEs maintain internal data structures and state to support the + sending and receiving of tenant traffic. An NVE may need some or all + of the following information: + + 1. An NVE keeps track of which attached Tenant Systems are connected + to which virtual networks. When a Tenant System attaches to a + virtual network, the NVE will need to create or update the local + state for that virtual network. When the last Tenant System + detaches from a given VN, the NVE can reclaim state associated + with that VN. + + 2. For tenant unicast traffic, an NVE maintains a per-VN table of + mappings from Tenant System (inner) addresses to remote NVE + (outer) addresses. + + 3. For tenant multicast (or broadcast) traffic, an NVE maintains a + per-VN table of mappings and other information on how to deliver + tenant multicast (or broadcast) traffic. If the underlying + network supports IP multicast, the NVE could use IP multicast to + deliver tenant traffic. In such a case, the NVE would need to + know what IP underlay multicast address to use for a given VN. + Alternatively, if the underlying network does not support + multicast, a source NVE could use unicast replication to deliver + traffic. In such a case, an NVE would need to know which remote + NVEs are participating in the VN. An NVE could use both + approaches, switching from one mode to the other depending on + + + +Black, et al. Informational [Page 14] + +RFC 8014 NVO3 Architecture December 2016 + + + factors such as bandwidth efficiency and group membership + sparseness. [FRAMEWORK-MCAST] discusses the subject of multicast + handling in NVO3 in further detail. + + 4. An NVE maintains necessary information to encapsulate outgoing + traffic, including what type of encapsulation and what value to + use for a Context ID to identify the VN within the encapsulation + header. + + 5. In order to deliver incoming encapsulated packets to the correct + Tenant Systems, an NVE maintains the necessary information to map + incoming traffic to the appropriate VAP (i.e., TSI). + + 6. An NVE may find it convenient to maintain additional per-VN + information such as QoS settings, Path MTU information, Access + Control Lists (ACLs), etc. + +4.4. Multihoming of NVEs + + NVEs may be multihomed. That is, an NVE may have more than one IP + address associated with it on the underlay network. Multihoming + happens in two different scenarios. First, an NVE may have multiple + interfaces connecting it to the underlay. Each of those interfaces + will typically have a different IP address, resulting in a specific + Tenant Address (on a specific VN) being reachable through the same + NVE but through more than one underlay IP address. Second, a + specific Tenant System may be reachable through more than one NVE, + each having one or more underlay addresses. In both cases, NVE + address-mapping functionality needs to support one-to-many mappings + and enable a sending NVE to (at a minimum) be able to fail over from + one IP address to another, e.g., should a specific NVE underlay + address become unreachable. 
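
   As a non-normative illustration of the one-to-many mappings and
   failover behavior described above, the following Python sketch shows
   one possible shape for a per-VN mapping table in which a tenant
   (inner) address resolves to several candidate NVE underlay (outer)
   addresses.  The class and method names are purely illustrative
   assumptions and are not defined by this architecture.

      # Hypothetical per-VN inner-to-outer mapping table supporting
      # multihomed NVEs: a tenant address maps to several underlay
      # addresses, and lookups fail over past addresses marked
      # unreachable.
      from dataclasses import dataclass, field
      from typing import Dict, List, Set

      @dataclass
      class Mapping:
          outer_addrs: List[str]          # candidate NVE underlay addresses
          unreachable: Set[str] = field(default_factory=set)

          def select_outer(self) -> str:
              for addr in self.outer_addrs:   # first reachable address wins
                  if addr not in self.unreachable:
                      return addr
              raise LookupError("no reachable NVE underlay address")

      @dataclass
      class VnMappingTable:
          # inner (tenant) address -> one-to-many mapping to NVE addresses
          entries: Dict[str, Mapping] = field(default_factory=dict)

          def lookup(self, inner_addr: str) -> str:
              return self.entries[inner_addr].select_outer()

          def mark_unreachable(self, inner_addr: str, outer_addr: str) -> None:
              self.entries[inner_addr].unreachable.add(outer_addr)

      # Example: fail over from one underlay address to another.
      table = VnMappingTable({"10.0.0.5":
                              Mapping(["192.0.2.1", "192.0.2.2"])})
      table.mark_unreachable("10.0.0.5", "192.0.2.1")
      assert table.lookup("10.0.0.5") == "192.0.2.2"

   Such a structure also accommodates the second multihoming scenario,
   in which a Tenant System is reachable through more than one NVE,
   since each candidate outer address may belong to a different NVE.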
+ + Finally, multihomed NVEs introduce complexities when source unicast + replication is used to implement tenant multicast as described in + Section 4.3. Specifically, an NVE should only receive one copy of a + replicated packet. + + Multihoming is needed to support important use cases. First, a bare + metal server may have multiple uplink connections to either the same + or different NVEs. Having only a single physical path to an upstream + NVE, or indeed, having all traffic flow through a single NVE would be + considered unacceptable in highly resilient deployment scenarios that + seek to avoid single points of failure. Moreover, in today's + networks, the availability of multiple paths would require that they + be usable in an active-active fashion (e.g., for load balancing). + + + + + +Black, et al. Informational [Page 15] + +RFC 8014 NVO3 Architecture December 2016 + + +4.5. Virtual Access Point (VAP) + + The VAP is the NVE side of the interface between the NVE and the TS. + Traffic to and from the tenant flows through the VAP. If an NVE runs + into difficulties sending traffic received on the VAP, it may need to + signal such errors back to the VAP. Because the VAP is an emulation + of a physical port, its ability to signal NVE errors is limited and + lacks sufficient granularity to reflect all possible errors an NVE + may encounter (e.g., inability to reach a particular destination). + Some errors, such as an NVE losing all of its connections to the + underlay, could be reflected back to the VAP by effectively disabling + it. This state change would reflect itself on the TS as an interface + going down, allowing the TS to implement interface error handling + (e.g., failover) in the same manner as when a physical interface + becomes disabled. + +5. Tenant System Types + + This section describes a number of special Tenant System types and + how they fit into an NVO3 system. + +5.1. Overlay-Aware Network Service Appliances + + Some Network Service Appliances [NVE-NVA] (virtual or physical) + provide tenant-aware services. That is, the specific service they + provide depends on the identity of the tenant making use of the + service. For example, firewalls are now becoming available that + support multitenancy where a single firewall provides virtual + firewall service on a per-tenant basis, using per-tenant + configuration rules and maintaining per-tenant state. Such + appliances will be aware of the VN an activity corresponds to while + processing requests. Unlike server virtualization, which shields VMs + from needing to know about multitenancy, a Network Service Appliance + may explicitly support multitenancy. In such cases, the Network + Service Appliance itself will be aware of network virtualization and + either embed an NVE directly or implement a split-NVE as described in + Section 4.2. Unlike server virtualization, however, the Network + Service Appliance may not be running a hypervisor, and the VM + orchestration system may not interact with the Network Service + Appliance. The NVE on such appliances will need to support a control + plane to obtain the necessary information needed to fully participate + in an NV Domain. + + + + + + + + + +Black, et al. Informational [Page 16] + +RFC 8014 NVO3 Architecture December 2016 + + +5.2. Bare Metal Servers + + Many data centers will continue to have at least some servers + operating as non-virtualized (or "bare metal") machines running a + traditional operating system and workload. 
In such systems, there + will be no NVE functionality on the server, and the server will have + no knowledge of NVO3 (including whether overlays are even in use). + In such environments, the NVE functionality can reside on the first- + hop physical switch. In such a case, the network administrator would + (manually) configure the switch to enable the appropriate NVO3 + functionality on the switch port connecting the server and associate + that port with a specific virtual network. Such configuration would + typically be static, since the server is not virtualized and, once + configured, is unlikely to change frequently. Consequently, this + scenario does not require any protocol or standards work. + +5.3. Gateways + + Gateways on VNs relay traffic onto and off of a virtual network. + Tenant Systems use gateways to reach destinations outside of the + local VN. Gateways receive encapsulated traffic from one VN, remove + the encapsulation header, and send the native packet out onto the + data-center network for delivery. Outside traffic enters a VN in a + reverse manner. + + Gateways can be either virtual (i.e., implemented as a VM) or + physical (i.e., a standalone physical device). For performance + reasons, standalone hardware gateways may be desirable in some cases. + Such gateways could consist of a simple switch forwarding traffic + from a VN onto the local data-center network or could embed router + functionality. On such gateways, network interfaces connecting to + virtual networks will (at least conceptually) embed NVE (or split- + NVE) functionality within them. As in the case with Network Service + Appliances, gateways may not support a hypervisor and will need an + appropriate control-plane protocol to obtain the information needed + to provide NVO3 service. + + Gateways handle several different use cases. For example, one use + case consists of systems supporting overlays together with systems + that do not (e.g., bare metal servers). Gateways could be used to + connect legacy systems supporting, e.g., L2 VLANs, to specific + virtual networks, effectively making them part of the same virtual + network. Gateways could also forward traffic between a virtual + network and other hosts on the data-center network or relay traffic + between different VNs. Finally, gateways can provide external + connectivity such as Internet or VPN access. + + + + + +Black, et al. Informational [Page 17] + +RFC 8014 NVO3 Architecture December 2016 + + +5.3.1. Gateway Taxonomy + + As can be seen from the discussion above, there are several types of + gateways that can exist in an NVO3 environment. This section breaks + them down into the various types that could be supported. Note that + each of the types below could be either implemented in a centralized + manner or distributed to coexist with the NVEs. + +5.3.1.1. L2 Gateways (Bridging) + + L2 Gateways act as Layer 2 bridges to forward Ethernet frames based + on the MAC addresses present in them. + + L2 VN to Legacy L2: This type of gateway bridges traffic between L2 + VNs and other legacy L2 networks such as VLANs or L2 VPNs. + + L2 VN to L2 VN: The main motivation for this type of gateway is to + create separate groups of Tenant Systems using L2 VNs such that + the gateway can enforce network policies between each L2 VN. + +5.3.1.2. L3 Gateways (Only IP Packets) + + L3 Gateways forward IP packets based on the IP addresses present in + the packets. 
+ + L3 VN to Legacy L2: This type of gateway forwards packets between L3 + VNs and legacy L2 networks such as VLANs or L2 VPNs. The original + sender's destination MAC address in any frames that the gateway + forwards from a legacy L2 network would be the MAC address of the + gateway. + + L3 VN to Legacy L3: This type of gateway forwards packets between L3 + VNs and legacy L3 networks. These legacy L3 networks could be + local to the data center, be in the WAN, or be an L3 VPN. + + L3 VN to L2 VN: This type of gateway forwards packets between L3 VNs + and L2 VNs. The original sender's destination MAC address in any + frames that the gateway forwards from a L2 VN would be the MAC + address of the gateway. + + L2 VN to L2 VN: This type of gateway acts similar to a traditional + router that forwards between L2 interfaces. The original sender's + destination MAC address in any frames that the gateway forwards + from any of the L2 VNs would be the MAC address of the gateway. + + L3 VN to L3 VN: The main motivation for this type of gateway is to + create separate groups of Tenant Systems using L3 VNs such that + the gateway can enforce network policies between each L3 VN. + + + +Black, et al. Informational [Page 18] + +RFC 8014 NVO3 Architecture December 2016 + + +5.4. Distributed Inter-VN Gateways + + The relaying of traffic from one VN to another deserves special + consideration. Whether traffic is permitted to flow from one VN to + another is a matter of policy and would not (by default) be allowed + unless explicitly enabled. In addition, NVAs are the logical place + to maintain policy information about allowed inter-VN communication. + Policy enforcement for inter-VN communication can be handled in (at + least) two different ways. Explicit gateways could be the central + point for such enforcement, with all inter-VN traffic forwarded to + such gateways for processing. Alternatively, the NVA can provide + such information directly to NVEs by either providing a mapping for a + target Tenant System (TS) on another VN or indicating that such + communication is disallowed by policy. + + When inter-VN gateways are centralized, traffic between TSs on + different VNs can take suboptimal paths, i.e., triangular routing + results in paths that always traverse the gateway. In the worst + case, traffic between two TSs connected to the same NVE can be hair- + pinned through an external gateway. As an optimization, individual + NVEs can be part of a distributed gateway that performs such + relaying, reducing or completely eliminating triangular routing. In + a distributed gateway, each ingress NVE can perform such relaying + activity directly so long as it has access to the policy information + needed to determine whether cross-VN communication is allowed. + Having individual NVEs be part of a distributed gateway allows them + to tunnel traffic directly to the destination NVE without the need to + take suboptimal paths. + + The NVO3 architecture supports distributed gateways for the case of + inter-VN communication. Such support requires that NVO3 control + protocols include mechanisms for the maintenance and distribution of + policy information about what type of cross-VN communication is + allowed so that NVEs acting as distributed gateways can tunnel + traffic from one VN to another as appropriate. + + Distributed gateways could also be used to distribute other + traditional router services to individual NVEs. 
The NVO3 + architecture does not preclude such implementations but does not + define or require them as they are outside the scope of the NVO3 + architecture. + + + + + + + + + + +Black, et al. Informational [Page 19] + +RFC 8014 NVO3 Architecture December 2016 + + +5.5. ARP and Neighbor Discovery + + Strictly speaking, for an L2 service, special processing of the + Address Resolution Protocol (ARP) [RFC826] and IPv6 Neighbor + Discovery (ND) [RFC4861] is not required. ARP requests are + broadcast, and an NVO3 can deliver ARP requests to all members of a + given L2 virtual network just as it does for any packet sent to an L2 + broadcast address. Similarly, ND requests are sent via IP multicast, + which NVO3 can support by delivering via L2 multicast. However, as a + performance optimization, an NVE can intercept ARP (or ND) requests + from its attached TSs and respond to them directly using information + in its mapping tables. Since an NVE will have mechanisms for + determining the NVE address associated with a given TS, the NVE can + leverage the same mechanisms to suppress sending ARP and ND requests + for a given TS to other members of the VN. The NVO3 architecture + supports such a capability. + +6. NVE-NVE Interaction + + Individual NVEs will interact with each other for the purposes of + tunneling and delivering traffic to remote TSs. At a minimum, a + control protocol may be needed for tunnel setup and maintenance. For + example, tunneled traffic may need to be encrypted or integrity + protected, in which case it will be necessary to set up appropriate + security associations between NVE peers. It may also be desirable to + perform tunnel maintenance (e.g., continuity checks) on a tunnel in + order to detect when a remote NVE becomes unreachable. Such generic + tunnel setup and maintenance functions are not generally + NVO3-specific. Hence, the NVO3 architecture expects to leverage + existing tunnel maintenance protocols rather than defining new ones. + + Some NVE-NVE interactions may be specific to NVO3 (in particular, be + related to information kept in mapping tables) and agnostic to the + specific tunnel type being used. For example, when tunneling traffic + for TS-X to a remote NVE, it is possible that TS-X is not presently + associated with the remote NVE. Normally, this should not happen, + but there could be race conditions where the information an NVE has + learned from the NVA is out of date relative to actual conditions. + In such cases, the remote NVE could return an error or warning + indication, allowing the sending NVE to attempt a recovery or + otherwise attempt to mitigate the situation. + + The NVE-NVE interaction could signal a range of indications, for + example: + + o "No such TS here", upon a receipt of a tunneled packet for an + unknown TS + + + + +Black, et al. Informational [Page 20] + +RFC 8014 NVO3 Architecture December 2016 + + + o "TS-X not here, try the following NVE instead" (i.e., a redirect) + + o "Delivered to correct NVE but could not deliver packet to TS-X" + + When an NVE receives information from a remote NVE that conflicts + with the information it has in its own mapping tables, it should + consult with the NVA to resolve those conflicts. In particular, it + should confirm that the information it has is up to date, and it + might indicate the error to the NVA so as to nudge the NVA into + following up (as appropriate). 
While it might make sense for an NVE + to update its mapping table temporarily in response to an error from + a remote NVE, any changes must be handled carefully as doing so can + raise security considerations if the received information cannot be + authenticated. That said, a sending NVE might still take steps to + mitigate a problem, such as applying rate limiting to data traffic + towards a particular NVE or TS. + +7. Network Virtualization Authority (NVA) + + Before sending traffic to and receiving traffic from a virtual + network, an NVE must obtain the information needed to build its + internal forwarding tables and state as listed in Section 4.3. An + NVE can obtain such information from a Network Virtualization + Authority (NVA). + + The NVA is the entity that is expected to provide address mapping and + other information to NVEs. NVEs can interact with an NVA to obtain + any required information they need in order to properly forward + traffic on behalf of tenants. The term "NVA" refers to the overall + system, without regard to its scope or how it is implemented. + +7.1. How an NVA Obtains Information + + There are two primary ways in which an NVA can obtain the address + dissemination information it manages: from the VM orchestration + system and/or directly from the NVEs themselves. + + On virtualized systems, the NVA may be able to obtain the address- + mapping information associated with VMs from the VM orchestration + system itself. If the VM orchestration system contains a master + database for all the virtualization information, having the NVA + obtain information directly from the orchestration system would be a + natural approach. Indeed, the NVA could effectively be co-located + with the VM orchestration system itself. In such systems, the VM + orchestration system communicates with the NVE indirectly through the + hypervisor. + + + + + +Black, et al. Informational [Page 21] + +RFC 8014 NVO3 Architecture December 2016 + + + However, as described in Section 4, not all NVEs are associated with + hypervisors. In such cases, NVAs cannot leverage VM orchestration + protocols to interact with an NVE and will instead need to peer + directly with them. By peering directly with an NVE, NVAs can obtain + information about the TSs connected to that NVE and can distribute + information to the NVE about the VNs those TSs are associated with. + For example, whenever a Tenant System attaches to an NVE, that NVE + would notify the NVA that the TS is now associated with that NVE. + Likewise, when a TS detaches from an NVE, that NVE would inform the + NVA. By communicating directly with NVEs, both the NVA and the NVE + are able to maintain up-to-date information about all active tenants + and the NVEs to which they are attached. + +7.2. Internal NVA Architecture + + For reliability and fault tolerance reasons, an NVA would be + implemented in a distributed or replicated manner without single + points of failure. How the NVA is implemented, however, is not + important to an NVE so long as the NVA provides a consistent and + well-defined interface to the NVE. For example, an NVA could be + implemented via database techniques whereby a server stores address- + mapping information in a traditional (possibly replicated) database. + Alternatively, an NVA could be implemented in a distributed fashion + using an existing (or modified) routing protocol to maintain and + distribute mappings. 
So long as there is a clear interface between + the NVE and NVA, how an NVA is architected and implemented is not + important to an NVE. + + A number of architectural approaches could be used to implement NVAs + themselves. NVAs manage address bindings and distribute them to + where they need to go. One approach would be to use the Border + Gateway Protocol (BGP) [RFC4364] (possibly with extensions) and route + reflectors. Another approach could use a transaction-based database + model with replicated servers. Because the implementation details + are local to an NVA, there is no need to pick exactly one solution + technology, so long as the external interfaces to the NVEs (and + remote NVAs) are sufficiently well defined to achieve + interoperability. + +7.3. NVA External Interface + + Conceptually, from the perspective of an NVE, an NVA is a single + entity. An NVE interacts with the NVA, and it is the NVA's + responsibility to ensure that interactions between the NVE and NVA + result in consistent behavior across the NVA and all other NVEs using + the same NVA. Because an NVA is built from multiple internal + components, an NVA will have to ensure that information flows to all + internal NVA components appropriately. + + + +Black, et al. Informational [Page 22] + +RFC 8014 NVO3 Architecture December 2016 + + + One architectural question is how the NVA presents itself to the NVE. + For example, an NVA could be required to provide access via a single + IP address. If NVEs only have one IP address to interact with, it + would be the responsibility of the NVA to handle NVA component + failures, e.g., by using a "floating IP address" that migrates among + NVA components to ensure that the NVA can always be reached via the + one address. Having all NVA accesses through a single IP address, + however, adds constraints to implementing robust failover, load + balancing, etc. + + In the NVO3 architecture, an NVA is accessed through one or more IP + addresses (or an IP address/port combination). If multiple IP + addresses are used, each IP address provides equivalent + functionality, meaning that an NVE can use any of the provided + addresses to interact with the NVA. Should one address stop working, + an NVE is expected to failover to another. While the different + addresses result in equivalent functionality, one address may respond + more quickly than another, e.g., due to network conditions, load on + the server, etc. + + To provide some control over load balancing, NVA addresses may have + an associated priority. Addresses are used in order of priority, + with no explicit preference among NVA addresses having the same + priority. To provide basic load balancing among NVAs of equal + priorities, NVEs could use some randomization input to select among + equal-priority NVAs. Such a priority scheme facilitates failover and + load balancing, for example, by allowing a network operator to + specify a set of primary and backup NVAs. + + It may be desirable to have individual NVA addresses responsible for + a subset of information about an NV Domain. In such a case, NVEs + would use different NVA addresses for obtaining or updating + information about particular VNs or TS bindings. Key questions with + such an approach are how information would be partitioned and how an + NVE could determine which address to use to get the information it + needs. 
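
   The priority-and-failover scheme described earlier in this section
   can be illustrated with a short, non-normative Python sketch.  The
   function name, parameters, and data layout below are illustrative
   assumptions rather than part of any defined NVE-NVA interface.

      # Illustrative NVA address selection: lower priority values are
      # preferred, equal-priority addresses are chosen among at random
      # for basic load balancing, and addresses that have stopped
      # responding are skipped.
      import random
      from typing import List, Set, Tuple

      def select_nva_address(addresses: List[Tuple[str, int]],
                             failed: Set[str]) -> str:
          candidates = {}
          for addr, priority in addresses:
              if addr not in failed:
                  candidates.setdefault(priority, []).append(addr)
          if not candidates:
              raise LookupError("no usable NVA address")
          best = min(candidates)                  # lowest value = most preferred
          return random.choice(candidates[best])  # randomize among equals

      # Example: two primary NVAs (priority 1) and one backup (priority 2).
      nvas = [("192.0.2.10", 1), ("192.0.2.11", 1), ("192.0.2.20", 2)]
      chosen = select_nva_address(nvas, failed=set())

   An NVE that finds an address unresponsive would add it to the failed
   set and reselect, giving the failover behavior described above.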
+ + Another possibility is to treat the information on which NVA + addresses to use as cached (soft-state) information at the NVEs, so + that any NVA address can be used to obtain any information, but NVEs + are informed of preferences for which addresses to use for particular + information on VNs or TS bindings. That preference information would + be cached for future use to improve behavior, e.g., if all requests + for a specific subset of VNs are forwarded to a specific NVA + component, the NVE can optimize future requests within that subset by + sending them directly to that NVA component via its address. + + + + + +Black, et al. Informational [Page 23] + +RFC 8014 NVO3 Architecture December 2016 + + +8. NVE-NVA Protocol + + As outlined in Section 4.3, an NVE needs certain information in order + to perform its functions. To obtain such information from an NVA, an + NVE-NVA protocol is needed. The NVE-NVA protocol provides two + functions. First, it allows an NVE to obtain information about the + location and status of other TSs with which it needs to communicate. + Second, the NVE-NVA protocol provides a way for NVEs to provide + updates to the NVA about the TSs attached to that NVE (e.g., when a + TS attaches or detaches from the NVE) or about communication errors + encountered when sending traffic to remote NVEs. For example, an NVE + could indicate that a destination it is trying to reach at a + destination NVE is unreachable for some reason. + + While having a direct NVE-NVA protocol might seem straightforward, + the existence of existing VM orchestration systems complicates the + choices an NVE has for interacting with the NVA. + +8.1. NVE-NVA Interaction Models + + An NVE interacts with an NVA in at least two (quite different) ways: + + o NVEs embedded within the same server as the hypervisor can obtain + necessary information entirely through the hypervisor-facing side + of the NVE. Such an approach is a natural extension to existing + VM orchestration systems supporting server virtualization because + an existing protocol between the hypervisor and VM orchestration + system already exists and can be leveraged to obtain any needed + information. Specifically, VM orchestration systems used to + create, terminate, and migrate VMs already use well-defined + (though typically proprietary) protocols to handle the + interactions between the hypervisor and VM orchestration system. + For such systems, it is a natural extension to leverage the + existing orchestration protocol as a sort of proxy protocol for + handling the interactions between an NVE and the NVA. Indeed, + existing implementations can already do this. + + o Alternatively, an NVE can obtain needed information by interacting + directly with an NVA via a protocol operating over the data-center + underlay network. Such an approach is needed to support NVEs that + are not associated with systems performing server virtualization + (e.g., as in the case of a standalone gateway) or where the NVE + needs to communicate directly with the NVA for other reasons. + + The NVO3 architecture will focus on support for the second model + above. Existing virtualization environments are already using the + first model, but they are not sufficient to cover the case of + + + + +Black, et al. Informational [Page 24] + +RFC 8014 NVO3 Architecture December 2016 + + + standalone gateways -- such gateways may not support virtualization + and do not interface with existing VM orchestration systems. + +8.2. 
Direct NVE-NVA Protocol + + An NVE can interact directly with an NVA via an NVE-NVA protocol. + Such a protocol can be either independent of the NVA internal + protocol or an extension of it. Using a purpose-specific protocol + would provide architectural separation and independence between the + NVE and NVA. The NVE and NVA interact in a well-defined way, and + changes in the NVA (or NVE) do not need to impact each other. Using + a dedicated protocol also ensures that both NVE and NVA + implementations can evolve independently and without dependencies on + each other. Such independence is important because the upgrade path + for NVEs and NVAs is quite different. Upgrading all the NVEs at a + site will likely be more difficult in practice than upgrading NVAs + because of their large number -- one on each end device. In + practice, it would be prudent to assume that once an NVE has been + implemented and deployed, it may be challenging to get subsequent NVE + extensions and changes implemented and deployed, whereas an NVA (and + its associated internal protocols) is more likely to evolve over time + as experience is gained from usage and upgrades will involve fewer + nodes. + + Requirements for a direct NVE-NVA protocol can be found in [NVE-NVA]. + +8.3. Propagating Information Between NVEs and NVAs + + Information flows between NVEs and NVAs in both directions. The NVA + maintains information about all VNs in the NV Domain so that NVEs do + not need to do so themselves. NVEs obtain information from the NVA + about where a given remote TS destination resides. NVAs, in turn, + obtain information from NVEs about the individual TSs attached to + those NVEs. + + While the NVA could push information relevant to every virtual + network to every NVE, such an approach scales poorly and is + unnecessary. In practice, a given NVE will only need and want to + know about VNs to which it is attached. Thus, an NVE should be able + to subscribe to updates only for the virtual networks it is + interested in receiving updates for. The NVO3 architecture supports + a model where an NVE is not required to have full mapping tables for + all virtual networks in an NV Domain. + + Before sending unicast traffic to a remote TS (or TSs for broadcast + or multicast traffic), an NVE must know where the remote TS(s) + currently reside. When a TS attaches to a virtual network, the NVE + obtains information about that VN from the NVA. The NVA can provide + + + +Black, et al. Informational [Page 25] + +RFC 8014 NVO3 Architecture December 2016 + + + that information to the NVE at the time the TS attaches to the VN, + either because the NVE requests the information when the attach + operation occurs or because the VM orchestration system has initiated + the attach operation and provides associated mapping information to + the NVE at the same time. + + There are scenarios where an NVE may wish to query the NVA about + individual mappings within a VN. For example, when sending traffic + to a remote TS on a remote NVE, that TS may become unavailable (e.g., + because it has migrated elsewhere or has been shut down, in which + case the remote NVE may return an error indication). In such + situations, the NVE may need to query the NVA to obtain updated + mapping information for a specific TS or to verify that the + information is still correct despite the error condition. 
Note that
+   such a query could also be used by the NVA as an indication that
+   there may be an inconsistency in the network and that it should take
+   steps to verify that the information it has about the current state
+   and location of a specific TS is still correct.
+
+   For very large virtual networks, the amount of state an NVE needs to
+   maintain for a given virtual network could be significant.  Moreover,
+   an NVE may only be communicating with a small subset of the TSs on
+   such a virtual network.  In such cases, the NVE may find it desirable
+   to maintain state only for those destinations it is actively
+   communicating with.  In such scenarios, an NVE may not want to
+   maintain full mapping information about all destinations on a VN.
+   However, if it needs to communicate with a destination for which it
+   does not have mapping information, it will need to be able to query
+   the NVA on demand for the missing information on a per-destination
+   basis.
+
+   The NVO3 architecture will need to support a range of operations
+   between the NVE and NVA.  Requirements for those operations can be
+   found in [NVE-NVA].
+
+9.  Federated NVAs
+
+   An NVA provides service to the set of NVEs in its NV Domain.  Each
+   NVA manages network virtualization information for the virtual
+   networks within its NV Domain.  An NV Domain is administered by a
+   single entity.
+
+   In some cases, it will be necessary to expand the scope of a specific
+   VN or even an entire NV Domain beyond a single NVA.  For example, an
+   administrator managing multiple data centers may wish to operate all
+   of its data centers as a single NV Region.  Such cases are handled by
+   having different NVAs peer with each other to exchange mapping
+   information about specific VNs.  NVAs operate as a loosely coupled
+
+
+
+Black, et al.                 Informational                    [Page 26]
+
+RFC 8014                   NVO3 Architecture               December 2016
+
+
+   federation of individual NVAs.  If a virtual network spans multiple
+   NVAs (e.g., located at different data centers), and an NVE needs to
+   deliver tenant traffic to an NVE that is part of a different NV
+   Domain, it still interacts only with its NVA, even when obtaining
+   mappings for NVEs associated with a different NV Domain.
+
+   Figure 3 shows a scenario where two separate NV Domains (A and B)
+   share information about a VN.  VM1 and VM2 both connect to the same
+   VN, even though the two VMs are in separate NV Domains.  There are
+   two cases to consider.  In the first case, NV Domain B does not allow
+   NVE-A to tunnel traffic directly to NVE-B.  There could be a number
+   of reasons for this.  For example, NV Domains A and B may not share a
+   common address space (i.e., traversal through a NAT device is
+   required), or for policy reasons, a domain might require that all
+   traffic between separate NV Domains be funneled through a particular
+   device (e.g., a firewall).  In such cases, NVA-2 will advertise to
+   NVA-1 that VM2 on the VN is available and direct that traffic between
+   the two VMs be forwarded via IP-G (an IP Gateway).  IP-G would then
+   decapsulate received traffic from one NV Domain, translate it
+   appropriately for the other domain, and re-encapsulate the packet for
+   delivery.  In the second case, NV Domain B does allow NVE-A to tunnel
+   traffic directly to NVE-B; NVA-2 then advertises to NVA-1 that VM2 is
+   reachable directly at NVE-B, and NVE-A tunnels tenant traffic to
+   NVE-B just as it would to an NVE within its own NV Domain.
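+
+   As a non-normative sketch of the first case, the Python fragment
+   below shows the per-packet behavior that an inter-domain gateway
+   such as IP-G might implement: decapsulate, translate the VN Context,
+   and re-encapsulate for the other domain.  The packet layout, the
+   context_map, and the nve_map are hypothetical stand-ins for whatever
+   encapsulation and NVA-supplied state the two NV Domains actually
+   use; Figure 3 below illustrates the topology.
+
+      from dataclasses import dataclass
+
+      @dataclass
+      class OverlayPacket:
+          vn_context: int      # VN Context ID in the encapsulation
+          dst_nve: str         # underlay address of destination NVE
+          tenant_frame: bytes  # original tenant packet, unmodified
+
+      def ip_g_forward(pkt, context_map, nve_map):
+          """Decapsulate a packet arriving from NV Domain A, translate
+          it, and re-encapsulate it for delivery in NV Domain B."""
+          # "Decapsulation": keep the tenant frame; the Domain A outer
+          # header (modeled by pkt.dst_nve) is discarded.
+          tenant_frame = pkt.tenant_frame
+          # Translate the VN Context into Domain B's numbering space.
+          vn_context_b = context_map[pkt.vn_context]
+          # Choose the destination NVE in Domain B for this VN
+          # (NVE-B in Figure 3).
+          dst_nve_b = nve_map[vn_context_b]
+          # "Re-encapsulation": build the outgoing overlay packet.
+          return OverlayPacket(vn_context_b, dst_nve_b, tenant_frame)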
+ + xxxxxx xxxx +-----+ + +-----+ xxxxxx xxxxxx xxxxxx xxxxx | VM2 | + | VM1 | xx xx xxx xx |-----| + |-----| xx x xx x |NVE-B| + |NVE-A| x x +----+ x x +-----+ + +--+--+ x NV Domain A x |IP-G|--x x | + +-------x xx--+ | x xx | + x x +----+ x NV Domain B x | + +---x xx xx x---+ + | xxxx xx +->xx xx + | xxxxxxxx | xx xx + +---+-+ | xx xx + |NVA-1| +--+--+ xx xxx + +-----+ |NVA-2| xxxx xxxx + +-----+ xxxxx + + Figure 3: VM1 and VM2 in Different NV Domains + + NVAs at one site share information and interact with NVAs at other + sites, but only in a controlled manner. It is expected that policy + and access control will be applied at the boundaries between + different sites (and NVAs) so as to minimize dependencies on external + NVAs that could negatively impact the operation within a site. It is + an architectural principle that operations involving NVAs at one site + not be immediately impacted by failures or errors at another site. + + + +Black, et al. Informational [Page 27] + +RFC 8014 NVO3 Architecture December 2016 + + + (Of course, communication between NVEs in different NV Domains may be + impacted by such failures or errors.) It is a strong requirement + that an NVA continue to operate properly for local NVEs even if + external communication is interrupted (e.g., should communication + between a local and remote NVA fail). + + At a high level, a federation of interconnected NVAs has some + analogies to BGP and Autonomous Systems. Like an Autonomous System, + NVAs at one site are managed by a single administrative entity and do + not interact with external NVAs except as allowed by policy. + Likewise, the interface between NVAs at different sites is well + defined so that the internal details of operations at one site are + largely hidden to other sites. Finally, an NVA only peers with other + NVAs that it has a trusted relationship with, i.e., where a VN is + intended to span multiple NVAs. + + Reasons for using a federated model include: + + o Provide isolation among NVAs operating at different sites at + different geographic locations. + + o Control the quantity and rate of information updates that flow + (and must be processed) between different NVAs in different data + centers. + + o Control the set of external NVAs (and external sites) a site peers + with. A site will only peer with other sites that are cooperating + in providing an overlay service. + + o Allow policy to be applied between sites. A site will want to + carefully control what information it exports (and to whom) as + well as what information it is willing to import (and from whom). + + o Allow different protocols and architectures to be used for intra- + NVA vs. inter-NVA communication. For example, within a single + data center, a replicated transaction server using database + techniques might be an attractive implementation option for an + NVA, and protocols optimized for intra-NVA communication would + likely be different from protocols involving inter-NVA + communication between different sites. + + o Allow for optimized protocols rather than using a one-size-fits- + all approach. Within a data center, networks tend to have lower + latency, higher speed, and higher redundancy when compared with + WAN links interconnecting data centers. The design constraints + and trade-offs for a protocol operating within a data-center + network are different from those operating over WAN links. While + a single protocol could be used for both cases, there could be + + + +Black, et al. 
Informational [Page 28] + +RFC 8014 NVO3 Architecture December 2016 + + + advantages to using different and more specialized protocols for + the intra- and inter-NVA case. + +9.1. Inter-NVA Peering + + To support peering between different NVAs, an inter-NVA protocol is + needed. The inter-NVA protocol defines what information is exchanged + between NVAs. It is assumed that the protocol will be used to share + addressing information between data centers and must scale well over + WAN links. + +10. Control Protocol Work Areas + + The NVO3 architecture consists of two major distinct entities: NVEs + and NVAs. In order to provide isolation and independence between + these two entities, the NVO3 architecture calls for well-defined + protocols for interfacing between them. For an individual NVA, the + architecture calls for a logically centralized entity that could be + implemented in a distributed or replicated fashion. While the IETF + may choose to define one or more specific architectural approaches to + building individual NVAs, there is little need to pick exactly one + approach to the exclusion of others. An NVA for a single domain will + likely be deployed as a single vendor product; thus, there is little + benefit in standardizing the internal structure of an NVA. + + Individual NVAs peer with each other in a federated manner. The NVO3 + architecture calls for a well-defined interface between NVAs. + + Finally, a hypervisor-NVE protocol is needed to cover the split-NVE + scenario described in Section 4.2. + +11. NVO3 Data-Plane Encapsulation + + When tunneling tenant traffic, NVEs add an encapsulation header to + the original tenant packet. The exact encapsulation to use for NVO3 + does not seem to be critical. The main requirement is that the + encapsulation support a Context ID of sufficient size. A number of + encapsulations already exist that provide a VN Context of sufficient + size for NVO3. For example, Virtual eXtensible Local Area Network + (VXLAN) [RFC7348] has a 24-bit VXLAN Network Identifier (VNI). + Network Virtualization using Generic Routing Encapsulation (NVGRE) + [RFC7637] has a 24-bit Tenant Network ID (TNI). MPLS-over-GRE + provides a 20-bit label field. While there is widespread recognition + that a 12-bit VN Context would be too small (only 4096 distinct + values), it is generally agreed that 20 bits (1 million distinct + values) and 24 bits (16.8 million distinct values) are sufficient for + a wide variety of deployment scenarios. + + + + +Black, et al. Informational [Page 29] + +RFC 8014 NVO3 Architecture December 2016 + + +12. Operations, Administration, and Maintenance (OAM) + + The simplicity of operating and debugging overlay networks will be + critical for successful deployment. + + Overlay networks are based on tunnels between NVEs, so the + Operations, Administration, and Maintenance (OAM) [RFC6291] framework + for overlay networks can draw from prior IETF OAM work for tunnel- + based networks, specifically L2VPN OAM [RFC6136]. RFC 6136 focuses + on Fault Management and Performance Management as fundamental to + L2VPN service delivery, leaving the Configuration Management, + Accounting Management, and Security Management components of the Open + Systems Interconnection (OSI) Fault, Configuration, Accounting, + Performance, and Security (FCAPS) taxonomy [M.3400] for further + study. This section does likewise for NVO3 OAM, but those three + areas continue to be important parts of complete OAM functionality + for NVO3. 
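+
+   As a non-normative illustration of the Fault Management and
+   Performance Management focus above, the sketch below shows minimal
+   per-tunnel state that an NVE might keep from overlay liveness
+   probes, for later correlation with underlay measurements.  The
+   record layout and probe bookkeeping are hypothetical and are not
+   part of any NVO3 protocol.
+
+      import time
+      from dataclasses import dataclass
+
+      @dataclass
+      class TunnelOamState:
+          remote_nve: str           # underlay address of remote NVE
+          probes_sent: int = 0
+          probes_lost: int = 0
+          last_rtt_ms: float = 0.0
+          last_fault: float = 0.0   # time of most recent lost probe
+
+          def record_probe(self, rtt_ms=None):
+              """Record one overlay liveness probe; rtt_ms is None if
+              the probe timed out (a possible fault)."""
+              self.probes_sent += 1
+              if rtt_ms is None:
+                  self.probes_lost += 1
+                  self.last_fault = time.time()
+              else:
+                  self.last_rtt_ms = rtt_ms
+
+          def loss_ratio(self):
+              if self.probes_sent == 0:
+                  return 0.0
+              return self.probes_lost / self.probes_sent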
+ + The relationship between the overlay and underlay networks is a + consideration for fault and performance management -- a fault in the + underlay may manifest as fault and/or performance issues in the + overlay. Diagnosing and fixing such issues are complicated by NVO3 + abstracting the underlay network away from the overlay network (e.g., + intermediate nodes on the underlay network path between NVEs are + hidden from overlay VNs). + + NVO3-specific OAM techniques, protocol constructs, and tools are + needed to provide visibility beyond this abstraction to diagnose and + correct problems that appear in the overlay. Two examples are + underlay-aware traceroute [TRACEROUTE-VXLAN] and ping protocol + constructs for overlay networks [VXLAN-FAILURE] [NVO3-OVERLAY]. + + NVO3-specific tools and techniques are best viewed as complements to + (i.e., not as replacements for) single-network tools that apply to + the overlay and/or underlay networks. Coordination among the + individual network tools (for the overlay and underlay networks) and + NVO3-aware, dual-network tools is required to achieve effective + monitoring and fault diagnosis. For example, the defect detection + intervals and performance measurement intervals ought to be + coordinated among all tools involved in order to provide consistency + and comparability of results. + + For further discussion of NVO3 OAM requirements, see [NVO3-OAM]. + + + + + + + + +Black, et al. Informational [Page 30] + +RFC 8014 NVO3 Architecture December 2016 + + +13. Summary + + This document presents the overall architecture for NVO3. The + architecture calls for three main areas of protocol work: + + 1. A hypervisor-NVE protocol to support split-NVEs as discussed in + Section 4.2 + + 2. An NVE-NVA protocol for disseminating VN information (e.g., inner + to outer address mappings) + + 3. An NVA-NVA protocol for exchange of information about specific + virtual networks between federated NVAs + + It should be noted that existing protocols or extensions of existing + protocols are applicable. + +14. Security Considerations + + The data plane and control plane described in this architecture will + need to address potential security threats. + + For the data plane, tunneled application traffic may need protection + against being misdelivered, being modified, or having its content + exposed to an inappropriate third party. In all cases, encryption + between authenticated tunnel endpoints (e.g., via use of IPsec + [RFC4301]) and enforcing policies that control which endpoints and + VNs are permitted to exchange traffic can be used to mitigate risks. + + For the control plane, a combination of authentication and encryption + can be used between NVAs, between the NVA and NVE, as well as between + different components of the split-NVE approach. All entities will + need to properly authenticate with each other and enable encryption + for their interactions as appropriate to protect sensitive + information. + + Leakage of sensitive information about users or other entities + associated with VMs whose traffic is virtualized can also be covered + by using encryption for the control-plane protocols and enforcing + policies that control which NVO3 components are permitted to exchange + control-plane traffic. + + Control-plane elements such as NVEs and NVAs need to collect + performance and other data in order to carry out their functions. + This data can sometimes be unexpectedly sensitive, for example, + allowing non-obvious inferences of activity within a VM. 
This + provides a reason to minimize the data collected in some environments + in order to limit potential exposure of sensitive information. As + + + +Black, et al. Informational [Page 31] + +RFC 8014 NVO3 Architecture December 2016 + + + noted briefly in RFC 6973 [RFC6973] and RFC 7258 [RFC7258], there is + an inevitable tension between being privacy sensitive and taking into + account network operations in NVO3 protocol development. + + See the NVO3 framework security considerations in RFC 7365 [RFC7365] + for further discussion. + +15. Informative References + + [FRAMEWORK-MCAST] + Ghanwani, A., Dunbar, L., McBride, M., Bannai, V., and R. + Krishnan, "A Framework for Multicast in Network + Virtualization Overlays", Work in Progress, + draft-ietf-nvo3-mcast-framework-05, May 2016. + + [IEEE.802.1Q] + IEEE, "IEEE Standard for Local and metropolitan area + networks--Bridges and Bridged Networks", IEEE 802.1Q-2014, + DOI 10.1109/ieeestd.2014.6991462, + <http://ieeexplore.ieee.org/servlet/ + opac?punumber=6991460>. + + [M.3400] ITU-T, "TMN management functions", ITU-T + Recommendation M.3400, February 2000, + <https://www.itu.int/rec/T-REC-M.3400-200002-I/>. + + [NVE-NVA] Kreeger, L., Dutt, D., Narten, T., and D. Black, "Network + Virtualization NVE to NVA Control Protocol Requirements", + Work in Progress, draft-ietf-nvo3-nve-nva-cp-req-05, March + 2016. + + [NVO3-OAM] Chen, H., Ed., Ashwood-Smith, P., Xia, L., Iyengar, R., + Tsou, T., Sajassi, A., Boucadair, M., Jacquenet, C., + Daikoku, M., Ghanwani, A., and R. Krishnan, "NVO3 + Operations, Administration, and Maintenance Requirements", + Work in Progress, draft-ashwood-nvo3-oam-requirements-04, + October 2015. + + [NVO3-OVERLAY] + Kumar, N., Pignataro, C., Rao, D., and S. Aldrin, + "Detecting NVO3 Overlay Data Plane failures", Work in + Progress, draft-kumar-nvo3-overlay-ping-01, January 2014. + + [RFC826] Plummer, D., "Ethernet Address Resolution Protocol: Or + Converting Network Protocol Addresses to 48.bit Ethernet + Address for Transmission on Ethernet Hardware", STD 37, + RFC 826, DOI 10.17487/RFC0826, November 1982, + <http://www.rfc-editor.org/info/rfc826>. + + + +Black, et al. Informational [Page 32] + +RFC 8014 NVO3 Architecture December 2016 + + + [RFC4301] Kent, S. and K. Seo, "Security Architecture for the + Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, + December 2005, <http://www.rfc-editor.org/info/rfc4301>. + + [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private + Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February + 2006, <http://www.rfc-editor.org/info/rfc4364>. + + [RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, + "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, + DOI 10.17487/RFC4861, September 2007, + <http://www.rfc-editor.org/info/rfc4861>. + + [RFC6136] Sajassi, A., Ed. and D. Mohan, Ed., "Layer 2 Virtual + Private Network (L2VPN) Operations, Administration, and + Maintenance (OAM) Requirements and Framework", RFC 6136, + DOI 10.17487/RFC6136, March 2011, + <http://www.rfc-editor.org/info/rfc6136>. + + [RFC6291] Andersson, L., van Helvoort, H., Bonica, R., Romascanu, + D., and S. Mansfield, "Guidelines for the Use of the "OAM" + Acronym in the IETF", BCP 161, RFC 6291, + DOI 10.17487/RFC6291, June 2011, + <http://www.rfc-editor.org/info/rfc6291>. + + [RFC6973] Cooper, A., Tschofenig, H., Aboba, B., Peterson, J., + Morris, J., Hansen, M., and R. 
Smith, "Privacy + Considerations for Internet Protocols", RFC 6973, + DOI 10.17487/RFC6973, July 2013, + <http://www.rfc-editor.org/info/rfc6973>. + + [RFC7258] Farrell, S. and H. Tschofenig, "Pervasive Monitoring Is an + Attack", BCP 188, RFC 7258, DOI 10.17487/RFC7258, May + 2014, <http://www.rfc-editor.org/info/rfc7258>. + + [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, + L., Sridhar, T., Bursell, M., and C. Wright, "Virtual + eXtensible Local Area Network (VXLAN): A Framework for + Overlaying Virtualized Layer 2 Networks over Layer 3 + Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, + <http://www.rfc-editor.org/info/rfc7348>. + + [RFC7364] Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L., + Kreeger, L., and M. Napierala, "Problem Statement: + Overlays for Network Virtualization", RFC 7364, + DOI 10.17487/RFC7364, October 2014, + <http://www.rfc-editor.org/info/rfc7364>. + + + + +Black, et al. Informational [Page 33] + +RFC 8014 NVO3 Architecture December 2016 + + + [RFC7365] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. + Rekhter, "Framework for Data Center (DC) Network + Virtualization", RFC 7365, DOI 10.17487/RFC7365, October + 2014, <http://www.rfc-editor.org/info/rfc7365>. + + [RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network + Virtualization Using Generic Routing Encapsulation", + RFC 7637, DOI 10.17487/RFC7637, September 2015, + <http://www.rfc-editor.org/info/rfc7637>. + + [TRACEROUTE-VXLAN] + Nordmark, E., Appanna, C., Lo, A., Boutros, S., and A. + Dubey, "Layer-Transcending Traceroute for Overlay Networks + like VXLAN", Work in Progress, draft-nordmark-nvo3- + transcending-traceroute-03, July 2016. + + [USECASES] + Yong, L., Dunbar, L., Toy, M., Isaac, A., and V. Manral, + "Use Cases for Data Center Network Virtualization Overlay + Networks", Work in Progress, draft-ietf-nvo3-use-case-15, + December 2016. + + [VXLAN-FAILURE] + Jain, P., Singh, K., Balus, F., Henderickx, W., and V. + Bannai, "Detecting VXLAN Segment Failure", Work in + Progress, draft-jain-nvo3-vxlan-ping-00, June 2013. + +Acknowledgements + + Helpful comments and improvements to this document have come from + Alia Atlas, Abdussalam Baryun, Spencer Dawkins, Linda Dunbar, Stephen + Farrell, Anton Ivanov, Lizhong Jin, Suresh Krishnan, Mirja Kuehlwind, + Greg Mirsky, Carlos Pignataro, Dennis (Xiaohong) Qin, Erik Smith, + Takeshi Takahashi, Ziye Yang, and Lucy Yong. + + + + + + + + + + + + + + + + + +Black, et al. Informational [Page 34] + +RFC 8014 NVO3 Architecture December 2016 + + +Authors' Addresses + + David Black + Dell EMC + + Email: david.black@dell.com + + Jon Hudson + Independent + + Email: jon.hudson@gmail.com + + + Lawrence Kreeger + Independent + + Email: lkreeger@gmail.com + + + Marc Lasserre + Independent + + Email: mmlasserre@gmail.com + + + Thomas Narten + IBM + + Email: narten@us.ibm.com + + + + + + + + + + + + + + + + + + + + + + +Black, et al. Informational [Page 35] + |