From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc7637.txt | 955 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 955 insertions(+) create mode 100644 doc/rfc/rfc7637.txt (limited to 'doc/rfc/rfc7637.txt') diff --git a/doc/rfc/rfc7637.txt b/doc/rfc/rfc7637.txt new file mode 100644 index 0000000..4d5f3f0 --- /dev/null +++ b/doc/rfc/rfc7637.txt @@ -0,0 +1,955 @@ + + + + + + +Independent Submission P. Garg, Ed. +Request for Comments: 7637 Y. Wang, Ed. +Category: Informational Microsoft +ISSN: 2070-1721 September 2015 + + + NVGRE: Network Virtualization Using Generic Routing Encapsulation + +Abstract + + This document describes the usage of the Generic Routing + Encapsulation (GRE) header for Network Virtualization (NVGRE) in + multi-tenant data centers. Network Virtualization decouples virtual + networks and addresses from physical network infrastructure, + providing isolation and concurrency between multiple virtual networks + on the same physical network infrastructure. This document also + introduces a Network Virtualization framework to illustrate the use + cases, but the focus is on specifying the data-plane aspect of NVGRE. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for informational purposes. + + This is a contribution to the RFC Series, independently of any other + RFC stream. The RFC Editor has chosen to publish this document at + its discretion and makes no statement about its value for + implementation or deployment. Documents approved for publication by + the RFC Editor are not a candidate for any level of Internet + Standard; see Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc7637. + +Copyright Notice + + Copyright (c) 2015 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. + + + + + +Garg & Wang Informational [Page 1] + +RFC 7637 NVGRE September 2015 + + +Table of Contents + + 1. Introduction ....................................................2 + 1.1. Terminology ................................................4 + 2. Conventions Used in This Document ...............................4 + 3. Network Virtualization Using GRE (NVGRE) ........................4 + 3.1. NVGRE Endpoint .............................................5 + 3.2. NVGRE Frame Format .........................................5 + 3.3. Inner Tag as Defined by IEEE 802.1Q ........................8 + 3.4. Reserved VSID ..............................................8 + 4. NVGRE Deployment Considerations .................................9 + 4.1. ECMP Support ...............................................9 + 4.2. Broadcast and Multicast Traffic ............................9 + 4.3. Unicast Traffic ............................................9 + 4.4. IP Fragmentation ..........................................10 + 4.5. Address/Policy Management and Routing .....................10 + 4.6. 
Cross-Subnet, Cross-Premise Communication .................10 + 4.7. Internet Connectivity .....................................12 + 4.8. Management and Control Planes .............................12 + 4.9. NVGRE-Aware Devices .......................................12 + 4.10. Network Scalability with NVGRE ...........................13 + 5. Security Considerations ........................................14 + 6. Normative References ...........................................14 + Contributors ......................................................16 + Authors' Addresses ................................................17 + +1. Introduction + + Conventional data center network designs cater to largely static + workloads and cause fragmentation of network and server capacity [6] + [7]. There are several issues that limit dynamic allocation and + consolidation of capacity. Layer 2 networks use the Rapid Spanning + Tree Protocol (RSTP), which is designed to eliminate loops by + blocking redundant paths. These eliminated paths translate to wasted + capacity and a highly oversubscribed network. There are alternative + approaches such as the Transparent Interconnection of Lots of Links + (TRILL) that address this problem [13]. + + The network utilization inefficiencies are exacerbated by network + fragmentation due to the use of VLANs for broadcast isolation. VLANs + are used for traffic management and also as the mechanism for + providing security and performance isolation among services belonging + to different tenants. The Layer 2 network is carved into smaller- + sized subnets (typically, one subnet per VLAN), with VLAN tags + configured on all the Layer 2 switches connected to server racks that + host a given tenant's services. The current VLAN limits + theoretically allow for 4,000 such subnets to be created. The size + + + + +Garg & Wang Informational [Page 2] + +RFC 7637 NVGRE September 2015 + + + of the broadcast domain is typically restricted due to the overhead + of broadcast traffic. The 4,000-subnet limit on VLANs is no longer + sufficient in a shared infrastructure servicing multiple tenants. + + Data center operators must be able to achieve high utilization of + server and network capacity. In order to achieve efficiency, it + should be possible to assign workloads that operate in a single Layer + 2 network to any server in any rack in the network. It should also + be possible to migrate workloads to any server anywhere in the + network while retaining the workloads' addresses. This can be + achieved today by stretching VLANs; however, when workloads migrate, + the network needs to be reconfigured and that is typically error + prone. By decoupling the workload's location on the LAN from its + network address, the network administrator configures the network + once, not every time a service migrates. This decoupling enables any + server to become part of any server resource pool. 
+ + The following are key design objectives for next-generation data + centers: + + a) location-independent addressing + + b) the ability to a scale the number of logical Layer 2 / Layer 3 + networks, irrespective of the underlying physical topology or + the number of VLANs + + c) preserving Layer 2 semantics for services and allowing them to + retain their addresses as they move within and across data + centers + + d) providing broadcast isolation as workloads move around without + burdening the network control plane + + This document describes use of the Generic Routing Encapsulation + (GRE) header [3] [4] for network virtualization. Network + virtualization decouples a virtual network from the underlying + physical network infrastructure by virtualizing network addresses. + Combined with a management and control plane for the virtual-to- + physical mapping, network virtualization can enable flexible virtual + machine placement and movement and provide network isolation for a + multi-tenant data center. + + Network virtualization enables customers to bring their own address + spaces into a multi-tenant data center, while the data center + administrators can place the customer virtual machines anywhere in + the data center without reconfiguring their network switches or + routers, irrespective of the customer address spaces. + + + + +Garg & Wang Informational [Page 3] + +RFC 7637 NVGRE September 2015 + + +1.1. Terminology + + Please refer to RFCs 7364 [10] and 7365 [11] for more formal + definitions of terminology. The following terms are used in this + document. + + Customer Address (CA): This is the virtual IP address assigned and + configured on the virtual Network Interface Controller (NIC) within + each VM. This is the only address visible to VMs and applications + running within VMs. + + Network Virtualization Edge (NVE): This is an entity that performs + the network virtualization encapsulation and decapsulation. + + Provider Address (PA): This is the IP address used in the physical + network. PAs are associated with VM CAs through the network + virtualization mapping policy. + + Virtual Machine (VM): This is an instance of an OS running on top of + the hypervisor over a physical machine or server. Multiple VMs can + share the same physical server via the hypervisor, yet are completely + isolated from each other in terms of CPU usage, storage, and other OS + resources. + + Virtual Subnet Identifier (VSID): This is a 24-bit ID that uniquely + identifies a virtual subnet or virtual Layer 2 broadcast domain. + +2. Conventions Used in This Document + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in RFC 2119 [1]. + + In this document, these words will appear with that interpretation + only when in ALL CAPS. Lowercase uses of these words are not to be + interpreted as carrying the significance defined in RFC 2119. + +3. Network Virtualization Using GRE (NVGRE) + + This section describes Network Virtualization using GRE (NVGRE). + Network virtualization involves creating virtual Layer 2 topologies + on top of a physical Layer 3 network. Connectivity in the virtual + topology is provided by tunneling Ethernet frames in GRE over IP over + the physical network. + + In NVGRE, every virtual Layer 2 network is associated with a 24-bit + identifier, called a Virtual Subnet Identifier (VSID). A VSID is + carried in an outer header as defined in Section 3.2. 
This allows + + + +Garg & Wang Informational [Page 4] + +RFC 7637 NVGRE September 2015 + + + unique identification of a tenant's virtual subnet to various devices + in the network. A 24-bit VSID supports up to 16 million virtual + subnets in the same management domain, in contrast to only 4,000 that + is achievable with VLANs. Each VSID represents a virtual Layer 2 + broadcast domain, which can be used to identify a virtual subnet of a + given tenant. To support multi-subnet virtual topology, data center + administrators can configure routes to facilitate communication + between virtual subnets of the same tenant. + + GRE is a Proposed Standard from the IETF [3] [4] and provides a way + for encapsulating an arbitrary protocol over IP. NVGRE leverages the + GRE header to carry VSID information in each packet. The VSID + information in each packet can be used to build multi-tenant-aware + tools for traffic analysis, traffic inspection, and monitoring. + + The following sections detail the packet format for NVGRE; describe + the functions of an NVGRE endpoint; illustrate typical traffic flow + both within and across data centers; and discuss address/policy + management, and deployment considerations. + +3.1. NVGRE Endpoint + + NVGRE endpoints are the ingress/egress points between the virtual and + the physical networks. The NVGRE endpoints are the NVEs as defined + in the Network Virtualization over Layer 3 (NVO3) Framework document + [11]. Any physical server or network device can be an NVGRE + endpoint. One common deployment is for the endpoint to be part of a + hypervisor. The primary function of this endpoint is to + encapsulate/decapsulate Ethernet data frames to and from the GRE + tunnel, ensure Layer 2 semantics, and apply isolation policy scoped + on VSID. The endpoint can optionally participate in routing and + function as a gateway in the virtual topology. To encapsulate an + Ethernet frame, the endpoint needs to know the location information + for the destination address in the frame. This information can be + provisioned via a management plane or obtained via a combination of + control-plane distribution or data-plane learning approaches. This + document assumes that the location information, including VSID, is + available to the NVGRE endpoint. + +3.2. NVGRE Frame Format + + The GRE header format as specified in RFCs 2784 [3] and 2890 [4] is + used for communication between NVGRE endpoints. NVGRE leverages the + Key extension specified in RFC 2890 [4] to carry the VSID. The + packet format for Layer 2 encapsulation in GRE is shown in Figure 1. 
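+
+   As a non-normative illustration only, the sketch below shows one
+   way an encapsulating NVE might construct the 8-byte GRE header
+   used by this encapsulation, following the field values specified
+   in this section: the C and S bits clear, the K bit set, the
+   Protocol Type 0x6558, and the Key field holding the 24-bit VSID
+   followed by the 8-bit FlowID.  The Python code is not part of
+   this specification, and the example VSID/FlowID values are
+   arbitrary.
+
+      # Non-normative sketch: pack the GRE header used by NVGRE.
+      import struct
+
+      GRE_FLAGS_KEY_PRESENT = 0x2000   # C=0, K=1, S=0, Ver=0
+      GRE_PROTO_TEB = 0x6558           # Transparent Ethernet Bridging
+
+      def nvgre_gre_header(vsid: int, flow_id: int = 0) -> bytes:
+          if not 0 <= vsid <= 0xFFFFFF:
+              raise ValueError("VSID is a 24-bit value")
+          if not 0 <= flow_id <= 0xFF:
+              raise ValueError("FlowID is an 8-bit value")
+          key = (vsid << 8) | flow_id   # 24-bit VSID, then 8-bit FlowID
+          return struct.pack("!HHI", GRE_FLAGS_KEY_PRESENT,
+                             GRE_PROTO_TEB, key)
+
+      # Example (arbitrary values): VSID 0x1234AB with FlowID 0x7F
+      # yields the bytes 20 00 65 58 12 34 ab 7f on the wire.
+
+   The header produced above sits between the outer IP header and the
+   inner Ethernet frame, as laid out in Figure 1.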
+ + + + + + +Garg & Wang Informational [Page 5] + +RFC 7637 NVGRE September 2015 + + + Outer Ethernet Header: + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | (Outer) Destination MAC Address | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |(Outer)Destination MAC Address | (Outer)Source MAC Address | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | (Outer) Source MAC Address | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |Optional Ethertype=C-Tag 802.1Q| Outer VLAN Tag Information | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Ethertype 0x0800 | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Outer IPv4 Header: + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |Version| HL |Type of Service| Total Length | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Identification |Flags| Fragment Offset | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Time to Live | Protocol 0x2F | Header Checksum | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | (Outer) Source Address | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | (Outer) Destination Address | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + GRE Header: + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |0| |1|0| Reserved0 | Ver | Protocol Type 0x6558 | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Virtual Subnet ID (VSID) | FlowID | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + Inner Ethernet Header + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | (Inner) Destination MAC Address | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |(Inner)Destination MAC Address | (Inner)Source MAC Address | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | (Inner) Source MAC Address | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Ethertype 0x0800 | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + + + + + + + +Garg & Wang Informational [Page 6] + +RFC 7637 NVGRE September 2015 + + + Inner IPv4 Header: + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + |Version| HL |Type of Service| Total Length | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Identification |Flags| Fragment Offset | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Time to Live | Protocol | Header Checksum | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Source Address | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Destination Address | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Options | Padding | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Original IP Payload | + | | + | | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 1: GRE Encapsulation Frame Format + + Note: HL stands for Header Length. + + The outer/delivery headers include the outer Ethernet header and the + outer IP header: + + o The outer Ethernet header: The source Ethernet address in the + outer frame is set to the MAC address associated with the NVGRE + endpoint. The destination endpoint may or may not be on the same + physical subnet. 
The destination Ethernet address is set to the + MAC address of the next-hop IP address for the destination NVE. + The outer VLAN tag information is optional and can be used for + traffic management and broadcast scalability on the physical + network. + + o The outer IP header: Both IPv4 and IPv6 can be used as the + delivery protocol for GRE. The IPv4 header is shown for + illustrative purposes. Henceforth, the IP address in the outer + frame is referred to as the Provider Address (PA). There can be + one or more PA associated with an NVGRE endpoint, with policy + controlling the choice of which PA to use for a given Customer + Address (CA) for a customer VM. + + In the GRE header: + + o The C (Checksum Present) and S (Sequence Number Present) bits in + the GRE header MUST be zero. + + + + +Garg & Wang Informational [Page 7] + +RFC 7637 NVGRE September 2015 + + + o The K (Key Present) bit in the GRE header MUST be set to one. The + 32-bit Key field in the GRE header is used to carry the Virtual + Subnet ID (VSID) and the FlowID: + + - Virtual Subnet ID (VSID): This is a 24-bit value that is used + to identify the NVGRE-based Virtual Layer 2 Network. + + - FlowID: This is an 8-bit value that is used to provide per-flow + entropy for flows in the same VSID. The FlowID MUST NOT be + modified by transit devices. The encapsulating NVE SHOULD + provide as much entropy as possible in the FlowID. If a FlowID + is not generated, it MUST be set to all zeros. + + o The Protocol Type field in the GRE header is set to 0x6558 + (Transparent Ethernet Bridging) [2]. + + In the inner headers (headers of the GRE payload): + + o The inner Ethernet frame comprises an inner Ethernet header + followed by optional inner IP header, followed by the IP payload. + The inner frame could be any Ethernet data frame not just IP. + Note that the inner Ethernet frame's Frame Check Sequence (FCS) is + not encapsulated. + + o For illustrative purposes, IPv4 headers are shown as the inner IP + headers, but IPv6 headers may be used. Henceforth, the IP address + contained in the inner frame is referred to as the Customer + Address (CA). + +3.3. Inner Tag as Defined by IEEE 802.1Q + + The inner Ethernet header of NVGRE MUST NOT contain the tag as + defined by IEEE 802.1Q [5]. The encapsulating NVE MUST remove any + existing IEEE 802.1Q tag before encapsulation of the frame in NVGRE. + A decapsulating NVE MUST drop the frame if the inner Ethernet frame + contains an IEEE 802.1Q tag. + +3.4. Reserved VSID + + The VSID range from 0-0xFFF is reserved for future use. + + The VSID 0xFFFFFF is reserved for vendor-specific NVE-to-NVE + communication. The sender NVE SHOULD verify the receiver NVE's + vendor before sending a packet using this VSID; however, such a + verification mechanism is out of scope of this document. + Implementations SHOULD choose a mechanism that meets their + requirements. + + + + +Garg & Wang Informational [Page 8] + +RFC 7637 NVGRE September 2015 + + +4. NVGRE Deployment Considerations + +4.1. ECMP Support + + Equal-Cost Multipath (ECMP) may be used to provide load balancing. + If ECMP is used, it is RECOMMENDED that the ECMP hash is calculated + either using the outer IP frame fields and entire Key field (32 bits) + or the inner IP and transport frame fields. + +4.2. Broadcast and Multicast Traffic + + To support broadcast and multicast traffic inside a virtual subnet, + one or more administratively scoped multicast addresses [8] [9] can + be assigned for the VSID. 
All multicast or broadcast traffic + originating from within a VSID is encapsulated and sent to the + assigned multicast address. From an administrative standpoint, it is + possible for network operators to configure a PA multicast address + for each multicast address that is used inside a VSID; this + facilitates optimal multicast handling. Depending on the hardware + capabilities of the physical network devices and the physical network + architecture, multiple virtual subnets may use the same physical IP + multicast address. + + Alternatively, based upon the configuration at the NVE, broadcast and + multicast in the virtual subnet can be supported using N-way unicast. + In N-way unicast, the sender NVE would send one encapsulated packet + to every NVE in the virtual subnet. The sender NVE can encapsulate + and send the packet as described in Section 4.3 ("Unicast Traffic"). + This alleviates the need for multicast support in the physical + network. + +4.3. Unicast Traffic + + The NVGRE endpoint encapsulates a Layer 2 packet in GRE using the + source PA associated with the endpoint with the destination PA + corresponding to the location of the destination endpoint. As + outlined earlier, there can be one or more PAs associated with an + endpoint and policy will control which ones get used for + communication. The encapsulated GRE packet is bridged and routed + normally by the physical network to the destination PA. Bridging + uses the outer Ethernet encapsulation for scope on the LAN. The only + requirement is bidirectional IP connectivity from the underlying + physical network. On the destination, the NVGRE endpoint + decapsulates the GRE packet to recover the original Layer 2 frame. + Traffic flows similarly on the reverse path. + + + + + + +Garg & Wang Informational [Page 9] + +RFC 7637 NVGRE September 2015 + + +4.4. IP Fragmentation + + Section 5.1 of RFC 2003 [12] specifies mechanisms for handling + fragmentation when encapsulating IP within IP. The subset of + mechanisms NVGRE selects are intended to ensure that NVGRE- + encapsulated frames are not fragmented after encapsulation en route + to the destination NVGRE endpoint and that traffic sources can + leverage Path MTU discovery. + + A sender NVE MUST NOT fragment NVGRE packets. A receiver NVE MAY + discard fragmented NVGRE packets. It is RECOMMENDED that the MTU of + the physical network accommodates the larger frame size due to + encapsulation. Path MTU or configuration via control plane can be + used to meet this requirement. + +4.5. Address/Policy Management and Routing + + Address acquisition is beyond the scope of this document and can be + obtained statically, dynamically, or using stateless address + autoconfiguration. CA and PA space can be either IPv4 or IPv6. In + fact, the address families don't have to match; for example, a CA can + be IPv4 while the PA is IPv6, and vice versa. + +4.6. Cross-Subnet, Cross-Premise Communication + + One application of this framework is that it provides a seamless path + for enterprises looking to expand their virtual machine hosting + capabilities into public clouds. Enterprises can bring their entire + IP subnet(s) and isolation policies, thus making the transition to or + from the cloud simpler. It is possible to move portions of an IP + subnet to the cloud; however, that requires additional configuration + on the enterprise network and is not discussed in this document. + Enterprises can continue to use existing communications models like + site-to-site VPN to secure their traffic. 
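+
+   As background for the cross-premise example that follows, the
+   non-normative sketch below illustrates the per-packet decision a
+   sender NVGRE endpoint makes for unicast traffic (Section 4.3):
+   look up the destination CA in the provisioned mapping for the
+   VSID and encapsulate toward the corresponding PA, falling back to
+   a default gateway rule when the CA is outside the tenant's
+   virtual subnet.  The Python code and all names (the VSID value,
+   "PA1", "PA4", "CAe1", and so on) are illustrative only and are
+   not defined by this specification.
+
+      # Non-normative sketch of CA-to-PA selection at a sender NVE.
+      from dataclasses import dataclass
+      from typing import Dict, Optional
+
+      @dataclass
+      class SubnetPolicy:
+          # Destination CA -> PA of the NVE hosting that CA.  How this
+          # table is provisioned (management plane, control plane, or
+          # learning) is out of scope here.
+          ca_to_pa: Dict[str, str]
+          default_gateway_pa: Optional[str] = None  # e.g., VPN gateway
+
+      def choose_destination_pa(policy: Dict[int, SubnetPolicy],
+                                vsid: int,
+                                dst_ca: str) -> Optional[str]:
+          """Return the PA for the outer IP header, or None to drop."""
+          subnet = policy.get(vsid)
+          if subnet is None:
+              return None                       # unknown VSID
+          if dst_ca in subnet.ca_to_pa:
+              return subnet.ca_to_pa[dst_ca]    # normal unicast
+          return subnet.default_gateway_pa      # e.g., cross-premise
+
+      # Mirroring Figure 2 below: CA2 is reachable directly, while a
+      # packet for CAe1 matches the default rule and is encapsulated
+      # toward the tenant VPN gateway at PA4.
+      policy = {0x5001: SubnetPolicy(ca_to_pa={"CA2": "PA1"},
+                                     default_gateway_pa="PA4")}
+      assert choose_destination_pa(policy, 0x5001, "CA2") == "PA1"
+      assert choose_destination_pa(policy, 0x5001, "CAe1") == "PA4"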
+ + A VPN gateway is used to establish a secure site-to-site tunnel over + the Internet, and all the enterprise services running in virtual + machines in the cloud use the VPN gateway to communicate back to the + enterprise. For simplicity, we use a VPN gateway configured as a VM + (shown in Figure 2) to illustrate cross-subnet, cross-premise + communication. + + + + + + + + + + +Garg & Wang Informational [Page 10] + +RFC 7637 NVGRE September 2015 + + + +-----------------------+ +-----------------------+ + | Server 1 | | Server 2 | + | +--------+ +--------+ | | +-------------------+ | + | | VM1 | | VM2 | | | | VPN Gateway | | + | | IP=CA1 | | IP=CA2 | | | | Internal External| | + | | | | | | | | IP=CAg IP=GAdc | | + | +--------+ +--------+ | | +-------------------+ | + | Hypervisor | | | Hypervisor| ^ | + +-----------------------+ +-------------------:---+ + | IP=PA1 | IP=PA4 | : + | | | : + | +-------------------------+ | : VPN + +-----| Layer 3 Network |------+ : Tunnel + +-------------------------+ : + | : + +-----------------------------------------------:--+ + | : | + | Internet : | + | : | + +-----------------------------------------------:--+ + | v + | +-------------------+ + | | VPN Gateway | + |---| | + IP=GAcorp| External IP=GAcorp| + +-------------------+ + | + +-----------------------+ + | Corp Layer 3 Network | + | (In CA Space) | + +-----------------------+ + | + +---------------------------+ + | Server X | + | +----------+ +----------+ | + | | Corp VMe1| | Corp VMe2| | + | | IP=CAe1 | | IP=CAe2 | | + | +----------+ +----------+ | + | Hypervisor | + +---------------------------+ + + Figure 2: Cross-Subnet, Cross-Premise Communication + + The packet flow is similar to the unicast traffic flow between VMs; + the key difference in this case is that the packet needs to be sent + to a VPN gateway before it gets forwarded to the destination. As + part of routing configuration in the CA space, a per-tenant VPN + gateway is provisioned for communication back to the enterprise. The + + + +Garg & Wang Informational [Page 11] + +RFC 7637 NVGRE September 2015 + + + example illustrates an outbound connection between VM1 inside the + data center and VMe1 inside the enterprise network. When the + outbound packet from CA1 to CAe1 reaches the hypervisor on Server 1, + the NVE in Server 1 can perform the equivalent of a route lookup on + the packet. The cross-premise packet will match the default gateway + rule, as CAe1 is not part of the tenant virtual network in the data + center. The virtualization policy will indicate the packet to be + encapsulated and sent to the PA of the tenant VPN gateway (PA4) + running as a VM on Server 2. The packet is decapsulated on Server 2 + and delivered to the VM gateway. The gateway in turn validates and + sends the packet on the site-to-site VPN tunnel back to the + enterprise network. As the communication here is external to the + data center, the PA address for the VPN tunnel is globally routable. + The outer header of this packet is sourced from GAdc destined to + GAcorp. This packet is routed through the Internet to the enterprise + VPN gateway, which is the other end of the site-to-site tunnel; at + that point, the VPN gateway decapsulates the packet and sends it + inside the enterprise where the CAe1 is routable on the network. The + reverse path is similar once the packet reaches the enterprise VPN + gateway. + +4.7. 
Internet Connectivity + + To enable connectivity to the Internet, an Internet gateway is needed + that bridges the virtualized CA space to the public Internet address + space. The gateway needs to perform translation between the + virtualized world and the Internet. For example, the NVGRE endpoint + can be part of a load balancer or a NAT that replaces the VPN Gateway + on Server 2 shown in Figure 2. + +4.8. Management and Control Planes + + There are several protocols that can manage and distribute policy; + however, it is outside the scope of this document. Implementations + SHOULD choose a mechanism that meets their scale requirements. + +4.9. NVGRE-Aware Devices + + One example of a typical deployment consists of virtualized servers + deployed across multiple racks connected by one or more layers of + Layer 2 switches, which in turn may be connected to a Layer 3 routing + domain. Even though routing in the physical infrastructure will work + without any modification with NVGRE, devices that perform specialized + processing in the network need to be able to parse GRE to get access + to tenant-specific information. Devices that understand and parse + the VSID can provide rich multi-tenant-aware services inside the data + center. As outlined earlier, it is imperative to exploit multiple + paths inside the network through techniques such as ECMP. The Key + + + +Garg & Wang Informational [Page 12] + +RFC 7637 NVGRE September 2015 + + + field (a 32-bit field, including both the VSID and the optional + FlowID) can provide additional entropy to the switches to exploit + path diversity inside the network. A diverse ecosystem is expected + to emerge as more and more devices become multi-tenant aware. In the + interim, without requiring any hardware upgrades, there are + alternatives to exploit path diversity with GRE by associating + multiple PAs with NVGRE endpoints with policy controlling the choice + of which PA to use. + + It is expected that communication can span multiple data centers and + also cross the virtual/physical boundary. Typical scenarios that + require virtual-to-physical communication include access to storage + and databases. Scenarios demanding lossless Ethernet functionality + may not be amenable to NVGRE, as traffic is carried over an IP + network. NVGRE endpoints mediate between the network-virtualized and + non-network-virtualized environments. This functionality can be + incorporated into Top-of-Rack switches, storage appliances, load + balancers, routers, etc., or built as a stand-alone appliance. + + It is imperative to consider the impact of any solution on host + performance. Today's server operating systems employ sophisticated + acceleration techniques such as checksum offload, Large Send Offload + (LSO), Receive Segment Coalescing (RSC), Receive Side Scaling (RSS), + Virtual Machine Queue (VMQ), etc. These technologies should become + NVGRE aware. IPsec Security Associations (SAs) can be offloaded to + the NIC so that computationally expensive cryptographic operations + are performed at line rate in the NIC hardware. These SAs are based + on the IP addresses of the endpoints. As each packet on the wire + gets translated, the NVGRE endpoint SHOULD intercept the offload + requests and do the appropriate address translation. This will + ensure that IPsec continues to be usable with network virtualization + while taking advantage of hardware offload capabilities for improved + performance. + +4.10. 
Network Scalability with NVGRE + + One of the key benefits of using NVGRE is the IP address scalability + and in turn MAC address table scalability that can be achieved. An + NVGRE endpoint can use one PA to represent multiple CAs. This lowers + the burden on the MAC address table sizes at the Top-of-Rack + switches. One obvious benefit is in the context of server + virtualization, which has increased the demands on the network + infrastructure. By embedding an NVGRE endpoint in a hypervisor, it + is possible to scale significantly. This framework enables location + information to be preconfigured inside an NVGRE endpoint, thus + allowing broadcast ARP traffic to be proxied locally. This approach + can scale to large-sized virtual subnets. These virtual subnets can + be spread across multiple Layer 3 physical subnets. It allows + + + +Garg & Wang Informational [Page 13] + +RFC 7637 NVGRE September 2015 + + + workloads to be moved around without imposing a huge burden on the + network control plane. By eliminating most broadcast traffic and + converting others to multicast, the routers and switches can function + more optimally by building efficient multicast trees. By using + server and network capacity efficiently, it is possible to drive down + the cost of building and managing data centers. + +5. Security Considerations + + This proposal extends the Layer 2 subnet across the data center and + increases the scope for spoofing attacks. Mitigations of such + attacks are possible with authentication/encryption using IPsec or + any other IP-based mechanism. The control plane for policy + distribution is expected to be secured by using any of the existing + security protocols. Further management traffic can be isolated in a + separate subnet/VLAN. + + The checksum in the GRE header is not supported. The mitigation of + this is to deploy an NVGRE-based solution in a network that provides + error detection along the NVGRE packet path, for example, using + Ethernet Cyclic Redundancy Check (CRC) or IPsec or any other error + detection mechanism. + +6. Normative References + + [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement + Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, + . + + [2] IANA, "IEEE 802 Numbers", + . + + [3] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, + "Generic Routing Encapsulation (GRE)", RFC 2784, + DOI 10.17487/RFC2784, March 2000, + . + + [4] Dommety, G., "Key and Sequence Number Extensions to GRE", + RFC 2890, DOI 10.17487/RFC2890, September 2000, + . + + [5] IEEE, "IEEE Standard for Local and metropolitan area + networks--Media Access Control (MAC) Bridges and Virtual Bridged + Local Area Networks", IEEE Std 802.1Q. + + [6] Greenberg, A., et al., "VL2: A Scalable and Flexible Data Center + Network", Communications of the ACM, + DOI 10.1145/1897852.1897877, 2011. + + + +Garg & Wang Informational [Page 14] + +RFC 7637 NVGRE September 2015 + + + [7] Greenberg, A., et al., "The Cost of a Cloud: Research Problems + in Data Center Networks", ACM SIGCOMM Computer Communication + Review, DOI 10.1145/1496091.1496103, 2009. + + [8] Hinden, R. and S. Deering, "IP Version 6 Addressing + Architecture", RFC 4291, DOI 10.17487/RFC4291, February 2006, + . + + [9] Meyer, D., "Administratively Scoped IP Multicast", BCP 23, + RFC 2365, DOI 10.17487/RFC2365, July 1998, + . + + [10] Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L., Kreeger, + L., and M. 
Napierala, "Problem Statement: Overlays for Network + Virtualization", RFC 7364, DOI 10.17487/RFC7364, October 2014, + . + + [11] Lasserre, M., Balus, F., Morin, T., Bitar, N., and Y. Rekhter, + "Framework for Data Center (DC) Network Virtualization", + RFC 7365, DOI 10.17487/RFC7365, October 2014, + . + + [12] Perkins, C., "IP Encapsulation within IP", RFC 2003, + DOI 10.17487/RFC2003, October 1996, + . + + [13] Touch, J. and R. Perlman, "Transparent Interconnection of Lots + of Links (TRILL): Problem and Applicability Statement", + RFC 5556, DOI 10.17487/RFC5556, May 2009, + . + + + + + + + + + + + + + + + + + + + + + +Garg & Wang Informational [Page 15] + +RFC 7637 NVGRE September 2015 + + +Contributors + + Murari Sridharan + Microsoft Corporation + 1 Microsoft Way + Redmond, WA 98052 + United States + Email: muraris@microsoft.com + + Albert Greenberg + Microsoft Corporation + 1 Microsoft Way + Redmond, WA 98052 + United States + Email: albert@microsoft.com + + Narasimhan Venkataramiah + Microsoft Corporation + 1 Microsoft Way + Redmond, WA 98052 + United States + Email: navenkat@microsoft.com + + Kenneth Duda + Arista Networks, Inc. + 5470 Great America Pkwy + Santa Clara, CA 95054 + United States + Email: kduda@aristanetworks.com + + Ilango Ganga + Intel Corporation + 2200 Mission College Blvd. + M/S: SC12-325 + Santa Clara, CA 95054 + United States + Email: ilango.s.ganga@intel.com + + Geng Lin + Google + 1600 Amphitheatre Parkway + Mountain View, CA 94043 + United States + Email: genglin@google.com + + + + + + + +Garg & Wang Informational [Page 16] + +RFC 7637 NVGRE September 2015 + + + Mark Pearson + Hewlett-Packard Co. + 8000 Foothills Blvd. + Roseville, CA 95747 + United States + Email: mark.pearson@hp.com + + Patricia Thaler + Broadcom Corporation + 3151 Zanker Road + San Jose, CA 95134 + United States + Email: pthaler@broadcom.com + + Chait Tumuluri + Emulex Corporation + 3333 Susan Street + Costa Mesa, CA 92626 + United States + Email: chait@emulex.com + +Authors' Addresses + + Pankaj Garg (editor) + Microsoft Corporation + 1 Microsoft Way + Redmond, WA 98052 + United States + Email: pankajg@microsoft.com + + Yu-Shun Wang (editor) + Microsoft Corporation + 1 Microsoft Way + Redmond, WA 98052 + United States + Email: yushwang@microsoft.com + + + + + + + + + + + + + + + +Garg & Wang Informational [Page 17] + -- cgit v1.2.3