Diffstat (limited to 'doc/rfc/rfc7365.txt')
-rw-r--r--   doc/rfc/rfc7365.txt   1459
1 file changed, 1459 insertions, 0 deletions
| diff --git a/doc/rfc/rfc7365.txt b/doc/rfc/rfc7365.txt new file mode 100644 index 0000000..ec5095b --- /dev/null +++ b/doc/rfc/rfc7365.txt @@ -0,0 +1,1459 @@ + + + + + + +Internet Engineering Task Force (IETF)                       M. Lasserre +Request for Comments: 7365                                      F. Balus +Category: Informational                                   Alcatel-Lucent +ISSN: 2070-1721                                                 T. Morin +                                                                  Orange +                                                                N. Bitar +                                                                 Verizon +                                                              Y. Rekhter +                                                                 Juniper +                                                            October 2014 + + +         Framework for Data Center (DC) Network Virtualization + +Abstract + +   This document provides a framework for Data Center (DC) Network +   Virtualization over Layer 3 (NVO3) and defines a reference model +   along with logical components required to design a solution. + +Status of This Memo + +   This document is not an Internet Standards Track specification; it is +   published for informational purposes. + +   This document is a product of the Internet Engineering Task Force +   (IETF).  It represents the consensus of the IETF community.  It has +   received public review and has been approved for publication by the +   Internet Engineering Steering Group (IESG).  Not all documents +   approved by the IESG are a candidate for any level of Internet +   Standard; see Section 2 of RFC 5741. + +   Information about the current status of this document, any errata, +   and how to provide feedback on it may be obtained at +   http://www.rfc-editor.org/info/rfc7365. + + + + + + + + + + + + + + + + +Lasserre, et al.              Informational                     [Page 1] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +Copyright Notice + +   Copyright (c) 2014 IETF Trust and the persons identified as the +   document authors.  All rights reserved. + +   This document is subject to BCP 78 and the IETF Trust's Legal +   Provisions Relating to IETF Documents +   (http://trustee.ietf.org/license-info) in effect on the date of +   publication of this document.  Please review these documents +   carefully, as they describe your rights and restrictions with respect +   to this document.  Code Components extracted from this document must +   include Simplified BSD License text as described in Section 4.e of +   the Trust Legal Provisions and are provided without warranty as +   described in the Simplified BSD License. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Lasserre, et al.              Informational                     [Page 2] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +Table of Contents + +   1. Introduction ....................................................4 +      1.1. General Terminology ........................................4 +      1.2. DC Network Architecture ....................................7 +   2. Reference Models ................................................8 +      2.1. Generic Reference Model ....................................8 +      2.2. NVE Reference Model .......................................10 +      2.3. 
NVE Service Types .........................................11 +           2.3.1. L2 NVE Providing Ethernet LAN-Like Service .........11 +           2.3.2. L3 NVE Providing IP/VRF-Like Service ...............11 +      2.4. Operational Management Considerations .....................12 +   3. Functional Components ..........................................12 +      3.1. Service Virtualization Components .........................12 +           3.1.1. Virtual Access Points (VAPs) .......................12 +           3.1.2. Virtual Network Instance (VNI) .....................12 +           3.1.3. Overlay Modules and VN Context .....................14 +           3.1.4. Tunnel Overlays and Encapsulation Options ..........14 +           3.1.5. Control-Plane Components ...........................14 +                  3.1.5.1. Distributed vs. Centralized +                           Control Plane .............................14 +                  3.1.5.2. Auto-provisioning and Service Discovery ...15 +                  3.1.5.3. Address Advertisement and Tunnel Mapping ..15 +                  3.1.5.4. Overlay Tunneling .........................16 +      3.2. Multihoming ...............................................16 +      3.3. VM Mobility ...............................................17 +   4. Key Aspects of Overlay Networks ................................17 +      4.1. Pros and Cons .............................................18 +      4.2. Overlay Issues to Consider ................................19 +           4.2.1. Data Plane vs. Control Plane Driven ................19 +           4.2.2. Coordination between Data Plane and Control Plane ..19 +           4.2.3. Handling Broadcast, Unknown Unicast, and +                  Multicast (BUM) Traffic ............................20 +           4.2.4. Path MTU ...........................................20 +           4.2.5. NVE Location Trade-Offs ............................21 +           4.2.6. Interaction between Network Overlays and +                  Underlays ..........................................22 +   5. Security Considerations ........................................22 +   6. Informative References .........................................24 +   Acknowledgments ...................................................26 +   Authors' Addresses ................................................26 + + + + + + + + + + +Lasserre, et al.              Informational                     [Page 3] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +1.  Introduction + +   This document provides a framework for Data Center (DC) Network +   Virtualization over Layer 3 (NVO3) tunnels.  This framework is +   intended to aid in standardizing protocols and mechanisms to support +   large-scale network virtualization for data centers. + +   [RFC7364] defines the rationale for using overlay networks in order +   to build large multi-tenant data center networks.  Compute, storage +   and network virtualization are often used in these large data centers +   to support a large number of communication domains and end systems. + +   This document provides reference models and functional components of +   data center overlay networks as well as a discussion of technical +   issues that have to be addressed. + +1.1.  
General Terminology + +   This document uses the following terminology: + +   NVO3 Network: An overlay network that provides a Layer 2 (L2) or +   Layer 3 (L3) service to Tenant Systems over an L3 underlay network +   using the architecture and protocols as defined by the NVO3 Working +   Group. + +   Network Virtualization Edge (NVE): An NVE is the network entity that +   sits at the edge of an underlay network and implements L2 and/or L3 +   network virtualization functions.  The network-facing side of the NVE +   uses the underlying L3 network to tunnel tenant frames to and from +   other NVEs.  The tenant-facing side of the NVE sends and receives +   Ethernet frames to and from individual Tenant Systems.  An NVE could +   be implemented as part of a virtual switch within a hypervisor, a +   physical switch or router, or a Network Service Appliance, or it +   could be split across multiple devices. + +   Virtual Network (VN): A VN is a logical abstraction of a physical +   network that provides L2 or L3 network services to a set of Tenant +   Systems.  A VN is also known as a Closed User Group (CUG). + +   Virtual Network Instance (VNI): A specific instance of a VN from the +   perspective of an NVE. + +   Virtual Network Context (VN Context) Identifier: Field in an overlay +   encapsulation header that identifies the specific VN the packet +   belongs to.  The egress NVE uses the VN Context identifier to deliver +   the packet to the correct Tenant System.  The VN Context identifier +   can be a locally significant identifier or a globally unique +   identifier. + + + +Lasserre, et al.              Informational                     [Page 4] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +   Underlay or Underlying Network: The network that provides the +   connectivity among NVEs and that NVO3 packets are tunneled over, +   where an NVO3 packet carries an NVO3 overlay header followed by a +   tenant packet.  The underlay network does not need to be aware that +   it is carrying NVO3 packets.  Addresses on the underlay network +   appear as "outer addresses" in encapsulated NVO3 packets.  In +   general, the underlay network can use a completely different protocol +   (and address family) from that of the overlay.  In the case of NVO3, +   the underlay network is IP. + +   Data Center (DC): A physical complex housing physical servers, +   network switches and routers, network service appliances, and +   networked storage.  The purpose of a data center is to provide +   application, compute, and/or storage services.  One such service is +   virtualized infrastructure data center services, also known as +   "Infrastructure as a Service". + +   Virtual Data Center (Virtual DC): A container for virtualized +   compute, storage, and network services.  A virtual DC is associated +   with a single tenant and can contain multiple VNs and Tenant Systems +   connected to one or more of these VNs. + +   Virtual Machine (VM): A software implementation of a physical machine +   that runs programs as if they were executing on a physical, non- +   virtualized machine.  Applications (generally) do not know they are +   running on a VM as opposed to running on a "bare metal" host or +   server, though some systems provide a para-virtualization environment +   that allows an operating system or application to be aware of the +   presence of virtualization for optimization purposes. 
+ +   Hypervisor: Software running on a server that allows multiple VMs to +   run on the same physical server.  The hypervisor manages and provides +   shared computation, memory, and storage services and network +   connectivity to the VMs that it hosts.  Hypervisors often embed a +   virtual switch (see below). + +   Server: A physical end-host machine that runs user applications.  A +   standalone (or "bare metal") server runs a conventional operating +   system hosting a single-tenant application.  A virtualized server +   runs a hypervisor supporting one or more VMs. + +   Virtual Switch (vSwitch): A function within a hypervisor (typically +   implemented in software) that provides similar forwarding services to +   a physical Ethernet switch.  A vSwitch forwards Ethernet frames +   between VMs running on the same server or between a VM and a physical +   Network Interface Card (NIC) connecting the server to a physical +   Ethernet switch or router.  A vSwitch also enforces network isolation +   between VMs that by policy are not permitted to communicate with each + + + +Lasserre, et al.              Informational                     [Page 5] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +   other (e.g., by honoring VLANs).  A vSwitch may be bypassed when an +   NVE is enabled on the host server. + +   Tenant: The customer using a virtual network and any associated +   resources (e.g., compute, storage, and network).  A tenant could be +   an enterprise or a department/organization within an enterprise. + +   Tenant System: A physical or virtual system that can play the role of +   a host or a forwarding element such as a router, switch, firewall, +   etc.  It belongs to a single tenant and connects to one or more VNs +   of that tenant. + +   Tenant Separation: Refers to isolating traffic of different tenants +   such that traffic from one tenant is not visible to or delivered to +   another tenant, except when allowed by policy.  Tenant separation +   also refers to address space separation, whereby different tenants +   can use the same address space without conflict. + +   Virtual Access Points (VAPs): A logical connection point on the NVE +   for connecting a Tenant System to a virtual network.  Tenant Systems +   connect to VNIs at an NVE through VAPs.  VAPs can be physical ports +   or virtual ports identified through logical interface identifiers +   (e.g., VLAN ID or internal vSwitch Interface ID connected to a VM). + +   End Device: A physical device that connects directly to the DC +   underlay network.  This is in contrast to a Tenant System, which +   connects to a corresponding tenant VN.  An End Device is administered +   by the DC operator rather than a tenant and is part of the DC +   infrastructure.  An End Device may implement NVO3 technology in +   support of NVO3 functions.  Examples of an End Device include hosts +   (e.g., server or server blade), storage systems (e.g., file servers +   and iSCSI storage systems), and network devices (e.g., firewall, +   load-balancer, and IPsec gateway). + +   Network Virtualization Authority (NVA): Entity that provides +   reachability and forwarding information to NVEs. + + + + + + + + + + + + + + + +Lasserre, et al.              Informational                     [Page 6] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +1.2.  
DC Network Architecture + +   A generic architecture for data centers is depicted in Figure 1: + +                                ,---------. +                              ,'           `. +                             (  IP/MPLS WAN ) +                              `.           ,' +                                `-+------+' +                                 \      / +                          +--------+   +--------+ +                          |   DC   |+-+|   DC   | +                          |gateway |+-+|gateway | +                          +--------+   +--------+ +                                |       / +                                .--. .--. +                              (    '    '.--. +                            .-.' Intra-DC     ' +                           (     network      ) +                            (             .'-' +                             '--'._.'.    )\ \ +                             / /     '--'  \ \ +                            / /      | |    \ \ +                   +--------+   +--------+   +--------+ +                   | access |   | access |   | access | +                   | switch |   | switch |   | switch | +                   +--------+   +--------+   +--------+ +                      /     \    /    \     /      \ +                   __/_      \  /      \   /_      _\__ +             '--------'   '--------'   '--------'   '--------' +             :  End   :   :  End   :   :  End   :   :  End   : +             : Device :   : Device :   : Device :   : Device : +             '--------'   '--------'   '--------'   '--------' + +             Figure 1: A Generic Architecture for Data Centers + +   An example of multi-tier DC network architecture is presented in +   Figure 1.  It provides a view of the physical components inside a DC. + +   A DC network is usually composed of intra-DC networks and network +   services, and inter-DC network and network connectivity services. + +   DC networking elements can act as strict L2 switches and/or provide +   IP routing capabilities, including network service virtualization. + +   In some DC architectures, some tier layers could provide L2 and/or L3 +   services.  In addition, some tier layers may be collapsed, and +   Internet connectivity, inter-DC connectivity, and VPN support may be + + + +Lasserre, et al.              Informational                     [Page 7] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +   handled by a smaller number of nodes.  Nevertheless, one can assume +   that the network functional blocks in a DC fit in the architecture +   depicted in Figure 1. + +   The following components can be present in a DC: + +   -  Access switch: Hardware-based Ethernet switch aggregating all +      Ethernet links from the End Devices in a rack representing the +      entry point in the physical DC network for the hosts.  It may also +      provide routing functionality, virtual IP network connectivity, or +      Layer 2 tunneling over IP, for instance.  Access switches are +      usually multihomed to aggregation switches in the Intra-DC +      network.  A typical example of an access switch is a Top-of-Rack +      (ToR) switch.  Other deployment scenarios may use an intermediate +      Blade Switch before the ToR, or an End-of-Row (EoR) switch, to +      provide similar functions to a ToR. + +   -  Intra-DC Network: Network composed of high-capacity core nodes +      (Ethernet switches/routers).  
Core nodes may provide virtual +      Ethernet bridging and/or IP routing services. + +   -  DC Gateway (DC GW): Gateway to the outside world providing DC +      interconnect and connectivity to Internet and VPN customers.  In +      the current DC network model, this may be simply a router +      connected to the Internet and/or an IP VPN/L2VPN PE.  Some network +      implementations may dedicate DC GWs for different connectivity +      types (e.g., a DC GW for Internet and another for VPN). + +   Note that End Devices may be single-homed or multihomed to access +   switches. + +2.  Reference Models + +2.1.  Generic Reference Model + +   Figure 2 depicts a DC reference model for network virtualization +   overlays where NVEs provide a logical interconnect between Tenant +   Systems that belong to a specific VN. + + + + + + + + + + + + + +Lasserre, et al.              Informational                     [Page 8] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +         +--------+                                    +--------+ +         | Tenant +--+                            +----| Tenant | +         | System |  |                           (')   | System | +         +--------+  |    .................     (   )  +--------+ +                     |  +---+           +---+    (_) +                     +--|NVE|---+   +---|NVE|-----+ +                        +---+   |   |   +---+ +                        / .    +-----+      . +                       /  . +--| NVA |--+   . +                      /   . |  +-----+   \  . +                     |    . |             \ . +                     |    . |   Overlay   +--+--++--------+ +         +--------+  |    . |   Network   | NVE || Tenant | +         | Tenant +--+    . |             |     || System | +         | System |       .  \ +---+      +--+--++--------+ +         +--------+       .....|NVE|......... +                               +---+ +                                 | +                                 | +                       ===================== +                         |               | +                     +--------+      +--------+ +                     | Tenant |      | Tenant | +                     | System |      | System | +                     +--------+      +--------+ + +      Figure 2: Generic Reference Model for DC Network Virtualization +                                 Overlays + +   In order to obtain reachability information, NVEs may exchange +   information directly between themselves via a control-plane protocol. +   In this case, a control-plane module resides in every NVE. + +   It is also possible for NVEs to communicate with an external Network +   Virtualization Authority (NVA) to obtain reachability and forwarding +   information.  In this case, a protocol is used between NVEs and +   NVA(s) to exchange information. + +   It should be noted that NVAs may be organized in clusters for +   redundancy and scalability and can appear as one logically +   centralized controller.  In this case, inter-NVA communication is +   necessary to synchronize state among nodes within a cluster or share +   information across clusters.  The information exchanged between NVAs +   of the same cluster could be different from the information exchanged +   across clusters. + + + + + + +Lasserre, et al.              
Informational                     [Page 9] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +   A Tenant System can be attached to an NVE in several ways: + +   -  locally, by being co-located in the same End Device + +   -  remotely, via a point-to-point connection or a switched network + +   When an NVE is co-located with a Tenant System, the state of the +   Tenant System can be determined without protocol assistance.  For +   instance, the operational status of a VM can be communicated via a +   local API.  When an NVE is remotely connected to a Tenant System, the +   state of the Tenant System or NVE needs to be exchanged directly or +   via a management entity, using a control-plane protocol or API, or +   directly via a data-plane protocol. + +   The functional components in Figure 2 do not necessarily map directly +   to the physical components described in Figure 1.  For example, an +   End Device can be a server blade with VMs and a virtual switch.  A VM +   can be a Tenant System, and the NVE functions may be performed by the +   host server.  In this case, the Tenant System and NVE function are +   co-located.  Another example is the case where the End Device is the +   Tenant System and the NVE function can be implemented by the +   connected ToR.  In this case, the Tenant System and NVE function are +   not co-located. + +   Underlay nodes utilize L3 technologies to interconnect NVE nodes. +   These nodes perform forwarding based on outer L3 header information, +   and generally do not maintain state for each tenant service, albeit +   some applications (e.g., multicast) may require control-plane or +   forwarding-plane information that pertains to a tenant, group of +   tenants, tenant service, or a set of services that belong to one or +   more tenants.  Mechanisms to control the amount of state maintained +   in the underlay may be needed. + +2.2.  NVE Reference Model + +   Figure 3 depicts the NVE reference model.  One or more VNIs can be +   instantiated on an NVE.  A Tenant System interfaces with a +   corresponding VNI via a VAP.  An overlay module provides tunneling +   overlay functions (e.g., encapsulation and decapsulation of tenant +   traffic, tenant identification, and mapping, etc.). + + + + + + + + + + + +Lasserre, et al.              Informational                    [Page 10] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +                     +-------- L3 Network -------+ +                     |                           | +                     |        Tunnel Overlay     | +         +------------+---------+       +---------+------------+ +         | +----------+-------+ |       | +---------+--------+ | +         | |  Overlay Module  | |       | |  Overlay Module  | | +         | +---------+--------+ |       | +---------+--------+ | +         |           |VN Context|       | VN Context|          | +         |           |          |       |           |          | +         |  +--------+-------+  |       |  +--------+-------+  | +         |  | |VNI|   .  |VNI|  |       |  | |VNI|   .  
|VNI|  | +    NVE1 |  +-+------------+-+  |       |  +-+-----------+--+  | NVE2 +         |    |   VAPs     |    |       |    |    VAPs   |     | +         +----+------------+----+       +----+-----------+-----+ +              |            |                 |           | +              |            |                 |           | +             Tenant Systems                 Tenant Systems + +                  Figure 3: Generic NVE Reference Model + +   Note that some NVE functions (e.g., data-plane and control-plane +   functions) may reside in one device or may be implemented separately +   in different devices. + +2.3.  NVE Service Types + +   An NVE provides different types of virtualized network services to +   multiple tenants, i.e., an L2 service or an L3 service.  Note that an +   NVE may be capable of providing both L2 and L3 services for a tenant. +   This section defines the service types and associated attributes. + +2.3.1.  L2 NVE Providing Ethernet LAN-Like Service + +   An L2 NVE implements Ethernet LAN emulation, an Ethernet-based +   multipoint service similar to an IETF Virtual Private LAN Service +   (VPLS) [RFC4761][RFC4762] or Ethernet VPN [EVPN] service, where the +   Tenant Systems appear to be interconnected by a LAN environment over +   an L3 overlay.  As such, an L2 NVE provides per-tenant virtual +   switching instance (L2 VNI) and L3 (IP/MPLS) tunneling encapsulation +   of tenant Media Access Control (MAC) frames across the underlay. +   Note that the control plane for an L2 NVE could be implemented +   locally on the NVE or in a separate control entity. + +2.3.2.  L3 NVE Providing IP/VRF-Like Service + +   An L3 NVE provides virtualized IP forwarding service, similar to IETF +   IP VPN (e.g., BGP/MPLS IP VPN [RFC4364]) from a service definition +   perspective.  That is, an L3 NVE provides per-tenant forwarding and + + + +Lasserre, et al.              Informational                    [Page 11] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +   routing instance (L3 VNI) and L3 (IP/MPLS) tunneling encapsulation of +   tenant IP packets across the underlay.  Note that routing could be +   performed locally on the NVE or in a separate control entity. + +2.4.  Operational Management Considerations + +   NVO3 services are overlay services over an IP underlay. + +   As far as the IP underlay is concerned, existing IP Operations, +   Administration, and Maintenance (OAM) facilities are used. + +   With regard to the NVO3 overlay, both L2 and L3 services can be +   offered.  It is expected that existing fault and performance OAM +   facilities will be used.  Sections 4.1 and 4.2.6 provide further +   discussion of additional fault and performance management issues to +   consider. + +   As far as configuration is concerned, the DC environment is driven by +   the need to bring new services up rapidly and is typically very +   dynamic, specifically in the context of virtualized services.  It is +   therefore critical to automate the configuration of NVO3 services. + +3. Functional Components + +   This section decomposes the network virtualization architecture into +   the functional components described in Figure 3 to make it easier to +   discuss solution options for these components. + +3.1.  Service Virtualization Components + +3.1.1.  Virtual Access Points (VAPs) + +   Tenant Systems are connected to VNIs through Virtual Access Points +   (VAPs). 
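   This relationship between NVE, VNI, and VAP can be made concrete with a
   small, non-normative sketch.  The Python classes below (all names are
   illustrative assumptions, not defined by this framework) show one way the
   components of Figure 3 relate: an NVE holds one or more VNIs, each VNI
   holds the VAPs of the Tenant Systems attached to it, and the VN Context
   identifier is what the overlay module places in the encapsulation header.
   The VAP types described in the next paragraph (physical or virtual ports)
   would simply be different values of the illustrative vap_id field.

from dataclasses import dataclass, field
from typing import Dict, List

# Non-normative sketch of the Figure 3 components; all names are
# illustrative assumptions, not part of any NVO3 specification.

@dataclass
class VAP:
    vap_id: str          # e.g., a physical port or a logical port such as "vlan-100"
    tenant_system: str   # identifier of the attached Tenant System

@dataclass
class VNI:
    vn_context: int                               # VN Context identifier used in the overlay header
    vaps: List[VAP] = field(default_factory=list)
    fib: Dict[str, str] = field(default_factory=dict)  # tenant address -> remote NVE underlay address

@dataclass
class NVE:
    underlay_address: str                         # outer address used by the overlay module
    vnis: Dict[int, VNI] = field(default_factory=dict)

nve1 = NVE(underlay_address="192.0.2.1")
nve1.vnis[5001] = VNI(vn_context=5001, vaps=[VAP("vlan-100", "ts-a")])
print(len(nve1.vnis), "VNI instantiated on NVE1")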
+ +   VAPs can be physical ports or virtual ports identified through +   logical interface identifiers (e.g., VLAN ID and internal vSwitch +   Interface ID connected to a VM). + +3.1.2.  Virtual Network Instance (VNI) + +   A VNI is a specific VN instance on an NVE.  Each VNI defines a +   forwarding context that contains reachability information and +   policies. + + + + + + + +Lasserre, et al.              Informational                    [Page 12] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +3.1.3.  Overlay Modules and VN Context + +   Mechanisms for identifying each tenant service are required to allow +   the simultaneous overlay of multiple tenant services over the same +   underlay L3 network topology.  In the data plane, each NVE, upon +   sending a tenant packet, must be able to encode the VN Context for +   the destination NVE in addition to the L3 tunneling information +   (e.g., source IP address identifying the source NVE and the +   destination IP address identifying the destination NVE, or MPLS +   label).  This allows the destination NVE to identify the tenant +   service instance and therefore appropriately process and forward the +   tenant packet. + +   The overlay module provides tunneling overlay functions: tunnel +   initiation/termination as in the case of stateful tunnels (see +   Section 3.1.4) and/or encapsulation/decapsulation of frames from the +   VAPs/L3 underlay. + +   In a multi-tenant context, tunneling aggregates frames from/to +   different VNIs.  Tenant identification and traffic demultiplexing are +   based on the VN Context identifier. + +   The following approaches can be considered: + +   -  VN Context identifier per Tenant: This is a globally unique (on a +      per-DC administrative domain) VN identifier used to identify the +      corresponding VNI.  Examples of such identifiers in existing +      technologies are IEEE VLAN IDs and Service Instance IDs (I-SIDs) +      that identify virtual L2 domains when using IEEE 802.1Q and IEEE +      802.1ah, respectively.  Note that multiple VN identifiers can +      belong to a tenant. + +   -  One VN Context identifier per VNI: Each VNI value is automatically +      generated by the egress NVE, or a control plane associated with +      that NVE, and usually distributed by a control-plane protocol to +      all the related NVEs.  An example of this approach is the use of +      per-VRF MPLS labels in IP VPN [RFC4364].  The VNI value is +      therefore locally significant to the egress NVE. + +   -  One VN Context identifier per VAP: A value locally significant to +      an NVE is assigned and usually distributed by a control-plane +      protocol to identify a VAP.  An example of this approach is the +      use of per-CE MPLS labels in IP VPN [RFC4364]. + +   Note that when using one VN Context per VNI or per VAP, an additional +   global identifier (e.g., a VN identifier or name) may be used by the +   control plane to identify the tenant context. + + + + +Lasserre, et al.              Informational                    [Page 13] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +3.1.4.  Tunnel Overlays and Encapsulation Options + +   Once the VN Context identifier is added to the frame, an L3 tunnel +   encapsulation is used to transport the frame to the destination NVE. + +   Different IP tunneling options (e.g., Generic Routing Encapsulation +   (GRE), the Layer 2 Tunneling Protocol (L2TP), and IPsec) and MPLS +   tunneling can be used.  
Tunneling could be stateless or stateful. +   Stateless tunneling simply entails the encapsulation of a tenant +   packet with another header necessary for forwarding the packet across +   the underlay (e.g., IP tunneling over an IP underlay).  Stateful +   tunneling, on the other hand, entails maintaining tunneling state at +   the tunnel endpoints (i.e., NVEs).  Tenant packets on an ingress NVE +   can then be transmitted over such tunnels to a destination (egress) +   NVE by encapsulating the packets with a corresponding tunneling +   header.  The tunneling state at the endpoints may be configured or +   dynamically established.  Solutions should specify the tunneling +   technology used and whether it is stateful or stateless.  In this +   document, however, tunneling and tunneling encapsulation are used +   interchangeably to simply mean the encapsulation of a tenant packet +   with a tunneling header necessary to carry the packet between an +   ingress NVE and an egress NVE across the underlay.  It should be +   noted that stateful tunneling, especially when configuration is +   involved, does impose management overhead and scale constraints. +   When confidentiality is required, the use of opportunistic security +   [OPPSEC] can be used as a stateless tunneling solution. + +3.1.5.  Control-Plane Components + +3.1.5.1.  Distributed vs. Centralized Control Plane + +   Control- and management-plane entities can be centralized or +   distributed.  Both approaches have been used extensively in the past. +   The routing model of the Internet is a good example of a distributed +   approach.  Transport networks have usually used a centralized +   approach to manage transport paths. + +   It is also possible to combine the two approaches, i.e., using a +   hybrid model.  A global view of network state can have many benefits, +   but it does not preclude the use of distributed protocols within the +   network.  Centralized models provide a facility to maintain global +   state and distribute that state to the network.  When used in +   combination with distributed protocols, greater network efficiencies, +   improved reliability, and robustness can be achieved.  Domain- and/or +   deployment-specific constraints define the balance between +   centralized and distributed approaches. + + + + + +Lasserre, et al.              Informational                    [Page 14] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +3.1.5.2.  Auto-provisioning and Service Discovery + +   NVEs must be able to identify the appropriate VNI for each Tenant +   System.  This is based on state information that is often provided by +   external entities.  For example, in an environment where a VM is a +   Tenant System, this information is provided by VM orchestration +   systems, since these are the only entities that have visibility of +   which VM belongs to which tenant. + +   A mechanism for communicating this information to the NVE is +   required.  VAPs have to be created and mapped to the appropriate VNI. +   Depending upon the implementation, this control interface can be +   implemented using an auto-discovery protocol between Tenant Systems +   and their local NVE or through management entities.  In either case, +   appropriate security and authentication mechanisms to verify that +   Tenant System information is not spoofed or altered are required. +   This is one critical aspect for providing integrity and tenant +   isolation in the system. 
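   As a non-normative illustration of the flow just described, the sketch
   below shows an NVE creating a VAP and mapping it to the appropriate VNI
   when a (hypothetical) VM orchestration system reports that a VM has
   attached.  The notification fields, function names, and the
   authentication check are assumptions made for the example; this
   framework does not define such an interface.

# Illustrative only: mapping a newly attached Tenant System (a VM) to
# the VNI of its tenant's VN, using information supplied by an
# orchestration system.  All names are hypothetical.

vn_to_vni = {("tenant-a", "vn-blue"): 5001}   # provisioned VN -> VN Context mapping
vap_to_vni = {}                               # VAPs created on this NVE

def on_vm_attach(notification):
    # The notification is assumed to have been authenticated, per the
    # requirement above that Tenant System information not be spoofed.
    if not notification.get("authenticated"):
        raise PermissionError("unauthenticated orchestration message rejected")
    vni = vn_to_vni[(notification["tenant"], notification["vn"])]
    vap_id = notification["vswitch_port"]     # e.g., an internal vSwitch interface ID
    vap_to_vni[vap_id] = vni                  # the VAP is created and mapped to the VNI
    return vap_id, vni

print(on_vm_attach({"authenticated": True, "tenant": "tenant-a",
                    "vn": "vn-blue", "vswitch_port": "vport-7"}))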
+ +   NVEs may learn reachability information for VNIs on other NVEs via a +   control protocol that exchanges such information among NVEs or via a +   management-control entity. + +3.1.5.3.  Address Advertisement and Tunnel Mapping + +   As traffic reaches an ingress NVE on a VAP, a lookup is performed to +   determine which NVE or local VAP the packet needs to be sent to.  If +   the packet is to be sent to another NVE, the packet is encapsulated +   with a tunnel header containing the destination information +   (destination IP address or MPLS label) of the egress NVE. +   Intermediate nodes (between the ingress and egress NVEs) switch or +   route traffic based upon the tunnel destination information. + +   A key step in the above process consists of identifying the +   destination NVE the packet is to be tunneled to.  NVEs are +   responsible for maintaining a set of forwarding or mapping tables +   that hold the bindings between destination VM and egress NVE +   addresses.  Several ways of populating these tables are possible: +   control plane driven, management plane driven, or data plane driven. + +   When a control-plane protocol is used to distribute address +   reachability and tunneling information, the auto-provisioning and +   service discovery could be accomplished by the same protocol.  In +   this scenario, the auto-provisioning and service discovery could be +   combined with (be inferred from) the address advertisement and +   associated tunnel mapping.  Furthermore, a control-plane protocol + + + + + +Lasserre, et al.              Informational                    [Page 15] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +   that carries both MAC and IP addresses eliminates the need for the +   Address Resolution Protocol (ARP) and hence addresses one of the +   issues with explosive ARP handling as discussed in [RFC6820]. + +3.1.5.4.  Overlay Tunneling + +   For overlay tunneling, and dependent upon the tunneling technology +   used for encapsulating the Tenant System packets, it may be +   sufficient to have one or more local NVE addresses assigned and used +   in the source and destination fields of a tunneling encapsulation +   header.  Other information that is part of the tunneling +   encapsulation header may also need to be configured.  In certain +   cases, local NVE configuration may be sufficient while in other +   cases, some tunneling-related information may need to be shared among +   NVEs.  The information that needs to be shared will be technology +   dependent.  For instance, potential information could include tunnel +   identity, encapsulation type, and/or tunnel resources.  In certain +   cases, such as when using IP multicast in the underlay, tunnels that +   interconnect NVEs may need to be established.  When tunneling +   information needs to be exchanged or shared among NVEs, a control- +   plane protocol may be required.  For instance, it may be necessary to +   provide active/standby status information between NVEs, up/down +   status information, pruning/grafting information for multicast +   tunnels, etc. + +   In addition, a control plane may be required to set up the tunnel +   path for some tunneling technologies.  This applies to both unicast +   and multicast tunneling. + +3.2.  Multihoming + +   Multihoming techniques can be used to increase the reliability of an +   NVO3 network.  
It is also important to ensure that the physical +   diversity in an NVO3 network is taken into account to avoid single +   points of failure. + +   Multihoming can be enabled in various nodes, from Tenant Systems into +   ToRs, ToRs into core switches/routers, and core nodes into DC GWs. + +   The NVO3 underlay nodes (i.e., from NVEs to DC GWs) rely on IP +   routing techniques or MPLS re-rerouting capabilities as the means to +   re-route traffic upon failures. + +   When a Tenant System is co-located with the NVE, the Tenant System is +   effectively single-homed to the NVE via a virtual port.  When the +   Tenant System and the NVE are separated, the Tenant System is +   connected to the NVE via a logical L2 construct such as a VLAN, and +   it can be multihomed to various NVEs.  An NVE may provide an L2 + + + +Lasserre, et al.              Informational                    [Page 16] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +   service to the end system or an l3 service.  An NVE may be multihomed +   to a next layer in the DC at L2 or L3.  When an NVE provides an L2 +   service and is not co-located with the end system, loop-avoidance +   techniques must be used.  Similarly, when the NVE provides L3 +   service, similar dual-homing techniques can be used.  When the NVE +   provides an L3 service to the end system, it is possible that no +   dynamic routing protocol is enabled between the end system and the +   NVE.  The end system can be multihomed to multiple physically +   separated L3 NVEs over multiple interfaces.  When one of the links +   connected to an NVE fails, the other interfaces can be used to reach +   the end system. + +   External connectivity from a DC can be handled by two or more DC +   gateways.  Each gateway provides access to external networks such as +   VPNs or the Internet.  A gateway may be connected to two or more edge +   nodes in the external network for redundancy.  When a connection to +   an upstream node is lost, the alternative connection is used, and the +   failed route withdrawn. + +3.3.  VM Mobility + +   In DC environments utilizing VM technologies, an important feature is +   that VMs can move from one server to another server in the same or +   different L2 physical domains (within or across DCs) in a seamless +   manner. + +   A VM can be moved from one server to another in stopped or suspended +   state ("cold" VM mobility) or in running/active state ("hot" VM +   mobility).  With "hot" mobility, VM L2 and L3 addresses need to be +   preserved.  With "cold" mobility, it may be desired to preserve at +   least VM L3 addresses. + +   Solutions to maintain connectivity while a VM is moved are necessary +   in the case of "hot" mobility.  This implies that connectivity among +   VMs is preserved.  For instance, for L2 VNs, ARP caches are updated +   accordingly. + +   Upon VM mobility, NVE policies that define connectivity among VMs +   must be maintained. + +   During VM mobility, it is expected that the path to the VM's default +   gateway assures adequate QoS to VM applications, i.e., QoS that +   matches the expected service-level agreement for these applications. + +4.  Key Aspects of Overlay Networks + +   The intent of this section is to highlight specific issues that +   proposed overlay solutions need to address. + + + +Lasserre, et al.              Informational                    [Page 17] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +4.1.  
Pros and Cons + +   An overlay network is a layer of virtual network topology on top of +   the physical network. + +   Overlay networks offer the following key advantages: + +   -  Unicast tunneling state management and association of Tenant +      Systems reachability are handled at the edge of the network (at +      the NVE).  Intermediate transport nodes are unaware of such state. +      Note that when multicast is enabled in the underlay network to +      build multicast trees for tenant VNs, there would be more state +      related to tenants in the underlay core network. + +   -  Tunneling is used to aggregate traffic and hide tenant addresses +      from the underlay network and hence offers the advantage of +      minimizing the amount of forwarding state required within the +      underlay network. + +   -  Decoupling of the overlay addresses (MAC and IP) used by VMs from +      the underlay network provides tenant separation and separation of +      the tenant address spaces from the underlay address space. + + +   -  Overlay networks support of a large number of virtual network +      identifiers. + +   Overlay networks also create several challenges: + +   -  Overlay networks typically have no control of underlay networks +      and lack underlay network information (e.g., underlay +      utilization): + +      o  Overlay networks and/or their associated management entities +         typically probe the network to measure link or path properties, +         such as available bandwidth or packet loss rate.  It is +         difficult to accurately evaluate network properties.  It might +         be preferable for the underlay network to expose usage and +         performance information. + +      o  Miscommunication or lack of coordination between overlay and +         underlay networks can lead to an inefficient usage of network +         resources. + +      o  When multiple overlays co-exist on top of a common underlay +         network, the lack of coordination between overlays can lead to +         performance issues and/or resource usage inefficiencies. + + + + +Lasserre, et al.              Informational                    [Page 18] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +   -  Traffic carried over an overlay might fail to traverse firewalls +      and NAT devices. + +   -  Multicast service scalability: Multicast support may be required +      in the underlay network to address tenant flood containment or +      efficient multicast handling.  The underlay may also be required +      to maintain multicast state on a per-tenant basis or even on a +      per-individual multicast flow of a given tenant.  Ingress +      replication at the NVE eliminates that additional multicast state +      in the underlay core, but depending on the multicast traffic +      volume, it may cause inefficient use of bandwidth. + +4.2.  Overlay Issues to Consider + +4.2.1.  Data Plane vs. Control Plane Driven + +   In the case of an L2 NVE, it is possible to dynamically learn MAC +   addresses against VAPs.  It is also possible that such addresses be +   known and controlled via management or a control protocol for both L2 +   NVEs and L3 NVEs.  Dynamic data-plane learning implies that flooding +   of unknown destinations be supported and hence implies that broadcast +   and/or multicast be supported or that ingress replication be used as +   described in Section 4.2.3.  
Multicasting in the underlay network for +   dynamic learning may lead to significant scalability limitations. +   Specific forwarding rules must be enforced to prevent loops from +   happening.  This can be achieved using a spanning tree, a shortest +   path tree, or a split-horizon mesh. + +   It should be noted that the amount of state to be distributed is +   dependent upon network topology and the number of virtual machines. +   Different forms of caching can also be utilized to minimize state +   distribution between the various elements.  The control plane should +   not require an NVE to maintain the locations of all the Tenant +   Systems whose VNs are not present on the NVE.  The use of a control +   plane does not imply that the data plane on NVEs has to maintain all +   the forwarding state in the control plane. + +4.2.2.  Coordination between Data Plane and Control Plane + +   For an L2 NVE, the NVE needs to be able to determine MAC addresses of +   the Tenant Systems connected via a VAP.  This can be achieved via +   data-plane learning or a control plane.  For an L3 NVE, the NVE needs +   to be able to determine the IP addresses of the Tenant Systems +   connected via a VAP. + + + + + + + +Lasserre, et al.              Informational                    [Page 19] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +   In both cases, coordination with the NVE control protocol is needed +   such that when the NVE determines that the set of addresses behind a +   VAP has changed, it triggers the NVE control plane to distribute this +   information to its peers. + +4.2.3.  Handling Broadcast, Unknown Unicast, and Multicast (BUM) Traffic + +   There are several options to support packet replication needed for +   broadcast, unknown unicast, and multicast.  Typical methods include: + +   - Ingress replication + +   - Use of underlay multicast trees + +   There is a bandwidth vs. state trade-off between the two approaches. +   Depending upon the degree of replication required (i.e., the number +   of hosts per group) and the amount of multicast state to maintain, +   trading bandwidth for state should be considered. + +   When the number of hosts per group is large, the use of underlay +   multicast trees may be more appropriate.  When the number of hosts is +   small (e.g., 2-3) and/or the amount of multicast traffic is small, +   ingress replication may not be an issue. + +   Depending upon the size of the data center network and hence the +   number of (S,G) entries, and also the duration of multicast flows, +   the use of underlay multicast trees can be a challenge. + +   When flows are well known, it is possible to pre-provision such +   multicast trees.  However, it is often difficult to predict +   application flows ahead of time; hence, programming of (S,G) entries +   for short-lived flows could be impractical. + +   A possible trade-off is to use in the underlay shared multicast trees +   as opposed to dedicated multicast trees. + +4.2.4. Path MTU + +   When using overlay tunneling, an outer header is added to the +   original frame.  This can cause the MTU of the path to the egress +   tunnel endpoint to be exceeded. + +   It is usually not desirable to rely on IP fragmentation for +   performance reasons.  Ideally, the interface MTU as seen by a Tenant +   System is adjusted such that no fragmentation is needed. + + + + + + +Lasserre, et al.              
Informational                    [Page 20] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +   It is possible for the MTU to be configured manually or to be +   discovered dynamically.  Various Path MTU discovery techniques exist +   in order to determine the proper MTU size to use: + +   -  Classical ICMP-based Path MTU Discovery [RFC1191] [RFC1981] + +      Tenant Systems rely on ICMP messages to discover the MTU of the +      end-to-end path to its destination.  This method is not always +      possible, such as when traversing middleboxes (e.g., firewalls) +      that disable ICMP for security reasons. + +   -  Extended Path MTU Discovery techniques such as those defined in +      [RFC4821] + +      Tenant Systems send probe packets of different sizes and rely on +      confirmation of receipt or lack thereof from receivers to allow a +      sender to discover the MTU of the end-to-end paths. + +   While it could also be possible to rely on the NVE to perform +   segmentation and reassembly operations without relying on the Tenant +   Systems to know about the end-to-end MTU, this would lead to +   undesired performance and congestion issues as well as significantly +   increase the complexity of hardware NVEs required for buffering and +   reassembly logic. + +   Preferably, the underlay network should be designed in such a way +   that the MTU can accommodate the extra tunneling and possibly +   additional NVO3 header encapsulation overhead. + +4.2.5.  NVE Location Trade-Offs + +   In the case of DC traffic, traffic originated from a VM is native +   Ethernet traffic.  This traffic can be switched by a local virtual +   switch or ToR switch and then by a DC gateway.  The NVE function can +   be embedded within any of these elements. + +   There are several criteria to consider when deciding where the NVE +   function should happen: + +   -  Processing and memory requirements + +      o  Datapath (e.g., lookups, filtering, and +         encapsulation/decapsulation) + +      o  Control-plane processing (e.g., routing, signaling, and OAM) +         and where specific control-plane functions should be enabled + +   -  FIB/RIB size + + + +Lasserre, et al.              Informational                    [Page 21] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +   -  Multicast support + +      o  Routing/signaling protocols + +      o  Packet replication capability + +      o  Multicast FIB + +   -  Fragmentation support + +   -  QoS support (e.g., marking, policing, and queuing) + +   -  Resiliency + +4.2.6.  Interaction between Network Overlays and Underlays + +   When multiple overlays co-exist on top of a common underlay network, +   resources (e.g., bandwidth) should be provisioned to ensure that +   traffic from overlays can be accommodated and QoS objectives can be +   met.  Overlays can have partially overlapping paths (nodes and +   links). + +   Each overlay is selfish by nature.  It sends traffic so as to +   optimize its own performance without considering the impact on other +   overlays, unless the underlay paths are traffic engineered on a per- +   overlay basis to avoid congestion of underlay resources. 
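   The next paragraph lists the kinds of information that could be
   exchanged to improve this situation.  Purely as an illustration (no such
   interface is defined by this framework), a coordination system might use
   underlay-reported metrics to decide where to place a new overlay demand,
   as sketched below; the metric names and the scoring function are
   assumptions for the example.

# Hypothetical sketch: choosing an underlay path for a new overlay
# tunnel using metrics reported by the underlay or a coordination
# system.  Metric names and scoring are illustrative assumptions.

candidate_paths = [
    {"path": "via-spine-1", "avail_bw_mbps": 4000, "delay_ms": 0.4, "loss_pct": 0.01},
    {"path": "via-spine-2", "avail_bw_mbps": 9000, "delay_ms": 0.6, "loss_pct": 0.00},
]

def cost(path, demand_mbps):
    if path["avail_bw_mbps"] < demand_mbps:
        return float("inf")                  # cannot accommodate the new demand
    # Simple illustrative cost: prefer low delay and low loss.
    return path["delay_ms"] + 100 * path["loss_pct"]

best = min(candidate_paths, key=lambda p: cost(p, demand_mbps=2000))
print("place the new overlay tunnel over", best["path"])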
+ +   Better visibility between overlays and underlays, or general +   coordination in placing overlay demands on an underlay network, may +   be achieved by providing mechanisms to exchange performance and +   liveliness information between the underlay and overlay(s) or by the +   use of such information by a coordination system.  Such information +   may include: + +   -  Performance metrics (throughput, delay, loss, jitter) such as +      defined in [RFC3148], [RFC2679], [RFC2680], and [RFC3393]. + +   -  Cost metrics + +5.  Security Considerations + +   There are three points of view when considering security for NVO3. +   First, the service offered by a service provider via NVO3 technology +   to a tenant must meet the mutually agreed security requirements. +   Second, a network implementing NVO3 must be able to trust the virtual +   network identity associated with packets received from a tenant. +   Third, an NVO3 network must consider the security associated with +   running as an overlay across the underlay network. + + + +Lasserre, et al.              Informational                    [Page 22] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +   To meet a tenant's security requirements, the NVO3 service must +   deliver packets from the tenant to the indicated destination(s) in +   the overlay network and external networks.  The NVO3 service provides +   data confidentiality through data separation.  The use of both VNIs +   and tunneling of tenant traffic by NVEs ensures that NVO3 data is +   kept in a separate context and thus separated from other tenant +   traffic.  The infrastructure supporting an NVO3 service (e.g., +   management systems, NVEs, NVAs, and intermediate underlay networks) +   should be limited to authorized access so that data integrity can be +   expected.  If a tenant requires that its data be confidential, then +   the Tenant System may choose to encrypt its data before transmission +   into the NVO3 service. + +   An NVO3 service must be able to verify the VNI received on a packet +   from the tenant.  To ensure this, not only tenant data but also NVO3 +   control data must be secured (e.g., control traffic between NVAs and +   NVEs, between NVAs, and between NVEs).  Since NVEs and NVAs play a +   central role in NVO3, it is critical that secure access to NVEs and +   NVAs be ensured such that no unauthorized access is possible.  As +   discussed in Section 3.1.5.2, identification of Tenant Systems is +   based upon state that is often provided by management systems (e.g., +   a VM orchestration system in a virtualized environment).  Secure +   access to such management systems must also be ensured.  When an NVE +   receives data from a Tenant System, the tenant identity needs to be +   verified in order to guarantee that it is authorized to access the +   corresponding VN.  This can be achieved by identifying incoming +   packets against specific VAPs in some cases.  In other circumstances, +   authentication may be necessary.  Once this verification is done, the +   packet is allowed into the NVO3 overlay, and no integrity protection +   is provided on the overlay packet encapsulation (e.g., the VNI, +   destination NVE, etc.). + +   Since an NVO3 service can run across diverse underlay networks, when +   the underlay network is not trusted to provide at least data +   integrity, data encryption is needed to assure correct packet +   delivery. 
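   As a non-normative illustration of the last point, the sketch below
   shows a Tenant System applying its own authenticated encryption to a
   payload before handing it to the NVO3 service, so that an untrusted
   underlay carries only ciphertext inside the overlay encapsulation.  It
   uses the third-party Python "cryptography" package; key management and
   the choice of mechanism are outside the scope of this framework.

# Illustrative only: tenant-side protection of data before it enters
# the NVO3 service.  Requires the third-party "cryptography" package.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)    # tenant-managed key, distribution not shown
aead = AESGCM(key)
nonce = os.urandom(12)                       # 96-bit nonce, unique per message

tenant_payload = b"application data"
protected = aead.encrypt(nonce, tenant_payload, None)

# The NVE then adds the VN Context and the outer tunnel header around
# 'protected'; confidentiality no longer depends on trusting the underlay.
assert aead.decrypt(nonce, protected, None) == tenant_payload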
+ +   It is also desirable to restrict the types of information (e.g., +   topology information as discussed in Section 4.2.6) that can be +   exchanged between an NVO3 service and underlay networks based upon +   their agreed security requirements. + + + + + + + + + + +Lasserre, et al.              Informational                    [Page 23] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +6.  Informative References + +   [EVPN]     Sajassi, A., Aggarwal, R., Bitar, N., Isaac, A., and J. +              Uttaro, "BGP MPLS Based Ethernet VPN", Work in Progress, +              draft-ietf-l2vpn-evpn-10, October 2014. + +   [OPPSEC]   Dukhovni, V. "Opportunistic Security: Some Protection Most +              of the Time", Work in Progress, draft-dukhovni- +              opportunistic-security-04, August 2014. + +   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, +              November 1990, <http://www.rfc-editor.org/info/rfc1191>. + +   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery +              for IP version 6", RFC 1981, August 1996, +              <http://www.rfc-editor.org/info/rfc1981>. + +   [RFC2679]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way +              Delay Metric for IPPM", RFC 2679, September 1999, +              <http://www.rfc-editor.org/info/rfc2679>. + +   [RFC2680]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way +              Packet Loss Metric for IPPM", RFC 2680, September 1999, +              <http://www.rfc-editor.org/info/rfc2680>. + +   [RFC3148]  Mathis, M. and M. Allman, "A Framework for Defining +              Empirical Bulk Transfer Capacity Metrics", RFC 3148, July +              2001, <http://www.rfc-editor.org/info/rfc3148>. + +   [RFC3393]  Demichelis, C. and P. Chimento, "IP Packet Delay Variation +              Metric for IP Performance Metrics (IPPM)", RFC 3393, +              November 2002, <http://www.rfc-editor.org/info/rfc3393>. + +   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private +              Networks (VPNs)", RFC 4364, February 2006, +              <http://www.rfc-editor.org/info/rfc4364>. + +   [RFC4761]  Kompella, K., Ed., and Y. Rekhter, Ed., "Virtual Private +              LAN Service (VPLS) Using BGP for Auto-Discovery and +              Signaling", RFC 4761, January 2007, +              <http://www.rfc-editor.org/info/rfc4761>. + +   [RFC4762]  Lasserre, M., Ed., and V. Kompella, Ed., "Virtual Private +              LAN Service (VPLS) Using Label Distribution Protocol (LDP) +              Signaling", RFC 4762, January 2007, +              <http://www.rfc-editor.org/info/rfc4762>. + + + + + +Lasserre, et al.              Informational                    [Page 24] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU +              Discovery", RFC 4821, March 2007, +              <http://www.rfc-editor.org/info/rfc4821>. + +   [RFC6820]  Narten, T., Karir, M., and I. Foo, "Address Resolution +              Problems in Large Data Center Networks", RFC 6820, January +              2013, <http://www.rfc-editor.org/info/rfc6820>. + +   [RFC7364]  Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L., +              Kreeger, L., and M. Napierala, "Problem Statement: +              Overlays for Network Virtualization", RFC 7364, October +              2014, <http://www.rfc-editor.org/info/rfc7364>. 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Lasserre, et al.              Informational                    [Page 25] + +RFC 7365         Framework for DC Network Virtualization    October 2014 + + +Acknowledgments + +   In addition to the authors, the following people contributed to this +   document: Dimitrios Stiliadis, Rotem Salomonovitch, Lucy Yong, Thomas +   Narten, Larry Kreeger, and David Black. + +Authors' Addresses + +   Marc Lasserre +   Alcatel-Lucent +   EMail: marc.lasserre@alcatel-lucent.com + + +   Florin Balus +   Alcatel-Lucent +   777 E. Middlefield Road +   Mountain View, CA 94043 +   United States +   EMail: florin.balus@alcatel-lucent.com + + +   Thomas Morin +   Orange +   EMail: thomas.morin@orange.com + + +   Nabil Bitar +   Verizon +   50 Sylvan Road +   Waltham, MA 02145 +   United States +   EMail: nabil.n.bitar@verizon.com + + +   Yakov Rekhter +   Juniper +   EMail: yakov@juniper.net + + + + + + + + + + + + + + +Lasserre, et al.              Informational                    [Page 26] + |