diff options
Diffstat (limited to 'doc/rfc/rfc8584.txt')
-rw-r--r-- | doc/rfc/rfc8584.txt | 1795 |
1 files changed, 1795 insertions, 0 deletions
diff --git a/doc/rfc/rfc8584.txt b/doc/rfc/rfc8584.txt new file mode 100644 index 0000000..87b6e00 --- /dev/null +++ b/doc/rfc/rfc8584.txt @@ -0,0 +1,1795 @@ + + + + + + +Internet Engineering Task Force (IETF) J. Rabadan, Ed. +Request for Comments: 8584 Nokia +Updates: 7432 S. Mohanty, Ed. +Category: Standards Track A. Sajassi +ISSN: 2070-1721 Cisco + J. Drake + Juniper + K. Nagaraj + S. Sathappan + Nokia + April 2019 + + + Framework for Ethernet VPN Designated Forwarder Election Extensibility + +Abstract + + An alternative to the default Designated Forwarder (DF) selection + algorithm in Ethernet VPNs (EVPNs) is defined. The DF is the + Provider Edge (PE) router responsible for sending Broadcast, Unknown + Unicast, and Multicast (BUM) traffic to a multihomed Customer Edge + (CE) device on a given VLAN on a particular Ethernet Segment (ES). + In addition, the ability to influence the DF election result for a + VLAN based on the state of the associated Attachment Circuit (AC) is + specified. This document clarifies the DF election Finite State + Machine in EVPN services. Therefore, it updates the EVPN + specification (RFC 7432). + +Status of This Memo + + This is an Internet Standards Track document. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Further information on + Internet Standards is available in Section 2 of RFC 7841. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + https://www.rfc-editor.org/info/rfc8584. + + + + + + + + + + +Rabadan, et al. Standards Track [Page 1] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + +Copyright Notice + + Copyright (c) 2019 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + +Table of Contents + + 1. Introduction ....................................................3 + 1.1. Conventions and Terminology ................................3 + 1.2. Default Designated Forwarder (DF) Election in EVPN + Services ...................................................5 + 1.3. Problem Statement ..........................................8 + 1.3.1. Unfair Load Balancing and Service Disruption ........8 + 1.3.2. Traffic Black-Holing on Individual AC Failures .....10 + 1.4. The Need for Extending the Default DF Election in + EVPN Services .............................................12 + 2. Designated Forwarder Election Protocol and BGP Extensions ......13 + 2.1. The DF Election Finite State Machine (FSM) ................13 + 2.2. The DF Election Extended Community ........................16 + 2.2.1. Backward Compatibility .............................19 + 3. The Highest Random Weight DF Election Algorithm ................19 + 3.1. HRW and Consistent Hashing ................................20 + 3.2. HRW Algorithm for EVPN DF Election ........................20 + 4. The AC-Influenced DF Election Capability .......................22 + 4.1. AC-Influenced DF Election Capability for + VLAN-Aware Bundle Services ................................24 + 5. Solution Benefits ..............................................25 + 6. Security Considerations ........................................26 + 7. IANA Considerations ............................................27 + 8. References .....................................................28 + 8.1. Normative References ......................................28 + 8.2. Informative References ....................................29 + Acknowledgments ...................................................30 + Contributors ......................................................30 + Authors' Addresses ................................................31 + + + + + + +Rabadan, et al. Standards Track [Page 2] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + +1. Introduction + + The Designated Forwarder (DF) in Ethernet VPNs (EVPNs) is the + Provider Edge (PE) router responsible for sending Broadcast, Unknown + Unicast, and Multicast (BUM) traffic to a multihomed Customer Edge + (CE) device on a given VLAN on a particular Ethernet Segment (ES). + The DF is elected from the set of multihomed PEs attached to a given + ES, each of which advertises an ES route for the ES as identified by + its Ethernet Segment Identifier (ESI). By default, the EVPN uses a + DF election algorithm referred to as "service carving". The DF + election algorithm is based on a modulus function (V mod N) that + takes the number of PEs in the ES (N) and the VLAN value (V) as + input. This document addresses inefficiencies in the default DF + election algorithm by defining a new DF election algorithm and an + ability to influence the DF election result for a VLAN, depending on + the state of the associated Attachment Circuit (AC). In order to + avoid any ambiguity with the identifier used in the DF election + algorithm, this document uses the term "Ethernet Tag" instead of + "VLAN". This document also creates a registry with IANA for future + DF election algorithms and capabilities (see Section 7). It also + presents a formal definition and clarification of the DF election + Finite State Machine (FSM). Therefore, this document updates + [RFC7432], and EVPN implementations MUST conform to the + prescribed FSM. + + The procedures described in this document apply to DF election in all + EVPN solutions, including those described in [RFC7432] and [RFC8214]. + Apart from the formal description of the FSM, this document does not + intend to update other procedures described in [RFC7432]; it only + aims to improve the behavior of the DF election on PEs that are + upgraded to follow the procedures described in this document. + +1.1. Conventions and Terminology + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and + "OPTIONAL" in this document are to be interpreted as described in + BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all + capitals, as shown here. + + o AC: Attachment Circuit. An AC has an Ethernet Tag associated + with it. + + o ACS: Attachment Circuit Status. + + o BUM: Broadcast, unknown unicast, and multicast. + + o DF: Designated Forwarder. + + + +Rabadan, et al. Standards Track [Page 3] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + o NDF: Non-Designated Forwarder. + + o BDF: Backup Designated Forwarder. + + o Ethernet A-D per ES route: Refers to Route Type 1 as defined in + [RFC7432] or to Auto-discovery per Ethernet Segment route. + + o Ethernet A-D per EVI route: Refers to Route Type 1 as defined in + [RFC7432] or to Auto-discovery per EVPN Instance route. + + o ES: Ethernet Segment. + + o ESI: Ethernet Segment Identifier. + + o EVI: EVPN Instance. + + o MAC-VRF: A Virtual Routing and Forwarding table for Media Access + Control (MAC) addresses on a PE. + + o BD: Broadcast Domain. An EVI may be comprised of one BD + (VLAN-based or VLAN Bundle services) or multiple BDs (VLAN-aware + Bundle services). + + o Bridge table: An instantiation of a BD on a MAC-VRF. + + o HRW: Highest Random Weight. + + o VID: VLAN Identifier. + + o CE-VID: Customer Edge VLAN Identifier. + + o Ethernet Tag: Used to represent a BD that is configured on a given + ES for the purpose of DF election. Note that any of the following + may be used to represent a BD: VIDs (including Q-in-Q tags), + configured IDs, VNIs (Virtual Extensible Local Area Network + (VXLAN) Network Identifiers), normalized VIDs, I-SIDs (Service + Instance Identifiers), etc., as long as the representation of the + BDs is configured consistently across the multihomed PEs attached + to that ES. The Ethernet Tag value MUST be different from zero. + + o Ethernet Tag ID: Refers to the identifier used in the EVPN routes + defined in [RFC7432]. Its value may be the same as the Ethernet + Tag value (see the definition for Ethernet Tag) when advertising + routes for VLAN-aware Bundle services. Note that in the case of + VLAN-based or VLAN Bundle services, the Ethernet Tag ID is zero. + + + + + + +Rabadan, et al. Standards Track [Page 4] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + o DF election procedure: Also called "DF election". Refers to the + process in its entirety, including the discovery of the PEs in the + ES, the creation and maintenance of the PE candidate list, and the + selection of a PE. + + o DF algorithm: A component of the DF election procedure. Strictly + refers to the selection of a PE for a given <ES, Ethernet Tag>. + + o RR: Route Reflector. A network routing component for BGP + [RFC4456]. It offers an alternative to the logical full-mesh + requirement of the Internal Border Gateway Protocol (IBGP). The + purpose of the RR is concentration. Multiple BGP routers can peer + with a central point, the RR -- acting as a route reflector server + -- rather than peer with every other router in a full mesh. This + results in an O(N) peering as opposed to O(N^2). + + o TTL: Time To Live. + + This document also assumes that the reader is familiar with the + terminology provided in [RFC7432]. + +1.2. Default Designated Forwarder (DF) Election in EVPN Services + + [RFC7432] defines the DF as the EVPN PE responsible for: + + o Flooding BUM traffic on a given Ethernet Tag on a particular ES to + the CE. This is valid for Single-Active and All-Active EVPN + multihoming. + + o Sending unicast traffic on a given Ethernet Tag on a particular ES + to the CE. This is valid for Single-Active multihoming. + + + + + + + + + + + + + + + + + + + + +Rabadan, et al. Standards Track [Page 5] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + Figure 1 illustrates an example that we will use to explain the DF + function. + + +---------------+ + | IP/MPLS | + | Core | + +----+ ES1 +----+ +----+ + | CE1|-----| | | |____ES2 + +----+ | PE1| | PE2| \ + | | +----+ \+----+ + +----+ | | CE2| + | +----+ /+----+ + | | |____/ | + | | PE3| ES2 / + | +----+ / + | | / + +-------------+----+ / + | PE4|____/ES2 + | | + +----+ + + Figure 1: EVPN Multihoming + + Figure 1 illustrates a case where there are two ESes: ES1 and ES2. + PE1 is attached to CE1 via ES1, whereas PE2, PE3, and PE4 are + attached to CE2 via ES2, i.e., PE2, PE3, and PE4 form a redundancy + group. Since CE2 is multihomed to different PEs on the same ES, it + is necessary for PE2, PE3, and PE4 to agree on a DF to satisfy the + above-mentioned requirements. + + The effect of forwarding loops in a Layer 2 network is particularly + severe because of the broadcast nature of Ethernet traffic and the + lack of a TTL. Therefore, it is very important that, in the case of + a multihomed CE, only one of the PEs be used to send BUM traffic + to it. + + One of the prerequisites for this support is that participating PEs + must agree amongst themselves as to who would act as the DF. This + needs to be achieved through a distributed algorithm in which each + participating PE independently and unambiguously selects one of the + participating PEs as the DF, and the result should be consistent and + unanimous. + + The default algorithm for DF election defined by [RFC7432] at the + granularity of (ESI, EVI) is referred to as "service carving". In + this document, service carving and the default DF election algorithm + are used interchangeably. With service carving, it is possible to + elect multiple DFs per ES (one per EVI) in order to perform load + + + +Rabadan, et al. Standards Track [Page 6] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + balancing of traffic destined to a given ES. The objective is that + the load-balancing procedures should carve up the BD space among the + redundant PE nodes evenly, in such a way that every PE is the DF for + a distinct set of EVIs. + + The DF election algorithm (as described in [RFC7432], Section 8.5) is + based on a modulus operation. The PEs to which the ES (for which DF + election is to be carried out per EVI) is multihomed form an ordered + (ordinal) list in ascending order by PE IP address value. For + example, there are N PEs: PE0, PE1,... PE(N-1) ranked as per + increasing IP addresses in the ordinal list; then, for each VLAN with + Ethernet Tag V, configured on ES1, PEx is the DF for VLAN V on ES1 + when x equals (V mod N). In the case of a VLAN Bundle, only the + lowest VLAN is used. In the case when the planned density is high + (meaning there are a significant number of VLANs and the Ethernet + Tags are uniformly distributed), the thinking is that the DF election + will be spread across the PEs hosting that ES and good load balancing + can be achieved. + + However, the described default DF election algorithm has some + undesirable properties and, in some cases, can be somewhat disruptive + and unfair. This document describes some of those issues and defines + a mechanism for dealing with them. These mechanisms do involve + changes to the default DF election algorithm, but they do not require + any changes to the EVPN route exchange, and changes in the EVPN + routes will be minimal. + + In addition, there is a need to extend the DF election procedures so + that new algorithms and capabilities are possible. A single + algorithm (the default DF election algorithm) may not meet the + requirements in all the use cases. + + Note that while [RFC7432] elects a DF per <ES, EVI>, this document + elects a DF per <ES, BD>. This means that unlike [RFC7432], where + for a VLAN-aware Bundle service EVI there is only one DF for the EVI, + this document specifies that there will be multiple DFs, one for each + BD configured in that EVI. + + + + + + + + + + + + + + +Rabadan, et al. Standards Track [Page 7] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + +1.3. Problem Statement + + This section describes some potential issues with the default DF + election algorithm. + +1.3.1. Unfair Load Balancing and Service Disruption + + There are three fundamental problems with the current default DF + election algorithm. + + 1. The algorithm will not perform well when the Ethernet Tag follows + a non-uniform distribution -- for instance, when the Ethernet + Tags are all even or all odd. In such a case, let us assume that + the ES is multihomed to two PEs; one of the PEs will be elected + as the DF for all of the VLANs. This is very suboptimal. It + defeats the purpose of service carving, as the DFs are not really + evenly spread across the PEs hosting the ES. In fact, in this + particular case, one of the PEs does not get elected as the DF at + all, so it does not participate in DF responsibilities at all. + Consider another example where, referring to Figure 1, let's + assume that (1) PE2, PE3, and PE4 are listed in ascending order + by IP address and (2) each VLAN configured on ES2 is associated + with an Ethernet Tag of the form (3x+1), where x is an integer. + This will result in PE3 always being selected as the DF. + + 2. The Ethernet Tag that identifies the BD can be as large as 2^24; + however, it is not guaranteed that the tenant BD on the ES will + conform to a uniform distribution. In fact, it is up to the + customer what BDs they will configure on the ES. Quoting + [Knuth]: + + In general, we want to avoid values of M that divide r^k+a or + r^k-a, where k and a are small numbers and r is the radix of + the alphabetic character set (usually r=64, 256 or 100), since + a remainder modulo such a value of M tends to be largely a + simple superposition of key digits. Such considerations + suggest that we choose M to be a prime number such that + r^k!=a(modulo)M or r^k!=?a(modulo)M for small k & a. + + In our case, N is the number of PEs (Section 8.5 of [RFC7432]). + N corresponds to M above. Since N, N-1, or N+1 need not satisfy + the primality properties of M, as per the modulo-based DF + assignment [RFC7432], whenever a PE goes down or a new PE boots + up (attached to the same ES), the modulo scheme will not + necessarily map BDs to PEs uniformly. + + + + + + +Rabadan, et al. Standards Track [Page 8] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + 3. Disruption is another problem. Consider a case when the same ES + is multihomed to a set of PEs. When the ES is DOWN in one of the + PEs, say PE1, or PE1 itself reboots, or the BGP process goes down + or the connectivity between PE1 and an RR goes down, the + effective number of PEs in the system now becomes N-1, and DFs + are computed for all the VLANs that are configured on that ES. + In general, if the DF for a VLAN V happens not to be PE1, but + some other PE, say PE2, it is likely that some other PE + (different from PE1 and PE2) will become the new DF. This is not + desirable. Similarly, when a new PE hosts the same ES, the + mapping again changes because of the modulus operation. This + results in needless churn. Again referring to Figure 1, say V1, + V2, and V3 are VLANs configured on ES2 with associated Ethernet + Tags of values 999, 1000, and 1001, respectively. So, PE1, PE2, + and PE3 are the DFs for V1, V2, and V3, respectively. Now when + PE3 goes down, PE2 will become the DF for V1 and PE1 will become + the DF for V2. + + One point to note is that the default DF election algorithm assumes + that all the PEs who are multihomed to the same ES (and interested in + the DF election by exchanging EVPN routes) use an Originating + Router's IP address [RFC7432] of the same family. This does not need + to be the case, as the EVPN address family can be carried over an + IPv4 or IPv6 peering, and the PEs attached to the same ES may use an + address of either family. + + Mathematically, a conventional hash function maps a key k to a number + i representing one of m hash buckets through a function h(k), i.e., + i = h(k). In the EVPN case, h is simply a modulo-m hash function + viz. h(V) = V mod N, where N is the number of PEs that are multihomed + to the ES in question. It is well known that for good hash + distribution using the modulus operation, the modulus N should be a + prime number not too close to a power of 2 [CLRS2009]. When the + effective number of PEs changes from N to N-1 (or vice versa), all + the objects (VLAN V) will be remapped except those for which V mod N + and V mod (N-1) refer to the same PE in the previous and subsequent + ordinal rankings, respectively. From a forwarding perspective, this + is a churn, as it results in reprogramming the PE ports as either + blocking or non-blocking at the PEs where the DF state changes. + + This document addresses this problem and furnishes a solution to this + undesirable behavior. + + + + + + + + + +Rabadan, et al. Standards Track [Page 9] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + +1.3.2. Traffic Black-Holing on Individual AC Failures + + The default DF election algorithm defined by [RFC7432] takes into + account only two variables in the modulus function for a given ES: + the existence of the PE's IP address in the candidate list and the + locally provisioned Ethernet Tags. + + If the DF for an <ESI, EVI> fails (due to physical link/node + failures), an ES route withdrawal will make the NDF PEs re-elect the + DF for that <ESI, EVI> and the service will be recovered. + + However, the default DF election procedure does not provide + protection against "logical" failures or human errors that may occur + at the service level on the DF, while the list of active PEs for a + given ES does not change. These failures may have an impact not only + on the local PE where the issue happens but also on the rest of the + PEs of the ES. Some examples of such logical failures are listed + below: + + (a) A given individual AC defined in an ES is accidentally shut down + or is not provisioned yet (hence, the ACS is DOWN), while the ES + is operationally active (since the ES route is active). + + (b) A given MAC-VRF with a defined ES is either shut down or not + provisioned yet, while the ES is operationally active (since the + ES route is active). In this case, the ACS of all the ACs + defined in that MAC-VRF is considered to be DOWN. + + Neither (a) nor (b) will trigger the DF re-election on the remote + multihomed PEs for a given ES, since the ACS is not taken into + account in the DF election procedures. While the ACS is used as a DF + election tiebreaker and trigger in Virtual Private LAN Service (VPLS) + multihoming procedures [VPLS-MH], there is no procedure defined in + the EVPN specification [RFC7432] to trigger the DF re-election based + on the ACS change on the DF. + + + + + + + + + + + + + + + + +Rabadan, et al. Standards Track [Page 10] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + Figure 2 shows an example of logical AC failure. + + +---+ + |CE4| + +---+ + | + PE4 | + +-----+-----+ + +---------------| +-----+ |---------------+ + | | | BD-1| | | + | +-----------+ | + | | + | EVPN | + | | + | PE1 PE2 PE3 | + | (NDF) (DF) (NDF)| + +-----------+ +-----------+ +-----------+ + | | BD-1| | | | BD-1| | | | BD-1| | + | +-----+ |-------| +-----+ |-------| +-----+ | + +-----------+ +-----------+ +-----------+ + AC1\ ES12 /AC2 AC3\ ES23 /AC4 + \ / \ / + \ / \ / + +----+ +----+ + |CE12| |CE23| + +----+ +----+ + + Figure 2: Default DF Election and Traffic Black-Holing + + BD-1 is defined in PE1, PE2, PE3, and PE4. CE12 is a multihomed CE + connected to ES12 in PE1 and PE2. Similarly, CE23 is multihomed to + PE2 and PE3 using ES23. Both CE12 and CE23 are connected to BD-1 + through VLAN-based service interfaces: CE12-VID 1 (VID 1 on CE12) is + associated with AC1 and AC2 in BD-1, whereas CE23-VID 1 is associated + with AC3 and AC4 in BD-1. Assume that, although not represented, + there are other ACs defined on these ESes mapped to different BDs. + + + + + + + + + + + + + + + +Rabadan, et al. Standards Track [Page 11] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + After executing the default DF election algorithm as described in + [RFC7432], PE2 turns out to be the DF for ES12 and ES23 in BD-1. The + following issues may arise: + + (a) If AC2 is accidentally shut down or is not configured yet, CE12 + traffic will be impacted. In the case of All-Active + multihoming, the BUM traffic to CE12 will be "black-holed", + whereas for Single-Active multihoming, all the traffic to/from + CE12 will be discarded. This is because a logical failure in + PE2's AC2 may not trigger an ES route withdrawal for ES12 (since + there are still other ACs active on ES12); therefore, PE1 will + not rerun the DF election procedures. + + (b) If the bridge table for BD-1 is administratively shut down or is + not configured yet on PE2, CE12 and CE23 will both be impacted: + BUM traffic to both CEs will be discarded in the case of + All-Active multihoming, and all traffic will be discarded + to/from the CEs in the case of Single-Active multihoming. This + is because PE1 and PE3 will not rerun the DF election procedures + and will keep assuming that PE2 is the DF. + + Quoting [RFC7432], "When an Ethernet tag is decommissioned on an + Ethernet segment, then the PE MUST withdraw the Ethernet A-D per EVI + route(s) announced for the <ESI, Ethernet tags> that are impacted by + the decommissioning." However, while this A-D per EVI route + withdrawal is used at the remote PEs performing aliasing or backup + procedures, it is not used to influence the DF election for the + affected EVIs. + + This document adds an optional modification of the DF election + procedure so that the ACS may be taken into account as a variable in + the DF election; therefore, EVPN can provide protection against + logical failures. + +1.4. The Need for Extending the Default DF Election in EVPN Services + + Section 1.3 describes some of the issues that exist in the default DF + election procedures. In order to address those issues, this document + introduces a new DF election framework. This framework allows the + PEs to agree on a common DF election algorithm, as well as the + capabilities to enable during the DF election procedure. Generally, + "DF election algorithm" refers to the algorithm by which a number of + input parameters are used to determine the DF PE, while "DF election + capability" refers to an additional feature that can be used prior to + the invocation of the DF election algorithm, such as modifying the + inputs (or list of candidate PEs). + + + + + +Rabadan, et al. Standards Track [Page 12] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + Within this framework, this document defines a new DF election + algorithm and a new capability that can influence the DF election + result: + + o The new DF election algorithm is referred to as "Highest Random + Weight" (HRW). The HRW procedures are described in Section 3. + + o The new DF election capability is referred to as "AC-Influenced DF + election" (AC-DF). The AC-DF procedures are described in + Section 4. + + o HRW and AC-DF mechanisms are independent of each other. + Therefore, a PE may support either HRW or AC-DF independently or + may support both of them together. A PE may also support the + AC-DF capability along with the default DF election algorithm per + [RFC7432]. + + In addition, this document defines a way to indicate the support of + HRW and/or AC-DF along with the EVPN ES routes advertised for a given + ES. Refer to Section 2.2 for more details. + +2. Designated Forwarder Election Protocol and BGP Extensions + + This section describes the BGP extensions required to support the new + DF election procedures. In addition, since the EVPN specification + [RFC7432] leaves several questions open as to the precise FSM + behavior of the DF election, Section 2.1 precisely describes the + intended behavior. + +2.1. The DF Election Finite State Machine (FSM) + + Per [RFC7432], the FSM shown in Figure 3 is executed per <ES, VLAN> + in the case of VLAN-based service or <ES, [VLANs in VLAN Bundle]> in + the case of a VLAN Bundle on each participating PE. Note that the + FSM is conceptual. Any design or implementation MUST comply with + behavior that is equivalent to the behavior outlined in this FSM. + + + + + + + + + + + + + + + +Rabadan, et al. Standards Track [Page 13] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + VLAN_CHANGE VLAN_CHANGE + RCVD_ES RCVD_ES + LOST_ES LOST_ES + +----+ +-------+ + | | | v + | +-+----+ ES_UP ++-------++ + +->+ INIT +-------------->+ DF_WAIT | + ++-----+ +-------+-+ + ^ | + +-----------+ | |DF_TIMER + | ANY_STATE +-------+ VLAN_CHANGE | + +-----------+ ES_DOWN +-----------------+ | + | RCVD_ES v v + +--------++ LOST_ES ++------+-+ + | DF_DONE +<--------------+ DF_CALC +<-+ + +---------+ CALCULATED +-------+-+ | + | | + +----+ + VLAN_CHANGE + RCVD_ES + LOST_ES + + Figure 3: DF Election Finite State Machine + + Observe that each EVI is locally configured on each of the multihomed + PEs attached to a given ES and that the FSM does not provide any + protection against inconsistent configuration between these PEs. + That is, for a given EVI, one or more of the PEs are inadvertently + configured with a different set of VLANs for a VLAN-aware Bundle + service or with different VLANs for a VLAN-based service. + + The states and events shown in Figure 3 are defined as follows. + + States: + + 1. INIT: Initial state. + + 2. DF_WAIT: State in which the participant waits for enough + information to perform the DF election for the EVI/ESI/VLAN + combination. + + 3. DF_CALC: State in which the new DF is recomputed. + + 4. DF_DONE: State in which the corresponding DF for the EVI/ESI/VLAN + combination has been elected. + + 5. ANY_STATE: Refers to any of the above states. + + + + +Rabadan, et al. Standards Track [Page 14] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + Events: + + 1. ES_UP: The ES has been locally configured as "UP". + + 2. ES_DOWN: The ES has been locally configured as "DOWN". + + 3. VLAN_CHANGE: The VLANs configured in a bundle (that uses the ES) + changed. This event is necessary for VLAN Bundles only. + + 4. DF_TIMER: DF timer [RFC7432] (referred to as "Wait timer" in this + document) has expired. + + 5. RCVD_ES: A new or changed ES route is received in an Update + message with an MP_REACH_NLRI. Receiving an unchanged Update + MUST NOT trigger this event. + + 6. LOST_ES: An Update message with an MP_UNREACH_NLRI for a + previously received ES route has been received. If such a + message is seen for a route that has not been advertised + previously, the event MUST NOT be triggered. + + 7. CALCULATED: DF has been successfully calculated. + + Corresponding actions when transitions are performed or states are + entered/exited: + + 1. ANY_STATE on ES_DOWN: + (i) Stop the DF Wait timer. + (ii) Assume an NDF for the local PE. + + 2. INIT on ES_UP: Transition to DF_WAIT. + + 3. INIT on VLAN_CHANGE, RCVD_ES, or LOST_ES: Do nothing. + + 4. DF_WAIT on entering the state: + (i) Start the DF Wait timer if not started already or expired. + (ii) Assume an NDF for the local PE. + + 5. DF_WAIT on VLAN_CHANGE, RCVD_ES, or LOST_ES: Do nothing. + + 6. DF_WAIT on DF_TIMER: Transition to DF_CALC. + + 7. DF_CALC on entering or re-entering the state: + (i) Rebuild the candidate list, perform a hash, and perform the + election. + (ii) Afterwards, the FSM generates a CALCULATED event against + itself. + + + + +Rabadan, et al. Standards Track [Page 15] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + 8. DF_CALC on VLAN_CHANGE, RCVD_ES, or LOST_ES: Do as prescribed in + Transition 7. + + 9. DF_CALC on CALCULATED: Mark the election result for the VLAN or + bundle, and transition to DF_DONE. + + 10. DF_DONE on exiting the state: If a new DF election is triggered + and the current DF is lost, then assume an NDF for the local PE + for the VLAN or VLAN Bundle. + + 11. DF_DONE on VLAN_CHANGE, RCVD_ES, or LOST_ES: Transition to + DF_CALC. + + The above events and transitions are defined for the default DF + election algorithm. As described in Section 4, the use of the AC-DF + capability introduces additional events and transitions. + +2.2. The DF Election Extended Community + + For the DF election procedures to be consistent and unanimous, it is + necessary that all the participating PEs agree on the DF election + algorithm and capabilities to be used. For instance, it is not + possible for some PEs to continue to use the default DF election + algorithm while some PEs use HRW. For brownfield deployments and for + interoperability with legacy PEs, it is important that all PEs have + the ability to fall back on the default DF election. A PE can + indicate its willingness to support HRW and/or AC-DF by signaling a + DF Election Extended Community along with the ES route (Route + Type 4). + + The DF Election Extended Community is a new BGP transitive Extended + Community attribute [RFC4360] that is defined to identify the DF + election procedure to be used for the ES. Figure 4 shows the + encoding of the DF Election Extended Community. + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type = 0x06 | Sub-Type(0x06)| RSV | DF Alg | Bitmap ~ + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + ~ Bitmap | Reserved | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 4: DF Election Extended Community + + + + + + + +Rabadan, et al. Standards Track [Page 16] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + Where: + + o Type: 0x06, as registered with IANA (Section 7) for EVPN Extended + Communities. + + o Sub-Type: 0x06. "DF Election Extended Community", as registered + with IANA. + + o RSV/Reserved: Reserved bits for information that is specific to + DF Alg. + + o DF Alg (5 bits): Encodes the DF election algorithm values (between + 0 and 31) that the advertising PE desires to use for the ES. This + document creates an IANA registry called "DF Alg" (Section 7), + which contains the following values: + + - Type 0: Default DF election algorithm, or modulus-based + algorithm as defined in [RFC7432]. + + - Type 1: HRW Algorithm (Section 3). + + - Types 2-30: Unassigned. + + - Type 31: Reserved for Experimental Use. + + o Bitmap (2 octets): Encodes "capabilities" to use with the DF + election algorithm in the DF Alg field. This document creates an + IANA registry (Section 7) for the Bitmap field, with values 0-15. + This registry is called "DF Election Capabilities" and includes + the bit values listed below. + + 1 1 1 1 1 1 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | |A| | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 5: Bitmap Field in the DF Election Extended Community + + - Bit 0 (corresponds to Bit 24 of the DF Election Extended + Community): Unassigned. + + - Bit 1: AC-DF Capability (AC-Influenced DF election; see + Section 4). When set to 1, it indicates the desire to use + AC-DF with the rest of the PEs in the ES. + + - Bits 2-15: Unassigned. + + + + +Rabadan, et al. Standards Track [Page 17] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + The DF Election Extended Community is used as follows: + + o A PE SHOULD attach the DF Election Extended Community to any + advertised ES route, and the Extended Community MUST be sent if + the ES is locally configured with a DF election algorithm other + than the default DF election algorithm or if a capability is + required to be used. In the Extended Community, the PE indicates + the desired "DF Alg" algorithm and "Bitmap" capabilities to be + used for the ES. + + - Only one DF Election Extended Community can be sent along with + an ES route. Note that the intent is not for the advertising + PE to indicate all the supported DF election algorithms and + capabilities but to signal the preferred one. + + - DF Alg values 0 and 1 can both be used with Bit 1 (AC-DF) set + to 0 or 1. + + - In general, a specific DF Alg SHOULD determine the use of the + reserved bits in the Extended Community, which may be used in a + different way for a different DF Alg. In particular, for DF + Alg values 0 and 1, the reserved bits are not set by the + advertising PE and SHOULD be ignored by the receiving PE. + + o When a PE receives the ES routes from all the other PEs for the ES + in question, it checks to see if all the advertisements have the + Extended Community with the same DF Alg and Bitmap: + + - If they do, this particular PE MUST follow the procedures for + the advertised DF Alg and capabilities. For instance, if all + ES routes for a given ES indicate DF Alg HRW and AC-DF set + to 1, then the PEs attached to the ES will perform the DF + election as per the HRW algorithm and following the AC-DF + procedures. + + - Otherwise, if even a single advertisement for Route Type 4 is + received without the locally configured DF Alg and capability, + the default DF election algorithm MUST be used as prescribed in + [RFC7432]. This procedure handles the case where participating + PEs in the ES disagree about the DF algorithm and capability to + be applied. + + - The absence of the DF Election Extended Community or the + presence of multiple DF Election Extended Communities (in the + same route) MUST be interpreted by a receiving PE as an + indication of the default DF election algorithm on the sending + PE -- that is, DF Alg 0 and no DF election capabilities. + + + + +Rabadan, et al. Standards Track [Page 18] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + o When all the PEs in an ES advertise DF Type 31, they will rely on + the local policy to decide how to proceed with the DF election. + + o For any new capability defined in the future, the applicability/ + compatibility of this new capability to/with the existing DF Alg + values must be assessed on a case-by-case basis. + + o Likewise, for any new DF Alg defined in the future, its + applicability/compatibility to/with the existing capabilities must + be assessed on a case-by-case basis. + +2.2.1. Backward Compatibility + + Implementations that comply with [RFC7432] only (i.e., + implementations that predate this specification) will not advertise + the DF Election Extended Community. That means that all other + participating PEs in the ES will not receive DF preferences and will + revert to the default DF election algorithm without AC-DF. + + Similarly, an implementation that complies with [RFC7432] only and + that receives a DF Election Extended Community will ignore it and + will continue to use the default DF election algorithm. + +3. The Highest Random Weight DF Election Algorithm + + The procedure discussed in this section is applicable to the DF + election in EVPN services [RFC7432] and the EVPN Virtual Private Wire + Service (VPWS) [RFC8214]. + + HRW as defined in [HRW1999] is originally proposed in the context of + Internet caching and proxy server load balancing. Given an object + name and a set of servers, HRW maps a request to a server using the + object-name (object-id) and server-name (server-id) rather than the + server states. HRW forms a hash out of the server-id and the + object-id and forms an ordered list of the servers for the particular + object-id. The server for which the hash value is highest serves as + the primary server responsible for that particular object, and the + server with the next-highest value in that hash serves as the backup + server. HRW always maps a given object name to the same server + within a given cluster; consequently, it can be used at client sites + to achieve global consensus on object-to-server mappings. When that + server goes down, the backup server becomes the responsible + designate. + + Choosing an appropriate hash function that is statistically oblivious + to the key distribution and imparts a good uniform distribution of + the hash output is an important aspect of the algorithm. + Fortunately, many such hash functions exist. [HRW1999] provides + + + +Rabadan, et al. Standards Track [Page 19] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + pseudorandom functions based on the Unix utilities rand and srand and + easily constructed XOR functions that satisfy the desired hashing + properties. HRW already finds use in multicast and ECMP [RFC2991] + [RFC2992]. + +3.1. HRW and Consistent Hashing + + HRW is not the only algorithm that addresses the object-to-server + mapping problem with goals of fair load distribution, redundancy, and + fast access. There is another family of algorithms that also + addresses this problem; these fall under the umbrella of the + Consistent Hashing Algorithms [CHASH]. These will not be considered + here. + +3.2. HRW Algorithm for EVPN DF Election + + This section describes the application of HRW to DF election. Let + DF(V) denote the DF and BDF(V) denote the BDF for the Ethernet Tag V; + Si is the IP address of PE i; Es is the ESI; and Weight is a function + of V, Si, and Es. + + Note that while the DF election algorithm provided in [RFC7432] uses + a PE address and VLAN as inputs, this document uses an Ethernet Tag, + PE address, and ESI as inputs. This is because if the same set of + PEs are multihomed to the same set of ESes, then the DF election + algorithm used in [RFC7432] would result in the same PE being elected + DF for the same set of BDs on each ES; this could have adverse + side effects on both load balancing and redundancy. Including an ESI + in the DF election algorithm introduces additional entropy, which + significantly reduces the probability of the same PE being elected DF + for the same set of BDs on each ES. Therefore, when using the HRW + algorithm for EVPN DF election, the ESI value in the Weight function + below SHOULD be set to that of the corresponding ES. + + In the case of a VLAN Bundle service, V denotes the lowest VLAN, + similar to the "lowest VLAN in bundle" logic of [RFC7432]. + + 1. DF(V) = Si| Weight(V, Es, Si) >= Weight(V, Es, Sj), for all j. + In the case of a tie, choose the PE whose IP address is + numerically the least. Note that 0 <= i,j < number of PEs in the + redundancy group. + + 2. BDF(V) = Sk| Weight(V, Es, Si) >= Weight(V, Es, Sk), and + Weight(V, Es, Sk) >= Weight(V, Es, Sj). In the case of a tie, + choose the PE whose IP address is numerically the least. + + + + + + +Rabadan, et al. Standards Track [Page 20] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + Where: + + o DF(V) is defined to be the address Si (index i) for which + Weight(V, Es, Si) is the highest; 0 <= i < N-1. + + o BDF(V) is defined as that PE with address Sk for which the + computed Weight is the next highest after the Weight of the DF. + j is the running index from 0 to N-1; i and k are selected values. + + Since the Weight is a pseudorandom function with the domain as the + three-tuple (V, Es, S), it is an efficient and deterministic + algorithm that is independent of the Ethernet Tag V sample space + distribution. Choosing a good hash function for the pseudorandom + function is an important consideration for this algorithm to perform + better than the default algorithm. As mentioned previously, such + functions are described in [HRW1999]. We take as a candidate hash + function the first one out of the two that are listed as preferred in + [HRW1999]: + + Wrand(V, Es, Si) = (1103515245((1103515245.Si+12345) XOR + D(V, Es))+12345)(mod 2^31) + + Here, D(V, Es) is the 31-bit digest (CRC-32 and discarding the + most significant bit (MSB), as noted in [HRW1999]) of the 14-octet + stream (the 4-octet Ethernet Tag V followed by the 10-octet ESI). It + is mandated that the 14-octet stream be formed by the concatenation + of the Ethernet Tag and the ESI in network byte order. The CRC + should proceed as if the stream is in network byte order + (big-endian). Si is the address of the ith server. The server's + IP address length does not matter, as only the low-order 31 bits are + modulo significant. + + A point to note is that the Weight function takes into consideration + the combination of the Ethernet Tag, the ES, and the PE IP address, + and the actual length of the server IP address (whether IPv4 or IPv6) + is not really relevant. The default algorithm defined in [RFC7432] + cannot employ both IPv4 and IPv6 PE addresses, since [RFC7432] does + not specify how to decide on the ordering (the ordinal list) when + both IPv4 and IPv6 PEs are present. + + HRW solves the disadvantages pointed out in Section 1.3.1 of this + document and ensures that: + + o With very high probability, the task of DF election for the VLANs + configured on an ES is more or less equally distributed among the + PEs, even in the case of two PEs (see the first fundamental + problem listed in Section 1.3.1). + + + + +Rabadan, et al. Standards Track [Page 21] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + o If a PE that is not the DF or the BDF for that VLAN goes down or + its connection to the ES goes down, it does not result in a DF or + BDF reassignment. This saves computation, especially in the case + when the connection flaps. + + o More importantly, it avoids the third fundamental problem listed + in Section 1.3.1 (needless disruption) that is inherent in the + existing default DF election. + + o In addition to the DF, the algorithm also furnishes the BDF, which + would be the DF if the current DF fails. + +4. The AC-Influenced DF Election Capability + + The procedure discussed in this section is applicable to the DF + election in EVPN services [RFC7432] and EVPN VPWS [RFC8214]. + + The AC-DF capability is expected to be generally applicable to any + future DF algorithm. It modifies the DF election procedures by + removing from consideration any candidate PE in the ES that cannot + forward traffic on the AC that belongs to the BD. This section is + applicable to VLAN-based and VLAN Bundle service interfaces. + Section 4.1 describes the procedures for VLAN-aware Bundle service + interfaces. + + In particular, when used with the default DF algorithm, the AC-DF + capability modifies Step 3 in the DF election procedure described in + [RFC7432], Section 8.5, as follows: + + 3. When the timer expires, each PE builds an ordered candidate list + of the IP addresses of all the PE nodes attached to the ES + (including itself), in increasing numeric value. The candidate + list is based on the Originating Router's IP addresses of the ES + routes but excludes any PE from whom no Ethernet A-D per ES route + has been received or from whom the route has been withdrawn. + Afterwards, the DF election algorithm is applied on a per + <ES, Ethernet Tag>; however, the IP address for a PE will not be + considered to be a candidate for a given <ES, Ethernet Tag> until + the corresponding Ethernet A-D per EVI route has been received + from that PE. In other words, the ACS on the ES for a given PE + must be UP so that the PE is considered to be a candidate for a + given BD. + + If the default DF algorithm is used, every PE in the resulting + candidate list is then given an ordinal indicating its position in + the ordered list, starting with 0 as the ordinal for the PE with + + + + + +Rabadan, et al. Standards Track [Page 22] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + the numerically lowest IP address. The ordinals are used to + determine which PE node will be the DF for a given Ethernet Tag on + the ES, using the following rule: + + Assuming a redundancy group of N PE nodes, for VLAN-based service, + the PE with ordinal i is the DF for an <ES, Ethernet Tag V> when + (V mod N) = i. In the case of a VLAN (-aware) Bundle service, + then the numerically lowest VLAN value in that bundle on that ES + MUST be used in the modulo function as the Ethernet Tag. + + It should be noted that using the Originating Router's IP Address + field [RFC7432] in the ES route to get the PE IP address needed + for the ordered list allows for a CE to be multihomed across + different Autonomous Systems (ASes) if such a need ever arises. + + The modified Step 3, above, differs from [RFC7432], Section 8.5, + Step 3 in two ways: + + o Any DF Alg can be used -- not only the described modulus-based DF + Alg (referred to as the default DF election or "DF Alg 0" in this + document). + + o The candidate list is pruned based upon non-receipt of Ethernet + A-D routes: a PE's IP address MUST be removed from the ES + candidate list if its Ethernet A-D per ES route is withdrawn. A + PE's IP address MUST NOT be considered to be a candidate DF for an + <ES, Ethernet Tag> if its Ethernet A-D per EVI route for the + <ES, Ethernet Tag> is withdrawn. + + The following example illustrates the AC-DF behavior applied to the + default DF election algorithm, assuming the network in Figure 2: + + (a) When PE1 and PE2 discover ES12, they advertise an ES route for + ES12 with the associated ES-Import Extended Community and the DF + Election Extended Community indicating AC-DF = 1; they start a + DF Wait timer (independently). Likewise, PE2 and PE3 advertise + an ES route for ES23 with AC-DF = 1 and start a DF Wait timer. + + (b) PE1 and PE2 advertise an Ethernet A-D per ES route for ES12. + PE2 and PE3 advertise an Ethernet A-D per ES route for ES23. + + (c) In addition, PE1, PE2, and PE3 advertise an Ethernet A-D per EVI + route for AC1, AC2, AC3, and AC4 as soon as the ACs are enabled. + Note that the AC can be associated with a single customer VID + (e.g., VLAN-based service interfaces) or a bundle of customer + VIDs (e.g., VLAN Bundle service interfaces). + + + + + +Rabadan, et al. Standards Track [Page 23] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + (d) When the timer expires, each PE builds an ordered candidate list + of the IP addresses of all the PE nodes attached to the ES + (including itself) as explained in the modified Step 3 above. + Any PE from which an Ethernet A-D per ES route has not been + received is pruned from the list. + + (e) When electing the DF for a given BD, a PE will not be considered + to be a candidate until an Ethernet A-D per EVI route has been + received from that PE. In other words, the ACS on the ES for a + given PE must be UP so that the PE is considered to be a + candidate for a given BD. For example, PE1 will not consider + PE2 as a candidate for DF election for <ES12, VLAN-1> until an + Ethernet A-D per EVI route is received from PE2 for + <ES12, VLAN-1>. + + (f) Once the PEs with ACS = DOWN for a given BD have been removed + from the candidate list, the DF election can be applied for the + remaining N candidates. + + Note that this procedure only modifies the existing EVPN control + plane by adding and processing the DF Election Extended Community + and by pruning the candidate list of PEs that take part in the DF + election. + + In addition to the events defined in the FSM in Section 2.1, the + following events SHALL modify the candidate PE list and trigger the + DF re-election in a PE for a given <ES, Ethernet Tag>. In the FSM + shown in Figure 3, the events below MUST trigger a transition from + DF_DONE to DF_CALC: + + 1. Local AC going DOWN/UP. + + 2. Reception of a new Ethernet A-D per EVI route update/withdrawal + for the <ES, Ethernet Tag>. + + 3. Reception of a new Ethernet A-D per ES route update/withdrawal + for the ES. + +4.1. AC-Influenced DF Election Capability for VLAN-Aware Bundle + Services + + The procedure described in Section 4 works for VLAN-based and VLAN + Bundle service interfaces because, for those service types, a PE + advertises only one Ethernet A-D per EVI route per <ES, VLAN> or + <ES, VLAN Bundle>. In Section 4, an Ethernet Tag represents a given + VLAN or VLAN Bundle for the purpose of DF election. The withdrawal + + + + + +Rabadan, et al. Standards Track [Page 24] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + of such a route means that the PE cannot forward traffic on that + particular <ES, VLAN> or <ES, VLAN Bundle>; therefore, the PE can be + removed from consideration for DF election. + + According to [RFC7432], in VLAN-aware Bundle services, the PE + advertises multiple Ethernet A-D per EVI routes per <ES, VLAN Bundle> + (one route per Ethernet Tag), while the DF election is still + performed per <ES, VLAN Bundle>. The withdrawal of an individual + route only indicates the unavailability of a specific AC and not + necessarily all the ACs in the <ES, VLAN Bundle>. + + This document modifies the DF election for VLAN-aware Bundle services + in the following ways: + + o After confirming that all the PEs in the ES advertise the AC-DF + capability, a PE will perform a DF election per <ES, VLAN>, as + opposed to per <ES, VLAN Bundle> as described in [RFC7432]. Now, + the withdrawal of an Ethernet A-D per EVI route for a VLAN will + indicate that the advertising PE's ACS is DOWN and the rest of the + PEs in the ES can remove the PE from consideration for DF election + in the <ES, VLAN>. + + o The PEs will now follow the procedures in Section 4. + + For example, assuming three bridge tables in PE1 for the same MAC-VRF + (each one associated with a different Ethernet Tag, e.g., VLAN-1, + VLAN-2, and VLAN-3), PE1 will advertise three Ethernet A-D per EVI + routes for ES12. Each of the three routes will indicate the status + of each of the three ACs in ES12. PE1 will be considered to be a + valid candidate PE for DF election in <ES12, VLAN-1>, <ES12, VLAN-2>, + and <ES12, VLAN-3> as long as its three routes are active. For + instance, if PE1 withdraws the Ethernet A-D per EVI routes for + <ES12, VLAN-1>, the PEs in ES12 will not consider PE1 as a suitable + DF candidate for <ES12, VLAN-1>. PE1 will still be considered for + <ES12, VLAN-2> and <ES12, VLAN-3>, since its routes are active. + +5. Solution Benefits + + The solution described in this document provides the following + benefits: + + (a) It extends the DF election as defined in [RFC7432] to address + the unfair load balancing and potential black-holing issues with + the default DF election algorithm. The solution is applicable + to the DF election in EVPN services [RFC7432] and EVPN VPWS + [RFC8214]. + + + + + +Rabadan, et al. Standards Track [Page 25] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + (b) It defines a way to signal the DF election algorithm and + capabilities intended by the advertising PE. This is done by + defining the DF Election Extended Community, which allows the + advertising PE to indicate its support for the capabilities + defined in this document as well as any subsequently defined DF + election algorithms or capabilities. + + (c) It is backwards compatible with the procedures defined in + [RFC7432]. If one or more PEs in the ES do not support the new + procedures, they will all follow DF election as defined in + [RFC7432]. + +6. Security Considerations + + This document addresses some identified issues in the DF election + procedures described in [RFC7432] by defining a new DF election + framework. In general, this framework allows the PEs that are part + of the same ES to exchange additional information and agree on the DF + election type and capabilities to be used. + + By following the procedures in this document, the operator will + minimize such undesirable situations as unfair load balancing, + service disruption, and traffic black-holing. Because such + situations could be purposely created by a malicious user with access + to the configuration of one PE, this document also enhances the + security of the network. Note that the network will not benefit from + the new procedures if the DF election algorithm is not consistently + configured on all the PEs in the ES (if there is no unanimity among + all the PEs, the DF election algorithm falls back to the default DF + election as provided in [RFC7432]). This behavior could be exploited + by an attacker that manages to modify the configuration of one PE in + the ES so that the DF election algorithm and capabilities in all the + PEs in the ES fall back to the default DF election. If that is the + case, the PEs will be exposed to the unfair load balancing, service + disruption, and black-holing mentioned earlier. + + In addition, the new framework is extensible and allows for new + security enhancements in the future. Note that such enhancements are + out of scope for this document. Finally, since this document extends + the procedures in [RFC7432], the same security considerations as + those described in [RFC7432] are valid for this document. + + + + + + + + + + +Rabadan, et al. Standards Track [Page 26] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + +7. IANA Considerations + + IANA has: + + o Allocated Sub-Type value 0x06 in the "EVPN Extended Community + Sub-Types" registry defined in [RFC7153] as follows: + + Sub-Type Value Name Reference + -------------- ------------------------------ ------------- + 0x06 DF Election Extended Community This document + + o Set up a registry called "DF Alg" for the DF Alg field in the + Extended Community. New registrations will be made through the + "RFC Required" procedure defined in [RFC8126]. Value 31 is for + experimental use and does not require any other RFC than this + document. The following initial values in that registry exist: + + Alg Name Reference + ---- ----------------------------- ------------- + 0 Default DF Election This document + 1 HRW Algorithm This document + 2-30 Unassigned + 31 Reserved for Experimental Use This document + + o Set up a registry called "DF Election Capabilities" for the + 2-octet Bitmap field in the Extended Community. New registrations + will be made through the "RFC Required" procedure defined in + [RFC8126]. The following initial value in that registry exists: + + Bit Name Reference + ---- ---------------- ------------- + 0 Unassigned + 1 AC-DF Capability This document + 2-15 Unassigned + + + + + + + + + + + + + + + + + +Rabadan, et al. Standards Track [Page 27] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + +8. References + +8.1. Normative References + + [RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A., + Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based + Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, + February 2015, <https://www.rfc-editor.org/info/rfc7432>. + + [RFC8214] Boutros, S., Sajassi, A., Salam, S., Drake, J., and J. + Rabadan, "Virtual Private Wire Service Support in Ethernet + VPN", RFC 8214, DOI 10.17487/RFC8214, August 2017, + <https://www.rfc-editor.org/info/rfc8214>. + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + <https://www.rfc-editor.org/info/rfc2119>. + + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in + RFC 2119 Key Words", BCP 14, RFC 8174, + DOI 10.17487/RFC8174, May 2017, + <https://www.rfc-editor.org/info/rfc8174>. + + [RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended + Communities Attribute", RFC 4360, DOI 10.17487/RFC4360, + February 2006, <https://www.rfc-editor.org/info/rfc4360>. + + [RFC7153] Rosen, E. and Y. Rekhter, "IANA Registries for BGP + Extended Communities", RFC 7153, DOI 10.17487/RFC7153, + March 2014, <https://www.rfc-editor.org/info/rfc7153>. + + [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for + Writing an IANA Considerations Section in RFCs", BCP 26, + RFC 8126, DOI 10.17487/RFC8126, June 2017, + <https://www.rfc-editor.org/info/rfc8126>. + + + + + + + + + + + + + + + +Rabadan, et al. Standards Track [Page 28] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + +8.2. Informative References + + [VPLS-MH] Kothari, B., Kompella, K., Henderickx, W., Balus, F., and + J. Uttaro, "BGP based Multi-homing in Virtual Private LAN + Service", Work in Progress, + draft-ietf-bess-vpls-multihoming-03, March 2019. + + [CHASH] Karger, D., Lehman, E., Leighton, T., Panigrahy, R., + Levine, M., and D. Lewin, "Consistent Hashing and Random + Trees: Distributed Caching Protocols for Relieving Hot + Spots on the World Wide Web", ACM Symposium on Theory of + Computing, ACM Press, New York, DOI 10.1145/258533.258660, + May 1997. + + [CLRS2009] Cormen, T., Leiserson, C., Rivest, R., and C. Stein, + "Introduction to Algorithms (3rd Edition)", MIT + Press, ISBN 0-262-03384-8, 2009. + + [RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and + Multicast Next-Hop Selection", RFC 2991, + DOI 10.17487/RFC2991, November 2000, + <https://www.rfc-editor.org/info/rfc2991>. + + [RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path + Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000, + <https://www.rfc-editor.org/info/rfc2992>. + + [RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route + Reflection: An Alternative to Full Mesh Internal BGP + (IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006, + <https://www.rfc-editor.org/info/rfc4456>. + + [HRW1999] Thaler, D. and C. Ravishankar, "Using Name-Based Mappings + to Increase Hit Rates", IEEE/ACM Transactions on + Networking, Volume 6, No. 1, February 1998, + <https://www.microsoft.com/en-us/research/wp-content/ + uploads/2017/02/HRW98.pdf>. + + [Knuth] Knuth, D., "The Art of Computer Programming: Volume 3: + Sorting and Searching", 2nd Edition, Addison-Wesley, + Page 516, 1998. + + + + + + + + + + +Rabadan, et al. Standards Track [Page 29] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + +Acknowledgments + + The authors want to thank Ranganathan Boovaraghavan, Sami Boutros, + Luc Andre Burdet, Anoop Ghanwani, Mrinmoy Ghosh, Jakob Heitz, Leo + Mermelstein, Mankamana Mishra, Tamas Mondal, Laxmi Padakanti, Samir + Thoria, and Sriram Venkateswaran for their review and contributions. + Special thanks to Stephane Litkowski for his thorough review and + detailed contributions. + + They would also like to thank their working group chairs, Matthew + Bocci and Stephane Litkowski, and their AD, Martin Vigoureux, for + their guidance and support. + + Finally, they would like to thank the Directorate reviewers and the + ADs for their thorough reviews and probing questions, the answers to + which have substantially improved the quality of the document. + +Contributors + + The following people have contributed substantially to this document + and should be considered coauthors: + + Antoni Przygienda + Juniper Networks, Inc. + 1194 N. Mathilda Ave. + Sunnyvale, CA 94089 + United States of America + + Email: prz@juniper.net + + Vinod Prabhu + Nokia + + Email: vinod.prabhu@nokia.com + + Wim Henderickx + Nokia + + Email: wim.henderickx@nokia.com + + Wen Lin + Juniper Networks, Inc. + + Email: wlin@juniper.net + + + + + + + +Rabadan, et al. Standards Track [Page 30] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + Patrice Brissette + Cisco Systems + + Email: pbrisset@cisco.com + + Keyur Patel + Arrcus, Inc. + + Email: keyur@arrcus.com + + Autumn Liu + Ciena + + Email: hliu@ciena.com + +Authors' Addresses + + Jorge Rabadan (editor) + Nokia + 777 E. Middlefield Road + Mountain View, CA 94043 + United States of America + + Email: jorge.rabadan@nokia.com + + + Satya Mohanty (editor) + Cisco Systems, Inc. + 225 West Tasman Drive + San Jose, CA 95134 + United States of America + + Email: satyamoh@cisco.com + + + Ali Sajassi + Cisco Systems, Inc. + 225 West Tasman Drive + San Jose, CA 95134 + United States of America + + Email: sajassi@cisco.com + + + + + + + + + +Rabadan, et al. Standards Track [Page 31] + +RFC 8584 DF Election Framework for EVPN Services April 2019 + + + John Drake + Juniper Networks, Inc. + 1194 N. Mathilda Ave. + Sunnyvale, CA 94089 + United States of America + + Email: jdrake@juniper.net + + + Kiran Nagaraj + Nokia + 701 E. Middlefield Road + Mountain View, CA 94043 + United States of America + + Email: kiran.nagaraj@nokia.com + + Senthil Sathappan + Nokia + 701 E. Middlefield Road + Mountain View, CA 94043 + United States of America + + Email: senthil.sathappan@nokia.com + + + + + + + + + + + + + + + + + + + + + + + + + + + +Rabadan, et al. Standards Track [Page 32] + |