1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
1475
1476
1477
1478
1479
1480
1481
1482
1483
1484
1485
1486
1487
1488
1489
1490
1491
1492
1493
1494
1495
1496
1497
1498
1499
1500
1501
1502
1503
1504
1505
1506
1507
1508
1509
1510
1511
1512
1513
1514
1515
1516
1517
1518
1519
1520
1521
1522
1523
1524
1525
1526
1527
1528
1529
1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
1554
1555
1556
1557
1558
1559
1560
1561
1562
1563
1564
1565
1566
1567
1568
1569
1570
1571
1572
1573
1574
1575
1576
1577
1578
1579
1580
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
1591
1592
1593
1594
1595
1596
1597
1598
1599
1600
1601
1602
1603
1604
1605
1606
1607
1608
1609
1610
1611
1612
1613
1614
1615
1616
1617
1618
1619
1620
1621
1622
1623
1624
1625
1626
1627
1628
1629
1630
1631
1632
1633
1634
1635
1636
1637
1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
1652
1653
1654
1655
1656
1657
1658
1659
1660
1661
1662
1663
1664
1665
1666
1667
1668
1669
1670
1671
1672
1673
1674
1675
1676
1677
1678
1679
1680
1681
1682
1683
1684
1685
1686
1687
1688
1689
1690
1691
1692
1693
1694
1695
1696
1697
1698
1699
1700
1701
1702
1703
1704
1705
1706
1707
1708
1709
1710
1711
1712
1713
1714
1715
1716
1717
1718
1719
1720
1721
1722
1723
1724
1725
1726
1727
1728
1729
1730
1731
1732
1733
1734
1735
1736
1737
1738
1739
1740
1741
1742
1743
1744
1745
1746
1747
1748
1749
1750
1751
1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767
1768
1769
1770
1771
1772
1773
1774
1775
1776
1777
1778
1779
1780
1781
1782
1783
1784
1785
1786
1787
1788
1789
1790
1791
1792
1793
1794
1795
|
Internet Engineering Task Force (IETF) J. Rabadan, Ed.
Request for Comments: 8584 Nokia
Updates: 7432 S. Mohanty, Ed.
Category: Standards Track A. Sajassi
ISSN: 2070-1721 Cisco
J. Drake
Juniper
K. Nagaraj
S. Sathappan
Nokia
April 2019
Framework for Ethernet VPN Designated Forwarder Election Extensibility
Abstract
An alternative to the default Designated Forwarder (DF) selection
algorithm in Ethernet VPNs (EVPNs) is defined. The DF is the
Provider Edge (PE) router responsible for sending Broadcast, Unknown
Unicast, and Multicast (BUM) traffic to a multihomed Customer Edge
(CE) device on a given VLAN on a particular Ethernet Segment (ES).
In addition, the ability to influence the DF election result for a
VLAN based on the state of the associated Attachment Circuit (AC) is
specified. This document clarifies the DF election Finite State
Machine in EVPN services. Therefore, it updates the EVPN
specification (RFC 7432).
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 7841.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc8584.
Rabadan, et al. Standards Track [Page 1]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction ....................................................3
1.1. Conventions and Terminology ................................3
1.2. Default Designated Forwarder (DF) Election in EVPN
Services ...................................................5
1.3. Problem Statement ..........................................8
1.3.1. Unfair Load Balancing and Service Disruption ........8
1.3.2. Traffic Black-Holing on Individual AC Failures .....10
1.4. The Need for Extending the Default DF Election in
EVPN Services .............................................12
2. Designated Forwarder Election Protocol and BGP Extensions ......13
2.1. The DF Election Finite State Machine (FSM) ................13
2.2. The DF Election Extended Community ........................16
2.2.1. Backward Compatibility .............................19
3. The Highest Random Weight DF Election Algorithm ................19
3.1. HRW and Consistent Hashing ................................20
3.2. HRW Algorithm for EVPN DF Election ........................20
4. The AC-Influenced DF Election Capability .......................22
4.1. AC-Influenced DF Election Capability for
VLAN-Aware Bundle Services ................................24
5. Solution Benefits ..............................................25
6. Security Considerations ........................................26
7. IANA Considerations ............................................27
8. References .....................................................28
8.1. Normative References ......................................28
8.2. Informative References ....................................29
Acknowledgments ...................................................30
Contributors ......................................................30
Authors' Addresses ................................................31
Rabadan, et al. Standards Track [Page 2]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
1. Introduction
The Designated Forwarder (DF) in Ethernet VPNs (EVPNs) is the
Provider Edge (PE) router responsible for sending Broadcast, Unknown
Unicast, and Multicast (BUM) traffic to a multihomed Customer Edge
(CE) device on a given VLAN on a particular Ethernet Segment (ES).
The DF is elected from the set of multihomed PEs attached to a given
ES, each of which advertises an ES route for the ES as identified by
its Ethernet Segment Identifier (ESI). By default, the EVPN uses a
DF election algorithm referred to as "service carving". The DF
election algorithm is based on a modulus function (V mod N) that
takes the number of PEs in the ES (N) and the VLAN value (V) as
input. This document addresses inefficiencies in the default DF
election algorithm by defining a new DF election algorithm and an
ability to influence the DF election result for a VLAN, depending on
the state of the associated Attachment Circuit (AC). In order to
avoid any ambiguity with the identifier used in the DF election
algorithm, this document uses the term "Ethernet Tag" instead of
"VLAN". This document also creates a registry with IANA for future
DF election algorithms and capabilities (see Section 7). It also
presents a formal definition and clarification of the DF election
Finite State Machine (FSM). Therefore, this document updates
[RFC7432], and EVPN implementations MUST conform to the
prescribed FSM.
The procedures described in this document apply to DF election in all
EVPN solutions, including those described in [RFC7432] and [RFC8214].
Apart from the formal description of the FSM, this document does not
intend to update other procedures described in [RFC7432]; it only
aims to improve the behavior of the DF election on PEs that are
upgraded to follow the procedures described in this document.
1.1. Conventions and Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
o AC: Attachment Circuit. An AC has an Ethernet Tag associated
with it.
o ACS: Attachment Circuit Status.
o BUM: Broadcast, unknown unicast, and multicast.
o DF: Designated Forwarder.
Rabadan, et al. Standards Track [Page 3]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
o NDF: Non-Designated Forwarder.
o BDF: Backup Designated Forwarder.
o Ethernet A-D per ES route: Refers to Route Type 1 as defined in
[RFC7432] or to Auto-discovery per Ethernet Segment route.
o Ethernet A-D per EVI route: Refers to Route Type 1 as defined in
[RFC7432] or to Auto-discovery per EVPN Instance route.
o ES: Ethernet Segment.
o ESI: Ethernet Segment Identifier.
o EVI: EVPN Instance.
o MAC-VRF: A Virtual Routing and Forwarding table for Media Access
Control (MAC) addresses on a PE.
o BD: Broadcast Domain. An EVI may be comprised of one BD
(VLAN-based or VLAN Bundle services) or multiple BDs (VLAN-aware
Bundle services).
o Bridge table: An instantiation of a BD on a MAC-VRF.
o HRW: Highest Random Weight.
o VID: VLAN Identifier.
o CE-VID: Customer Edge VLAN Identifier.
o Ethernet Tag: Used to represent a BD that is configured on a given
ES for the purpose of DF election. Note that any of the following
may be used to represent a BD: VIDs (including Q-in-Q tags),
configured IDs, VNIs (Virtual Extensible Local Area Network
(VXLAN) Network Identifiers), normalized VIDs, I-SIDs (Service
Instance Identifiers), etc., as long as the representation of the
BDs is configured consistently across the multihomed PEs attached
to that ES. The Ethernet Tag value MUST be different from zero.
o Ethernet Tag ID: Refers to the identifier used in the EVPN routes
defined in [RFC7432]. Its value may be the same as the Ethernet
Tag value (see the definition for Ethernet Tag) when advertising
routes for VLAN-aware Bundle services. Note that in the case of
VLAN-based or VLAN Bundle services, the Ethernet Tag ID is zero.
Rabadan, et al. Standards Track [Page 4]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
o DF election procedure: Also called "DF election". Refers to the
process in its entirety, including the discovery of the PEs in the
ES, the creation and maintenance of the PE candidate list, and the
selection of a PE.
o DF algorithm: A component of the DF election procedure. Strictly
refers to the selection of a PE for a given <ES, Ethernet Tag>.
o RR: Route Reflector. A network routing component for BGP
[RFC4456]. It offers an alternative to the logical full-mesh
requirement of the Internal Border Gateway Protocol (IBGP). The
purpose of the RR is concentration. Multiple BGP routers can peer
with a central point, the RR -- acting as a route reflector server
-- rather than peer with every other router in a full mesh. This
results in an O(N) peering as opposed to O(N^2).
o TTL: Time To Live.
This document also assumes that the reader is familiar with the
terminology provided in [RFC7432].
1.2. Default Designated Forwarder (DF) Election in EVPN Services
[RFC7432] defines the DF as the EVPN PE responsible for:
o Flooding BUM traffic on a given Ethernet Tag on a particular ES to
the CE. This is valid for Single-Active and All-Active EVPN
multihoming.
o Sending unicast traffic on a given Ethernet Tag on a particular ES
to the CE. This is valid for Single-Active multihoming.
Rabadan, et al. Standards Track [Page 5]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
Figure 1 illustrates an example that we will use to explain the DF
function.
+---------------+
| IP/MPLS |
| Core |
+----+ ES1 +----+ +----+
| CE1|-----| | | |____ES2
+----+ | PE1| | PE2| \
| | +----+ \+----+
+----+ | | CE2|
| +----+ /+----+
| | |____/ |
| | PE3| ES2 /
| +----+ /
| | /
+-------------+----+ /
| PE4|____/ES2
| |
+----+
Figure 1: EVPN Multihoming
Figure 1 illustrates a case where there are two ESes: ES1 and ES2.
PE1 is attached to CE1 via ES1, whereas PE2, PE3, and PE4 are
attached to CE2 via ES2, i.e., PE2, PE3, and PE4 form a redundancy
group. Since CE2 is multihomed to different PEs on the same ES, it
is necessary for PE2, PE3, and PE4 to agree on a DF to satisfy the
above-mentioned requirements.
The effect of forwarding loops in a Layer 2 network is particularly
severe because of the broadcast nature of Ethernet traffic and the
lack of a TTL. Therefore, it is very important that, in the case of
a multihomed CE, only one of the PEs be used to send BUM traffic
to it.
One of the prerequisites for this support is that participating PEs
must agree amongst themselves as to who would act as the DF. This
needs to be achieved through a distributed algorithm in which each
participating PE independently and unambiguously selects one of the
participating PEs as the DF, and the result should be consistent and
unanimous.
The default algorithm for DF election defined by [RFC7432] at the
granularity of (ESI, EVI) is referred to as "service carving". In
this document, service carving and the default DF election algorithm
are used interchangeably. With service carving, it is possible to
elect multiple DFs per ES (one per EVI) in order to perform load
Rabadan, et al. Standards Track [Page 6]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
balancing of traffic destined to a given ES. The objective is that
the load-balancing procedures should carve up the BD space among the
redundant PE nodes evenly, in such a way that every PE is the DF for
a distinct set of EVIs.
The DF election algorithm (as described in [RFC7432], Section 8.5) is
based on a modulus operation. The PEs to which the ES (for which DF
election is to be carried out per EVI) is multihomed form an ordered
(ordinal) list in ascending order by PE IP address value. For
example, there are N PEs: PE0, PE1,... PE(N-1) ranked as per
increasing IP addresses in the ordinal list; then, for each VLAN with
Ethernet Tag V, configured on ES1, PEx is the DF for VLAN V on ES1
when x equals (V mod N). In the case of a VLAN Bundle, only the
lowest VLAN is used. In the case when the planned density is high
(meaning there are a significant number of VLANs and the Ethernet
Tags are uniformly distributed), the thinking is that the DF election
will be spread across the PEs hosting that ES and good load balancing
can be achieved.
However, the described default DF election algorithm has some
undesirable properties and, in some cases, can be somewhat disruptive
and unfair. This document describes some of those issues and defines
a mechanism for dealing with them. These mechanisms do involve
changes to the default DF election algorithm, but they do not require
any changes to the EVPN route exchange, and changes in the EVPN
routes will be minimal.
In addition, there is a need to extend the DF election procedures so
that new algorithms and capabilities are possible. A single
algorithm (the default DF election algorithm) may not meet the
requirements in all the use cases.
Note that while [RFC7432] elects a DF per <ES, EVI>, this document
elects a DF per <ES, BD>. This means that unlike [RFC7432], where
for a VLAN-aware Bundle service EVI there is only one DF for the EVI,
this document specifies that there will be multiple DFs, one for each
BD configured in that EVI.
Rabadan, et al. Standards Track [Page 7]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
1.3. Problem Statement
This section describes some potential issues with the default DF
election algorithm.
1.3.1. Unfair Load Balancing and Service Disruption
There are three fundamental problems with the current default DF
election algorithm.
1. The algorithm will not perform well when the Ethernet Tag follows
a non-uniform distribution -- for instance, when the Ethernet
Tags are all even or all odd. In such a case, let us assume that
the ES is multihomed to two PEs; one of the PEs will be elected
as the DF for all of the VLANs. This is very suboptimal. It
defeats the purpose of service carving, as the DFs are not really
evenly spread across the PEs hosting the ES. In fact, in this
particular case, one of the PEs does not get elected as the DF at
all, so it does not participate in DF responsibilities at all.
Consider another example where, referring to Figure 1, let's
assume that (1) PE2, PE3, and PE4 are listed in ascending order
by IP address and (2) each VLAN configured on ES2 is associated
with an Ethernet Tag of the form (3x+1), where x is an integer.
This will result in PE3 always being selected as the DF.
2. The Ethernet Tag that identifies the BD can be as large as 2^24;
however, it is not guaranteed that the tenant BD on the ES will
conform to a uniform distribution. In fact, it is up to the
customer what BDs they will configure on the ES. Quoting
[Knuth]:
In general, we want to avoid values of M that divide r^k+a or
r^k-a, where k and a are small numbers and r is the radix of
the alphabetic character set (usually r=64, 256 or 100), since
a remainder modulo such a value of M tends to be largely a
simple superposition of key digits. Such considerations
suggest that we choose M to be a prime number such that
r^k!=a(modulo)M or r^k!=?a(modulo)M for small k & a.
In our case, N is the number of PEs (Section 8.5 of [RFC7432]).
N corresponds to M above. Since N, N-1, or N+1 need not satisfy
the primality properties of M, as per the modulo-based DF
assignment [RFC7432], whenever a PE goes down or a new PE boots
up (attached to the same ES), the modulo scheme will not
necessarily map BDs to PEs uniformly.
Rabadan, et al. Standards Track [Page 8]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
3. Disruption is another problem. Consider a case when the same ES
is multihomed to a set of PEs. When the ES is DOWN in one of the
PEs, say PE1, or PE1 itself reboots, or the BGP process goes down
or the connectivity between PE1 and an RR goes down, the
effective number of PEs in the system now becomes N-1, and DFs
are computed for all the VLANs that are configured on that ES.
In general, if the DF for a VLAN V happens not to be PE1, but
some other PE, say PE2, it is likely that some other PE
(different from PE1 and PE2) will become the new DF. This is not
desirable. Similarly, when a new PE hosts the same ES, the
mapping again changes because of the modulus operation. This
results in needless churn. Again referring to Figure 1, say V1,
V2, and V3 are VLANs configured on ES2 with associated Ethernet
Tags of values 999, 1000, and 1001, respectively. So, PE1, PE2,
and PE3 are the DFs for V1, V2, and V3, respectively. Now when
PE3 goes down, PE2 will become the DF for V1 and PE1 will become
the DF for V2.
One point to note is that the default DF election algorithm assumes
that all the PEs who are multihomed to the same ES (and interested in
the DF election by exchanging EVPN routes) use an Originating
Router's IP address [RFC7432] of the same family. This does not need
to be the case, as the EVPN address family can be carried over an
IPv4 or IPv6 peering, and the PEs attached to the same ES may use an
address of either family.
Mathematically, a conventional hash function maps a key k to a number
i representing one of m hash buckets through a function h(k), i.e.,
i = h(k). In the EVPN case, h is simply a modulo-m hash function
viz. h(V) = V mod N, where N is the number of PEs that are multihomed
to the ES in question. It is well known that for good hash
distribution using the modulus operation, the modulus N should be a
prime number not too close to a power of 2 [CLRS2009]. When the
effective number of PEs changes from N to N-1 (or vice versa), all
the objects (VLAN V) will be remapped except those for which V mod N
and V mod (N-1) refer to the same PE in the previous and subsequent
ordinal rankings, respectively. From a forwarding perspective, this
is a churn, as it results in reprogramming the PE ports as either
blocking or non-blocking at the PEs where the DF state changes.
This document addresses this problem and furnishes a solution to this
undesirable behavior.
Rabadan, et al. Standards Track [Page 9]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
1.3.2. Traffic Black-Holing on Individual AC Failures
The default DF election algorithm defined by [RFC7432] takes into
account only two variables in the modulus function for a given ES:
the existence of the PE's IP address in the candidate list and the
locally provisioned Ethernet Tags.
If the DF for an <ESI, EVI> fails (due to physical link/node
failures), an ES route withdrawal will make the NDF PEs re-elect the
DF for that <ESI, EVI> and the service will be recovered.
However, the default DF election procedure does not provide
protection against "logical" failures or human errors that may occur
at the service level on the DF, while the list of active PEs for a
given ES does not change. These failures may have an impact not only
on the local PE where the issue happens but also on the rest of the
PEs of the ES. Some examples of such logical failures are listed
below:
(a) A given individual AC defined in an ES is accidentally shut down
or is not provisioned yet (hence, the ACS is DOWN), while the ES
is operationally active (since the ES route is active).
(b) A given MAC-VRF with a defined ES is either shut down or not
provisioned yet, while the ES is operationally active (since the
ES route is active). In this case, the ACS of all the ACs
defined in that MAC-VRF is considered to be DOWN.
Neither (a) nor (b) will trigger the DF re-election on the remote
multihomed PEs for a given ES, since the ACS is not taken into
account in the DF election procedures. While the ACS is used as a DF
election tiebreaker and trigger in Virtual Private LAN Service (VPLS)
multihoming procedures [VPLS-MH], there is no procedure defined in
the EVPN specification [RFC7432] to trigger the DF re-election based
on the ACS change on the DF.
Rabadan, et al. Standards Track [Page 10]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
Figure 2 shows an example of logical AC failure.
+---+
|CE4|
+---+
|
PE4 |
+-----+-----+
+---------------| +-----+ |---------------+
| | | BD-1| | |
| +-----------+ |
| |
| EVPN |
| |
| PE1 PE2 PE3 |
| (NDF) (DF) (NDF)|
+-----------+ +-----------+ +-----------+
| | BD-1| | | | BD-1| | | | BD-1| |
| +-----+ |-------| +-----+ |-------| +-----+ |
+-----------+ +-----------+ +-----------+
AC1\ ES12 /AC2 AC3\ ES23 /AC4
\ / \ /
\ / \ /
+----+ +----+
|CE12| |CE23|
+----+ +----+
Figure 2: Default DF Election and Traffic Black-Holing
BD-1 is defined in PE1, PE2, PE3, and PE4. CE12 is a multihomed CE
connected to ES12 in PE1 and PE2. Similarly, CE23 is multihomed to
PE2 and PE3 using ES23. Both CE12 and CE23 are connected to BD-1
through VLAN-based service interfaces: CE12-VID 1 (VID 1 on CE12) is
associated with AC1 and AC2 in BD-1, whereas CE23-VID 1 is associated
with AC3 and AC4 in BD-1. Assume that, although not represented,
there are other ACs defined on these ESes mapped to different BDs.
Rabadan, et al. Standards Track [Page 11]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
After executing the default DF election algorithm as described in
[RFC7432], PE2 turns out to be the DF for ES12 and ES23 in BD-1. The
following issues may arise:
(a) If AC2 is accidentally shut down or is not configured yet, CE12
traffic will be impacted. In the case of All-Active
multihoming, the BUM traffic to CE12 will be "black-holed",
whereas for Single-Active multihoming, all the traffic to/from
CE12 will be discarded. This is because a logical failure in
PE2's AC2 may not trigger an ES route withdrawal for ES12 (since
there are still other ACs active on ES12); therefore, PE1 will
not rerun the DF election procedures.
(b) If the bridge table for BD-1 is administratively shut down or is
not configured yet on PE2, CE12 and CE23 will both be impacted:
BUM traffic to both CEs will be discarded in the case of
All-Active multihoming, and all traffic will be discarded
to/from the CEs in the case of Single-Active multihoming. This
is because PE1 and PE3 will not rerun the DF election procedures
and will keep assuming that PE2 is the DF.
Quoting [RFC7432], "When an Ethernet tag is decommissioned on an
Ethernet segment, then the PE MUST withdraw the Ethernet A-D per EVI
route(s) announced for the <ESI, Ethernet tags> that are impacted by
the decommissioning." However, while this A-D per EVI route
withdrawal is used at the remote PEs performing aliasing or backup
procedures, it is not used to influence the DF election for the
affected EVIs.
This document adds an optional modification of the DF election
procedure so that the ACS may be taken into account as a variable in
the DF election; therefore, EVPN can provide protection against
logical failures.
1.4. The Need for Extending the Default DF Election in EVPN Services
Section 1.3 describes some of the issues that exist in the default DF
election procedures. In order to address those issues, this document
introduces a new DF election framework. This framework allows the
PEs to agree on a common DF election algorithm, as well as the
capabilities to enable during the DF election procedure. Generally,
"DF election algorithm" refers to the algorithm by which a number of
input parameters are used to determine the DF PE, while "DF election
capability" refers to an additional feature that can be used prior to
the invocation of the DF election algorithm, such as modifying the
inputs (or list of candidate PEs).
Rabadan, et al. Standards Track [Page 12]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
Within this framework, this document defines a new DF election
algorithm and a new capability that can influence the DF election
result:
o The new DF election algorithm is referred to as "Highest Random
Weight" (HRW). The HRW procedures are described in Section 3.
o The new DF election capability is referred to as "AC-Influenced DF
election" (AC-DF). The AC-DF procedures are described in
Section 4.
o HRW and AC-DF mechanisms are independent of each other.
Therefore, a PE may support either HRW or AC-DF independently or
may support both of them together. A PE may also support the
AC-DF capability along with the default DF election algorithm per
[RFC7432].
In addition, this document defines a way to indicate the support of
HRW and/or AC-DF along with the EVPN ES routes advertised for a given
ES. Refer to Section 2.2 for more details.
2. Designated Forwarder Election Protocol and BGP Extensions
This section describes the BGP extensions required to support the new
DF election procedures. In addition, since the EVPN specification
[RFC7432] leaves several questions open as to the precise FSM
behavior of the DF election, Section 2.1 precisely describes the
intended behavior.
2.1. The DF Election Finite State Machine (FSM)
Per [RFC7432], the FSM shown in Figure 3 is executed per <ES, VLAN>
in the case of VLAN-based service or <ES, [VLANs in VLAN Bundle]> in
the case of a VLAN Bundle on each participating PE. Note that the
FSM is conceptual. Any design or implementation MUST comply with
behavior that is equivalent to the behavior outlined in this FSM.
Rabadan, et al. Standards Track [Page 13]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
VLAN_CHANGE VLAN_CHANGE
RCVD_ES RCVD_ES
LOST_ES LOST_ES
+----+ +-------+
| | | v
| +-+----+ ES_UP ++-------++
+->+ INIT +-------------->+ DF_WAIT |
++-----+ +-------+-+
^ |
+-----------+ | |DF_TIMER
| ANY_STATE +-------+ VLAN_CHANGE |
+-----------+ ES_DOWN +-----------------+ |
| RCVD_ES v v
+--------++ LOST_ES ++------+-+
| DF_DONE +<--------------+ DF_CALC +<-+
+---------+ CALCULATED +-------+-+ |
| |
+----+
VLAN_CHANGE
RCVD_ES
LOST_ES
Figure 3: DF Election Finite State Machine
Observe that each EVI is locally configured on each of the multihomed
PEs attached to a given ES and that the FSM does not provide any
protection against inconsistent configuration between these PEs.
That is, for a given EVI, one or more of the PEs are inadvertently
configured with a different set of VLANs for a VLAN-aware Bundle
service or with different VLANs for a VLAN-based service.
The states and events shown in Figure 3 are defined as follows.
States:
1. INIT: Initial state.
2. DF_WAIT: State in which the participant waits for enough
information to perform the DF election for the EVI/ESI/VLAN
combination.
3. DF_CALC: State in which the new DF is recomputed.
4. DF_DONE: State in which the corresponding DF for the EVI/ESI/VLAN
combination has been elected.
5. ANY_STATE: Refers to any of the above states.
Rabadan, et al. Standards Track [Page 14]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
Events:
1. ES_UP: The ES has been locally configured as "UP".
2. ES_DOWN: The ES has been locally configured as "DOWN".
3. VLAN_CHANGE: The VLANs configured in a bundle (that uses the ES)
changed. This event is necessary for VLAN Bundles only.
4. DF_TIMER: DF timer [RFC7432] (referred to as "Wait timer" in this
document) has expired.
5. RCVD_ES: A new or changed ES route is received in an Update
message with an MP_REACH_NLRI. Receiving an unchanged Update
MUST NOT trigger this event.
6. LOST_ES: An Update message with an MP_UNREACH_NLRI for a
previously received ES route has been received. If such a
message is seen for a route that has not been advertised
previously, the event MUST NOT be triggered.
7. CALCULATED: DF has been successfully calculated.
Corresponding actions when transitions are performed or states are
entered/exited:
1. ANY_STATE on ES_DOWN:
(i) Stop the DF Wait timer.
(ii) Assume an NDF for the local PE.
2. INIT on ES_UP: Transition to DF_WAIT.
3. INIT on VLAN_CHANGE, RCVD_ES, or LOST_ES: Do nothing.
4. DF_WAIT on entering the state:
(i) Start the DF Wait timer if not started already or expired.
(ii) Assume an NDF for the local PE.
5. DF_WAIT on VLAN_CHANGE, RCVD_ES, or LOST_ES: Do nothing.
6. DF_WAIT on DF_TIMER: Transition to DF_CALC.
7. DF_CALC on entering or re-entering the state:
(i) Rebuild the candidate list, perform a hash, and perform the
election.
(ii) Afterwards, the FSM generates a CALCULATED event against
itself.
Rabadan, et al. Standards Track [Page 15]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
8. DF_CALC on VLAN_CHANGE, RCVD_ES, or LOST_ES: Do as prescribed in
Transition 7.
9. DF_CALC on CALCULATED: Mark the election result for the VLAN or
bundle, and transition to DF_DONE.
10. DF_DONE on exiting the state: If a new DF election is triggered
and the current DF is lost, then assume an NDF for the local PE
for the VLAN or VLAN Bundle.
11. DF_DONE on VLAN_CHANGE, RCVD_ES, or LOST_ES: Transition to
DF_CALC.
The above events and transitions are defined for the default DF
election algorithm. As described in Section 4, the use of the AC-DF
capability introduces additional events and transitions.
2.2. The DF Election Extended Community
For the DF election procedures to be consistent and unanimous, it is
necessary that all the participating PEs agree on the DF election
algorithm and capabilities to be used. For instance, it is not
possible for some PEs to continue to use the default DF election
algorithm while some PEs use HRW. For brownfield deployments and for
interoperability with legacy PEs, it is important that all PEs have
the ability to fall back on the default DF election. A PE can
indicate its willingness to support HRW and/or AC-DF by signaling a
DF Election Extended Community along with the ES route (Route
Type 4).
The DF Election Extended Community is a new BGP transitive Extended
Community attribute [RFC4360] that is defined to identify the DF
election procedure to be used for the ES. Figure 4 shows the
encoding of the DF Election Extended Community.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type = 0x06 | Sub-Type(0x06)| RSV | DF Alg | Bitmap ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ Bitmap | Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 4: DF Election Extended Community
Rabadan, et al. Standards Track [Page 16]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
Where:
o Type: 0x06, as registered with IANA (Section 7) for EVPN Extended
Communities.
o Sub-Type: 0x06. "DF Election Extended Community", as registered
with IANA.
o RSV/Reserved: Reserved bits for information that is specific to
DF Alg.
o DF Alg (5 bits): Encodes the DF election algorithm values (between
0 and 31) that the advertising PE desires to use for the ES. This
document creates an IANA registry called "DF Alg" (Section 7),
which contains the following values:
- Type 0: Default DF election algorithm, or modulus-based
algorithm as defined in [RFC7432].
- Type 1: HRW Algorithm (Section 3).
- Types 2-30: Unassigned.
- Type 31: Reserved for Experimental Use.
o Bitmap (2 octets): Encodes "capabilities" to use with the DF
election algorithm in the DF Alg field. This document creates an
IANA registry (Section 7) for the Bitmap field, with values 0-15.
This registry is called "DF Election Capabilities" and includes
the bit values listed below.
1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |A| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 5: Bitmap Field in the DF Election Extended Community
- Bit 0 (corresponds to Bit 24 of the DF Election Extended
Community): Unassigned.
- Bit 1: AC-DF Capability (AC-Influenced DF election; see
Section 4). When set to 1, it indicates the desire to use
AC-DF with the rest of the PEs in the ES.
- Bits 2-15: Unassigned.
Rabadan, et al. Standards Track [Page 17]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
The DF Election Extended Community is used as follows:
o A PE SHOULD attach the DF Election Extended Community to any
advertised ES route, and the Extended Community MUST be sent if
the ES is locally configured with a DF election algorithm other
than the default DF election algorithm or if a capability is
required to be used. In the Extended Community, the PE indicates
the desired "DF Alg" algorithm and "Bitmap" capabilities to be
used for the ES.
- Only one DF Election Extended Community can be sent along with
an ES route. Note that the intent is not for the advertising
PE to indicate all the supported DF election algorithms and
capabilities but to signal the preferred one.
- DF Alg values 0 and 1 can both be used with Bit 1 (AC-DF) set
to 0 or 1.
- In general, a specific DF Alg SHOULD determine the use of the
reserved bits in the Extended Community, which may be used in a
different way for a different DF Alg. In particular, for DF
Alg values 0 and 1, the reserved bits are not set by the
advertising PE and SHOULD be ignored by the receiving PE.
o When a PE receives the ES routes from all the other PEs for the ES
in question, it checks to see if all the advertisements have the
Extended Community with the same DF Alg and Bitmap:
- If they do, this particular PE MUST follow the procedures for
the advertised DF Alg and capabilities. For instance, if all
ES routes for a given ES indicate DF Alg HRW and AC-DF set
to 1, then the PEs attached to the ES will perform the DF
election as per the HRW algorithm and following the AC-DF
procedures.
- Otherwise, if even a single advertisement for Route Type 4 is
received without the locally configured DF Alg and capability,
the default DF election algorithm MUST be used as prescribed in
[RFC7432]. This procedure handles the case where participating
PEs in the ES disagree about the DF algorithm and capability to
be applied.
- The absence of the DF Election Extended Community or the
presence of multiple DF Election Extended Communities (in the
same route) MUST be interpreted by a receiving PE as an
indication of the default DF election algorithm on the sending
PE -- that is, DF Alg 0 and no DF election capabilities.
Rabadan, et al. Standards Track [Page 18]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
o When all the PEs in an ES advertise DF Type 31, they will rely on
the local policy to decide how to proceed with the DF election.
o For any new capability defined in the future, the applicability/
compatibility of this new capability to/with the existing DF Alg
values must be assessed on a case-by-case basis.
o Likewise, for any new DF Alg defined in the future, its
applicability/compatibility to/with the existing capabilities must
be assessed on a case-by-case basis.
2.2.1. Backward Compatibility
Implementations that comply with [RFC7432] only (i.e.,
implementations that predate this specification) will not advertise
the DF Election Extended Community. That means that all other
participating PEs in the ES will not receive DF preferences and will
revert to the default DF election algorithm without AC-DF.
Similarly, an implementation that complies with [RFC7432] only and
that receives a DF Election Extended Community will ignore it and
will continue to use the default DF election algorithm.
3. The Highest Random Weight DF Election Algorithm
The procedure discussed in this section is applicable to the DF
election in EVPN services [RFC7432] and the EVPN Virtual Private Wire
Service (VPWS) [RFC8214].
HRW as defined in [HRW1999] is originally proposed in the context of
Internet caching and proxy server load balancing. Given an object
name and a set of servers, HRW maps a request to a server using the
object-name (object-id) and server-name (server-id) rather than the
server states. HRW forms a hash out of the server-id and the
object-id and forms an ordered list of the servers for the particular
object-id. The server for which the hash value is highest serves as
the primary server responsible for that particular object, and the
server with the next-highest value in that hash serves as the backup
server. HRW always maps a given object name to the same server
within a given cluster; consequently, it can be used at client sites
to achieve global consensus on object-to-server mappings. When that
server goes down, the backup server becomes the responsible
designate.
Choosing an appropriate hash function that is statistically oblivious
to the key distribution and imparts a good uniform distribution of
the hash output is an important aspect of the algorithm.
Fortunately, many such hash functions exist. [HRW1999] provides
Rabadan, et al. Standards Track [Page 19]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
pseudorandom functions based on the Unix utilities rand and srand and
easily constructed XOR functions that satisfy the desired hashing
properties. HRW already finds use in multicast and ECMP [RFC2991]
[RFC2992].
3.1. HRW and Consistent Hashing
HRW is not the only algorithm that addresses the object-to-server
mapping problem with goals of fair load distribution, redundancy, and
fast access. There is another family of algorithms that also
addresses this problem; these fall under the umbrella of the
Consistent Hashing Algorithms [CHASH]. These will not be considered
here.
3.2. HRW Algorithm for EVPN DF Election
This section describes the application of HRW to DF election. Let
DF(V) denote the DF and BDF(V) denote the BDF for the Ethernet Tag V;
Si is the IP address of PE i; Es is the ESI; and Weight is a function
of V, Si, and Es.
Note that while the DF election algorithm provided in [RFC7432] uses
a PE address and VLAN as inputs, this document uses an Ethernet Tag,
PE address, and ESI as inputs. This is because if the same set of
PEs are multihomed to the same set of ESes, then the DF election
algorithm used in [RFC7432] would result in the same PE being elected
DF for the same set of BDs on each ES; this could have adverse
side effects on both load balancing and redundancy. Including an ESI
in the DF election algorithm introduces additional entropy, which
significantly reduces the probability of the same PE being elected DF
for the same set of BDs on each ES. Therefore, when using the HRW
algorithm for EVPN DF election, the ESI value in the Weight function
below SHOULD be set to that of the corresponding ES.
In the case of a VLAN Bundle service, V denotes the lowest VLAN,
similar to the "lowest VLAN in bundle" logic of [RFC7432].
1. DF(V) = Si| Weight(V, Es, Si) >= Weight(V, Es, Sj), for all j.
In the case of a tie, choose the PE whose IP address is
numerically the least. Note that 0 <= i,j < number of PEs in the
redundancy group.
2. BDF(V) = Sk| Weight(V, Es, Si) >= Weight(V, Es, Sk), and
Weight(V, Es, Sk) >= Weight(V, Es, Sj). In the case of a tie,
choose the PE whose IP address is numerically the least.
Rabadan, et al. Standards Track [Page 20]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
Where:
o DF(V) is defined to be the address Si (index i) for which
Weight(V, Es, Si) is the highest; 0 <= i < N-1.
o BDF(V) is defined as that PE with address Sk for which the
computed Weight is the next highest after the Weight of the DF.
j is the running index from 0 to N-1; i and k are selected values.
Since the Weight is a pseudorandom function with the domain as the
three-tuple (V, Es, S), it is an efficient and deterministic
algorithm that is independent of the Ethernet Tag V sample space
distribution. Choosing a good hash function for the pseudorandom
function is an important consideration for this algorithm to perform
better than the default algorithm. As mentioned previously, such
functions are described in [HRW1999]. We take as a candidate hash
function the first one out of the two that are listed as preferred in
[HRW1999]:
Wrand(V, Es, Si) = (1103515245((1103515245.Si+12345) XOR
D(V, Es))+12345)(mod 2^31)
Here, D(V, Es) is the 31-bit digest (CRC-32 and discarding the
most significant bit (MSB), as noted in [HRW1999]) of the 14-octet
stream (the 4-octet Ethernet Tag V followed by the 10-octet ESI). It
is mandated that the 14-octet stream be formed by the concatenation
of the Ethernet Tag and the ESI in network byte order. The CRC
should proceed as if the stream is in network byte order
(big-endian). Si is the address of the ith server. The server's
IP address length does not matter, as only the low-order 31 bits are
modulo significant.
A point to note is that the Weight function takes into consideration
the combination of the Ethernet Tag, the ES, and the PE IP address,
and the actual length of the server IP address (whether IPv4 or IPv6)
is not really relevant. The default algorithm defined in [RFC7432]
cannot employ both IPv4 and IPv6 PE addresses, since [RFC7432] does
not specify how to decide on the ordering (the ordinal list) when
both IPv4 and IPv6 PEs are present.
HRW solves the disadvantages pointed out in Section 1.3.1 of this
document and ensures that:
o With very high probability, the task of DF election for the VLANs
configured on an ES is more or less equally distributed among the
PEs, even in the case of two PEs (see the first fundamental
problem listed in Section 1.3.1).
Rabadan, et al. Standards Track [Page 21]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
o If a PE that is not the DF or the BDF for that VLAN goes down or
its connection to the ES goes down, it does not result in a DF or
BDF reassignment. This saves computation, especially in the case
when the connection flaps.
o More importantly, it avoids the third fundamental problem listed
in Section 1.3.1 (needless disruption) that is inherent in the
existing default DF election.
o In addition to the DF, the algorithm also furnishes the BDF, which
would be the DF if the current DF fails.
4. The AC-Influenced DF Election Capability
The procedure discussed in this section is applicable to the DF
election in EVPN services [RFC7432] and EVPN VPWS [RFC8214].
The AC-DF capability is expected to be generally applicable to any
future DF algorithm. It modifies the DF election procedures by
removing from consideration any candidate PE in the ES that cannot
forward traffic on the AC that belongs to the BD. This section is
applicable to VLAN-based and VLAN Bundle service interfaces.
Section 4.1 describes the procedures for VLAN-aware Bundle service
interfaces.
In particular, when used with the default DF algorithm, the AC-DF
capability modifies Step 3 in the DF election procedure described in
[RFC7432], Section 8.5, as follows:
3. When the timer expires, each PE builds an ordered candidate list
of the IP addresses of all the PE nodes attached to the ES
(including itself), in increasing numeric value. The candidate
list is based on the Originating Router's IP addresses of the ES
routes but excludes any PE from whom no Ethernet A-D per ES route
has been received or from whom the route has been withdrawn.
Afterwards, the DF election algorithm is applied on a per
<ES, Ethernet Tag>; however, the IP address for a PE will not be
considered to be a candidate for a given <ES, Ethernet Tag> until
the corresponding Ethernet A-D per EVI route has been received
from that PE. In other words, the ACS on the ES for a given PE
must be UP so that the PE is considered to be a candidate for a
given BD.
If the default DF algorithm is used, every PE in the resulting
candidate list is then given an ordinal indicating its position in
the ordered list, starting with 0 as the ordinal for the PE with
Rabadan, et al. Standards Track [Page 22]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
the numerically lowest IP address. The ordinals are used to
determine which PE node will be the DF for a given Ethernet Tag on
the ES, using the following rule:
Assuming a redundancy group of N PE nodes, for VLAN-based service,
the PE with ordinal i is the DF for an <ES, Ethernet Tag V> when
(V mod N) = i. In the case of a VLAN (-aware) Bundle service,
then the numerically lowest VLAN value in that bundle on that ES
MUST be used in the modulo function as the Ethernet Tag.
It should be noted that using the Originating Router's IP Address
field [RFC7432] in the ES route to get the PE IP address needed
for the ordered list allows for a CE to be multihomed across
different Autonomous Systems (ASes) if such a need ever arises.
The modified Step 3, above, differs from [RFC7432], Section 8.5,
Step 3 in two ways:
o Any DF Alg can be used -- not only the described modulus-based DF
Alg (referred to as the default DF election or "DF Alg 0" in this
document).
o The candidate list is pruned based upon non-receipt of Ethernet
A-D routes: a PE's IP address MUST be removed from the ES
candidate list if its Ethernet A-D per ES route is withdrawn. A
PE's IP address MUST NOT be considered to be a candidate DF for an
<ES, Ethernet Tag> if its Ethernet A-D per EVI route for the
<ES, Ethernet Tag> is withdrawn.
The following example illustrates the AC-DF behavior applied to the
default DF election algorithm, assuming the network in Figure 2:
(a) When PE1 and PE2 discover ES12, they advertise an ES route for
ES12 with the associated ES-Import Extended Community and the DF
Election Extended Community indicating AC-DF = 1; they start a
DF Wait timer (independently). Likewise, PE2 and PE3 advertise
an ES route for ES23 with AC-DF = 1 and start a DF Wait timer.
(b) PE1 and PE2 advertise an Ethernet A-D per ES route for ES12.
PE2 and PE3 advertise an Ethernet A-D per ES route for ES23.
(c) In addition, PE1, PE2, and PE3 advertise an Ethernet A-D per EVI
route for AC1, AC2, AC3, and AC4 as soon as the ACs are enabled.
Note that the AC can be associated with a single customer VID
(e.g., VLAN-based service interfaces) or a bundle of customer
VIDs (e.g., VLAN Bundle service interfaces).
Rabadan, et al. Standards Track [Page 23]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
(d) When the timer expires, each PE builds an ordered candidate list
of the IP addresses of all the PE nodes attached to the ES
(including itself) as explained in the modified Step 3 above.
Any PE from which an Ethernet A-D per ES route has not been
received is pruned from the list.
(e) When electing the DF for a given BD, a PE will not be considered
to be a candidate until an Ethernet A-D per EVI route has been
received from that PE. In other words, the ACS on the ES for a
given PE must be UP so that the PE is considered to be a
candidate for a given BD. For example, PE1 will not consider
PE2 as a candidate for DF election for <ES12, VLAN-1> until an
Ethernet A-D per EVI route is received from PE2 for
<ES12, VLAN-1>.
(f) Once the PEs with ACS = DOWN for a given BD have been removed
from the candidate list, the DF election can be applied for the
remaining N candidates.
Note that this procedure only modifies the existing EVPN control
plane by adding and processing the DF Election Extended Community
and by pruning the candidate list of PEs that take part in the DF
election.
In addition to the events defined in the FSM in Section 2.1, the
following events SHALL modify the candidate PE list and trigger the
DF re-election in a PE for a given <ES, Ethernet Tag>. In the FSM
shown in Figure 3, the events below MUST trigger a transition from
DF_DONE to DF_CALC:
1. Local AC going DOWN/UP.
2. Reception of a new Ethernet A-D per EVI route update/withdrawal
for the <ES, Ethernet Tag>.
3. Reception of a new Ethernet A-D per ES route update/withdrawal
for the ES.
4.1. AC-Influenced DF Election Capability for VLAN-Aware Bundle
Services
The procedure described in Section 4 works for VLAN-based and VLAN
Bundle service interfaces because, for those service types, a PE
advertises only one Ethernet A-D per EVI route per <ES, VLAN> or
<ES, VLAN Bundle>. In Section 4, an Ethernet Tag represents a given
VLAN or VLAN Bundle for the purpose of DF election. The withdrawal
Rabadan, et al. Standards Track [Page 24]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
of such a route means that the PE cannot forward traffic on that
particular <ES, VLAN> or <ES, VLAN Bundle>; therefore, the PE can be
removed from consideration for DF election.
According to [RFC7432], in VLAN-aware Bundle services, the PE
advertises multiple Ethernet A-D per EVI routes per <ES, VLAN Bundle>
(one route per Ethernet Tag), while the DF election is still
performed per <ES, VLAN Bundle>. The withdrawal of an individual
route only indicates the unavailability of a specific AC and not
necessarily all the ACs in the <ES, VLAN Bundle>.
This document modifies the DF election for VLAN-aware Bundle services
in the following ways:
o After confirming that all the PEs in the ES advertise the AC-DF
capability, a PE will perform a DF election per <ES, VLAN>, as
opposed to per <ES, VLAN Bundle> as described in [RFC7432]. Now,
the withdrawal of an Ethernet A-D per EVI route for a VLAN will
indicate that the advertising PE's ACS is DOWN and the rest of the
PEs in the ES can remove the PE from consideration for DF election
in the <ES, VLAN>.
o The PEs will now follow the procedures in Section 4.
For example, assuming three bridge tables in PE1 for the same MAC-VRF
(each one associated with a different Ethernet Tag, e.g., VLAN-1,
VLAN-2, and VLAN-3), PE1 will advertise three Ethernet A-D per EVI
routes for ES12. Each of the three routes will indicate the status
of each of the three ACs in ES12. PE1 will be considered to be a
valid candidate PE for DF election in <ES12, VLAN-1>, <ES12, VLAN-2>,
and <ES12, VLAN-3> as long as its three routes are active. For
instance, if PE1 withdraws the Ethernet A-D per EVI routes for
<ES12, VLAN-1>, the PEs in ES12 will not consider PE1 as a suitable
DF candidate for <ES12, VLAN-1>. PE1 will still be considered for
<ES12, VLAN-2> and <ES12, VLAN-3>, since its routes are active.
5. Solution Benefits
The solution described in this document provides the following
benefits:
(a) It extends the DF election as defined in [RFC7432] to address
the unfair load balancing and potential black-holing issues with
the default DF election algorithm. The solution is applicable
to the DF election in EVPN services [RFC7432] and EVPN VPWS
[RFC8214].
Rabadan, et al. Standards Track [Page 25]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
(b) It defines a way to signal the DF election algorithm and
capabilities intended by the advertising PE. This is done by
defining the DF Election Extended Community, which allows the
advertising PE to indicate its support for the capabilities
defined in this document as well as any subsequently defined DF
election algorithms or capabilities.
(c) It is backwards compatible with the procedures defined in
[RFC7432]. If one or more PEs in the ES do not support the new
procedures, they will all follow DF election as defined in
[RFC7432].
6. Security Considerations
This document addresses some identified issues in the DF election
procedures described in [RFC7432] by defining a new DF election
framework. In general, this framework allows the PEs that are part
of the same ES to exchange additional information and agree on the DF
election type and capabilities to be used.
By following the procedures in this document, the operator will
minimize such undesirable situations as unfair load balancing,
service disruption, and traffic black-holing. Because such
situations could be purposely created by a malicious user with access
to the configuration of one PE, this document also enhances the
security of the network. Note that the network will not benefit from
the new procedures if the DF election algorithm is not consistently
configured on all the PEs in the ES (if there is no unanimity among
all the PEs, the DF election algorithm falls back to the default DF
election as provided in [RFC7432]). This behavior could be exploited
by an attacker that manages to modify the configuration of one PE in
the ES so that the DF election algorithm and capabilities in all the
PEs in the ES fall back to the default DF election. If that is the
case, the PEs will be exposed to the unfair load balancing, service
disruption, and black-holing mentioned earlier.
In addition, the new framework is extensible and allows for new
security enhancements in the future. Note that such enhancements are
out of scope for this document. Finally, since this document extends
the procedures in [RFC7432], the same security considerations as
those described in [RFC7432] are valid for this document.
Rabadan, et al. Standards Track [Page 26]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
7. IANA Considerations
IANA has:
o Allocated Sub-Type value 0x06 in the "EVPN Extended Community
Sub-Types" registry defined in [RFC7153] as follows:
Sub-Type Value Name Reference
-------------- ------------------------------ -------------
0x06 DF Election Extended Community This document
o Set up a registry called "DF Alg" for the DF Alg field in the
Extended Community. New registrations will be made through the
"RFC Required" procedure defined in [RFC8126]. Value 31 is for
experimental use and does not require any other RFC than this
document. The following initial values in that registry exist:
Alg Name Reference
---- ----------------------------- -------------
0 Default DF Election This document
1 HRW Algorithm This document
2-30 Unassigned
31 Reserved for Experimental Use This document
o Set up a registry called "DF Election Capabilities" for the
2-octet Bitmap field in the Extended Community. New registrations
will be made through the "RFC Required" procedure defined in
[RFC8126]. The following initial value in that registry exists:
Bit Name Reference
---- ---------------- -------------
0 Unassigned
1 AC-DF Capability This document
2-15 Unassigned
Rabadan, et al. Standards Track [Page 27]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
8. References
8.1. Normative References
[RFC7432] Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432,
February 2015, <https://www.rfc-editor.org/info/rfc7432>.
[RFC8214] Boutros, S., Sajassi, A., Salam, S., Drake, J., and J.
Rabadan, "Virtual Private Wire Service Support in Ethernet
VPN", RFC 8214, DOI 10.17487/RFC8214, August 2017,
<https://www.rfc-editor.org/info/rfc8214>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>.
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in
RFC 2119 Key Words", BCP 14, RFC 8174,
DOI 10.17487/RFC8174, May 2017,
<https://www.rfc-editor.org/info/rfc8174>.
[RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
Communities Attribute", RFC 4360, DOI 10.17487/RFC4360,
February 2006, <https://www.rfc-editor.org/info/rfc4360>.
[RFC7153] Rosen, E. and Y. Rekhter, "IANA Registries for BGP
Extended Communities", RFC 7153, DOI 10.17487/RFC7153,
March 2014, <https://www.rfc-editor.org/info/rfc7153>.
[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for
Writing an IANA Considerations Section in RFCs", BCP 26,
RFC 8126, DOI 10.17487/RFC8126, June 2017,
<https://www.rfc-editor.org/info/rfc8126>.
Rabadan, et al. Standards Track [Page 28]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
8.2. Informative References
[VPLS-MH] Kothari, B., Kompella, K., Henderickx, W., Balus, F., and
J. Uttaro, "BGP based Multi-homing in Virtual Private LAN
Service", Work in Progress,
draft-ietf-bess-vpls-multihoming-03, March 2019.
[CHASH] Karger, D., Lehman, E., Leighton, T., Panigrahy, R.,
Levine, M., and D. Lewin, "Consistent Hashing and Random
Trees: Distributed Caching Protocols for Relieving Hot
Spots on the World Wide Web", ACM Symposium on Theory of
Computing, ACM Press, New York, DOI 10.1145/258533.258660,
May 1997.
[CLRS2009] Cormen, T., Leiserson, C., Rivest, R., and C. Stein,
"Introduction to Algorithms (3rd Edition)", MIT
Press, ISBN 0-262-03384-8, 2009.
[RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
Multicast Next-Hop Selection", RFC 2991,
DOI 10.17487/RFC2991, November 2000,
<https://www.rfc-editor.org/info/rfc2991>.
[RFC2992] Hopps, C., "Analysis of an Equal-Cost Multi-Path
Algorithm", RFC 2992, DOI 10.17487/RFC2992, November 2000,
<https://www.rfc-editor.org/info/rfc2992>.
[RFC4456] Bates, T., Chen, E., and R. Chandra, "BGP Route
Reflection: An Alternative to Full Mesh Internal BGP
(IBGP)", RFC 4456, DOI 10.17487/RFC4456, April 2006,
<https://www.rfc-editor.org/info/rfc4456>.
[HRW1999] Thaler, D. and C. Ravishankar, "Using Name-Based Mappings
to Increase Hit Rates", IEEE/ACM Transactions on
Networking, Volume 6, No. 1, February 1998,
<https://www.microsoft.com/en-us/research/wp-content/
uploads/2017/02/HRW98.pdf>.
[Knuth] Knuth, D., "The Art of Computer Programming: Volume 3:
Sorting and Searching", 2nd Edition, Addison-Wesley,
Page 516, 1998.
Rabadan, et al. Standards Track [Page 29]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
Acknowledgments
The authors want to thank Ranganathan Boovaraghavan, Sami Boutros,
Luc Andre Burdet, Anoop Ghanwani, Mrinmoy Ghosh, Jakob Heitz, Leo
Mermelstein, Mankamana Mishra, Tamas Mondal, Laxmi Padakanti, Samir
Thoria, and Sriram Venkateswaran for their review and contributions.
Special thanks to Stephane Litkowski for his thorough review and
detailed contributions.
They would also like to thank their working group chairs, Matthew
Bocci and Stephane Litkowski, and their AD, Martin Vigoureux, for
their guidance and support.
Finally, they would like to thank the Directorate reviewers and the
ADs for their thorough reviews and probing questions, the answers to
which have substantially improved the quality of the document.
Contributors
The following people have contributed substantially to this document
and should be considered coauthors:
Antoni Przygienda
Juniper Networks, Inc.
1194 N. Mathilda Ave.
Sunnyvale, CA 94089
United States of America
Email: prz@juniper.net
Vinod Prabhu
Nokia
Email: vinod.prabhu@nokia.com
Wim Henderickx
Nokia
Email: wim.henderickx@nokia.com
Wen Lin
Juniper Networks, Inc.
Email: wlin@juniper.net
Rabadan, et al. Standards Track [Page 30]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
Patrice Brissette
Cisco Systems
Email: pbrisset@cisco.com
Keyur Patel
Arrcus, Inc.
Email: keyur@arrcus.com
Autumn Liu
Ciena
Email: hliu@ciena.com
Authors' Addresses
Jorge Rabadan (editor)
Nokia
777 E. Middlefield Road
Mountain View, CA 94043
United States of America
Email: jorge.rabadan@nokia.com
Satya Mohanty (editor)
Cisco Systems, Inc.
225 West Tasman Drive
San Jose, CA 95134
United States of America
Email: satyamoh@cisco.com
Ali Sajassi
Cisco Systems, Inc.
225 West Tasman Drive
San Jose, CA 95134
United States of America
Email: sajassi@cisco.com
Rabadan, et al. Standards Track [Page 31]
^L
RFC 8584 DF Election Framework for EVPN Services April 2019
John Drake
Juniper Networks, Inc.
1194 N. Mathilda Ave.
Sunnyvale, CA 94089
United States of America
Email: jdrake@juniper.net
Kiran Nagaraj
Nokia
701 E. Middlefield Road
Mountain View, CA 94043
United States of America
Email: kiran.nagaraj@nokia.com
Senthil Sathappan
Nokia
701 E. Middlefield Road
Mountain View, CA 94043
United States of America
Email: senthil.sathappan@nokia.com
Rabadan, et al. Standards Track [Page 32]
^L
|