diff options
Diffstat (limited to 'doc/rfc/rfc8775.txt')
-rw-r--r-- | doc/rfc/rfc8775.txt | 947 |
1 files changed, 947 insertions, 0 deletions
diff --git a/doc/rfc/rfc8775.txt b/doc/rfc/rfc8775.txt new file mode 100644 index 0000000..2769dd9 --- /dev/null +++ b/doc/rfc/rfc8775.txt @@ -0,0 +1,947 @@ + + + + +Internet Engineering Task Force (IETF) Y. Cai +Request for Comments: 8775 H. Ou +Category: Standards Track Alibaba Group +ISSN: 2070-1721 S. Vallepalli + + M. Mishra + S. Venaas + Cisco Systems, Inc. + A. Green + British Telecom + April 2020 + + + PIM Designated Router Load Balancing + +Abstract + + On a multi-access network, one of the PIM-SM (PIM Sparse Mode) + routers is elected as a Designated Router. One of the + responsibilities of the Designated Router is to track local multicast + listeners and forward data to these listeners if the group is + operating in PIM-SM. This document specifies a modification to the + PIM-SM protocol that allows more than one of the PIM-SM routers to + take on this responsibility so that the forwarding load can be + distributed among multiple routers. + +Status of This Memo + + This is an Internet Standards Track document. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Further information on + Internet Standards is available in Section 2 of RFC 7841. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + https://www.rfc-editor.org/info/rfc8775. + +Copyright Notice + + Copyright (c) 2020 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + +Table of Contents + + 1. Introduction + 2. Terminology + 3. Applicability + 4. Functional Overview + 4.1. GDR Candidates + 5. Protocol Specification + 5.1. Hash Mask and Hash Algorithm + 5.2. Modulo Hash Algorithm + 5.2.1. Modulo Hash Algorithm Examples + 5.2.2. Limitations + 5.3. PIM Hello Options + 5.3.1. PIM DR Load-Balancing Capability (DRLB-Cap) Hello + Option + 5.3.2. PIM DR Load-Balancing List (DRLB-List) Hello Option + 5.4. PIM DR Operation + 5.5. PIM GDR Candidate Operation + 5.6. DRLB-List Hello Option Processing + 5.7. PIM Assert Modification + 5.8. Backward Compatibility + 6. Operational Considerations + 7. IANA Considerations + 7.1. Initial Registry + 7.2. Assignment of New Hash Algorithms + 8. Security Considerations + 9. References + 9.1. Normative References + 9.2. Informative References + Acknowledgements + Authors' Addresses + +1. Introduction + + On a multi-access LAN (such as an Ethernet) with one or more PIM-SM + (PIM Sparse Mode) [RFC7761] routers, one of the PIM-SM routers is + elected as a Designated Router (DR). The PIM DR has two + responsibilities in the PIM-SM protocol. For any active sources on a + LAN, the PIM DR is responsible for registering with the Rendezvous + Point (RP) if the group is operating in PIM-SM. Also, the PIM DR is + responsible for tracking local multicast listeners and forwarding + data to these listeners if the group is operating in PIM-SM. + + Consider the following LAN in Figure 1: + + + (core networks) + | | | + | | | + R1 R2 R3 + | | | + ----(LAN)---- + | + | + (many receivers) + + Figure 1: LAN with Receivers + + Assume R1 is elected as the DR. According to the PIM-SM protocol, R1 + will be responsible for forwarding traffic to that LAN on behalf of + all local members. In addition to keeping track of membership + reports, R1 is also responsible for initiating the creation of source + and/or shared trees towards the senders or the RPs. The membership + reports would be IGMP or Multicast Listener Discovery (MLD) messages. + This applies to any versions of the IGMP and MLD protocols. The most + recent versions are IGMPv3 [RFC3376] and MLDv2 [RFC3810]. + + Having a single router acting as DR and being responsible for data- + plane forwarding leads to several issues. One of the issues is that + the aggregated bandwidth will be limited to what R1 can handle with + regards to capacity of incoming links, the interface on the LAN, and + total forwarding capacity. It is very common that a LAN consists of + switches that run IGMP/MLD or PIM snooping [RFC4541]. This allows + the forwarding of multicast packets to be restricted only to segments + leading to receivers that have indicated their interest in multicast + groups using either IGMP or MLD. The emergence of the switched + Ethernet allows the aggregated bandwidth to exceed, sometimes by a + large number, that of a single link. For example, let us modify + Figure 1 and introduce an Ethernet switch in Figure 2. + + (core networks) + | | | + | | | + R1 R2 R3 + | | | + +=gi1===gi2===gi3=+ + + + + + switch + + + + + +=gi4===gi5===gi6=+ + | | | + H1 H2 H3 + + Figure 2: LAN with Ethernet Switch + + Let us assume that each individual link is a Gigabit Ethernet. Each + router (R1, R2, and R3) and the switch have enough forwarding + capacity to handle hundreds of gigabits of data. + + Let us further assume that each of the hosts requests 500 Mbps of + unique multicast data. This totals to 1.5 Gbps of data, which is + less than what each switch or the combined uplink bandwidth across + the routers can handle, even under failure of a single router. + + On the other hand, the link between R1 and switch, via port gi1, can + only handle a throughput of 1 Gbps. And if R1 is the only DR (the + PIM DR elected using the procedure defined by [RFC7761]), at least + 500 Mbps worth of data will be lost because the only link that can be + used to draw the traffic from the routers to the switch is via gi1. + In other words, the entire network's throughput is limited by the + single connection between the PIM DR and the switch (or LAN, as in + Figure 1). + + Another important issue is related to failover. If R1 is the only + forwarder on a shared LAN, when R1 goes out of service, multicast + forwarding for the entire LAN has to be rebuilt by the newly elected + PIM DR. However, if there were a way that allowed multiple routers + to forward to the LAN for different groups, failure of one of the + routers would only lead to disruption to a subset of the flows, + therefore improving the overall resilience of the network. + + This document specifies a modification to the PIM-SM protocol that + allows more than one of these routers, called Group Designated + Routers (GDRs), to be selected so that the forwarding load can be + distributed among a number of routers. + +2. Terminology + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and + "OPTIONAL" in this document are to be interpreted as described in + BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all + capitals, as shown here. + + With respect to PIM-SM, this document follows the terminology that + has been defined in [RFC7761]. + + This document also introduces the following new acronyms: + + GDR: Group Designated Router. For each multicast flow, either a + (*,G) for Any-Source Multicast (ASM) or an (S,G) for Source- + Specific Multicast (SSM) [RFC4607], a hash algorithm (described + below) is used to select one of the routers as a GDR. The GDR is + responsible for initiating the forwarding tree building process + for the corresponding multicast flow. + + GDR Candidate: a router that has the potential to become a GDR. + There might be multiple GDR Candidates on a LAN, but only one can + become the GDR for a specific multicast flow. + +3. Applicability + + The extension specified in this document applies to PIM-SM routers + acting as last-hop routers (there are directly connected receivers). + It does not alter the behavior of a PIM DR or any other routers on + the first-hop network (directly connected sources). This is because + the source tree is built using the IP address of the sender, not the + IP address of the PIM DR that sends PIM registers towards the RP. + The load balancing between first-hop routers can be achieved + naturally if an IGP provides equal cost multiple paths (which it + usually does in practice). Also, distributing the load to do source + registration does not justify the additional complexity required to + support it. + +4. Functional Overview + + In the PIM DR election as defined in [RFC7761], when multiple routers + are connected to a multi-access LAN (for example, an Ethernet), one + of them is elected to act as PIM DR. The PIM DR is responsible for + sending local Join/Prune messages towards the RP or source. In order + to elect the PIM DR, each PIM router on the LAN examines the received + PIM Hello messages and compares its own DR priority and IP address + with those of its neighbors. The router with the highest DR priority + is the PIM DR. If there are multiple such routers, their IP + addresses are used as the tiebreaker, as described in [RFC7761]. + + In order to share forwarding load among last-hop routers, besides the + normal PIM DR election, one or more GDRs are elected on the multi- + access LAN. There is only one PIM DR on the multi-access LAN, but + there might be multiple GDR Candidates. + + For each multicast flow, that is, (*,G) for ASM and (S,G) for SSM, a + hash algorithm (Section 5.1) is used to select one of the routers to + be the GDR. The new DR Load-Balancing Capability (DRLB-Cap) PIM + Hello Option is used to announce the Capability, as well as the hash + algorithm type. Routers with the new DRLB-Cap Option advertised in + their PIM Hello, using the same GDR election hash algorithm and the + same DR priority as the PIM DR, are considered as GDR Candidates. + + Hash masks are defined for Source, Group, and RP, separately, in + order to handle PIM ASM/SSM. The masks, as well as a sorted list of + GDR Candidate addresses, are announced by the DR in a new DR Load- + Balancing List (DRLB-List) PIM Hello Option. + + A hash algorithm based on the announced Source, Group, or RP masks + allows one GDR to be assigned to a corresponding multicast state. + That GDR is responsible for initiating the creation of the multicast + forwarding tree for multicast traffic. + +4.1. GDR Candidates + + GDR is the new concept introduced by this specification. GDR + Candidates are routers eligible for GDR election on the LAN. To + become a GDR Candidate, a router must have the same DR priority and + run the same GDR election hash algorithm as the DR on the LAN. + + For example, assume there are 4 routers on the LAN: R1, R2, R3, and + R4, each announcing a DRLB-Cap Option. R1, R2, and R3 have the same + DR priority, while R4's DR priority is less preferred. In this + example, R4 will not be eligible for GDR election, because R4 will + not become a PIM DR unless all of R1, R2, and R3 go out of service. + + Furthermore, assume router R1 wins the PIM DR election, R1 and R2 + advertise the same hash algorithm for GDR election, while R3 + advertises a different one. In this case, only R1 and R2 will be + eligible for GDR election, while R3 will not. + + As a DR, R1 will include its own Load-Balancing Hash Masks and the + identity of R1 and R2 (the GDR Candidates) in its DRLB-List Hello + Option. + +5. Protocol Specification + +5.1. Hash Mask and Hash Algorithm + + A hash mask is used to extract a number of bits from the + corresponding IP address field (32 for IPv4, 128 for IPv6) and + calculate a hash value. A hash value is used to select a GDR from + GDR Candidates advertised by the PIM DR. Hash masks allow for + certain flows to always be forwarded by the same GDR, by ignoring + certain bits in the hash value calculation, so that the hash values + are the same. For example, 0.0.255.0 defines a hash mask for an IPv4 + address that masks the first, second, and fourth octets, which means + that only the third octet will influence the hash value computed. + Note that the masks need not be a contiguous set of bits. For + example, for IPv4, 15.15.15.15 would be a valid mask. + + In the text below, a hash mask is, in some places, said to be zero. + A hash mask is zero if no bits are set, that is, 0.0.0.0 for IPv4 and + :: for IPv6. Also, a hash mask is said to be an all-bits-set mask if + it is 255.255.255.255 for IPv4 or + ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff for IPv6. + + There are three hash masks defined: + + * RP Hash Mask + + * Source Hash Mask + + * Group Hash Mask + + The hash masks need to be configured on the PIM routers that can + potentially become a PIM DR, unless the implementation provides + default hash mask values. An implementation SHOULD have default hash + mask values as follows. The default RP Hash Mask SHOULD be zero (no + bits set). The default Source and Group Hash Masks SHOULD both be + all-bits-set masks. These default values are likely acceptable for + most deployments and simplify configuration. There is only a need to + use other masks if one needs to ensure that certain flows are + forwarded by the same GDR. + + The DRLB-List Hello Option contains a list of GDR Candidates. The + first one listed has ordinal number 0, the second listed ordinal + number 1, and the last one has ordinal number N - 1 if there are N + candidates listed. The hash value computed will be the ordinal + number of the GDR Candidate that is acting as GDR for the flow in + question. + + The input to be hashed is determined as follows: + + * If the group is in ASM mode and the RP Hash Mask announced by the + PIM DR is not zero (at least one bit is set), calculate the value + of hashvalue_RP (Section 5.2) to determine the GDR. + + * If the group is in ASM mode and the RP Hash Mask announced by the + PIM DR is zero (no bits are set), obtain the value of + hashvalue_Group (Section 5.2) to determine the GDR. + + * If the group is in SSM mode, use hashvalue_SG (Section 5.2) to + determine the GDR. + + A simple modulo hash algorithm is defined in this document. However, + to allow another hash algorithm to be used, a 1-octet "Hash + Algorithm" field is included in the DRLB-Cap Hello Option to specify + the hash algorithm used by the router. + + If different hash algorithms are advertised among the routers on a + LAN, only the routers advertising the same hash algorithm as the DR + (as well as having the same DR priority as the DR) are eligible for + GDR election. + +5.2. Modulo Hash Algorithm + + As part of computing the hash, the notation LSZC(hash_mask) is used + to denote the number of zeroes counted from the least significant bit + of a hash mask hash_mask. As an example, LSZC(255.255.128) is 7 and + LSZC(ffff:8000::) is 111. If all bits are set, LSZC will be 0. If + the mask is zero, then LSZC will be 32 for IPv4 and 128 for IPv6. + + The number of GDR Candidates is denoted as GDRC. + + The idea behind the modulo hash algorithm is, in simple terms, that + the corresponding mask is applied to a value, then the result is + shifted right LSZC(mask) bits so that the least significant bits that + were masked out are not considered. Then, this result is masked by + 0xffffffff, keeping only the last 32 bits of the result (this only + makes a difference for IPv6). Finally, the hash value is this result + modulo the number of GDR Candidates (GDRC). + + The modulo hash algorithm, for computing the values hashvalue_RP, + hashvalue_Group, and hashvalue_SG, is defined as follows. + + hashvalue_RP is calculated as: + + (((RP_address & RP_mask) >> LSZC(RP_mask)) & 0xffffffff) % GDRC + + RP_address is the address of the RP defined for the group, and + RP_mask is the RP Hash Mask. + + hashvalue_Group is calculated as: + + (((Group_address & Group_mask) >> LSZC(Group_mask)) & 0xffffffff) + % GDRC + + Group_address is the group address, and Group_mask is the Group + Hash Mask. + + hashvalue_SG is calculated as: + + ((((Source_address & Source_mask) >> LSZC(Source_mask)) & + 0xffffffff) ^ (((Group_address & Group_mask) >> LSZC(Group_mask)) + & 0xffffffff)) % GDRC + + Group_address is the group address, and Group_mask is the Group + Hash Mask. + +5.2.1. Modulo Hash Algorithm Examples + + To help illustrate the algorithm, consider this example. Router X + with IPv4 address 203.0.113.1 receives a DRLB-List Hello Option from + the DR that announces RP Hash Mask 0.0.255.0 and a list of GDR + Candidates, sorted by IP addresses from high to low: 203.0.113.3, + 203.0.113.2, and 203.0.113.1. The ordinal number assigned to those + addresses would be: + + 0 for 203.0.113.3; 1 for 203.0.113.2; 2 for 203.0.113.1 (Router X). + + Assume there are 2 RPs: RP1 192.0.2.1 for Group1 and RP2 198.51.100.2 + for Group2. Following the modulo hash algorithm: + + * LSZC(0.0.255.0) is 8, and GDRC is 3. The hashvalue_RP for Group1 + with RP RP1 is: + + (((192.0.2.1 & 0.0.255.0) >> 8) & 0xffffffff % 3) + = 2 % 3 + = 2 + + This matches the ordinal number assigned to Router X. Router X + will be the GDR for Group1. + + * The hashvalue_RP for Group2 with RP RP2 is: + + (((198.51.100.2 & 0.0.255.0) >> 8) & 0xffffffff % 3) + = 100 % 3 + = 1 + + This is different from the ordinal number of Router X (2). Hence, + Router X will not be GDR for Group2. + + For IPv6, consider this example, similar to the above. Router X with + IPv6 address fe80::1 receives a DRLB-List Hello Option from the DR + that announces RP Hash Mask ::ffff:ffff:ffff:0 and a list of GDR + Candidates, sorted by IP addresses from high to low: fe80::3, + fe80::2, and fe80::1. The ordinal number assigned to those addresses + would be: + + 0 for fe80::3; 1 for fe80::2; 2 for fe80::1 (Router X). + + Assume there are 2 RPs: RP1 2001:db8::1:0:5678:1 for Group1 and RP2 + 2001:db8::1:0:1234:2 for Group2. Following the modulo hash + algorithm: + + * LSZC(::ffff:ffff:ffff:0) is 16, and GDRC is 3. The hashvalue_RP + for Group1 with RP RP1 is: + + (((2001:db8::1:0:5678:1 & ::ffff:ffff:ffff:0) >> 16) & + 0xffffffff % 3) + = ((::1:0:5678:0 >> 16) & 0xffffffff % 3) + = (::1:0:5678 & 0xffffffff % 3) + = ::5678 % 3 + = 2 + + This matches the ordinal number assigned to Router X. Router X + will be the GDR for Group1. + + * The hashvalue_RP for Group2 with RP RP2 is: + + (((2001:db8::1:0:1234:1 & ::ffff:ffff:ffff:0) >> 16) & + 0xffffffff % 3) + = ((::1:0:1234:0 >> 16) & 0xffffffff % 3) + = (::1:0:1234 & 0xffffffff % 3) + = ::1234 % 3 + = 1 + + This is different from the ordinal number of Router X (2). Hence, + Router X will not be GDR for Group2. + +5.2.2. Limitations + + The modulo hash algorithm has poor failover characteristics when a + shared LAN has more than two GDRs. In the case of more than two GDRs + on a LAN, when one GDR fails, all of the groups may be reassigned to + a different GDR, even if they were not assigned to the failed GDR. + However, many deployments use only two routers on a shared LAN for + redundancy purposes. Future work may define new hash algorithms + where only groups assigned to the failed GDR get reassigned. + + The modulo hash algorithm will use, at most, 32 consecutive bits of + the input addresses for its computation. Exactly which bits are used + of the source, group, or RP addresses depend on the respective masks. + This limitation may be an issue for IPv6 deployments, since not all + bits of the IPv6 addresses are considered. If this causes + operational issues, a new hash algorithm would need to be defined. + +5.3. PIM Hello Options + + PIM routers include a new option, called "Load-Balancing Capability + (DRLB-Cap)", in their PIM Hello messages. + + Besides this DRLB-Cap Hello Option, the elected PIM DR also includes + a new "DR Load-Balancing List (DRLB-List) Hello Option". The DRLB- + List Hello Option consists of three hash masks, as defined above, and + also a list of GDR Candidate addresses on the LAN. It is recommended + that the GDR Candidate addresses are sorted in descending order. + This ensures that when using algorithms, such as the modulo hash + algorithm in this document, that it is predictable which GDR is + responsible for which groups, regardless of the order the DR learned + about the candidates. + +5.3.1. PIM DR Load-Balancing Capability (DRLB-Cap) Hello Option + + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type = 34 | Length = 4 | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Reserved |Hash Algorithm | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 3: PIM DR Load-Balancing Capability Hello Option + + Type: 34 + + Length: 4 + + Reserved: Transmitted as zero, ignored on receipt. + + Hash Algorithm: Hash algorithm type. A value listed in the IANA + "PIM Designated Router Load-Balancing Hash Algorithms" registry. 0 + is used for the hash algorithm defined in this document. + + This DRLB-Cap Hello Option MUST be advertised by routers on all + interfaces where DR Load Balancing is enabled. Note that the option + is included, at most, once. + +5.3.2. PIM DR Load-Balancing List (DRLB-List) Hello Option + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type = 35 | Length | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Group Mask | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Source Mask | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | RP Mask | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | GDR Candidate Address(es) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + Figure 4: PIM DR Load-Balancing List Hello Option + + Type: 35 + + Length: (3 + n) x (4 or 16) bytes, where n is the number of GDR + Candidates. + + Group Mask (32/128 bits): Mask applied to group addresses as part of + hash computation. + + Source Mask (32/128 bits): Mask applied to source addresses as part + of hash computation. + + RP Mask (32/128 bits): Mask applied to RP addresses as part of hash + computation. + + All masks MUST have the same number of bits as the IP source address + in the PIM Hello IP header. + + GDR Candidate Address(es) (32/128 bits): List of GDR Candidate(s) + + All addresses MUST be in the same address family as the PIM Hello + IP header. It is recommended that the addresses are sorted in + descending order. + + If the "Interface ID" option, as specified in [RFC6395], is + present in a GDR Candidate's PIM Hello message and the "Router + Identifier" portion is non-zero: + + * For IPv4, the "GDR Candidate Address" will be set directly to + the "Router Identifier". + + * For IPv6, the "GDR Candidate Address" will be 96 bits of + zeroes, followed by the 32 bit Router Identifier. + + If the "Interface ID" option is not present in a GDR Candidate's + PIM Hello message or if the "Interface ID" option is present but + the "Router Identifier" field is zero, the "GDR Candidate Address" + will be the IPv4 or IPv6 source address of the PIM Hello message. + + This DRLB-List Hello Option MUST only be advertised by the elected + PIM DR. It MUST be ignored if received from a non-DR. The option + MUST also be ignored if the hash masks are not the correct number + of bits or GDR Candidate addresses are in the wrong address + family. + +5.4. PIM DR Operation + + The DR election process is still the same as defined in [RFC7761]. + The DR advertises the new DRLB-List Hello Option, which contains mask + values from user configuration (or default values), followed by a + list of GDR Candidate addresses. Note that if a router included the + "Interface ID" option in the hello message and the Router ID is non- + zero, the Router ID will be used to form the GDR Candidate address of + the router, as discussed in the previous section. It is recommended + that the list be sorted from the highest value to the lowest value. + The reason for sorting the list is to make the behavior + deterministic, regardless of the order in which the DR learns of new + candidates. Note that, as for non-DR routers, the DR also advertises + the DRLB-Cap Hello Option to indicate its ability to support the new + functionality and the type of GDR election hash algorithm it uses. + + If a PIM DR receives a neighbor DRLB-Cap Hello Option that contains + the same hash algorithm as the DR and the neighbor has the same DR + priority as the DR, PIM DR SHOULD consider the neighbor as a GDR + Candidate and insert the GDR Candidate's Address into the list of the + DRLB-List Option. However, the DR may have policies limiting which + or the number of GDR Candidates to include. Likewise, the DR SHOULD + include itself in the list of GDR Candidates, but it is permissible + not to do so, for instance, if there is some policy restricting the + candidate set. + + If a PIM neighbor included in the list expires, stops announcing the + DRLB-Cap Hello Option, changes DR priority, changes hash algorithm, + or otherwise becomes ineligible as a candidate, the DR SHOULD + immediately send a triggered hello with a new list in the DRLB-List + option, excluding the neighbor. + + If a new router becomes eligible as a candidate, there is no urgency + in sending out an updated list. An updated list SHOULD be included + in the next hello. + +5.5. PIM GDR Candidate Operation + + When an IGMP/MLD report is received, a hash algorithm is used by the + GDR Candidates to determine which router is going to be responsible + for building forwarding trees on behalf of the host. + + The router MUST include the DRLB-Cap Hello Option in all PIM Hello + messages sent on the interface. Note that the presence of the DRLB- + Cap Option in the PIM Hello does not guarantee that the router will + be considered as a GDR Candidate. Once the DR election is done, the + DRLB-List Hello Option is received from the current PIM DR containing + a list of the selected GDR Candidates. + + A router only acts as a GDR Candidate if it is included in the GDR + Candidate list of the DRLB-List Hello Option. See next section for + details. + +5.6. DRLB-List Hello Option Processing + + This section discusses processing of the DRLB-List Hello Option, + including the case where it was received in the previous hello but + not in the current hello. All routers MUST ignore the DRLB-List + Hello Option if it is received from a PIM router that is not the DR. + The option MUST only be processed by routers that are announcing the + DRLB-Cap Option and only if the hash algorithm announced by the DR is + the same as the local announcement. All GDR Candidates MUST use the + hash masks advertised in the Option, even if they differ from those + the candidate was configured with. The DR MUST also process its own + DRLB-List Hello Option. + + A router stores the latest option contents that were announced, if + any, and deletes the previous contents. The router MUST also compare + the new contents with any previous contents and, if there are any + changes, continue processing as below. Note that if the option does + not pass the above checks, the below processing MUST be done as if + the option was not announced. + + If the contents of the DRLB-List Option, the masks, or the candidate + list differ from the previously saved copy, it is received for the + first time, or it is no longer being received or accepted, the option + MUST be processed as below. + + 1. If the local router is included in the "GDR Candidate + Address(es)" field, it will look for its own address, or if it + announces a non-zero Router ID, its own Router ID. For each of + the groups or source and group pairs, if the group is in SSM mode + with local receiver interest, the router MUST run the hash + algorithm to determine which of them is for the GDR. + + * If there is no change in the GDR status, then no further + action is required. + + * If the router becomes the new GDR, then a multicast forwarding + tree MUST be built [RFC7761]. + + * If the router is no longer the GDR, then it uses an Assert as + explained in Section 5.7. + + 2. If one of the following occurs: + + * the local router is not included in the "GDR Candidate + Address(es)" field, + + * the DRLB-List Hello Option is no longer included in the DR's + Hello, or + + * the DR's Neighbor Liveness Timer expires [RFC7761], + + then for each group (or each source and group pair if the group + is in SSM mode) with local receiver interest, for which the + router is the GDR, the router uses an Assert as explained in + Section 5.7. + +5.7. PIM Assert Modification + + GDR changes may occur due to configuration change, GDR Candidates + going down, and also new routers coming up and becoming GDR + Candidates. This may occur while flows are being forwarded. If the + GDR for an active flow changes, there is likely to be some + disruption, such as packet loss or duplicates. By using asserts, + packet loss is minimized while allowing a small amount of duplicates. + + When a router stops acting as the GDR for a group, or source and + group pair if SSM, it MUST set the Assert metric preference to + maximum (0x7fffffff) and the Assert metric to one less than maximum + (0xfffffffe). That is, whenever it sends or receives an Assert for + the group, it must use these values as the metric preference and + metric rather than the values provided by the unicast routing + protocol. + + The rest of this section is just for illustration purposes and not + part of the protocol definition. + + To illustrate the behavior when there is a GDR change, consider the + following scenario where there are two flows: G1 and G2. R1 is the + GDR for G1, and R2 is the GDR for G2. When R3 comes up, it is + possible that R3 becomes GDR for both G1 and G2; hence, R3 starts to + build the forwarding tree for G1 and G2. If R1 and R2 stop + forwarding before R3 completes the process, packet loss might occur. + On the other hand, if R1 and R2 continue forwarding while R3 is + building the forwarding trees, duplicates might occur. + + When the role of GDR changes as above, instead of immediately + stopping forwarding, R1 and R2 continue forwarding to G1 and G2 + respectively, while, at the same time, R3 build forwarding trees for + G1 and G2. This will lead to PIM Asserts. + + For G1, using the functionality described in this document, R1 and R3 + determine the new GDR, which is R3. With the modified Assert + behavior, R1 sets its Assert metric to the near maximum value, as + discussed above. That will make R3, which has normal metric in its + Assert, the Assert winner. + +5.8. Backward Compatibility + + In the case of a hybrid Ethernet shared LAN (where some PIM routers + support the functionality defined in this document and some do not): + + * If the DR does not support the new functionality, then there will + be no load balancing. + + * If non-DR routers do not support the new functionality, they will + not be considered as GDR Candidate and will not take part in load + balancing. Load balancing may still happen on the link. + +6. Operational Considerations + + An administrator needs to consider what the total bandwidth + requirements are and find a set of routers that together have enough + available capacity while making sure that each of the routers can + handle its part, assuming that the traffic is distributed roughly + equally among the routers. Ideally, one should also have enough + bandwidth to handle the case where at least one router fails. All + routers should have reachability to the sources and RPs, if + applicable, that are not via the LAN. + + Care must be taken when choosing what hash masks to configure. One + would typically configure the same masks on all the routers so that + they are the same, regardless of which router is elected as DR. The + default masks are likely suitable for most deployment. The RP Hash + Mask must be configured (the default is no bits set) if one wishes to + hash based on the RP address rather than the group address for ASM. + The default masks will use the entire group addresses, and source + addresses if SSM, as part of the hash. An administrator may set + other masks that mask out part of the addresses to ensure that + certain flows always get hashed to the same router. How this is + achieved depends on how the group addresses are allocated. + + Only the routers announcing the same hash algorithm as the DR would + be considered as GDR Candidates. Network administrators need to make + sure that the desired set of routers announce the same algorithm. + Migration between different algorithms is not considered in this + document. + +7. IANA Considerations + + IANA has made these assignments in the "PIM-Hello Options" registry: + value 34 for the PIM DR Load-Balancing Capability (DRLB-Cap) Hello + Option (with Length of 4), and value 35 for the PIM DR Load-Balancing + List (DRLB-List) Hello Option (with variable Length). + + Per this document, IANA has created a registry called "PIM Designated + Router Load-Balancing Hash Algorithms" in the "Protocol Independent + Multicast (PIM)" branch of the registry tree. The registry lists + hash algorithms for use by PIM Designated Router Load Balancing. + +7.1. Initial Registry + + The initial content of the registry is as follows. + + +-------+------------+-----------+ + | Type | Name | Reference | + +=======+============+===========+ + | 0 | Modulo | RFC 8775 | + +-------+------------+-----------+ + | 1-255 | Unassigned | | + +-------+------------+-----------+ + + Table 1 + +7.2. Assignment of New Hash Algorithms + + Assignment of new hash algorithms is done according to the "IETF + Review" procedure; see [RFC8126]. + +8. Security Considerations + + Security of the new DR Load-Balancing PIM Hello Options is only + guaranteed by the security of PIM Hello messages, so the security + considerations for PIM Hello messages, as described in PIM-SM + [RFC7761], apply here. + + If the DR is subverted, it could omit or add certain GDRs or announce + an unsupported algorithm. If another router is subverted, it could + be made DR and cause similar issues. While these issues are specific + to this specification, they are not that different from existing + attacks, such as subverting a DR and lowering the DR priority, + causing a different router to become the DR. + + If, for any reason, the DR includes a GDR in the announced list that + announces a different algorithm from what the DR announces, the GDR + is required to ignore the announcement, and there will be no router + acting as the DR for the flows that hash to that GDR. + + If a GDR is subverted, it could potentially be made to stop + forwarding all the traffic it is expected to forward. This is also + similar today to if a DR is subverted. + + An administrator may be able to achieve the desired load balancing of + known flows, but an attacker may send a single high rate flow that is + served by a single GDR or send multiple flows that are expected to be + hashed to the same GDR. + +9. References + +9.1. Normative References + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + <https://www.rfc-editor.org/info/rfc2119>. + + [RFC6395] Gulrajani, S. and S. Venaas, "An Interface Identifier (ID) + Hello Option for PIM", RFC 6395, DOI 10.17487/RFC6395, + October 2011, <https://www.rfc-editor.org/info/rfc6395>. + + [RFC7761] Fenner, B., Handley, M., Holbrook, H., Kouvelas, I., + Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent + Multicast - Sparse Mode (PIM-SM): Protocol Specification + (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March + 2016, <https://www.rfc-editor.org/info/rfc7761>. + + [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for + Writing an IANA Considerations Section in RFCs", BCP 26, + RFC 8126, DOI 10.17487/RFC8126, June 2017, + <https://www.rfc-editor.org/info/rfc8126>. + + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC + 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, + May 2017, <https://www.rfc-editor.org/info/rfc8174>. + +9.2. Informative References + + [RFC3376] Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A. + Thyagarajan, "Internet Group Management Protocol, Version + 3", RFC 3376, DOI 10.17487/RFC3376, October 2002, + <https://www.rfc-editor.org/info/rfc3376>. + + [RFC3810] Vida, R., Ed. and L. Costa, Ed., "Multicast Listener + Discovery Version 2 (MLDv2) for IPv6", RFC 3810, + DOI 10.17487/RFC3810, June 2004, + <https://www.rfc-editor.org/info/rfc3810>. + + [RFC4541] Christensen, M., Kimball, K., and F. Solensky, + "Considerations for Internet Group Management Protocol + (IGMP) and Multicast Listener Discovery (MLD) Snooping + Switches", RFC 4541, DOI 10.17487/RFC4541, May 2006, + <https://www.rfc-editor.org/info/rfc4541>. + + [RFC4607] Holbrook, H. and B. Cain, "Source-Specific Multicast for + IP", RFC 4607, DOI 10.17487/RFC4607, August 2006, + <https://www.rfc-editor.org/info/rfc4607>. + +Acknowledgements + + The authors would like to thank Steve Simlo and Taki Millonis for + helping with the original idea; Alia Atlas, Bill Atwood, Joe Clarke, + Alissa Cooper, Jake Holland, Bharat Joshi, Anish Kachinthaya, Anvitha + Kachinthaya, Benjamin Kaduk, Mirja Kühlewind, Barry Leiba, Ben Niven- + Jenkins, Alvaro Retana, Adam Roach, Michael Scharf, Éric Vyncke, and + Carl Wallace for reviews and comments; and Toerless Eckert and + Rishabh Parekh for helpful conversation on the document. + +Authors' Addresses + + Yiqun Cai + Alibaba Group + 520 Almanor Avenue + Sunnyvale, CA 94085 + United States of America + + Email: yiqun.cai@alibaba-inc.com + + + Heidi Ou + Alibaba Group + 520 Almanor Avenue + Sunnyvale, CA 94085 + United States of America + + Email: heidi.ou@alibaba-inc.com + + + Sri Vallepalli + + Email: vallepal@yahoo.com + + + Mankamana Mishra + Cisco Systems, Inc. + 821 Alder Drive, + Milpitas, CA 95035 + United States of America + + Email: mankamis@cisco.com + + + Stig Venaas + Cisco Systems, Inc. + Tasman Drive + San Jose, CA 95134 + United States of America + + Email: stig@cisco.com + + + Andy Green + British Telecom + Adastral Park + Ipswich + IP5 2RE + United Kingdom + + Email: andy.da.green@bt.com |