summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc8333.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc8333.txt')
-rw-r--r--doc/rfc/rfc8333.txt1459
1 files changed, 1459 insertions, 0 deletions
diff --git a/doc/rfc/rfc8333.txt b/doc/rfc/rfc8333.txt
new file mode 100644
index 0000000..bb81dd7
--- /dev/null
+++ b/doc/rfc/rfc8333.txt
@@ -0,0 +1,1459 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF) S. Litkowski
+Request for Comments: 8333 B. Decraene
+Category: Standards Track Orange
+ISSN: 2070-1721 C. Filsfils
+ Cisco Systems
+ P. Francois
+ Individual Contributor
+ March 2018
+
+
+ Micro-loop Prevention by Introducing a Local Convergence Delay
+
+Abstract
+
+ This document describes a mechanism for link-state routing protocols
+ that prevents local transient forwarding loops in case of link
+ failure. This mechanism proposes a two-step convergence by
+ introducing a delay between the convergence of the node adjacent to
+ the topology change and the network-wide convergence.
+
+ Because this mechanism delays the IGP convergence, it may only be
+ used for planned maintenance or when Fast Reroute (FRR) protects the
+ traffic during the time between the link failure and the IGP
+ convergence.
+
+ The mechanism is limited to the link-down event in order to keep the
+ mechanism simple.
+
+ Simulations using real network topologies have been performed and
+ show that local loops are a significant portion (>50%) of the total
+ forwarding loops.
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 7841.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ https://www.rfc-editor.org/info/rfc8333.
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 1]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+Copyright Notice
+
+ Copyright (c) 2018 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (https://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 2]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+Table of Contents
+
+ 1. Introduction ....................................................4
+ 2. Terminology .....................................................4
+ 2.1. Acronyms ...................................................4
+ 2.2. Requirements Language ......................................5
+ 3. Side Effects of Transient Forwarding Loops ......................5
+ 3.1. FRR Inefficiency ...........................................5
+ 3.2. Network Congestion .........................................8
+ 4. Overview of the Solution ........................................9
+ 5. Specification ...................................................9
+ 5.1. Definitions ................................................9
+ 5.2. Regular IGP Reaction ......................................10
+ 5.3. Local Events ..............................................10
+ 5.4. Local Delay for Link-Down Events ..........................11
+ 6. Applicability ..................................................11
+ 6.1. Applicable Case: Local Loops ..............................12
+ 6.2. Non-applicable Case: Remote Loops .........................12
+ 7. Simulations ....................................................13
+ 8. Deployment Considerations ......................................14
+ 9. Examples .......................................................15
+ 9.1. Local Link-Down Event .....................................15
+ 9.2. Local and Remote Event ....................................19
+ 9.3. Aborting Local Delay ......................................21
+ 10. Comparison with Other Solutions ...............................23
+ 10.1. PLSN .....................................................23
+ 10.2. oFIB .....................................................24
+ 11. IANA Considerations ...........................................24
+ 12. Security Considerations .......................................24
+ 13. References ....................................................25
+ 13.1. Normative References .....................................25
+ 13.2. Informative References ...................................25
+ Acknowledgements ..................................................26
+ Authors' Addresses ................................................26
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 3]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+1. Introduction
+
+ Micro-loops and some potential solutions are described in [RFC5715].
+ This document describes a simple targeted mechanism that prevents
+ micro-loops that are local to the failure. Based on network
+ analysis, local micro-loops make up a significant portion of the
+ micro-loops. A simple and easily deployable solution for these local
+ micro-loops is critical because these local loops cause some traffic
+ loss after an FRR alternate has been used (see Section 3.1).
+
+ Consider the case in Figure 1 where S does not have an LFA (Loop-Free
+ Alternate) to protect its traffic to D when the S-D link fails. That
+ means that all non-D neighbors of S on the topology will send to S
+ any traffic destined to D; if a neighbor did not, then that neighbor
+ would be loop-free. Regardless of the advanced FRR technique used,
+ when S converges to the new topology, it will send its traffic to a
+ neighbor that is not loop-free and will thus cause a local micro-
+ loop. The deployment of advanced FRR techniques motivates this
+ simple router-local mechanism to solve this targeted problem. This
+ solution can work with the various techniques described in [RFC5715].
+
+ D ------ C
+ | |
+ | | 5
+ | |
+ S ------ B
+
+ Figure 1
+
+ In Figure 1, all links have a metric of 1 except the B-C link, which
+ has a metric of 5. When the S-D link fails, a transient forwarding
+ loop may appear between S and B if S updates its forwarding entry to
+ D before B does.
+
+2. Terminology
+
+2.1. Acronyms
+
+ FIB: Forwarding Information Base
+
+ FRR: Fast Reroute
+
+ IGP: Interior Gateway Protocol
+
+ LFA: Loop-Free Alternate
+
+ LSA: Link State Advertisement
+
+
+
+
+Litkowski, et al. Standards Track [Page 4]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ LSP: Link State Packet
+
+ MRT: Maximally Redundant Tree
+
+ oFIB: Ordered FIB
+
+ PLR: Point of Local Repair
+
+ PLSN: Path Locking via Safe Neighbors
+
+ RIB: Routing Information Base
+
+ RLFA: Remote Loop-Free Alternate
+
+ SPF: Shortest Path First
+
+ TTL: Time to Live
+
+2.2. Requirements Language
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in
+ BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
+ capitals, as shown here.
+
+3. Side Effects of Transient Forwarding Loops
+
+ Even if they are very limited in duration, transient forwarding loops
+ may cause significant network damage.
+
+3.1. FRR Inefficiency
+
+ In Figure 2, we consider an IP/LDP routed network.
+
+ D
+ 1 |
+ | 1
+ A ------ B
+ | | ^
+ 10 | | 5 | T
+ | | |
+ E--------C
+ | 1
+ 1 |
+ S
+
+ Figure 2
+
+
+
+Litkowski, et al. Standards Track [Page 5]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ An RSVP-TE tunnel T, provisioned on C and terminating on B, is used
+ to protect the traffic against C-B link failure (the IGP shortcut
+ feature, defined in [RFC3906], is activated on C). The primary path
+ of T is C->B and FRR is activated on T, providing an FRR bypass or
+ detour using path C->E->A->B. On router C, the next hop to D is the
+ tunnel T, thanks to the IGP shortcut. When the C-B link fails:
+
+ 1. C detects the failure and updates the tunnel path using a
+ preprogrammed FRR path. The traffic path from S to D becomes
+ S->E->C->E->A->B->A->D.
+
+ 2. In parallel, on router C, both the IGP convergence and the TE
+ tunnel convergence (tunnel path recomputation) are occurring:
+
+ * The tunnel T path is recomputed and now uses C->E->A->B.
+
+ * The IGP path to D is recomputed and now uses C->E->A->D.
+
+ 3. On C, the tail-end of the TE tunnel (router B) is no longer on
+ the shortest-path tree (SPT) to D, so C does not continue to
+ encapsulate the traffic to D using the tunnel T and updates its
+ forwarding entry to D using the next-hop E.
+
+ If C updates its forwarding entry to D before router E, there would
+ be a transient forwarding loop between C and E until E has converged.
+
+ Table 1 describes a theoretical sequence of events happening when the
+ B-C link fails. This theoretical sequence of events should only be
+ read as an example.
+
+ +------------+--------+---------------------+-----------------------+
+ | Network | Time | Router C Events | Router E Events |
+ | Condition | | | |
+ +------------+--------+---------------------+-----------------------+
+ | S->D | | | |
+ | Traffic OK | | | |
+ | | | | |
+ | S->D | t0 | Link B-C fails | Link B-C fails |
+ | Traffic | | | |
+ | lost | | | |
+ | | | | |
+ | | t0+20 | C detects the | |
+ | | ms | failure | |
+ | | | | |
+
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 6]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ | S->D | t0+40 | C activates FRR | |
+ | Traffic OK | ms | | |
+ | | | | |
+ | | t0+50 | C updates its local | |
+ | | ms | LSP/LSA | |
+ | | | | |
+ | | t0+60 | C floods its local | |
+ | | ms | updated LSP/LSA | |
+ | | | | |
+ | | t0+62 | C schedules SPF | |
+ | | ms | (100 ms) | |
+ | | | | |
+ | | t0+87 | | E receives LSP/LSA |
+ | | ms | | from C and floods it |
+ | | | | |
+ | | t0+92 | | E schedules SPF (100 |
+ | | ms | | ms) |
+ | | | | |
+ | | t0+163 | C computes SPF | |
+ | | ms | | |
+ | | | | |
+ | | t0+165 | C starts updating | |
+ | | ms | its RIB/FIB | |
+ | | | | |
+ | | t0+193 | | E computes SPF |
+ | | ms | | |
+ | | | | |
+ | | t0+199 | | E starts updating its |
+ | | ms | | RIB/FIB |
+ | | | | |
+ | S->D | t0+255 | C updates its | |
+ | Traffic | ms | RIB/FIB for D | |
+ | lost | | | |
+ | | | | |
+ | | t0+340 | C convergence ends | |
+ | | ms | | |
+ | | | | |
+ | S->D | t0+443 | | E updates its RIB/FIB |
+ | Traffic OK | ms | | for D |
+ | | | | |
+ | | t0+470 | | E convergence ends |
+ | | ms | | |
+ +------------+--------+---------------------+-----------------------+
+
+ Table 1
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 7]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ The issue described here is completely independent of the FRR
+ mechanism involved (e.g., TE FRR, LFA/RLFA, MRT, etc.) when the
+ primary path uses hop-by-hop routing. The protection enabled by FRR
+ works perfectly but only ensures protection until the PLR has
+ converged (as soon as the PLR has converged, it replaces its FRR path
+ with a new primary path). When implementing FRR, a service provider
+ wants to guarantee a very limited loss of connectivity time. The
+ example described in this section shows that the benefit of FRR may
+ be completely lost due to a transient forwarding loop appearing when
+ PLR has converged. Delaying FIB updates after the IGP convergence
+ (1) may allow the FRR path to be kept until the neighbors have
+ converged and (2) preserves the customer traffic.
+
+3.2. Network Congestion
+
+ In Figure 3, when the S-D link fails, a transient forwarding loop may
+ appear between S and B for destination D. The traffic on the S-B
+ link will constantly increase due to the looping traffic to D.
+ Depending on the TTL of the packets, the traffic rate destined to D,
+ and the bandwidth of the link, the S-B link may become congested in a
+ few hundreds of milliseconds and will stay congested until the loop
+ is eliminated.
+
+ 1
+ D ------ C
+ | |
+ 1 | | 5
+ | |
+ A -- S ------ B
+ / | 1
+ F E
+
+ Figure 3
+
+ The congestion introduced by transient forwarding loops is
+ problematic as it can affect traffic that is not directly affected by
+ the failing network component. In Figure 3, the congestion of the
+ S-B link will impact some customer traffic that is not directly
+ affected by the failure, e.g., traffic from A to B, F to B, and E to
+ B. Class of service may mitigate the congestion for some traffic.
+ However, some traffic not directly affected by the failure will still
+ be dropped as a router is not able to distinguish the looping traffic
+ from the normally forwarded traffic.
+
+
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 8]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+4. Overview of the Solution
+
+ This document defines a two-step convergence initiated by the router
+ detecting a failure and advertising the topological change in the
+ IGP. This introduces a delay between network-wide convergence and
+ the convergence of the local router.
+
+ The solution described in this document is limited to local link-down
+ events in order to keep the solution simple.
+
+ This ordered convergence is similar to the ordered FIB (oFIB)
+ approach defined in [RFC6976], but it is limited to only a "one-hop"
+ distance. As a consequence, it is more simple and becomes a local-
+ only feature that does not require interoperability. This benefit
+ comes with the limitation of eliminating transient forwarding loops
+ involving the local router only. The mechanism also reuses some
+ concepts described in [PLSN].
+
+5. Specification
+
+5.1. Definitions
+
+ This document refers to the following existing IGP timers. These
+ timers may be standardized or implemented as a vendor-specific local
+ feature.
+
+ o LSP_GEN_TIMER: The delay between the consecutive generation of two
+ local LSPs/LSAs. From an operational point of view, this delay is
+ usually tuned to batch multiple local events in a single local
+ LSP/LSA update. In IS-IS, this timer is defined as
+ minimumLSPGenerationInterval [ISO10589]. In OSPF version 2, this
+ timer is defined as MinLSInterval [RFC2328]. It is often
+ associated with a vendor-specific damping mechanism to slow down
+ reactions by incrementing the timer when multiple consecutive
+ events are detected.
+
+ o SPF_DELAY: The delay between the first IGP event triggering a new
+ routing table computation and the start of that routing table
+ computation. It is often associated with a damping mechanism to
+ slow down reactions by incrementing the timer when the IGP becomes
+ unstable. As an example, [BACKOFF] defines a standard SPF delay
+ algorithm.
+
+
+
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 9]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ This document introduces the following new timer:
+
+ o ULOOP_DELAY_DOWN_TIMER: Used to slow down the local node
+ convergence in case of link-down events.
+
+5.2. Regular IGP Reaction
+
+ When the status of an adjacency or link changes, the regular IGP
+ convergence behavior of the router advertising the event involves the
+ following main steps:
+
+ 1. IGP is notified of the up/down event.
+
+ 2. The IGP processes the notification and postpones the reaction for
+ LSP_GEN_TIMER ms.
+
+ 3. Upon LSP_GEN_TIMER expiration, the IGP updates its LSP/LSA and
+ floods it.
+
+ 4. The SPF computation is scheduled in SPF_DELAY ms.
+
+ 5. Upon SPF_DELAY timer expiration, the SPF is computed, and then
+ the RIB and FIB are updated.
+
+5.3. Local Events
+
+ The mechanism described in this document assumes that there has been
+ a single link failure as seen by the IGP area/level. If this
+ assumption is violated (e.g., multiple links or nodes failed), then
+ regular IP convergence must be applied (as described in Section 5.2).
+
+ To determine if the mechanism is applicable or not, an implementation
+ SHOULD implement logic to correlate the protocol messages (LSP/LSA)
+ received during the SPF scheduling period in order to determine the
+ topology changes that occurred. This is necessary as multiple
+ protocol messages may describe the same topology change, and a single
+ protocol message may describe multiple topology changes. As a
+ consequence, determining a particular topology change MUST be
+ independent of the order of reception of those protocol messages.
+ How the logic works is left to the implementation.
+
+ Using this logic, if an implementation determines that the associated
+ topology change is a single local link failure, then the router MAY
+ use the mechanism described in this document; otherwise, the regular
+ IP convergence MUST be used.
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 10]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ In Figure 4, let router B be the computing router when the link B-C
+ fails. B updates its local LSP/LSA describing the link B-C as down,
+ C does the same, and both start flooding their updated LSPs/LSAs.
+ During the SPF_DELAY period, B and C learn all the LSPs/LSAs to
+ consider. B sees that C is flooding an advertisement that indicates
+ that a link is down, and B is the other end of that link. B
+ determines that B and C are describing the same single event. Since
+ B receives no other changes, B can determine that this is a local
+ link failure and may decide to activate the mechanism described in
+ this document.
+
+ +--- E ----+--------+
+ | | |
+ A ---- B -------- C ------ D
+
+ Figure 4
+
+5.4. Local Delay for Link-Down Events
+
+ This document introduces a change in step 5 (see list in Section 5.2)
+ so that, upon an adjacency or link-down event, the local convergence
+ is delayed compared to the network-wide convergence. The new step 5
+ is described below:
+
+ 5. Upon SPF_DELAY timer expiration, the SPF is computed. If the
+ condition of a single local link-down event has been met, then an
+ update of the RIB and the FIB MUST be delayed for
+ ULOOP_DELAY_DOWN_TIMER ms. Otherwise, the RIB and FIB SHOULD be
+ updated immediately.
+
+ If a new convergence occurs while ULOOP_DELAY_DOWN_TIMER is running,
+ ULOOP_DELAY_DOWN_TIMER is stopped, and the RIB/FIB SHOULD be updated
+ as part of the new convergence event.
+
+ As a result of this addition, routers local to the failure will
+ converge slower than remote routers. Hence, it SHOULD only be done
+ for a non-urgent convergence, such as administrative deactivation
+ (maintenance) or when the traffic is protected by FRR.
+
+6. Applicability
+
+ As previously stated, this mechanism only avoids the forwarding loops
+ on the links between the node local to the failure and its neighbors.
+ Forwarding loops may still occur on other links.
+
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 11]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+6.1. Applicable Case: Local Loops
+
+ In Figure 5, let us consider the traffic from G to F. The primary
+ path is G->D->C->E->F. When the link C-E fails, if C updates its
+ forwarding entry for F before D, a transient loop occurs. This is
+ sub-optimal as it breaks C's FRR forwarding even though upstream
+ routers are still forwarding the traffic to C.
+
+ A ------ B ----- E
+ | / |
+ | / |
+ G---D------------C F
+
+ All the links have a metric of 1
+
+ Figure 5
+
+ By implementing the mechanism defined in this document on C, when the
+ C-E link fails, C delays the update of its forwarding entry to F, in
+ order to allow some time for D to converge. FRR on C keeps
+ protecting the traffic during this period. When
+ ULOOP_DELAY_DOWN_TIMER expires on C, its forwarding entry to F is
+ updated. There is no transient forwarding loop on the link C-D.
+
+6.2. Non-applicable Case: Remote Loops
+
+ In Figure 6, let us consider the traffic from G to K. The primary
+ path is G->D->C->F->J->K. When the C-F link fails, if C updates its
+ forwarding entry to K before D, a transient loop occurs between C and
+ D.
+
+ A ------ B ----- E --- H
+ | |
+ | |
+ G---D--------C ------F --- J ---- K
+
+ All the links have a metric of 1 except B-E=15
+
+ Figure 6
+
+ By implementing the mechanism defined in this document on C, when the
+ link C-F fails, C delays the update of its forwarding entry to K,
+ allowing time for D to converge. When ULOOP_DELAY_DOWN_TIMER expires
+ on C, its forwarding entry to F is updated. There is no transient
+ forwarding loop between C and D. However, a transient forwarding
+ loop may still occur between D and A. In this scenario, this
+ mechanism is not enough to address all the possible forwarding loops.
+ However, it does not create additional traffic loss. Besides, in
+
+
+
+Litkowski, et al. Standards Track [Page 12]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ some cases -- such as when the nodes update their FIB in the order C,
+ A, D because the router A is quicker than D to converge -- the
+ mechanism may still avoid the forwarding loop that would have
+ otherwise occurred.
+
+7. Simulations
+
+ Simulations have been run on multiple service-provider topologies.
+ We evaluated the efficiency of the mechanism on eight different
+ service-provider topologies (different network size and design).
+ Table 2 displays the gain for each topology.
+
+ +----------+------+
+ | Topology | Gain |
+ +----------+------+
+ | T1 | 71% |
+ | T2 | 81% |
+ | T3 | 62% |
+ | T4 | 50% |
+ | T5 | 70% |
+ | T6 | 70% |
+ | T7 | 59% |
+ | T8 | 77% |
+ +----------+------+
+
+ Table 2
+
+ We evaluated the gain as follows:
+
+ o We considered a tuple (link A-B, destination D, PLR S, backup
+ next-hop N) as a loop if, upon link A-B failure, the flow from a
+ router S upstream from A (A could be considered as PLR also) to D
+ may loop due to convergence time difference between S and one of
+ its neighbors N.
+
+ o We evaluated the number of potential loop tuples in normal
+ conditions.
+
+ o We evaluated the number of potential loop tuples using the same
+ topological input but taking into account that S converges after
+ N.
+
+ o The gain is the relative number of loops (both remote and local)
+ we succeed in suppressing.
+
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 13]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ For topology 1, implementing the local delay prevented 71% of the
+ transient forwarding loops created by the failure of any link. The
+ analysis shows that all local loops are prevented and only remote
+ loops remain.
+
+8. Deployment Considerations
+
+ Transient forwarding loops have the following drawbacks:
+
+ o They limit FRR efficiency. Even if FRR is activated within 50 ms,
+ as soon as the PLR has converged, the traffic may be affected by a
+ transient loop.
+
+ o They may impact traffic not directly affected by the failure (due
+ to link congestion).
+
+ The local delay mechanism is a transient forwarding loop avoidance
+ mechanism (like oFIB). Even if it only addresses local transient
+ loops, the efficiency versus complexity comparison of the mechanism
+ makes it a good solution. It is also incrementally deployable with
+ incremental benefits, which makes it an attractive option for both
+ vendors to implement and service providers to deploy. Delaying the
+ convergence time is not an issue if we consider that the traffic is
+ protected during the convergence.
+
+ The ULOOP_DELAY_DOWN_TIMER value should be set according to the
+ maximum IGP convergence time observed in the network (usually
+ observed in the slowest node).
+
+ This mechanism is limited to link-down events. When a link goes
+ down, it eventually goes back up. As a consequence, with this
+ mechanism deployed, only the link-down event will be protected
+ against transient forwarding loops while the link-up event will not.
+ If the operator wants to limit the impact of transient forwarding
+ loops during the link-up event, it should make sure to use specific
+ procedures to bring the link back online. As examples, the operator
+ can decide to put the link back online outside of business hours, or
+ it can use some incremental metric changes to prevent loops (as
+ proposed in [RFC5715]).
+
+
+
+
+
+
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 14]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+9. Examples
+
+ We consider the following figure for the examples in this section:
+
+ D
+ 1 | F----X
+ | 1 |
+ A ------ B
+ | |
+ 10 | | 5
+ | |
+ E--------C
+ | 1
+ 1 |
+ S
+
+ Figure 7
+
+ The network above is considered to have a convergence time of about 1
+ second, so ULOOP_DELAY_DOWN_TIMER will be adjusted to this value. We
+ also consider that FRR is running on each node.
+
+9.1. Local Link-Down Event
+
+ Table 3 describes the events and their timing on routers C and E when
+ the link B-C goes down. It is based on a theoretical sequence of
+ events that should only been read as an example. As C detects a
+ single local event corresponding to a link-down event (its LSP + LSP
+ from B received), it applies the local delay down behavior, and no
+ micro-loop is formed.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 15]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ +------------+---------+---------------------+----------------------+
+ | Network | Time | Router C Events | Router E Events |
+ | Condition | | | |
+ +------------+---------+---------------------+----------------------+
+ | S->D | | | |
+ | Traffic OK | | | |
+ | | | | |
+ | S->D | t0 | Link B-C fails | Link B-C fails |
+ | Traffic | | | |
+ | lost | | | |
+ | | | | |
+ | | t0+20 | C detects the | |
+ | | ms | failure | |
+ | | | | |
+ | S->D | t0+40 | C activates FRR | |
+ | Traffic OK | ms | | |
+ | | | | |
+ | | t0+50 | C updates its local | |
+ | | ms | LSP/LSA | |
+ | | | | |
+ | | t0+53 | C floods its local | |
+ | | ms | updated LSP/LSA | |
+ | | | | |
+ | | t0+60 | C schedules SPF | |
+ | | ms | (100 ms) | |
+ | | | | |
+ | | t0+67 | C receives LSP/LSA | |
+ | | ms | from B and floods | |
+ | | | it | |
+ | | | | |
+ | | t0+87 | | E receives LSP/LSA |
+ | | ms | | from C and floods it |
+ | | | | |
+ | | t0+90 | | E schedules SPF (100 |
+ | | ms | | ms) |
+ | | | | |
+ | | t0+161 | C computes SPF | |
+ | | ms | | |
+ | | | | |
+ | | t0+165 | C delays its | |
+ | | ms | RIB/FIB update (1 | |
+ | | | sec) | |
+ | | | | |
+ | | t0+193 | | E computes SPF |
+ | | ms | | |
+ | | | | |
+ | | t0+199 | | E starts updating |
+ | | ms | | its RIB/FIB |
+
+
+
+Litkowski, et al. Standards Track [Page 16]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ | | | | |
+ | | t0+443 | | E updates its |
+ | | ms | | RIB/FIB for D |
+ | | | | |
+ | | t0+470 | | E convergence ends |
+ | | ms | | |
+ | | | | |
+ | | t0+1165 | C starts updating | |
+ | | ms | its RIB/FIB | |
+ | | | | |
+ | | t0+1255 | C updates its | |
+ | | ms | RIB/FIB for D | |
+ | | | | |
+ | | t0+1340 | C convergence ends | |
+ | | ms | | |
+ +------------+---------+---------------------+----------------------+
+
+ Table 3
+
+ Similarly, upon B-C link-down event, if LSP/LSA from B is received
+ before C detects the link failure, C will apply the route update
+ delay if the local detection is part of the same SPF run. Table 4
+ describes the associated theoretical sequence of events. It should
+ only been read as an example.
+
+ +------------+---------+---------------------+----------------------+
+ | Network | Time | Router C Events | Router E Events |
+ | Condition | | | |
+ +------------+---------+---------------------+----------------------+
+ | S->D | | | |
+ | Traffic OK | | | |
+ | | | | |
+ | S->D | t0 | Link B-C fails | Link B-C fails |
+ | Traffic | | | |
+ | lost | | | |
+ | | | | |
+ | | t0+32 | C receives LSP/LSA | |
+ | | ms | from B and floods | |
+ | | | it | |
+ | | | | |
+ | | t0+33 | C schedules SPF | |
+ | | ms | (100 ms) | |
+ | | | | |
+ | | t0+50 | C detects the | |
+ | | ms | failure | |
+ | | | | |
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 17]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ | S->D | t0+55 | C activates FRR | |
+ | Traffic OK | ms | | |
+ | | | | |
+ | | t0+55 | C updates its local | |
+ | | ms | LSP/LSA | |
+ | | | | |
+ | | t0+70 | C floods its local | |
+ | | ms | updated LSP/LSA | |
+ | | | | |
+ | | t0+87 | | E receives LSP/LSA |
+ | | ms | | from C and floods it |
+ | | | | |
+ | | t0+90 | | E schedules SPF (100 |
+ | | ms | | ms) |
+ | | | | |
+ | | t0+135 | C computes SPF | |
+ | | ms | | |
+ | | | | |
+ | | t0+140 | C delays its | |
+ | | ms | RIB/FIB update (1 | |
+ | | | sec) | |
+ | | | | |
+ | | t0+193 | | E computes SPF |
+ | | ms | | |
+ | | | | |
+ | | t0+199 | | E starts updating |
+ | | ms | | its RIB/FIB |
+ | | | | |
+ | | t0+443 | | E updates its |
+ | | ms | | RIB/FIB for D |
+ | | | | |
+ | | t0+470 | | E convergence ends |
+ | | ms | | |
+ | | | | |
+ | | t0+1145 | C starts updating | |
+ | | ms | its RIB/FIB | |
+ | | | | |
+ | | t0+1255 | C updates its | |
+ | | ms | RIB/FIB for D | |
+ | | | | |
+ | | t0+1340 | C convergence ends | |
+ | | ms | | |
+ +------------+---------+---------------------+----------------------+
+
+ Table 4
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 18]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+9.2. Local and Remote Event
+
+ Table 5 describes the events and their timing on router C and E when
+ the link B-C goes down and when the link F-X fails in the same time
+ window. C will not apply the local delay because a non-local
+ topology change is also received. Table 5 is based on a theoretical
+ sequence of events that should only been read as an example.
+
+ +-----------+--------+-------------------+--------------------------+
+ | Network | Time | Router C Events | Router E Events |
+ | Condition | | | |
+ +-----------+--------+-------------------+--------------------------+
+ | S->D | | | |
+ | Traffic | | | |
+ | OK | | | |
+ | | | | |
+ | S->D | t0 | Link B-C fails | Link B-C fails |
+ | Traffic | | | |
+ | lost | | | |
+ | | | | |
+ | | t0+20 | C detects the | |
+ | | ms | failure | |
+ | | | | |
+ | | t0+36 | Link F-X fails | Link F-X fails |
+ | | ms | | |
+ | | | | |
+ | S->D | t0+40 | C activates FRR | |
+ | Traffic | ms | | |
+ | OK | | | |
+ | | | | |
+ | | t0+50 | C updates its | |
+ | | ms | local LSP/LSA | |
+ | | | | |
+ | | t0+54 | C receives | |
+ | | ms | LSP/LSA from F | |
+ | | | and floods it | |
+ | | | | |
+ | | t0+60 | C schedules SPF | |
+ | | ms | (100 ms) | |
+ | | | | |
+ | | t0+67 | C receives | |
+ | | ms | LSP/LSA from B | |
+ | | | and floods it | |
+ | | | | |
+ | | t0+69 | | E receives LSP/LSA from |
+ | | ms | | F, floods it and |
+ | | | | schedules SPF (100 ms) |
+ | | | | |
+
+
+
+Litkowski, et al. Standards Track [Page 19]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ | | t0+70 | C floods its | |
+ | | ms | local updated | |
+ | | | LSP/LSA | |
+ | | | | |
+ | | t0+87 | | E receives LSP/LSA from |
+ | | ms | | C |
+ | | | | |
+ | | t0+117 | | E floods LSP/LSA from C |
+ | | ms | | |
+ | | | | |
+ | | t0+160 | C computes SPF | |
+ | | ms | | |
+ | | | | |
+ | | t0+165 | C starts updating | |
+ | | ms | its RIB/FIB (NO | |
+ | | | DELAY) | |
+ | | | | |
+ | | t0+170 | | E computes SPF |
+ | | ms | | |
+ | | | | |
+ | | t0+173 | | E starts updating its |
+ | | ms | | RIB/FIB |
+ | | | | |
+ | S->D | t0+365 | C updates its | |
+ | Traffic | ms | RIB/FIB for D | |
+ | lost | | | |
+ | | | | |
+ | S->D | t0+443 | | E updates its RIB/FIB |
+ | Traffic | ms | | for D |
+ | OK | | | |
+ | | | | |
+ | | t0+450 | C convergence | |
+ | | ms | ends | |
+ | | | | |
+ | | t0+470 | | E convergence ends |
+ | | ms | | |
+ | | | | |
+ +-----------+--------+-------------------+--------------------------+
+
+ Table 5
+
+
+
+
+
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 20]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+9.3. Aborting Local Delay
+
+ Table 6 describes the events and their timing on routers C and E when
+ the link B-C goes down. In addition, we consider what happens when
+ the F-X link fails during local delay of the FIB update. C will
+ first apply the local delay, but when the new event happens, it will
+ fall back to the standard convergence mechanism without further
+ delaying route insertion. In this example, we consider a
+ ULOOP_DELAY_DOWN_TIMER configured to 2 seconds. Table 6 is based on
+ a theoretical sequence of events that should only been read as an
+ example.
+
+ +------------+--------+----------------------+----------------------+
+ | Network | Time | Router C Events | Router E Events |
+ | Condition | | | |
+ +------------+--------+----------------------+----------------------+
+ | S->D | | | |
+ | Traffic OK | | | |
+ | | | | |
+ | S->D | t0 | Link B-C fails | Link B-C fails |
+ | Traffic | | | |
+ | lost | | | |
+ | | | | |
+ | | t0+20 | C detects the | |
+ | | ms | failure | |
+ | | | | |
+ | S->D | t0+40 | C activates FRR | |
+ | Traffic OK | ms | | |
+ | | | | |
+ | | t0+50 | C updates its local | |
+ | | ms | LSP/LSA | |
+ | | | | |
+ | | t0+55 | C floods its local | |
+ | | ms | updated LSP/LSA | |
+ | | | | |
+ | | t0+57 | C schedules SPF (100 | |
+ | | ms | ms) | |
+ | | | | |
+ | | t0+67 | C receives LSP/LSA | |
+ | | ms | from B and floods it | |
+ | | | | |
+ | | t0+87 | | E receives LSP/LSA |
+ | | ms | | from C and floods it |
+ | | | | |
+ | | t0+90 | | E schedules SPF (100 |
+ | | ms | | ms) |
+ | | | | |
+
+
+
+
+Litkowski, et al. Standards Track [Page 21]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ | | t0+160 | C computes SPF | |
+ | | ms | | |
+ | | | | |
+ | | t0+165 | C delays its RIB/FIB | |
+ | | ms | update (2 sec) | |
+ | | | | |
+ | | t0+193 | | E computes SPF |
+ | | ms | | |
+ | | | | |
+ | | t0+199 | | E starts updating |
+ | | ms | | its RIB/FIB |
+ | | | | |
+ | | t0+254 | Link F-X fails | Link F-X fails |
+ | | ms | | |
+ | | | | |
+ | | t0+300 | C receives LSP/LSA | |
+ | | ms | from F and floods it | |
+ | | | | |
+ | | t0+303 | C schedules SPF (200 | |
+ | | ms | ms) | |
+ | | | | |
+ | | t0+312 | E receives LSP/LSA | |
+ | | ms | from F and floods it | |
+ | | | | |
+ | | t0+313 | E schedules SPF (200 | |
+ | | ms | ms) | |
+ | | | | |
+ | | t0+502 | C computes SPF | |
+ | | ms | | |
+ | | | | |
+ | | t0+505 | C starts updating | |
+ | | ms | its RIB/FIB (NO | |
+ | | | DELAY) | |
+ | | | | |
+ | | t0+514 | | E computes SPF |
+ | | ms | | |
+ | | | | |
+ | | t0+519 | | E starts updating |
+ | | ms | | its RIB/FIB |
+ | | | | |
+ | S->D | t0+659 | C updates its | |
+ | Traffic | ms | RIB/FIB for D | |
+ | lost | | | |
+ | | | | |
+
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 22]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ | S->D | t0+778 | | E updates its |
+ | Traffic OK | ms | | RIB/FIB for D |
+ | | | | |
+ | | t0+781 | C convergence ends | |
+ | | ms | | |
+ | | | | |
+ | | t0+810 | | E convergence ends |
+ | | ms | | |
+ +------------+--------+----------------------+----------------------+
+
+ Table 6
+
+10. Comparison with Other Solutions
+
+ As stated in Section 4, the local delay solution reuses some concepts
+ already introduced by other IETF proposals but tries to find a trade-
+ off between efficiency and simplicity. This section tries to compare
+ behaviors of the solutions.
+
+10.1. PLSN
+
+ PLSN [PLSN] describes a mechanism where each node in the network
+ tries to avoid transient forwarding loops upon a topology change by
+ always keeping traffic on a loop-free path for a defined duration
+ (locked path to a safe neighbor). The locked path may be the new
+ primary next hop, another neighbor, or the old primary next hop
+ depending on how the safety condition is satisfied.
+
+ PLSN does not solve all transient forwarding loops (see Section 4 of
+ [PLSN] for more details).
+
+ The solution defined in this document reuses some concepts of PLSN
+ but in a more simple fashion:
+
+ o PLSN has three different behaviors: (1) keep using the old next
+ hop, (2) use the new primary next hop if it is safe, or (3) use
+ another safe next hop. The local delay solution, however, only
+ has one: keep using the current next hop (i.e., the old primary
+ next hop or an already-activated FRR path).
+
+ o PLSN may cause some damage while using a safe next hop that is not
+ the new primary next hop if the new safe next hop does not provide
+ enough bandwidth (see [RFC7916]). The solution defined in this
+ document may not experience this issue as the service provider may
+ have control on the FRR path being used, preventing network
+ congestion.
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 23]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ o PLSN applies to all nodes in a network (remote or local changes),
+ while the mechanism defined in this document applies only to the
+ nodes connected to the topology change.
+
+10.2. oFIB
+
+ oFIB [RFC6976] describes a mechanism where the convergence of the
+ network upon a topology change is ordered in order to prevent
+ transient forwarding loops. Each router in the network deduces the
+ failure type from the LSA/LSP received and computes/applies a
+ specific FIB update timer based on the failure type and its rank in
+ the network, considering the failure point as root.
+
+ The oFIB mechanism solves all the transient forwarding loops in a
+ network at the price of introducing complexity in the convergence
+ process that may require careful monitoring by the service provider.
+
+ The solution defined in this document reuses the oFIB concept but
+ limits it to the first hop that experiences the topology change. As
+ demonstrated, the mechanism defined in this document allows all the
+ local transient forwarding loops to be solved; these represent a high
+ percentage of all the loops. Moreover, limiting to one hop allows
+ network-wide convergence behavior to be kept.
+
+11. IANA Considerations
+
+ This document has no IANA actions.
+
+12. Security Considerations
+
+ This document does not introduce any change in terms of IGP security.
+ The operation is internal to the router. The local delay does not
+ increase the number of attack vectors as an attacker could only
+ trigger this mechanism if it already has the ability to disable or
+ enable an IGP link. The local delay does not increase the negative
+ consequences. If an attacker has the ability to disable or enable an
+ IGP link, it can already harm the network by creating instability and
+ harm the traffic by creating forwarding packet loss and forwarding
+ loss for the traffic crossing that link.
+
+
+
+
+
+
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 24]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+13. References
+
+13.1. Normative References
+
+ [ISO10589] International Organization for Standardization,
+ "Information technology -- Telecommunications and
+ information exchange between systems -- Intermediate
+ System to Intermediate System intra-domain routeing
+ information exchange protocol for use in conjunction with
+ the protocol for providing the connectionless-mode network
+ service (ISO 8473)", ISO/IEC 10589:2002, Second Edition,
+ November 2002.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119,
+ DOI 10.17487/RFC2119, March 1997,
+ <https://www.rfc-editor.org/info/rfc2119>.
+
+ [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328,
+ DOI 10.17487/RFC2328, April 1998,
+ <https://www.rfc-editor.org/info/rfc2328>.
+
+ [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
+ 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
+ May 2017, <https://www.rfc-editor.org/info/rfc8174>.
+
+13.2. Informative References
+
+ [BACKOFF] Decraene, B., Litkowski, S., Gredler, H., Lindem, A.,
+ Francois, P., and C. Bowers, "SPF Back-off Delay algorithm
+ for link state IGPs", Work in Progress, draft-ietf-rtgwg-
+ backoff-algo-10, March 2018.
+
+ [PLSN] Zinin, A., "Analysis and Minimization of Microloops in
+ Link-state Routing Protocols", Work in Progress,
+ draft-ietf-rtgwg-microloop-analysis-01, October 2005.
+
+ [RFC3906] Shen, N. and H. Smit, "Calculating Interior Gateway
+ Protocol (IGP) Routes Over Traffic Engineering Tunnels",
+ RFC 3906, DOI 10.17487/RFC3906, October 2004,
+ <https://www.rfc-editor.org/info/rfc3906>.
+
+ [RFC5715] Shand, M. and S. Bryant, "A Framework for Loop-Free
+ Convergence", RFC 5715, DOI 10.17487/RFC5715, January
+ 2010, <https://www.rfc-editor.org/info/rfc5715>.
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 25]
+
+RFC 8333 Micro-loop Prevention by Local Delay March 2018
+
+
+ [RFC6976] Shand, M., Bryant, S., Previdi, S., Filsfils, C.,
+ Francois, P., and O. Bonaventure, "Framework for Loop-Free
+ Convergence Using the Ordered Forwarding Information Base
+ (oFIB) Approach", RFC 6976, DOI 10.17487/RFC6976, July
+ 2013, <https://www.rfc-editor.org/info/rfc6976>.
+
+ [RFC7916] Litkowski, S., Ed., Decraene, B., Filsfils, C., Raza, K.,
+ Horneffer, M., and P. Sarkar, "Operational Management of
+ Loop-Free Alternates", RFC 7916, DOI 10.17487/RFC7916,
+ July 2016, <https://www.rfc-editor.org/info/rfc7916>.
+
+Acknowledgements
+
+ We would like to thank the authors of [RFC6976] for introducing the
+ concept of ordered convergence: Mike Shand, Stewart Bryant, Stefano
+ Previdi, and Olivier Bonaventure.
+
+Authors' Addresses
+
+ Stephane Litkowski
+ Orange
+
+ Email: stephane.litkowski@orange.com
+
+
+ Bruno Decraene
+ Orange
+
+ Email: bruno.decraene@orange.com
+
+
+ Clarence Filsfils
+ Cisco Systems
+
+ Email: cfilsfil@cisco.com
+
+
+ Pierre Francois
+ Individual Contributor
+
+ Email: pfrpfr@gmail.com
+
+
+
+
+
+
+
+
+
+
+Litkowski, et al. Standards Track [Page 26]
+