diff options
Diffstat (limited to 'doc/rfc/rfc5715.txt')
-rw-r--r-- | doc/rfc/rfc5715.txt | 1235 |
1 files changed, 1235 insertions, 0 deletions
diff --git a/doc/rfc/rfc5715.txt b/doc/rfc/rfc5715.txt new file mode 100644 index 0000000..f3b440b --- /dev/null +++ b/doc/rfc/rfc5715.txt @@ -0,0 +1,1235 @@ + + + + + + +Internet Engineering Task Force (IETF) M. Shand +Request for Comments: 5715 S. Bryant +Category: Informational Cisco Systems +ISSN: 2070-1721 January 2010 + + + A Framework for Loop-Free Convergence + +Abstract + + A micro-loop is a packet forwarding loop that may occur transiently + among two or more routers in a hop-by-hop packet forwarding paradigm. + + This framework provides a summary of the causes and consequences of + micro-loops and enables the reader to form a judgement on whether + micro-looping is an issue that needs to be addressed in specific + networks. It also provides a survey of the currently proposed + mechanisms that may be used to prevent or to suppress the formation + of micro-loops when an IP or MPLS network undergoes topology change + due to failure, repair, or management action. When sufficiently fast + convergence is not available and the topology is susceptible to + micro-loops, use of one or more of these mechanisms may be desirable. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for informational purposes. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Not all documents + approved by the IESG are a candidate for any level of Internet + Standard; see Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc5715. + + + + + + + + + + + + + +Shand & Bryant Informational [Page 1] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + +Copyright Notice + + Copyright (c) 2010 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + +Table of Contents + + 1. Introduction ....................................................3 + 2. The Nature of Micro-Loops .......................................4 + 3. Applicability ...................................................5 + 4. Micro-Loop Control Strategies ...................................6 + 5. Loop Mitigation .................................................8 + 5.1. Fast Convergence ...........................................8 + 5.2. PLSN .......................................................8 + 6. Micro-Loop Prevention ..........................................10 + 6.1. Incremental Cost Advertisement ............................10 + 6.2. Nearside Tunneling ........................................12 + 6.3. Farside Tunnels ...........................................13 + 6.4. Distributed Tunnels .......................................14 + 6.5. Packet Marking ............................................14 + 6.6. MPLS New Labels ...........................................15 + 6.7. Ordered FIB Update ........................................16 + 6.8. Synchronised FIB Update ...................................18 + 7. Using PLSN in Conjunction with Other Methods ...................18 + 8. Loop Suppression ...............................................19 + 9. Compatibility Issues ...........................................20 + 10. Comparison of Loop-Free Convergence Methods ...................20 + 11. Security Considerations .......................................21 + 12. Acknowledgments ...............................................21 + 13. Informative References ........................................21 + + + + + + + + + + + +Shand & Bryant Informational [Page 2] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + +1. Introduction + + When there is a change to the network topology (due to the failure or + restoration of a link or router, or as a result of management + action), the routers need to converge on a common view of the new + topology and the paths to be used for forwarding traffic to each + destination. During this process, referred to as a routing + transition, packet delivery between certain source/destination pairs + may be disrupted. This occurs due to the time it takes for the + topology change to be propagated around the network together with the + time it takes each individual router to determine and then update the + forwarding information base (FIB) for the affected destinations. + During this transition, packets may be lost due to the continuing + attempts to use the failed component and due to forwarding loops. + Forwarding loops arise due to the inconsistent FIBs that occur as a + result of the difference in time taken by routers to execute the + transition process. This is a problem that may occur in both IP + networks and MPLS networks that use the label distribution protocol + (LDP) [RFC5036] as the label switched path (LSP) signaling protocol. + + The service failures caused by routing transitions are largely hidden + by higher-level protocols that retransmit the lost data. However, + new Internet services could emerge that are more sensitive to the + packet disruption that occurs during a transition. To make the + transition transparent to their users, these services would require a + short routing transition. Ideally, routing transitions would be + completed in zero time with no packet loss. + + Regardless of how optimally the mechanisms involved have been + designed and implemented, it is inevitable that a routing transition + will take some minimum interval that is greater than zero. This has + led to the development of a traffic engineering (TE) fast-reroute + mechanism for MPLS [RFC4090]. Alternative mechanisms that might be + deployed in an MPLS network or an IP network are current work items + in the IETF [RFC5714]. The repair mechanism may, however, be + disrupted by the formation of micro-loops during the period between + the time when the failure is announced and the time when all FIBs + have been updated to reflect the new topology. + + One method of mitigating the effects of micro-loops is to ensure that + the network reconverges in a sufficiently short time that these + effects are inconsequential. Another method is to design the network + topology to minimise or even eliminate the possibility of micro- + loops. + + The propensity to form micro-loops is highly topology dependent, and + algorithms are available to identify which links in a network are + subject to micro-looping. In topologies that are critically + + + +Shand & Bryant Informational [Page 3] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + susceptible to the formation of micro-loops, there is little point in + introducing new mechanisms to provide fast reroute without also + deploying mechanisms that prevent the disruptive effects of micro- + loops. Unless micro-loop prevention is used in these topologies, + packets may not reach the repair and micro-looping packets may cause + congestion, resulting in further packet loss. + + The disruptive effect of micro-loops is not confined to periods when + there is a component failure. Micro-loops can, for example, form + when a component is put back into service following repair. Micro- + loops can also form as a result of a network-maintenance action such + as adding a new network component, removing a network component, or + modifying a link cost. + + This framework provides a summary of the causes and consequences of + micro-loops and enables the reader to form a judgement on whether + micro-looping is an issue that needs to be addressed in specific + networks. It also provides a survey of the currently proposed micro- + loop mitigation mechanisms. When sufficiently fast convergence is + not available and the topology is susceptible to micro-loops, use of + one or more of these mechanisms may be desirable. + +2. The Nature of Micro-Loops + + A micro-loop is a packet forwarding loop that may occur transiently + among two or more routers in a hop-by-hop, packet forwarding + paradigm. + + Micro-loops may form during the periods when a network is re- + converging following ANY topology change and are caused by + inconsistent FIBs in the routers. During the transition, micro-loops + may occur over a single link between a pair of routers that + temporarily use each other as the next hop for a prefix. Micro-loops + may also form when each router in a cycle of three or more routers + has the next router in the cycle as a next hop for a given prefix. + + Cyclic loops may occur if one or more of the following conditions are + met: + + 1. Asymmetric link costs. + + 2. An equal-cost path exists between a pair of routers, each of + which makes a different decision regarding which path to use for + forwarding to a particular destination. Note that even routers + that do not implement equal-cost, multi-path (ECMP) forwarding + must make a choice between the available equal-cost paths, and + unless they make the same choice, the condition for cyclic loops + will be fulfilled. + + + +Shand & Bryant Informational [Page 4] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + 3. Topology changes affecting multiple links, including single node + and line card failures. + + Micro-loops have two undesirable side effects: congestion and repair + starvation. + + o A looping packet consumes bandwidth until it either escapes as a + result of the re-synchronization of the FIBs or its time to live + (TTL) expires. This transiently increases the traffic over a link + by as much as 128 times, and may cause the link to become + congested. This congestion reduces the bandwidth available to + other traffic (which is not otherwise affected by the topology + change). As a result, the "innocent" traffic using the link + experiences increased latency and is liable to congestive packet + loss. + + o In cases where the link or node failure has been protected by a + fast-reroute repair, an inconsistency in the FIBs may prevent some + traffic from reaching the failure, and hence being repaired. The + repair may thus become starved of traffic and thereby rendered + ineffective. + + Although micro-loops are usually considered in the context of a + failure, similar problems of congestive packet loss and starvation + may also occur if the topology change is the result of management + action. For example, consider the case where a link is to be taken + out of service by management action. The link can be retained in + service throughout the transition, thus avoiding the need for any + repair. However, if micro-loops form, they may cause congestion loss + and may also prevent traffic from reaching the link. + + Unless otherwise controlled, micro-loops may form in any part of the + network that forwards (or in the case of a new link, will forward) + packets over a path that includes the affected topology change. The + time taken to propagate the topology change through the network, and + the non-uniform time taken by each router to calculate the new + shortest path tree (SPT) and update its FIB, contribute to the + duration of the packet disruption caused by the micro-loops. In some + cases, a packet may be subject to disruption from micro-loops that + occur sequentially at links along the path, thus further extending + the period of disruption beyond that required to resolve a single + loop. + +3. Applicability + + Loop-free convergence techniques are applicable to any situation in + which micro-loops may form, for example, the convergence of a network + following: + + + +Shand & Bryant Informational [Page 5] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + 1. Component failure + + 2. Component repair + + 3. Management withdrawal of a component + + 4. Management insertion or a component + + 5. Management change of link cost (either positive or negative) + + 6. External cost change, for example, change of external gateway as + a result of a BGP change + + 7. A Shared Risk Link Group (SRLG) failure + + In each case, a component may be a link, a set of links, or an entire + router. Throughout this document, we use the term SRLG when + describing the procedure to be followed when multiple failures have + occurred, whether or not they are members of an explicit SRLG. In + the case of multiple independent failures, the loop-prevention method + described for SRLG may be used, provided it is known that all of + these failures have been repaired. + + Loop-free convergence techniques are applicable to both IP networks + and MPLS-enabled networks that use LDP, including LDP networks that + use the single-hop tunnel fast-reroute mechanism. + + An assessment of whether loop-free convergence techniques are + required should take into account whether or not the interior gateway + protocol (IGP) convergence is sufficiently fast that any micro-loops + are of such short duration that they are not disruptive, and whether + or not the topology is such that micro-loops are likely to form. + +4. Micro-Loop Control Strategies + + Micro-loop control strategies fall into four basic classes: + + 1. Micro-loop mitigation + + 2. Micro-loop prevention + + 3. Micro-loop suppression + + 4. Network design to minimise micro-loops + + + + + + + +Shand & Bryant Informational [Page 6] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + A micro-loop-mitigation scheme works by re-converging the network in + such a way that it reduces, but does not eliminate, the formation of + micro-loops. Such schemes cannot guarantee the productive forwarding + of packets during the transition. + + A micro-loop-prevention mechanism controls the re-convergence of the + network in such a way that no micro-loops form. Such a micro-loop- + prevention mechanism allows the continued use of any fast repair + method until the network has converged on its new topology and + prevents the collateral damage that occurs to other traffic for the + duration of each micro-loop. + + A micro-loop-suppression mechanism attempts to eliminate the + collateral damage caused by micro-loops to other traffic. This may + be achieved by, for example, using a packet-monitoring method that + detects that a packet is looping and drops it. Such schemes make no + attempt to productively forward the packet throughout the network + transition. + + Highly meshed topologies are less susceptible to micro-loops, thus + networks may be designed to minimise the occurrence of micro-loops by + appropriate link placement and metric settings. However, this + approach may conflict with other design requirements, such as cost + and traffic planning, and may not accurately track the evolution of + the network or temporary changes due to outages. + + Note that all known micro-loop-prevention mechanisms and most micro- + loop-mitigation mechanisms extend the duration of the re-convergence + process. When the failed component is protected by a fast-reroute + repair, this implies that the converging network requires the repair + to remain in place for longer than would otherwise be the case. The + extended convergence time means any traffic that is not repaired by + an imperfect repair experiences a significantly longer outage than it + would experience with conventional convergence. + + When a component is returned to service, or when a network management + action has taken place, this additional delay does not cause traffic + disruption because there is no repair involved. However, the + extended delay is undesirable because it increases the time that the + network takes to be ready for another failure, and hence leaves it + vulnerable to multiple failures. + + + + + + + + + + +Shand & Bryant Informational [Page 7] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + +5. Loop Mitigation + + There are two approaches to loop mitigation. + + o Fast convergence + + o A purpose-designed, loop-mitigation mechanism + +5.1. Fast Convergence + + The duration of micro-loops is dependent on the speed of convergence. + Improving the speed of convergence may therefore be seen as a loop- + mitigation technique. + +5.2. PLSN + + The only known purpose-designed, loop-mitigation approach is the Path + Locking with Safe-Neighbors (PLSN) method described in PLSN + [ANALYSIS]. In this method, a micro-loop-free next-hop safety + condition is defined as follows: + + In a symmetric-cost network, it is safe for router X to change to the + use of neighbor Y as its next hop for a specific destination if the + path through Y to that destination satisfies both of the following + criteria: + + 1. X considers Y as its loop-free neighbor based on the topology + before the change, AND + + 2. X considers Y as its downstream neighbor based on the topology + after the change. + + In an asymmetric-cost network, a stricter safety condition is needed, + and the criterion is that: + + X considers Y as its downstream neighbor based on the topology + both before and after the change. + + Based on these criteria, destinations are classified by each router + into three classes: + + o Type A destinations: Destinations unaffected by the change (type + A1) and also destinations whose next hop after the change + satisfies the safety criteria (type A2). + + + + + + + +Shand & Bryant Informational [Page 8] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + o Type B destinations: Destinations that cannot be sent via the new, + primary next hop because the safety criteria are not satisfied, + but that can be sent via another next hop that does satisfy the + safety criteria. + + o Type C destinations: All other destinations. + + Following a topology change, type A destinations are immediately + changed to go via the new topology. Type B destinations are + immediately changed to go via the next hop that satisfies the safety + criteria, even though this is not the shortest path. Type B + destinations continue to go via this path until all routers have + changed their type C destinations over to the new next hop. Routers + must not change their type C destinations until all routers have + changed their type A2 and B destinations to the new or intermediate + (safe) next hop. + + Simulations indicate that this approach produces a significant + reduction in the number of links that are subject to micro-looping. + However, unlike all of the micro-loop-prevention methods, it is only + a partial solution. In particular, micro-loops may form on any link + joining a pair of type C routers. + + Because routers delay updating their type C destination FIB entries, + they will continue to route towards the failure during the time when + the routers are changing their type A and B destinations, and hence + will continue to productively forward packets, provided that viable + repair paths exist. + + A backwards-compatibility issue arises with PLSN. If a router is not + capable of micro-loop control, it will not correctly delay its FIB + update. If all such routers had only type A destinations, this loop- + mitigation mechanism would work as it was designed. Alternatively, + if all such incapable routers had only type C destinations, the + "loop-prevention" announcement mechanism used to trigger the tunnel- + based schemes (see Sections 6.2 to 6.4) could be used to cause the + type A and B destinations to be changed, with the incapable routers + and routers having type C destinations delaying until they received + the "real" announcement. Unfortunately, these two approaches are + mutually incompatible. + + Note that simulations indicate that in most topologies treating type + B destinations as type C results in only a small degradation in loop + prevention. Also note that simulation results indicate that in + production networks where some, but not all, links have asymmetric + costs, using the stricter asymmetric-cost criterion actually reduces + the number of loop-free destinations because fewer destinations can + be classified as type A or B. + + + +Shand & Bryant Informational [Page 9] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + This mechanism operates identically for: + + o events that degrade the topology (e.g., link failure), + + o events that improve the topology (e.g., link restoration), and + + o shared risk link group (SRLG) failure. + +6. Micro-Loop Prevention + + Eight micro-loop-prevention methods have been proposed: + + 1. Incremental cost advertisement + + 2. Nearside tunneling + + 3. Farside tunneling + + 4. Distributed tunnels + + 5. Packet marking + + 6. New MPLS labels + + 7. Ordered FIB update + + 8. Synchronized FIB update + +6.1. Incremental Cost Advertisement + + When a link fails, the cost of the link is normally changed from its + assigned metric to "infinity" in one step. However, it can be proved + [OPT] that no micro-loops will form if the link cost is increased in + suitable increments, and the network is allowed to stabilize before + the next cost increment is advertised. Once the link cost has been + increased to a value greater than that of the lowest alternative cost + around the link, the link may be disabled without causing a micro- + loop. + + The criterion for a link cost change to be safe is that any link that + is subjected to a cost change of x can only cause loops in a part of + the network that has a cyclic cost less than or equal to x. Because + there may exist links that have a cost of one in each direction, + resulting in a cyclic cost of two, this can result in the link cost + having to be raised in increments of one. However, the increment can + be larger where the minimum cost permits. Recent work [OPT] has + + + + + +Shand & Bryant Informational [Page 10] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + shown that there are a number of optimizations that can be applied to + the problem in order to determine the exact set of cost values + required, and hence minimise the number of increments. + + It will be appreciated that when a link is returned to service, its + cost is reduced in small steps from "infinity" to its final cost, + thereby providing similar micro-loop prevention during a "good-news" + event. Note that the link cost may be decreased from "infinity" to + any value greater than that of the lowest alternative cost around the + link in one step without causing a micro-loop. + + When the failure is an SRLG, the link cost increments must be + coordinated across all failing members of the SRLG. This may be + achieved by completing the transition of one link before starting the + next or by interleaving the changes. + + The incremental cost change approach has the advantage over all other + currently known loop-prevention schemes in that it requires no change + to the routing protocol. It will work in any network because it does + not require any cooperation from the other routers in the network. + + Where the micro-loop-prevention mechanism is being used to support a + planned reconfiguration of the network, the extended total + reconvergence time resulting from the multiple increments is of + limited consequence, particularly where the number of increments have + been optimized. This, together with the ability to implement this + technique in isolation, makes this method a good candidate for use + with such management-initiated changes. + + Where the micro-loop-prevention mechanism is being used to support + failure recovery, the number of increments required, and hence the + time taken to fully converge, is significant even for small numbers + of increments. This is because, for the duration of the transition, + some parts of the network continue to use the old forwarding path, + and hence use any repair mechanism for an extended period. In the + case of a failure that cannot be fully repaired, some destinations + may therefore become unreachable for an extended period. In + addition, the network may be vulnerable to a second failure for the + duration of the controlled re-convergence. + + Where large metrics are used and no optimization (such as that + described above) is performed, the incremental cost method can be + extremely slow. However, in cases where the per-link metric is + small, either because small values have been assigned by the network + designers or because of restrictions implicit in the routing protocol + (e.g., RIP restricts the metric, and BGP using the autonomous system + + + + + +Shand & Bryant Informational [Page 11] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + (AS) path length frequently uses an effective metric of one or a very + small integer for each inter AS hop), the number of required + increments can be acceptably small even without optimizations. + +6.2. Nearside Tunneling + + This mechanism works by creating an overlay network using tunnels + whose path is not affected by the topology change and then carrying + the traffic affected by the change in that new network. When all the + traffic is in the new, tunnel-based network, the real network is + allowed to converge on the new topology. Because all the traffic + that would be affected by the change is carried in the overlay + network, no micro-loops form. + + When a failure is detected (or a link is withdrawn from service), the + router adjacent to the failure issues a new "loop-prevention" routing + message announcing the topology change. This message is propagated + through the network by all routers but is only understood by routers + capable of using one of the tunnel-based, micro-loop-prevention + mechanisms. + + Each of the micro-loop-preventing routers builds a tunnel to the + closest router adjacent to the failure. They then determine which of + their traffic would transit the failure and place that traffic in the + tunnel. When all of these tunnels are in place (determined, for + example, by waiting a suitable interval), the failure is announced as + normal. Because these tunnels will be unaffected by the transition + and because the routers protecting the link will continue the repair + (or forward across the link being withdrawn), no traffic will be + disrupted by the failure. When the network has converged, these + tunnels are withdrawn, allowing traffic to be forwarded along its + new, "natural" path. The order of tunnel insertion and withdrawal is + not important, provided that the tunnels are all in place before the + normal announcement is issued and that the repair remains in place + until normal convergence has completed. + + This method completes in bounded time and is generally much faster + than the incremental cost method. Depending on the exact design, it + completes in two or three flood-SPF-FIB update cycles. + + At the time at which the failure is announced as normal, micro-loops + may form within isolated islands of non-micro-loop-preventing + routers. However, only traffic entering the network via such routers + can micro-loop. All traffic entering the network via a micro-loop- + preventing router will be tunneled correctly to the nearest repairing + router -- including, if necessary, being tunneled via a non-micro- + loop-preventing router -- and will not micro-loop. + + + + +Shand & Bryant Informational [Page 12] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + Where there is no requirement to prevent the formation of micro-loops + involving non-micro-loop-preventing routers, a single, "normal" + announcement may be made and a local timer used to determine the time + at which transition from tunneled forwarding to normal forwarding + over the new topology may commence. + + This technique has the disadvantage that it requires traffic to be + tunneled during the transition. This is an issue in IP networks + because not all router designs are capable of high-performance IP + tunneling. It is also an issue in MPLS networks because the + encapsulating router has to know the label set that the decapsulating + router is distributing. + + A further disadvantage of this method is that it requires cooperation + from all the routers within the routing domain to fully protect the + network against micro-loops. + + When a new link is added, the mechanism is run in "reverse". When + the loop-prevention announcement is heard, routers determine which + traffic they will send over the new link and tunnel that traffic to + the router on the near side of that link. This path will not be + affected by the presence of the new link. When the "normal" + announcement is heard, they then update their FIB to send the traffic + normally, according to the new topology. Any traffic encountering a + router that has not yet updated its FIB will be tunneled to the near + side of the link, and will therefore not loop. + + When a management change to the topology is required, again exactly + the same mechanism protects against micro-looping of packets by the + micro-loop-preventing routers. + + When the failure is an SRLG, the required strategy is to classify + traffic according the furthest failing member of the SRLG that it + will traverse on its way to the destination, and to tunnel that + traffic to the repairing router for that SRLG member. This will + require multiple tunnel destinations -- in the limiting case, one per + SRLG member. + +6.3. Farside Tunnels + + Farside tunneling loop prevention requires the loop-preventing + routers to place all of the traffic that would traverse the failure + in one or more tunnels terminating at the router (or, in the case of + node failure, routers) at the far side of the failure. The + properties of this method are a more uniform distribution of repair + traffic than is achieved using the nearside tunnel method and, in the + case of node failure, a reduction in the decapsulation load on any + single router. + + + +Shand & Bryant Informational [Page 13] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + Unlike the nearside tunnel method (which uses normal routing to the + repairing router), this method requires the use of a repair path to + the farside router. This may be provided by the not-via [NOT-VIA] + mechanism, in which case no further computation is needed. + + The mode of operation is otherwise identical to the nearside + tunneling loop-prevention method (Section 6.2). + +6.4. Distributed Tunnels + + In the distributed tunnels loop-prevention method, each router + calculates its own repair and forwards traffic affected by the + failure using that repair. Unlike the fast reroute (FRR) case, the + actual failure is known at the time of the calculation. The + objective of the loop-preventing routers is to get the packets that + would have gone via the failure into Q-space [FRR-TUNN] using routers + that are in P-space. Because packets are decapsulated on entry to + Q-space, rather than being forced to go to the farside of the + failure, more optimum routing may be achieved. This method is + subject to the same reachability constraints described in [FRR-TUNN]. + + The mode of operation is otherwise identical to the nearside + tunneling loop-prevention method (Section 6.2). + + An alternative distributed tunnel mechanism is for all routers to + tunnel to the not-via address [NOT-VIA] associated with the failure. + +6.5. Packet Marking + + If packets could be marked in some way, this information could be + used to assign them to one of: + + o the new topology, + + o the old topology, or + + o a transition topology. + + They would then be correctly forwarded during the transition. This + mechanism works identically for both "bad-news" and "good-news" + events. It also works identically for SRLG failure. There are three + problems with this solution: + + o A packet-marking bit may not be available, for example, a network + supporting both the differentiated services architecture [RFC2475] + and explicit congestion notification [RFC3168] uses all eight bits + of the IPv4 Type of Service field. + + + + +Shand & Bryant Informational [Page 14] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + o The mechanism would introduce a non-standard forwarding procedure. + + o Packet marking using either the old or the new topology would + double the size of the FIB; however, some optimizations may be + possible. + +6.6. MPLS New Labels + + In an MPLS network that is using [RFC5036] for label distribution, + loop-free convergence can be achieved through the use of new labels + when the path that a prefix will take through the network changes. + + As described in Section 6.2, the repairing routers issue a loop- + prevention announcement to start the loop-free convergence process. + All loop-preventing routers calculate the new topology and determine + whether their FIB needs to be changed. If there is no change in the + FIB, they take no part in the following process. + + The routers that need to make a change to their FIB consider each + change and check the new next hop to determine whether it will use a + path in the OLD topology that reaches the destination without + traversing the failure (i.e., the next hop is in P-space with respect + to the failure [FRR-TUNN]). If so, the FIB entry can be immediately + updated. For all of the remaining FIB entries, the router issues a + new label to each of its neighbors. This new label is used to lock + the path during the transition in a similar manner to the previously + described method for loop-free convergence with tunnels + (Section 6.2). Routers receiving a new label install it in their FIB + for MPLS label translation, but do not yet remove the old label and + do not yet use this new label to forward IP packets, i.e., they + prepare to forward using the new label on the new path but do not use + it yet. Any packets received continue to be forwarded the old way, + using the old labels, towards the repair. + + At some time after the loop-prevention announcement, a normal routing + announcement of the failure is issued. This announcement must not be + issued until such time as all routers have carried out all of their + activities that were triggered by the loop-prevention announcement. + On receipt of the normal announcement, all routers that were delaying + convergence move to their new path for both the new and the old + labels. This involves changing the IP address entries to use the new + labels AND changing the old labels to forward using the new labels. + + Because the new label path was installed during the loop-prevention + phase, packets reach their destinations as follows: + + o If they do not go via any router using a new label, they go via + the repairing router and the repair. + + + +Shand & Bryant Informational [Page 15] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + o If they meet any router that is using the new labels, they get + marked with the new labels and reach their destination using the + new path, back-tracking if necessary. + + When all routers have changed to the new path, the network is + converged. At some later time, when it can be assumed that all + routers have moved to using the new path, the FIB can be cleaned up + to remove the, now redundant, old labels. + + As with other methods, the new labels may be modified to provide loop + prevention for "good news". There are also a number of optimizations + of this method. + +6.7. Ordered FIB Update + + The ordered FIB loop prevention method is described in "Loop-free + convergence using oFIB" [oFIB]. Micro-loops occur following a + failure or a cost increase, when a router closer to the failed + component revises its routes to take account of the failure before a + router that is further away. By analyzing the reverse shortest path + tree (rSPT) over which traffic is directed to the failed component in + the old topology, it is possible to determine a strict ordering that + ensures that nodes closer to the root always process the failure + after any nodes further away, and hence micro-loops are prevented. + + When the failure has been announced, each router waits a multiple of + the convergence timer [LF-TIMERS]. The multiple is determined by the + node's position in the rSPT, and the delay value is chosen to + guarantee that a node can complete its processing within this time. + The convergence time may be reduced by employing a signaling + mechanism to notify the parent when all the children have completed + their processing, and hence when it is safe for the parent to + instantiate its new routes. + + The property of this approach is therefore that it imposes a delay + that is bounded by the network diameter, although in many cases it + will be much less. + + When a link is returned to service, the convergence process above is + reversed. A router first determines its distance (in hops) from the + new link in the NEW topology. Before updating its FIB, it then waits + a time equal to the value of that distance multiplied by the + convergence timer. + + It will be seen that network-management actions can similarly be + undertaken by treating a cost increase in a manner similar to a + failure and a cost decrease similar to a restoration. + + + + +Shand & Bryant Informational [Page 16] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + The ordered FIB mechanism requires all nodes in the domain to operate + according to these procedures, and the presence of non-cooperating + nodes can give rise to loops for any traffic that traverses them (not + just traffic that is originated through them). Without additional + mechanisms, these loops could remain in place for a significant time. + + It should be noted that this method requires per-router ordering but + not per-prefix ordering. A router must wait its turn to update its + FIB, but it should then update its entire FIB. + + When an SRLG failure occurs, a router must classify traffic into the + classes that pass over each member of the SRLG. Each router is then + independently assigned a ranking with respect to each SRLG member for + which they have a traffic class. These rankings may be different for + each traffic class. The prefixes of each class are then changed in + the FIB according to the ordering of their specific ranking. Again, + as for the single failure case, signaling may be used to speed up the + convergence process. + + Note that the special SRLG case of a full or partial node failure can + be dealt with without using per-prefix ordering by running a single + reverse-SPF computation rooted at the failed node (or common point of + the subset of failing links in the partial case). + + There are two classes of signaling optimization that can be applied + to the ordered FIB loop-prevention method: + + o When the router makes NO change, it can signal immediately. This + significantly reduces the time taken by the network to process + long chains of routers that have no change to make to their FIB. + + o When a router HAS changed, it can signal that it has completed. + This is more problematic since this may be difficult to determine, + particularly in a distributed architecture, and the optimization + obtained is the difference between the actual time taken to make + the FIB change and the worst-case timer value. This saving could + be of the order of one second per hop. + + There is another method of executing ordered FIB that is based on + pure signaling [SIG]. Methods that use signaling as an optimization + are safe because eventually they fall back on the established IGP + mechanisms that ensure that networks converge under conditions of + packet loss. However, a mechanism that relies on signaling in order + to converge requires a reliable signaling mechanism that must be + proven to recover from any failure circumstance. + + + + + + +Shand & Bryant Informational [Page 17] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + +6.8. Synchronised FIB Update + + Micro-loops form because of the asynchronous nature of the FIB update + process during a network transition. In many router architectures, + it is the time taken to update the FIB itself that is the dominant + term. One approach would be to have two FIBs and, in a synchronized + action throughout the network, to switch from the old to the new. + One way to achieve this synchronized change would be to signal or + otherwise determine the wall clock time of the change and then + execute the change at that time, using NTP [RFC1305] to synchronize + the wall clocks in the routers. + + This approach has a number of major issues. Firstly, two complete + FIBs are needed, which may create a scaling issue; secondly, a + suitable network-wide synchronization method is needed. However, + neither of these are insurmountable problems. + + Since the FIB change synchronization will not be perfect, there may + be some interval during which micro-loops form. Whether this scheme + is classified as a micro-loop-prevention mechanism or a micro-loop- + mitigation mechanism within this taxonomy is therefore dependent on + the degree of synchronization achieved. + + This mechanism works identically for both "bad-news" and "good-news" + events. It also works identically for SRLG failure. Further + consideration needs to be given to interoperating with routers that + do not support this mechanism. Without a suitable interoperating + mechanism, loops may form for the duration of the synchronization + delay. + +7. Using PLSN in Conjunction with Other Methods + + All of the tunnel methods and packet marking can be combined with + PLSN (see Section 5.2 of this document and [ANALYSIS]) to reduce the + traffic that needs to be protected by the advanced method. + Specifically, all traffic could use PLSN except traffic between a + pair of routers, both of which consider the destination to be type C. + The type-C-to-type-C traffic would be protected from micro-looping + through the use of a loop-prevention method. + + However, determining whether the new next-hop router considers a + destination to be type C may be computationally intensive. An + alternative approach would be to use a loop-prevention method for all + local type C destinations. This would not require any additional + computation, but would require the additional loop-prevention method + to be used in cases that would not have generated loops (i.e., when + the new next-hop router considered this to be a type A or B + destination). + + + +Shand & Bryant Informational [Page 18] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + The amount of traffic that would use PLSN is highly dependent on the + network topology and the specific change, but would be expected to be + in the range of 70% to 90% in typical networks. + + However, PLSN cannot be combined safely with ordered FIB. Consider + the network fragment shown below: + + R + /|\ + / | \ + 1/ 2| \3 + / | \ cost S->T = 10 + Y-----X----S----T cost T->S = 1 + | 1 2 | + |1 | + D---------------+ + 20 + + On failure of link XY, according to PLSN, S will regard R as a safe + neighbor for traffic to D. However, the ordered FIB rank of both R + and T will be zero, and hence these can change their FIBs during the + same time interval. If R changes before T, then a loop will form + around R, T, and S. This can be prevented by using a stronger safety + condition than PLSN currently specifies, at the cost of introducing + more type C routers, and hence reducing the PLSN coverage. + +8. Loop Suppression + + A micro-loop-suppression mechanism recognizes that a packet is + looping and drops it. One such approach would be for a router to + recognize, by some means, that it had seen the same packet before. + It is difficult to see how sufficiently reliable discrimination could + be achieved without some form of per-router signature, such as route + recording. A packet-recognizing approach therefore seems infeasible. + + An alternative approach would be to recognize that a packet was + looping by recognizing that it was being sent back to the place from + which it had just come. This would work for the types of loop that + form in symmetric-cost networks, but would not suppress the cyclic + loops that form in asymmetric networks or as a result of multiple + failures. + + This mechanism operates identically for both "bad-news" events, + "good-news" events, and SRLG failure. + + + + + + + +Shand & Bryant Informational [Page 19] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + +9. Compatibility Issues + + Deployment of any micro-loop-control mechanism is a major change to a + network. Full consideration must be given to interoperation between + routers that are capable of micro-loop control and those that are + not. Additionally, there may be a desire to limit the complexity of + micro-loop control by choosing a method based purely on its + simplicity. Any such decision must take into account that if a more + capable scheme is needed in the future, its deployment might be + complicated by interaction with the scheme previously deployed. + +10. Comparison of Loop-Free Convergence Methods + + PLSN [ANALYSIS] is an efficient mechanism to prevent the formation of + micro-loops but is only a partial solution. It is a useful adjunct + to some of the complete solutions but may need modification. + + Incremental cost advertisement in its simplest form is impractical as + a general solution because it takes too long to complete. Optimized + incremental cost advertisement, however, completes in much less time + and requires no assistance from other routers in the network. It is + therefore useful for network-reconfiguration operations. + + Packet marking is probably impractical because of the need to find + the marking bit and to change the forwarding behavior. + + Of the remaining methods, distributed tunnels is significantly more + complex than nearside or farside tunnels and should only be + considered if there is a requirement to distribute the tunnel + decapsulation load. + + Synchronised FIBs is a fast method but has the issue that a suitable + synchronization mechanism needs to be defined. One method would be + to use NTP [RFC1305]; however, the coupling of routing convergence to + a protocol that uses the network may be a problem. During the + transition, there will be some micro-looping for a short interval + because it is not possible to achieve complete synchronization of the + FIB changeover. + + The ordered FIB mechanism has the major advantage that it is a + control-plane-only solution. However, SRLGs require a per- + destination calculation and the convergence delay may be high, + bounded by the network diameter. The use of signaling as an + accelerator may reduce the number of destinations that experience the + full delay, and hence reduce the total re-convergence time to an + acceptable period. + + + + + +Shand & Bryant Informational [Page 20] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + The nearside and farside tunnel methods deal relatively easily with + SRLGs and uncorrelated changes. The convergence delay would be + small. However, these methods require the use of tunneled + forwarding, which is not supported on all router hardware, and raises + issues of forwarding performance. When used with PLSN, the amount of + traffic that was tunneled would be significantly reduced, thus + reducing the forwarding performance concerns. If the selected repair + mechanism requires the use of tunnels, then a tunnel-based loop + prevention scheme may be acceptable. + +11. Security Considerations + + This document analyzes the problem of micro-loops and summarizes a + number of potential solutions that have been proposed. These + solutions require only minor modifications to existing routing + protocols and therefore do not add additional security risks. + However, a full security analysis would need to be provided within + the specification of a particular solution proposed for deployment. + +12. Acknowledgments + + The authors would like to acknowledge contributions to this document + made by Clarence Filsfils. + +13. Informative References + + [ANALYSIS] Zinin, A., "Analysis and Minimization of Microloops in + Link-state Routing Protocols", Work in Progress, + October 2005. + + [FRR-TUNN] Bryant, S., Filsfils, C., Previdi, S., and M. Shand, "IP + Fast Reroute using tunnels", Work in Progress, + November 2007. + + [LF-TIMERS] Atlas, A., Bryant, S., and M. Shand, "Synchronisation of + Loop Free Timer Values", Work in Progress, + February 2008. + + [NOT-VIA] Shand, M., Bryant, S., and S. Previdi, "IP Fast Reroute + Using Not-via Addresses", Work in Progress, July 2009. + + [OPT] Francois, P., Shand, M., and O. Bonaventure, "Disruption + free topology reconfiguration in OSPF networks", IEEE + INFOCOM May 2007, Anchorage. + + [RFC1305] Mills, D., "Network Time Protocol (Version 3) + Specification, Implementation", RFC 1305, March 1992. + + + + +Shand & Bryant Informational [Page 21] + +RFC 5715 A Framework for Loop-Free Convergence January 2010 + + + [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., + and W. Weiss, "An Architecture for Differentiated + Services", RFC 2475, December 1998. + + [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition + of Explicit Congestion Notification (ECN) to IP", + RFC 3168, September 2001. + + [RFC4090] Pan, P., Swallow, G., and A. Atlas, "Fast Reroute + Extensions to RSVP-TE for LSP Tunnels", RFC 4090, + May 2005. + + [RFC5036] Andersson, L., Minei, I., and B. Thomas, "LDP + Specification", RFC 5036, October 2007. + + [RFC5714] Shand, M. and S. Bryant, "IP Fast Reroute Framework", + RFC 5714, January 2010. + + [SIG] Francois, P. and O. Bonaventure, "Avoiding transient + loops during IGP convergence", IEEE INFOCOM March 2005, + Miami. + + [oFIB] Francois, P., "Loop-free convergence using oFIB", Work + in Progress, February 2008. + +Authors' Addresses + + Mike Shand + Cisco Systems + 250, Longwater Ave, + Green Park, Reading, RG2 6GB + United Kingdom + + EMail: mshand@cisco.com + + + Stewart Bryant + Cisco Systems + 250, Longwater Ave, + Green Park, Reading, RG2 6GB + United Kingdom + + EMail: stbryant@cisco.com + + + + + + + + +Shand & Bryant Informational [Page 22] + |