diff --git a/doc/rfc/rfc6372.txt b/doc/rfc/rfc6372.txt new file mode 100644 index 0000000..2574aa0 --- /dev/null +++ b/doc/rfc/rfc6372.txt @@ -0,0 +1,3139 @@ + + + + + + +Internet Engineering Task Force (IETF) N. Sprecher, Ed. +Request for Comments: 6372 Nokia Siemens Networks +Category: Informational A. Farrel, Ed. +ISSN: 2070-1721 Juniper Networks + September 2011 + + + MPLS Transport Profile (MPLS-TP) Survivability Framework + +Abstract + + Network survivability is the ability of a network to recover traffic + delivery following failure or degradation of network resources. + Survivability is critical for the delivery of guaranteed network + services, such as those subject to strict Service Level Agreements + (SLAs) that place maximum bounds on the length of time that services + may be degraded or unavailable. + + The Transport Profile of Multiprotocol Label Switching (MPLS-TP) is a + packet-based transport technology based on the MPLS data plane that + reuses many aspects of the MPLS management and control planes. + + This document comprises a framework for the provision of + survivability in an MPLS-TP network; it describes recovery elements, + types, methods, and topological considerations. To enable data-plane + recovery, survivability may be supported by the control plane, + management plane, and by Operations, Administration, and Maintenance + (OAM) functions. This document describes mechanisms for recovering + MPLS-TP Label Switched Paths (LSPs). A detailed description of + pseudowire recovery in MPLS-TP networks is beyond the scope of this + document. + + This document is a product of a joint Internet Engineering Task Force + (IETF) / International Telecommunication Union Telecommunication + Standardization Sector (ITU-T) effort to include an MPLS Transport + Profile within the IETF MPLS and Pseudowire Emulation Edge-to-Edge + (PWE3) architectures to support the capabilities and functionalities + of a packet-based transport network as defined by the ITU-T. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for informational purposes. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Not all documents + + + +Sprecher & Farrel Informational [Page 1] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + approved by the IESG are a candidate for any level of Internet + Standard; see Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc6372. + +Copyright Notice + + Copyright (c) 2011 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + +Table of Contents + + 1. 
Introduction ....................................................4 + 1.1. Recovery Schemes ...........................................4 + 1.2. Recovery Action Initiation .................................5 + 1.3. Recovery Context ...........................................6 + 1.4. Scope of This Framework ....................................7 + 2. Terminology and References ......................................8 + 3. Requirements for Survivability .................................10 + 4. Functional Architecture ........................................10 + 4.1. Elements of Control .......................................10 + 4.1.1. Operator Control ...................................11 + 4.1.2. Defect-Triggered Actions ...........................12 + 4.1.3. OAM Signaling ......................................12 + 4.1.4. Control-Plane Signaling ............................12 + 4.2. Recovery Scope ............................................13 + 4.2.1. Span Recovery ......................................13 + 4.2.2. Segment Recovery ...................................13 + 4.2.3. End-to-End Recovery ................................14 + 4.3. Grades of Recovery ........................................15 + 4.3.1. Dedicated Protection ...............................15 + 4.3.2. Shared Protection ..................................16 + 4.3.3. Extra Traffic ......................................17 + 4.3.4. Restoration ........................................19 + 4.3.5. Reversion ..........................................20 + 4.4. Mechanisms for Protection .................................20 + + + +Sprecher & Farrel Informational [Page 2] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + 4.4.1. Link-Level Protection ..............................20 + 4.4.2. Alternate Paths and Segments .......................21 + 4.4.3. Protection Tunnels .................................22 + 4.5. Recovery Domains ..........................................23 + 4.6. Protection in Different Topologies ........................24 + 4.7. Mesh Networks .............................................25 + 4.7.1. 1:n Linear Protection ..............................26 + 4.7.2. 1+1 Linear Protection ..............................28 + 4.7.3. P2MP Linear Protection .............................29 + 4.7.4. Triggers for the Linear Protection + Switching Action ...................................30 + 4.7.5. Applicability of Linear Protection for LSP + Segments ...........................................31 + 4.7.6. Shared Mesh Protection .............................32 + 4.8. Ring Networks .............................................33 + 4.9. Recovery in Layered Networks ..............................34 + 4.9.1. Inherited Link-Level Protection ....................35 + 4.9.2. Shared Risk Groups .................................35 + 4.9.3. Fault Correlation ..................................36 + 5. Applicability and Scope of Survivability in MPLS-TP ............37 + 6. Mechanisms for Providing Survivability for MPLS-TP LSPs ........39 + 6.1. Management Plane ..........................................39 + 6.1.1. Configuration of Protection Operation ..............40 + 6.1.2. External Manual Commands ...........................41 + 6.2. Fault Detection ...........................................41 + 6.3. Fault Localization ........................................42 + 6.4. OAM Signaling .............................................43 + 6.4.1. Fault Detection ....................................44 + 6.4.2. 
Testing for Faults .................................44 + 6.4.3. Fault Localization .................................45 + 6.4.4. Fault Reporting ....................................45 + 6.4.5. Coordination of Recovery Actions ...................46 + 6.5. Control Plane .............................................46 + 6.5.1. Fault Detection ....................................47 + 6.5.2. Testing for Faults .................................47 + 6.5.3. Fault Localization .................................48 + 6.5.4. Fault Status Reporting .............................48 + 6.5.5. Coordination of Recovery Actions ...................49 + 6.5.6. Establishment of Protection and Restoration LSPs ...49 + 7. Pseudowire Recovery Considerations .............................50 + 7.1. Utilization of Underlying MPLS-TP Recovery ................50 + 7.2. Recovery in the Pseudowire Layer ..........................51 + 8. Manageability Considerations ...................................51 + 9. Security Considerations ........................................52 + 10. Acknowledgments ...............................................52 + 11. References ....................................................53 + 11.1. Normative References .....................................53 + 11.2. Informative References ...................................54 + + + +Sprecher & Farrel Informational [Page 3] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + +1. Introduction + + Network survivability is the network's ability to recover traffic + delivery following the failure or degradation of traffic delivery + caused by a network fault or a denial-of-service attack on the + network. Survivability plays a critical role in the delivery of + reliable services in transport networks. Guaranteed services in the + form of Service Level Agreements (SLAs) require a resilient network + that very rapidly detects facility or node degradation or failures, + and immediately starts to recover network operations in accordance + with the terms of the SLA. + + The MPLS Transport Profile (MPLS-TP) is described in [RFC5921]. + MPLS-TP is designed to be consistent with existing transport network + operations and management models, while providing survivability + mechanisms, such as protection and restoration. The functionality + provided is intended to be similar to or better than that found in + established transport networks that set a high benchmark for + reliability. That is, it is intended to provide the operator with + functions with which they are familiar through their experience with + other transport networks, although this does not preclude additional + techniques. + + This document provides a framework for MPLS-TP-based survivability + that meets the recovery requirements specified in [RFC5654]. It uses + the recovery terminology defined in [RFC4427], which draws heavily on + [G.808.1], and it refers to the requirements specified in [RFC5654]. + + This document is a product of a joint Internet Engineering Task Force + (IETF) / International Telecommunication Union Telecommunication + Standardization Sector (ITU-T) effort to include an MPLS Transport + Profile within the IETF MPLS and PWE3 architectures to support the + capabilities and functionalities of a packet-based transport network, + as defined by the ITU-T. + +1.1. Recovery Schemes + + Various recovery schemes (for protection and restoration) and + processes have been defined and analyzed in [RFC4427] and [RFC4428]. 
+ These schemes can also be applied in MPLS-TP networks to re-establish
+ end-to-end traffic delivery according to the agreed service
+ parameters, and to trigger recovery from "failed" or "degraded"
+ transport entities. In the context of this document, transport
+ entities are nodes, links, transport path segments, concatenated
+ transport path segments, and entire transport paths. Recovery
+ actions are initiated by the detection of a defect, or by an external
+ request (e.g., an operator's request for manual control of protection
+ switching).
+
+
+
+Sprecher & Farrel             Informational                     [Page 4]
+
+RFC 6372             MPLS-TP Survivability Framework      September 2011
+
+
+ [RFC4427] makes a distinction between protection switching and
+ restoration mechanisms.
+
+ - Protection switching uses pre-assigned capacity between nodes,
+   where the simplest scheme has a single, dedicated protection entity
+   for each working entity, while the most complex scheme has m
+   protection entities shared between n working entities (m:n).
+
+ - Restoration uses any capacity available between nodes and usually
+   involves rerouting. The resources used for restoration may be pre-
+   planned (i.e., predetermined, but not yet allocated to the recovery
+   path), and recovery priority may be used as a differentiation
+   mechanism to determine which services are recovered and which are
+   not recovered.
+
+ Both protection switching and restoration may be either
+ unidirectional or bidirectional; unidirectional implies that
+ protection switching is performed independently for each direction of
+ a bidirectional transport path, while bidirectional means that both
+ directions are switched simultaneously using appropriate
+ coordination, even if the fault applies to only one direction of the
+ path.
+
+ Both protection and restoration mechanisms may be either revertive or
+ non-revertive as described in Section 4.11 of [RFC4427].
+
+ Preemption priority may be used to determine which services are
+ sacrificed to enable the recovery of other services. In general,
+ protection actions are completed within time frames amounting to tens
+ of milliseconds, while automated restoration actions are normally
+ completed within periods ranging from hundreds of milliseconds to a
+ maximum of a few seconds. Restoration is not guaranteed (for
+ example, because network resources may not be available at the time
+ of the defect).
+
+1.2. Recovery Action Initiation
+
+ The recovery schemes described in [RFC4427] and evaluated in
+ [RFC4428] are presented in the context of control-plane-driven
+ actions (such as the configuration of the protection entities and
+ functions, etc.). The presence of a distributed control plane in an
+ MPLS-TP network is optional. However, the absence of such a control
+ plane does not affect the operation of the network and the use of
+ MPLS-TP forwarding, Operations, Administration, and Maintenance
+ (OAM), and survivability capabilities. In particular, the concepts
+
+
+
+
+
+Sprecher & Farrel             Informational                     [Page 5]
+
+RFC 6372             MPLS-TP Survivability Framework      September 2011
+
+
+ discussed in [RFC4427] and [RFC4428] refer to recovery actions
+ effected in the data plane; they are equally applicable in MPLS-TP,
+ with or without the use of a control plane.
+
+ Thus, some of the MPLS-TP recovery mechanisms do not depend on a
+ control plane and use MPLS-TP OAM mechanisms or management actions to
+ trigger recovery actions.
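+
+ As the next paragraphs note, the protection-switching function itself
+ is indifferent to the source of its trigger. As a purely
+ illustrative, non-normative aside (in no way part of the MPLS-TP
+ specifications), that independence can be sketched in a few lines of
+ Python; all names below are invented for this sketch, and the
+ revertive behavior it assumes is only one of the options discussed in
+ Section 4.3.5.
+
+    from enum import Enum
+
+    class Status(Enum):
+        OK = 1
+        SIGNAL_FAIL = 2      # e.g., loss of continuity seen by OAM
+        SIGNAL_DEGRADE = 3   # e.g., unacceptable error rate
+
+    class ProtectionSwitch:
+        """Acts on reported entity status only; whether the report
+        came from OAM, an operator command, or control-plane
+        signaling is invisible to the switching logic."""
+        def __init__(self):
+            self.status = {"working": Status.OK,
+                           "protection": Status.OK}
+            self.active = "working"
+
+        def report(self, entity, status):
+            # Called alike by an OAM sink, a management-plane command
+            # handler, or a control-plane message handler.
+            self.status[entity] = status
+            if (self.status["working"] is not Status.OK
+                    and self.status["protection"] is Status.OK):
+                self.active = "protection"   # switchover
+            elif self.status["working"] is Status.OK:
+                self.active = "working"      # revertive behavior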
+ + The principles of MPLS-TP protection-switching actions are similar to + those described in [RFC4427], since the protection mechanism is based + on the capability to detect certain defects in the transport entities + within the recovery domain. The protection-switching controller does + not care which initiation method is used, provided that it can be + given information about the status of the transport entities within + the recovery domain (e.g., OK, signal failure, signal degradation, + etc.). + + In the context of MPLS-TP, it is imperative to ensure that performing + switchovers is possible, regardless of the way in which the network + is configured and managed (for example, regardless of whether a + control-plane, management-plane, or OAM initiation mechanism is + used). + + All MPLS and GMPLS protection mechanisms [RFC4428] are applicable in + an MPLS-TP environment. It is also possible to provision and manage + the related protection entities and functions defined in MPLS and + GMPLS using the management plane [RFC5654]. Regardless of whether an + OAM, management, or control plane initiation mechanism is used, the + protection-switching operation is a data-plane operation. + + In some recovery schemes (such as bidirectional protection + switching), it is necessary to coordinate the protection state + between the edges of the recovery domain to achieve initiation of + recovery actions for both directions. An MPLS-TP protocol may be + used as an in-band (i.e., data-plane based) control protocol in order + to coordinate the protection state between the edges of the + protection domain. When the MPLS-TP control plane is in use, a + control-plane-based mechanism can also be used to coordinate the + protection states between the edges of the protection domain. + +1.3. Recovery Context + + An MPLS-TP Label Switched Path (LSP) may be subject to any part of or + all of MPLS-TP link recovery, path-segment recovery, or end-to-end + recovery, where: + + + + + + +Sprecher & Farrel Informational [Page 6] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + o MPLS-TP link recovery refers to the recovery of an individual link + (and hence all or a subset of the LSPs routed over the link) + between two MPLS-TP nodes. For example, link recovery may be + provided by server-layer recovery. + + o Segment recovery refers to the recovery of an LSP segment (i.e., + segment and concatenated segment in the language of [RFC5654]) + between two nodes and is used to recover from the failure of one + or more links or nodes. + + o End-to-end recovery refers to the recovery of an entire LSP, from + its ingress to its egress node. + + For additional resiliency, more than one of these recovery techniques + may be configured concurrently for a single path. + + Co-routed bidirectional MPLS-TP LSPs are defined in a way that allows + both directions of the LSP to follow the same route through the + network. In this scenario, the operator often requires the + directions to fate-share (that is, if one direction fails, both + directions should cease to operate). + + Associated bidirectional MPLS-TP LSPs exist where the two directions + of a bidirectional LSP follow different paths through the network. + An operator may also request fate-sharing for associated + bidirectional LSPs. + + The requirement for fate-sharing causes a direct interaction between + the recovery processes affecting the two directions of an LSP, so + that both directions of the bidirectional LSP are recovered at the + same time. 
This mode of recovery is termed bidirectional recovery
+ and may be seen as a consequence of fate-sharing.
+
+ The recovery scheme operating at the data-plane level can function in
+ a multi-domain environment (in the wider sense of a "domain"
+ [RFC4726]). It can also protect against a failure of a boundary node
+ in the case of inter-domain operation. MPLS-TP recovery schemes are
+ intended to protect client services when they are sent across the
+ MPLS-TP network.
+
+1.4. Scope of This Framework
+
+ This framework introduces the architecture of the MPLS-TP recovery
+ domain and describes the recovery schemes in MPLS-TP (based on the
+ recovery types defined in [RFC4427]) as well as the principles of
+ operation, recovery states, recovery triggers, and information
+ exchanges between the different elements that support the reference
+ model.
+
+
+
+Sprecher & Farrel             Informational                     [Page 7]
+
+RFC 6372             MPLS-TP Survivability Framework      September 2011
+
+
+ The framework also describes the qualitative grades of the
+ survivability functions that can be provided, such as dedicated
+ recovery, shared protection, restoration, etc. In the event of a
+ network failure, the grade of recovery directly affects the service
+ grade provided to the end-user.
+
+ The general description of the functional architecture is applicable
+ to both LSPs and pseudowires (PWs); however, PW recovery is only
+ introduced in Section 7, and the relevant details are beyond the
+ scope of this document and are for further study.
+
+ This framework applies to general recovery schemes as well as to
+ mechanisms that are optimized for specific topologies and are
+ tailored to efficiently handle protection switching.
+
+ This document addresses the need for the coordination of protection
+ switching across multiple layers and at sub-layers (for clarity, we
+ use the term "layer" to refer equally to layers and sub-layers).
+ This allows an operator to prevent race conditions and allows the
+ protection-switching mechanism of one layer to recover from a failure
+ before switching is invoked at another layer.
+
+ This framework also specifies the functions that must be supported by
+ MPLS-TP to provide the recovery mechanisms. MPLS-TP introduces a
+ tool kit to enable recovery in MPLS-TP-based networks and to ensure
+ that affected services are recovered in the event of a failure.
+
+ Generally, network operators aim to provide the fastest, most stable,
+ and best protection mechanism at a reasonable cost in accordance with
+ customer requirements. The greater the grade of protection required,
+ the greater the number of resources consumed. It is therefore
+ expected that network operators will offer a wide spectrum of service
+ grades. MPLS-TP-based recovery offers the flexibility to select a
+ recovery mechanism, define the granularity at which traffic delivery
+ is to be protected, and choose the specific traffic types that are to
+ be protected. With MPLS-TP-based recovery, it should be possible to
+ provide different grades of protection for different traffic classes
+ within the same path based on the service requirements.
+
+2. Terminology and References
+
+ The terminology used in this document is consistent with that defined
+ in [RFC4427]. The latter is consistent with [G.808.1].
+
+ However, certain protection concepts (such as ring protection) are
+ not discussed in [RFC4427]; for those concepts, the terminology used
+ in this document is drawn from [G.841].
+ + + +Sprecher & Farrel Informational [Page 8] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + Readers should refer to those documents for normative definitions. + + This document supplies brief summaries of a number of terms for + reasons of clarity and to assist the reader, but it does not redefine + terms. + + Note, in particular, the distinction and definitions made in + [RFC4427] for the following three terms: + + o Protection: re-establishing end-to-end traffic delivery using pre- + allocated resources. + + o Restoration: re-establishing end-to-end traffic delivery using + resources allocated at the time of need; sometimes referred to as + "repair" of a service, LSP, or the traffic. + + o Recovery: a generic term covering both Protection and Restoration. + + Note that the term "survivability" is used in [RFC5654] to cover the + functional elements of "protection" and "restoration", which are + collectively known as "recovery". + + Important background information on survivability can be found in + [RFC3386], [RFC3469], [RFC4426], [RFC4427], and [RFC4428]. + + In this document, the following additional terminology is applied: + + o "Fault Management", as defined in [RFC5950]. + + o The terms "defect" and "failure" are used interchangeably to + indicate any defect or failure in the sense that they are defined + in [G.806]. The terms also include any signal degradation event + as defined in [G.806]. + + o A "fault" is a fault or fault cause as defined in [G.806]. + + o "Trigger" indicates any event that may initiate a recovery action. + See Section 4.1 for a more detailed discussion of triggers. + + o The acronym "OAM" is defined as Operations, Administration, and + Maintenance, consistent with [RFC6291]. + + o A "Transport Entity" is a node, link, transport path segment, + concatenated transport path segment, or entire transport path. + + o A "Working Entity" is a transport entity that carries traffic + during normal network operation. + + + + +Sprecher & Farrel Informational [Page 9] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + o A "Protection Entity" is a transport entity that is pre-allocated + and used to protect and transport traffic when the working entity + fails. + + o A "Recovery Entity" is a transport entity that is used to recover + and transport traffic when the working entity fails. + + o "Survivability Actions" are the steps that may be taken by network + nodes to communicate faults and to switch traffic from faulted or + degraded paths to other paths. This may include sending messages + and establishing new paths. + + General terminology for MPLS-TP is found in [RFC5921] and [ROSETTA]. + Background information on MPLS-TP requirements can be found in + [RFC5654]. + +3. Requirements for Survivability + + MPLS-TP requirements are presented in [RFC5654] and serve as + normative references for the definition of all MPLS-TP functionality, + including survivability. Survivability is presented in [RFC5654] as + playing a critical role in the delivery of reliable services, and the + requirements for survivability are set out using the recovery + terminology defined in [RFC4427]. + +4. Functional Architecture + + This section presents an overview of the elements relating to the + functional architecture for survivability within an MPLS-TP network. 
+ The components are presented separately to demonstrate the way in + which they may be combined to provide the different grades of + recovery needed to meet the requirements set out in the previous + section. + +4.1. Elements of Control + + Recovery is achieved by implementing specific actions. These actions + aim to repair network resources or redirect traffic along paths that + avoid failures in the network. They may be triggered automatically + by the MPLS-TP network nodes upon detection of a network defect, or + they may be triggered by an operator. Automated actions may be + enhanced by in-band (i.e., data-plane-based) OAM mechanisms, or by + in-band or out-of-band control-plane signaling. + + + + + + + + +Sprecher & Farrel Informational [Page 10] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + +4.1.1. Operator Control + + The survivability behavior of the network as a whole, and the + reaction of each transport path when a fault is reported, may be + controlled by the operator. This control can be split into two sets + of functions: policies and actions performed when the transport path + is set up, and commands used to control or force recovery actions for + established transport paths. + + The operator may establish network-wide or local policies that + determine the actions that will be taken when various defects are + reported that affect different transport paths. Also, when a service + request is made that causes the establishment of one or more + transport paths in the network, the operator (or requesting + application) may define a particular grade of service, and this will + be mapped to specific survivability actions taken before and during + transport path setup, after the discovery of a failure of network + resources, and upon recovery of those resources. + + It should be noted that it is unusual to present a user or customer + with options directly related to recovery actions. Instead, the + user/customer enters into an SLA with the network provider, and the + network operator maps the terms of the SLA (for example, for + guaranteed delivery, availability, or reliability) to recovery + schemes within the network. + + The operator can also issue commands to control recovery actions and + events. For example, the operator may perform the following actions: + + o Enable or disable the survivability function. + + o Invoke the simulation of a network fault. + + o Force a switchover from a working path to a recovery path or vice + versa. + + Forced switchover may be performed for network optimization purposes + with minimal service interruption, such as when modifying protected + or unprotected services, when replacing MPLS-TP network nodes, etc. + In some circumstances, a fault may be reported to the operator, and + the operator may then select and initiate the appropriate recovery + action. A description of the different operator commands is found in + Section 4.12 of [RFC4427]. + + + + + + + + +Sprecher & Farrel Informational [Page 11] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + +4.1.2. Defect-Triggered Actions + + Survivability actions may be directly triggered by network defects. + This means that the device that detects the defect (for example, + notification of an issue reported from equipment in a lower layer, + failure to receive an OAM Continuity message, or receipt of an OAM + message reporting a failure condition) may immediately perform a + survivability action. + + The action is directly triggered by events in the data plane. 
Note,
+ however, that coordination of recovery actions between the edges of
+ the recovery domain may require message exchanges for some recovery
+ functions or for performing a bidirectional recovery action.
+
+4.1.3. OAM Signaling
+
+ OAM signaling refers to data-plane OAM message exchange. Such
+ messages may be used to detect and localize faults or to indicate a
+ degradation in the operation of the network. However, in this
+ context these messages are used to control or trigger survivability
+ actions. The mechanisms to achieve this are discussed in [RFC6371].
+
+ OAM signaling may also be used to coordinate recovery actions within
+ the protection domain.
+
+4.1.4. Control-Plane Signaling
+
+ Control-plane signaling is responsible for setup, maintenance, and
+ teardown of transport paths that do not fall under management-plane
+ control. The control plane may also be used to coordinate the
+ detection, localization, and reaction to network defects pertaining
+ to peer relationships (neighbor-to-neighbor or end-to-end). Thus,
+ control-plane signaling may initiate and coordinate survivability
+ actions.
+
+ The control plane can also be used to distribute topology and
+ information relating to resource availability. In this way, the
+ "graceful shutdown" [RFC5817] of resources may be effected by
+ withdrawing them; this can be used to invoke a survivability action
+ in a similar way to that used when reporting or discovering a fault,
+ as described in the previous sections.
+
+ The use of a control plane for MPLS-TP is discussed in [RFC6373].
+
+
+
+
+
+
+
+Sprecher & Farrel             Informational                    [Page 12]
+
+RFC 6372             MPLS-TP Survivability Framework      September 2011
+
+
+4.2. Recovery Scope
+
+ This section describes the elements of recovery. These are the
+ quantitative aspects of recovery, that is, the parts of the network
+ for which recovery can be provided.
+
+ Note that the terminology in this section is consistent with
+ [RFC4427]. Where the terms differ from those in [RFC5654], mapping
+ is provided.
+
+4.2.1. Span Recovery
+
+ A span is a single hop between neighboring MPLS-TP nodes in the same
+ network layer. A span is sometimes referred to as a link, and this
+ may cause some confusion between the concept of a data link and a
+ traffic engineering (TE) link. LSPs traverse TE links between
+ neighboring MPLS-TP nodes in the MPLS-TP network layer. However, a
+ TE link may be provided by any of the following:
+
+ o A single data link.
+
+ o A series of data links in a lower layer, established as an LSP and
+   presented to the upper layer as a single TE link.
+
+ o A set of parallel data links in the same layer, presented either as
+   a bundle of TE links, or as a collection of data links that
+   together provide a data-link-layer protection scheme.
+
+ Thus, span recovery may be provided by any of the following:
+
+ o Selecting a different TE link from a bundle.
+
+ o Moving the TE link so that it is supported by a different data
+   link between the same pair of neighbors.
+
+ o Rerouting the LSP in the lower layer.
+
+ Moving the protected LSP to another TE link between the same pair of
+ neighbors is a form of segment recovery and not a form of span
+ recovery. Segment Recovery is described in Section 4.2.2.
+
+4.2.2. Segment Recovery
+
+ An LSP segment comprises one or more continuous hops on the path of
+ the LSP. [RFC5654] defines two terms. A "segment" is a single hop
+ along the path of an LSP, while a "concatenated segment" is more than
+ one hop along the path of an LSP.
In the context of this document, a + segment covers both of these concepts. + + + +Sprecher & Farrel Informational [Page 13] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + A PW segment refers to a Single-Segment PW (SS-PW) or to a single + segment of a Multi-Segment PW (MS-PW) that is set up between two PE + devices that may be Terminating PEs (T-PEs) or Switching PEs (S-PEs) + so that the full set of possibilities is T-PE to S-PE, S-PE to S-PE, + S-PE to T-PE, or T-PE to T-PE (for the SS-PW case). As indicated in + Section 1, the recovery of PWs and PW segments is beyond the scope of + this document; however, see Section 7. + + Segment recovery involves redirecting or copying traffic at the + source end of a segment onto an alternate path leading to the other + end of the segment. According to the required grade of recovery + (described in Section 4.3), traffic may be either redirected to a + pre-established segment, through rerouting the protected segment, or + tunneled to the far end of the protected segment through a "bypass" + LSP. For details on recovery mechanisms, see Section 4.4. + + Note that protecting a transport path against node failure requires + the use of segment recovery or end-to-end recovery, while a link + failure can be protected using span, segment, or end-to-end recovery. + +4.2.3. End-to-End Recovery + + End-to-end recovery is a special case of segment recovery where the + protected segment comprises the entire transport path. End-to-end + recovery may be provided as link-diverse or node-diverse recovery + where the recovery path shares no links or no nodes with the working + path. + + Note that node-diverse paths are necessarily link-diverse and that + full, end-to-end node-diversity is required to guarantee recovery. + + Two observations need to be made about end-to-end recovery. + + - Firstly, there may be circumstances where node-diverse end-to-end + paths do not guarantee recovery. The ingress and egress nodes will + themselves be single points of failure. Additionally, there may be + shared risks of failure (for example, geographic collocation, + shared resources, etc.) between diverse nodes as described in + Section 4.9.2. + + - Secondly, it is possible to use end-to-end recovery techniques even + when there is not full diversity and the working and protection + paths share links or nodes. + + + + + + + + +Sprecher & Farrel Informational [Page 14] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + +4.3. Grades of Recovery + + This section describes the qualitative grades of survivability that + can be provided. In the event of a network failure, the grade of + recovery offered directly affects the service grade provided to the + end-user. This will be observed as the amount of data lost when a + network fault occurs, and the length of time required to recover + connectivity. + + In general, there is a correlation between the recovery service grade + (i.e., the speed of recovery and reduction of data loss) and the + amount of resources used in the network; better service grades + require the pre-allocation of resources to the recovery paths, and + those resources cannot be used for other purposes if high-quality + recovery is required. An operator will consider how providing + different grades of recovery may require that network resources be + provisioned and allocated for exclusive use of the recovery paths + such that the resources cannot be used to support other customer + services. 
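+
+ As an informal, non-normative summary, the fragment below merely
+ tabulates the trade-off described above. The grade names and the
+ indicative time scales (taken from Section 1.1) are illustrative and
+ are not limits defined by this document.
+
+    # Qualitative grades of recovery (illustrative values only).
+    RECOVERY_GRADES = {
+        # grade:             (resources set aside,   typical speed)
+        "1+1 protection":    ("dedicated per path",  "tens of ms"),
+        "1:1 protection":    ("dedicated per path",  "tens of ms"),
+        "shared (1:n, m:n)": ("shared by n paths",
+                              "tens of ms plus coordination"),
+        "restoration":       ("none pre-allocated",
+                              "hundreds of ms to seconds"),
+    }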
+ + Sections 6 and 7 of [RFC4427] provide a full breakdown of the + protection and recovery schemes. This section summarizes the + qualitative grades available. + + Note that, in the context of recovery, a useful discussion of the + term "resource" and its interpretation in both the IETF and ITU-T + contexts may be found in Section 3.2 of [RFC4397]. + + The selection of the recovery grade and schemes to satisfy the + service grades for an LSP using available network resources is + subject to network and local policy and may be pre-designated through + network planning or may be dynamically determined by the network. + +4.3.1. Dedicated Protection + + In dedicated protection, the resources for the recovery entity are + pre-assigned for the sole use of the protected transport path. This + will clearly be the case in 1+1 protection, and may also be the case + in 1:1 protection where extra traffic (see Section 4.3.3) is not + supported. + + Note that when using protection tunnels (see Section 4.4.3), + resources may also be dedicated to the protection of a specific + transport path. In some cases (1:1 protection), the entire bypass + tunnel may be dedicated to providing recovery for a specific + transport path, while in other cases (such as facility backup), a + subset of the resources associated with the bypass tunnel may be pre- + assigned for the recovery of a specific service. + + + +Sprecher & Farrel Informational [Page 15] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + However, as described in Section 4.4.3, the bypass tunnel method can + also be used for shared protection (Section 4.3.2), either to carry + extra traffic (Section 4.3.3) or to achieve best-effort recovery + without the need for resource reservation. + +4.3.2. Shared Protection + + In shared protection, the resources for the recovery entities of + several services are shared. These may be shared as 1:n or m:n and + are shared on individual links. Link-by-link resource sharing may be + managed and operated along LSP segments, on PW segments, or on end- + to-end transport paths (LSP or PW). Note that there is no + requirement for m:n recovery in the list of MPLS-TP requirements + documented in [RFC5654]. Shared protection can be applied in + different topologies (mesh, ring, etc.) and can utilize different + protection mechanisms (linear, ring, etc.). + + End-to-end shared protection shares resources between a number of + paths that have common end points. Thus, a number of paths (n paths) + are all protected by one or more protection paths (m paths, where m + may equal 1). When there have been m failures, there are no more + available protection paths, and the n paths are no longer protected. + Thus, in 1:n protection, one fault can be protected against before + all the n paths are unprotected. The fact that the paths have become + unprotected needs to be conveyed to the path end points since they + may need to report the change in service grade or may need to take + further action to increase their protection. In end-to-end shared + protection, this communication is simple since the end points are + common. + + In shared mesh protection (see Section 4.7.6), the paths that share + the protection resources do not necessarily have the same end points. + This provides a more flexible resource-sharing scheme, but the + network planning and the coordination of protection state after a + recovery action are more complex. 
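+
+ The resource bookkeeping behind such 1:n and m:n sharing can be
+ sketched as follows. This is a non-normative illustration only: the
+ class and method names are invented, and a real implementation would
+ also handle request priorities, coordination messages, and reversion
+ timers.
+
+    class SharedProtectionGroup:
+        """m protection paths shared by n working paths (m:n)."""
+        def __init__(self, working_paths, protection_paths):
+            self.working = list(working_paths)    # the n paths
+            self.idle = list(protection_paths)    # the m paths
+            self.assigned = {}        # failed working -> protection
+
+        def on_fault(self, path):
+            """Return True if the failed path could be recovered."""
+            if path in self.assigned:
+                return True           # already switched over
+            if not self.idle:
+                return False          # all m resources in use
+            self.assigned[path] = self.idle.pop()
+            if not self.idle:
+                # The remaining working paths are now unprotected;
+                # their end points must learn of the changed grade.
+                self.notify_unprotected()
+            return True
+
+        def on_repair(self, path):
+            # Reversion frees the shared resource for the other
+            # members of the group (see Section 4.3.5).
+            resource = self.assigned.pop(path, None)
+            if resource is not None:
+                self.idle.append(resource)
+
+        def notify_unprotected(self):
+            pass   # e.g., report via OAM or the management plane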
+
+ Where a bypass tunnel is used (Section 4.4.3), the tunnel might not
+ have sufficient resources to simultaneously protect all of the paths
+ for which it offers protection; in the event that all paths were
+ affected by network defects and failures at the same time, not all of
+ them would be recovered. Policy would dictate how this situation
+ should be handled: some paths might be protected, while others would
+ simply fail; the traffic for some paths would be guaranteed, while
+ traffic on other paths would be treated as best-effort with the risk
+ of dropped packets. Alternatively, it is possible that protection
+ would not be attempted according to local policy at the nodes that
+ perform the recovery actions.
+
+
+
+Sprecher & Farrel             Informational                    [Page 16]
+
+RFC 6372             MPLS-TP Survivability Framework      September 2011
+
+
+ Shared protection is a trade-off between assigning network resources
+ to protection (which is not required most of the time) and risking
+ unrecoverable services in the event that multiple network defects or
+ failures occur. Rapid recovery can be achieved with dedicated
+ protection, but it is delayed by message exchanges in the management,
+ control, or data planes for shared protection. This means that there
+ is also a trade-off between rapid recovery and resource sharing. In
+ some cases, shared protection might not meet the speed required for
+ protection, but it may still be faster than restoration.
+
+ These trade-offs may be somewhat mitigated by the following:
+
+ o Adjusting the value of n in 1:n protection.
+
+ o Using m:n protection for a value of m > 1.
+
+ o Establishing new protection paths as each available protection
+   path is put into use.
+
+ In an MPLS-TP network, the degree to which a resource is shared
+ between LSPs is a policy issue. This policy may be applied to the
+ resource or to the LSPs, and may be pre-configured, configured per
+ LSP and installed during LSP establishment, or may be dynamically
+ configured.
+
+4.3.3. Extra Traffic
+
+ Section 2.5.1.1 of [RFC5654] says: "Support for extra traffic (as
+ defined in [RFC4427]) is not required in MPLS-TP and MAY be omitted
+ from the MPLS-TP specifications". This document observes that extra
+ traffic facilities may therefore be provided as part of the MPLS-TP
+ survivability toolkit depending upon the development of suitable
+ solution specifications. The remainder of this section explains the
+ concepts of extra traffic without prejudging the decision to specify
+ or not specify such solutions.
+
+ Network resources allocated for protection represent idle capacity
+ during the time that recovery is not actually required, and can be
+ utilized by carrying other traffic, referred to as "extra traffic".
+
+ Note that extra traffic does not need to start or terminate at the
+ ends of the entity (e.g., LSP) that it uses.
+
+ When a network resource carrying extra traffic is required for the
+ recovery of protected traffic from the failed working path, the extra
+ traffic is disrupted. This disruption may take one of two forms:
+
+
+
+
+
+Sprecher & Farrel             Informational                    [Page 17]
+
+RFC 6372             MPLS-TP Survivability Framework      September 2011
+
+
+ - In "hard preemption", the extra traffic is excluded from the
+   protection resource. The disruption of the extra traffic is total,
+   and the service supported by the extra traffic must be dropped, or
+   some form of rerouting or restoration must be applied to the extra
+   traffic LSP in order to recover the service.
+ + Hard preemption is achieved by "setting a switch" on the path of + the extra traffic such that it no longer flows. This situation may + be detected by OAM and reported as a fault, or may be proactively + reported through OAM or control-plane signaling. + + - In "soft preemption", the extra traffic is not explicitly excluded + from the protection resource, but is given lower priority than the + protected traffic. In a packet network (such as MPLS-TP), this can + result in oversubscription of the protection resource with the + result that the extra traffic receives "best-effort" delivery. + Depending on the volume of protection and extra traffic, and the + level of oversubscription, the extra traffic may be slightly or + heavily impacted. + + The event of soft preemption may be detected by OAM and reported as + a degradation of traffic delivery or as a fault. It may also be + proactively reported through OAM or control-plane signaling. + + Note that both hard and soft preemption may utilize additional + message exchanges in the management, control, or data planes. These + messages do not necessarily mean that recovery is delayed, but may + increase the complexity of the protection system. Thus, the benefits + of carrying extra traffic must be weighed against the disadvantages + of delayed recovery, additional network overhead, and the impact on + the services that support the extra traffic according to the details + of the solutions selected. + + Note that extra traffic is not protected by definition, but may be + restored. + + Extra traffic is not supported on dedicated protection resources, + which, by definition, are used for 1+1 protection (Section 4.3.1), + but it can be supported in other protection schemes, including shared + protection (Section 4.3.2) and tunnel protection (Section 4.4.3). + + Best-effort traffic should not be confused with extra traffic. For + best-effort traffic, the network does not guarantee data delivery, + and the user does not receive guaranteed quality of service (e.g., in + terms of jitter, packet loss, delay, etc.). Best-effort traffic + depends on the current traffic load. However, for extra traffic, + quality can only be guaranteed until resources are required for + recovery. At this point, the extra traffic may be completely + + + +Sprecher & Farrel Informational [Page 18] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + displaced, may be treated as best effort, or may itself be recovered + (for example, by restoration techniques). + +4.3.4. Restoration + + This section refers to LSP restoration. Restoration for PWs is + beyond the scope of this document (but see Section 7). + + Restoration represents the most effective use of network resources, + since no resources are reserved for recovery. However, restoration + requires the computation of a new path and the activation of a new + LSP (through the management or control plane). It may be more time- + consuming to perform these steps than to implement recovery using + protection techniques. + + Furthermore, there is no guarantee that restoration will be able to + recover the service. It may be that all suitable network resources + are already in use for other LSPs, so that no new path can be found. + This problem can be partially mitigated by using LSP setup + priorities, so that recovery LSPs can preempt existing LSPs with + lower priorities. + + Additionally, when a network defect occurs, multiple LSPs may be + disrupted by the same event. 
These LSPs may have been established by + different Network Management Stations (NMSes) or they may have been + signaled by different head-end MPLS-TP nodes, meaning that multiple + points in the network will try to compute and establish recovery LSPs + at the same time. This can lead to a lack of resources within the + network and cause recovery failures; some recovery actions will need + to be retried, resulting in even slower recovery times for some + services. + + Both hard and soft LSP restoration may be supported. For hard LSP + restoration, the resources of the working LSP are released before the + recovery LSP is fully established (i.e., break-before-make). For + soft LSP restoration, the resources of the working LSP are released + after an alternate LSP is fully established (i.e., make-before- + break). Note that in the case of reversion (Section 4.3.5), the + resources associated with the working LSP are not released. + + The restoration resources may be pre-calculated and even pre-signaled + before the restoration action starts, but not pre-allocated. This is + known as pre-planned LSP restoration. The complete + establishment/activation of the restoration LSP occurs only when the + restoration action starts. Pre-planning may occur periodically and + provides the most accurate information about the available resources + in the network. + + + + +Sprecher & Farrel Informational [Page 19] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + +4.3.5. Reversion + + After a service has been recovered and traffic is flowing along the + recovery LSP, the defective network resource may be replaced. + Traffic can be redirected back onto the original working LSP (known + as "reversion"), or it can be left where it is on the recovery LSP + ("non-revertive" behavior). + + It should be possible to specify the reversion behavior of each + service; this might even be configured for each recovery instance. + + In non-revertive mode, an additional operational option is possible + where protection roles are switched, so that the recovery LSP becomes + the working LSP, while the previous working path (or the resources + used by the previous working path) are used for recovery in the event + of an additional fault. + + In revertive mode, it is important to prevent excessive swapping + between the working and recovery paths in the case of an intermittent + defect. This can be addressed by using a reversion delay timer (the + Wait-To-Restore timer), which controls the length of time to wait + before reversion following the repair of a fault on the original + working path. It should be possible for an operator to configure + this timer per LSP, and a default value should be defined. + +4.4. Mechanisms for Protection + + This section provides general descriptions (MPLS-TP non-specific) of + the mechanisms that can be used for protection purposes. As + indicated above, while the functional architecture applies to both + LSPs and PWs, the mechanism for recovery described in this document + refers to LSPs and LSP segments only. Recovery mechanisms for + pseudowires and pseudowire segments are for further study and will be + described in a separate document (see also Section 7). + +4.4.1. Link-Level Protection + + Link-level protection refers to two paradigms: (1) where protection + is provided in a lower network layer and (2) where protection is + provided by the MPLS-TP link layer. 
+ + Note that link-level protection mechanisms do not protect the nodes + at each end of the entity (e.g., a link or span) that is protected. + End-to-end or segment protection should be used in conjunction with + link-level protection to protect against a failure of the edge nodes. + + + + + + +Sprecher & Farrel Informational [Page 20] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + Link-level protection offers the following grades of protection: + + o Full protection where a dedicated protection entity (e.g., a link + or span) is pre-established to protect a working entity. When the + working entity fails, the protected traffic is switched to the + protecting entity. In this scenario, all LSPs carried over the + working entity are recovered (in one protection operation) when + there is a failure condition. This is referred to in [RFC4427] as + "bulk recovery". + + o Partial protection where only a subset of the LSPs or traffic + carried over a selected entity is recovered when there is a + failure condition. The decision as to which LSPs will be + recovered and which will not depends on local policy. + + When there is no failure on the working entity, the protection entity + may transport extra traffic that may be preempted when protection + switching occurs. + + If link-level protection is available, it may be desirable to allow + this to be attempted before attempting other recovery mechanisms for + the transport paths affected by the fault because link-level + protection may be faster and more conservative of network resources. + This can be achieved both by limiting the propagation of fault + condition notifications and by delaying the other recovery actions. + This consideration of other protection can be compared with the + discussion of recovery domains (Section 4.5) and recovery in multi- + layer networks (Section 4.9). + + A protection mechanism may be provided at the MPLS-TP link layer + (which connects two MPLS-TP nodes). Such a mechanism can make use of + the procedures defined in [RFC5586] to set up in-band communication + channels at the MPLS-TP Section level, to use these channels to + monitor the health of the MPLS-TP link, and to coordinate the + protection states between the ends of the MPLS-TP link. + +4.4.2. Alternate Paths and Segments + + The use of alternate paths and segments refers to the paradigm + whereby protection is performed in the network layer in which the + protected LSP is located; this applies either to the entire end-to- + end LSP or to a segment of the LSP. In this case, hierarchical LSPs + are not used (compare with Section 4.4.3). + + Different grades of protection may be provided: + + o Dedicated protection where a dedicated entity (e.g., LSP or LSP + segment) is (fully) pre-established to protect a working entity + + + +Sprecher & Farrel Informational [Page 21] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + (e.g., LSP or LSP segment). When a failure condition occurs on + the working entity, traffic is switched onto the protection + entity. Dedicated protection may be performed using 1:1 or 1+1 + linear protection schemes. When the failure condition is + eliminated, the traffic may revert to the working entity. This is + subject to local configuration. + + o Shared protection where one or more protection entities is pre- + established to protect against a failure of one or more working + entities (1:n or m:n). 
+
+ When the fault condition on the working entity is eliminated, the
+ traffic should revert back to the working entity in order to allow
+ other related working entities to be protected by the shared
+ protection resource.
+
+4.4.3. Protection Tunnels
+
+ A protection tunnel is pre-provisioned in order to protect against a
+ failure condition along a sequence of spans in the network. This may
+ be achieved using LSP hierarchy. We call such a sequence a network
+ segment. A failure of a network segment may affect one or more LSPs
+ that transit the network segment.
+
+ When a failure condition occurs in the network segment (detected
+ either by OAM on the network segment, or by OAM on a concatenated
+ segment of one of the LSPs transiting the network segment), one or
+ more of the protected LSPs are switched over at the ingress point of
+ the network segment and are transmitted over the protection tunnel.
+ This is implemented through label stacking. Label mapping may be an
+ option as well.
+
+ Different grades of protection may be provided:
+
+ o Dedicated protection where the protection tunnel reserves
+   sufficient resources to provide protection for all protected LSPs
+   without causing service degradation.
+
+ o Partial protection where the protection tunnel has enough
+   resources to protect some of the protected LSPs, but not all of
+   them simultaneously. Policy dictates how this situation should be
+   handled: it is possible that some LSPs would be protected, while
+   others would simply fail; it is possible that traffic would be
+   guaranteed for some LSPs, while for other LSPs it would be treated
+   as best effort with the risk of packets being dropped.
+   Alternatively, it is possible that protection would not be
+   attempted.
+
+
+
+Sprecher & Farrel             Informational                    [Page 22]
+
+RFC 6372             MPLS-TP Survivability Framework      September 2011
+
+
+4.5. Recovery Domains
+
+ Protection and restoration are performed in the context of a recovery
+ domain. A recovery domain is defined between two or more recovery
+ reference end points that are located at the edges of the recovery
+ domain and that border on the element on which recovery can be
+ provided (as described in Section 4.2). This element can be an end-
+ to-end path, a segment, or a span.
+
+ An end-to-end path can be observed as a special segment case where
+ the ingress and egress Label Edge Routers (LERs) serve as the
+ recovery reference end points.
+
+ In this simple case of a point-to-point (P2P) protected entity, two
+ end points reside at the boundary of the protection domain. An LSP
+ can enter through one reference end point and exit the recovery
+ domain through another reference end point.
+
+ In the case of unidirectional point-to-multipoint (P2MP), three or
+ more end points reside at the boundary of the protection domain. One
+ of the end points is referred to as the source/root, while the others
+ are referred to as sinks/leaves. An LSP can enter the recovery
+ domain through the root point and exit the recovery domain through
+ the leaf points.
+
+ The recovery mechanism should restore traffic that was interrupted by
+ a facility (link or node) fault within the recovery domain. Note
+ that a single link may be part of several recovery domains. If two
+ recovery domains have common links, one recovery domain must be
+ contained within the other. This can be referred to as nested
+ recovery domains. The boundaries of recovery domains may coincide,
+ but recovery domains must not overlap.
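+
+ The containment rule above lends itself to a simple validation check.
+ In this non-normative sketch a recovery domain is modeled only as the
+ set of links it contains; the function name is invented for
+ illustration.
+
+    def domains_compatible(d1, d2):
+        """d1, d2: sets of links. Domains sharing any link must
+        nest; boundaries may coincide, but recovery domains must
+        not partially overlap."""
+        if not (d1 & d2):
+            return True              # disjoint recovery domains
+        return d1 <= d2 or d2 <= d1  # nested recovery domains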
+ + Note that the edges of a recovery domain are not protected, and + unless the whole domain is contained within another recovery domain, + the edges form a single point of failure. + + A recovery group is defined within a recovery domain and consists of + a working (primary) entity and one or more recovery (backup) entities + that reside between the end points of the recovery domain. To + guarantee protection in all situations, a dedicated recovery entity + should be pre-provisioned using disjoint resources in the recovery + domain, in order to protect against a failure of a working entity. + Of course, mechanisms to detect faults and to trigger protection + switching are also needed. + + The method used to monitor the health of the recovery element is + beyond the scope of this document. The end points that are + + + +Sprecher & Farrel Informational [Page 23] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + responsible for the recovery action must receive information on its + condition. The condition of the recovery element may be 'OK', + 'failed', or 'degraded'. + + When the recovery operation is to be triggered by OAM mechanisms, an + OAM Maintenance Entity Group must be defined for each of the working + and protection entities. + + The recovery entities and functions in a recovery domain can be + configured using a management plane or a control plane. A management + plane may be used to configure the recovery domain by setting the + reference points, the working and recovery entities, and the recovery + type (e.g., 1:1 bidirectional linear protection, ring protection, + etc.). Additional parameters associated with the recovery process + may also be configured. For more details, see Section 6.1. + + When a control plane is used, the ingress LERs may communicate with + the recovery reference points that request that protection or + restoration be configured across a recovery domain. For details, see + Section 6.5. + + Cases of multiple interconnections between distinct recovery domains + create a hierarchical arrangement of recovery domains, since a single + top-level recovery domain is created from the concatenation of two + recovery domains with multiple interconnections. In this case, + recovery actions may be taken both in the individual, lower-level + recovery domains to protect any LSP segment that crosses the domain, + and within the higher-level recovery domain to protect the longer LSP + segment that traverses the higher-level domain. + + The MPLS-TP recovery mechanism can be arranged to ensure coordination + between domains. In interconnected rings, for example, it may be + preferable to allow the upstream ring to perform recovery before the + downstream ring, in order to ensure that recovery takes place in the + ring in which the defect occurred. Coordination of recovery actions + is particularly important in nested domains and is discussed further + in Section 4.9. + +4.6. Protection in Different Topologies + + As described in the requirements listed in Section 3 and detailed in + [RFC5654], the selected recovery techniques may be optimized for + different network topologies if the optimized mechanisms perform + significantly better than the generic mechanisms in the same + topology. 

 These mechanisms are required (R91 of [RFC5654]) to interoperate with
 the mechanisms defined for arbitrary topologies, in order to allow



Sprecher & Farrel Informational [Page 24]

RFC 6372 MPLS-TP Survivability Framework September 2011


 end-to-end protection and to ensure that consistent protection
 techniques are used across the entire network. In this context,
 'interoperate' means that the use of one technique must not inhibit
 the use of another technique in an adjacent part of the network for
 use on the same end-to-end transport path, and must not prohibit the
 use of end-to-end protection mechanisms.

 The next sections (4.7 and 4.8) describe two different topologies and
 explain how recovery may differ markedly between those scenarios.
 They also develop the concept of a recovery domain and show how
 end-to-end survivability may be achieved through a concatenation of
 recovery domains, each providing some grade of recovery in part of
 the network.

4.7. Mesh Networks

 A mesh network is any network where there is arbitrary
 interconnectivity between nodes in the network. Mesh networks are
 usually contrasted with more specific topologies such as hub-and-
 spoke or ring (see Section 4.8), although such networks are actually
 examples of mesh networks. This section is limited to the discussion
 of protection techniques in the context of mesh networks. That is,
 it does not include optimizations for specific topologies.

 Linear protection is a protection mechanism that provides rapid and
 simple protection switching. In a mesh network, linear protection
 provides a very suitable protection mechanism because it can operate
 between any pair of points within the network. It can protect
 against a defect in a node, a span, a transport path segment, or an
 end-to-end transport path. Linear protection gives a clear
 indication of the protection status.

 Linear protection operates in the context of a protection domain. A
 protection domain is a special type of recovery domain (see Section
 4.5) associated with the protection function. A protection domain is
 composed of the following architectural elements:

 o A set of end points that reside at the boundary of the protection
 domain. In the simple case of 1:n or 1+1 P2P protection, two end
 points reside at the boundary of the protection domain. In each
 transmission direction, one of the end points is referred to as
 the source, and the other is referred to as the sink. For
 unidirectional P2MP protection, three or more end points reside at
 the boundary of the protection domain. One of the end points is
 referred to as the source/root, while the others are referred to
 as sinks/leaves.



Sprecher & Farrel Informational [Page 25]

RFC 6372 MPLS-TP Survivability Framework September 2011


 o A Protection Group consists of one or more working (primary) paths
 and one or more protection (backup) paths that run between the end
 points belonging to the protection domain. To guarantee
 protection in all scenarios, a dedicated protection path should be
 pre-provisioned to protect against a defect of a working path
 (i.e., 1:1 or 1+1 protection schemes). In addition, the working
 and the protection paths should be disjoint; i.e., the physical
 routes of the working and the protection paths should be
 physically diverse in every respect.
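
 The disjointness requirement in the second bullet can be expressed as
 a simple check over the nodes and links of the two paths. The sketch
 below is illustrative only; in practice, diversity must also be
 assessed against lower-layer shared risks, as discussed in Section
 4.9.2:

      # Illustrative disjointness check for a working/protection pair.
      # Each path is a list of node ids; the first and last nodes (the
      # protection domain end points) are necessarily shared and are
      # excluded from the comparison.
      def is_disjoint(working, protection):
          w_nodes = set(working[1:-1])
          p_nodes = set(protection[1:-1])
          w_links = {frozenset(hop) for hop in zip(working, working[1:])}
          p_links = {frozenset(hop) for hop in zip(protection,
                                                   protection[1:])}
          return not (w_nodes & p_nodes) and not (w_links & p_links)

      # Example: two paths between domain end points A and Z.
      assert is_disjoint(["A", "B", "C", "Z"], ["A", "P", "Q", "Z"])
      assert not is_disjoint(["A", "B", "C", "Z"], ["A", "P", "C", "Z"])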

 Note that if the resources of the protection path are less than those
 of the working path, the protection path may not have sufficient
 resources to protect the traffic of the working path.

 As mentioned in Section 4.3.2, the resources of the protection path
 may be shared as 1:n. In this scenario, the protection path will not
 have sufficient resources to protect all the working paths at a
 specific time.

 For bidirectional P2P paths, both unidirectional and bidirectional
 protection switching are supported. If a defect occurs when
 bidirectional protection switching is defined, the protection actions
 are performed in both directions (even if the defect is
 unidirectional). Bidirectional protection switching requires a level
 of coordination of the protection state between the end points of the
 protection domain.

 In unidirectional protection switching, the protection actions are
 only performed in the affected direction.

 Revertive and non-revertive operations are provided as options for
 the network operator.

 Linear protection supports the protection schemes described in the
 following sub-sections.

4.7.1. 1:n Linear Protection

 In the 1:1 scheme, a protection path is allocated to protect against
 a defect, failure, or degradation in a working path. As described
 above, to guarantee protection, the protection entity should support
 the full capacity and bandwidth, although it may be configured (for
 example, because of limited network resource availability) to offer a
 degraded service when compared with the working entity.

 Figure 1 presents the 1:1 protection architecture. In normal
 conditions, data traffic is transmitted over the working entity,
 while the protection entity functions in the idle state. (OAM may
 run on the



Sprecher & Farrel Informational [Page 26]

RFC 6372 MPLS-TP Survivability Framework September 2011


 protection entity to verify its state.) Normal conditions exist when
 there is no defect, failure, or degradation on the working entity,
 and no administrative configuration or request causes traffic to flow
 over the protection entity.

          |-----------------Protection Domain---------------|

                      ==============================
                     /**********Working path***********\
          +--------+ ============================== +--------+
          | Node  /|                                |\ Node  |
          |  A  {< |                                | >} B   |
          |        |                                |        |
          +--------+ ============================== +--------+
                            Protection path
                      ==============================

                  Figure 1: 1:1 Protection Architecture

 If there is a defect on the working entity or a specific
 administrative request, traffic is switched to the protection entity.

 Note that when operating with non-revertive behavior (see Section
 4.3.5), after the conditions causing the switchover have been
 cleared, the traffic continues to flow on the protection path, but
 the working and protection roles are not switched.

 In each transmission direction, the protection domain source bridges
 traffic onto the appropriate entity, while the sink selects traffic
 from the appropriate entity. The source and the sink need to
 coordinate the protection states to ensure that bridging and
 selection are performed to and from the same entity. For this
 reason, a signaling coordination protocol (either a data-plane in-
 band signaling protocol or a control-plane-based signaling protocol)
 is required.
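
 The bridging, selection, and coordination behavior described above
 can be sketched as follows. This is a deliberately minimal
 illustration -- the class and message names are invented here, and
 the real coordination protocol (see [MPLS-TP-LP]) carries
 considerably more state (priorities, timers, and administrative
 commands):

      # Minimal sketch of 1:1 protection state at one domain end point.
      WORKING, PROTECTION = "working", "protection"

      class ProtectionEndPoint:
          def __init__(self, send_to_peer):
              self.active = WORKING        # entity carrying traffic
              self.send_to_peer = send_to_peer

          def on_fault(self, entity):
              """Local fault (e.g., from OAM CC-V) on 'entity'."""
              if entity == WORKING and self.active == WORKING:
                  self._switch(PROTECTION, notify_peer=True)

          def on_peer_request(self, entity):
              """Coordination message received from the far end."""
              if entity != self.active:
                  self._switch(entity, notify_peer=False)

          def _switch(self, entity, notify_peer):
              self.active = entity         # re-point bridge/selector
              if notify_peer:
                  self.send_to_peer(entity)

 When both end points run this logic, a unidirectional fault still
 moves both directions of a bidirectional path onto the protection
 entity, which is the bidirectional switching behavior described in
 the next paragraph.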

 In bidirectional protection switching, both ends of the protection
 domain are switched to the protection entity (even when the fault is
 unidirectional). This requires a protocol to coordinate the
 protection state between the two end points of the protection domain.

 When there is no defect, the bandwidth resources of the idle entity
 may be used for traffic with lower priority. When protection
 switching is performed, the traffic with lower priority may be
 preempted by the protected traffic through tearing down the LSP with
 lower priority, reporting a fault on the LSP with lower priority, or
 by treating the traffic with lower priority as best effort and
 discarding it when there is congestion.



Sprecher & Farrel Informational [Page 27]

RFC 6372 MPLS-TP Survivability Framework September 2011


 In the general case of 1:n linear protection, one protection entity
 is allocated to protect n working entities. The protection entity
 might not have sufficient resources to protect all the working
 entities that may be affected by fault conditions at a specific time.
 In this case, in order to guarantee protection, the protection
 entity should support enough capacity and bandwidth to protect any of
 the n working entities.

 When defects or failures occur along multiple working entities, the
 entity to be protected should be prioritized. The protection states
 between the edges of the protection domain should be fully
 coordinated to ensure consistent behavior. As explained in Section
 4.3.5, revertive behavior is recommended when 1:n is supported.

4.7.2. 1+1 Linear Protection

 In the 1+1 protection scheme, a fully dedicated protection entity is
 allocated.

 As depicted in Figure 2, data traffic is copied and fed at the source
 to both the working and the protection entities. The traffic on the
 working and the protection entities is transmitted simultaneously to
 the sink of the protection domain, where selection between the
 working and protection entities is performed (based on some
 predetermined criteria).

          |---------------Protection Domain---------------|

                      ==============================
                     /**********Working path************\
          +--------+ ============================== +--------+
          | Node  /|                                |\ Node  |
          |  A  {< |                                | >} Z   |
          |       \|                                |/       |
          +--------+ ============================== +--------+
                     \**********Protection path*********/
                      ==============================

                  Figure 2: 1+1 Protection Architecture

 Note that control traffic between the edges of the protection domain
 (such as OAM or a control protocol to coordinate the protection
 state, etc.) may be transmitted on an entity that differs from the
 one used for the protected traffic. These packets should not be
 discarded by the sink.



Sprecher & Farrel Informational [Page 28]

RFC 6372 MPLS-TP Survivability Framework September 2011


 In 1+1 unidirectional protection switching, there is no need to
 coordinate the protection state between the protection controllers at
 both ends of the protection domain. In 1+1 bidirectional protection
 switching, a protocol is required to coordinate the protection state
 between the edges of the protection domain.

 In both protection schemes, traffic flows end-to-end on the working
 entity after the conditions causing the switchover have been cleared.

 Data selection may return to selecting traffic from the working
 entity if reversion is enabled, and this will require coordination of
 the protection state between the edges of the protection domain. To
 avoid frequent switching caused by intermittent defects or failures
 when the network is not stable, traffic is not selected from the
 working entity before the Wait-To-Restore (WTR) timer has expired.

4.7.3. P2MP Linear Protection

 Linear protection may be applied to protect unidirectional P2MP
 entities using the 1+1 protection architecture. The source/root
 MPLS-TP node bridges the user traffic to both the working and
 protection entities. Each sink/leaf MPLS-TP node selects the traffic
 from one entity according to some predetermined criteria. Note that
 when there is a fault condition on one of the branches of the P2MP
 path, some leaf MPLS-TP nodes may select the working entity, while
 other leaf MPLS-TP nodes may select traffic from the protection
 entity.

 In a 1:1 P2MP protection scheme, the source/root MPLS-TP node needs
 to identify the existence of a fault condition on any of the branches
 of the network. This means that the sink/leaf MPLS-TP nodes need to
 notify the source/root MPLS-TP node of any fault condition. This
 also necessitates a return path from the sinks/leaves to the
 source/root MPLS-TP node. When protection switching is triggered,
 the source/root MPLS-TP node selects the protection transport path
 for traffic transfer.

 A form of "segment recovery for P2MP LSPs" could be constructed.
 Given a P2MP LSP, one can protect any possible point of failure (link
 or node) using N backup P2MP LSPs. Each backup P2MP LSP originates
 from the upstream node with respect to a different possible failure
 point and terminates at all of the destinations downstream of the
 potential failure point. In case of a failure, traffic is redirected
 to the backup P2MP path.

 Note that such mechanisms do not yet exist, and their exact behavior
 is for further study.

 A 1:n protection scheme for P2MP transport paths is also required by
 [RFC5654]. Such a mechanism is for future study.



Sprecher & Farrel Informational [Page 29]

RFC 6372 MPLS-TP Survivability Framework September 2011


4.7.4. Triggers for the Linear Protection Switching Action

 Protection switching may be performed when:

 o A defect condition is detected on the working entity, and the
 protection entity has no fault condition, or one that is less
 severe. Proactive in-band OAM Continuity Check and Connectivity
 Verification (CC-V) monitoring of both the working and the
 protection entities may be used to enable the rapid detection of a
 fault condition. For protection switching, it is common to send a
 CC-V message every 3.33 ms. In the absence of three consecutive
 CC-V messages, a fault condition is declared. In order to monitor
 the working and the protection entities, an OAM Maintenance Entity
 Group should be defined for each entity. OAM indications
 associated with fault conditions should be provided at the edges
 of the protection domain that are responsible for the
 protection-switching operation. Input from OAM performance
 monitoring that indicates degradation in the working entity may
 also be used as a trigger for protection switching. In the case
 of degradation, switching to the protection entity is needed only
 if the protection entity can exhibit better operating conditions.

 o An indication is received from a lower-layer server that there is
 a defect in the lower layer.

 o An external operator command is received (e.g., 'Forced Switch',
 'Manual Switch'). For details, see Section 6.1.2.

 o A request to switch over is received from the far end. The far
 end may initiate this request, for example, on receipt of an
 administrative request to switch over, or when bidirectional 1:1
 protection switching is supported and a defect occurs that can
 only be detected by the far end, etc.

 As described above, the protection state should be coordinated
 between the end points of the protection domain. Control messages
 should be exchanged between the edges of the protection domain to
 coordinate the protection state of the edge nodes. Control messages
 can be delivered using an in-band, data-plane-driven control protocol
 or a control-plane-based protocol.

 For 50-ms protection switching, it is recommended that an in-band,
 data-plane-driven signaling protocol be used in order to coordinate
 the protection states. An in-band, data-plane protocol for use in
 MPLS-TP networks is documented in [MPLS-TP-LP] for linear protection
 (ring protection is discussed in Section 4.8 of this document). This
 protocol is also used to detect mismatches between the configurations
 provisioned at the ends of the protection domain.



Sprecher & Farrel Informational [Page 30]

RFC 6372 MPLS-TP Survivability Framework September 2011


 As described in Section 6.5, the GMPLS control plane already includes
 procedures and message elements to coordinate the protection states
 between the edges of the protection domain. These procedures and
 protocol messages are specified in [RFC4426], [RFC4872], and
 [RFC4873]. However, these messages lack the capability to coordinate
 the revertive/non-revertive behavior and the consistency of
 configured timers at the edges of the protection domain (timers such
 as WTR, hold-off timer, etc.).

4.7.5. Applicability of Linear Protection for LSP Segments

 In order to implement data-plane-based linear protection on LSP
 segments, use is made of the Sub-Path Maintenance Element (SPME), an
 MPLS-TP architectural element defined in [RFC5921]. Maintenance
 operations (e.g., monitoring, protection, or management) involve the
 exchange of messages (e.g., OAM, Protection Path Coordination, etc.)
 within the maintained domain. Further discussion of the architecture
 for OAM and SPME is found in [RFC5921] and [RFC6371]. An SPME is an
 LSP that is defined and used for the purposes of OAM monitoring,
 protection, or management of LSP segments. The SPME uses the MPLS
 construct of a hierarchical, nested LSP, as defined in [RFC3031].

 For linear protection, SPMEs should be defined over the working and
 protection entities between the edges of a protection domain. OAM
 messages and messages used to coordinate protection state can be
 initiated at the edge of the SPME and sent to the peer edge of the
 SPME. Note that these messages are sent over the Generic Associated
 Channel (G-ACh) within the SPME, and that they use a two-label stack:
 the SPME label and, at the bottom of the stack, the G-ACh Label
 (GAL) [RFC5586].

 The end-to-end traffic of the LSP, which includes data traffic and
 control traffic (messages for OAM, management, signaling, and to
 coordinate protection state), is tunneled within the SPMEs by means
 of label stacking, as defined in [RFC3031].
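
 The label stacks just described can be illustrated as follows. The
 GAL value (13) is the reserved label defined in [RFC5586]; the other
 label values below are invented for the example:

      # Illustrative SPME label-stack construction (top of stack
      # first).
      GAL = 13                      # G-ACh Label, reserved [RFC5586]

      def tunnel_stack(spme_label, lsp_label):
          """End-to-end LSP traffic tunneled within the SPME."""
          return [spme_label, lsp_label]

      def gach_stack(spme_label):
          """OAM/protection coordination on the SPME's G-ACh."""
          return [spme_label, GAL]  # GAL at the bottom of the stack

      print(tunnel_stack(spme_label=1001, lsp_label=16))  # [1001, 16]
      print(gach_stack(spme_label=1001))                  # [1001, 13]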

 Mapping between an LSP and an SPME can be 1:1; this is similar to the
 ITU-T Tandem Connection element that defines a sub-layer
 corresponding to a segment of a path. Mapping can also be 1:n to
 allow the scalable protection of a set of LSP segments traversing the
 part of the network in which a protection domain is defined. Note
 that each of these LSPs can be initiated or terminated at different
 end points in the network, but that they all traverse the protection
 domain and share similar constraints (such as requirements for
 quality of service (QoS), terms of protection, etc.).



Sprecher & Farrel Informational [Page 31]

RFC 6372 MPLS-TP Survivability Framework September 2011


 Note also that in the context of segment protection, the SPMEs serve
 as the working and protection entities.

4.7.6. Shared Mesh Protection

 For shared mesh protection, the protection resources are used to
 protect multiple LSPs that do not all share the same end points; for
 example, in Figure 3 there are two paths, ABCDE and VWXYZ. These
 paths do not share end points and cannot, therefore, make use of 1:n
 linear protection, even though they do not have any common points of
 failure.

 ABCDE may be protected by the path APQRE, while VWXYZ can be
 protected by the path VPQRZ. In both cases, 1:1 or 1+1 protection
 may be used. However, it can be seen that if 1:1 protection is used
 for both paths, the PQR network segment does not carry traffic when
 no failures affect either of the two working paths. Furthermore, in
 the event of only one failure, the PQR segment carries traffic from
 only one of the working paths.

 Thus, it is possible for the network resources on the PQR segment to
 be shared by the two recovery paths. In this way, mesh protection
 can substantially reduce the number of network resources that have to
 be reserved in order to provide 1:n protection.

           A----B----C----D----E
            \                 /
             \               /
              \             /
               P-----Q-----R
              /             \
             /               \
            /                 \
           V----W----X----Y----Z

          Figure 3: A Shared Mesh Protection Topology

 As the network becomes more complex and the number of LSPs increases,
 the potential for shared mesh protection also increases. However,
 this can quickly become unmanageable owing to the increased
 complexity. Therefore, shared mesh protection is normally pre-
 planned and configured by the operator, although an automated system
 cannot be ruled out.

 Note that shared mesh protection operates as 1:n linear protection
 (see Section 4.7.1). However, the protection state needs to be
 coordinated between a larger number of nodes: the end points of the
 shared concatenated protection segment (nodes P and R in the example)



Sprecher & Farrel Informational [Page 32]

RFC 6372 MPLS-TP Survivability Framework September 2011


 as well as the end points of the protected LSPs (nodes A, E, V, and Z
 in the example).

 Additionally, note that the shared-protection resources could be used
 to carry extra traffic. For example, in Figure 4, an LSP JPQRK could
 be a preemptable LSP that constitutes extra traffic over the PQR
 hops; it would be displaced in the event of a protection event. In
 this case, it should be noted that the protection state must also be
 coordinated with the ends of the extra-traffic LSPs.

           A----B----C----D----E
            \                 /
             \               /
              \             /
        J-----P-----Q-----R-----K
              /             \
             /               \
            /                 \
           V----W----X----Y----Z

        Figure 4: Shared Mesh Protection with Extra Traffic

4.8. Ring Networks

 Several service providers have expressed great interest in the
 operation of MPLS-TP in ring topologies; they demand a high degree of
 survivability functionality in these topologies.

 Various criteria for optimization are considered in ring topologies,
 such as:

 1. Simplification in ring operation in terms of the number of OAM
 Maintenance Entities that are needed to trigger the recovery
 actions, the number of recovery elements, the number of
 management-plane transactions during maintenance operations, etc.

 2. Optimization of resource consumption around the ring, such as the
 number of labels needed for the protection paths that traverse
 the network, the total bandwidth required in the ring to ensure
 path protection, etc. (see R91 of [RFC5654]).

 [RFC5654] introduces a list of requirements for ring protection
 covering the recovery mechanisms needed to protect traffic in a
 single ring as well as traffic that traverses more than one ring.
 Note that the configuration and operation of the recovery mechanisms
 in a ring must scale well with the number of transport paths, the
 number of nodes, and the number of ring interconnects.



Sprecher & Farrel Informational [Page 33]

RFC 6372 MPLS-TP Survivability Framework September 2011


 The requirements for ring protection are fully compatible with the
 generic requirements for recovery.

 The architecture and the mechanisms for ring protection are specified
 in separate documents. These mechanisms need to be evaluated against
 the requirements specified in [RFC5654], which includes guidance on
 the principles for the development of new mechanisms.

4.9. Recovery in Layered Networks

 In multi-layer or multi-regional networking [RFC5212], recovery may
 be performed at multiple layers or across nested recovery domains.

 The MPLS-TP recovery mechanism must ensure that the timing of
 recovery is coordinated in order to avoid race scenarios. This also
 allows the recovery mechanism of the server layer to fix the problem
 before recovery takes place in the MPLS-TP layer, or the MPLS-TP
 layer to perform recovery before a client network does.

 A hold-off timer is required to coordinate recovery timing in
 multiple layers or across nested recovery domains. Setting this
 configurable timer involves a trade-off between rapid recovery and
 the creation of a race condition where multiple layers respond to the
 same fault, potentially allocating resources in an inefficient
 manner. Thus, the detection of a defect condition in the MPLS-TP
 layer should not immediately trigger the recovery process if the
 hold-off timer is configured with a value other than zero. Instead,
 the hold-off timer should be started when the defect is detected and,
 on expiry, the recovery element should be checked to determine
 whether the defect condition still exists. If it does exist, the
 defect triggers the recovery operation.

 The hold-off timer should be configurable.

 In other configurations, where the lower layer does not have a
 restoration capability, or where it is not expected to provide
 protection, the lower layer needs to trigger the higher layer to
 immediately perform recovery. Although this can be forced by
 configuring the hold-off timer to zero, it may be that because of
 layer independence, the higher layer does not know whether the lower
 layer will perform restoration.
In this case, the higher layer will + configure a non-zero hold-off timer and rely on the receipt of a + specific notification from the lower layer if the lower layer cannot + perform restoration. Since layer boundaries are always within nodes, + such coordination is implementation-specific and does not need to be + covered here. + + + + + +Sprecher & Farrel Informational [Page 34] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + Reference should be made to [RFC3386], which discusses the + interaction between layers in survivable networks. + +4.9.1. Inherited Link-Level Protection + + Where a link in the MPLS-TP network is formed through connectivity + (i.e., a packet or non-packet LSP) in a lower-layer network, that + connectivity may itself be protected; for example, the LSP in the + lower-layer network may be provisioned with 1+1 protection. In this + case, the link in the MPLS-TP network has an inherited grade of + protection. + + An LSP in the MPLS-TP network may be provisioned with protection in + the MPLS-TP network, as already described, or it may be provisioned + to utilize only those links that have inherited protection. + + By classifying the links in the MPLS-TP network according to the + grade of protection that they inherited from the server network, it + is possible to compute an end-to-end path in the MPLS-TP network that + uses only those links with a specific or superior grade of inherited + protection. This means that the end-to-end MPLS-TP LSP can be + protected at the grade necessary to conform to the SLA without + needing to provide any additional protection in the MPLS-TP layer. + This reduces complexity, saves network resources, and eliminates + protection-switching coordination problems. + + When the requisite grade of inherited protection is not available on + all segments along the path in the MPLS-TP network, segment + protection may be used to achieve the desired protection grade. + + It should be noted, however, that inherited protection only applies + to links. Nodes cannot be protected in this way. An operator will + need to perform an analysis of the relative likelihood and + consequences of node failure if this approach is taken without + providing protection in the MPLS-TP LSP or PW layer to handle node + failure. + +4.9.2. Shared Risk Groups + + When an MPLS-TP protection scheme is established, it is important + that the working and protection paths do not share resources in the + network. If this is not achieved, a single defect may affect both + the working and the protection paths with the result that traffic + cannot be delivered -- since under such a condition the traffic was + not protected. + + + + + + +Sprecher & Farrel Informational [Page 35] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + Note that this restriction does not apply to restoration, since this + takes place after the fault has occurred, which means that the point + of failure can be avoided if an available path exists. + + When planning a recovery scheme, it is possible to use a topology map + of the MPLS-TP layer to select paths that use diverse links and nodes + within the MPLS-TP network. However, this does not guarantee that + the paths are truly diverse; for example, two separate links in an + MPLS-TP network may be provided by two lambdas in the same optical + fiber, or by two fibers that cross the same bridge. Moreover, two + completely separate MPLS-TP nodes might be situated in the same + building with a shared power supply. 
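
 Dependencies of this kind can be captured as shared risk groups and
 consulted when diversity is evaluated. In the sketch below (resource
 and risk-group names are purely illustrative), two paths are
 risk-diverse only if no lower-layer group contributes to both:

      # Illustrative shared-risk check for two MPLS-TP paths.
      def shared_risks(path, srg_map):
          """path: MPLS-TP resource ids; srg_map: id -> set of SRGs."""
          risks = set()
          for resource in path:
              risks |= srg_map.get(resource, set())
          return risks

      srg_map = {
          "link1": {"fiber7"},       # lambda carried in fiber 7
          "link2": {"fiber7"},       # second lambda, same fiber
          "nodeX": {"siteA-power"},
          "nodeY": {"siteA-power"},  # same building, same power
      }
      working = ["link1", "nodeX"]
      protection = ["link2", "nodeY"]
      # Non-empty result: the paths are not risk-diverse.
      print(shared_risks(working, srg_map)
            & shared_risks(protection, srg_map))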

 Thus, in order to achieve proper recovery planning, the MPLS-TP
 network must have an understanding of the groups of lower-layer
 resources that share a common risk of failure. From this, MPLS-TP
 shared risk groups can be constructed that show which MPLS-TP
 resources share a common risk of failure. Diversity of working and
 protection paths can be planned, not only with regard to nodes and
 links but also in order to refrain from using resources from the same
 shared risk groups.

4.9.3. Fault Correlation

 In a layered network, a low-layer fault may be detected and reported
 by multiple layers and may sometimes lead to the generation of
 multiple fault reports from the same layer. For example, a failure
 of a data link may be reported by the line cards in an MPLS-TP node,
 but it could also be detected and reported by the MPLS-TP OAM.

 Section 4.9 explains how it is important to coordinate the
 survivability actions configured and operated in a multi-layer
 network in a way that will avoid over-equipping the survivability
 resources in the network, while ensuring that recovery actions are
 performed in only one layer at a time.

 Fault correlation is about understanding which single event has
 generated a set of fault reports, so that recovery actions can be
 coordinated, and so that the fault logging system does not become
 overloaded. Fault correlation depends on understanding resource use
 at lower layers, shared risk groups, and a wider view with regard to
 the way in which the layers are interrelated.

 Fault correlation is most easily performed at the point of fault
 detection; for example, an MPLS-TP node that receives a fault
 notification from the lower layer, and detects a fault on an LSP in
 the MPLS-TP layer, can easily correlate these two events.
 Furthermore, if the same node detects multiple faults on LSPs that



Sprecher & Farrel Informational [Page 36]

RFC 6372 MPLS-TP Survivability Framework September 2011


 share the same faulty data link, it can easily correlate them. Such
 a node may use correlation to perform group-based recovery actions
 and can reduce the number of alarm events that it generates to its
 management station.

 Fault correlation may also be performed at a management station that
 receives fault reports from different layers and different nodes in
 the network. This enables the management station to coordinate
 management-originated recovery actions and to present consolidated
 fault information to the user and automated management systems.

 It is also necessary to correlate fault information detected and
 reported through OAM. This function would enable a fault detected at
 a lower layer, and reported at a transit node of an MPLS-TP LSP, to
 be correlated with an MPLS-TP-layer fault detected at a Maintenance
 End Point (MEP) -- for example, the egress of the MPLS-TP LSP. Such
 correlation allows the coordination of recovery actions performed at
 the MEP, but it also requires that the lower-layer fault information
 be propagated to the MEP, which is most easily achieved using a
 control plane, management plane, or OAM message.

5. Applicability and Scope of Survivability in MPLS-TP

 The MPLS-TP network can be viewed as two layers (the MPLS LSP layer
 and the PW layer). The MPLS-TP network operates over data-link
 connections and data-link networks in which the MPLS-TP links are
 provided by individual data links or by connections in a lower-layer
 network.
 The MPLS LSP layer is a mandatory part of the MPLS-TP
 network, while the PW layer is an optional addition for supporting
 specific services.

 MPLS-TP survivability provides recovery from failure of the links and
 nodes in the MPLS-TP network. The link defects and failures are
 typically caused by defects or failures in the underlying data-link
 connections and networks, but this section is only concerned with
 recovery actions performed in the MPLS-TP network, which must recover
 from the manifestation of any problem as a defect or failure in the
 MPLS-TP network.

 This section lists the recovery elements (see Section 1) supported in
 each of the two layers that can recover from defects or failures of
 nodes or links in the MPLS-TP network.



Sprecher & Farrel Informational [Page 37]

RFC 6372 MPLS-TP Survivability Framework September 2011


   +--------------+---------------------+------------------------------+
   | Recovery     | MPLS LSP Layer      | PW Layer                     |
   | Element      |                     |                              |
   +--------------+---------------------+------------------------------+
   | Link         | MPLS LSP recovery   | The PW layer is not aware of |
   | Recovery     | can be used to      | the underlying network.      |
   |              | survive the failure | This function is not         |
   |              | of an MPLS-TP link. | supported.                   |
   +--------------+---------------------+------------------------------+
   | Segment/Span | An individual LSP   | For an SS-PW, segment        |
   | Recovery     | segment can be      | recovery is the same as      |
   |              | recovered to        | end-to-end recovery.         |
   |              | survive the failure | Segment recovery for an MS-PW|
   |              | of an MPLS-TP link. | is for future study, and     |
   |              |                     | this function is now         |
   |              |                     | provided using end-to-end    |
   |              |                     | recovery.                    |
   +--------------+---------------------+------------------------------+
   | Concatenated | A concatenated LSP  | Concatenated segment         |
   | Segment      | segment can be      | recovery (in an MS-PW) is for|
   | Recovery     | recovered to        | future study, and this       |
   |              | survive the failure | function is now provided     |
   |              | of an MPLS-TP link  | using end-to-end recovery.   |
   |              | or node.            |                              |
   +--------------+---------------------+------------------------------+
   | End-to-End   | An end-to-end LSP   | End-to-end PW recovery can   |
   | Recovery     | can be recovered to | be applied to survive any    |
   |              | survive any node or | node (including S-PE) or     |
   |              | link failure,       | link failure, except for     |
   |              | except for the      | failure of the ingress or    |
   |              | failure of the      | egress T-PE.                 |
   |              | ingress or egress   |                              |
   |              | node.               |                              |
   +--------------+---------------------+------------------------------+
   | Service      | The MPLS LSP layer  | PW-layer service recovery    |
   | Recovery     | is service-         | requires surviving faults in |
   |              | agnostic. This      | T-PEs or on Attachment       |
   |              | function is not     | Circuits (ACs). This is      |
   |              | supported.          | currently out of scope for   |
   |              |                     | MPLS-TP.                     |
   +--------------+---------------------+------------------------------+

              Table 1: Recovery Elements Supported
              by the MPLS LSP Layer and PW Layer

 Section 6 provides a description of mechanisms for MPLS-TP-LSP
 survivability. Section 7 provides a brief overview of mechanisms for
 MPLS-TP-PW survivability.



Sprecher & Farrel Informational [Page 38]

RFC 6372 MPLS-TP Survivability Framework September 2011


6. Mechanisms for Providing Survivability for MPLS-TP LSPs

 This section describes the existing mechanisms that provide LSP
 protection within MPLS-TP networks and highlights areas where new
 work is required.

6.1. Management Plane

 As described above, a fundamental requirement of MPLS-TP is that
 recovery mechanisms should be capable of functioning in the absence
 of a control plane. Recovery may be triggered by MPLS-TP OAM fault
 management functions or by external requests (e.g., an operator's
 request for manual control of protection switching). Recovery LSPs
 (and in particular Restoration LSPs) may be provisioned through the
 management plane.

 The management plane may be used to configure the recovery domain by
 setting the reference end points (which control the recovery
 actions), the working and the recovery entities, and the recovery
 type (e.g., 1:1 bidirectional linear protection, ring protection,
 etc.).

 Additional parameters associated with the recovery process (such as
 WTR and hold-off timers, revertive/non-revertive operation, etc.) may
 also be configured.

 In addition, the management plane may initiate manual control of the
 recovery function. A priority should be set for the fault conditions
 and the operator's requests.

 Since provisioning the recovery domain involves the selection of a
 number of options, mismatches may occur at the different reference
 points. The MPLS-TP protocol to coordinate protection state, which
 is specified in [MPLS-TP-LP], may be used as an in-band (i.e., data-
 plane-based) control protocol to coordinate the protection states
 between the end points of the recovery domain, and to check the
 consistency of configured parameters (such as timers, revertive/non-
 revertive behavior, etc.); discovered inconsistencies are reported
 to the operator.

 It should also be possible for the management plane to track the
 recovery status by receiving reports or by issuing polls.



Sprecher & Farrel Informational [Page 39]

RFC 6372 MPLS-TP Survivability Framework September 2011


6.1.1. Configuration of Protection Operation

 To implement the protection-switching mechanisms, the following
 entities and information should be configured and provisioned:

 o The end points of a recovery domain. As described above, these
 end points border on the element of recovery to which recovery is
 applied.

 o The protection group, which, depending on the required protection
 scheme, consists of a recovery entity and one or more working
 entities. In 1:1 or 1+1 P2P protection, the paths of the working
 entity and the recovery entities must be physically diverse in
 every respect (i.e., not share any resources or physical
 locations), in order to guarantee protection.

 o As defined in Section 4.7.5, the SPME must be supported in order
 to implement data-plane-based LSP segment recovery, since related
 control messages (e.g., for OAM, Protection Path Coordination,
 etc.) can be initiated and terminated at the edges of a path where
 push and pop operations are enabled. The SPME is an end-to-end
 LSP that in this context corresponds to the recovery entities
 (working and protection) and makes use of the MPLS construct of
 hierarchical nested LSP, as defined in [RFC3031]. OAM messages
 and messages to coordinate protection state can be initiated at
 the edge of the SPME and sent over G-ACh to the peer edge of the
 SPME.
 It is necessary to configure the related SPMEs and map
 between the LSP segments being protected and the SPME. Mapping
 can be 1:1 or 1:N to allow scalable protection of a set of LSP
 segments traversing the part of the network in which a protection
 domain is defined.

 Note that each of these LSPs can be initiated or terminated at
 different end points in the network, but that they all traverse
 the protection domain and share similar constraints (such as
 requirements for QoS, terms of protection, etc.).

 o The protection type that should be defined (e.g., unidirectional
 1:1, bidirectional 1+1, etc.)

 o Revertive/non-revertive behavior should be configured.

 o Timers (such as WTR, hold-off timer, etc.) should be set.



Sprecher & Farrel Informational [Page 40]

RFC 6372 MPLS-TP Survivability Framework September 2011


6.1.2. External Manual Commands

 The following external, manual commands may be provided for manual
 control of the protection-switching operation. These commands apply
 to a protection group; they are listed in descending order of
 priority:

 o Blocked protection action - a manual command to prevent data
 traffic from switching to the recovery entity. This command
 actually disables the protection group.

 o Force protection action - a manual command that forces a switch of
 normal data traffic to the recovery entity.

 o Manual protection action - a manual command that forces a switch of
 data traffic to the recovery entity only when there is no defect
 in the recovery entity.

 o Clear switching command - the operator may request that a previous
 administrative switch command (manual or force switch) be cleared.

6.2. Fault Detection

 Fault detection is a fundamental part of recovery and survivability.
 In all schemes, with the exception of some types of 1+1 protection,
 the actions required for the recovery of traffic delivery depend on
 the discovery of some kind of fault. In 1+1 protection, the selector
 (at the receiving end) may simply be configured to choose the better
 signal; thus, it does not itself detect a fault or degradation, but
 simply identifies the path that is better for data delivery.

 Faults may be detected in a number of ways depending on the traffic
 pattern and the underlying hardware. End-to-end faults may be
 reported by the application or by knowledge of the application's data
 pattern, but this is an unusual approach. There are two more common
 mechanisms for detecting faults in the MPLS-TP layer:

 o Faults reported by the lower layers.

 o Faults detected by protocols within the MPLS-TP layer.

 In an IP/MPLS network, the second mechanism may utilize control-plane
 protocols (such as the routing protocols) to detect a failure of
 adjacency between neighboring nodes. In an MPLS-TP network, it is
 possible that no control plane will be present. Even if a control
 plane is present, it will be a GMPLS control plane [RFC3945], which
 logically separates control channels from data channels; thus, no
 conclusion about the health of a data channel can be drawn from the



Sprecher & Farrel Informational [Page 41]

RFC 6372 MPLS-TP Survivability Framework September 2011


 failure of an associated control channel. MPLS-TP-layer faults are,
 therefore, only detected through the use of OAM protocols, as
 described in Section 6.4.1.

 Faults may, however, be reported by a lower layer.
These generally + show up as interface failures or data-link failures (sometimes known + as connectivity failures) within the MPLS-TP network, for example, an + underlying optical link may detect loss of light and report a failure + of the MPLS-TP link that uses it. Alternatively, an interface card + failure may be reported to the MPLS-TP layer. + + Faults reported by lower layers are only visible in specific nodes + within the MPLS-TP network (i.e., at the adjacent end points of the + MPLS-TP link). This would only allow recovery to be performed + locally, so, to enable recovery to be performed by nodes that are not + immediately local to the fault, the fault must be reported (Sections + 6.4.3 and 6.5.4). + +6.3. Fault Localization + + If an MPLS-TP node detects that there is a fault in an LSP (that is, + not a network fault reported from a lower layer, but a fault detected + by examining the LSP), it can immediately perform a recovery action. + However, unless the location of the fault is known, the only + practical options are: + + o Perform end-to-end recovery. + + o Perform some other recovery as a speculative act. + + Since the speculative acts are not guaranteed to achieve the desired + results and could consume resources unnecessarily, and since end-to- + end recovery can require a lot of network resources, it is important + to be able to localize the fault. + + Fault localization may be achieved by dividing the network into + protection domains. End-to-end protection is thereby operated on LSP + segments, depending on the domain in which the fault is discovered. + This necessitates monitoring of the LSP at the domain edges. + + Alternatively, a proactive mechanism of fault localization through + OAM (Section 6.4.3) or through the control plane (Section 6.5.3) is + required. + + Fault localization is particularly important for restoration because + a new path must be selected that avoids the fault. It may not be + practical or desirable to select a path that avoids the entire failed + + + + +Sprecher & Farrel Informational [Page 42] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + working path, and it is therefore necessary to isolate the fault's + location. + +6.4. OAM Signaling + + MPLS-TP provides a comprehensive set of OAM tools for fault + management and performance monitoring at different nested levels + (end-to-end, a portion of a path (LSP or PW), and at the link level) + [RFC6371]. + + These tools support proactive and on-demand fault management (for + fault detection and fault localization) as well as performance + monitoring (to measure the quality of the signals and detect + degradation). + + To support fast recovery, it is useful to use some of the proactive + tools to detect fault conditions (e.g., link/node failure or + degradation) and to trigger the recovery action. + + The MPLS-TP OAM messages run in-band with the traffic and support + unidirectional and bidirectional P2P paths as well as P2MP paths. + + As described in [RFC6371], MPLS-TP OAM operates in the context of a + Maintenance Entity that borders on the OAM responsibilities and + represents the portion of a path between two points that is monitored + and maintained, and along which OAM messages are exchanged. + [RFC6371] refers also to a Maintenance Entity Group (MEG), which is a + collection of one or more Maintenance Entities (MEs) that belong to + the same transport path (e.g., P2MP transport path) and which are + maintained and monitored as a group. 
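
 As a data-structure sketch of these constructs (the MEP and MIP roles
 are elaborated in the paragraphs that follow), one might model an ME
 and a MEG as shown below; the field names are informal, and [RFC6371]
 remains the authoritative definition:

      # Informal model of the OAM maintenance constructs.
      from dataclasses import dataclass, field
      from typing import List

      @dataclass
      class MaintenanceEntity:       # monitored portion of a path
          mep_a: str                 # MEP at one boundary
          mep_b: str                 # MEP at the other boundary
          mips: List[str] = field(default_factory=list)  # 0+ MIPs

      @dataclass
      class MaintenanceEntityGroup:  # MEs of one transport path,
          transport_path: str        # maintained as a group
          entities: List[MaintenanceEntity] = field(default_factory=list)

      meg = MaintenanceEntityGroup(
          transport_path="working LSP",
          entities=[MaintenanceEntity("LER-A", "LER-Z",
                                      mips=["LSR-B", "LSR-C"])])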

 An ME includes two MEPs (Maintenance Entity Group End Points) that
 reside at the boundaries of an ME, and a set of zero or more MIPs
 (Maintenance Entity Group Intermediate Points) that reside within the
 Maintenance Entity along the path. A MEP is capable of initiating
 and terminating OAM messages, and as such can only be located at the
 edges of a path where push and pop operations are supported. In
 order to define an ME over a portion of path, it is necessary to
 support SPMEs.

 The SPME is an end-to-end LSP that in this context corresponds to the
 ME; it uses the MPLS construct of hierarchical nested LSPs, which is
 defined in [RFC3031]. OAM messages can be initiated at the edge of
 the SPME and sent over G-ACh to the peer edge of the SPME.

 The related SPMEs must be configured, and mapping must be performed
 between the LSP segments being monitored and the SPME. Mapping can
 be 1:1 or 1:N to allow scalable operation. Note that each of these



Sprecher & Farrel Informational [Page 43]

RFC 6372 MPLS-TP Survivability Framework September 2011


 LSPs can be initiated or terminated at different end points in the
 network and can share similar constraints (such as requirements for
 QoS, terms of protection, etc.).

 With regard to recovery, where MPLS-TP OAM is supported, an OAM
 Maintenance Entity Group is defined for each of the working and
 protection entities.

6.4.1. Fault Detection

 MPLS-TP OAM tools may be used proactively to detect the following
 fault conditions between MEPs:

 o Loss of continuity and misconnectivity - the proactive Continuity
 Check (CC) function is used to detect loss of continuity between
 two MEPs in an MEG. The proactive Connectivity Verification (CV)
 allows a sink MEP to detect a misconnectivity defect (e.g.,
 mismerge or misconnection) with its peer source MEP when the
 received packet carries an incorrect ME identifier. For
 protection switching, it is common to run a CC-V (Continuity Check
 and Connectivity Verification) message every 3.33 ms. In the
 absence of three consecutive CC-V messages, loss of continuity is
 declared and is notified locally to the edge of the recovery
 domain in order to trigger a recovery action. In some cases, when
 a slower recovery time is acceptable, it is also possible to
 lengthen the transmission interval (i.e., to reduce the CC-V
 rate).

 o Signal degradation - notification from OAM performance monitoring
 indicating degradation in the working entity may also be used as a
 trigger for protection switching. In the event of degradation,
 switching to the recovery entity is necessary only if the recovery
 entity can guarantee better conditions. Degradation can be
 measured by proactively activating MPLS-TP OAM packet loss
 measurement or delay measurement.

 o A MEP can receive a Remote Defect Indication from its peer sink
 MEP and locally notify the end point of the recovery domain
 regarding the fault condition, in order to trigger the recovery
 action.

6.4.2. Testing for Faults

 The management plane may be used to initiate the testing of links,
 LSP segments, or entire LSPs.

 MPLS-TP provides OAM tools that may be manually invoked on-demand for
 a limited period, in order to troubleshoot links, LSP segments, or
 entire LSPs (e.g., diagnostics, connectivity verification, packet



Sprecher & Farrel Informational [Page 44]

RFC 6372 MPLS-TP Survivability Framework September 2011


 loss measurements, etc.).
 On-demand monitoring covers a combination
 of "in-service" and "out-of-service" monitoring functions. Out-of-
 service testing is supported by the OAM on-demand lock operation.
 The lock operation temporarily disables the transport entity (LSP,
 LSP segment, or link), preventing the transmission of all types of
 traffic, with the exception of test traffic and OAM dedicated to the
 locked entity.

 [RFC6371] describes the operations of the OAM functions that may be
 initiated on-demand and provides some considerations.

 MPLS-TP also supports in-service and out-of-service testing of the
 recovery (protection and restoration) mechanism, the integrity of the
 protection/recovery transport paths, and the coordination protocol
 between the end points of the recovery domain. The testing operation
 emulates a protection-switching request but does not perform the
 actual switching action.

6.4.3. Fault Localization

 MPLS-TP provides OAM tools to localize a fault, i.e., to determine
 its precise location. Fault detection often only takes place at key
 points in the network (such as at LSP end points or at MEPs). This
 means that a fault may be located anywhere within a segment of the
 relevant LSP. Finer information granularity is needed to implement
 optimal recovery actions or to diagnose the fault. On-demand tools
 like trace-route, loopback, and on-demand CC-V can be used to
 localize a fault.

 The information may be notified locally to the end point of the
 recovery domain to allow implementation of optimal recovery action.
 This may be useful for the re-calculation of a recovery path.

 The information should also be reported to network management for
 diagnostic purposes.

6.4.4. Fault Reporting

 The end points of a recovery domain should be able to detect fault
 conditions in the recovery domain and to notify the management plane.

 In addition, a node within a recovery domain that detects a fault
 condition should also be able to report this to network management.
 Network management should be capable of correlating the fault reports
 and identifying the source of the fault.

 MPLS-TP OAM tools support a function where an intermediate node along
 a path is able to send an alarm report message to the MEP, indicating



Sprecher & Farrel Informational [Page 45]

RFC 6372 MPLS-TP Survivability Framework September 2011


 the presence of a fault condition in the server layer that connects
 it to its adjacent node. This capability allows a MEP to suppress
 alarms that may be generated as a result of a failure condition in
 the server layer.

6.4.5. Coordination of Recovery Actions

 As described above, in some cases (such as in bidirectional
 protection switching, etc.) it is necessary to coordinate the
 protection states between the edges of the recovery domain.
 [MPLS-TP-LP] defines procedures, protocol messages, and elements for
 this purpose.

 The protocol is also used to signal administrative requests (e.g.,
 manual switch, etc.), but only when these are provisioned at the edge
 of the recovery domain.

 The protocol also enables mismatches to be detected between the
 configurations at the ends of the protection domain (such as timers,
 revertive/non-revertive behavior); these mismatches can subsequently
 be reported to the management plane.

 In the absence of suitable coordination (owing to failures in the
 delivery or processing of the coordination protocol messages),
 protection switching will fail.
This means that the operation of the + protocol that coordinates the protection state is a fundamental part + of protection switching. + +6.5. Control Plane + + The GMPLS control plane has been proposed as the control plane for + MPLS-TP [RFC5317]. Since GMPLS was designed for use in transport + networks, and since it has been implemented and deployed in many + networks, it is not surprising that it contains many features that + support a high degree of survivability. + + The signaling elements of the GMPLS control plane utilize extensions + to the Resource Reservation Protocol (RSVP) (as described in a series + of documents commencing with [RFC3471] and [RFC3473]), although it is + based on [RFC3209] and [RFC2205]. The architecture for GMPLS is + provided in [RFC3945], while [RFC4426] gives a functional description + of the protocol extensions needed to support GMPLS-based recovery + (i.e., protection and restoration). + + A further control-plane protocol called the Link Management Protocol + (LMP) [RFC4204] is part of the GMPLS protocol family and can be used + to coordinate fault localization and reporting. + + + + +Sprecher & Farrel Informational [Page 46] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + Clearly, the control-plane techniques described here only apply where + an MPLS-TP control plane is deployed and operated. All mandatory + MPLS-TP survivability features must be enabled, even in the absence + of the control plane. However, when present, the control plane may + be used to provide alternative mechanisms that may be desirable, + since they offer simple automation or a richer feature set. + +6.5.1. Fault Detection + + The control plane is unable to detect data-plane faults. However, it + does provide mechanisms that detect control-plane faults, and these + can be used to recognize data-plane faults when it is evident that + the control and data planes are fate-sharing. Although [RFC5654] + specifies that MPLS-TP must support an out-of-band control channel, + it does not insist that it be used exclusively. This means that + there may be deployments where an in-band (or at least an in-fiber) + control channel is used. In this scenario, failure of the control + channel can be used to infer that there is a failure of the data + channel, or, at least, it can be used to trigger an investigation of + the health of the data channel. + + Both RSVP and LMP provide a control channel "keep-alive" mechanism + (called the Hello message in both cases). Failure to receive a + message in the configured/negotiated time period indicates a control- + plane failure. GMPLS routing protocols ([RFC4203] and [RFC5307]) + also include keep-alive mechanisms designed to detect routing + adjacency failures. Although these keep-alive mechanisms tend to + operate at a relatively low frequency (on the order of seconds), it + is still possible that the first indication of a control-plane fault + will be received through the routing protocol. + + Note, however, that care must be taken to ascertain that a specific + failure is not caused by a problem in the control-plane software or + in a processor component at the far end of a link. + + Because of the various issues involved, it is not recommended that + the control plane be used as the primary mechanism for fault + detection in an MPLS-TP network. + +6.5.2. Testing for Faults + + The control plane may be used to initiate and coordinate the testing + of links, LSP segments, or entire LSPs. 
This is important in some + technologies where it is necessary to halt data transmission while + testing, but it may also be useful where testing needs to be + specifically enabled or configured. + + + + + +Sprecher & Farrel Informational [Page 47] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + LMP provides a control-plane mechanism to test the continuity and + connectivity (and naming) of individual links. A single management + operation is required to initiate the test at one end of the link, + while the LMP handles the coordination with the other end of the + link. The test mechanism for an MPLS packet link relies on the LMP + Test message inserted into the data stream at one end of the link and + extracted at the other end of the link. This mechanism need not + disrupt data flowing over the link. + + Note that a link in the LMP may, in fact, be an LSP tunnel used to + form a link in the MPLS-TP network. + + GMPLS signaling (RSVP) offers two mechanisms that may also assist + with fault testing. The first mechanism [RFC3473] defines the + Admin_Status object that allows an LSP to be set into "testing mode". + The interpretation of this mode is implementation-specific and could + be documented more precisely for MPLS-TP. The mode sets the whole + LSP into a state where it can be tested; this need not be disruptive + to data traffic. + + The second mechanism provided by GMPLS to support testing is + described in [GMPLS-OAM]. This protocol extension supports the + configuration (including enabling and disabling) of OAM mechanisms + for a specific LSP. + +6.5.3. Fault Localization + + Fault localization is the process whereby the exact location of a + fault is determined. Fault detection often only takes place at key + points in the network (such as at LSP end points or at MEPs). This + means that a fault may be located anywhere within a segment of the + relevant LSP. + + If segment or end-to-end protection is in use, this level of + information is often sufficient to repair the LSP. However, if finer + information granularity is required (either to implement optimal + recovery actions or to diagnose a fault), it is necessary to localize + the specific fault. + + LMP provides a cascaded test-and-propagate mechanism that is designed + specifically for this purpose. + +6.5.4. Fault Status Reporting + + GMPLS signaling uses the Notify message to report fault status + [RFC3473]. The Notify message can apply to a single LSP or can carry + fault information for a set of LSPs, in order to improve the + scalability of fault notification. + + + +Sprecher & Farrel Informational [Page 48] + +RFC 6372 MPLS-TP Survivability Framework September 2011 + + + Since the Notify message is targeted at a specific node, it can be + delivered rapidly without requiring hop-by-hop processing. It can be + targeted at LSP end points or at segment end points (such as MEPs). + The target points for Notify messages can be manually configured + within the network, or they may be signaled when the LSP is set up. + + This enables the process to be made consistent with segment + protection as well as with the concept of Maintenance Entities. + + GMPLS signaling also provides a slower, hop-by-hop mechanism for + reporting individual LSP faults on a hop-by-hop basis using PathErr + and ResvErr messages. + + [RFC4783] provides a mechanism to coordinate alarms and other event + or fault information through GMPLS signaling. 
   GMPLS routing protocols [RFC4203] and [RFC5307] are used to
   advertise link availability and capabilities within a GMPLS-enabled
   network.  Thus, the routing protocols can also provide indirect
   information about network faults; that is, the protocol may stop
   advertising or may withdraw the advertisement for a failed link, or
   it may advertise that the link is about to be shut down gracefully
   [RFC5817].  This mechanism is, however, not normally considered to
   be fast enough for use as a trigger for protection switching.

6.5.5.  Coordination of Recovery Actions

   Fault coordination is an important feature for certain protection
   mechanisms (such as bidirectional 1:1 protection).  The use of the
   GMPLS Notify message for this purpose is described in [RFC4426];
   however, specific message field values have not yet been defined
   for this operation.

   Further work is needed in GMPLS for control and configuration of
   reversion behavior for end-to-end and segment protection, and for
   the coordination of timer values.

6.5.6.  Establishment of Protection and Restoration LSPs

   Protection and restoration LSPs may be set up by the management
   plane but, where a control plane is present, it may also be used to
   establish them.




Sprecher & Farrel             Informational                    [Page 49]

RFC 6372            MPLS-TP Survivability Framework       September 2011


   Several protocol extensions exist that simplify this process:

   o  [RFC4872] provides features that support end-to-end protection
      switching.

   o  [RFC4873] describes the establishment of a single, segment-
      protected LSP.  Note that end-to-end protection is a special
      case of segment protection, and [RFC4873] can also be used to
      provide end-to-end protection.

   o  [RFC4874] allows an LSP to be signaled with a request that its
      path exclude specified resources such as links, nodes, and
      shared risk link groups (SRLGs).  This allows a disjoint
      protection path to be requested or a recovery path to be set up
      to avoid failed resources.

   o  Lastly, it should be noted that [RFC5298] provides an overview
      of the GMPLS techniques available to achieve protection in
      multi-domain environments.

7.  Pseudowire Recovery Considerations

   Pseudowires provide end-to-end connectivity over the MPLS-TP
   network and may comprise a single pseudowire segment, or multiple
   segments "stitched" together.

   The pseudowire may itself require protection in order to meet the
   service-level guarantees of its SLA.  This protection could be
   provided by the MPLS-TP LSPs that support the pseudowire, or could
   be a feature of the pseudowire layer itself.

   As indicated above, the functional architecture described in this
   document applies to both LSPs and pseudowires.  However, the
   recovery mechanisms for pseudowires are for further study and will
   be defined in a separate document by the PWE3 working group.

7.1.  Utilization of Underlying MPLS-TP Recovery

   MPLS-TP PWs are carried across the network inside MPLS-TP LSPs.
   Therefore, an obvious way to provide protection for a PW is to
   protect the LSP that carries it.  Such protection can take any of
   the forms described in this document.  The choice of recovery
   scheme will depend on the required speed of recovery and the
   traffic loss that is acceptable under the SLA that the PW supports.
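   As a rough illustration of how those SLA parameters might drive the
   choice of scheme, the following Python fragment maps a recovery-
   time bound and a preplanning preference to one of the recovery
   types described earlier in this document.  The threshold and the
   mapping are invented for the example and are not requirements of
   this framework.

      # Illustrative sketch only: selecting a recovery scheme for the
      # LSP that carries a PW, based on the SLA of the emulated
      # service.  The 50 ms figure is a common transport-network
      # target, used here purely as an example.

      def select_recovery_scheme(max_outage_ms: int,
                                 preplanned: bool) -> str:
          if max_outage_ms <= 50:
              # Very tight bounds generally call for dedicated,
              # preplanned protection with no signaling at recovery
              # time.
              return "1+1 protection"
          if preplanned:
              # Preplanned but shared resources trade some speed for
              # better resource efficiency.
              return "1:n or shared mesh protection"
          # Otherwise, a new path can be computed and signaled after
          # the fault is detected.
          return "restoration"

      # A PW supporting a service that tolerates at most 50 ms of
      # disruption would be carried over an LSP with 1+1 protection.
      print(select_recovery_scheme(50, True))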
   If the PW is a Multi-Segment PW, then LSP recovery can only protect
   the PW in individual segments.  This means that a single LSP
   recovery action cannot protect against a failure of a PW switching
   point (an




Sprecher & Farrel             Informational                    [Page 50]

RFC 6372            MPLS-TP Survivability Framework       September 2011


   S-PE), nor can it protect more than one segment at a time, since
   the LSP tunnel is terminated at each S-PE.  In this respect, LSP
   protection of a PW is very similar to link-level protection offered
   to the MPLS-TP LSP layer by an underlying network layer (see
   Section 4.9).

7.2.  Recovery in the Pseudowire Layer

   Recovery in the PW layer can be provided by simply running separate
   PWs end-to-end.  Other recovery mechanisms in the PW layer, such as
   segment or concatenated segment recovery, or service-level recovery
   that provides survivability in the presence of Terminating Provider
   Edge (T-PE) or Attachment Circuit (AC) faults, will be described in
   a separate document.

   As with any recovery mechanism, it is important to coordinate
   between layers.  This coordination is necessary to ensure that
   actions associated with recovery mechanisms are only performed in
   one layer at a time (that is, the recovery of an underlying LSP
   needs to be coordinated with the recovery of the PW itself).  It
   also ensures that the working and protection PWs do not both use
   the same MPLS resources within the network (for example, by running
   over the same LSP tunnel; see also Section 4.9).

8.  Manageability Considerations

   Manageability of MPLS-TP networks and their functions is discussed
   in [RFC5950].  OAM features are discussed in [RFC6371].

   Survivability has some key interactions with management, as
   described in this document.  In particular:

   o  Recovery domains may be configured such that there is no one-to-
      one correspondence between the MPLS-TP network and the recovery
      domains.

   o  Survivability policies may be configured per network, per
      recovery domain, or per LSP.

   o  Configuration of OAM may involve the selection of MEPs; enabling
      OAM on network segments, spans, and links; and the operation of
      OAM on LSPs, concatenated LSP segments, and LSP segments.

   o  Manual commands may be used to control recovery functions,
      including forcing recovery and locking recovery actions.

   See also the considerations regarding security for management and
   OAM in Section 9 of this document.




Sprecher & Farrel             Informational                    [Page 51]

RFC 6372            MPLS-TP Survivability Framework       September 2011


9.  Security Considerations

   This framework does not introduce any new security considerations;
   general issues relating to MPLS security can be found in [RFC5920].

   However, several points about MPLS-TP survivability should be noted
   here.

   o  If an attacker is able to force a protection switch-over, this
      may result in a small perturbation to user traffic and could
      result in extra traffic being preempted or displaced from the
      protection resources.  In the case of 1:n protection or shared
      mesh protection, this may result in other traffic becoming
      unprotected.  Therefore, it is important that OAM protocols for
      detecting or notifying faults use adequate security to prevent
      them from being used (through the insertion of bogus messages or
      through the capture of legitimate messages) to falsely trigger a
      recovery event.

   o  If manual commands are modified, captured, or simulated
      (including replay), it might be possible for an attacker to
      perform forced recovery actions or to impose lock-out.  These
      actions could impact the capability to provide the recovery
      function and could also affect the normal operation of the
      network for other traffic.  Therefore, management protocols used
      to perform manual commands must allow the operator to use
      appropriate security mechanisms.  This includes verification
      that the user who performs the commands has appropriate
      authorization (an illustrative sketch of such a check follows
      this list).

   o  If the control plane is used to configure or operate recovery
      mechanisms, the control-plane protocols must also be capable of
      providing adequate security.
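   As a purely illustrative sketch of the authorization check
   mentioned in the second bullet above, the following Python fragment
   gates recovery commands on the requesting user's role.  The roles,
   command names, and hand-off function are invented for the example
   and carry no normative meaning.

      # Illustrative sketch only: refuse manual recovery commands from
      # users who are not authorized to issue them.  An authenticated,
      # integrity-protected management session is assumed to have been
      # established before this check is reached.

      ALLOWED_COMMANDS = {
          "operator": {"manual-switch", "clear"},
          "admin": {"manual-switch", "clear", "forced-switch",
                    "lockout"},
      }

      def apply_to_data_plane(command: str) -> None:
          # Hypothetical hand-off to the node's recovery machinery.
          print(f"applying {command}")

      def execute_recovery_command(user_role: str,
                                   command: str) -> None:
          """Execute a manual recovery command only if authorized."""
          if command not in ALLOWED_COMMANDS.get(user_role, set()):
              raise PermissionError(
                  f"role {user_role!r} may not issue {command!r}")
          apply_to_data_plane(command)

      # An 'operator' may request a manual switch-over, but an attempt
      # to impose lock-out would raise PermissionError.
      execute_recovery_command("operator", "manual-switch")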
10.  Acknowledgments

   Thanks to the following people for useful comments and discussions:
   Italo Busi, David McWalter, Lou Berger, Yaacov Weingarten, Stewart
   Bryant, Dan Frost, Lieven Levrau, Xuehui Dai, Liu Guoman, Xiao Min,
   Daniele Ceccarelli, Scott Bradner, Francesco Fondelli, Curtis
   Villamizar, Maarten Vissers, and Greg Mirsky.

   The Editors would like to thank the participants in ITU-T Study
   Group 15 for their detailed review.

   Some figures and text on shared mesh protection were borrowed from
   [MPLS-TP-MESH] with thanks to Tae-sik Cheung and Jeong-dong Ryoo.




Sprecher & Farrel             Informational                    [Page 52]

RFC 6372            MPLS-TP Survivability Framework       September 2011


11.  References

11.1.  Normative References

   [G.806]     ITU-T, "Characteristics of transport equipment -
               Description methodology and generic functionality",
               Recommendation G.806, January 2009.

   [G.808.1]   ITU-T, "Generic Protection Switching - Linear trail and
               subnetwork protection", Recommendation G.808.1,
               December 2003.

   [G.841]     ITU-T, "Types and Characteristics of SDH Network
               Protection Architectures", Recommendation G.841,
               October 1998.

   [RFC2205]   Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and
               S. Jamin, "Resource ReSerVation Protocol (RSVP) --
               Version 1 Functional Specification", RFC 2205,
               September 1997.

   [RFC3209]   Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan,
               V., and G. Swallow, "RSVP-TE: Extensions to RSVP for
               LSP Tunnels", RFC 3209, December 2001.

   [RFC3471]   Berger, L., Ed., "Generalized Multi-Protocol Label
               Switching (GMPLS) Signaling Functional Description",
               RFC 3471, January 2003.

   [RFC3473]   Berger, L., Ed., "Generalized Multi-Protocol Label
               Switching (GMPLS) Signaling Resource ReserVation
               Protocol-Traffic Engineering (RSVP-TE) Extensions",
               RFC 3473, January 2003.

   [RFC3945]   Mannie, E., Ed., "Generalized Multi-Protocol Label
               Switching (GMPLS) Architecture", RFC 3945,
               October 2004.

   [RFC4203]   Kompella, K., Ed., and Y. Rekhter, Ed., "OSPF
               Extensions in Support of Generalized Multi-Protocol
               Label Switching (GMPLS)", RFC 4203, October 2005.

   [RFC4204]   Lang, J., Ed., "Link Management Protocol (LMP)",
               RFC 4204, October 2005.




Sprecher & Farrel             Informational                    [Page 53]

RFC 6372            MPLS-TP Survivability Framework       September 2011


   [RFC4427]   Mannie, E., Ed., and D. Papadimitriou, Ed., "Recovery
               (Protection and Restoration) Terminology for
               Generalized Multi-Protocol Label Switching (GMPLS)",
               RFC 4427, March 2006.

   [RFC4428]   Papadimitriou, D., Ed., and E. Mannie, Ed., "Analysis
               of Generalized Multi-Protocol Label Switching
               (GMPLS)-based Recovery Mechanisms (including Protection
               and Restoration)", RFC 4428, March 2006.
   [RFC4873]   Berger, L., Bryskin, I., Papadimitriou, D., and A.
               Farrel, "GMPLS Segment Recovery", RFC 4873, May 2007.

   [RFC5307]   Kompella, K., Ed., and Y. Rekhter, Ed., "IS-IS
               Extensions in Support of Generalized Multi-Protocol
               Label Switching (GMPLS)", RFC 5307, October 2008.

   [RFC5317]   Bryant, S., Ed., and L. Andersson, Ed., "Joint Working
               Team (JWT) Report on MPLS Architectural Considerations
               for a Transport Profile", RFC 5317, February 2009.

   [RFC5586]   Bocci, M., Ed., Vigoureux, M., Ed., and S. Bryant,
               Ed., "MPLS Generic Associated Channel", RFC 5586,
               June 2009.

   [RFC5654]   Niven-Jenkins, B., Ed., Brungard, D., Ed., Betts, M.,
               Ed., Sprecher, N., and S. Ueno, "Requirements of an
               MPLS Transport Profile", RFC 5654, September 2009.

   [RFC5921]   Bocci, M., Ed., Bryant, S., Ed., Frost, D., Ed.,
               Levrau, L., and L. Berger, "A Framework for MPLS in
               Transport Networks", RFC 5921, July 2010.

   [RFC5950]   Mansfield, S., Ed., Gray, E., Ed., and K. Lam, Ed.,
               "Network Management Framework for MPLS-based Transport
               Networks", RFC 5950, September 2010.

   [RFC6371]   Busi, I., Ed., and D. Allan, Ed., "Operations,
               Administration, and Maintenance Framework for MPLS-
               Based Transport Networks", RFC 6371, September 2011.

11.2.  Informative References

   [GMPLS-OAM]    Takacs, A., Fedyk, D., and J. He, "GMPLS RSVP-TE
                  extensions for OAM Configuration", Work in Progress,
                  July 2011.




Sprecher & Farrel             Informational                    [Page 54]

RFC 6372            MPLS-TP Survivability Framework       September 2011


   [MPLS-TP-LP]   Bryant, S., Osborne, E., Sprecher, N., Fulignoli,
                  A., Ed., and Y. Weingarten, Ed., "MPLS-TP Linear
                  Protection", Work in Progress, August 2011.

   [MPLS-TP-MESH] Cheung, T. and J. Ryoo, "MPLS-TP Shared Mesh
                  Protection", Work in Progress, April 2011.

   [RFC3031]      Rosen, E., Viswanathan, A., and R. Callon,
                  "Multiprotocol Label Switching Architecture",
                  RFC 3031, January 2001.

   [RFC3386]      Lai, W., Ed., and D. McDysan, Ed., "Network
                  Hierarchy and Multilayer Survivability", RFC 3386,
                  November 2002.

   [RFC3469]      Sharma, V., Ed., and F. Hellstrand, Ed., "Framework
                  for Multi-Protocol Label Switching (MPLS)-based
                  Recovery", RFC 3469, February 2003.

   [RFC4397]      Bryskin, I. and A. Farrel, "A Lexicography for the
                  Interpretation of Generalized Multiprotocol Label
                  Switching (GMPLS) Terminology within the Context of
                  the ITU-T's Automatically Switched Optical Network
                  (ASON) Architecture", RFC 4397, February 2006.

   [RFC4426]      Lang, J., Ed., Rajagopalan, B., Ed., and D.
                  Papadimitriou, Ed., "Generalized Multi-Protocol
                  Label Switching (GMPLS) Recovery Functional
                  Specification", RFC 4426, March 2006.

   [RFC4726]      Farrel, A., Vasseur, J.-P., and A. Ayyangar, "A
                  Framework for Inter-Domain Multiprotocol Label
                  Switching Traffic Engineering", RFC 4726,
                  November 2006.

   [RFC4783]      Berger, L., Ed., "GMPLS - Communication of Alarm
                  Information", RFC 4783, December 2006.

   [RFC4872]      Lang, J., Ed., Rekhter, Y., Ed., and D.
                  Papadimitriou, Ed., "RSVP-TE Extensions in Support
                  of End-to-End Generalized Multi-Protocol Label
                  Switching (GMPLS) Recovery", RFC 4872, May 2007.

   [RFC4874]      Lee, CY., Farrel, A., and S. De Cnodder, "Exclude
                  Routes - Extension to Resource ReserVation Protocol-
                  Traffic Engineering (RSVP-TE)", RFC 4874,
                  April 2007.




Sprecher & Farrel             Informational                    [Page 55]

RFC 6372            MPLS-TP Survivability Framework       September 2011


   [RFC5212]      Shiomoto, K., Papadimitriou, D., Le Roux, JL.,
                  Vigoureux, M., and D. Brungard, "Requirements for
                  GMPLS-Based Multi-Region and Multi-Layer Networks
                  (MRN/MLN)", RFC 5212, July 2008.
   [RFC5298]      Takeda, T., Ed., Farrel, A., Ed., Ikejiri, Y., and
                  JP. Vasseur, "Analysis of Inter-Domain Label
                  Switched Path (LSP) Recovery", RFC 5298,
                  August 2008.

   [RFC5817]      Ali, Z., Vasseur, JP., Zamfir, A., and J. Newton,
                  "Graceful Shutdown in MPLS and Generalized MPLS
                  Traffic Engineering Networks", RFC 5817, April 2010.

   [RFC5920]      Fang, L., Ed., "Security Framework for MPLS and
                  GMPLS Networks", RFC 5920, July 2010.

   [RFC6291]      Andersson, L., van Helvoort, H., Bonica, R.,
                  Romascanu, D., and S. Mansfield, "Guidelines for the
                  Use of the "OAM" Acronym in the IETF", BCP 161,
                  RFC 6291, June 2011.

   [RFC6373]      Andersson, L., Ed., Berger, L., Ed., Fang, L., Ed.,
                  Bitar, N., Ed., and E. Gray, Ed., "MPLS-TP Control
                  Plane Framework", RFC 6373, September 2011.

   [ROSETTA]      Van Helvoort, H., Ed., Andersson, L., Ed., and N.
                  Sprecher, Ed., "A Thesaurus for the Terminology used
                  in Multiprotocol Label Switching Transport Profile
                  (MPLS-TP) drafts/RFCs and ITU-T's Transport Network
                  Recommendations", Work in Progress, June 2011.

Authors' Addresses

   Nurit Sprecher (editor)
   Nokia Siemens Networks
   3 Hanagar St. Neve Ne'eman B
   Hod Hasharon, 45241
   Israel

   EMail: nurit.sprecher@nsn.com


   Adrian Farrel (editor)
   Juniper Networks

   EMail: adrian@olddog.co.uk




Sprecher & Farrel             Informational                    [Page 56]