Diffstat (limited to 'doc/rfc/rfc6372.txt')
-rw-r--r--  doc/rfc/rfc6372.txt  3139
1 file changed, 3139 insertions, 0 deletions
diff --git a/doc/rfc/rfc6372.txt b/doc/rfc/rfc6372.txt
new file mode 100644
index 0000000..2574aa0
--- /dev/null
+++ b/doc/rfc/rfc6372.txt
@@ -0,0 +1,3139 @@
+
+Internet Engineering Task Force (IETF) N. Sprecher, Ed.
+Request for Comments: 6372 Nokia Siemens Networks
+Category: Informational A. Farrel, Ed.
+ISSN: 2070-1721 Juniper Networks
+ September 2011
+
+
+ MPLS Transport Profile (MPLS-TP) Survivability Framework
+
+Abstract
+
+ Network survivability is the ability of a network to recover traffic
+ delivery following failure or degradation of network resources.
+ Survivability is critical for the delivery of guaranteed network
+ services, such as those subject to strict Service Level Agreements
+ (SLAs) that place maximum bounds on the length of time that services
+ may be degraded or unavailable.
+
+ The Transport Profile of Multiprotocol Label Switching (MPLS-TP) is a
+ packet-based transport technology based on the MPLS data plane that
+ reuses many aspects of the MPLS management and control planes.
+
+ This document comprises a framework for the provision of
+ survivability in an MPLS-TP network; it describes recovery elements,
+ types, methods, and topological considerations. To enable data-plane
+ recovery, survivability may be supported by the control plane,
+ management plane, and by Operations, Administration, and Maintenance
+ (OAM) functions. This document describes mechanisms for recovering
+ MPLS-TP Label Switched Paths (LSPs). A detailed description of
+ pseudowire recovery in MPLS-TP networks is beyond the scope of this
+ document.
+
+ This document is a product of a joint Internet Engineering Task Force
+ (IETF) / International Telecommunication Union Telecommunication
+ Standardization Sector (ITU-T) effort to include an MPLS Transport
+ Profile within the IETF MPLS and Pseudowire Emulation Edge-to-Edge
+ (PWE3) architectures to support the capabilities and functionalities
+ of a packet-based transport network as defined by the ITU-T.
+
+Status of This Memo
+
+ This document is not an Internet Standards Track specification; it is
+ published for informational purposes.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Not all documents
+ approved by the IESG are a candidate for any level of Internet
+ Standard; see Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc6372.
+
+Copyright Notice
+
+ Copyright (c) 2011 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+   1. Introduction
+      1.1. Recovery Schemes
+      1.2. Recovery Action Initiation
+      1.3. Recovery Context
+      1.4. Scope of This Framework
+   2. Terminology and References
+   3. Requirements for Survivability
+   4. Functional Architecture
+      4.1. Elements of Control
+           4.1.1. Operator Control
+           4.1.2. Defect-Triggered Actions
+           4.1.3. OAM Signaling
+           4.1.4. Control-Plane Signaling
+      4.2. Recovery Scope
+           4.2.1. Span Recovery
+           4.2.2. Segment Recovery
+           4.2.3. End-to-End Recovery
+      4.3. Grades of Recovery
+           4.3.1. Dedicated Protection
+           4.3.2. Shared Protection
+           4.3.3. Extra Traffic
+           4.3.4. Restoration
+           4.3.5. Reversion
+      4.4. Mechanisms for Protection
+           4.4.1. Link-Level Protection
+           4.4.2. Alternate Paths and Segments
+           4.4.3. Protection Tunnels
+      4.5. Recovery Domains
+      4.6. Protection in Different Topologies
+      4.7. Mesh Networks
+           4.7.1. 1:n Linear Protection
+           4.7.2. 1+1 Linear Protection
+           4.7.3. P2MP Linear Protection
+           4.7.4. Triggers for the Linear Protection Switching Action
+           4.7.5. Applicability of Linear Protection for LSP Segments
+           4.7.6. Shared Mesh Protection
+      4.8. Ring Networks
+      4.9. Recovery in Layered Networks
+           4.9.1. Inherited Link-Level Protection
+           4.9.2. Shared Risk Groups
+           4.9.3. Fault Correlation
+   5. Applicability and Scope of Survivability in MPLS-TP
+   6. Mechanisms for Providing Survivability for MPLS-TP LSPs
+      6.1. Management Plane
+           6.1.1. Configuration of Protection Operation
+           6.1.2. External Manual Commands
+      6.2. Fault Detection
+      6.3. Fault Localization
+      6.4. OAM Signaling
+           6.4.1. Fault Detection
+           6.4.2. Testing for Faults
+           6.4.3. Fault Localization
+           6.4.4. Fault Reporting
+           6.4.5. Coordination of Recovery Actions
+      6.5. Control Plane
+           6.5.1. Fault Detection
+           6.5.2. Testing for Faults
+           6.5.3. Fault Localization
+           6.5.4. Fault Status Reporting
+           6.5.5. Coordination of Recovery Actions
+           6.5.6. Establishment of Protection and Restoration LSPs
+   7. Pseudowire Recovery Considerations
+      7.1. Utilization of Underlying MPLS-TP Recovery
+      7.2. Recovery in the Pseudowire Layer
+   8. Manageability Considerations
+   9. Security Considerations
+   10. Acknowledgments
+   11. References
+      11.1. Normative References
+      11.2. Informative References
+
+1. Introduction
+
+   Network survivability is a network's ability to recover traffic
+   delivery following failure or degradation caused by a network fault
+   or a denial-of-service attack on the network.  Survivability plays
+   a critical role in the delivery of
+ reliable services in transport networks. Guaranteed services in the
+ form of Service Level Agreements (SLAs) require a resilient network
+ that very rapidly detects facility or node degradation or failures,
+ and immediately starts to recover network operations in accordance
+ with the terms of the SLA.
+
+ The MPLS Transport Profile (MPLS-TP) is described in [RFC5921].
+ MPLS-TP is designed to be consistent with existing transport network
+ operations and management models, while providing survivability
+ mechanisms, such as protection and restoration. The functionality
+ provided is intended to be similar to or better than that found in
+ established transport networks that set a high benchmark for
+ reliability. That is, it is intended to provide the operator with
+ functions with which they are familiar through their experience with
+ other transport networks, although this does not preclude additional
+ techniques.
+
+ This document provides a framework for MPLS-TP-based survivability
+ that meets the recovery requirements specified in [RFC5654]. It uses
+ the recovery terminology defined in [RFC4427], which draws heavily on
+ [G.808.1], and it refers to the requirements specified in [RFC5654].
+
+ This document is a product of a joint Internet Engineering Task Force
+ (IETF) / International Telecommunication Union Telecommunication
+ Standardization Sector (ITU-T) effort to include an MPLS Transport
+ Profile within the IETF MPLS and PWE3 architectures to support the
+ capabilities and functionalities of a packet-based transport network,
+ as defined by the ITU-T.
+
+1.1. Recovery Schemes
+
+ Various recovery schemes (for protection and restoration) and
+ processes have been defined and analyzed in [RFC4427] and [RFC4428].
+ These schemes can also be applied in MPLS-TP networks to re-establish
+ end-to-end traffic delivery according to the agreed service
+ parameters, and to trigger recovery from "failed" or "degraded"
+ transport entities. In the context of this document, transport
+ entities are nodes, links, transport path segments, concatenated
+ transport path segments, and entire transport paths. Recovery
+ actions are initiated by the detection of a defect, or by an external
+ request (e.g., an operator's request for manual control of protection
+ switching).
+
+ [RFC4427] makes a distinction between protection switching and
+ restoration mechanisms.
+
+ - Protection switching uses pre-assigned capacity between nodes,
+ where the simplest scheme has a single, dedicated protection entity
+ for each working entity, while the most complex scheme has m
+ protection entities shared between n working entities (m:n).
+
+ - Restoration uses any capacity available between nodes and usually
+ involves rerouting. The resources used for restoration may be pre-
+ planned (i.e., predetermined, but not yet allocated to the recovery
+ path), and recovery priority may be used as a differentiation
+ mechanism to determine which services are recovered and which are
+ not recovered.
+
+ Both protection switching and restoration may be either
+ unidirectional or bidirectional; unidirectional implies that
+ protection switching is performed independently for each direction of
+ a bidirectional transport path, while bidirectional means that both
+ directions are switched simultaneously using appropriate
+ coordination, even if the fault applies to only one direction of the
+ path.
+
+ Both protection and restoration mechanisms may be either revertive or
+ non-revertive as described in Section 4.11 of [RFC4427].
+
+ Preemption priority may be used to determine which services are
+ sacrificed to enable the recovery of other services. Restoration may
+ also be either unidirectional or bidirectional. In general,
+ protection actions are completed within time frames amounting to tens
+ of milliseconds, while automated restoration actions are normally
+ completed within periods ranging from hundreds of milliseconds to a
+ maximum of a few seconds. Restoration is not guaranteed (for
+ example, because network resources may not be available at the time
+ of the defect).
+
+1.2. Recovery Action Initiation
+
+ The recovery schemes described in [RFC4427] and evaluated in
+ [RFC4428] are presented in the context of control-plane-driven
+ actions (such as the configuration of the protection entities and
+ functions, etc.). The presence of a distributed control plane in an
+ MPLS-TP network is optional. However, the absence of such a control
+ plane does not affect the operation of the network and the use of
+ MPLS-TP forwarding, Operations, Administration, and Maintenance
+ (OAM), and survivability capabilities. In particular, the concepts
+ discussed in [RFC4427] and [RFC4428] refer to recovery actions
+ effected in the data plane; they are equally applicable in MPLS-TP,
+ with or without the use of a control plane.
+
+ Thus, some of the MPLS-TP recovery mechanisms do not depend on a
+ control plane and use MPLS-TP OAM mechanisms or management actions to
+ trigger recovery actions.
+
+ The principles of MPLS-TP protection-switching actions are similar to
+ those described in [RFC4427], since the protection mechanism is based
+ on the capability to detect certain defects in the transport entities
+ within the recovery domain. The protection-switching controller does
+ not care which initiation method is used, provided that it can be
+ given information about the status of the transport entities within
+ the recovery domain (e.g., OK, signal failure, signal degradation,
+ etc.).
+
+ In the context of MPLS-TP, it is imperative to ensure that performing
+ switchovers is possible, regardless of the way in which the network
+ is configured and managed (for example, regardless of whether a
+ control-plane, management-plane, or OAM initiation mechanism is
+ used).
+
+ All MPLS and GMPLS protection mechanisms [RFC4428] are applicable in
+ an MPLS-TP environment. It is also possible to provision and manage
+ the related protection entities and functions defined in MPLS and
+ GMPLS using the management plane [RFC5654]. Regardless of whether an
+ OAM, management, or control plane initiation mechanism is used, the
+ protection-switching operation is a data-plane operation.
+
+ In some recovery schemes (such as bidirectional protection
+ switching), it is necessary to coordinate the protection state
+ between the edges of the recovery domain to achieve initiation of
+ recovery actions for both directions. An MPLS-TP protocol may be
+ used as an in-band (i.e., data-plane based) control protocol in order
+ to coordinate the protection state between the edges of the
+ protection domain. When the MPLS-TP control plane is in use, a
+ control-plane-based mechanism can also be used to coordinate the
+ protection states between the edges of the protection domain.
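+
+   As a non-normative illustration, the following Python sketch models
+   how the two edges of a protection domain might coordinate a
+   bidirectional switchover.  The class names and the single-phase
+   exchange are assumptions of this sketch, not part of any MPLS-TP
+   solution specification.
+
+      WORKING, PROTECTION = "working", "protection"
+
+      class DomainEdge:
+          """One edge of a protection domain (e.g., an ingress or
+          egress LER of the protected entity)."""
+
+          def __init__(self, name):
+              self.name = name
+              self.active = WORKING  # entity currently carrying traffic
+              self.peer = None       # the other edge of the domain
+
+          def on_signal_fail(self):
+              """Local trigger: a defect on the working entity."""
+              self._switch()
+              # Coordinate so that both directions switch together.
+              self.peer.on_coordination_request()
+
+          def on_coordination_request(self):
+              """Remote trigger: the peer asks us to switch as well."""
+              self._switch()
+
+          def _switch(self):
+              if self.active == WORKING:
+                  self.active = PROTECTION
+                  print(self.name, "-> traffic moved to protection")
+
+      west, east = DomainEdge("west-edge"), DomainEdge("east-edge")
+      west.peer, east.peer = east, west
+      west.on_signal_fail()              # west detects the fault
+      assert east.active == PROTECTION   # east has switched too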
+
+1.3. Recovery Context
+
+ An MPLS-TP Label Switched Path (LSP) may be subject to any part of or
+ all of MPLS-TP link recovery, path-segment recovery, or end-to-end
+ recovery, where:
+
+ o MPLS-TP link recovery refers to the recovery of an individual link
+ (and hence all or a subset of the LSPs routed over the link)
+ between two MPLS-TP nodes. For example, link recovery may be
+ provided by server-layer recovery.
+
+ o Segment recovery refers to the recovery of an LSP segment (i.e.,
+ segment and concatenated segment in the language of [RFC5654])
+ between two nodes and is used to recover from the failure of one
+ or more links or nodes.
+
+ o End-to-end recovery refers to the recovery of an entire LSP, from
+ its ingress to its egress node.
+
+ For additional resiliency, more than one of these recovery techniques
+ may be configured concurrently for a single path.
+
+ Co-routed bidirectional MPLS-TP LSPs are defined in a way that allows
+ both directions of the LSP to follow the same route through the
+ network. In this scenario, the operator often requires the
+ directions to fate-share (that is, if one direction fails, both
+ directions should cease to operate).
+
+ Associated bidirectional MPLS-TP LSPs exist where the two directions
+ of a bidirectional LSP follow different paths through the network.
+ An operator may also request fate-sharing for associated
+ bidirectional LSPs.
+
+ The requirement for fate-sharing causes a direct interaction between
+ the recovery processes affecting the two directions of an LSP, so
+ that both directions of the bidirectional LSP are recovered at the
+ same time. This mode of recovery is termed bidirectional recovery
+ and may be seen as a consequence of fate-sharing.
+
+ The recovery scheme operating at the data-plane level can function in
+ a multi-domain environment (in the wider sense of a "domain"
+ [RFC4726]). It can also protect against a failure of a boundary node
+ in the case of inter-domain operation. MPLS-TP recovery schemes are
+ intended to protect client services when they are sent across the
+ MPLS-TP network.
+
+1.4. Scope of This Framework
+
+ This framework introduces the architecture of the MPLS-TP recovery
+ domain and describes the recovery schemes in MPLS-TP (based on the
+ recovery types defined in [RFC4427]) as well as the principles of
+ operation, recovery states, recovery triggers, and information
+ exchanges between the different elements that support the reference
+ model.
+
+ The framework also describes the qualitative grades of the
+ survivability functions that can be provided, such as dedicated
+ recovery, shared protection, restoration, etc. In the event of a
+ network failure, the grade of recovery directly affects the service
+ grade provided to the end-user.
+
+ The general description of the functional architecture is applicable
+ to both LSPs and pseudowires (PWs); however, PW recovery is only
+ introduced in Section 7, and the relevant details are beyond the
+ scope of this document and are for further study.
+
+ This framework applies to general recovery schemes as well as to
+ mechanisms that are optimized for specific topologies and are
+ tailored to efficiently handle protection switching.
+
+ This document addresses the need for the coordination of protection
+ switching across multiple layers and at sub-layers (for clarity, we
+ use the term "layer" to refer equally to layers and sub-layers).
+ This allows an operator to prevent race conditions and allows the
+ protection-switching mechanism of one layer to recover from a failure
+ before switching is invoked at another layer.
+
+ This framework also specifies the functions that must be supported by
+ MPLS-TP to provide the recovery mechanisms. MPLS-TP introduces a
+ tool kit to enable recovery in MPLS-TP-based networks and to ensure
+ that affected services are recovered in the event of a failure.
+
+ Generally, network operators aim to provide the fastest, most stable,
+ and best protection mechanism at a reasonable cost in accordance with
+   customer requirements.  The greater the grade of protection
+   required, the more network resources are consumed.  It is therefore
+   expected that network operators will offer a wide spectrum of
+   service grades.  MPLS-TP-based recovery offers the flexibility to
+ select a recovery mechanism, define the granularity at which traffic
+ delivery is to be protected, and choose the specific traffic types
+ that are to be protected. With MPLS-TP-based recovery, it should be
+ possible to provide different grades of protection for different
+ traffic classes within the same path based on the service
+ requirements.
+
+2. Terminology and References
+
+ The terminology used in this document is consistent with that defined
+ in [RFC4427]. The latter is consistent with [G.808.1].
+
+ However, certain protection concepts (such as ring protection) are
+ not discussed in [RFC4427]; for those concepts, the terminology used
+ in this document is drawn from [G.841].
+
+ Readers should refer to those documents for normative definitions.
+
+ This document supplies brief summaries of a number of terms for
+ reasons of clarity and to assist the reader, but it does not redefine
+ terms.
+
+ Note, in particular, the distinction and definitions made in
+ [RFC4427] for the following three terms:
+
+ o Protection: re-establishing end-to-end traffic delivery using pre-
+ allocated resources.
+
+ o Restoration: re-establishing end-to-end traffic delivery using
+ resources allocated at the time of need; sometimes referred to as
+ "repair" of a service, LSP, or the traffic.
+
+ o Recovery: a generic term covering both Protection and Restoration.
+
+ Note that the term "survivability" is used in [RFC5654] to cover the
+ functional elements of "protection" and "restoration", which are
+ collectively known as "recovery".
+
+ Important background information on survivability can be found in
+ [RFC3386], [RFC3469], [RFC4426], [RFC4427], and [RFC4428].
+
+ In this document, the following additional terminology is applied:
+
+ o "Fault Management", as defined in [RFC5950].
+
+ o The terms "defect" and "failure" are used interchangeably to
+ indicate any defect or failure in the sense that they are defined
+ in [G.806]. The terms also include any signal degradation event
+ as defined in [G.806].
+
+ o A "fault" is a fault or fault cause as defined in [G.806].
+
+ o "Trigger" indicates any event that may initiate a recovery action.
+ See Section 4.1 for a more detailed discussion of triggers.
+
+ o The acronym "OAM" is defined as Operations, Administration, and
+ Maintenance, consistent with [RFC6291].
+
+ o A "Transport Entity" is a node, link, transport path segment,
+ concatenated transport path segment, or entire transport path.
+
+ o A "Working Entity" is a transport entity that carries traffic
+ during normal network operation.
+
+
+ o A "Protection Entity" is a transport entity that is pre-allocated
+ and used to protect and transport traffic when the working entity
+ fails.
+
+ o A "Recovery Entity" is a transport entity that is used to recover
+ and transport traffic when the working entity fails.
+
+ o "Survivability Actions" are the steps that may be taken by network
+ nodes to communicate faults and to switch traffic from faulted or
+ degraded paths to other paths. This may include sending messages
+ and establishing new paths.
+
+ General terminology for MPLS-TP is found in [RFC5921] and [ROSETTA].
+ Background information on MPLS-TP requirements can be found in
+ [RFC5654].
+
+3. Requirements for Survivability
+
+ MPLS-TP requirements are presented in [RFC5654] and serve as
+ normative references for the definition of all MPLS-TP functionality,
+ including survivability. Survivability is presented in [RFC5654] as
+ playing a critical role in the delivery of reliable services, and the
+ requirements for survivability are set out using the recovery
+ terminology defined in [RFC4427].
+
+4. Functional Architecture
+
+ This section presents an overview of the elements relating to the
+ functional architecture for survivability within an MPLS-TP network.
+ The components are presented separately to demonstrate the way in
+ which they may be combined to provide the different grades of
+ recovery needed to meet the requirements set out in the previous
+ section.
+
+4.1. Elements of Control
+
+ Recovery is achieved by implementing specific actions. These actions
+ aim to repair network resources or redirect traffic along paths that
+ avoid failures in the network. They may be triggered automatically
+ by the MPLS-TP network nodes upon detection of a network defect, or
+ they may be triggered by an operator. Automated actions may be
+ enhanced by in-band (i.e., data-plane-based) OAM mechanisms, or by
+ in-band or out-of-band control-plane signaling.
+
+
+4.1.1. Operator Control
+
+ The survivability behavior of the network as a whole, and the
+ reaction of each transport path when a fault is reported, may be
+ controlled by the operator. This control can be split into two sets
+ of functions: policies and actions performed when the transport path
+ is set up, and commands used to control or force recovery actions for
+ established transport paths.
+
+ The operator may establish network-wide or local policies that
+ determine the actions that will be taken when various defects are
+ reported that affect different transport paths. Also, when a service
+ request is made that causes the establishment of one or more
+ transport paths in the network, the operator (or requesting
+ application) may define a particular grade of service, and this will
+ be mapped to specific survivability actions taken before and during
+ transport path setup, after the discovery of a failure of network
+ resources, and upon recovery of those resources.
+
+ It should be noted that it is unusual to present a user or customer
+ with options directly related to recovery actions. Instead, the
+ user/customer enters into an SLA with the network provider, and the
+ network operator maps the terms of the SLA (for example, for
+ guaranteed delivery, availability, or reliability) to recovery
+ schemes within the network.
+
+ The operator can also issue commands to control recovery actions and
+ events. For example, the operator may perform the following actions:
+
+ o Enable or disable the survivability function.
+
+ o Invoke the simulation of a network fault.
+
+ o Force a switchover from a working path to a recovery path or vice
+ versa.
+
+ Forced switchover may be performed for network optimization purposes
+ with minimal service interruption, such as when modifying protected
+ or unprotected services, when replacing MPLS-TP network nodes, etc.
+ In some circumstances, a fault may be reported to the operator, and
+ the operator may then select and initiate the appropriate recovery
+ action. A description of the different operator commands is found in
+ Section 4.12 of [RFC4427].
+
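+
+   As a minimal sketch, the Python fragment below shows how a
+   protection controller might arbitrate between operator commands and
+   defect conditions.  The command names follow the style of
+   [RFC4427]; the numeric precedence values are assumptions of this
+   example.
+
+      # Assumed precedence: higher value wins.  Lockout prevents any
+      # use of the protection path; a manual switch is overridden by
+      # a real defect (signal fail).
+      PRIORITY = {
+          "lockout-of-protection": 4,
+          "forced-switch":         3,
+          "signal-fail":           2,
+          "manual-switch":         1,
+          "no-request":            0,
+      }
+
+      def highest_priority_request(requests):
+          """Return the input that the controller should honor."""
+          return max(requests, key=PRIORITY.__getitem__,
+                     default="no-request")
+
+      print(highest_priority_request(["manual-switch", "signal-fail"]))
+      # -> "signal-fail": the defect overrides the operator request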
+
+4.1.2. Defect-Triggered Actions
+
+ Survivability actions may be directly triggered by network defects.
+ This means that the device that detects the defect (for example,
+ notification of an issue reported from equipment in a lower layer,
+ failure to receive an OAM Continuity message, or receipt of an OAM
+ message reporting a failure condition) may immediately perform a
+ survivability action.
+
+ The action is directly triggered by events in the data plane. Note,
+ however, that coordination of recovery actions between the edges of
+ the recovery domain may require message exchanges for some recovery
+ functions or for performing a bidirectional recovery action.
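+
+   For example, a node might declare loss of continuity, and hence
+   trigger a switchover, after several consecutive OAM Continuity
+   Check (CC) messages fail to arrive.  In the sketch below, the CC
+   period and the miss threshold of three are assumptions of this
+   example, not requirements.
+
+      import time
+
+      CC_INTERVAL = 0.0033   # assumed CC period (3.33 ms, illustrative)
+      MISS_THRESHOLD = 3     # assumed: 3 missed CCs => defect declared
+
+      class ContinuityMonitor:
+          """Declares a defect when CC messages stop arriving."""
+
+          def __init__(self, on_defect):
+              self.on_defect = on_defect
+              self.last_rx = time.monotonic()
+
+          def cc_received(self):
+              self.last_rx = time.monotonic()
+
+          def poll(self):
+              """Called periodically by the node's timer machinery."""
+              late = time.monotonic() - self.last_rx
+              if late > MISS_THRESHOLD * CC_INTERVAL:
+                  self.on_defect()   # e.g., perform the switchover
+
+      mon = ContinuityMonitor(lambda: print("switch to protection"))
+      time.sleep(4 * CC_INTERVAL)    # simulate a silent working path
+      mon.poll()                     # defect declared; action triggered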
+
+4.1.3. OAM Signaling
+
+ OAM signaling refers to data-plane OAM message exchange. Such
+ messages may be used to detect and localize faults or to indicate a
+ degradation in the operation of the network. However, in this
+ context these messages are used to control or trigger survivability
+ actions. The mechanisms to achieve this are discussed in [RFC6371].
+
+ OAM signaling may also be used to coordinate recovery actions within
+ the protection domain.
+
+4.1.4. Control-Plane Signaling
+
+ Control-plane signaling is responsible for setup, maintenance, and
+ teardown of transport paths that do not fall under management-plane
+ control. The control plane may also be used to coordinate the
+ detection, localization, and reaction to network defects pertaining
+ to peer relationships (neighbor-to-neighbor or end-to-end). Thus,
+ control-plane signaling may initiate and coordinate survivability
+ actions.
+
+ The control plane can also be used to distribute topology and
+ information relating to resource availability. In this way, the
+   "graceful shutdown" [RFC5817] of resources may be effected by
+ withdrawing them; this can be used to invoke a survivability action
+ in a similar way to that used when reporting or discovering a fault,
+ as described in the previous sections.
+
+ The use of a control plane for MPLS-TP is discussed in [RFC6373].
+
+4.2. Recovery Scope
+
+ This section describes the elements of recovery. These are the
+ quantitative aspects of recovery, that is, the parts of the network
+ for which recovery can be provided.
+
+ Note that the terminology in this section is consistent with
+ [RFC4427]. Where the terms differ from those in [RFC5654], mapping
+ is provided.
+
+4.2.1. Span Recovery
+
+ A span is a single hop between neighboring MPLS-TP nodes in the same
+ network layer. A span is sometimes referred to as a link, and this
+ may cause some confusion between the concept of a data link and a
+ traffic engineering (TE) link. LSPs traverse TE links between
+ neighboring MPLS-TP nodes in the MPLS-TP network layer. However, a
+ TE link may be provided by any of the following:
+
+ o A single data link.
+
+ o A series of data links in a lower layer, established as an LSP and
+ presented to the upper layer as a single TE link.
+
+ o A set of parallel data links in the same layer, presented either as
+ a bundle of TE links, or as a collection of data links that
+ together provide a data-link-layer protection scheme.
+
+ Thus, span recovery may be provided by any of the following:
+
+ o Selecting a different TE link from a bundle.
+
+ o Moving the TE link so that it is supported by a different data
+ link between the same pair of neighbors.
+
+ o Rerouting the LSP in the lower layer.
+
+ Moving the protected LSP to another TE link between the same pair of
+ neighbors is a form of segment recovery and not a form of span
+ recovery. Segment Recovery is described in Section 4.2.2.
+
+4.2.2. Segment Recovery
+
+ An LSP segment comprises one or more continuous hops on the path of
+ the LSP. [RFC5654] defines two terms. A "segment" is a single hop
+ along the path of an LSP, while a "concatenated segment" is more than
+ one hop along the path of an LSP. In the context of this document, a
+ segment covers both of these concepts.
+
+ A PW segment refers to a Single-Segment PW (SS-PW) or to a single
+ segment of a Multi-Segment PW (MS-PW) that is set up between two PE
+ devices that may be Terminating PEs (T-PEs) or Switching PEs (S-PEs)
+ so that the full set of possibilities is T-PE to S-PE, S-PE to S-PE,
+ S-PE to T-PE, or T-PE to T-PE (for the SS-PW case). As indicated in
+ Section 1, the recovery of PWs and PW segments is beyond the scope of
+ this document; however, see Section 7.
+
+ Segment recovery involves redirecting or copying traffic at the
+ source end of a segment onto an alternate path leading to the other
+ end of the segment. According to the required grade of recovery
+ (described in Section 4.3), traffic may be either redirected to a
+ pre-established segment, through rerouting the protected segment, or
+ tunneled to the far end of the protected segment through a "bypass"
+ LSP. For details on recovery mechanisms, see Section 4.4.
+
+ Note that protecting a transport path against node failure requires
+ the use of segment recovery or end-to-end recovery, while a link
+ failure can be protected using span, segment, or end-to-end recovery.
+
+4.2.3. End-to-End Recovery
+
+ End-to-end recovery is a special case of segment recovery where the
+ protected segment comprises the entire transport path. End-to-end
+ recovery may be provided as link-diverse or node-diverse recovery
+ where the recovery path shares no links or no nodes with the working
+ path.
+
+ Note that node-diverse paths are necessarily link-diverse and that
+ full, end-to-end node-diversity is required to guarantee recovery.
+
+ Two observations need to be made about end-to-end recovery.
+
+ - Firstly, there may be circumstances where node-diverse end-to-end
+ paths do not guarantee recovery. The ingress and egress nodes will
+ themselves be single points of failure. Additionally, there may be
+ shared risks of failure (for example, geographic collocation,
+ shared resources, etc.) between diverse nodes as described in
+ Section 4.9.2.
+
+ - Secondly, it is possible to use end-to-end recovery techniques even
+ when there is not full diversity and the working and protection
+ paths share links or nodes.
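+
+   To illustrate node diversity, the Python sketch below finds a pair
+   of paths that share only their end points.  It brute-forces all
+   simple paths, which is viable only for tiny topologies (a real
+   implementation would use a disjoint-path algorithm such as
+   Suurballe's); the topology shown is invented for the example.
+
+      from itertools import combinations
+
+      def simple_paths(graph, src, dst, path=None):
+          """Enumerate all simple paths in an adjacency-list graph."""
+          path = path or [src]
+          if src == dst:
+              yield path
+              return
+          for nxt in graph[src]:
+              if nxt not in path:
+                  yield from simple_paths(graph, nxt, dst, path + [nxt])
+
+      def node_diverse_pair(graph, src, dst):
+          """Return two paths sharing only their end points, if any."""
+          paths = list(simple_paths(graph, src, dst))
+          for a, b in combinations(paths, 2):
+              if set(a[1:-1]).isdisjoint(b[1:-1]):
+                  return a, b
+          return None
+
+      topo = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
+      print(node_diverse_pair(topo, "A", "D"))
+      # (['A', 'B', 'D'], ['A', 'C', 'D']): working + recovery paths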
+
+4.3. Grades of Recovery
+
+ This section describes the qualitative grades of survivability that
+ can be provided. In the event of a network failure, the grade of
+ recovery offered directly affects the service grade provided to the
+ end-user. This will be observed as the amount of data lost when a
+ network fault occurs, and the length of time required to recover
+ connectivity.
+
+ In general, there is a correlation between the recovery service grade
+ (i.e., the speed of recovery and reduction of data loss) and the
+ amount of resources used in the network; better service grades
+ require the pre-allocation of resources to the recovery paths, and
+ those resources cannot be used for other purposes if high-quality
+ recovery is required. An operator will consider how providing
+ different grades of recovery may require that network resources be
+ provisioned and allocated for exclusive use of the recovery paths
+ such that the resources cannot be used to support other customer
+ services.
+
+ Sections 6 and 7 of [RFC4427] provide a full breakdown of the
+ protection and recovery schemes. This section summarizes the
+ qualitative grades available.
+
+ Note that, in the context of recovery, a useful discussion of the
+ term "resource" and its interpretation in both the IETF and ITU-T
+ contexts may be found in Section 3.2 of [RFC4397].
+
+ The selection of the recovery grade and schemes to satisfy the
+ service grades for an LSP using available network resources is
+ subject to network and local policy and may be pre-designated through
+ network planning or may be dynamically determined by the network.
+
+4.3.1. Dedicated Protection
+
+ In dedicated protection, the resources for the recovery entity are
+ pre-assigned for the sole use of the protected transport path. This
+ will clearly be the case in 1+1 protection, and may also be the case
+ in 1:1 protection where extra traffic (see Section 4.3.3) is not
+ supported.
+
+ Note that when using protection tunnels (see Section 4.4.3),
+ resources may also be dedicated to the protection of a specific
+ transport path. In some cases (1:1 protection), the entire bypass
+ tunnel may be dedicated to providing recovery for a specific
+ transport path, while in other cases (such as facility backup), a
+ subset of the resources associated with the bypass tunnel may be pre-
+ assigned for the recovery of a specific service.
+
+ However, as described in Section 4.4.3, the bypass tunnel method can
+ also be used for shared protection (Section 4.3.2), either to carry
+ extra traffic (Section 4.3.3) or to achieve best-effort recovery
+ without the need for resource reservation.
+
+4.3.2. Shared Protection
+
+ In shared protection, the resources for the recovery entities of
+ several services are shared. These may be shared as 1:n or m:n and
+ are shared on individual links. Link-by-link resource sharing may be
+ managed and operated along LSP segments, on PW segments, or on end-
+ to-end transport paths (LSP or PW). Note that there is no
+ requirement for m:n recovery in the list of MPLS-TP requirements
+ documented in [RFC5654]. Shared protection can be applied in
+ different topologies (mesh, ring, etc.) and can utilize different
+ protection mechanisms (linear, ring, etc.).
+
+ End-to-end shared protection shares resources between a number of
+ paths that have common end points. Thus, a number of paths (n paths)
+ are all protected by one or more protection paths (m paths, where m
+ may equal 1). When there have been m failures, there are no more
+ available protection paths, and the n paths are no longer protected.
+ Thus, in 1:n protection, one fault can be protected against before
+ all the n paths are unprotected. The fact that the paths have become
+ unprotected needs to be conveyed to the path end points since they
+ may need to report the change in service grade or may need to take
+ further action to increase their protection. In end-to-end shared
+ protection, this communication is simple since the end points are
+ common.
+
+ In shared mesh protection (see Section 4.7.6), the paths that share
+ the protection resources do not necessarily have the same end points.
+ This provides a more flexible resource-sharing scheme, but the
+ network planning and the coordination of protection state after a
+ recovery action are more complex.
+
+ Where a bypass tunnel is used (Section 4.4.3), the tunnel might not
+ have sufficient resources to simultaneously protect all of the paths
+ for which it offers protection; in the event that all paths were
+ affected by network defects and failures at the same time, not all of
+ them would be recovered. Policy would dictate how this situation
+ should be handled: some paths might be protected, while others would
+ simply fail; the traffic for some paths would be guaranteed, while
+ traffic on other paths would be treated as best-effort with the risk
+ of dropped packets. Alternatively, it is possible that protection
+ would not be attempted according to local policy at the nodes that
+ perform the recovery actions.
+
+ Shared protection is a trade-off between assigning network resources
+ to protection (which is not required most of the time) and risking
+ unrecoverable services in the event that multiple network defects or
+ failures occur. Rapid recovery can be achieved with dedicated
+ protection, but it is delayed by message exchanges in the management,
+ control, or data planes for shared protection. This means that there
+ is also a trade-off between rapid recovery and resource sharing. In
+ some cases, shared protection might not meet the speed required for
+ protection, but it may still be faster than restoration.
+
+ These trade-offs may be somewhat mitigated by the following:
+
+ o Adjusting the value of n in 1:n protection.
+
+ o Using m:n protection for a value of m > 1.
+
+ o Establishing new protection paths as each available protection
+ path is put into use.
+
+ In an MPLS-TP network, the degree to which a resource is shared
+ between LSPs is a policy issue. This policy may be applied to the
+ resource or to the LSPs, and may be pre-configured, configured per
+ LSP and installed during LSP establishment, or may be dynamically
+ configured.
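+
+   A minimal sketch of the 1:n case follows, assuming one shared
+   protection path and hypothetical LSP names.  It illustrates the
+   point above: after the first switchover the remaining working paths
+   are unprotected, and their end points must learn of the changed
+   service grade.
+
+      class SharedProtectionGroup:
+          """1:n protection: n working paths, one protection path."""
+
+          def __init__(self, working_paths):
+              self.working = set(working_paths)
+              self.backup_user = None   # path now using the backup
+
+          def on_failure(self, path):
+              if self.backup_user is None:
+                  self.backup_user = path
+                  print(path, "switched to the shared protection path")
+              else:
+                  # Shared resource already in use: this path is now
+                  # unrecoverable here, and the remaining paths are
+                  # unprotected; notify their end points.
+                  print(path, "unrecoverable: protection resource busy")
+
+      group = SharedProtectionGroup(["lsp-1", "lsp-2", "lsp-3"])
+      group.on_failure("lsp-2")   # first fault: protected
+      group.on_failure("lsp-3")   # second fault: no resource left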
+
+4.3.3. Extra Traffic
+
+ Section 2.5.1.1 of [RFC5654] says: "Support for extra traffic (as
+ defined in [RFC4427]) is not required in MPLS-TP and MAY be omitted
+ from the MPLS-TP specifications". This document observes that extra
+ traffic facilities may therefore be provided as part of the MPLS-TP
+ survivability toolkit depending upon the development of suitable
+ solution specifications. The remainder of this section explains the
+ concepts of extra traffic without prejudging the decision to specify
+ or not specify such solutions.
+
+ Network resources allocated for protection represent idle capacity
+ during the time that recovery is not actually required, and can be
+ utilized by carrying other traffic, referred to as "extra traffic".
+
+ Note that extra traffic does not need to start or terminate at the
+ ends of the entity (e.g., LSP) that it uses.
+
+ When a network resource carrying extra traffic is required for the
+ recovery of protected traffic from the failed working path, the extra
+   traffic is disrupted.  This disruption may take one of two forms:
+
+ - In "hard preemption", the extra traffic is excluded from the
+ protection resource. The disruption of the extra traffic is total,
+ and the service supported by the extra traffic must be dropped, or
+ some form of rerouting or restoration must be applied to the extra
+ traffic LSP in order to recover the service.
+
+ Hard preemption is achieved by "setting a switch" on the path of
+ the extra traffic such that it no longer flows. This situation may
+ be detected by OAM and reported as a fault, or may be proactively
+ reported through OAM or control-plane signaling.
+
+ - In "soft preemption", the extra traffic is not explicitly excluded
+ from the protection resource, but is given lower priority than the
+ protected traffic. In a packet network (such as MPLS-TP), this can
+ result in oversubscription of the protection resource with the
+ result that the extra traffic receives "best-effort" delivery.
+ Depending on the volume of protection and extra traffic, and the
+ level of oversubscription, the extra traffic may be slightly or
+ heavily impacted.
+
+ The event of soft preemption may be detected by OAM and reported as
+ a degradation of traffic delivery or as a fault. It may also be
+ proactively reported through OAM or control-plane signaling.
+
+ Note that both hard and soft preemption may utilize additional
+ message exchanges in the management, control, or data planes. These
+ messages do not necessarily mean that recovery is delayed, but may
+ increase the complexity of the protection system. Thus, the benefits
+ of carrying extra traffic must be weighed against the disadvantages
+ of delayed recovery, additional network overhead, and the impact on
+ the services that support the extra traffic according to the details
+ of the solutions selected.
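+
+   The contrast between the two forms can be reduced to a short
+   sketch; the queue model and the priority labels below are
+   assumptions of this example.
+
+      def preempt(extra_traffic, mode):
+          """Illustrative treatment of extra traffic on switchover.
+
+          "hard": extra traffic is excluded from the resource.
+          "soft": extra traffic stays but is demoted to best-effort,
+                  so it is dropped first under oversubscription.
+          """
+          if mode == "hard":
+              displaced = list(extra_traffic)
+              extra_traffic.clear()   # excluded; reroute or drop
+              return displaced
+          if mode == "soft":
+              for flow in extra_traffic:
+                  flow["priority"] = "best-effort"   # demoted only
+              return extra_traffic
+          raise ValueError(mode)
+
+      queue = [{"flow": "extra-1", "priority": "assured"}]
+      print(preempt(queue, "soft"))
+      # [{'flow': 'extra-1', 'priority': 'best-effort'}]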
+
+ Note that extra traffic is not protected by definition, but may be
+ restored.
+
+ Extra traffic is not supported on dedicated protection resources,
+ which, by definition, are used for 1+1 protection (Section 4.3.1),
+ but it can be supported in other protection schemes, including shared
+ protection (Section 4.3.2) and tunnel protection (Section 4.4.3).
+
+ Best-effort traffic should not be confused with extra traffic. For
+ best-effort traffic, the network does not guarantee data delivery,
+ and the user does not receive guaranteed quality of service (e.g., in
+ terms of jitter, packet loss, delay, etc.). Best-effort traffic
+ depends on the current traffic load. However, for extra traffic,
+ quality can only be guaranteed until resources are required for
+ recovery. At this point, the extra traffic may be completely
+ displaced, may be treated as best effort, or may itself be recovered
+ (for example, by restoration techniques).
+
+4.3.4. Restoration
+
+ This section refers to LSP restoration. Restoration for PWs is
+ beyond the scope of this document (but see Section 7).
+
+ Restoration represents the most effective use of network resources,
+ since no resources are reserved for recovery. However, restoration
+ requires the computation of a new path and the activation of a new
+ LSP (through the management or control plane). It may be more time-
+ consuming to perform these steps than to implement recovery using
+ protection techniques.
+
+ Furthermore, there is no guarantee that restoration will be able to
+ recover the service. It may be that all suitable network resources
+ are already in use for other LSPs, so that no new path can be found.
+ This problem can be partially mitigated by using LSP setup
+ priorities, so that recovery LSPs can preempt existing LSPs with
+ lower priorities.
+
+ Additionally, when a network defect occurs, multiple LSPs may be
+ disrupted by the same event. These LSPs may have been established by
+ different Network Management Stations (NMSes) or they may have been
+ signaled by different head-end MPLS-TP nodes, meaning that multiple
+ points in the network will try to compute and establish recovery LSPs
+ at the same time. This can lead to a lack of resources within the
+ network and cause recovery failures; some recovery actions will need
+ to be retried, resulting in even slower recovery times for some
+ services.
+
+ Both hard and soft LSP restoration may be supported. For hard LSP
+ restoration, the resources of the working LSP are released before the
+ recovery LSP is fully established (i.e., break-before-make). For
+ soft LSP restoration, the resources of the working LSP are released
+ after an alternate LSP is fully established (i.e., make-before-
+ break). Note that in the case of reversion (Section 4.3.5), the
+ resources associated with the working LSP are not released.
+
+ The restoration resources may be pre-calculated and even pre-signaled
+ before the restoration action starts, but not pre-allocated. This is
+ known as pre-planned LSP restoration. The complete
+ establishment/activation of the restoration LSP occurs only when the
+ restoration action starts. Pre-planning may occur periodically and
+ provides the most accurate information about the available resources
+ in the network.
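+
+   The difference between hard and soft restoration is purely one of
+   ordering, as the sketch below shows; the callback names stand in
+   for management- or control-plane operations and are hypothetical.
+
+      def restore(working_lsp, compute_path, establish, release,
+                  soft=True):
+          """soft=True : make-before-break (establish, then release).
+             soft=False: break-before-make (release, then establish)."""
+          path = compute_path()            # computed at time of need
+          if soft:
+              recovery = establish(path)   # make ...
+              release(working_lsp)         # ... before break
+          else:
+              release(working_lsp)         # break ...
+              recovery = establish(path)   # ... before make
+          return recovery
+
+      lsp = restore("working-lsp",
+                    compute_path=lambda: ["A", "X", "Y", "Z"],
+                    establish=lambda p: "recovery via " + "-".join(p),
+                    release=lambda l: print("released", l))
+      print(lsp)   # recovery via A-X-Y-Z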
+
+4.3.5. Reversion
+
+ After a service has been recovered and traffic is flowing along the
+ recovery LSP, the defective network resource may be replaced.
+ Traffic can be redirected back onto the original working LSP (known
+ as "reversion"), or it can be left where it is on the recovery LSP
+ ("non-revertive" behavior).
+
+ It should be possible to specify the reversion behavior of each
+ service; this might even be configured for each recovery instance.
+
+ In non-revertive mode, an additional operational option is possible
+ where protection roles are switched, so that the recovery LSP becomes
+ the working LSP, while the previous working path (or the resources
+   used by the previous working path) is used for recovery in the event
+ of an additional fault.
+
+ In revertive mode, it is important to prevent excessive swapping
+ between the working and recovery paths in the case of an intermittent
+ defect. This can be addressed by using a reversion delay timer (the
+ Wait-To-Restore timer), which controls the length of time to wait
+ before reversion following the repair of a fault on the original
+ working path. It should be possible for an operator to configure
+ this timer per LSP, and a default value should be defined.
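+
+   A minimal sketch of a revertive controller with a Wait-To-Restore
+   (WTR) timer follows.  The timer value is illustrative only
+   (deployed defaults are operator-configured), and the class name is
+   hypothetical.
+
+      import threading
+
+      WTR_SECONDS = 0.1   # illustrative; real WTR values are longer
+
+      class RevertiveController:
+          """Reverts only after WTR expiry, damping flaps caused by
+          an intermittent defect on the working path."""
+
+          def __init__(self):
+              self.active = "recovery"
+              self.wtr = None
+
+          def on_working_repaired(self):
+              # Do not revert at once: the repair may not be stable.
+              self.wtr = threading.Timer(WTR_SECONDS, self._revert)
+              self.wtr.start()
+
+          def on_working_failed(self):
+              if self.wtr:              # fault reappeared during WTR:
+                  self.wtr.cancel()     # stay on the recovery path
+                  self.wtr = None
+
+          def _revert(self):
+              self.active = "working"
+              print("WTR expired: traffic reverted to working path")
+
+      ctl = RevertiveController()
+      ctl.on_working_repaired()
+      ctl.wtr.join()   # wait for the timer in this demonstration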
+
+4.4. Mechanisms for Protection
+
+ This section provides general descriptions (MPLS-TP non-specific) of
+ the mechanisms that can be used for protection purposes. As
+ indicated above, while the functional architecture applies to both
+ LSPs and PWs, the mechanism for recovery described in this document
+ refers to LSPs and LSP segments only. Recovery mechanisms for
+ pseudowires and pseudowire segments are for further study and will be
+ described in a separate document (see also Section 7).
+
+4.4.1. Link-Level Protection
+
+ Link-level protection refers to two paradigms: (1) where protection
+ is provided in a lower network layer and (2) where protection is
+ provided by the MPLS-TP link layer.
+
+ Note that link-level protection mechanisms do not protect the nodes
+ at each end of the entity (e.g., a link or span) that is protected.
+ End-to-end or segment protection should be used in conjunction with
+ link-level protection to protect against a failure of the edge nodes.
+
+ Link-level protection offers the following grades of protection:
+
+ o Full protection where a dedicated protection entity (e.g., a link
+ or span) is pre-established to protect a working entity. When the
+ working entity fails, the protected traffic is switched to the
+ protecting entity. In this scenario, all LSPs carried over the
+ working entity are recovered (in one protection operation) when
+ there is a failure condition. This is referred to in [RFC4427] as
+ "bulk recovery".
+
+ o Partial protection where only a subset of the LSPs or traffic
+ carried over a selected entity is recovered when there is a
+ failure condition. The decision as to which LSPs will be
+ recovered and which will not depends on local policy.
+
+ When there is no failure on the working entity, the protection entity
+ may transport extra traffic that may be preempted when protection
+ switching occurs.
+
+ If link-level protection is available, it may be desirable to allow
+ this to be attempted before attempting other recovery mechanisms for
+ the transport paths affected by the fault because link-level
+ protection may be faster and more conservative of network resources.
+ This can be achieved both by limiting the propagation of fault
+ condition notifications and by delaying the other recovery actions.
+ This consideration of other protection can be compared with the
+ discussion of recovery domains (Section 4.5) and recovery in multi-
+ layer networks (Section 4.9).
+
+ A protection mechanism may be provided at the MPLS-TP link layer
+ (which connects two MPLS-TP nodes). Such a mechanism can make use of
+ the procedures defined in [RFC5586] to set up in-band communication
+ channels at the MPLS-TP Section level, to use these channels to
+ monitor the health of the MPLS-TP link, and to coordinate the
+ protection states between the ends of the MPLS-TP link.
+
+4.4.2. Alternate Paths and Segments
+
+ The use of alternate paths and segments refers to the paradigm
+ whereby protection is performed in the network layer in which the
+ protected LSP is located; this applies either to the entire end-to-
+ end LSP or to a segment of the LSP. In this case, hierarchical LSPs
+ are not used (compare with Section 4.4.3).
+
+ Different grades of protection may be provided:
+
+ o Dedicated protection where a dedicated entity (e.g., LSP or LSP
+ segment) is (fully) pre-established to protect a working entity
+ (e.g., LSP or LSP segment). When a failure condition occurs on
+ the working entity, traffic is switched onto the protection
+ entity. Dedicated protection may be performed using 1:1 or 1+1
+ linear protection schemes. When the failure condition is
+ eliminated, the traffic may revert to the working entity. This is
+ subject to local configuration.
+
+ o Shared protection where one or more protection entities is pre-
+ established to protect against a failure of one or more working
+ entities (1:n or m:n).
+
+ When the fault condition on the working entity is eliminated, the
+ traffic should revert back to the working entity in order to allow
+ other related working entities to be protected by the shared
+ protection resource.
+
+4.4.3. Protection Tunnels
+
+ A protection tunnel is pre-provisioned in order to protect against a
+ failure condition along a sequence of spans in the network. This may
+   be achieved using LSP hierarchy.  We call such a sequence a network
+ segment. A failure of a network segment may affect one or more LSPs
+ that transit the network segment.
+
+ When a failure condition occurs in the network segment (detected
+ either by OAM on the network segment, or by OAM on a concatenated
+ segment of one of the LSPs transiting the network segment), one or
+ more of the protected LSPs are switched over at the ingress point of
+ the network segment and are transmitted over the protection tunnel.
+ This is implemented through label stacking. Label mapping may be an
+ option as well.
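+
+   As a sketch of the label-stacking case, the fragment below pushes
+   the bypass tunnel's label onto a protected packet's stack at the
+   ingress of the failed network segment; the label values are
+   invented for the example.
+
+      def switch_to_bypass(label_stack, bypass_label):
+          """Nest the protected LSP inside the protection tunnel by
+          pushing the bypass label (outermost label first)."""
+          return [bypass_label] + label_stack
+
+      # A packet on the protected LSP carries label 100; the ingress
+      # of the failed segment nests it inside bypass tunnel label 999.
+      print(switch_to_bypass([100], 999))   # -> [999, 100]
+      # At the egress of the bypass, 999 is popped (possibly one hop
+      # earlier by penultimate hop popping) and forwarding continues
+      # on label 100 as before.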
+
+ Different grades of protection may be provided:
+
+ o Dedicated protection where the protection tunnel reserves
+ sufficient resources to provide protection for all protected LSPs
+ without causing service degradation.
+
+ o Partial protection where the protection tunnel has enough
+ resources to protect some of the protected LSPs, but not all of
+ them simultaneously. Policy dictates how this situation should be
+ handled: it is possible that some LSPs would be protected, while
+ others would simply fail; it is possible that traffic would be
+ guaranteed for some LSPs, while for other LSPs it would be treated
+ as best effort with the risk of packets being dropped.
+ Alternatively, it is possible that protection would not be
+ attempted.
+
+4.5. Recovery Domains
+
+ Protection and restoration are performed in the context of a recovery
+ domain. A recovery domain is defined between two or more recovery
+ reference end points that are located at the edges of the recovery
+ domain and that border on the element on which recovery can be
+ provided (as described in Section 4.2). This element can be an end-
+ to-end path, a segment, or a span.
+
+   An end-to-end path can be viewed as a special case of a segment where
+ the ingress and egress Label Edge Routers (LERs) serve as the
+ recovery reference end points.
+
+ In this simple case of a point-to-point (P2P) protected entity, two
+ end points reside at the boundary of the protection domain. An LSP
+ can enter through one reference end point and exit the recovery
+ domain through another reference end point.
+
+ In the case of unidirectional point-to-multipoint (P2MP), three or
+ more end points reside at the boundary of the protection domain. One
+ of the end points is referred to as the source/root, while the others
+ are referred to as sinks/leaves. An LSP can enter the recovery
+ domain through the root point and exit the recovery domain through
+ the leaf points.
+
+ The recovery mechanism should restore traffic that was interrupted by
+ a facility (link or node) fault within the recovery domain. Note
+ that a single link may be part of several recovery domains. If two
+ recovery domains have common links, one recovery domain must be
+ contained within the other. This can be referred to as nested
+ recovery domains. The boundaries of recovery domains may coincide,
+ but recovery domains must not overlap.
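+
+   The nesting rule can be checked compactly; in the sketch below a
+   recovery domain is modeled simply as a set of link identifiers,
+   which is an assumption of the example.
+
+      def domains_valid(domain_a, domain_b):
+          """Two recovery domains may be disjoint or nested, but
+          must not partially overlap."""
+          common = domain_a & domain_b
+          return (not common
+                  or domain_a <= domain_b
+                  or domain_b <= domain_a)
+
+      inner = {"L1", "L2"}
+      outer = {"L1", "L2", "L3", "L4"}
+      print(domains_valid(inner, outer))                # True: nested
+      print(domains_valid({"L1", "L5"}, {"L5", "L6"}))  # False: overlap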
+
+ Note that the edges of a recovery domain are not protected, and
+ unless the whole domain is contained within another recovery domain,
+ the edges form a single point of failure.
+
+ A recovery group is defined within a recovery domain and consists of
+ a working (primary) entity and one or more recovery (backup) entities
+ that reside between the end points of the recovery domain. To
+ guarantee protection in all situations, a dedicated recovery entity
+ should be pre-provisioned using disjoint resources in the recovery
+ domain, in order to protect against a failure of a working entity.
+ Of course, mechanisms to detect faults and to trigger protection
+ switching are also needed.
+
+ The method used to monitor the health of the recovery element is
+ beyond the scope of this document. The end points that are
+ responsible for the recovery action must receive information on its
+ condition. The condition of the recovery element may be 'OK',
+ 'failed', or 'degraded'.
+
+ When the recovery operation is to be triggered by OAM mechanisms, an
+ OAM Maintenance Entity Group must be defined for each of the working
+ and protection entities.
+
+ The recovery entities and functions in a recovery domain can be
+ configured using a management plane or a control plane. A management
+ plane may be used to configure the recovery domain by setting the
+ reference points, the working and recovery entities, and the recovery
+ type (e.g., 1:1 bidirectional linear protection, ring protection,
+ etc.). Additional parameters associated with the recovery process
+ may also be configured. For more details, see Section 6.1.
+
+ When a control plane is used, the ingress LERs may communicate with
+ the recovery reference points to request that protection or
+ restoration be configured across a recovery domain. For details, see
+ Section 6.5.
+
+ Cases of multiple interconnections between distinct recovery domains
+ create a hierarchical arrangement of recovery domains, since a single
+ top-level recovery domain is created from the concatenation of two
+ recovery domains with multiple interconnections. In this case,
+ recovery actions may be taken both in the individual, lower-level
+ recovery domains to protect any LSP segment that crosses the domain,
+ and within the higher-level recovery domain to protect the longer LSP
+ segment that traverses the higher-level domain.
+
+ The MPLS-TP recovery mechanism can be arranged to ensure coordination
+ between domains. In interconnected rings, for example, it may be
+ preferable to allow the upstream ring to perform recovery before the
+ downstream ring, in order to ensure that recovery takes place in the
+ ring in which the defect occurred. Coordination of recovery actions
+ is particularly important in nested domains and is discussed further
+ in Section 4.9.
+
+4.6. Protection in Different Topologies
+
+ As described in the requirements listed in Section 3 and detailed in
+ [RFC5654], the selected recovery techniques may be optimized for
+ different network topologies if the optimized mechanisms perform
+ significantly better than the generic mechanisms in the same
+ topology.
+
+ These mechanisms are required (R91 of [RFC5654]) to interoperate with
+ the mechanisms defined for arbitrary topologies, in order to allow
+ end-to-end protection and to ensure that consistent protection
+ techniques are used across the entire network. In this context,
+ 'interoperate' means that the use of one technique must not inhibit
+ the use of another technique in an adjacent part of the network for
+ use on the same end-to-end transport path, and must not prohibit the
+ use of end-to-end protection mechanisms.
+
+ The next sections (4.7 and 4.8) describe two different topologies and
+ explain how recovery may be markedly different in those different
+ scenarios. They also develop the concept of a recovery domain and
+ show how end-to-end survivability may be achieved through a
+ concatenation of recovery domains, each providing some grade of
+ recovery in part of the network.
+
+4.7. Mesh Networks
+
+ A mesh network is any network where there is arbitrary
+ interconnectivity between nodes in the network. Mesh networks are
+ usually contrasted with more specific topologies such as hub-and-
+ spoke or ring (see Section 4.8), although such networks are actually
+ examples of mesh networks. This section is limited to the discussion
+ of protection techniques in the context of mesh networks. That is,
+ it does not include optimizations for specific topologies.
+
+ Linear protection is a protection mechanism that provides rapid and
+ simple protection switching. In a mesh network, linear protection
+ provides a very suitable protection mechanism because it can operate
+ between any pair of points within the network. It can protect
+ against a defect in a node, a span, a transport path segment, or an
+ end-to-end transport path. Linear protection gives a clear
+ indication of the protection status.
+
+ Linear protection operates in the context of a protection domain. A
+ protection domain is a special type of recovery domain (see Section
+ 4.5) associated with the protection function. A protection domain is
+ composed of the following architectural elements:
+
+ o A set of end points that reside at the boundary of the protection
+ domain. In the simple case of 1:n or 1+1 P2P protection, two end
+ points reside at the boundary of the protection domain. In each
+ transmission direction, one of the end points is referred to as
+ the source, and the other is referred to as the sink. For
+ unidirectional P2MP protection, three or more end points reside at
+ the boundary of the protection domain. One of the end points is
+ referred to as the source/root, while the others are referred to
+ as sinks/leaves.
+
+ o A Protection Group consists of one or more working (primary) paths
+ and one or more protection (backup) paths that run between the end
+ points belonging to the protection domain. To guarantee
+ protection in all scenarios, a dedicated protection path should be
+ pre-provisioned to protect against a defect of a working path
+ (i.e., 1:1 or 1+1 protection schemes). In addition, the working
+ and the protection paths should be disjoint; i.e., the physical
+ routes of the working and the protection paths should be
+ physically diverse in every respect.
+
+ Note that if the resources of the protection path are less than those
+ of the working path, the protection path may not have sufficient
+ resources to protect the traffic of the working path.
+
+ As mentioned in Section 4.3.2, the resources of the protection path
+ may be shared as 1:n. In this scenario, the protection path will not
+ have sufficient resources to protect all the working paths at a
+ specific time.
+
+ For bidirectional P2P paths, both unidirectional and bidirectional
+ protection switching are supported. If a defect occurs when
+ bidirectional protection switching is defined, the protection actions
+ are performed in both directions (even if the defect is
+ unidirectional). This requires coordination of the protection state
+ between the end points of the protection domain.
+
+ In unidirectional protection switching, the protection actions are
+ only performed in the affected direction.
+
+ Revertive and non-revertive operations are provided as options for
+ the network operator.
+
+ Linear protection supports the protection schemes described in the
+ following sub-sections.
+
+4.7.1. 1:n Linear Protection
+
+ In the 1:1 scheme, a protection path is allocated to protect against
+ a defect, failure, or a degradation in a working path. As described
+ above, to guarantee protection, the protection entity should support
+ the full capacity and bandwidth, although it may be configured (for
+ example, because of limited network resource availability) to offer a
+ degraded service when compared with the working entity.
+
+ Figure 1 presents the 1:1 protection architecture. In normal conditions,
+ data traffic is transmitted over the working entity, while the
+ protection entity functions in the idle state. (OAM may run on the
+ protection entity to verify its state.) Normal conditions are
+ defined when there is no defect, failure, or degradation on the
+ working entity, and no administrative configuration or request causes
+ traffic to flow over the protection entity.
+
+   |-----------------Protection Domain---------------|
+
+              ==============================
+             /**********Working path***********\
+  +--------+ ============================== +--------+
+  |  Node /|                                |\ Node  |
+  |   A  {< |                              | >}  B   |
+  |        |                                |        |
+  +--------+ ============================== +--------+
+                    Protection path
+              ==============================
+
+ Figure 1: 1:1 Protection Architecture
+
+ If there is a defect on the working entity or a specific
+ administrative request, traffic is switched to the protection entity.
+
+ Note that when operating with non-revertive behavior (see Section
+ 4.3.5), after the conditions causing the switchover have been
+ cleared, the traffic continues to flow on the protection path, but
+ the working and protection roles are not switched.
+
+ In each transmission direction, the protection domain source bridges
+ traffic onto the appropriate entity, while the sink selects traffic
+ from the appropriate entity. The source and the sink need to
+ coordinate the protection states to ensure that bridging and
+ selection are performed to and from the same entity. For this
+ reason, a signaling coordination protocol (either a data-plane in-
+ band signaling protocol or a control-plane-based signaling protocol)
+ is required.
+
+ In bidirectional protection switching, both ends of the protection
+ domain are switched to the protection entity (even when the fault is
+ unidirectional). This requires a protocol to coordinate the
+ protection state between the two end points of the protection domain.
+
+ When there is no defect, the bandwidth resources of the idle entity
+ may be used for traffic with lower priority. When protection
+ switching is performed, the traffic with lower priority may be
+ preempted by the protected traffic: the lower-priority LSP may be
+ torn down, a fault may be reported on it, or its traffic may be
+ treated as best effort and discarded when there is congestion.
+
+ In the general case of 1:n linear protection, one protection entity
+ is allocated to protect n working entities. The protection entity
+ might not have sufficient resources to protect all the working
+ entities that may be affected by fault conditions at a specific time.
+ In this case, in order to guarantee protection, the protection
+ entity should support enough capacity and bandwidth to protect any
+ one of the n working entities.
+
+ When defects or failures occur along multiple working entities, the
+ entity to be protected should be prioritized. The protection states
+ between the edges of the protection domain should be fully
+ coordinated to ensure consistent behavior. As explained in Section
+ 4.3.5, revertive behavior is recommended when 1:n is supported.
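+
+ The prioritization just described can be sketched in a few lines of
+ Python. This is illustrative only (the entity representation and
+ state names are hypothetical); it selects which failed working
+ entity claims the single protection entity of a 1:n group:
+
+    def select_protected_entity(working_entities):
+        """Pick the failed working entity that takes the protection
+        entity; lower 'priority' value = more important.  Returns the
+        name of the entity to switch, or None."""
+        candidates = [w for w in working_entities if w['state'] != 'ok']
+        if not candidates:
+            return None              # no fault: protection stays idle
+        # Only one entity can use the protection resources at a time.
+        return min(candidates, key=lambda w: w['priority'])['name']
+
+    group = [{'name': 'W1', 'priority': 2, 'state': 'failed'},
+             {'name': 'W2', 'priority': 1, 'state': 'ok'},
+             {'name': 'W3', 'priority': 3, 'state': 'degraded'}]
+    print(select_protected_entity(group))   # -> 'W1'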
+
+4.7.2. 1+1 Linear Protection
+
+ In the 1+1 protection scheme, a fully dedicated protection entity is
+ allocated.
+
+ As depicted in Figure 2, data traffic is copied and fed at the source
+ to both the working and the protection entities. The traffic on the
+ working and the protection entities is transmitted simultaneously to
+ the sink of the protection domain, where selection between the
+ working and protection entities is performed (based on some
+ predetermined criteria).
+
+   |---------------Protection Domain---------------|
+
+              ==============================
+             /**********Working path************\
+  +--------+ ============================== +--------+
+  |  Node /|                                |\ Node  |
+  |   A  {< |                              | >}  Z   |
+  |       \|                                |/       |
+  +--------+ ============================== +--------+
+             \**********Protection path*********/
+              ==============================
+
+ Figure 2: 1+1 Protection Architecture
+
+ Note that control traffic between the edges of the protection domain
+ (such as OAM or a control protocol to coordinate the protection
+ state, etc.) may be transmitted on an entity that differs from the
+ one used for the protected traffic. These packets should not be
+ discarded by the sink.
+
+ In 1+1 unidirectional protection switching, there is no need to
+ coordinate the protection state between the protection controllers at
+ both ends of the protection domain. In 1+1 bidirectional protection
+ switching, a protocol is required to coordinate the protection state
+ between the edges of the protection domain.
+
+ In both protection schemes, once the conditions causing the
+ switchover have been cleared, traffic can again flow end-to-end on
+ the working entity. If reversion is enabled, data selection returns
+ to the working entity; this requires coordination of the protection
+ state between the edges of the protection domain. To avoid frequent
+ switching caused by intermittent defects or failures when the network
+ is not stable, traffic is not selected from the working entity before
+ the Wait-To-Restore (WTR) timer has expired.
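+
+ A minimal sketch of the WTR-guarded reversion described above,
+ assuming a polling implementation (the class and method names are
+ hypothetical; WTR values on the order of minutes are common in
+ transport networks):
+
+    import time
+
+    class RevertiveSelector:
+        """Sketch of WTR-guarded reversion at the sink; illustrative."""
+
+        def __init__(self, wtr_seconds=300):
+            self.wtr = wtr_seconds
+            self.selected = 'working'
+            self.recovered_at = None          # when working became OK
+
+        def on_working_state(self, ok):
+            if not ok:
+                self.selected = 'protection'
+                self.recovered_at = None      # a new fault resets WTR
+            elif self.recovered_at is None:
+                self.recovered_at = time.monotonic()  # start WTR timer
+
+        def poll(self):
+            """Revert only after the working entity stays OK for WTR."""
+            if (self.selected == 'protection'
+                    and self.recovered_at is not None
+                    and time.monotonic() - self.recovered_at >= self.wtr):
+                self.selected = 'working'
+            return self.selected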
+
+4.7.3. P2MP Linear Protection
+
+ Linear protection may be applied to protect unidirectional P2MP
+ entities using 1+1 protection architecture. The source/root MPLS-TP
+ node bridges the user traffic to both the working and protection
+ entities. Each sink/leaf MPLS-TP node selects the traffic from one
+ entity according to some predetermined criteria. Note that when
+ there is a fault condition on one of the branches of the P2MP path,
+ some leaf MPLS-TP nodes may select the working entity, while other
+ leaf MPLS-TP nodes may select traffic from the protection entity.
+
+ In a 1:1 P2MP protection scheme, the source/root MPLS-TP node needs
+ to identify the existence of a fault condition on any of the branches
+ of the network. This means that the sink/leaf MPLS-TP nodes need to
+ notify the source/root MPLS-TP node of any fault condition. This
+ also necessitates a return path from the sinks/leaves to the
+ source/root MPLS-TP node. When protection switching is triggered,
+ the source/root MPLS-TP node selects the protection transport path
+ for traffic transfer.
+
+ A form of "segment recovery for P2MP LSPs" could be constructed.
+ Given a P2MP LSP, one can protect any possible point of failure (link
+ or node) using N backup P2MP LSPs. Each backup P2MP LSP originates
+ from the upstream node with respect to a different possible failure
+ point and terminates at all of the destinations downstream of the
+ potential failure point. In case of a failure, traffic is redirected
+ to the backup P2MP path.
+
+ Note that such mechanisms do not yet exist, and their exact behavior
+ is for further study.
+
+ A 1:n protection scheme for P2MP transport paths is also required by
+ [RFC5654]. Such a mechanism is for future study.
+
+4.7.4. Triggers for the Linear Protection Switching Action
+
+ Protection switching may be performed when:
+
+ o A defect condition is detected on the working entity, and the
+ protection entity has no defect condition or a less severe one.
+ Proactive in-band OAM Continuity Check and Connectivity Verification
+ (CC-V) monitoring of both the working and the protection entities
+ may be used to enable the rapid detection of a fault condition. For
+ protection switching, it is common to send CC-V messages every
+ 3.33 ms; in the absence of three consecutive CC-V messages, a fault
+ condition is declared (see the sketch after this list). In order to
+ monitor the working and the protection entities, an OAM Maintenance
+ Entity Group should be defined for each entity. OAM indications
+ associated with fault conditions should be provided at the edges of
+ the protection domain, which are responsible for the
+ protection-switching operation. Input from OAM performance
+ monitoring that indicates degradation in the working entity may also
+ be used as a trigger for protection switching. In the case of
+ degradation, switching to the protection entity is needed only if
+ the protection entity can offer better operating conditions.
+
+ o An indication is received from a lower-layer server that there is
+ a defect in the lower layer.
+
+ o An external operator command is received (e.g., 'Forced Switch',
+ 'Manual Switch'). For details, see Section 6.1.2.
+
+ o A request to switch over is received from the far end. The far
+ end may initiate this request, for example, on receipt of an
+ administrative request to switch over, or when bidirectional 1:1
+ protection switching is supported and a defect occurred that could
+ only be detected by the far end, etc.
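+
+ The CC-V timing from the first trigger above can be illustrated with
+ a short sketch (hypothetical names; real implementations derive the
+ detection time from the negotiated CC-V interval): loss of
+ continuity is declared when no CC-V message has arrived within three
+ transmission intervals, i.e., roughly 10 ms at the 3.33-ms rate.
+
+    import time
+
+    CCV_INTERVAL = 0.00333            # CC-V sent every 3.33 ms
+    LOC_THRESHOLD = 3 * CCV_INTERVAL  # 3 missed messages => defect
+
+    class CcvMonitor:
+        """Sketch of sink-side loss-of-continuity detection."""
+
+        def __init__(self):
+            self.last_rx = time.monotonic()
+
+        def on_ccv_received(self):
+            self.last_rx = time.monotonic()
+
+        def defect(self):
+            # Declared when three consecutive CC-V messages are missing
+            # (~10 ms of silence); this triggers protection switching.
+            return time.monotonic() - self.last_rx > LOC_THRESHOLD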
+
+ As described above, the protection state should be coordinated
+ between the end points of the protection domain. Control messages
+ should be exchanged between the edges of the protection domain to
+ coordinate the protection state of the edge nodes. Control messages
+ can be delivered using an in-band, data-plane-driven control protocol
+ or a control-plane-based protocol.
+
+ For 50-ms protection switching, it is recommended that an in-band,
+ data-plane-driven signaling protocol be used in order to coordinate
+ the protection states. An in-band, data-plane protocol for use in
+ MPLS-TP networks is documented in [MPLS-TP-LP] for linear protection
+ (ring protection is discussed in Section 4.8 of this document). This
+ protocol is also used to detect mismatches between the configurations
+ provisioned at the ends of the protection domain.
+
+ As described in Section 6.5, the GMPLS control plane already includes
+ procedures and message elements to coordinate the protection states
+ between the edges of the protection domain. These procedures and
+ protocol messages are specified in [RFC4426], [RFC4872], and
+ [RFC4873]. However, these messages lack the capability to coordinate
+ the revertive/non-revertive behavior and the consistency of
+ configured timers at the edges of the protection domain (timers such
+ as WTR, hold-off timer, etc.).
+
+4.7.5. Applicability of Linear Protection for LSP Segments
+
+ In order to implement data-plane-based linear protection on LSP
+ segments, use is made of the Sub-Path Maintenance Element (SPME), an
+ MPLS-TP architectural element defined in [RFC5921]. Maintenance
+ operations (e.g., monitoring, protection, or management) rely on the
+ transmission of messages (e.g., OAM, Protection Path Coordination,
+ etc.) within the maintained domain. Further discussion of the
+ architecture for OAM and SPME is found in [RFC5921] and [RFC6371].
+ An SPME is an LSP that is defined and used specifically for the
+ purposes of OAM monitoring, protection, or management of LSP
+ segments. The SPME uses
+ the MPLS construct of a hierarchical, nested LSP, as defined in
+ [RFC3031].
+
+ For linear protection, SPMEs should be defined over the working and
+ protection entities between the edges of a protection domain. OAM
+ messages and messages used to coordinate protection state can be
+ initiated at the edge of the SPME and sent to the peer edge of the
+ SPME. Note that these messages are sent over the Generic Associated
+ Channel (G-ACh) within the SPME, and that they use a two-label
+ stack: the SPME label and, at the bottom of the stack, the G-ACh
+ Label (GAL) [RFC5586].
+
+ The end-to-end traffic of the LSP, which includes data traffic and
+ control traffic (messages for OAM, management, signaling, and to
+ coordinate protection state), is tunneled within the SPMEs by means
+ of label stacking, as defined in [RFC3031].
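+
+ To make the label stacking concrete: a G-ACh message within the SPME
+ carries the SPME label with the GAL beneath it at the bottom of the
+ stack, while end-to-end LSP traffic carries the SPME label above the
+ LSP's own label. The sketch below is illustrative only (the tuple
+ encoding is hypothetical; the GAL value 13 is assigned by
+ [RFC5586]):
+
+    GAL = 13    # G-ACh Label value assigned by [RFC5586]
+
+    def gach_message_stack(spme_label, ttl=255):
+        """Two-entry stack for an OAM/protection-coordination message
+        sent within the SPME; entries are (label, tc, bos, ttl)."""
+        return [(spme_label, 0, False, ttl),   # SPME label on top
+                (GAL,        0, True,  1)]     # GAL at bottom of stack
+
+    def tunneled_traffic_stack(spme_label, lsp_label, ttl=255):
+        """End-to-end LSP traffic tunneled through the SPME by label
+        stacking, as described above."""
+        return [(spme_label, 0, False, ttl),   # pushed at SPME ingress
+                (lsp_label,  0, True,  ttl)]   # protected LSP's label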
+
+ Mapping between an LSP and an SPME can be 1:1; this is similar to the
+ ITU-T Tandem Connection element that defines a sub-layer
+ corresponding to a segment of a path. Mapping can also be 1:n to
+ allow the scalable protection of a set of LSP segments traversing the
+ part of the network in which a protection domain is defined. Note
+ that each of these LSPs can be initiated or terminated at different
+ end points in the network, but that they all traverse the protection
+ domain and share similar constraints (such as requirements for
+ quality of service (QoS), terms of protection, etc.).
+
+ Note also that in the context of segment protection, the SPMEs serve
+ as the working and protection entities.
+
+4.7.6. Shared Mesh Protection
+
+ For shared mesh protection, the protection resources are used to
+ protect multiple LSPs that do not all share the same end points; for
+ example, in Figure 3 there are two paths, ABCDE and VWXYZ. These
+ paths do not share end points and cannot, therefore, make use of 1:n
+ linear protection, even though they do not have any common points of
+ failure.
+
+ ABCDE may be protected by the path APQRE, while VWXYZ can be
+ protected by the path VPQRZ. In both cases, 1:1 or 1+1 protection
+ may be used. However, it can be seen that if 1:1 protection is used
+ for both paths, the PQR network segment does not carry traffic when
+ no failures affect either of the two working paths. Furthermore, in
+ the event of only one failure, the PQR segment carries traffic from
+ only one of the working paths.
+
+ Thus, it is possible for the network resources on the PQR segment to
+ be shared by the two recovery paths. In this way, mesh protection
+ can substantially reduce the number of network resources that have to
+ be reserved in order to provide 1:n protection.
+
+       A----B----C----D----E
+        \                 /
+         \               /
+          \             /
+           P-----Q-----R
+          /             \
+         /               \
+        /                 \
+       V----W----X----Y----Z
+
+ Figure 3: A Shared Mesh Protection Topology
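+
+ The saving can be quantified: on the shared segment, enough
+ bandwidth need only be reserved for the largest of the protected
+ flows (or, more generally, for the n largest if n simultaneous
+ faults must be survived), not for their sum. A hypothetical sketch
+ with illustrative bandwidth values:
+
+    def shared_segment_reservation(protected_bw, simultaneous_faults=1):
+        """Bandwidth to reserve on a shared protection segment (P-Q-R).
+
+        protected_bw: bandwidths of the working paths protected over
+        the segment (here, ABCDE and VWXYZ). With single-fault
+        protection the segment carries at most one displaced flow.
+        """
+        worst = sorted(protected_bw, reverse=True)[:simultaneous_faults]
+        return sum(worst)
+
+    # Dedicated 1:1 for each path would reserve 10 + 7 = 17 units on
+    # P-Q-R; sharing against one fault reserves only max(10, 7) = 10.
+    print(shared_segment_reservation([10, 7]))   # -> 10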
+
+ As the network becomes more complex and the number of LSPs increases,
+ the potential for shared mesh protection also increases. However,
+ this can quickly become unmanageable owing to the increased
+ complexity. Therefore, shared mesh protection is normally pre-
+ planned and configured by the operator, although an automated system
+ cannot be ruled out.
+
+ Note that shared mesh protection operates as 1:n linear protection
+ (see Section 4.7.1). However, the protection state needs to be
+ coordinated between a larger number of nodes: the end points of the
+ shared concatenated protection segment (nodes P and R in the example)
+ as well as the end points of the protected LSPs (nodes A, E, V, and Z
+ in the example).
+
+ Additionally, note that the shared-protection resources could be used
+ to carry extra traffic. For example, in Figure 4, an LSP JPQRK could
+ be a preemptable LSP that constitutes extra traffic over the PQR
+ hops; it would be displaced when protection switching occurs. In
+ this case, it should be noted that the protection state must also be
+ coordinated with the ends of the extra-traffic LSPs.
+
+       A----B----C----D----E
+        \                 /
+         \               /
+          \             /
+      J-----P-----Q-----R-----K
+          /             \
+         /               \
+        /                 \
+       V----W----X----Y----Z
+
+ Figure 4: Shared Mesh Protection with Extra Traffic
+
+4.8. Ring Networks
+
+ Several service providers have expressed great interest in the
+ operation of MPLS-TP in ring topologies; they demand a high degree of
+ survivability functionality in these topologies.
+
+ Various criteria for optimization are considered in ring topologies,
+ such as:
+
+ 1. Simplification in ring operation in terms of the number of OAM
+ Maintenance Entities that are needed to trigger the recovery
+ actions, the number of recovery elements, the number of
+ management-plane transactions during maintenance operations, etc.
+
+ 2. Optimization of resource consumption around the ring, such as the
+ number of labels needed for the protection paths that traverse
+ the network, the total bandwidth required in the ring to ensure
+ path protection, etc. (see R91 of [RFC5654]).
+
+ [RFC5654] introduces a list of requirements for ring protection
+ covering the recovery mechanisms needed to protect traffic in a
+ single ring as well as traffic that traverses more than one ring.
+ Note that the configuration and operation of the recovery mechanisms
+ in a ring must scale well with the number of transport paths, the
+ number of nodes, and the number of ring interconnects.
+
+
+ The requirements for ring protection are fully compatible with the
+ generic requirements for recovery.
+
+ The architecture and the mechanisms for ring protection are specified
+ in separate documents. These mechanisms need to be evaluated against
+ the requirements specified in [RFC5654], which includes guidance on
+ the principles for the development of new mechanisms.
+
+4.9. Recovery in Layered Networks
+
+ In multi-layer or multi-regional networking [RFC5212], recovery may
+ be performed at multiple layers or across nested recovery domains.
+
+ The MPLS-TP recovery mechanism must ensure that the timing of
+ recovery is coordinated in order to avoid race scenarios. This also
+ allows the recovery mechanism of the server layer to fix the problem
+ before recovery takes place in the MPLS-TP layer, or the MPLS-TP
+ layer to perform recovery before a client network.
+
+ A hold-off timer is required to coordinate recovery timing in
+ multiple layers or across nested recovery domains. Setting this
+ configurable timer involves a trade-off between rapid recovery and
+ the creation of a race condition where multiple layers respond to the
+ same fault, potentially allocating resources in an inefficient
+ manner. Thus, the detection of a defect condition in the MPLS-TP
+ layer should not immediately trigger the recovery process if the
+ hold-off timer is configured with a value other than zero. Instead,
+ the hold-off timer should be started when the defect is detected and,
+ on expiry, the recovery element should be checked to determine
+ whether the defect condition still exists. If it does exist, the
+ defect triggers the recovery operation.
+
+ The hold-off timer should be configurable.
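+
+ A minimal sketch of the hold-off behavior described above, assuming
+ a timer-per-defect implementation (all names are hypothetical):
+
+    import threading
+
+    class HoldOffTimer:
+        """Sketch of hold-off coordination for multi-layer recovery."""
+
+        def __init__(self, hold_off_seconds, defect_present,
+                     trigger_recovery):
+            self.hold_off = hold_off_seconds      # 0 => recover at once
+            self.defect_present = defect_present  # callable: re-check
+            self.trigger_recovery = trigger_recovery
+
+        def on_defect_detected(self):
+            if self.hold_off == 0:
+                self.trigger_recovery()
+                return
+            # Wait; the server layer may repair the fault meanwhile.
+            threading.Timer(self.hold_off, self._expired).start()
+
+        def _expired(self):
+            if self.defect_present():   # still faulty after hold-off
+                self.trigger_recovery()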
+
+ In other configurations, where the lower layer does not have a
+ restoration capability, or where it is not expected to provide
+ protection, the lower layer needs to trigger the higher layer to
+ immediately perform recovery. Although this can be forced by
+ configuring the hold-off timer as zero, it may be that because of
+ layer independence, the higher layer does not know whether the lower
+ layer will perform restoration. In this case, the higher layer will
+ configure a non-zero hold-off timer and rely on the receipt of a
+ specific notification from the lower layer if the lower layer cannot
+ perform restoration. Since layer boundaries are always within nodes,
+ such coordination is implementation-specific and does not need to be
+ covered here.
+
+
+ Reference should be made to [RFC3386], which discusses the
+ interaction between layers in survivable networks.
+
+4.9.1. Inherited Link-Level Protection
+
+ Where a link in the MPLS-TP network is formed through connectivity
+ (i.e., a packet or non-packet LSP) in a lower-layer network, that
+ connectivity may itself be protected; for example, the LSP in the
+ lower-layer network may be provisioned with 1+1 protection. In this
+ case, the link in the MPLS-TP network has an inherited grade of
+ protection.
+
+ An LSP in the MPLS-TP network may be provisioned with protection in
+ the MPLS-TP network, as already described, or it may be provisioned
+ to utilize only those links that have inherited protection.
+
+ By classifying the links in the MPLS-TP network according to the
+ grade of protection that they inherited from the server network, it
+ is possible to compute an end-to-end path in the MPLS-TP network that
+ uses only those links with a specific or superior grade of inherited
+ protection. This means that the end-to-end MPLS-TP LSP can be
+ protected at the grade necessary to conform to the SLA without
+ needing to provide any additional protection in the MPLS-TP layer.
+ This reduces complexity, saves network resources, and eliminates
+ protection-switching coordination problems.
+
+ When the requisite grade of inherited protection is not available on
+ all segments along the path in the MPLS-TP network, segment
+ protection may be used to achieve the desired protection grade.
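+
+ Path computation can enforce this by pruning MPLS-TP links whose
+ inherited grade falls below the grade the SLA requires, before
+ running any normal path-selection algorithm. A hypothetical sketch
+ (the grade names and their ordering are illustrative):
+
+    # Illustrative grade ordering; higher value = stronger protection.
+    GRADE = {'unprotected': 0, 'restorable': 1, '1:1': 2, '1+1': 3}
+
+    def prune_by_inherited_grade(links, required):
+        """Keep only MPLS-TP links whose server-layer protection meets
+        the SLA.  links: iterable of (node_a, node_b, grade) tuples.
+        The surviving topology is then given to any path computation;
+        note that node failures are NOT covered this way."""
+        floor = GRADE[required]
+        return [(a, b) for a, b, g in links if GRADE[g] >= floor]
+
+    topo = [('A', 'B', '1+1'), ('B', 'C', '1:1'),
+            ('C', 'D', 'unprotected')]
+    print(prune_by_inherited_grade(topo, '1:1'))
+    # -> [('A', 'B'), ('B', 'C')]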
+
+ It should be noted, however, that inherited protection only applies
+ to links. Nodes cannot be protected in this way. An operator will
+ need to perform an analysis of the relative likelihood and
+ consequences of node failure if this approach is taken without
+ providing protection in the MPLS-TP LSP or PW layer to handle node
+ failure.
+
+4.9.2. Shared Risk Groups
+
+ When an MPLS-TP protection scheme is established, it is important
+ that the working and protection paths do not share resources in the
+ network. If this is not achieved, a single defect may affect both
+ the working and the protection paths, with the result that traffic
+ cannot be delivered; under such a condition, the traffic is
+ effectively unprotected.
+
+
+ Note that this restriction does not apply to restoration, since this
+ takes place after the fault has occurred, which means that the point
+ of failure can be avoided if an available path exists.
+
+ When planning a recovery scheme, it is possible to use a topology map
+ of the MPLS-TP layer to select paths that use diverse links and nodes
+ within the MPLS-TP network. However, this does not guarantee that
+ the paths are truly diverse; for example, two separate links in an
+ MPLS-TP network may be provided by two lambdas in the same optical
+ fiber, or by two fibers that cross the same bridge. Moreover, two
+ completely separate MPLS-TP nodes might be situated in the same
+ building with a shared power supply.
+
+ Thus, in order to achieve proper recovery planning, the MPLS-TP
+ network must have an understanding of the groups of lower-layer
+ resources that share a common risk of failure. From this, MPLS-TP
+ shared risk groups can be constructed that show which MPLS-TP
+ resources share a common risk of failure. Diversity of working and
+ protection paths can be planned, not only with regard to nodes and
+ links but also in order to refrain from using resources from the same
+ shared risk groups.
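+
+ Diversity checking then compares the shared risk groups traversed by
+ the two paths rather than only their links and nodes. A minimal
+ sketch, assuming each resource is annotated with the set of SRLG
+ identifiers it belongs to (all names hypothetical):
+
+    def srlg_disjoint(working_resources, protection_resources, srlg_of):
+        """True if the two paths share no shared-risk-group membership.
+
+        srlg_of: mapping from a resource (link or node) to the set of
+        SRLG identifiers it belongs to (fiber, duct, power supply...).
+        """
+        working = set().union(*(srlg_of[r] for r in working_resources))
+        protect = set().union(*(srlg_of[r] for r in protection_resources))
+        return not (working & protect)
+
+    # Two node/link-diverse paths that still share a fiber duct (SRLG 7):
+    srlgs = {'L1': {7}, 'L2': {9}, 'L3': {7}, 'L4': {11}}
+    print(srlg_disjoint(['L1', 'L2'], ['L3', 'L4'], srlgs))  # -> False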
+
+4.9.3. Fault Correlation
+
+ In a layered network, a low-layer fault may be detected and reported
+ by multiple layers and may sometimes lead to the generation of
+ multiple fault reports from the same layer. For example, a failure
+ of a data link may be reported by the line cards in an MPLS-TP node,
+ but it could also be detected and reported by the MPLS-TP OAM.
+
+ Section 4.9 explains the importance of coordinating the
+ survivability actions configured and operated in a multi-layer
+ network, in a way that avoids over-provisioning the survivability
+ resources in the network while ensuring that recovery actions are
+ performed in only one layer at a time.
+
+ Fault correlation is about understanding which single event has
+ generated a set of fault reports, so that recovery actions can be
+ coordinated, and so that the fault logging system does not become
+ overloaded. Fault correlation depends on understanding resource use
+ at lower layers, shared risk groups, and a wider view with regard to
+ the way in which the layers are interrelated.
+
+ Fault correlation is most easily performed at the point of fault
+ detection; for example, an MPLS-TP node that receives a fault
+ notification from the lower layer, and detects a fault on an LSP in
+ the MPLS-TP layer, can easily correlate these two events.
+ Furthermore, if the same node detects multiple faults on LSPs that
+ share the same faulty data link, it can easily correlate them. Such
+ a node may use correlation to perform group-based recovery actions
+ and can reduce the number of alarm events that it generates to its
+ management station.
+
+ Fault correlation may also be performed at a management station that
+ receives fault reports from different layers and different nodes in
+ the network. This enables the management station to coordinate
+ management-originated recovery actions and to present consolidated
+ fault information to the user and automated management systems.
+
+ It is also necessary to correlate fault information detected and
+ reported through OAM. This function would enable a fault detected at
+ a lower layer, and reported at a transit node of an MPLS-TP LSP, to
+ be correlated with an MPLS-TP-layer fault detected at a Maintenance
+ End Point (MEP) -- for example, the egress of the MPLS-TP LSP. Such
+ correlation allows the coordination of recovery actions performed at
+ the MEP, but it also requires that the lower-layer fault information
+ be propagated to the MEP, which is most easily achieved using a
+ control plane, management plane, or OAM message.
+
+5. Applicability and Scope of Survivability in MPLS-TP
+
+ The MPLS-TP network can be viewed as two layers (the MPLS LSP layer
+ and the PW layer). The MPLS-TP network operates over data-link
+ connections and data-link networks whereby the MPLS-TP links are
+ provided by individual data links or by connections in a lower-layer
+ network. The MPLS LSP layer is a mandatory part of the MPLS-TP
+ network, while the PW layer is an optional addition for supporting
+ specific services.
+
+ MPLS-TP survivability provides recovery from failure of the links and
+ nodes in the MPLS-TP network. The link defects and failures are
+ typically caused by defects or failures in the underlying data-link
+ connections and networks, but this section is only concerned with
+ recovery actions performed in the MPLS-TP network, which must recover
+ from the manifestation of any problem as a defect or failure in the
+ MPLS-TP network.
+
+ This section lists the recovery elements (see Section 1) supported in
+ each of the two layers that can recover from defects or failures of
+ nodes or links in the MPLS-TP network.
+
+ +--------------+---------------------+------------------------------+
+ | Recovery | MPLS LSP Layer | PW Layer |
+ | Element | | |
+ +--------------+---------------------+------------------------------+
+ | Link | MPLS LSP recovery | The PW layer is not aware of |
+ | Recovery | can be used to | the underlying network. |
+ | | survive the failure | This function is not |
+ | | of an MPLS-TP link. | supported. |
+ +--------------+---------------------+------------------------------+
+ | Segment/Span | An individual LSP | For an SS-PW, segment |
+ | Recovery | segment can be | recovery is the same as |
+ | | recovered to | end-to-end recovery. |
+ | | survive the failure | Segment recovery for an MS-PW|
+ | | of an MPLS-TP link. | is for future study, and |
+ | | | this function is now |
+ | | | provided using end-to-end |
+ | | | recovery. |
+ +--------------+---------------------+------------------------------+
+ | Concatenated | A concatenated LSP | Concatenated segment |
+ | Segment | segment can be | recovery (in an MS-PW) is for|
+ | Recovery | recovered to | future study, and this |
+ | | survive the failure | function is now provided |
+ | | of an MPLS-TP link | using end-to-end recovery. |
+ | | or node. | |
+ +--------------+---------------------+------------------------------+
+ | End-to-End | An end-to-end LSP | End-to-end PW recovery can |
+ | Recovery | can be recovered to | be applied to survive any |
+ | | survive any node or | node (including S-PE) or |
+ | | link failure, | link failure, except for |
+ | | except for the | failure of the ingress or |
+ | | failure of the | egress T-PE. |
+ | | ingress or egress | |
+ | | node. | |
+ +--------------+---------------------+------------------------------+
+ | Service | The MPLS LSP layer | PW-layer service recovery |
+ | Recovery | is service- | requires surviving faults in |
+ | | agnostic. This | T-PEs or on Attachment |
+ | | function is not | Circuits (ACs). This is |
+ | | supported. | currently out of scope for |
+ | | | MPLS-TP. |
+ +--------------+---------------------+------------------------------+
+
+ Table 1: Recovery Elements Supported
+ by the MPLS LSP Layer and PW Layer
+
+ Section 6 provides a description of mechanisms for MPLS-TP-LSP
+ survivability. Section 7 provides a brief overview of mechanisms for
+ MPLS-TP-PW survivability.
+
+
+6. Mechanisms for Providing Survivability for MPLS-TP LSPs
+
+ This section describes the existing mechanisms that provide LSP
+ protection within MPLS-TP networks and highlights areas where new
+ work is required.
+
+6.1. Management Plane
+
+ As described above, a fundamental requirement of MPLS-TP is that
+ recovery mechanisms should be capable of functioning in the absence
+ of a control plane. Recovery may be triggered by MPLS-TP OAM fault
+ management functions or by external requests (e.g., an operator's
+ request for manual control of protection switching). Recovery LSPs
+ (and in particular Restoration LSPs) may be provisioned through the
+ management plane.
+
+ The management plane may be used to configure the recovery domain by
+ setting the reference end points (which control the recovery
+ actions), the working and the recovery entities, and the recovery
+ type (e.g., 1:1 bidirectional linear protection, ring protection,
+ etc.).
+
+ Additional parameters associated with the recovery process (such as
+ WTR and hold-off timers, revertive/non-revertive operation, etc.) may
+ also be configured.
+
+ In addition, the management plane may initiate manual control of the
+ recovery function. A priority should be set for the fault conditions
+ and the operator's requests.
+
+ Since provisioning the recovery domain involves the selection of a
+ number of options, mismatches may occur at the different reference
+ points. The MPLS-TP protocol to coordinate protection state, which
+ is specified in [MPLS-TP-LP], may be used as an in-band (i.e., data-
+ plane-based) control protocol to coordinate the protection states
+ between the end points of the recovery domain, and to check the
+ consistency of configured parameters (such as timers, revertive/non-
+ revertive behavior, etc.); discovered inconsistencies are reported
+ to the operator.
+
+ It should also be possible for the management plane to track the
+ recovery status by receiving reports or by issuing polls.
+
+
+6.1.1. Configuration of Protection Operation
+
+ To implement the protection-switching mechanisms, the following
+ entities and information should be configured and provisioned (a
+ configuration sketch follows the list):
+
+ o The end points of a recovery domain. As described above, these
+ end points border on the element to which recovery is applied.
+
+ o The protection group, which, depending on the required protection
+ scheme, consists of a recovery entity and one or more working
+ entities. In 1:1 or 1+1 P2P protection, the paths of the working
+ entity and the recovery entities must be physically diverse in
+ every respect (i.e., not share any resources or physical
+ locations), in order to guarantee protection.
+
+ o As described in Section 4.7.5, the SPME must be supported in order
+ implement data-plane-based LSP segment recovery, since related
+ control messages (e.g., for OAM, Protection Path Coordination,
+ etc.) can be initiated and terminated at the edges of a path where
+ push and pop operations are enabled. The SPME is an end-to-end
+ LSP that in this context corresponds to the recovery entities
+ (working and protection) and makes use of the MPLS construct of
+ hierarchical nested LSP, as defined in [RFC3031]. OAM messages
+ and messages to coordinate protection state can be initiated at
+ the edge of the SPME and sent over G-ACh to the peer edge of the
+ SPME. It is necessary to configure the related SPMEs and map
+ between the LSP segments being protected and the SPME. Mapping
+ can be 1:1 or 1:N to allow scalable protection of a set of LSP
+ segments traversing the part of the network in which a protection
+ domain is defined.
+
+ Note that each of these LSPs can be initiated or terminated at
+ different end points in the network, but that they all traverse
+ the protection domain and share similar constraints (such as
+ requirements for QoS, terms of protection, etc.).
+
+ o The protection type should be defined (e.g., unidirectional 1:1,
+ bidirectional 1+1, etc.).
+
+ o Revertive/non-revertive behavior should be configured.
+
+ o Timers (such as WTR, hold-off timer, etc.) should be set.
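+
+ Gathered together, the items above amount to a small configuration
+ record per protection domain. The following sketch is illustrative
+ only; the field names are hypothetical and are not taken from any
+ MPLS-TP MIB or data model:
+
+    from dataclasses import dataclass, field
+    from typing import List
+
+    @dataclass
+    class ProtectionDomainConfig:
+        """Illustrative record of the provisioned items listed above."""
+        end_points: List[str]              # edges of the recovery domain
+        working_entities: List[str]        # e.g., working SPME(s)
+        protection_entity: str             # pre-provisioned disjoint path
+        protection_type: str = '1:1-bidirectional'
+        revertive: bool = True             # recommended for 1:n
+        wtr_seconds: int = 300             # Wait-To-Restore timer
+        hold_off_seconds: float = 0.0      # 0 => switch immediately
+        spme_mapping: dict = field(default_factory=dict)  # segment->SPME
+
+    cfg = ProtectionDomainConfig(end_points=['LER-A', 'LER-B'],
+                                 working_entities=['SPME-W'],
+                                 protection_entity='SPME-P')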
+
+6.1.2. External Manual Commands
+
+ The following external, manual commands may be provided for manual
+ control of the protection-switching operation. These commands apply
+ to a protection group; they are listed in descending order of
+ priority (an arbitration sketch follows the list):
+
+ o Blocked protection action - a manual command to prevent data
+ traffic from switching to the recovery entity. This command
+ effectively disables the protection group.
+
+ o Force protection action - a manual command that forces a switch of
+ normal data traffic to the recovery entity.
+
+ o Manual protection action - a manual command that switches normal
+ data traffic to the recovery entity, but only when there is no
+ defect in the recovery entity.
+
+ o Clear switching command - the operator may request that a previous
+ administrative switch command (manual or force switch) be cleared.
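+
+ Because these commands and locally detected defects compete for the
+ same protection switch, an implementation evaluates outstanding
+ requests in priority order. A hypothetical arbitration sketch (the
+ ordering of the manual commands follows the list above; where defect
+ conditions rank is protocol-specific, with the exact rules given in
+ [MPLS-TP-LP]):
+
+    # Illustrative precedence, highest first; defect conditions are
+    # commonly ranked between 'force' and 'manual'.
+    PRIORITY = ['blocked', 'force', 'defect', 'manual']
+
+    def active_request(requests):
+        """Return the highest-priority outstanding request, or None.
+
+        requests: set of strings from PRIORITY; a 'clear' command
+        removes previously issued 'force'/'manual' commands before
+        this function is called.
+        """
+        for level in PRIORITY:
+            if level in requests:
+                return level
+        return None
+
+    print(active_request({'manual', 'defect'}))   # -> 'defect'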
+
+6.2. Fault Detection
+
+ Fault detection is a fundamental part of recovery and survivability.
+ In all schemes, with the exception of some types of 1+1 protection,
+ the actions required for the recovery of traffic delivery depend on
+ the discovery of some kind of fault. In 1+1 protection, the selector
+ (at the receiving end) may simply be configured to choose the better
+ signal; thus, it does not itself detect a fault or degradation, but
+ simply identifies the path that is better for data delivery.
+
+ Faults may be detected in a number of ways depending on the traffic
+ pattern and the underlying hardware. End-to-end faults may be
+ reported by the application or by knowledge of the application's data
+ pattern, but this is an unusual approach. There are two more common
+ mechanisms for detecting faults in the MPLS-TP layer:
+
+ o Faults reported by the lower layers.
+
+ o Faults detected by protocols within the MPLS-TP layer.
+
+ In an IP/MPLS network, the second mechanism may utilize control-plane
+ protocols (such as the routing protocols) to detect a failure of
+ adjacency between neighboring nodes. In an MPLS-TP network, it is
+ possible that no control plane will be present. Even if a control
+ plane is present, it will be a GMPLS control plane [RFC3945], which
+ logically separates control channels from data channels; thus, no
+ conclusion about the health of a data channel can be drawn from the
+ failure of an associated control channel. MPLS-TP-layer faults are,
+ therefore, only detected through the use of OAM protocols, as
+ described in Section 6.4.1.
+
+ Faults may, however, be reported by a lower layer. These generally
+ show up as interface failures or data-link failures (sometimes known
+ as connectivity failures) within the MPLS-TP network; for example, an
+ underlying optical link may detect loss of light and report a failure
+ of the MPLS-TP link that uses it. Alternatively, an interface card
+ failure may be reported to the MPLS-TP layer.
+
+ Faults reported by lower layers are only visible in specific nodes
+ within the MPLS-TP network (i.e., at the adjacent end points of the
+ MPLS-TP link). This would only allow recovery to be performed
+ locally, so, to enable recovery to be performed by nodes that are not
+ immediately local to the fault, the fault must be reported (Sections
+ 6.4.4 and 6.5.4).
+
+6.3. Fault Localization
+
+ If an MPLS-TP node detects that there is a fault in an LSP (that is,
+ not a network fault reported from a lower layer, but a fault detected
+ by examining the LSP), it can immediately perform a recovery action.
+ However, unless the location of the fault is known, the only
+ practical options are:
+
+ o Perform end-to-end recovery.
+
+ o Perform some other recovery as a speculative act.
+
+ Since the speculative acts are not guaranteed to achieve the desired
+ results and could consume resources unnecessarily, and since end-to-
+ end recovery can require a lot of network resources, it is important
+ to be able to localize the fault.
+
+ Fault localization may be achieved by dividing the network into
+ protection domains. Protection can then be operated on the LSP
+ segment belonging to the domain in which the fault is discovered.
+ This necessitates monitoring of the LSP at the domain edges.
+
+ Alternatively, a proactive mechanism of fault localization through
+ OAM (Section 6.4.3) or through the control plane (Section 6.5.3) is
+ required.
+
+ Fault localization is particularly important for restoration because
+ a new path must be selected that avoids the fault. It may not be
+ practical or desirable to select a path that avoids the entire failed
+ working path, and it is therefore necessary to isolate the fault's
+ location.
+
+6.4. OAM Signaling
+
+ MPLS-TP provides a comprehensive set of OAM tools for fault
+ management and performance monitoring at different nested levels
+ (end-to-end, a portion of a path (LSP or PW), and at the link level)
+ [RFC6371].
+
+ These tools support proactive and on-demand fault management (for
+ fault detection and fault localization) as well as performance
+ monitoring (to measure the quality of the signals and detect
+ degradation).
+
+ To support fast recovery, it is useful to use some of the proactive
+ tools to detect fault conditions (e.g., link/node failure or
+ degradation) and to trigger the recovery action.
+
+ The MPLS-TP OAM messages run in-band with the traffic and support
+ unidirectional and bidirectional P2P paths as well as P2MP paths.
+
+ As described in [RFC6371], MPLS-TP OAM operates in the context of a
+ Maintenance Entity that bounds the scope of the OAM responsibilities
+ and
+ represents the portion of a path between two points that is monitored
+ and maintained, and along which OAM messages are exchanged.
+ [RFC6371] refers also to a Maintenance Entity Group (MEG), which is a
+ collection of one or more Maintenance Entities (MEs) that belong to
+ the same transport path (e.g., P2MP transport path) and which are
+ maintained and monitored as a group.
+
+ An ME includes two MEPs (Maintenance Entity Group End Points) that
+ reside at the boundaries of an ME, and a set of zero or more MIPs
+ (Maintenance Entity Group Intermediate Points) that reside within the
+ Maintenance Entity along the path. A MEP is capable of initiating
+ and terminating OAM messages, and as such can only be located at the
+ edges of a path where push and pop operations are supported. In
+ order to define an ME over a portion of path, it is necessary to
+ support SPMEs.
+
+ The SPME is an end-to-end LSP that in this context corresponds to the
+ ME; it uses the MPLS construct of hierarchical nested LSPs, which is
+ defined in [RFC3031]. OAM messages can be initiated at the edge of
+ the SPME and sent over G-ACh to the peer edge of the SPME.
+
+ The related SPMEs must be configured, and mapping must be performed
+ between the LSP segments being monitored and the SPME. Mapping can
+ be 1:1 or 1:N to allow scalable operation. Note that each of these
+ LSPs can be initiated or terminated at different end points in the
+ network and can share similar constraints (such as requirements for
+ QoS, terms of protection, etc.).
+
+ With regard to recovery, where MPLS-TP OAM is supported, an OAM
+ Maintenance Entity Group is defined for each of the working and
+ protection entities.
+
+6.4.1. Fault Detection
+
+ MPLS-TP OAM tools may be used proactively to detect the following
+ fault conditions between MEPs:
+
+ o Loss of continuity and misconnectivity - the proactive Continuity
+ Check (CC) function is used to detect loss of continuity between
+ two MEPs in an MEG. The proactive Connectivity Verification (CV)
+ allows a sink MEP to detect a misconnectivity defect (e.g.,
+ mismerge or misconnection) with its peer source MEP when the
+ received packet carries an incorrect ME identifier. For
+ protection switching, it is common to run a CC-V (Continuity Check
+ and Connectivity Verification) message every 3.33 ms. In the
+ absence of three consecutive CC-V messages, loss of continuity is
+ declared and is notified locally to the edge of the recovery
+ domain in order to trigger a recovery action. In some cases, when
+ a slower recovery time is acceptable, it is also possible to
+ lengthen the interval between CC-V transmissions.
+
+ o Signal degradation - notification from OAM performance monitoring
+ indicating degradation in the working entity may also be used as a
+ trigger for protection switching. In the event of degradation,
+ switching to the recovery entity is necessary only if the recovery
+ entity can guarantee better conditions. Degradation can be
+ measured by proactively activating MPLS-TP OAM packet loss
+ measurement or delay measurement.
+
+ o A source MEP can receive a Remote Defect Indication from its peer
+ sink MEP and locally notify the end point of the recovery domain of
+ the fault condition, in order to trigger the recovery action.
+
+6.4.2. Testing for Faults
+
+ The management plane may be used to initiate the testing of links,
+ LSP segments, or entire LSPs.
+
+ MPLS-TP provides OAM tools that may be manually invoked on-demand for
+ a limited period, in order to troubleshoot links, LSP segments, or
+ entire LSPs (e.g., diagnostics, connectivity verification, packet
+ loss measurements, etc.). On-demand monitoring covers a combination
+ of "in-service" and "out-of-service" monitoring functions. Out-of-
+ service testing is supported by the OAM on-demand lock operation.
+ The lock operation temporarily disables the transport entity (LSP,
+ LSP segment, or link), preventing the transmission of all types of
+ traffic, with the exceptions of test traffic and OAM (dedicated to
+ the locked entity).
+
+ [RFC6371] describes the operations of the OAM functions that may be
+ initiated on-demand and provides some considerations.
+
+ MPLS-TP also supports in-service and out-of-service testing of the
+ recovery (protection and restoration) mechanism, the integrity of the
+ protection/recovery transport paths, and the coordination protocol
+ between the end points of the recovery domain. The testing operation
+ emulates a protection-switching request but does not perform the
+ actual switching action.
+
+6.4.3. Fault Localization
+
+ MPLS-TP provides OAM tools to determine the precise location of a
+ fault. Fault detection often only takes place at key
+ points in the network (such as at LSP end points or at MEPs). This
+ means that a fault may be located anywhere within a segment of the
+ relevant LSP. Finer information granularity is needed to implement
+ optimal recovery actions or to diagnose the fault. On-demand tools
+ like trace-route, loopback, and on-demand CC-V can be used to
+ localize a fault.
+
+ The information may be notified locally to the end point of the
+ recovery domain to allow implementation of optimal recovery action.
+ This may be useful for the re-calculation of a recovery path.
+
+ The information should also be reported to network management for
+ diagnostic purposes.
+
+6.4.4. Fault Reporting
+
+ The end points of a recovery domain should be able to detect fault
+ conditions in the recovery domain and to notify the management plane.
+
+ In addition, a node within a recovery domain that detects a fault
+ condition should also be able to report this to network management.
+ Network management should be capable of correlating the fault reports
+ and identifying the source of the fault.
+
+ MPLS-TP OAM tools support a function where an intermediate node along
+ a path is able to send an alarm report message to the MEP, indicating
+ the presence of a fault condition in the server layer that connects
+ it to its adjacent node. This capability allows a MEP to suppress
+ alarms that may be generated as a result of a failure condition in
+ the server layer.
+
+6.4.5. Coordination of Recovery Actions
+
+ As described above, in some cases (such as in bidirectional
+ protection switching, etc.) it is necessary to coordinate the
+ protection states between the edges of the recovery domain.
+ [MPLS-TP-LP] defines procedures, protocol messages, and elements for
+ this purpose.
+
+ The protocol is also used to signal administrative requests (e.g.,
+ manual switch, etc.), but only when these are provisioned at the edge
+ of the recovery domain.
+
+ The protocol also enables mismatches to be detected between the
+ configurations at the ends of the protection domain (such as timers,
+ revertive/non-revertive behavior); these mismatches can subsequently
+ be reported to the management plane.
+
+ In the absence of suitable coordination (owing to failures in the
+ delivery or processing of the coordination protocol messages),
+ protection switching will fail. This means that the operation of the
+ protocol that coordinates the protection state is a fundamental part
+ of protection switching.
+
+6.5. Control Plane
+
+ The GMPLS control plane has been proposed as the control plane for
+ MPLS-TP [RFC5317]. Since GMPLS was designed for use in transport
+ networks, and since it has been implemented and deployed in many
+ networks, it is not surprising that it contains many features that
+ support a high degree of survivability.
+
+ The signaling elements of the GMPLS control plane utilize extensions
+ to the Resource Reservation Protocol (RSVP) (as described in a series
+ of documents commencing with [RFC3471] and [RFC3473]), which are
+ themselves based on [RFC3209] and [RFC2205]. The architecture for
+ GMPLS is
+ provided in [RFC3945], while [RFC4426] gives a functional description
+ of the protocol extensions needed to support GMPLS-based recovery
+ (i.e., protection and restoration).
+
+ A further control-plane protocol called the Link Management Protocol
+ (LMP) [RFC4204] is part of the GMPLS protocol family and can be used
+ to coordinate fault localization and reporting.
+
+ Clearly, the control-plane techniques described here only apply where
+ an MPLS-TP control plane is deployed and operated. All mandatory
+ MPLS-TP survivability features must be available, even in the absence
+ of the control plane. However, when present, the control plane may
+ be used to provide alternative mechanisms that may be desirable,
+ since they offer simple automation or a richer feature set.
+
+6.5.1. Fault Detection
+
+ The control plane is unable to detect data-plane faults. However, it
+ does provide mechanisms that detect control-plane faults, and these
+ can be used to recognize data-plane faults when it is evident that
+ the control and data planes are fate-sharing. Although [RFC5654]
+ specifies that MPLS-TP must support an out-of-band control channel,
+ it does not insist that it be used exclusively. This means that
+ there may be deployments where an in-band (or at least an in-fiber)
+ control channel is used. In this scenario, failure of the control
+ channel can be used to infer that there is a failure of the data
+ channel, or, at least, it can be used to trigger an investigation of
+ the health of the data channel.
+
+ Both RSVP and LMP provide a control channel "keep-alive" mechanism
+ (called the Hello message in both cases). Failure to receive a
+ message in the configured/negotiated time period indicates a control-
+ plane failure. GMPLS routing protocols ([RFC4203] and [RFC5307])
+ also include keep-alive mechanisms designed to detect routing
+ adjacency failures. Although these keep-alive mechanisms tend to
+ operate at a relatively low frequency (on the order of seconds), it
+ is still possible that the first indication of a control-plane fault
+ will be received through the routing protocol.
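+
+ A minimal sketch of such control-channel supervision is given below
+ in Python. The timer values and the decision to probe the data
+ channel are illustrative assumptions; the actual Hello procedures
+ are those of RSVP [RFC3209] [RFC3473] and LMP [RFC4204].
+
+     import time
+
+     class HelloMonitor:
+         def __init__(self, hello_interval=5.0, dead_factor=3.5,
+                      in_fiber_control_channel=False):
+             self.deadline = hello_interval * dead_factor
+             self.in_fiber = in_fiber_control_channel
+             self.last_hello = time.monotonic()
+
+         def on_hello_received(self):
+             self.last_hello = time.monotonic()
+
+         def poll(self):
+             if time.monotonic() - self.last_hello > self.deadline:
+                 # A control-plane fault is certain; a data-plane
+                 # fault can only be inferred when the control and
+                 # data channels share fate (e.g., in-fiber).
+                 if self.in_fiber:
+                     print("investigating data-channel health")
+                 print("control-plane adjacency lost")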
+
+ Note, however, that care must be taken to ascertain that a specific
+ failure is not caused by a problem in the control-plane software or
+ in a processor component at the far end of a link.
+
+ Because of the various issues involved, it is not recommended that
+ the control plane be used as the primary mechanism for fault
+ detection in an MPLS-TP network.
+
+6.5.2. Testing for Faults
+
+ The control plane may be used to initiate and coordinate the testing
+ of links, LSP segments, or entire LSPs. This is important in some
+ technologies where it is necessary to halt data transmission while
+ testing, but it may also be useful where testing needs to be
+ specifically enabled or configured.
+
+ LMP provides a control-plane mechanism to test the continuity and
+ connectivity (and naming) of individual links. A single management
+ operation is required to initiate the test at one end of the link,
+ while LMP handles the coordination with the other end of the link.
+ The test mechanism for an MPLS packet link relies on an LMP Test
+ message being inserted into the data stream at one end of the link
+ and extracted at the other end of the link. This mechanism need not
+ disrupt data flowing over the link.
+
+ Note that a link managed by LMP may, in fact, be an LSP tunnel used
+ to form a link in the MPLS-TP network.
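+
+ The following Python fragment sketches the shape of such a link
+ test. Message names follow [RFC4204], but the two helper functions
+ are stubs standing in for in-band insertion and control-channel
+ reception, which this sketch does not implement.
+
+     def send_in_band(link_id, message):
+         # Stub: a real implementation inserts the Test message into
+         # the data stream on the link under test.
+         print("in-band on", link_id, ":", message)
+
+     def wait_on_control_channel(timeout):
+         # Stub: a real implementation waits for TestStatusSuccess or
+         # TestStatusFailure from the far end of the link.
+         return {"msg": "TestStatusSuccess"}
+
+     def run_link_test(link_id, verify_id):
+         # The BeginVerify/BeginVerifyAck exchange is assumed to have
+         # completed already, yielding verify_id.
+         send_in_band(link_id, {"msg": "Test", "verify_id": verify_id})
+         reply = wait_on_control_channel(timeout=2.0)
+         return bool(reply and reply["msg"] == "TestStatusSuccess")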
+
+ GMPLS signaling (RSVP) offers two mechanisms that may also assist
+ with fault testing. The first mechanism [RFC3473] defines the
+ Admin_Status object that allows an LSP to be set into "testing mode".
+ The interpretation of this mode is implementation-specific and could
+ be documented more precisely for MPLS-TP. The mode sets the whole
+ LSP into a state where it can be tested; this need not be disruptive
+ to data traffic.
+
+ The second mechanism provided by GMPLS to support testing is
+ described in [GMPLS-OAM]. This protocol extension supports the
+ configuration (including enabling and disabling) of OAM mechanisms
+ for a specific LSP.
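+
+ The sketch below illustrates the first of these mechanisms: re-
+ signaling an LSP with the Admin_Status "Testing" indication set.
+ The class and method names are hypothetical, and what "testing"
+ means to the data plane remains implementation-specific.
+
+     class Lsp:
+         def __init__(self, name):
+             self.name = name
+             self.admin_status = set()
+
+         def enter_testing_mode(self):
+             # Re-signal the LSP with the Testing bit set in the
+             # Admin_Status object; this need not disrupt traffic.
+             self.admin_status.add("Testing")
+             self.resignal()
+
+         def exit_testing_mode(self):
+             self.admin_status.discard("Testing")
+             self.resignal()
+
+         def resignal(self):
+             print("Path message for", self.name,
+                   "with ADMIN_STATUS =", self.admin_status or "{}")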
+
+6.5.3. Fault Localization
+
+ Fault localization is the process whereby the exact location of a
+ fault is determined. Fault detection often only takes place at key
+ points in the network (such as at LSP end points or at MEPs). This
+ means that a fault may be located anywhere within a segment of the
+ relevant LSP.
+
+ If segment or end-to-end protection is in use, this level of
+ information is often sufficient to repair the LSP. However, if finer
+ information granularity is required (either to implement optimal
+ recovery actions or to diagnose a fault), it is necessary to localize
+ the specific fault.
+
+ LMP provides a cascaded test-and-propagate mechanism that is designed
+ specifically for this purpose.
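+
+ In outline, such a cascaded mechanism tests each link of the
+ suspect segment in turn until the fault is found. The Python
+ sketch below assumes a hypothetical test_link() callback standing
+ in for the LMP test-and-propagate machinery.
+
+     def localize_fault(links, test_link):
+         """links: ordered list of links between the detecting end
+         points; test_link(link) returns True if the link passes its
+         continuity test. Returns the first failed link, or None."""
+         for link in links:
+             if not test_link(link):
+                 return link    # fault localized to this link
+         return None            # fault lies elsewhere (e.g., a node)
+
+     # Example: the fault is on the third link of the segment.
+     failed = localize_fault(["L1", "L2", "L3", "L4"],
+                             lambda link: link != "L3")
+     print("fault localized at", failed)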
+
+6.5.4. Fault Status Reporting
+
+ GMPLS signaling uses the Notify message to report fault status
+ [RFC3473]. The Notify message can apply to a single LSP or can carry
+ fault information for a set of LSPs, in order to improve the
+ scalability of fault notification.
+
+ Since the Notify message is targeted at a specific node, it can be
+ delivered rapidly without requiring hop-by-hop processing. It can be
+ targeted at LSP end points or at segment end points (such as MEPs).
+ The target points for Notify messages can be manually configured
+ within the network, or they may be signaled when the LSP is set up.
+
+ This enables the process to be made consistent with segment
+ protection as well as with the concept of Maintenance Entities.
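+
+ The scaling benefit of carrying fault information for a set of LSPs
+ can be pictured with the short Python sketch below; the field names
+ are illustrative, not the actual Notify message encoding of
+ [RFC3473].
+
+     from collections import defaultdict
+
+     def build_notify_messages(failed_lsps):
+         """failed_lsps: iterable of (notify_target, lsp_id) pairs.
+         Groups LSPs by target so that each target receives a single
+         Notify message covering all of its affected LSPs."""
+         by_target = defaultdict(list)
+         for target, lsp_id in failed_lsps:
+             by_target[target].append(lsp_id)
+         return [{"dst": t, "error": "LSP failure", "sessions": ids}
+                 for t, ids in by_target.items()]
+
+     msgs = build_notify_messages([("198.51.100.1", "lsp-7"),
+                                   ("198.51.100.1", "lsp-9"),
+                                   ("203.0.113.5", "lsp-2")])
+     print(msgs)   # two messages, not three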
+
+ GMPLS signaling also provides a slower mechanism for reporting
+ individual LSP faults on a hop-by-hop basis using the PathErr and
+ ResvErr messages.
+
+ [RFC4783] provides a mechanism to coordinate alarms and other event
+ or fault information through GMPLS signaling. This mechanism is
+ useful for understanding the status of the resources used by an LSP
+ and for providing information as to why an LSP is not functioning;
+ however, it is not intended to replace other fault-reporting
+ mechanisms.
+
+ GMPLS routing protocols [RFC4203] and [RFC5307] are used to advertise
+ link availability and capabilities within a GMPLS-enabled network.
+ Thus, the routing protocols can also provide indirect information
+ about network faults; that is, the protocol may stop advertising or
+ may withdraw the advertisement for a failed link, or it may advertise
+ that the link is about to be shut down gracefully [RFC5817]. This
+ mechanism is, however, not normally considered to be fast enough for
+ use as a trigger for protection switching.
+
+6.5.5. Coordination of Recovery Actions
+
+ Fault coordination is an important feature for certain protection
+ mechanisms (such as bidirectional 1:1 protection). The use of the
+ GMPLS Notify message for this purpose is described in [RFC4426];
+ however, specific message field values have not yet been defined for
+ this operation.
+
+ Further work is needed in GMPLS for control and configuration of
+ reversion behavior for end-to-end and segment protection, and the
+ coordination of timer values.
+
+6.5.6. Establishment of Protection and Restoration LSPs
+
+ The management plane may be used to set up protection and recovery
+ LSPs, but, when a control plane is present, it may be used for this
+ purpose instead.
+
+ Several protocol extensions exist that simplify this process:
+
+ o [RFC4872] provides features that support end-to-end protection
+ switching.
+
+ o [RFC4873] describes the establishment of a single, segment-
+ protected LSP. Note that end-to-end protection is a special case
+ of segment protection, and [RFC4873] can also be used to provide
+ end-to-end protection.
+
+ o [RFC4874] allows an LSP to be signaled with a request that its
+ path exclude specified resources such as links, nodes, and shared
+ risk link groups (SRLGs). This allows a disjoint protection path
+ to be requested or a recovery path to be set up to avoid failed
+ resources (see the sketch following this list).
+
+ o Lastly, it should be noted that [RFC5298] provides an overview of
+ the GMPLS techniques available to achieve protection in multi-
+ domain environments.
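+
+ The following Python sketch illustrates the idea behind resource
+ exclusion when establishing a recovery LSP: everything used by the
+ working path (including its SRLGs) is collected and handed to path
+ computation as a set to avoid. The helper names and the
+ compute_path callback are hypothetical; the protocol encoding is
+ defined in [RFC4874].
+
+     def exclusions_for(working_path, srlg_db):
+         """Collect the resources a disjoint recovery path must
+         avoid: the links of the working path plus their SRLGs."""
+         exclude = set()
+         for link in working_path:
+             exclude.add(link)
+             exclude.update(srlg_db.get(link, ()))
+         return exclude
+
+     def signal_recovery_lsp(src, dst, exclude, compute_path):
+         # An XRO-like request: the computed path must avoid
+         # everything in 'exclude'.
+         path = compute_path(src, dst, avoid=exclude)
+         if path is None:
+             raise RuntimeError("no disjoint path available")
+         return path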
+
+7. Pseudowire Recovery Considerations
+
+ Pseudowires provide end-to-end connectivity over the MPLS-TP network
+ and may comprise a single pseudowire segment, or multiple segments
+ "stitched" together to provide end-to-end connectivity.
+
+ The pseudowire may, itself, require protection, in order to meet the
+ service-level guarantees of its SLA. This protection could be
+ provided by the MPLS-TP LSPs that support the pseudowire, or could be
+ a feature of the pseudowire layer itself.
+
+ As indicated above, the functional architecture described in this
+ document applies to both LSPs and pseudowires. However, the recovery
+ mechanisms for pseudowires are for further study and will be defined
+ in a separate document by the PWE3 working group.
+
+7.1. Utilization of Underlying MPLS-TP Recovery
+
+ MPLS-TP PWs are carried across the network inside MPLS-TP LSPs.
+ Therefore, an obvious way to provide protection for a PW is to
+ protect the LSP that carries it. Such protection can take any of the
+ forms described in this document. The choice of recovery scheme will
+ depend on the required speed of recovery and the traffic loss that is
+ acceptable for the SLA that the PW is providing.
+
+ If the PW is a Multi-Segment PW, then LSP recovery can only protect
+ individual segments of the PW. This means that a single LSP recovery
+ action cannot protect against a failure of a PW switching point (an
+ S-PE), nor can it protect more than one segment at a time, since the
+ LSP tunnel is terminated at each S-PE. In this respect, LSP
+ protection of a PW is very similar to link-level protection offered
+ to the MPLS-TP LSP layer by an underlying network layer (see Section
+ 4.9).
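+
+ A toy Python model makes the limitation concrete: each segment of
+ an MS-PW rides its own LSP tunnel terminated at the S-PEs, so LSP
+ recovery can repair a failed tunnel but can do nothing about a
+ failed S-PE. All names below are illustrative.
+
+     def pw_up(segments, failed_spes, failed_tunnels, protected):
+         """segments: list of (ingress, tunnel, egress) triples."""
+         for ingress, tunnel, egress in segments:
+             if ingress in failed_spes or egress in failed_spes:
+                 return False   # no LSP recovery action helps here
+             if tunnel in failed_tunnels and not protected[tunnel]:
+                 return False   # unprotected tunnel failure
+         return True
+
+     segs = [("T-PE1", "lsp-A", "S-PE1"), ("S-PE1", "lsp-B", "T-PE2")]
+     prot = {"lsp-A": True, "lsp-B": True}
+     print(pw_up(segs, set(), {"lsp-A"}, prot))   # True: LSP recovers
+     print(pw_up(segs, {"S-PE1"}, set(), prot))   # False: S-PE fault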
+
+7.2. Recovery in the Pseudowire Layer
+
+ Recovery in the PW layer can be provided by simply running separate
+ working and protection PWs end-to-end. Other recovery mechanisms in
+ the PW layer, such as segment or concatenated segment recovery, or
+ service-level recovery that provides survivability against T-PE or
+ AC faults, will be described in a separate document.
+
+ As with any recovery mechanism, it is important to coordinate between
+ layers. This coordination is necessary to ensure that recovery
+ actions are performed in only one layer at a time (that is, the
+ recovery of an underlying LSP needs to be coordinated with the
+ recovery of the PW itself). Coordination also ensures that the
+ working and protection PWs do not both use the same MPLS resources
+ within the network (for example, by running over the same LSP tunnel;
+ see also Section 4.9).
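+
+ One common device for this kind of inter-layer coordination is a
+ hold-off timer in the client (PW) layer that gives the server (LSP)
+ layer first chance to repair a fault. The Python sketch below is an
+ assumption-laden illustration, not a specified mechanism; the timer
+ value is arbitrary.
+
+     import threading
+
+     class PwRecoveryGate:
+         def __init__(self, hold_off_seconds=0.05):
+             self.hold_off = hold_off_seconds
+
+         def on_fault(self, lsp_layer_recovered):
+             # Start the hold-off period; act only if the LSP layer
+             # has not repaired the fault when the timer expires.
+             threading.Timer(self.hold_off, self._expire,
+                             args=(lsp_layer_recovered,)).start()
+
+         def _expire(self, lsp_layer_recovered):
+             if not lsp_layer_recovered():
+                 print("PW-layer protection switch")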
+
+8. Manageability Considerations
+
+ Manageability of MPLS-TP networks and their functions is discussed in
+ [RFC5950]. OAM features are discussed in [RFC6371].
+
+ Survivability has some key interactions with management, as described
+ in this document. In particular:
+
+ o Recovery domains may be configured such that there is no one-to-
+ one correspondence between the MPLS-TP network and the recovery
+ domains.
+
+ o Survivability policies may be configured per network, per recovery
+ domain, or per LSP.
+
+ o Configuration of OAM may involve the selection of MEPs; enabling
+ OAM on network segments, spans, and links; and the operation of
+ OAM on LSPs, concatenated LSP segments, and LSP segments.
+
+ o Manual commands may be used to control recovery functions,
+ including forcing recovery and locking recovery actions.
+
+ See also the considerations regarding security for management and OAM
+ in Section 9 of this document.
+
+9. Security Considerations
+
+ This framework does not introduce any new security considerations;
+ general issues relating to MPLS security can be found in [RFC5920].
+
+ However, several points about MPLS-TP survivability should be noted
+ here.
+
+ o If an attacker is able to force a protection switch-over, this may
+ cause a small perturbation to user traffic and could result in
+ extra traffic being preempted or displaced from the protection
+ resources. In the case of 1:n protection or shared mesh
+ protection, this may result in other traffic becoming unprotected.
+ Therefore, it is important that OAM protocols for detecting or
+ notifying faults use adequate security to prevent them from being
+ used (through the insertion of bogus messages or through the
+ capture of legitimate messages) to falsely trigger a recovery
+ event.
+
+ o If manual commands are modified, captured, or simulated (including
+ replay), it might be possible for an attacker to perform forced
+ recovery actions or to impose lock-out. These actions could
+ impact the capability to provide the recovery function and could
+ also affect the normal operation of the network for other traffic.
+ Therefore, management protocols used to perform manual commands
+ must allow the operator to use appropriate security mechanisms.
+ This includes verification that the user who performs the commands
+ has appropriate authorization.
+
+ o If the control plane is used to configure or operate recovery
+ mechanisms, the control-plane protocols must also be capable of
+ providing adequate security.
+
+10. Acknowledgments
+
+ Thanks to the following people for useful comments and discussions:
+ Italo Busi, David McWalter, Lou Berger, Yaacov Weingarten, Stewart
+ Bryant, Dan Frost, Lieven Levrau, Xuehui Dai, Liu Guoman, Xiao Min,
+ Daniele Ceccarelli, Scott Bradner, Francesco Fondelli, Curtis
+ Villamizar, Maarten Vissers, and Greg Mirsky.
+
+ The Editors would like to thank the participants in ITU-T Study Group
+ 15 for their detailed review.
+
+ Some figures and text on shared mesh protection were borrowed from
+ [MPLS-TP-MESH] with thanks to Tae-sik Cheung and Jeong-dong Ryoo.
+
+11. References
+
+11.1. Normative References
+
+ [G.806] ITU-T, "Characteristics of transport equipment -
+ Description methodology and generic functionality",
+ Recommendation G.806, January 2009.
+
+ [G.808.1] ITU-T, "Generic Protection Switching - Linear trail
+ and subnetwork protection", Recommendation G.808.1,
+ December 2003.
+
+ [G.841] ITU-T, "Types and Characteristics of SDH Network
+ Protection Architectures", Recommendation G.841,
+ October 1998.
+
+ [RFC2205] Braden, R., Ed., Zhang, L., Berson, S., Herzog, S.,
+ and S. Jamin, "Resource ReSerVation Protocol (RSVP) --
+ Version 1 Functional Specification", RFC 2205,
+ September 1997.
+
+ [RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan,
+ V., and G. Swallow, "RSVP-TE: Extensions to RSVP for
+ LSP Tunnels", RFC 3209, December 2001.
+
+ [RFC3471] Berger, L., Ed., "Generalized Multi-Protocol Label
+ Switching (GMPLS) Signaling Functional Description",
+ RFC 3471, January 2003.
+
+ [RFC3473] Berger, L., Ed., "Generalized Multi-Protocol Label
+ Switching (GMPLS) Signaling Resource ReserVation
+ Protocol-Traffic Engineering (RSVP-TE) Extensions",
+ RFC 3473, January 2003.
+
+ [RFC3945] Mannie, E., Ed., "Generalized Multi-Protocol Label
+ Switching (GMPLS) Architecture", RFC 3945, October
+ 2004.
+
+ [RFC4203] Kompella, K., Ed., and Y. Rekhter, Ed., "OSPF
+ Extensions in Support of Generalized Multi-Protocol
+ Label Switching (GMPLS)", RFC 4203, October 2005.
+
+ [RFC4204] Lang, J., Ed., "Link Management Protocol (LMP)", RFC
+ 4204, October 2005.
+
+ [RFC4427] Mannie, E., Ed., and D. Papadimitriou, Ed., "Recovery
+ (Protection and Restoration) Terminology for
+ Generalized Multi-Protocol Label Switching (GMPLS)",
+ RFC 4427, March 2006.
+
+ [RFC4428] Papadimitriou, D., Ed., and E. Mannie, Ed., "Analysis
+ of Generalized Multi-Protocol Label Switching
+ (GMPLS)-based Recovery Mechanisms (including
+ Protection and Restoration)", RFC 4428, March 2006.
+
+ [RFC4873] Berger, L., Bryskin, I., Papadimitriou, D., and A.
+ Farrel, "GMPLS Segment Recovery", RFC 4873, May 2007.
+
+ [RFC5307] Kompella, K., Ed., and Y. Rekhter, Ed., "IS-IS
+ Extensions in Support of Generalized Multi-Protocol
+ Label Switching (GMPLS)", RFC 5307, October 2008.
+
+ [RFC5317] Bryant, S., Ed., and L. Andersson, Ed., "Joint Working
+ Team (JWT) Report on MPLS Architectural Considerations
+ for a Transport Profile", RFC 5317, February 2009.
+
+ [RFC5586] Bocci, M., Ed., Vigoureux, M., Ed., and S. Bryant,
+ Ed., "MPLS Generic Associated Channel", RFC 5586, June
+ 2009.
+
+ [RFC5654] Niven-Jenkins, B., Ed., Brungard, D., Ed., Betts, M.,
+ Ed., Sprecher, N., and S. Ueno, "Requirements of an
+ MPLS Transport Profile", RFC 5654, September 2009.
+
+ [RFC5921] Bocci, M., Ed., Bryant, S., Ed., Frost, D., Ed.,
+ Levrau, L., and L. Berger, "A Framework for MPLS in
+ Transport Networks", RFC 5921, July 2010.
+
+ [RFC5950] Mansfield, S., Ed., Gray, E., Ed., and K. Lam, Ed.,
+ "Network Management Framework for MPLS-based Transport
+ Networks", RFC 5950, September 2010.
+
+ [RFC6371] Busi, I., Ed., and B. Niven-Jenkins, Ed., "Operations,
+ Administration, and Maintenance Framework for
+ MPLS-Based Transport Networks", RFC 6371, September
+ 2011.
+
+11.2. Informative References
+
+ [GMPLS-OAM] Takacs, A., Fedyk, D., and J. He, "GMPLS RSVP-TE
+ extensions for OAM Configuration", Work in Progress,
+ July 2011.
+
+ [MPLS-TP-LP] Bryant, S., Osborne, E., Sprecher, N., Fulignoli,
+ A., Ed., and Y. Weingarten, Ed., "MPLS-TP Linear
+ Protection", Work in Progress, August 2011.
+
+ [MPLS-TP-MESH] Cheung, T. and J. Ryoo, "MPLS-TP Shared Mesh
+ Protection", Work in Progress, April 2011.
+
+ [RFC3031] Rosen, E., Viswanathan, A., and R. Callon,
+ "Multiprotocol Label Switching Architecture", RFC
+ 3031, January 2001.
+
+ [RFC3386] Lai, W., Ed., and D. McDysan, Ed., "Network Hierarchy
+ and Multilayer Survivability", RFC 3386, November
+ 2002.
+
+ [RFC3469] Sharma, V., Ed., and F. Hellstrand, Ed., "Framework
+ for Multi-Protocol Label Switching (MPLS)-based
+ Recovery", RFC 3469, February 2003.
+
+ [RFC4397] Bryskin, I. and A. Farrel, "A Lexicography for the
+ Interpretation of Generalized Multiprotocol Label
+ Switching (GMPLS) Terminology within the Context of
+ the ITU-T's Automatically Switched Optical Network
+ (ASON) Architecture", RFC 4397, February 2006.
+
+ [RFC4426] Lang, J., Ed., Rajagopalan, B., Ed., and D.
+ Papadimitriou, Ed., "Generalized Multi-Protocol Label
+ Switching (GMPLS) Recovery Functional Specification",
+ RFC 4426, March 2006.
+
+ [RFC4726] Farrel, A., Vasseur, J.-P., and A. Ayyangar, "A
+ Framework for Inter-Domain Multiprotocol Label
+ Switching Traffic Engineering", RFC 4726, November
+ 2006.
+
+ [RFC4783] Berger, L., Ed., "GMPLS - Communication of Alarm
+ Information", RFC 4783, December 2006.
+
+ [RFC4872] Lang, J., Ed., Rekhter, Y., Ed., and D. Papadimitriou,
+ Ed., "RSVP-TE Extensions in Support of End-to-End
+ Generalized Multi-Protocol Label Switching (GMPLS)
+ Recovery", RFC 4872, May 2007.
+
+ [RFC4874] Lee, CY., Farrel, A., and S. De Cnodder, "Exclude
+ Routes - Extension to Resource ReserVation Protocol-
+ Traffic Engineering (RSVP-TE)", RFC 4874, April 2007.
+
+ [RFC5212] Shiomoto, K., Papadimitriou, D., Le Roux, JL.,
+ Vigoureux, M., and D. Brungard, "Requirements for
+ GMPLS-Based Multi-Region and Multi-Layer Networks
+ (MRN/MLN)", RFC 5212, July 2008.
+
+ [RFC5298] Takeda, T., Ed., Farrel, A., Ed., Ikejiri, Y., and JP.
+ Vasseur, "Analysis of Inter-Domain Label Switched Path
+ (LSP) Recovery", RFC 5298, August 2008.
+
+ [RFC5817] Ali, Z., Vasseur, JP., Zamfir, A., and J. Newton,
+ "Graceful Shutdown in MPLS and Generalized MPLS
+ Traffic Engineering Networks", RFC 5817, April 2010.
+
+ [RFC5920] Fang, L., Ed., "Security Framework for MPLS and GMPLS
+ Networks", RFC 5920, July 2010.
+
+ [RFC6373] Andersson, L., Ed., Berger, L., Ed., Fang, L., Ed.,
+ Bitar, N., Ed., and E. Gray, Ed., "MPLS-TP Control
+ Plane Framework", RFC 6373, September 2011.
+
+ [RFC6291] Andersson, L., van Helvoort, H., Bonica, R.,
+ Romascanu, D., and S. Mansfield, "Guidelines for the
+ Use of the "OAM" Acronym in the IETF", BCP 161, RFC
+ 6291, June 2011.
+
+ [ROSETTA] Van Helvoort, H., Ed., Andersson, L., Ed., and N.
+ Sprecher, Ed., "A Thesaurus for the Terminology used
+ in Multiprotocol Label Switching Transport Profile
+ (MPLS-TP) drafts/RFCs and ITU-T's Transport Network
+ Recommendations", Work in Progress, June 2011.
+
+Authors' Addresses
+
+ Nurit Sprecher (editor)
+ Nokia Siemens Networks
+ 3 Hanagar St.
+ Neve Ne'eman B Hod
+ Hasharon, 45241 Israel
+
+ EMail: nurit.sprecher@nsn.com
+
+
+ Adrian Farrel (editor)
+ Juniper Networks
+
+ EMail: adrian@olddog.co.uk
+