author     Thomas Voss <mail@thomasvoss.com>  2024-11-27 20:54:24 +0100
committer  Thomas Voss <mail@thomasvoss.com>  2024-11-27 20:54:24 +0100
commit     4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
tree       e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc4428.txt
parent     ea76e11061bda059ae9f9ad130a9895cc85607db (diff)
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc4428.txt')
-rw-r--r--  doc/rfc/rfc4428.txt  2635
1 file changed, 2635 insertions, 0 deletions
diff --git a/doc/rfc/rfc4428.txt b/doc/rfc/rfc4428.txt
new file mode 100644
index 0000000..2cf0284
--- /dev/null
+++ b/doc/rfc/rfc4428.txt
@@ -0,0 +1,2635 @@
+
+Network Working Group D. Papadimitriou, Ed.
+Request for Comments: 4428 Alcatel
+Category: Informational E. Mannie, Ed.
+ Perceval
+ March 2006
+
+
+ Analysis of Generalized Multi-Protocol Label Switching (GMPLS)-based
+ Recovery Mechanisms (including Protection and Restoration)
+
+Status of This Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2006).
+
+Abstract
+
+ This document provides an analysis grid to evaluate, compare, and
+ contrast the Generalized Multi-Protocol Label Switching (GMPLS)
+ protocol suite capabilities with the recovery mechanisms currently
+ proposed at the IETF CCAMP Working Group. A detailed analysis of
+ each of the recovery phases is provided using the terminology defined
+ in RFC 4427. This document focuses on transport plane survivability
+ and recovery issues and not on control plane resilience and related
+ aspects.
+
+Table of Contents
+
+ 1. Introduction
+ 2. Contributors
+ 3. Conventions Used in this Document
+ 4. Fault Management
+    4.1. Failure Detection
+    4.2. Failure Localization and Isolation
+    4.3. Failure Notification
+    4.4. Failure Correlation
+ 5. Recovery Mechanisms
+    5.1. Transport vs. Control Plane Responsibilities
+    5.2. Technology-Independent and Technology-Dependent Mechanisms
+         5.2.1. OTN Recovery
+         5.2.2. Pre-OTN Recovery
+         5.2.3. SONET/SDH Recovery
+    5.3. Specific Aspects of Control Plane-Based Recovery Mechanisms
+         5.3.1. In-Band vs. Out-Of-Band Signaling
+         5.3.2. Uni- vs. Bi-Directional Failures
+         5.3.3. Partial vs. Full Span Recovery
+         5.3.4. Difference between LSP, LSP Segment and Span Recovery
+    5.4. Difference between Recovery Type and Scheme
+    5.5. LSP Recovery Mechanisms
+         5.5.1. Classification
+         5.5.2. LSP Restoration
+         5.5.3. Pre-Planned LSP Restoration
+         5.5.4. LSP Segment Restoration
+ 6. Reversion
+    6.1. Wait-To-Restore (WTR)
+    6.2. Revertive Mode Operation
+    6.3. Orphans
+ 7. Hierarchies
+    7.1. Horizontal Hierarchy (Partitioning)
+    7.2. Vertical Hierarchy (Layers)
+         7.2.1. Recovery Granularity
+    7.3. Escalation Strategies
+    7.4. Disjointness
+         7.4.1. SRLG Disjointness
+ 8. Recovery Mechanisms Analysis
+    8.1. Fast Convergence (Detection/Correlation and Hold-off Time)
+    8.2. Efficiency (Recovery Switching Time)
+    8.3. Robustness
+    8.4. Resource Optimization
+         8.4.1. Recovery Resource Sharing
+         8.4.2. Recovery Resource Sharing and SRLG Recovery
+         8.4.3. Recovery Resource Sharing, SRLG Disjointness and
+                Admission Control
+ 9. Summary and Conclusions
+ 10. Security Considerations
+ 11. Acknowledgements
+ 12. References
+    12.1. Normative References
+    12.2. Informative References
+
+1. Introduction
+
+ This document provides an analysis grid to evaluate, compare, and
+ contrast the Generalized MPLS (GMPLS) protocol suite capabilities
+ with the recovery mechanisms proposed at the IETF CCAMP Working
+ Group. The focus is on transport plane survivability and recovery
+ issues and not on control-plane-resilience-related aspects. Although
+ the recovery mechanisms described in this document impose different
+ requirements on GMPLS-based recovery protocols, the protocols'
+ specifications will not be covered in this document. Though the
+ concepts discussed are technology independent, this document
+ implicitly focuses on SONET [T1.105]/SDH [G.707], Optical Transport
+ Networks (OTN) [G.709], and pre-OTN technologies, except when
+ specific details need to be considered (for instance, in the case of
+ failure detection).
+
+ A detailed analysis is provided for each of the recovery phases as
+ identified in [RFC4427]. These phases define the sequence of generic
+ operations that need to be performed when an LSP/Span failure (or any
+ other event generating such failures) occurs:
+
+ - Phase 1: Failure Detection
+ - Phase 2: Failure Localization (and Isolation)
+ - Phase 3: Failure Notification
+ - Phase 4: Recovery (Protection or Restoration)
+ - Phase 5: Reversion (Normalization)
+
+ Together, failure detection, localization, and notification phases
+ are referred to as "fault management". Within a recovery domain, the
+ entities involved during the recovery operations are defined in
+ [RFC4427]; these entities include ingress, egress, and intermediate
+ nodes. The term "recovery mechanism" is used to cover both
+ protection and restoration mechanisms. Specific terms such as
+ "protection" and "restoration" are used only when differentiation is
+ required. Likewise, the term "failure" is used to represent both
+ signal failure and signal degradation.
+
+ In addition, when analyzing the different hierarchical recovery
+ mechanisms including disjointness-related issues, a clear distinction
+ is made between partitioning (horizontal hierarchy) and layering
+ (vertical hierarchy). In order to assess the current GMPLS protocol
+ capabilities and the potential need for further extensions, the
+ dimensions for analyzing each of the recovery mechanisms detailed in
+ this document are introduced. This document concludes by detailing
+ the applicability of the current GMPLS protocol building blocks for
+ recovery purposes.
+
+2. Contributors
+
+ This document is the result of the CCAMP Working Group Protection and
+ Restoration design team joint effort. Besides the editors, the
+ following authors contributed to the present memo:
+
+ Deborah Brungard (AT&T)
+ 200 S. Laurel Ave.
+ Middletown, NJ 07748, USA
+
+ EMail: dbrungard@att.com
+
+
+ Sudheer Dharanikota
+
+ EMail: sudheer@ieee.org
+
+
+ Jonathan P. Lang (Sonos)
+ 506 Chapala Street
+ Santa Barbara, CA 93101, USA
+
+ EMail: jplang@ieee.org
+
+
+ Guangzhi Li (AT&T)
+ 180 Park Avenue,
+ Florham Park, NJ 07932, USA
+
+ EMail: gli@research.att.com
+
+
+ Eric Mannie
+ Perceval
+ Rue Tenbosch, 9
+ 1000 Brussels
+ Belgium
+
+ Phone: +32-2-6409194
+ EMail: eric.mannie@perceval.net
+
+
+ Dimitri Papadimitriou (Alcatel)
+ Francis Wellesplein, 1
+ B-2018 Antwerpen, Belgium
+
+ EMail: dimitri.papadimitriou@alcatel.be
+
+
+ Bala Rajagopalan
+ Microsoft India Development Center
+ Hyderabad, India
+
+ EMail: balar@microsoft.com
+
+
+ Yakov Rekhter (Juniper)
+ 1194 N. Mathilda Avenue
+ Sunnyvale, CA 94089, USA
+
+ EMail: yakov@juniper.net
+
+3. Conventions Used in this Document
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in [RFC2119].
+
+ Any other recovery-related terminology used in this document conforms
+ to that defined in [RFC4427]. The reader is also assumed to be
+ familiar with the terminology developed in [RFC3945], [RFC3471],
+ [RFC3473], [RFC4202], and [RFC4204].
+
+4. Fault Management
+
+4.1. Failure Detection
+
+ Transport failure detection is the only phase that cannot be achieved
+ by the control plane alone because the latter needs a hook to the
+ transport plane in order to collect the related information. It has
+ to be emphasized that even if failure events themselves are detected
+ by the transport plane, the latter, upon a failure condition, must
+ trigger the control plane for subsequent actions through the use of
+ GMPLS signaling capabilities (see [RFC3471] and [RFC3473]) or Link
+ Management Protocol capabilities (see [RFC4204], Section 6).
+
+ Therefore, by definition, transport failure detection is transport
+ technology dependent (so, exceptionally, the "transport plane"
+ terminology is retained here). In transport fault management, a
+ distinction is made between a defect and a failure. Here, the
+ discussion addresses failure detection (i.e., a persistent fault
+ cause); a more precise specification is provided in the
+ technology-dependent descriptions.
+
+ As an example, SONET/SDH (see [G.707], [G.783], and [G.806]) provides
+ supervision capabilities covering:
+
+ - Continuity: SONET/SDH monitors the integrity of the continuity of a
+ trail (i.e., section or path). This operation is performed by
+ monitoring the presence/absence of the signal. Examples are Loss
+ of Signal (LOS) detection for the physical layer, Unequipped (UNEQ)
+ Signal detection for the path layer, Server Signal Fail Detection
+ (e.g., AIS) at the client layer.
+
+ - Connectivity: SONET/SDH monitors the integrity of the routing of
+ the signal between end-points. Connectivity monitoring is needed
+ if the layer provides flexible connectivity, either automatically
+ (e.g., cross-connects) or manually (e.g., fiber distribution
+ frame). An example is the Trail (i.e., section or path) Trace
+ Identifier used at the different layers and the corresponding Trail
+ Trace Identifier Mismatch detection.
+
+ - Alignment: SONET/SDH checks that the client and server layer frame
+ start can be correctly recovered from the detection of loss of
+ alignment. The specific processes depend on the signal/frame
+ structure and may include: (multi-)frame alignment, pointer
+ processing, and alignment of several independent frames to a common
+ frame start in case of inverse multiplexing. Loss of alignment is
+ a generic term. Examples are loss of frame, loss of multi-frame,
+ or loss of pointer.
+
+ - Payload type: SONET/SDH checks that compatible adaptation functions
+ are used at the source and the destination. Normally, this is done
+ by adding a payload type identifier (referred to as the "signal
+ label") at the source adaptation function and comparing it with the
+ expected identifier at the destination. A mismatch between the
+ received and expected identifiers then raises the corresponding
+ payload mismatch defect.
+
+ - Signal Quality: SONET/SDH monitors the performance of a signal.
+ For instance, if the performance falls below a certain threshold, a
+ defect -- excessive errors (EXC) or degraded signal (DEG) -- is
+ detected.
+
+ The most important point is that the supervision processes and the
+ corresponding failure detection (used to initiate the recovery
+ phase(s)) result in either:
+
+ - Signal Degrade (SD): A signal indicating that the associated data
+ has degraded in the sense that a degraded defect condition is
+ active (for instance, a dDEG declared when the Bit Error Rate
+ exceeds a preset threshold). Or
+
+ - Signal Fail (SF): A signal indicating that the associated data has
+ failed in the sense that a signal interrupting near-end defect
+ condition is active (as opposed to the degraded defect).
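+
+ To make the SD/SF distinction concrete, the following minimal Python
+ sketch (illustrative only; the threshold values are assumptions, not
+ normative) shows how a monitoring function might map a measured Bit
+ Error Rate onto the two conditions:
+
+   # Illustrative sketch: mapping a measured BER to SD/SF conditions.
+   # Threshold values are assumptions for illustration, not normative.
+   SD_BER_THRESHOLD = 1e-6   # degraded defect (dDEG) threshold (assumed)
+   SF_BER_THRESHOLD = 1e-3   # signal-interrupting threshold (assumed)
+
+   def classify_signal(ber: float, loss_of_signal: bool) -> str:
+       """Return the condition detected for one monitored interval."""
+       if loss_of_signal or ber >= SF_BER_THRESHOLD:
+           return "SF"   # signal-interrupting defect condition active
+       if ber >= SD_BER_THRESHOLD:
+           return "SD"   # degraded defect condition (dDEG) active
+       return "OK"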
+
+ In Optical Transport Networks (OTN), equivalent supervision
+ capabilities are provided at the optical/digital section layers
+ (i.e., Optical Transmission Section (OTS), Optical Multiplex Section
+ (OMS) and Optical channel Transport Unit (OTU)) and at the
+ optical/digital path layers (i.e., Optical Channel (OCh) and Optical
+ channel Data Unit (ODU)). Interested readers are referred to the
+ ITU-T Recommendations [G.798] and [G.709] for more details.
+
+ The above are examples that illustrate cases where the failure
+ detection and reporting entities (see [RFC4427]) are co-located. The
+ following example illustrates the scenario where the failure
+ detecting and reporting entities (see [RFC4427]) are not co-located.
+
+ In pre-OTN networks, a failure may be masked by intermediate O-E-O
+ based Optical Line System (OLS), preventing a Photonic Cross-Connect
+ (PXC) from detecting upstream failures. In such cases, failure
+ detection may be assisted by an out-of-band communication channel,
+ and failure condition may be reported to the PXC control plane. This
+ can be provided by using [RFC4209] extensions that deliver IP
+ message-based communication between the PXC and the OLS control
+ plane. Also, since PXCs are independent of the framing format,
+ failure conditions can only be triggered either by detecting the
+ absence of the optical signal or by measuring its quality. These
+ mechanisms are generally less reliable than electrical (digital)
+ ones. Both types of detection mechanisms are outside the scope of
+ this document. If the intermediate OLS supports electrical (digital)
+ mechanisms, these failure conditions are reported to the PXC over the
+ LMP communication channel, and subsequent recovery actions are
+ performed as described in
+ Section 5. As such, from the control plane viewpoint, this mechanism
+ turns the OLS-PXC-composed system into a single logical entity, thus
+ having the same failure management mechanisms as any other O-E-O
+ capable device.
+
+ More generally, the following are typical failure conditions in
+ SONET/SDH and pre-OTN networks:
+
+ - Loss of Light (LOL)/Loss of Signal (LOS): Signal Failure (SF)
+ condition where the optical signal is not detected any longer on
+ the receiver of a given interface.
+
+ - Signal Degrade (SD): detection of the signal degradation over
+ a specific period of time.
+
+ - For SONET/SDH payloads, all of the above-mentioned supervision
+ capabilities can be used, resulting in SD or SF conditions.
+
+ In summary, the following cases apply when considering the
+ communication between the detecting and reporting entities:
+
+ - Co-located detecting and reporting entities: both the detecting and
+ reporting entities are on the same node (e.g., SONET/SDH equipment,
+ Opaque cross-connects, and, with some limitations, Transparent
+ cross-connects, etc.)
+
+ - Non-co-located detecting and reporting entities:
+
+ o with in-band communication between entities: entities are
+ physically separated, but the transport plane provides in-band
+ communication between them (e.g., Server Signal Failures such as
+ Alarm Indication Signal (AIS), etc.)
+
+ o with out-of-band communication between entities: entities are
+ physically separated, but an out-of-band communication channel is
+ provided between them (e.g., using [RFC4204]).
+
+4.2. Failure Localization and Isolation
+
+ Failure localization provides information to the deciding entity
+ about the location (and so the identity) of the transport plane
+ entity that detects the LSP(s)/span(s) failure. The deciding entity
+ can then make an accurate decision to achieve finer grained recovery
+ switching action(s). Note that this information can also be included
+ as part of the failure notification (see Section 4.3).
+
+ In some cases, determining this accurate failure localization
+ information may be less urgent when it requires more time-consuming
+ failure isolation (see also Section 4.4). This is
+ particularly the case when edge-to-edge LSP recovery is performed
+ based on a simple failure notification (including the identification
+ of the working LSPs under failure condition). Note that "edge"
+ refers to a sub-network end-node, for instance. In this case, a more
+ accurate localization and isolation can be performed after recovery
+ of these LSPs.
+
+ Failure localization should be triggered immediately after the fault
+ detection phase. This operation can be performed at the transport
+ plane and/or (if the operation is unavailable via the transport
+ plane) the control plane level where dedicated signaling messages can
+ be used. When performed at the control plane level, a protocol such
+ as LMP (see [RFC4204], Section 6) can be used for failure
+ localization purposes.
+
+4.3. Failure Notification
+
+ Failure notification is used 1) to inform intermediate nodes that an
+ LSP/span failure has occurred and has been detected and 2) to inform
+ the deciding entities (which can correspond to any intermediate or
+ end-point of the failed LSP/span) that the corresponding service is
+ not available. In general, these deciding entities will be the ones
+ making the appropriate recovery decision. When co-located with the
+ recovering entity, these entities will also perform the corresponding
+ recovery action(s).
+
+ Failure notification can be provided either by the transport or by
+ the control plane. As an example, let us first briefly describe the
+ failure notification mechanism defined at the SONET/SDH transport
+ plane level (also referred to as maintenance signal supervision):
+
+ - AIS (Alarm Indication Signal) occurs as a result of a failure
+ condition such as Loss of Signal and is used to notify downstream
+ nodes (of the appropriate layer processing) that a failure has
+ occurred. AIS performs two functions: 1) inform the intermediate
+ nodes (with the appropriate layer monitoring capability) that a
+ failure has been detected and 2) notify the connection end-point
+ that the service is no longer available.
+
+ For a distributed control plane supporting one (or more) failure
+ notification mechanism(s), regardless of the mechanism's actual
+ implementation, the same capabilities are needed with more (or less)
+ information provided about the LSPs/spans under failure condition,
+ their detailed statuses, etc.
+
+ The most important difference between these mechanisms is related to
+ the fact that transport plane notifications (as defined today) would
+ directly initiate either a certain type of protection switching (such
+ as those described in [RFC4427]) via the transport plane or
+ restoration actions via the management plane.
+
+ On the other hand, using a failure notification mechanism through the
+ control plane would provide the possibility of triggering either a
+ protection or a restoration action via the control plane. This has
+ the advantage that a control-plane-recovery-responsible entity does
+ not necessarily have to be co-located with a transport
+ maintenance/recovery domain. A control plane recovery domain can be
+ defined at entities not supporting a transport plane recovery.
+
+ Moreover, as specified in [RFC3473], notification message exchanges
+ through a GMPLS control plane may not follow the same path as the
+ LSP/spans for which these messages carry the status. In turn, this
+ ensures a fast, reliable (through acknowledgement and the use of
+ either a dedicated control plane network or disjoint control
+ channels), and efficient (through the aggregation of several LSP/span
+ statuses within the same message) failure notification mechanism.
+
+ The other important properties to be met by the failure notification
+ mechanism are mainly the following:
+
+ - Notification messages must provide enough information such that the
+ most efficient subsequent recovery action will be taken at the
+ recovering entities (in most of the recovery types and schemes this
+ action is even deterministic). Remember here that these entities
+ can be either intermediate or end-points through which normal
+ traffic flows. Based on local policy, intermediate nodes may not
+ use this information for subsequent recovery actions (see for
+ instance the APS protocol phases as described in [RFC4427]). In
+ addition, fast notification is a mechanism that runs in
+ collaboration with the existing GMPLS signaling (see [RFC3473]) and
+ also allows intermediate nodes to stay informed about the status of
+ the working LSP/spans under failure condition.
+
+ The trade-off here arises when defining what information the
+ LSP/span end-points (more precisely, the deciding entities) need in
+ order for the recovering entity to take the best recovery action:
+ If not enough information is provided, the decision cannot be
+ optimal (note that in this eventuality, the important issue is to
+ quantify the level of sub-optimality). If too much information is
+ provided, the control plane may be overloaded with unnecessary
+ information and the aggregation/correlation of this notification
+ information will be more complex and time-consuming to achieve.
+ Note that a more detailed quantification of the amount of
+ information to be exchanged and processed is strongly dependent on
+ the failure notification protocol.
+
+ - If the failure localization and isolation are not performed by one
+ of the LSP/span end-points or some intermediate points, the points
+ should receive enough information from the notification message in
+ order to locate the failure. Otherwise, they would need to (re-)
+ initiate a failure localization and isolation action.
+
+ - Avoiding so-called notification storms implies that 1) the failure
+ detection output is correlated (i.e., alarm correlation) and
+ aggregated at the node detecting the failure(s), 2) the failure
+ notifications are directed to a restricted set of destinations (in
+ general the end-points), and 3) failure notification suppression
+ (i.e., alarm suppression) is provided in order to limit flooding in
+ case of multiple and/or correlated failures detected at several
+ locations in the network.
+
+ - Alarm correlation and aggregation (at the failure-detecting node)
+ implies a consistent decision based on the conditions for which a
+ trade-off between fast convergence (at detecting node) and fast
+ notification (implying that correlation and aggregation occurs at
+ receiving end-points) can be found.
+
+4.4. Failure Correlation
+
+ A single failure event (such as a span failure) can cause multiple
+ failure conditions (such as individual LSP failures) to be reported.
+ These can be grouped (i.e., correlated) to reduce the number of
+ failure conditions communicated on the reporting channel, for both
+ in-band and out-of-band failure reporting.
+
+ In such a scenario, it can be important to wait for a certain period
+ of time, typically called failure correlation time, and gather all
+ the failures to report them as a group of failures (or simply group
+ failure). For instance, this approach can be provided using LMP-WDM
+ for pre-OTN networks (see [RFC4209]) or when using Signal
+ Failure/Degrade Group in the SONET/SDH context.
+
+ Note that a default average time interval during which failure
+ correlation operation can be performed is difficult to provide since
+ it is strongly dependent on the underlying network topology.
+ Therefore, providing a per-node configurable failure correlation time
+ can be advisable. The detailed selection criteria for this time
+ interval are outside of the scope of this document.
+
+ When failure correlation is not provided, multiple failure
+ notification messages may be sent out in response to a single failure
+ (for instance, a fiber cut). Each failure notification message
+ contains a set of information on the failed working resources (for
+ instance, the individual lambda LSP flowing through this fiber).
+ This allows for a more prompt response, but can potentially overload
+ the control plane due to a large number of failure notifications.
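+
+ As a rough illustration of the trade-off described above, the Python
+ sketch below (an illustration for this analysis, not a specified
+ mechanism) gathers individual failure reports during a per-node
+ configurable correlation window and emits a single group-failure
+ notification when the window expires:
+
+   # Sketch: per-node failure correlation window (illustrative only).
+   import threading
+
+   class FailureCorrelator:
+       def __init__(self, window_s: float, notify):
+           self.window_s = window_s  # per-node configurable correlation time
+           self.notify = notify      # callback taking a list of failed LSP ids
+           self.pending = []         # failures gathered during the window
+           self.timer = None
+
+       def report(self, lsp_id: str) -> None:
+           """Called once per detected failure (e.g., per lambda LSP)."""
+           self.pending.append(lsp_id)
+           if self.timer is None:    # first report opens the window
+               self.timer = threading.Timer(self.window_s, self._flush)
+               self.timer.start()
+
+       def _flush(self) -> None:
+           group, self.pending, self.timer = self.pending, [], None
+           self.notify(group)        # one grouped notification, not N messages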
+
+5. Recovery Mechanisms
+
+5.1. Transport vs. Control Plane Responsibilities
+
+ When applicable, recovery resources are provisioned, for both
+ protection and restoration, using GMPLS signaling capabilities.
+ Thus, these are control plane-driven actions (topological and
+ resource-constrained) that are always performed in this context.
+
+ The following tables give an overview of the responsibilities taken
+ by the control plane in case of LSP/span recovery:
+
+ 1. LSP/span Protection
+
+ - Phase 1: Failure Detection Transport plane
+ - Phase 2: Failure Localization/Isolation Transport/Control plane
+ - Phase 3: Failure Notification Transport/Control plane
+ - Phase 4: Protection Switching Transport/Control plane
+ - Phase 5: Reversion (Normalization) Transport/Control plane
+
+ Note: in the context of LSP/span protection, control plane actions
+ can be performed for operational purposes and/or
+ synchronization purposes (vertical synchronization between transport
+ and control plane) and/or notification purposes (horizontal
+ synchronization between end-nodes at control plane level). This
+ suggests the selection of the responsible plane (in particular for
+ protection switching) during the provisioning phase of the
+ protected/protection LSP.
+
+ 2. LSP/span Restoration
+
+ - Phase 1: Failure Detection Transport plane
+ - Phase 2: Failure Localization/Isolation Transport/Control plane
+ - Phase 3: Failure Notification Control plane
+ - Phase 4: Recovery Switching Control plane
+ - Phase 5: Reversion (Normalization) Control plane
+
+ Therefore, this document primarily focuses on provisioning of LSP
+ recovery resources, failure notification mechanisms, recovery
+ switching, and reversion operations. Moreover, some additional
+ considerations can be dedicated to the mechanisms associated to the
+ failure localization/isolation phase.
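+
+ For quick reference, the responsibility split of the two tables above
+ can be restated as a simple mapping (a Python transcription of the
+ tables, not an additional normative statement):
+
+   # Plane(s) responsible for each recovery phase (from the tables above).
+   RESPONSIBLE_PLANES = {
+       "protection": {
+           "failure detection": ("transport",),
+           "localization":      ("transport", "control"),
+           "notification":      ("transport", "control"),
+           "switching":         ("transport", "control"),
+           "reversion":         ("transport", "control"),
+       },
+       "restoration": {
+           "failure detection": ("transport",),
+           "localization":      ("transport", "control"),
+           "notification":      ("control",),
+           "switching":         ("control",),
+           "reversion":         ("control",),
+       },
+   }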
+
+5.2. Technology-Independent and Technology-Dependent Mechanisms
+
+ The present recovery mechanisms analysis applies to any circuit-
+ oriented data plane technology with discrete bandwidth increments
+ (like SONET/SDH, G.709 OTN, etc.) being controlled by a GMPLS-based
+ distributed control plane.
+
+ The following sub-sections are not intended to favor one technology
+ versus another. They list pros and cons for each technology in order
+ to determine the mechanisms that GMPLS-based recovery must deliver to
+ overcome their cons and make use of their pros in their respective
+ applicability context.
+
+5.2.1. OTN Recovery
+
+ OTN recovery specifics are left for further consideration.
+
+5.2.2. Pre-OTN Recovery
+
+ Pre-OTN recovery (also referred to as "lambda switching") mainly
+ presents the following advantages:
+
+ - They benefit from a simpler architecture, making it more suitable
+ for mesh-based recovery types and schemes (on a per-channel basis).
+
+ - Failure suppression at intermediate node transponders, e.g., use of
+ squelching, implies that failures (such as LOL) will propagate to
+ edge nodes. Thus, edge nodes will have the possibility to initiate
+ recovery actions driven by upper layers (vs. use of non-standard
+ masking of upstream failures).
+
+ The main disadvantage is the lack of interworking due to the large
+ number of failure management (in particular, failure notification)
+ protocols and recovery mechanisms currently available.
+
+ Note also that, for all-optical networks, the combination of recovery
+ with optical physical impairments is left for a future release of
+ this document because corresponding detection technologies are under
+ specification.
+
+5.2.3. SONET/SDH Recovery
+
+ Some of the advantages of SONET [T1.105]/SDH [G.707], and more
+ generically any Time Division Multiplexing (TDM) transport plane
+ recovery, are that they provide:
+
+ - Protection types operating at the data plane level that are
+ standardized (see [G.841]) and can operate across protected domains
+ and interwork (see [G.842]).
+
+ - Failure detection, notification, and path/section Automatic
+ Protection Switching (APS) mechanisms.
+
+ - Greater control over the granularity of the TDM LSPs/links that can
+ be recovered with respect to coarser optical channel (or whole
+ fiber content) recovery switching.
+
+ Some of the limitations of the SONET/SDH recovery are:
+
+ - Limited topological scope: Inherently the use of ring topologies,
+ typically, dedicated Sub-Network Connection Protection (SNCP) or
+ shared protection rings, has reduced flexibility and resource
+ efficiency with respect to the (somewhat more complex) meshed
+ recovery.
+
+ - Inefficient use of spare capacity: SONET/SDH protection is largely
+ applied to ring topologies, where spare capacity often remains
+ idle, making the efficiency of bandwidth usage a real issue.
+
+ - Support of meshed recovery requires intensive network management
+ development, and the functionality is limited by both the network
+ elements and the capabilities of the element management systems
+ (thus justifying the development of GMPLS-based distributed
+ recovery mechanisms).
+
+5.3. Specific Aspects of Control Plane-Based Recovery Mechanisms
+
+5.3.1. In-Band vs. Out-Of-Band Signaling
+
+ The nodes communicate through the use of IP terminating control
+ channels defining the control plane (transport) topology. In this
+ context, two classes of transport mechanisms can be considered here:
+ in-fiber or out-of-fiber (through a dedicated physically diverse
+ control network referred to as the Data Communication Network or
+ DCN). The potential impact of the usage of an in-fiber (signaling)
+ transport mechanism is briefly considered here.
+
+ In-fiber transport mechanisms can be further subdivided into in-band
+ and out-of-band. As such, the distinction between in-fiber in-band
+ and in-fiber out-of-band signaling reduces to the consideration of a
+ logically- versus physically-embedded control plane topology with
+ respect to the transport plane topology. In the scope of this
+ document, it is assumed that at least one IP control channel between
+ each pair of adjacent nodes is continuously available to enable the
+ exchange of recovery-related information and messages. Thus, in
+ either case (i.e., in-band or out-of-band) at least one logical or
+ physical control channel between each pair of nodes is always
+ expected to be available.
+
+ Therefore, the key issue when using in-fiber signaling is whether one
+ can assume independence between the fault-tolerance capabilities of
+ control plane and the failures affecting the transport plane
+ (including the nodes). Note also that existing specifications like
+ the OTN provide a limited form of independence for in-fiber signaling
+ by dedicating a separate optical supervisory channel (OSC, see
+ [G.709] and [G.874]) to transport the overhead and other control
+ traffic. For OTNs, failure of the OSC does not result in failing the
+ optical channels. Similarly, loss of the control channel must not
+ result in failing the data channels (transport plane).
+
+5.3.2. Uni- vs. Bi-Directional Failures
+
+ The failure detection, correlation, and notification mechanisms
+ (described in Section 4) can be triggered when either a uni-
+ directional or a bi-directional LSP/Span failure occurs (or a
+ combination of both). As illustrated in Figures 1 and 2, two
+ alternatives can be considered here:
+
+ 1. Uni-directional failure detection: the failure is detected on the
+    receiver side, i.e., only by the node downstream of the failure
+    (or only by the upstream node, depending on the failure
+    propagation direction).
+
+ 2. Bi-directional failure detection: the failure is detected on the
+ receiver side of both downstream node AND upstream node to the
+ failure.
+
+ Notice that after the failure detection time, if only control-plane-
+ based failure management is provided, the peering node is unaware of
+ the failure detection status of its neighbor.
+
+ ------- ------- ------- -------
+ | | | |Tx Rx| | | |
+ | NodeA |----...----| NodeB |xxxxxxxxx| NodeC |----...----| NodeD |
+ | |----...----| |---------| |----...----| |
+ ------- ------- ------- -------
+
+ t0 >>>>>>> F
+
+ t1 x <---------------x
+ Notification
+ t2 <--------...--------x x--------...-------->
+ Up Notification Down Notification
+
+ Figure 1: Uni-directional failure detection
+
+ ------- ------- ------- -------
+ | | | |Tx Rx| | | |
+ | NodeA |----...----| NodeB |xxxxxxxxx| NodeC |----...----| NodeD |
+ | |----...----| |xxxxxxxxx| |----...----| |
+ ------- ------- ------- -------
+
+ t0 F <<<<<<< >>>>>>> F
+
+ t1 x <-------------> x
+ Notification
+ t2 <--------...--------x x--------...-------->
+ Up Notification Down Notification
+
+ Figure 2: Bi-directional failure detection
+
+ After failure detection, the following failure management operations
+ can be subsequently considered:
+
+ - Each detecting entity sends a notification message to the
+ corresponding transmitting entity. For instance, in Figure 1, node
+ C sends a notification message to node B. In Figure 2, node C
+ sends a notification message to node B while node B sends a
+ notification message to node C. To ensure reliable failure
+ notification, a dedicated acknowledgement message can be returned
+ back to the sender node.
+
+ - Next, within a certain (and pre-determined) time window, nodes
+ impacted by the failure occurrences may perform their correlation.
+ In case of uni-directional failure, node B only receives the
+ notification message from node C, and thus the time for this
+ operation is negligible. In case of bi-directional failure, node B
+ has to correlate the received notification message from node C with
+ the corresponding locally detected information (and node C has to
+ do the same with the message from node B).
+
+ - After some (pre-determined) period of time, referred to as the
+ hold-off time, if the local recovery actions (see Section 5.3.4)
+ were not successful, the following occurs. In case of uni-
+ directional failure and depending on the directionality of the LSP,
+ node B should send an upstream notification message (see [RFC3473])
+ to the ingress node A. Node C may send a downstream notification
+ message (see [RFC3473]) to the egress node D. However, in that
+ case, only node A would initiate an edge-to-edge recovery action.
+ Node A is referred to as the "master", and node D is referred to as
+ the "slave", per [RFC4427]. Note that the other LSP end-node (node
+ D in this case) may be optionally notified using a downstream
+ notification message (see [RFC3473]).
+
+ In case of bi-directional failure, node B should send an upstream
+ notification message (see [RFC3473]) to the ingress node A. Node C
+ may send a downstream notification message (see [RFC3473]) to the
+ egress node D. However, due to the dependence on the LSP
+ directionality, only ingress node A would initiate an edge-to-edge
+ recovery action. Note that the other LSP end-node (node D in this
+ case) should also be notified of this event using a downstream
+ notification message (see [RFC3473]). For instance, if an LSP
+ directed from D to A is under failure condition, only the
+ notification message sent from node C to D would initiate a
+ recovery action. In this case, per [RFC4427], the deciding and
+ recovering node D is referred to as the "master", while node A is
+ referred to as the "slave" (i.e., recovering only entity).
+
+ Note: The determination of the master and the slave may be based
+ either on configured information or dedicated protocol capability.
+
+ In the above scenarios, the path followed by the upstream and
+ downstream notification messages does not have to be the same as the
+ one followed by the failed LSP (see [RFC3473] for more details on the
+ notification message exchange). The important point concerning this
+ mechanism is that either the detecting/reporting entity (i.e., nodes
+ B and C) is also the deciding/recovery entity or the
+ detecting/reporting entity is simply an intermediate node in the
+ subsequent recovery process. One refers to local recovery in the
+ former case, and to edge-to-edge recovery in the latter one (see also
+ Section 5.3.4).
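+
+ The sequence described above can be summarized by the following
+ Python sketch of a detecting node's behavior (illustrative
+ pseudologic; the callbacks and the hold-off value are assumptions,
+ not protocol elements):
+
+   # Sketch of the detecting-node sequence of Section 5.3.2.
+   import time
+
+   def on_failure_detected(notify_peer, correlate, try_local_recovery,
+                           notify_ingress, hold_off_s=0.05):
+       notify_peer()                # e.g., node C notifies node B (and vice versa)
+       correlate()                  # merge any peer notification with local status
+       time.sleep(hold_off_s)       # hold-off time before escalating
+       if not try_local_recovery(): # local (span/segment) recovery attempt
+           notify_ingress()         # upstream notification toward the "master"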
+
+5.3.3. Partial vs. Full Span Recovery
+
+ When a given span carries more than one LSP or LSP segment, an
+ additional aspect must be considered. In case of span failure, the
+ LSPs it carries can be recovered individually, as a group (a.k.a.
+ bulk LSP recovery), or as independent sub-groups. When correlation time
+ windows are used and simultaneous recovery of several LSPs can be
+ performed using a single request, the selection of this mechanism
+ would be triggered independently of the failure notification
+ granularity. Moreover, criteria for forming such sub-groups are
+ outside of the scope of this document.
+
+ Additional complexity arises in the case of (sub-)group LSP recovery.
+ Between a given pair of nodes, the LSPs that a given (sub-)group
+ contains may have been created from different source nodes (i.e.,
+ initiator) and directed toward different destination nodes.
+ Consequently, the failure notification messages following a bi-
+ directional span failure that affects several LSPs (or the whole
+ group of LSPs it carries) are not necessarily directed toward the
+ same initiator nodes. In particular, these messages may be directed
+ to both the upstream and downstream nodes to the failure. Therefore,
+ such span failure may trigger recovery actions to be performed from
+ both sides (i.e., from both the upstream and the downstream nodes to
+ the failure). In order to facilitate the definition of the
+ corresponding recovery mechanisms (and their sequence), one assumes
+ here as well that, per [RFC4427], the deciding (and recovering)
+ entity (referred to as the "master") is the only initiator of the
+ recovery of the whole LSP (sub-)group.
+
+5.3.4. Difference between LSP, LSP Segment and Span Recovery
+
+ The recovery definitions given in [RFC4427] are quite generic and
+ apply for link (or local span) and LSP recovery. The major
+ difference between LSP, LSP Segment and span recovery is related to
+ the number of intermediate nodes that the signaling messages have to
+ travel. Since nodes are not necessarily adjacent in the case of LSP
+ (or LSP Segment) recovery, signaling message exchanges from the
+ reporting to the deciding/recovery entity may have to cross several
+ intermediate nodes. In particular, this applies to the notification
+ messages due to the number of hops separating the location of a
+ failure occurrence from its destination. This results in an
+ additional propagation and forwarding delay. Note that the former
+ delay may in certain circumstances be non-negligible; e.g., in a
+ copper out-of-band network, the delay is approximately 1 ms per
+ 200 km.
+
+ Moreover, the recovery mechanisms applicable to end-to-end LSPs and
+ to the segments that may compose an end-to-end LSP (i.e., edge-to-
+ edge recovery) can be exactly the same. However, one expects in the
+ latter case, that the destination of the failure notification message
+ will be the ingress/egress of each of these segments. Therefore,
+ using the mechanisms described in Section 5.3.2, failure notification
+ messages can be exchanged first between terminating points of the LSP
+ segment, and after expiration of the hold-off time, between
+ terminating points of the end-to-end LSP.
+
+ Note: Several studies provide quantitative analysis of the relative
+ performance of LSP/span recovery techniques. [WANG], for instance,
+ provides an analysis grid for these techniques showing that dynamic
+ LSP restoration (see Section 5.5.2) performs well under medium
+ network loads, but suffers performance degradations at higher loads
+ due to greater contention for recovery resources. LSP restoration
+ upon span failure, as defined in [WANG], degrades at higher loads
+ because paths around failed links tend to increase the hop count of
+ the affected LSPs and thus consume additional network resources.
+ Also, performance of LSP restoration can be enhanced by a failed
+ working LSP's source node that initiates a new recovery attempt if an
+ initial attempt fails. A single retry attempt is sufficient to
+ produce large increases in the restoration success rate and ability
+ to initiate successful LSP restoration attempts, especially at high
+ loads, while not adding significantly to the long-term average
+ recovery time. Allowing additional attempts produces only small
+ additional gains in performance. This suggests using additional
+ (intermediate) crankback signaling when using dynamic LSP restoration
+ (described in Section 5.5.2 - case 2). Details on crankback
+ signaling are outside the scope of this document.
+
+5.4. Difference between Recovery Type and Scheme
+
+ [RFC4427] defines the basic LSP/span recovery types. This section
+ describes the recovery schemes that can be built using these recovery
+ types. In brief, a recovery scheme is defined as the combination of
+ several ingress-egress node pairs supporting a given recovery type
+ (from the set of the recovery types they allow). Several examples
+ are provided here to illustrate the difference between recovery types
+ such as 1:1 or M:N, and recovery schemes such as (1:1)^n or (M:N)^n
+ (referred to as shared-mesh recovery).
+
+ 1. (1:1)^n with recovery resource sharing
+
+ The exponent, n, indicates the number of times a 1:1 recovery type is
+ applied between at most n different ingress-egress node pairs. Here,
+ at most n pairs of disjoint working and recovery LSPs/spans share a
+ common resource at most n times. Since the working LSPs/spans are
+ mutually disjoint, simultaneous requests for use of the shared
+ (common) resource will only occur in case of simultaneous failures,
+ which are less likely to happen.
+
+ For instance, in the common (1:1)^2 case, if the 2 recovery LSPs in
+ the group overlap the same common resource, then the group can handle only
+ single failures; any multiple working LSP failures will cause at
+ least one working LSP to be denied automatic recovery. Consider for
+ instance the following topology with the working LSPs A-B-C and F-G-H
+ and their respective recovery LSPs A-D-E-C and F-D-E-H that share a
+ common D-E link resource.
+
+ A---------B---------C
+ \ /
+ \ /
+ D-------------E
+ / \
+ / \
+ F---------G---------H
+
+ 2. (M:N)^n with recovery resource sharing
+
+ The (M:N)^n scheme is documented here for the sake of completeness
+ only (i.e., it is not mandated that GMPLS capabilities support this
+ scheme). The exponent, n, indicates the number of times an M:N
+ recovery type is applied between at most n different ingress-egress
+ node pairs. So the interpretation follows from the previous case,
+ except that here disjointness applies to the N working LSPs/spans and
+ to the M recovery LSPs/spans while sharing at most n times M common
+ resources.
+
+ In both schemes, the result is a "group" of sum_{i=1..n} N_i working
+ LSPs and a pool of shared recovery resources, not all of which are
+ available to any given working LSP. In such conditions, defining a
+ metric that describes the amount of overlap among the recovery LSPs
+ would give some indication of the group's ability to handle
+ simultaneous failures of multiple LSPs.
+
+ For instance, in the simple (1:1)^n case, if n recovery LSPs in a
+ (1:1)^n group overlap, then the group can handle only single
+ failures; any simultaneous failure of multiple working LSPs will
+ cause at least one working LSP to be denied automatic recovery. But
+ if one considers, for instance, a (2:2)^2 group in which there are
+ two pairs of overlapping recovery LSPs, then two LSPs (belonging to
+ the same pair) can be simultaneously recovered. The latter case can
+ be illustrated by the following topology with 2 pairs of working LSPs
+ A-B-C and F-G-H and their respective recovery LSPs A-D-E-C and
+ F-D-E-H that share two common D-E link resources.
+
+ A========B========C
+ \\ //
+ \\ //
+ D =========== E
+ // \\
+ // \\
+ F========G========H
+
+ Moreover, in all these schemes, (working) path disjointness can be
+ enforced by exchanging information related to working LSPs during the
+ recovery LSP signaling. Specific issues related to the combination
+ of shared (discrete) bandwidth and disjointness for recovery schemes
+ are described in Section 8.4.2.
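+
+ The overlap metric suggested above can be sketched as follows
+ (illustrative Python, under the simplifying assumption that each
+ shared link reserves a known number of recovery resource units):
+
+   # Sketch: worst-case simultaneous failures a shared group can absorb.
+   from collections import Counter
+
+   def worst_case_recoveries(recovery_paths, units_per_link):
+       """recovery_paths: one link list per recovery LSP;
+       units_per_link: recovery units reserved on shared links."""
+       demand = Counter(link for path in recovery_paths for link in path)
+       shared = [link for link, d in demand.items() if d > 1]
+       if not shared:
+           return len(recovery_paths)  # fully disjoint recovery LSPs
+       # The most contended shared link bounds recoverability.
+       return min(units_per_link.get(link, 1) for link in shared)
+
+   paths = [["A-D", "D-E", "E-C"], ["F-D", "D-E", "E-H"]]
+   assert worst_case_recoveries(paths, {"D-E": 1}) == 1  # (1:1)^2 above
+   assert worst_case_recoveries(paths, {"D-E": 2}) == 2  # (2:2)^2 above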
+
+5.5. LSP Recovery Mechanisms
+
+5.5.1. Classification
+
+ The recovery time and ratio of LSPs/spans depend on proper recovery
+ LSP provisioning (meaning pre-provisioning when performed before
+ failure occurrence) and the level of overbooking of recovery
+ resources (i.e., over-provisioning). A proper balance of these two
+ operations will result in the desired LSP/span recovery time and
+ ratio when single or multiple failures occur. Note also that these
+ operations are mostly performed during the network planning phases.
+
+ The different options for LSP (pre-)provisioning and overbooking are
+ classified below to structure the analysis of the different recovery
+ mechanisms.
+
+ 1. Pre-Provisioning
+
+ Proper recovery LSP pre-provisioning will help to alleviate the
+ failure of the working LSPs (due to the failure of the resources that
+ carry these LSPs). As an example, one may compute and establish the
+ recovery LSP either end-to-end or segment-per-segment, to protect a
+ working LSP from multiple failure events affecting link(s), node(s)
+ and/or SRLG(s). The recovery LSP pre-provisioning options are
+ classified as follows in the figure below:
+
+ (1) The recovery path can be either pre-computed or computed on-
+ demand.
+
+ (2) When the recovery path is pre-computed, it can be either pre-
+ signaled (implying recovery resource reservation) or signaled
+ on-demand.
+
+ (3) When the recovery resources are pre-signaled, they can be either
+ pre-selected or selected on-demand.
+
+ Recovery LSP provisioning phases:
+
+ (1) Path Computation --> On-demand
+ |
+ |
+ --> Pre-Computed
+ |
+ |
+ (2) Signaling --> On-demand
+ |
+ |
+ --> Pre-Signaled
+ |
+ |
+ (3) Resource Selection --> On-demand
+ |
+ |
+ --> Pre-Selected
+
+ Note that these different options lead to different LSP/span recovery
+ times. The following sections will consider the above-mentioned
+ pre-provisioning options when analyzing the different recovery
+ mechanisms.
+
+ 2. Overbooking
+
+ There are many mechanisms available that allow the overbooking of the
+ recovery resources. This overbooking can be done per LSP (as in the
+ example mentioned above), per link (such as span protection), or even
+ per domain. In all these cases, the level of overbooking, as shown
+ in the figure below, can be classified as dedicated (such as 1+1 and
+ 1:1), shared (such as 1:N and M:N), or unprotected (and thus
+ restorable, if enough recovery resources are available).
+
+ Overbooking levels:
+
+                  +----- Dedicated (for instance: 1+1, 1:1, etc.)
+                  |
+                  |
+                  +----- Shared (for instance: 1:N, M:N, etc.)
+                  |
+   Level of       |
+   Overbooking ---+----- Unprotected (for instance: 0:1, 0:N)
+
+ Also, when using shared recovery, one may support preemptible extra-
+ traffic; the recovery mechanism is then expected to allow preemption
+ of this low priority traffic in case of recovery resource contention
+ during recovery operations. The following sections will consider the
+ above-mentioned overbooking options when analyzing the different
+ recovery mechanisms.
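+
+ The classification above can be captured compactly. The following
+ Python sketch (illustrative data types, not protocol elements)
+ encodes the three pre-provisioning decisions and the overbooking
+ level that later sections refer back to:
+
+   # Sketch: the recovery LSP provisioning/overbooking design space.
+   from dataclasses import dataclass
+   from enum import Enum
+
+   class When(Enum):
+       ON_DEMAND = "on-demand"
+       PRE = "pre"
+
+   class Overbooking(Enum):
+       DEDICATED = "dedicated"      # e.g., 1+1, 1:1
+       SHARED = "shared"            # e.g., 1:N, M:N
+       UNPROTECTED = "unprotected"  # e.g., 0:1, 0:N (restorable at best)
+
+   @dataclass
+   class RecoveryProvisioning:
+       path_computation: When   # (1)
+       signaling: When          # (2) meaningful only if path pre-computed
+       resource_selection: When # (3) meaningful only if pre-signaled
+       overbooking: Overbooking
+
+   # Example: pre-planned restoration with pre-selected, shared resources.
+   pre_planned = RecoveryProvisioning(When.PRE, When.PRE, When.PRE,
+                                      Overbooking.SHARED)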
+
+5.5.2. LSP Restoration
+
+ The following times are defined to provide a quantitative estimation
+ about the time performance of the different LSP restoration
+ mechanisms (also referred to as LSP re-routing):
+
+ - Path Computation Time: Tc
+ - Path Selection Time: Ts
+ - End-to-end LSP Resource Reservation Time: Tr (a delta for resource
+ selection is also considered, the corresponding total time is then
+ referred to as Trs)
+ - End-to-end LSP Resource Activation Time: Ta (a delta for
+ resource selection is also considered, the corresponding total
+ time is then referred to as Tas)
+
+ The Path Selection Time (Ts) is considered when a pool of recovery
+ LSP paths between a given pair of source/destination end-points is
+ pre-computed, and after a failure occurrence one of these paths is
+ selected for the recovery of the LSP under failure condition.
+
+ Note: failure management operations such as failure detection,
+ correlation, and notification are considered (for a given failure
+ event) as equally time-consuming for all the mechanisms described
+ below:
+
+ 1. With Route Pre-computation (or LSP re-provisioning)
+
+ An end-to-end restoration LSP is established after the failure(s)
+ occur(s) based on a pre-computed path. As such, one can define this
+ as an "LSP re-provisioning" mechanism. Here, one or more (disjoint)
+ paths for the restoration LSP are computed (and optionally pre-
+ selected) before a failure occurs.
+
+ No reservation or selection of resources is performed along the
+ restoration path before failure occurrence. As a result, there is no
+ guarantee that a restoration LSP is available when a failure occurs.
+
+ The expected total restoration time T is thus equal to Ts + Trs or to
+ Trs when a dedicated computation is performed for each working LSP.
+
+ 2. Without Route Pre-computation (or Full LSP re-routing)
+
+ An end-to-end restoration LSP is dynamically established after the
+ failure(s) occur(s). After failure occurrence, one or more
+ (disjoint) paths for the restoration LSP are dynamically computed and
+ one is selected. As such, one can define this as a complete "LSP
+ re-routing" mechanism.
+
+ No reservation or selection of resources is performed along the
+ restoration path before failure occurrence. As a result, there is no
+ guarantee that a restoration LSP is available when a failure occurs.
+
+ The expected total restoration time T is thus equal to Tc (+ Ts) +
+ Trs. Therefore, time performance between these two approaches
+ differs by the time required for route computation Tc (and its
+ potential selection time, Ts).
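+
+ Under these definitions, the two variants differ only in which terms
+ are paid after the failure. A small worked sketch in Python (with
+ purely illustrative time values) makes the comparison explicit:
+
+   # Expected post-failure restoration time (illustrative values, in ms).
+   Tc, Ts, Trs = 200, 10, 150   # assumed for illustration only
+
+   # 1. Route pre-computation (LSP re-provisioning): Tc paid in advance.
+   T_reprovisioning = Ts + Trs   # or just Trs with per-LSP computation
+
+   # 2. Full LSP re-routing: everything paid after the failure.
+   T_full_rerouting = Tc + Ts + Trs
+
+   assert T_full_rerouting - T_reprovisioning == Tc  # difference is Tc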
+
+5.5.3. Pre-Planned LSP Restoration
+
+ Pre-planned LSP restoration (also referred to as pre-planned LSP re-
+ routing) implies that the restoration LSP is pre-signaled. This in
+ turn implies the reservation of recovery resources along the
+ restoration path. Two cases can be defined based on whether the
+ recovery resources are pre-selected.
+
+ 1. With resource reservation and without resource pre-selection
+
+ Before failure occurrence, an end-to-end restoration path is pre-
+ selected from a set of pre-computed (disjoint) paths. The
+ restoration LSP is signaled along this pre-selected path to reserve
+ resources at each node, but these resources are not selected.
+
+ In this case, the resources reserved for each restoration LSP may be
+ dedicated or shared between multiple restoration LSPs whose working
+ LSPs are not expected to fail simultaneously. Local node policies
+ can be applied to define the degree to which these resources can be
+ shared across independent failures. Also, since a restoration scheme
+ is considered, resource sharing should not be limited to restoration
+ LSPs that start and end at the same ingress and egress nodes.
+ Therefore, each node participating in this scheme is expected to
+ receive some feedback information on the sharing degree of the
+ recovery resource(s) that this scheme involves.
+
+ Upon failure detection/notification message reception, signaling is
+ initiated along the restoration path to select the resources, and to
+ perform the appropriate operation at each node crossed by the
+ restoration LSP (e.g., cross-connections). If lower priority LSPs
+ were established using the restoration resources, they must be
+ preempted when the restoration LSP is activated.
+
+ Thus, the expected total restoration time T is equal to Tas (post-
+ failure activation), while operations performed before failure
+ occurrence take Tc + Ts + Tr.
+
+ 2. With both resource reservation and resource pre-selection
+
+ Before failure occurrence, an end-to-end restoration path is pre-
+ selected from a set of pre-computed (disjoint) paths. The
+ restoration LSP is signaled along this pre-selected path to reserve
+ AND select resources at each node, but these resources are not
+ committed at the data plane level. The selection of the recovery
+ resources is thus committed at the control plane level only, and no
+ cross-connections are performed along the restoration path.
+
+ In this case, the resources reserved and selected for each
+ restoration LSP may be dedicated or even shared between multiple
+ restoration LSPs whose associated working LSPs are not expected to
+ fail simultaneously. Local node policies can be applied to define
+ the degree to which these resources can be shared across independent
+ failures. Also, because a restoration scheme is considered, resource
+ sharing should not be limited to restoration LSPs that start and end
+ at the same ingress and egress nodes. Therefore, each node
+ participating in this scheme is expected to receive some feedback
+ information on the sharing degree of the recovery resource(s) that
+ this scheme involves.
+
+ Upon failure detection/notification message reception, signaling is
+ initiated along the restoration path to activate the reserved and
+ selected resources, and to perform the appropriate operation at each
+ node crossed by the restoration LSP (e.g., cross-connections). If
+ lower priority LSPs were established using the restoration resources,
+ they must be preempted when the restoration LSP is activated.
+
+ Thus, the expected total restoration time T is equal to Ta (post-
+ failure activation), while operations performed before failure
+ occurrence take Tc + Ts + Trs. Therefore, time performance between
+ these two approaches differs only by the time required for resource
+ selection during the activation of the recovery LSP (i.e., Tas - Ta).
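+
+ As an illustration of the timing trade-off discussed above, the
+ following sketch compares the pre-failure work and the post-failure
+ restoration time T for the three approaches. The time values are
+ arbitrary examples (in milliseconds); only the composition of the
+ Tc/Ts/Tr/Trs/Ta/Tas components follows the text.
+
+      # Illustrative time budgets (ms); values are assumptions.
+      Tc, Ts = 50.0, 10.0    # path computation and selection
+      Tr, Trs = 30.0, 40.0   # reservation without/with selection
+      Ta, Tas = 20.0, 25.0   # activation (Tas includes selection)
+
+      schemes = {
+          # scheme: (pre-failure work, restoration time T)
+          "full LSP re-routing":           (0.0, Tc + Ts + Trs),
+          "pre-planned, no pre-selection": (Tc + Ts + Tr, Tas),
+          "pre-planned, pre-selection":    (Tc + Ts + Trs, Ta),
+      }
+
+      for name, (pre, post) in schemes.items():
+          print(name, "pre-failure:", pre, "ms; T =", post, "ms")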
+
+5.5.4. LSP Segment Restoration
+
+ The above approaches can be applied on an edge-to-edge rather than
+ an end-to-end LSP basis (e.g., to reduce the global recovery time)
+ by allowing the recovery of the individual LSP segments
+ constituting the end-to-end LSP.
+
+ Also, by using the horizontal hierarchy approach described in Section
+ 7.1, an end-to-end LSP can be recovered by multiple recovery
+ mechanisms applied on an LSP segment basis (e.g., 1:1 edge-to-edge
+ LSP protection in a metro network, and M:N edge-to-edge protection in
+ the core). These mechanisms are ideally independent and may even use
+ different failure localization and notification mechanisms.
+
+6. Reversion
+
+ Reversion (a.k.a. normalization) is defined as the mechanism allowing
+ switching of normal traffic from the recovery LSP/span to the working
+ LSP/span previously under failure condition. Use of normalization is
+ at the discretion of the recovery domain policy. Normalization may
+ impact the normal traffic (a second hit) depending on the
+ normalization mechanism used.
+
+ If normalization is supported, then 1) the LSP/span must be returned
+ to the working LSP/span when the failure condition clears and 2) the
+ capability to de-activate (turn-off) the use of reversion should be
+ provided. De-activation of reversion should not impact the normal
+ traffic, regardless of whether it is currently using the working or
+ recovery LSP/span.
+
+ Note: during the failure, the reuse of any non-failed resources
+ (e.g., LSPs and/or spans) belonging to the working LSP/span is at
+ the discretion of the recovery domain policy.
+
+6.1. Wait-To-Restore (WTR)
+
+ A specific mechanism (Wait-To-Restore) is used to prevent frequent
+ recovery switching operations due to an intermittent defect (e.g.,
+ Bit Error Rate (BER) fluctuating around the SD threshold).
+
+ First, an LSP/span under failure condition must become fault-free,
+ e.g., a BER less than a certain recovery threshold. After the
+ recovered LSP/span (i.e., the previously working LSP/span) meets this
+ criterion, a fixed period of time shall elapse before normal traffic
+ uses the corresponding resources again. This duration, called the
+ Wait-To-Restore (WTR) period or timer, is generally on the order of
+ a few minutes (for instance, 5 minutes) and should be configurable.
+ The WTR timer may either be a fixed period or provide for
+ incrementally longer periods before retrying. An SF or SD condition
+ on the previously working LSP/span will override the WTR timer value
+ (i.e., the WTR is canceled and the WTR timer will restart).
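+
+ A minimal sketch of a revertive-mode state machine with a WTR period
+ is given below. The state names, the event-driven structure, and
+ the tick-based timer are illustrative assumptions; only the WTR
+ semantics (start on fault clearance, cancel and restart on SF/SD)
+ follow the text above.
+
+      from enum import Enum
+
+      class State(Enum):
+          NORMAL = 1      # normal traffic on the working LSP/span
+          RECOVERING = 2  # traffic on the recovery LSP/span
+          WTR = 3         # working fault-free, WTR period running
+
+      class Reversion:
+          def __init__(self, wtr_period=300.0):  # e.g., 5 minutes
+              self.wtr_period = wtr_period
+              self.state = State.NORMAL
+              self.deadline = None
+
+          def on_sf_or_sd(self, now):
+              # SF/SD overrides a running WTR timer: cancel it; it
+              # restarts when the fault clears again.
+              self.state = State.RECOVERING
+              self.deadline = None
+
+          def on_fault_cleared(self, now):
+              if self.state is State.RECOVERING:
+                  self.state = State.WTR
+                  self.deadline = now + self.wtr_period
+
+          def on_tick(self, now):
+              # Revert only after the full WTR period has elapsed.
+              if self.state is State.WTR and now >= self.deadline:
+                  self.state = State.NORMAL
+                  self.deadline = None
+                  return "switch traffic back to working LSP/span"
+              return None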
+
+6.2. Revertive Mode Operation
+
+ In revertive mode of operation, when the recovery LSP/span is no
+ longer required, i.e., the failed working LSP/span is no longer in SD
+ or SF condition, a local Wait-to-Restore (WTR) state will be
+ activated before switching the normal traffic back to the recovered
+ working LSP/span.
+
+ During the reversion operation, since this state becomes the highest
+ in priority, signaling must maintain the normal traffic on the
+ recovery LSP/span from the previously failed working LSP/span.
+ Moreover, during this WTR state, any request for null traffic or
+ extra traffic (if applicable) is rejected.
+
+ However, deactivation (cancellation) of the wait-to-restore timer may
+ occur if there are higher priority request attempts. That is, the
+ recovery LSP/span usage by the normal traffic may be preempted if a
+ higher priority request for this recovery LSP/span is attempted.
+
+6.3. Orphans
+
+ When a reversion operation is requested, normal traffic must be
+ switched from the recovery to the recovered working LSP/span. A
+ particular situation occurs when the previously working LSP/span
+ cannot be recovered, so normal traffic cannot be switched back. In
+ that case, the LSP/span under failure condition (also referred to as
+ "orphan") must be cleared (i.e., removed) from the pool of resources
+ allocated for normal traffic. Otherwise, the control plane and
+ transport plane views of resource usage may become de-synchronized.
+ Depending on the signaling protocol capabilities
+ and behavior, different mechanisms are expected here.
+
+ Therefore, any reserved or allocated resources for the LSP/span under
+ failure condition must be unreserved/de-allocated. Several ways can
+ be used for that purpose: wait for the clear-out time interval to
+ elapse, initiate a deletion from the ingress or the egress node, or
+ trigger the initiation of deletion from an entity (such as an EMS or
+ NMS) capable of reacting upon reception of an appropriate
+ notification message.
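+
+ The cleanup alternatives just listed can be summarized in a small
+ dispatch sketch; the predicate and callback names are purely
+ illustrative and not part of any specified procedure.
+
+      def release_orphan(lsp, clearout_elapsed, delete_from_edge,
+                         notify_ems_nms):
+          # Release reserved/allocated resources of an orphan
+          # LSP/span, using one of the alternatives described above.
+          if clearout_elapsed(lsp):
+              return "released after clear-out time interval"
+          if delete_from_edge(lsp):
+              return "deletion initiated from ingress/egress node"
+          # Fall back to management-plane-triggered deletion.
+          notify_ems_nms(lsp)
+          return "deletion triggered via EMS/NMS notification"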
+
+7. Hierarchies
+
+ Recovery mechanisms are being made available at multiple (if not all)
+ transport layers within so-called "IP/MPLS-over-optical" networks.
+ However, each layer has certain recovery features, and one needs to
+ determine the exact impact of the interaction between the recovery
+ mechanisms provided by these layers.
+
+ Hierarchies are used to build scalable complex systems. By hiding
+ the internal details, abstraction is used as a mechanism to build
+ large networks or as a technique for enforcing technology,
+ topological, or administrative boundaries. The same hierarchical
+ concept can be applied to control the network survivability. Network
+ survivability is the set of capabilities that allow a network to
+ restore affected traffic in the event of a failure. Network
+ survivability is defined further in [RFC4427]. In general, it is
+ expected that the recovery action is taken by the recoverable
+ LSP/span closest to the failure in order to avoid the multiplication
+ of recovery actions. Moreover, recovery hierarchies also can be
+ bound to control plane logical partitions (e.g., administrative or
+ topological boundaries). Each logical partition may apply different
+ recovery mechanisms.
+
+ In brief, it is commonly accepted that the lower layers can provide
+ coarse but faster recovery while the higher layers can provide finer
+ but slower recovery. Moreover, it is also desirable to avoid
+ functional overlap between similar layers in order to optimize
+ network resource
+ utilization and processing overhead, since repeating the same
+ capabilities at each layer does not create any added value for the
+ network as a whole. In addition, even if a lower layer recovery
+ mechanism is enabled, it does not prevent the additional provision of
+ a recovery mechanism at the upper layer. The inverse statement does
+ not necessarily hold; that is, enabling an upper layer recovery
+ mechanism may prevent the use of a lower layer recovery mechanism.
+ In this context, this section analyzes these hierarchical aspects
+ including the physical (passive) layer(s).
+
+7.1. Horizontal Hierarchy (Partitioning)
+
+ A horizontal hierarchy is defined when partitioning a single-layer
+ network (and its control plane) into several recovery domains.
+ Within a domain, the recovery scope may extend over a link (or span),
+ LSP segment, or even an end-to-end LSP. Moreover, an administrative
+ domain may consist of a single recovery domain or can be partitioned
+ into several smaller recovery domains. The operator can partition
+ the network into recovery domains based on physical network topology,
+ control plane capabilities, or various traffic engineering
+ constraints.
+
+ An example often addressed in the literature is the metro-core-metro
+ application (sometimes extended to a metro-metro/core-core) within a
+ single transport layer (see Section 7.2). For such a case, an end-
+ to-end LSP is defined between the ingress and egress metro nodes,
+ while LSP segments may be defined within the metro or core sub-
+ networks. Each of these topological structures determines a so-
+ called "recovery domain" since each of the LSPs they carry can have
+ its own recovery type (or even scheme). The support of multiple
+ recovery types and schemes within a sub-network is referred to as a
+ "multi-recovery capable domain" or simply "multi-recovery domain".
+
+7.2. Vertical Hierarchy (Layers)
+
+ It is very challenging to combine the different recovery capabilities
+ available across the path (i.e., switching capable) and section
+ layers to ensure that certain network survivability objectives are
+ met for the network-supported services.
+
+ As a first analysis step, one can draw the following guidelines for
+ a vertical coordination of the recovery mechanisms:
+
+ - The lower the layer, the faster the notification and switching.
+
+ - The higher the layer, the finer the granularity of the recoverable
+ entity and therefore the granularity of the recovery resource.
+
+ Moreover, in the context of this analysis, a vertical hierarchy
+ consists of multiple layered transport planes providing different:
+
+ - Discrete bandwidth granularities for non-packet LSPs such as OCh,
+ ODUk, STS_SPE/HOVC, and VT_SPE/LOVC LSPs and continuous bandwidth
+ granularities for packet LSPs.
+
+ - Potential recovery capabilities with different temporal
+ granularities: ranging from milliseconds to tens of seconds
+
+ Note: based on the bandwidth granularity, we can determine four
+ classes of vertical hierarchies: (1) packet over packet, (2) packet
+ over circuit, (3) circuit over packet, and (4) circuit over circuit.
+ Below we briefly expand on (4) only. (2) is covered in [RFC3386]. (1)
+ is extensively covered by the MPLS Working Group, and (3) by the PWE3
+ Working Group.
+
+ In SONET/SDH environments, one typically considers the VT_SPE/LOVC
+ and STS_SPE/HOVC as independent layers (for example, a VT_SPE/LOVC LSP
+ uses the underlying STS_SPE/HOVC LSPs as links). In OTN, the ODUk
+ path layers will lie on the OCh path layer, i.e., the ODUk LSPs use
+ the underlying OCh LSPs as OTUk links. Note here that lower layer
+ LSPs may simply be provisioned and not necessarily dynamically
+ triggered or established (control driven approach). In this context,
+ an LSP at the path layer (i.e., established using GMPLS signaling),
+ such as an optical channel LSP, appears at the OTUk layer as a link,
+ controlled by a link management protocol such as LMP.
+
+ The first key issue with multi-layer recovery is that individual or
+ bulk LSP recovery will be only as efficient as the underlying link
+ (local span) recovery. In such a case, the span can
+ be either protected or unprotected, but the LSP it carries must be
+ (at least locally) recoverable. Therefore, the span recovery process
+ can be either independent when protected (or restorable), or
+ triggered by the upper LSP recovery process. The former case
+ requires coordination to achieve subsequent LSP recovery. Therefore,
+ in order to achieve robustness and fast convergence, multi-layer
+ recovery requires a fine-tuned coordination mechanism.
+
+ Moreover, in the absence of adequate recovery mechanism coordination
+ (for instance, a pre-determined coordination when using a hold-off
+ timer), a failure notification may propagate from one layer to the
+ next one within a recovery hierarchy. This can cause "collisions"
+ and trigger simultaneous recovery actions that may lead to race
+ conditions and, in turn, reduce the optimization of the resource
+ utilization and/or generate global instabilities in the network (see
+ [MANCHESTER]). Therefore, a consistent and efficient escalation
+ strategy is needed to coordinate recovery across several layers.
+
+ One can expect that the definition of the recovery mechanisms and
+ protocol(s) is technology-independent so that they can be
+ consistently implemented at different layers; this would in turn
+ simplify their global coordination. Moreover, as mentioned in
+ [RFC3386], some looser form of coordination and communication between
+ (vertical) layers such as a consistent hold-off timer configuration
+ (and setup through signaling during the working LSP establishment)
+ can be considered, thereby allowing the synchronization between
+ recovery actions performed across these layers.
+
+7.2.1. Recovery Granularity
+
+ In most environments, the design of the network and the vertical
+ distribution of the LSP bandwidth are such that the recovery
+ granularity is finer at higher layers. The OTN and SONET/SDH layers
+ can recover only the whole section or the individual connections
+ they transport, whereas the IP/MPLS control plane can recover
+ individual packet LSPs or groups of packet LSPs independently of
+ their granularity. On the other hand, the recovery granularity at
+ the sub-wavelength level (i.e., SONET/SDH) can be provided only
+ when the network includes devices switching at the same granularity
+ (and thus not at the optical channel level). Therefore, the network
+ layer can
+ deliver control-plane-driven recovery mechanisms on a per-LSP basis
+ if and only if these LSPs have their corresponding switching
+ granularity supported at the transport plane level.
+
+7.3. Escalation Strategies
+
+ There are two types of escalation strategies (see [DEMEESTER]):
+ bottom-up and top-down.
+
+ The bottom-up approach assumes that lower layer recovery types and
+ schemes are more expedient and faster than upper layer ones.
+ Therefore, we can inhibit or hold off higher layer recovery.
+ However, this assumption is not always true. Consider, for
+ instance, a SONET/SDH-based protection mechanism (with a protection
+ switching time of less than 50 ms) lying on top of an OTN
+ restoration mechanism (with a restoration time of less than 200
+ ms); here, the higher layer recovers faster than the lower one.
+ Therefore, this
+ assumption should be (at least) clarified as: the lower layer
+ recovery mechanism is expected to be faster than the upper level one,
+ if the same type of recovery mechanism is used at each layer.
+
+ Consequently, taking into account the recovery actions at the
+ different layers in a bottom-up approach: if lower layer recovery
+ mechanisms are provided and sequentially activated in conjunction
+ with higher layer ones, the lower layers must have an opportunity to
+ recover normal traffic before the higher layers do. However, if
+ lower layer recovery is slower than higher layer recovery, the lower
+ layer must either communicate the failure-related information to the
+ higher layer(s) (and allow it to perform recovery), or use a hold-off
+ timer in order to temporarily set the higher layer recovery action in
+ a "standby mode". Note that the a priori information exchange
+ between layers concerning their efficiency is not within the current
+ scope of this document. Nevertheless, the coordination functionality
+ between layers must be configurable and tunable.
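+
+ A minimal sketch of such a hold-off mechanism is shown below; the
+ use of a threading event and the 100 ms default are illustrative
+ assumptions, not values mandated by this document.
+
+      import threading
+
+      def hold_off_then_escalate(lower_layer_recovered,
+                                 trigger_higher_layer_recovery,
+                                 hold_off_s=0.1):
+          # Bottom-up escalation: keep the higher layer recovery in
+          # "standby mode" for the hold-off period, giving the lower
+          # layer an opportunity to recover the normal traffic.
+          # lower_layer_recovered is a threading.Event set by the
+          # lower layer once it has restored the traffic.
+          if lower_layer_recovered.wait(timeout=hold_off_s):
+              return "no action: lower layer recovered in time"
+          # Hold-off expired without lower layer recovery: escalate.
+          trigger_higher_layer_recovery()
+          return "escalated to higher layer recovery"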
+
+ For example, coordination between the optical and packet layer
+ control plane enables the optical layer to perform the failure
+ management operations (in particular, failure detection and
+ notification) while giving the packet layer control plane the
+ authority to decide and perform the recovery actions. If the packet
+ layer recovery action is unsuccessful, fallback at the optical layer
+ can be performed subsequently.
+
+ The top-down approach attempts service recovery at the higher layers
+ before invoking lower layer recovery. Higher layer recovery is
+ service selective, and permits "per-CoS" or "per-connection" re-
+ routing. With this approach, the most important aspect is that the
+ upper layer should provide its own failure detection mechanism,
+ reliable and independent from the lower layer.
+
+ [DEMEESTER] also suggests recovery mechanisms incorporating a
+ coordinated effort shared by two adjacent layers with periodic status
+ updates. Moreover, some of these recovery operations can be pre-
+ assigned (on a per-link basis) to a certain layer, e.g., a given link
+ will be recovered at the packet layer while another will be recovered
+ at the optical layer.
+
+7.4. Disjointness
+
+ Having link and node diverse working and recovery LSPs/spans does not
+ guarantee their complete disjointness. Due to the common physical
+ layer topology (passive), additional hierarchical concepts, such as
+ the Shared Risk Link Group (SRLG), and mechanisms, such as SRLG
+ diverse path computation, must be developed to provide complete
+ working and recovery LSP/span disjointness (see [IPO-IMP] and
+ [RFC4202]). Otherwise, a failure affecting the working LSP/span
+ would also potentially affect the recovery LSP/span; one refers to
+ such an event as "common failure".
+
+7.4.1. SRLG Disjointness
+
+ A Shared Risk Link Group (SRLG) is defined as the set of links
+ sharing a common risk, i.e., a common physical resource such as a
+ fiber link or a fiber cable. For instance, a set of links L belongs
+ to the same SRLG s if they are provisioned over the same fiber link
+ f.
+
+ The SRLG properties can be summarized as follows:
+
+ 1) A link belongs to more than one SRLG if and only if it crosses one
+ of the resources covered by each of them.
+
+ 2) Two links belonging to the same SRLG can belong individually to
+ (one or more) other SRLGs.
+
+ 3) The SRLG set S of an LSP is defined as the union of the individual
+    SRLGs of the individual links composing this LSP.
+
+ SRLG disjointness is also applicable to LSPs:
+
+ The LSP SRLG disjointness concept is based on the following
+ postulate: an LSP (i.e., a sequence of links and nodes) covers an
+ SRLG if and only if it crosses one of the links or nodes belonging
+ to that SRLG.
+
+ Therefore, the SRLG disjointness for LSPs can be defined as
+ follows: two LSPs are disjoint with respect to an SRLG s if and
+ only if they do not simultaneously cover this SRLG s.
+
+ Similarly, the SRLG disjointness for LSPs with respect to a set S
+ of SRLGs is defined as follows: two LSPs are disjoint with respect
+ to a set of SRLGs S if and only if the set of SRLGs that are
+ common to both LSPs is disjoint from set S.
+
+ The impact on recovery is noticeable: SRLG disjointness is a
+ necessary (but not a sufficient) condition to ensure network
+ survivability. With respect to the physical network resources, a
+ working-recovery LSP/span pair must be SRLG-disjoint in case of
+ dedicated recovery type. On the other hand, in case of shared
+ recovery, a group of working LSP/spans must be mutually SRLG-disjoint
+ in order to allow for a (single and common) shared recovery LSP that
+ is itself SRLG-disjoint from each of the working LSPs/spans.
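+
+ These definitions translate directly into set operations. The
+ sketch below assumes that the SRLG IDs of each link are available
+ (e.g., from the TE database); all names are illustrative.
+
+      def lsp_srlg_set(link_srlgs, lsp_links):
+          # SRLG set of an LSP: union of the SRLGs of its links.
+          srlgs = set()
+          for link in lsp_links:
+              srlgs |= link_srlgs.get(link, set())
+          return srlgs
+
+      def srlg_disjoint(link_srlgs, lsp1, lsp2, srlg_set=None):
+          # Two LSPs are SRLG-disjoint with respect to a set S of
+          # SRLGs iff the SRLGs they cover in common are disjoint
+          # from S; with S = None, they must not cover any SRLG
+          # simultaneously.
+          common = (lsp_srlg_set(link_srlgs, lsp1)
+                    & lsp_srlg_set(link_srlgs, lsp2))
+          if srlg_set is None:
+              return not common
+          return not (common & srlg_set)
+
+      def mutually_srlg_disjoint(link_srlgs, working_lsps):
+          # Shared recovery: the working LSPs must be pairwise
+          # SRLG-disjoint to share a single recovery LSP.
+          sets = [lsp_srlg_set(link_srlgs, lsp)
+                  for lsp in working_lsps]
+          return all(not (sets[i] & sets[j])
+                     for i in range(len(sets))
+                     for j in range(i + 1, len(sets)))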
+
+8. Recovery Mechanisms Analysis
+
+ In order to provide a structured analysis of the recovery mechanisms
+ detailed in the previous sections, the following dimensions can be
+ considered:
+
+ 1. Fast convergence (performance): provide a mechanism that
+ aggregates multiple failures (implying fast failure detection and
+ correlation mechanisms) and fast recovery decision independently
+ of the number of failures occurring in the optical network (also
+ implying a fast failure notification).
+
+ 2. Efficiency (scalability): minimize the switching time required for
+ LSP/span recovery independently of the number of LSPs/spans being
+ recovered (this implies efficient failure correlation, fast
+ failure notification, and time-efficient recovery mechanisms).
+
+ 3. Robustness (availability): minimize the LSP/span downtime
+ independently of the underlying topology of the transport plane
+ (this implies a highly responsive recovery mechanism).
+
+ 4. Resource optimization (optimality): minimize the resource
+ capacity, including LSPs/spans and nodes (switching capacity),
+ required for recovery purposes; this dimension can also be
+ referred to as optimizing the sharing degree of the recovery
+ resources.
+
+ 5. Cost optimization: provide a cost-effective recovery type/scheme.
+
+ However, these dimensions are either outside the scope of this
+ document (such as cost optimization and recovery path computational
+ aspects) or mutually conflicting. For instance, it is obvious that
+ 1+1 LSP protection minimizes the LSP downtime (in case of failure)
+ while being non-scalable and consuming recovery resources without
+ enabling any extra traffic.
+
+ The following sections analyze the recovery phases and mechanisms
+ detailed in the previous sections with respect to the dimensions
+ described above in order to assess the GMPLS protocol suite
+ capabilities and applicability. In turn, this allows the evaluation
+ of the potential need for further GMPLS signaling and routing
+ extensions.
+
+8.1. Fast Convergence (Detection/Correlation and Hold-off Time)
+
+ Fast convergence is related to the failure management operations.
+ It refers to the time elapsed from failure detection/correlation,
+ through the hold-off time, until the recovery switching actions are
+ initiated. This point has been detailed in Section 4.
+
+8.2. Efficiency (Recovery Switching Time)
+
+ In general, the more pre-assignment/pre-planning of the recovery
+ LSP/span, the more rapid the recovery is. Because protection implies
+ pre-assignment (and cross-connection) of the protection resources, in
+ general, protection recovers faster than restoration.
+
+ Span restoration is likely to be slower than most span protection
+ types; however, this greatly depends on the efficiency of the span
+ restoration signaling. LSP restoration with pre-signaled and pre-
+ selected recovery resources is likely to be faster than fully dynamic
+ LSP restoration, especially because of the elimination of any
+ potential crankback during the recovery LSP establishment.
+
+ If one excludes the crankback issue, the difference between dynamic
+ and pre-planned restoration depends on the restoration path
+ computation and selection time. Since computational considerations
+ are outside the scope of this document, it is up to the vendor to
+ determine the average and maximum path computation time in different
+ scenarios and to the operator to decide whether or not dynamic
+ restoration is advantageous over pre-planned schemes, depending on
+ the network environment.
+ flexibility provided by pre-planned restoration versus dynamic
+ restoration. Pre-planned restoration implies a somewhat limited
+ number of failure scenarios (that can be due, for instance, to local
+ storage capacity limitation). Dynamic restoration enables on-demand
+ path computation based on the information received through failure
+ notification message, and as such, it is more robust with respect to
+ the failure scenario scope.
+
+ Moreover, LSP segment restoration, in particular, dynamic restoration
+ (i.e., no path pre-computation, so none of the recovery resources are
+ pre-reserved) will generally be faster than end-to-end LSP
+ restoration. However, local LSP restoration assumes that each LSP
+ segment end-point has enough computational capacity to perform this
+ operation while end-to-end LSP restoration requires only that LSP
+ end-points provide this path computation capability.
+
+ Recovery time objectives for SONET/SDH protection switching (not
+ including time to detect failure) are specified in [G.841] at 50 ms,
+ taking into account constraints on distance, number of connections
+ involved, and in the case of ring enhanced protection, number of
+ nodes in the ring. Recovery time objectives for restoration
+ mechanisms have been proposed through a separate effort [RFC3386].
+
+8.3. Robustness
+
+ In general, the less pre-assignment (protection)/pre-planning
+ (restoration) of the recovery LSP/span, the more robust the recovery
+ type or scheme is to a variety of single failures, provided that
+ adequate resources are available. Moreover, the pre-selection of the
+ recovery resources gives (in the case of multiple failure scenarios)
+ less flexibility than no recovery resource pre-selection. For
+ instance, if failures occur that affect two LSPs sharing a common
+ link along their restoration paths, then only one of these LSPs can
+ be recovered. This occurs unless the restoration path of at least
+ one of these LSPs is re-computed, or the local resource assignment is
+ modified on the fly.
+
+ In addition, recovery types and schemes with pre-planned recovery
+ resources (in particular, LSP/spans for protection and LSPs for
+ restoration purposes) will not be able to recover from failures that
+ simultaneously affect both the working and recovery LSP/span. Thus,
+ the recovery resources should ideally be as disjoint as possible
+ (with respect to link, node, and SRLG) from the working ones, so that
+ any single failure event will not affect both working and recovery
+ LSP/span. In brief, working and recovery resources must be fully
+ diverse in order to guarantee that a given failure will not affect
+ simultaneously the working and the recovery LSP/span. Also, the
+ risk of simultaneous failure of the working and the recovery LSPs
+ can be reduced by computing a new recovery path whenever a failure
+ occurs along one of the recovery LSPs, or by computing a new
+ recovery path and provisioning the corresponding LSP whenever a
+ failure occurs along a working LSP/span. Both methods enable the
+ network to keep the number of available recovery paths constant.
+
+ The robustness of a recovery scheme is also determined by the amount
+ of pre-reserved (i.e., signaled) recovery resources within a given
+ shared resource pool: as the sharing degree of recovery resources
+ increases, the recovery scheme becomes less robust to multiple
+ LSP/span failure occurrences. Recovery schemes, in particular
+ restoration, with pre-signaled resource reservation (with or without
+ pre-selection) should be capable of reserving an adequate amount of
+ resource to ensure recovery from any specific set of failure events,
+ such as any single SRLG failure, any two SRLG failures, etc.
+
+8.4. Resource Optimization
+
+ It is commonly admitted that sharing recovery resources provides
+ network resource optimization. Therefore, from a resource
+ utilization perspective, protection schemes are often classified with
+ respect to their degree of sharing recovery resources with the
+ working entities. Moreover, non-permanent bridging protection types
+ allow (under normal conditions) for extra-traffic over the recovery
+ resources.
+
+ From this perspective, the following statements are true:
+
+ 1) 1+1 LSP/Span protection is the most resource-consuming protection
+ type because it does not allow for any extra traffic.
+
+ 2) 1:1 LSP/span recovery requires dedicated recovery LSP/span
+ allowing for extra traffic.
+
+ 3) 1:N and M:N LSP/span recovery require 1 (and M, respectively)
+ recovery LSP/span (shared between the N working LSP/span) allowing
+ for extra traffic.
+
+ Obviously, neither 1+1 protection nor 1:1 recovery allows for any
+ recovery LSP/span sharing, whereas 1:N and M:N recovery do
+ allow sharing of 1 (M, respectively) recovery LSP/spans between N
+ working LSP/spans. However, despite the fact that 1:1 LSP recovery
+ precludes the sharing of the recovery LSP, the recovery schemes that
+ can be built from it (e.g., (1:1)^n, see Section 5.4) do allow
+ sharing of its recovery resources. In addition, the flexibility in
+ the usage of shared recovery resources (in particular, shared links)
+ may be limited because of network topology restrictions, e.g., fixed
+ ring topology for traditional enhanced protection schemes.
+
+ On the other hand, when using LSP restoration with pre-signaled
+ resource reservation, the amount of reserved restoration capacity is
+ determined by the local bandwidth reservation policies. In LSP
+ restoration schemes with re-provisioning, a pool of spare resources
+ can be defined from which all resources are selected after failure
+ occurrence for the purpose of restoration path computation. The
+ degree to which restoration schemes allow sharing amongst multiple
+ independent failures is then directly inferred from the size of the
+ resource pool. Moreover, in all restoration schemes, spare resources
+ can be used to carry preemptible traffic (thus over preemptible
+ LSP/span) when the corresponding resources have not been committed
+ for LSP/span recovery purposes.
+
+ From this, it clearly follows that less recovery resources (i.e.,
+ LSP/spans and switching capacity) have to be allocated to a shared
+ recovery resource pool if a greater sharing degree is allowed. Thus,
+ the network survivability level is determined by the policy that
+ defines the amount of shared recovery resources and by the maximum
+ sharing degree allowed for these recovery resources.
+
+8.4.1. Recovery Resource Sharing
+
+ When recovery resources are shared over several LSP/Spans, the use of
+ the Maximum Reservable Bandwidth, the Unreserved Bandwidth, and the
+ Maximum LSP Bandwidth (see [RFC4202]) provides the information needed
+ to obtain the optimization of the network resources allocated for
+ shared recovery purposes.
+
+ The Maximum Reservable Bandwidth is defined as the Maximum Link
+ Bandwidth, but it may be greater in the case of link
+ over-subscription.
+
+ The Unreserved Bandwidth (at priority p) is defined as the bandwidth
+ not yet reserved on a given TE link (its initial value for each
+ priority p corresponds to the Maximum Reservable Bandwidth). Last,
+ the Maximum LSP Bandwidth (at priority p) is defined as the smaller
+ of Unreserved Bandwidth (at priority p) and Maximum Link Bandwidth.
+
+ Here, one generally considers a recovery resource sharing degree (or
+ ratio) to globally optimize the shared recovery resource usage. The
+ distribution of the bandwidth utilization per TE link can be inferred
+ from the per-priority bandwidth pre-allocation. By using the Maximum
+ LSP Bandwidth and the Maximum Reservable Bandwidth, the amount of
+ (over-provisioned) resources that can be used for shared recovery
+ purposes is known from the IGP.
+
+ In order to analyze this behavior, we define the difference between
+ the Maximum Reservable Bandwidth (in the present case, this value is
+ greater than the Maximum Link Bandwidth) and the Maximum LSP
+ Bandwidth per TE link i as the Maximum Shareable Bandwidth or
+ max_R[i]. Within this quantity, the amount of bandwidth currently
+ allocated for shared recovery per TE link i is defined as R[i]. Both
+ quantities are expressed in terms of discrete bandwidth units (and
+ thus, the Minimum LSP Bandwidth is one bandwidth unit).
+
+ The knowledge of this information available per TE link can be
+ exploited in order to optimize the usage of the resources allocated
+ per TE link for shared recovery. If one refers to r[i] as the actual
+ bandwidth per TE link i (in terms of discrete bandwidth units)
+ committed for shared recovery, then the following quantity must be
+ maximized over the potential TE link candidates:
+
+ sum {i=1}^N [(R[i] - r[i]) / (t[i] - b[i])]
+
+ or equivalently: sum {i=1}^N [(R[i] - r[i]) / r[i]]
+
+ with R[i] >= 1 and r[i] >= 1 (in terms of per-component
+ bandwidth unit)
+
+ In this formula, N is the total number of links traversed by a
+ given LSP, t[i] the Maximum Link Bandwidth per TE link i, and b[i]
+ the sum per TE link i of the bandwidth committed for working LSPs
+ and other recovery LSPs (thus excluding "shared bandwidth" LSPs).
+ The quantity [(R[i] - r[i])/r[i]] is defined as the Shared
+ (Recovery) Bandwidth Ratio per TE link i. In addition, TE links
+ for which R[i] reaches max_R[i] or for which r[i] = 0 are pruned
+ during shared recovery path computation, as are TE links for which
+ max_R[i] = r[i], which simply cannot be shared.
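+
+ A sketch of how this quantity could be evaluated for a candidate
+ path is given below. The data layout and function name are
+ assumptions; the inputs (R, r, max_R per TE link) are taken to be
+ available from the IGP as described above.
+
+      def shared_recovery_metric(links):
+          # links: list of dicts with keys "R", "r", "max_R", all
+          # in discrete bandwidth units, one dict per TE link.
+          total = 0.0
+          for link in links:
+              R, r, max_R = link["R"], link["r"], link["max_R"]
+              # Prune TE links that cannot be shared: R reached
+              # max_R, r = 0, or max_R = r.
+              if R >= max_R or r == 0 or max_R == r:
+                  return None   # candidate rejected
+              total += (R - r) / r
+          # Higher values indicate better shared-recovery usage.
+          return total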
+
+ More generally, one can draw the following mapping between the
+ available bandwidth at the transport and control plane level:
+
+ - ---------- Max Reservable Bandwidth
+ | ----- ^
+ |R ----- |
+ | ----- |
+ - ----- |max_R
+ ----- |
+ -------- TE link Capacity - ------ | - Maximum TE Link Bandwidth
+ ----- |r ----- v
+ ----- <------ b ------> - ---------- Maximum LSP Bandwidth
+ ----- -----
+ ----- -----
+ ----- -----
+ ----- -----
+ ----- ----- <--- Minimum LSP Bandwidth
+ -------- 0 ---------- 0
+
+ Note that the above approach does not require the flooding of any per
+ LSP information or any detailed distribution of the bandwidth
+ allocation per component link or individual ports or even any per-
+ priority shareable recovery bandwidth information (using a dedicated
+ sub-TLV). The latter would provide the same capability as the
+ already defined Maximum LSP bandwidth per-priority information. This
+ approach is referred to as Partial (or Aggregated) Information
+ Routing, as described in [KODIALAM1] and [KODIALAM2]. These works
+ show that
+ the difference obtained with a Full (or Complete) Information Routing
+ approach (where for the whole set of working and recovery LSPs, the
+ amount of bandwidth units they use per-link is known at each node and
+ for each link) is clearly negligible. The Full Information Routing
+ approach is detailed in [GLI]. Note also that both approaches rely
+ on the deterministic knowledge (at different degrees) of the network
+ topology and resource usage status.
+
+ Moreover, the Partial Information Routing approach can be enhanced
+ by extending the GMPLS signaling capabilities to allow working-
+ LSP-related information, in particular its path (including link and
+ node identifiers), to be exchanged with the recovery LSP request.
+ This enables more efficient admission control at upstream nodes of
+ shared recovery resources, in particular, links (see Section
+ 8.4.3).
+
+8.4.2. Recovery Resource Sharing and SRLG Recovery
+
+ Resource shareability can also be maximized with respect to the
+ number of times each SRLG is protected by a recovery resource (in
+ particular, a shared TE link) and methods can be considered for
+ avoiding contention of the shared recovery resources in case of
+ single SRLG failure. These methods enable the sharing of recovery
+ resources between two (or more) recovery LSPs, if their respective
+ working LSPs are mutually disjoint with respect to link, node, and
+ SRLGs. Then, a single failure does not simultaneously disrupt
+ several (or at least two) working LSPs.
+
+ For instance, [BOUILLET] shows that the Partial Information Routing
+ approach can be extended to cover recovery resource shareability with
+ respect to SRLG recoverability (i.e., the number of times each SRLG
+ is recoverable). By flooding this aggregated information per TE
+ link, path computation and selection of SRLG-diverse recovery LSPs
+ can be optimized with respect to the sharing of recovery resource
+ reserved on each TE link. This yields a performance difference of
+ less than 5%, which is negligible compared to the corresponding Full
+ Information Flooding approach (see [GLI]).
+
+ For this purpose, additional extensions to [RFC4202] in support of
+ path computation for shared mesh recovery have often been considered
+ in the literature. TE link attributes would include, among others,
+ the current number of recovery LSPs sharing the recovery resources
+ reserved on the TE link, and the current number of SRLGs recoverable
+ by this amount of (shared) recovery resources reserved on the TE
+ link. The latter is equivalent to the current number of SRLGs that
+ will be recovered by the recovery LSPs sharing the recovery resource
+ reserved on the TE link. Then, if explicit SRLG recoverability is
+ considered, a TE link attribute would be added that includes the
+ explicit list of SRLGs (recoverable by the shared recovery resource
+ reserved on the TE link) and their respective shareable recovery
+ bandwidths. The latter information is equivalent to the shareable
+ recovery bandwidth per SRLG (or per group of SRLGs), which implies
+ that the amount of shareable bandwidth and the number of listed SRLGs
+ will decrease over time.
+
+ Compared to the case of recovery resource sharing only (regardless of
+ SRLG recoverability, as described in Section 8.4.1), these additional
+ TE link attributes would potentially deliver better path computation
+ and selection (at a distinct ingress node) for shared mesh recovery
+ purposes. However, due to the lack of evidence of better efficiency
+ and due to the complexity that such extensions would generate, they
+ are not further considered in the scope of the present analysis. For
+ instance, a per-SRLG group minimum/maximum shareable recovery
+ bandwidth is restricted by the length that the corresponding (sub-)
+ TLV may take and thus the number of SRLGs that it can include.
+ Therefore, the corresponding parameter should not be translated into
+ GMPLS routing (or even signaling) protocol extensions in the form of
+ TE link sub-TLV.
+
+8.4.3. Recovery Resource Sharing, SRLG Disjointness and Admission
+ Control
+
+ Admission control is a strict requirement to be fulfilled by nodes
+ giving access to shared links. This can be illustrated using the
+ following network topology:
+
+ A ------ C ====== D
+ | | |
+ | | |
+ | B |
+ | | |
+ | | |
+ ------- E ------ F
+
+ Node A creates a working LSP to D (A-C-D); B simultaneously creates
+ a working LSP to D (B-C-D) and a recovery LSP (B-E-F-D) to the same
+ destination. Then, A decides to create a recovery LSP to D (A-E-F-
+ D), but since the C-D span carries both working LSPs, node E should
+ either assign a dedicated resource for this recovery LSP or reject
+ the request if the C-D span has already reached its maximum
+ recovery bandwidth sharing ratio. In the latter case, a C-D span
+ failure would imply that one of the working LSPs would not be
+ recoverable.
+
+ Consequently, node E must have the required information to perform
+ admission control for the recovery LSP requests it processes
+ (implying for instance, that the path followed by the working LSP is
+ carried with the corresponding recovery LSP request). If node E can
+ guarantee that the working LSPs (A-C-D and B-C-D) are SRLG disjoint
+ over the C-D span, it may securely accept the incoming recovery LSP
+ request and assign to the recovery LSPs (A-E-F-D and B-E-F-D) the
+ same resources on the link E-F. This may occur if the link E-F has
+ not yet reached its maximum recovery bandwidth sharing ratio. In
+ this example, one assumes that the node failure probability is
+ negligible compared to the link failure probability.
+
+ To achieve this, the path followed by the working LSP is transported
+ with the recovery LSP request and examined at each upstream node of
+ potentially shareable links. Admission control is performed using
+ the interface identifiers (included in the path) to retrieve from
+ the TE database the list of SRLG IDs associated with each of the
+ working LSP links. If the working LSPs (A-C-D and B-C-D) have one
+ or more links or SRLG IDs in common (in this example, one or more
+ SRLG IDs in common over the span C-D), node E should not assign the
+ same resource over link E-F to the recovery LSPs (A-E-F-D and
+ B-E-F-D). Otherwise, one of these working LSPs would not be
+ recoverable if a C-D span failure occurred.
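+
+ A minimal sketch of this admission control check at node E is shown
+ below, assuming the TE database is available as a mapping from each
+ link to its set of SRLG IDs (all names illustrative).
+
+      def accept_shared_recovery(te_db, sharing_working_paths,
+                                 new_working_path):
+          # Accept sharing of the recovery resource only if the new
+          # working path is link- and SRLG-disjoint from every
+          # working path whose recovery LSP already shares it.
+          def srlgs(path):
+              s = set()
+              for link in path:
+                  s |= te_db.get(link, set())
+              return s
+
+          new_links = set(new_working_path)
+          new_srlgs = srlgs(new_working_path)
+          for path in sharing_working_paths:
+              # A common link or SRLG ID means a single failure
+              # could disrupt both working LSPs: do not share.
+              if new_links & set(path) or new_srlgs & srlgs(path):
+                  return False
+          return True
+
+ In the example above, the working paths A-C-D and B-C-D share the
+ C-D span, so the check fails and node E must assign a dedicated
+ resource or reject the recovery LSP request.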
+
+ There are some issues related to this method; the major one is the
+ number of SRLG IDs that a single link can cover (more than 100, in
+ complex environments). Moreover, when using link bundles, this
+ approach may cause some recovery LSP requests to be rejected.
+ This occurs when the SRLG sub-TLV corresponding to a link bundle
+ includes the union of the SRLG id list of all the component links
+ belonging to this bundle (see [RFC4202] and [RFC4201]).
+
+ In order to overcome this specific issue, an additional mechanism may
+ consist of querying the nodes where the information would be
+ available (in this case, node E would query C). The main drawback of
+ this method is that (in addition to the dedicated mechanism(s) it
+ requires) it may become complex when several common nodes are
+ traversed by the working LSPs. Therefore, when using link bundles,
+ solving this issue is closely related to the sequence of the recovery
+ operations. Per-component flooding of SRLG identifiers would deeply
+ impact the scalability of the link state routing protocol.
+ Therefore, one may rely on the usage of an on-line accessible network
+ management system.
+
+
+9. Summary and Conclusions
+
+ The following table summarizes the different recovery types and
+ schemes analyzed throughout this document.
+
+ --------------------------------------------------------------------
+ | Path Search (computation and selection)
+ --------------------------------------------------------------------
+ | Pre-planned (a) | Dynamic (b)
+ --------------------------------------------------------------------
+ | | faster recovery | Does not apply
+ | | less flexible |
+ | 1 | less robust |
+ | | most resource-consuming |
+ Path | | |
+ Setup ------------------------------------------------------------
+ | | relatively fast recovery | Does not apply
+ | | relatively flexible |
+ | 2 | relatively robust |
+ | | resource consumption |
+ | | depends on sharing degree |
+ ------------------------------------------------------------
+ | | relatively fast recovery | slower (computation)
+ | | more flexible | most flexible
+ | 3 | relatively robust | most robust
+ | | less resource-consuming | least resource-consuming
+ | | depends on sharing degree |
+ --------------------------------------------------------------------
+
+ 1a. Recovery LSP setup (before failure occurrence) with resource
+ reservation (i.e., signaling) and selection is referred to as LSP
+ protection.
+
+ 2a. Recovery LSP setup (before failure occurrence) with resource
+ reservation (i.e., signaling) and with resource pre-selection is
+ referred to as pre-planned LSP re-routing with resource pre-
+ selection. This implies only recovery LSP activation after
+ failure occurrence.
+
+ 3a. Recovery LSP setup (before failure occurrence) with resource
+ reservation (i.e., signaling) and without resource selection is
+ referred to as pre-planned LSP re-routing without resource pre-
+ selection. This implies recovery LSP activation and resource
+ (i.e., label) selection after failure occurrence.
+
+ 3b. Recovery LSP setup after failure occurrence is referred to as
+ LSP re-routing, which is full when recovery LSP path
+ computation occurs after failure occurrence.
+
+ Thus, the term pre-planned refers to recovery LSP path pre-
+ computation, signaling (reservation), and a priori resource selection
+ (optional), but not cross-connection. Also, the shared-mesh recovery
+ scheme can be viewed as a particular case of 2a) and 3a), using the
+ additional constraint described in Section 8.4.3.
+
+ The implementation of these recovery mechanisms requires only
+ considering extensions to GMPLS signaling protocols (i.e., [RFC3471]
+ and [RFC3473]). These GMPLS signaling extensions should mainly focus
+ on delivering (1) recovery LSP pre-provisioning for the cases 1a, 2a,
+ and 3a, (2) LSP failure notification, (3) recovery LSP switching
+ action(s), and (4) reversion mechanisms.
+
+ Moreover, the present analysis (see Section 8) shows that no GMPLS
+ routing extensions are expected to efficiently implement any of these
+ recovery types and schemes.
+
+10. Security Considerations
+
+ This document does not introduce any additional security issue or
+ imply any specific security consideration beyond those of [RFC3945]
+ for the current RSVP-TE GMPLS signaling, routing protocols (OSPF-TE,
+ IS-IS-TE), or network management protocols.
+
+ However, the authorization of requests for resources by GMPLS-capable
+ nodes should determine whether a given party, presumably already
+ authenticated, has a right to access the requested resources. This
+ determination is typically a matter of local policy control, for
+ example, by setting limits on the total bandwidth made available to
+ some party in the presence of resource contention. Such policies may
+ become quite complex as the number of users, types of resources, and
+ sophistication of authorization rules increases. This is
+ particularly the case for recovery schemes that assume pre-planned
+ sharing of recovery resources, or contention for resources in case of
+ dynamic re-routing.
+
+ Therefore, control elements should match the requests against the
+ local authorization policy. These control elements must be capable
+ of making decisions based on the identity of the requester, as
+ verified cryptographically and/or topologically.
+
+11. Acknowledgements
+
+ The authors would like to thank Fabrice Poppe (Alcatel) and Bart
+ Rousseau (Alcatel) for their revision effort, and Richard Rabbat
+ (Fujitsu Labs), David Griffith (NIST), and Lyndon Ong (Ciena) for
+ their useful comments.
+
+ Thanks also to Adrian Farrel for the thorough review of the document.
+
+12. References
+
+12.1. Normative References
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+ [RFC3471] Berger, L., "Generalized Multi-Protocol Label Switching
+ (GMPLS) Signaling Functional Description", RFC 3471,
+ January 2003.
+
+ [RFC3473] Berger, L., "Generalized Multi-Protocol Label Switching
+ (GMPLS) Signaling Resource ReserVation Protocol-Traffic
+ Engineering (RSVP-TE) Extensions", RFC 3473, January
+ 2003.
+
+ [RFC3945] Mannie, E., "Generalized Multi-Protocol Label Switching
+ (GMPLS) Architecture", RFC 3945, October 2004.
+
+ [RFC4201] Kompella, K., Rekhter, Y., and L. Berger, "Link Bundling
+ in MPLS Traffic Engineering (TE)", RFC 4201, October
+ 2005.
+
+ [RFC4202] Kompella, K., Ed. and Y. Rekhter, Ed., "Routing
+ Extensions in Support of Generalized Multi-Protocol
+ Label Switching (GMPLS)", RFC 4202, October 2005.
+
+ [RFC4204] Lang, J., Ed., "Link Management Protocol (LMP)", RFC
+ 4204, October 2005.
+
+ [RFC4209] Fredette, A., Ed. and J. Lang, Ed., "Link Management
+ Protocol (LMP) for Dense Wavelength Division
+ Multiplexing (DWDM) Optical Line Systems", RFC 4209,
+ October 2005.
+
+ [RFC4427] Mannie E., Ed. and D. Papadimitriou, Ed., "Recovery
+ (Protection and Restoration) Terminology for Generalized
+ Multi-Protocol Label Switching (GMPLS)", RFC 4427, March
+ 2006.
+
+12.2. Informative References
+
+ [BOUILLET] E. Bouillet, et al., "Stochastic Approaches to Compute
+ Shared Meshed Restored Lightpaths in Optical Network
+ Architectures," IEEE Infocom 2002, New York City, June
+ 2002.
+
+ [DEMEESTER] P. Demeester, et al., "Resilience in Multilayer
+ Networks," IEEE Communications Magazine, Vol. 37, No. 8,
+ pp. 70-76, August 1998.
+
+ [GLI] G. Li, et al., "Efficient Distributed Path Selection for
+ Shared Restoration Connections," IEEE Infocom 2002, New
+ York City, June 2002.
+
+ [IPO-IMP] Strand, J. and A. Chiu, "Impairments and Other
+ Constraints on Optical Layer Routing", RFC 4054, May
+ 2005.
+
+ [KODIALAM1] M. Kodialam and T.V. Lakshman, "Restorable Dynamic
+ Quality of Service Routing," IEEE Communications
+ Magazine, pp. 72-81, June 2002.
+
+ [KODIALAM2] M. Kodialam and T.V. Lakshman, "Dynamic Routing of
+ Restorable Bandwidth-Guaranteed Tunnels using Aggregated
+ Network Resource Usage Information," IEEE/ACM
+ Transactions on Networking, pp. 399-410, June 2003.
+
+ [MANCHESTER] J. Manchester, P. Bonenfant and C. Newton, "The
+ Evolution of Transport Network Survivability," IEEE
+ Communications Magazine, August 1999.
+
+ [RFC3386] Lai, W. and D. McDysan, "Network Hierarchy and
+ Multilayer Survivability", RFC 3386, November 2002.
+
+ [T1.105] ANSI, "Synchronous Optical Network (SONET): Basic
+ Description Including Multiplex Structure, Rates, and
+ Formats," ANSI T1.105, January 2001.
+
+ [WANG] J. Wang, L. Sahasrabuddhe, and B. Mukherjee, "Path vs.
+ Subpath vs. Link Restoration for Fault Management in
+ IP-over-WDM Networks: Performance Comparisons Using
+ GMPLS Control Signaling," IEEE Communications Magazine,
+ pp. 80-87, November 2002.
+
+ For information on the availability of the following documents,
+ please see http://www.itu.int
+
+ [G.707] ITU-T, "Network Node Interface for the Synchronous
+ Digital Hierarchy (SDH)," Recommendation G.707, October
+ 2000.
+
+ [G.709] ITU-T, "Network Node Interface for the Optical Transport
+ Network (OTN)," Recommendation G.709, February 2001 (and
+ Amendment no.1, October 2001).
+
+ [G.783] ITU-T, "Characteristics of Synchronous Digital Hierarchy
+ (SDH) Equipment Functional Blocks," Recommendation
+ G.783, October 2000.
+
+ [G.798] ITU-T, "Characteristics of optical transport network
+ hierarchy equipment functional block," Recommendation
+ G.798, June 2004.
+
+ [G.806] ITU-T, "Characteristics of Transport Equipment -
+ Description Methodology and Generic Functionality",
+ Recommendation G.806, October 2000.
+
+ [G.841] ITU-T, "Types and Characteristics of SDH Network
+ Protection Architectures," Recommendation G.841, October
+ 1998.
+
+ [G.842] ITU-T, "Interworking of SDH network protection
+ architectures," Recommendation G.842, October 1998.
+
+ [G.874] ITU-T, "Management aspects of the optical transport
+ network element," Recommendation G.874, November 2001.
+
+Editors' Addresses
+
+ Dimitri Papadimitriou
+ Alcatel
+ Francis Wellesplein, 1
+ B-2018 Antwerpen, Belgium
+
+ Phone: +32 3 240-8491
+ EMail: dimitri.papadimitriou@alcatel.be
+
+
+ Eric Mannie
+ Perceval
+ Rue Tenbosch, 9
+ 1000 Brussels
+ Belgium
+
+ Phone: +32-2-6409194
+ EMail: eric.mannie@perceval.net
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2006).
+
+ This document is subject to the rights, licenses and restrictions
+ contained in BCP 78, and except as set forth therein, the authors
+ retain all their rights.
+
+ This document and the information contained herein are provided on an
+ "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+ OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+ ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+ INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+ INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+ WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+ The IETF takes no position regarding the validity or scope of any
+ Intellectual Property Rights or other rights that might be claimed to
+ pertain to the implementation or use of the technology described in
+ this document or the extent to which any license under such rights
+ might or might not be available; nor does it represent that it has
+ made any independent effort to identify any such rights. Information
+ on the procedures with respect to rights in RFC documents can be
+ found in BCP 78 and BCP 79.
+
+ Copies of IPR disclosures made to the IETF Secretariat and any
+ assurances of licenses to be made available, or the result of an
+ attempt made to obtain a general license or permission for the use of
+ such proprietary rights by implementers or users of this
+ specification can be obtained from the IETF on-line IPR repository at
+ http://www.ietf.org/ipr.
+
+ The IETF invites any interested party to bring to its attention any
+ copyrights, patents or patent applications, or other proprietary
+ rights that may cover technology that may be required to implement
+ this standard. Please address the information to the IETF at
+ ietf-ipr@ietf.org.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is provided by the IETF
+ Administrative Support Activity (IASA).
+