Diffstat (limited to 'doc/rfc/rfc3386.txt')
-rw-r--r-- doc/rfc/rfc3386.txt 1515
1 files changed, 1515 insertions, 0 deletions
diff --git a/doc/rfc/rfc3386.txt b/doc/rfc/rfc3386.txt
new file mode 100644
index 0000000..e39ea01
--- /dev/null
+++ b/doc/rfc/rfc3386.txt
@@ -0,0 +1,1515 @@
+
+
+
+
+
+
+Network Working Group W. Lai, Ed.
+Request for Comments: 3386 AT&T
+Category: Informational D. McDysan, Ed.
+ WorldCom
+ November 2002
+
+
+ Network Hierarchy and Multilayer Survivability
+
+Status of this Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2002). All Rights Reserved.
+
+Abstract
+
+ This document presents a proposal of the near-term and practical
+ requirements for network survivability and hierarchy in current
+ service provider environments.
+
+Conventions used in this document
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in BCP 14, RFC 2119 [2].
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Lai, et. al. Informational [Page 1]
+
+RFC 3386 Hierarchy & Multilayer Survivability November 2002
+
+
+Table of Contents
+
+ 1. Introduction..............................................2
+ 2. Terminology and Concepts..................................5
+ 2.1 Hierarchy................................................5
+ 2.1.1 Vertical Hierarchy.....................................5
+ 2.1.2 Horizontal Hierarchy...................................6
+ 2.2 Survivability Terminology................................6
+ 2.2.1 Survivability..........................................7
+ 2.2.2 Generic Operations.....................................7
+ 2.2.3 Survivability Techniques...............................8
+ 2.2.4 Survivability Performance..............................9
+ 2.3 Survivability Mechanisms: Comparison....................10
+ 3. Survivability............................................11
+ 3.1 Scope...................................................11
+ 3.2 Required initial set of survivability mechanisms........12
+ 3.2.1 1:1 Path Protection with Pre-Established Capacity.....12
+ 3.2.2 1:1 Path Protection with Pre-Planned Capacity.........13
+ 3.2.3 Local Restoration.....................................13
+ 3.2.4 Path Restoration......................................14
+ 3.3 Applications Supported..................................14
+ 3.4 Timing Bounds for Survivability Mechanisms..............15
+ 3.5 Coordination Among Layers...............................16
+ 3.6 Evolution Toward IP Over Optical........................17
+ 4. Hierarchy Requirements...................................17
+ 4.1 Historical Context......................................17
+ 4.2 Applications for Horizontal Hierarchy...................18
+ 4.3 Horizontal Hierarchy Requirements.......................19
+ 5. Survivability and Hierarchy..............................19
+ 6. Security Considerations..................................20
+ 7. References...............................................21
+ 8. Acknowledgments..........................................22
+ 9. Contributing Authors.....................................22
+ Appendix A: Questions used to help develop requirements.....23
+ Editors' Addresses..........................................26
+ Full Copyright Statement....................................27
+
+1. Introduction
+
+ This document is the result of the Network Hierarchy and
+ Survivability Techniques Design Team established within the Traffic
+ Engineering Working Group. This team collected and documented
+ current and near term requirements for survivability and hierarchy in
+ service provider environments. For clarity, an expanded set of
+ definitions is included. The team determined that there appears to
+ be a need to define a small set of interoperable survivability
+ approaches in packet and non-packet networks. Suggested approaches
+ include path-based as well as one that repairs connections in
+ proximity to the network fault. They operate primarily at a single
+ network layer. For hierarchy, there did not appear to be a driving
+ near-term need for work on "vertical hierarchy," defined as
+ communication between network layers such as Time Division
+ Multiplexed (TDM)/optical and Multi-Protocol Label Switching (MPLS).
+ In particular, instead of direct exchange of signaling and routing
+ between vertical layers, some looser form of coordination and
+ communication, such as the specification of hold-off timers, is a
+ nearer term need. For "horizontal hierarchy" in data networks, there
+ are several pressing needs. The requirement is to be able to set up
+ many Label Switched Paths (LSPs) in a service provider network with
+ hierarchical Interior Gateway Protocol (IGP). This is necessary to
+ support layer 2 and layer 3 Virtual Private Network (VPN) services
+ that require edge-to-edge signaling across a core network.
+
+ This document presents a proposal of the near-term and practical
+ requirements for network survivability and hierarchy in current
+ service provider environments. With feedback from the working group
+ solicited, the objective is to help focus the work that is being
+ addressed in the TEWG (Traffic Engineering Working Group), CCAMP
+ (Common Control and Measurement Plane Working Group), and other
+ working groups. A main goal of this work is to provide some
+ expedience for required functionality in multi-vendor service
+ provider networks. The initial focus is primarily on intra-domain
+ operations. However, to maintain consistency in the provision of
+ end-to-end service in a multi-provider environment, rules governing
+ the operations of survivability mechanisms at domain boundaries must
+ also be specified. While such issues are raised and discussed, where
+ appropriate, they will not be treated in depth in the initial release
+ of this document.
+
+ The document first develops a set of definitions to be used later in
+ this document and potentially in other documents as well. It then
+ addresses the requirements and issues associated with service
+ restoration and hierarchy, and closes with a short discussion of
+ survivability in a hierarchical context.
+
+ Here is a summary of the findings:
+
+ A. Survivability Requirements
+
+ o need to define a small set of interoperable survivability
+ approaches in packet and non-packet networks
+ o suggested survivability mechanisms include
+ - 1:1 path protection with pre-established backup capacity (non-
+ shared)
+ - 1:1 path protection with pre-planned backup capacity (shared)
+ - local restoration with repairs in proximity to the network
+ fault
+ - path restoration through source-based rerouting
+ o timing bounds for service restoration to support voice call cutoff
+ (140 msec to 2 sec), protocol timer requirements in premium data
+ services, and mission critical applications
+ o use of restoration priority for service differentiation
+
+ B. Hierarchy Requirements
+
+ B.1. Horizontally Oriented Hierarchy (Intra-Domain)
+
+ o ability to set up many LSPs in a service provider network with
+ hierarchical IGP, for the support of layer 2 and layer 3 VPN
+ services
+ o requirements for multi-area traffic engineering need to be
+ developed to provide guidance for any necessary protocol
+ extensions
+
+ B.2. Vertically Oriented Hierarchy
+
+ The following functionality for survivability is common on most
+ routing equipment today.
+
+ o near-term need is some loose form of coordination and
+ communication based on the use of nested hold-off timers, instead
+ of direct exchange of signaling and routing between vertical
+ layers
+ o means for an upper layer to immediately begin recovery actions in
+ the event that a lower layer is not configured to perform recovery
+
+ C. Survivability Requirements in Horizontal Hierarchy
+
+ o protection of an end-to-end connection is based on a concatenated
+ set of connections, each protected within its own area
+ o mechanisms for connection routing may include (1) a network
+ element that participates on both sides of a boundary (e.g., OSPF
+ ABR) - note that this is a common point of failure; (2) a route
+ server
+ o need for inter-area signaling of survivability information (1) to
+ enable a "least common denominator" survivability mechanism at the
+ boundary; (2) to convey the success or failure of the service
+ restoration action; e.g., if a part of a "connection" is down on
+ one side of a boundary, there is no need for the other side to
+ recover from failures
+
+2. Terminology and Concepts
+
+2.1 Hierarchy
+
+ Hierarchy is a technique used to build scalable complex systems. It
+ is based on an abstraction, at each level, of what is most
+ significant from the details and internal structures of the levels
+ further away. This approach makes use of a general property of all
+ hierarchical systems composed of related subsystems that interactions
+ between subsystems decrease as the level of communication between
+ subsystems decreases.
+
+ Network hierarchy is an abstraction of part of a network's topology,
+ routing and signaling mechanisms. Abstraction may be used as a
+ mechanism to build large networks or as a technique for enforcing
+ administrative, topological, or geographic boundaries. For example,
+ network hierarchy might be used to separate the metropolitan and
+ long-haul regions of a network, or to separate the regional and
+ backbone sections of a network, or to interconnect service provider
+ networks (with BGP which reduces a network to an Autonomous System).
+
+ In this document, network hierarchy is considered from two
+ perspectives:
+
+ (1) Vertically oriented: between two network technology layers.
+ (2) Horizontally oriented: between two areas or administrative
+ subdivisions within the same network technology layer.
+
+2.1.1 Vertical Hierarchy
+
+ Vertical hierarchy is the abstraction, or reduction in information,
+ which would be of benefit when communicating information across
+ network technology layers, as in propagating information between
+ optical and router networks.
+
+ In the vertical hierarchy, the total network functions are
+ partitioned into a series of functional or technological layers with
+ clear logical, and maybe even physical, separation between adjacent
+ layers. Survivability mechanisms either currently exist or are being
+ developed at multiple layers in networks [3]. The optical layer is
+ now becoming capable of providing dynamic ring and mesh restoration
+ functionality, in addition to traditional 1+1 or 1:1 protection. The
+ Synchronous Digital Hierarchy (SDH)/Synchronous Optical NETwork
+ (SONET) layer provides survivability capability with automatic
+ protection switching (APS), as well as self-healing ring and mesh
+ restoration architectures. Similar functionality has been defined in
+ the Asynchronous Transfer Mode (ATM) Layer, with work ongoing to also
+ provide such functionality using MPLS [4]. At the IP layer,
+ rerouting is used to restore service continuity following link and
+ node outages. Rerouting at the IP layer, however, occurs after a
+ period of routing convergence, which may require a few seconds to
+ several minutes to complete [5].
+
+2.1.2 Horizontal Hierarchy
+
+ Horizontal hierarchy is the abstraction that allows a network at one
+ technology layer, for instance a packet network, to scale. Examples
+ of horizontal hierarchy include BGP confederations, separate
+ Autonomous Systems, and multi-area OSPF.
+
+ In the horizontal hierarchy, a large network is partitioned into
+ multiple smaller, non-overlapping sub-networks. The partitioning
+ criteria can be based on topology, network function, administrative
+ policy, or service domain demarcation. Two networks at the *same*
+ hierarchical level, e.g., two Autonomous Systems in BGP, may share a
+ peer relation with each other through some loose form of coupling.
+ On the other hand, for routing in large networks using multi-area
+ OSPF, abstraction through the aggregation of routing information is
+ achieved through a hierarchical partitioning of the network.
+
+2.2 Survivability Terminology
+
+ In alphabetical order, the following terms are defined in this
+ section:
+
+ backup entity, same as protection entity (section 2.2.2)
+ extra traffic (section 2.2.2)
+ non-revertive mode (section 2.2.2)
+ normalization (section 2.2.2)
+ preemptable traffic, same as extra traffic (section 2.2.2)
+ preemption priority (section 2.2.4)
+ protection (section 2.2.3)
+ protection entity (section 2.2.2)
+ protection switching (section 2.2.3)
+ protection switch time (section 2.2.4)
+ recovery (section 2.2.2)
+ recovery by rerouting, same as restoration (section 2.2.3)
+ recovery entity, same as protection entity (section 2.2.2)
+ restoration (section 2.2.3)
+ restoration priority (section 2.2.4)
+ restoration time (section 2.2.4)
+ revertive mode (section 2.2.2)
+ shared risk group (SRG) (section 2.2.2)
+ survivability (section 2.2.1)
+ working entity (section 2.2.2)
+
+2.2.1 Survivability
+
+ Survivability is the capability of a network to maintain service
+ continuity in the presence of faults within the network [6].
+ Survivability mechanisms such as protection and restoration are
+ implemented either on a per-link basis, on a per-path basis, or
+ throughout an entire network to alleviate service disruption at
+ affordable costs. The degree of survivability is determined by the
+ network's capability to survive single failures, multiple failures,
+ and equipment failures.
+
+2.2.2 Generic Operations
+
+ This document does not discuss the sequence of events of how network
+ failures are monitored, detected, and mitigated. For more detail on
+ this aspect, see [4]. Also, the repair process following a failure
+ is out of scope here.
+
+ A working entity is the entity that is used to carry traffic in
+ normal operation mode. Depending upon the context, an entity can be
+ a channel or a transmission link in the physical layer, a Label
+ Switched Path (LSP) in MPLS, or a logical bundle of one or more LSPs.
+
+ A protection entity, also called backup entity or recovery entity, is
+ the entity that is used to carry protected traffic in recovery
+ operation mode, i.e., when the working entity is in error or has
+ failed.
+
+ Extra traffic, also referred to as preemptable traffic, is the
+ traffic carried over the protection entity while the working entity
+ is active. Extra traffic is not protected, i.e., when the protection
+ entity is required to protect the traffic that is being carried over
+ the working entity, the extra traffic is preempted.
+
+ A shared risk group (SRG) is a set of network elements that are
+ collectively impacted by a specific fault or fault type. For
+ example, a shared risk link group (SRLG) is the union of all the
+ links on those fibers that are routed in the same physical conduit in
+ a fiber-span network. This concept includes, besides shared conduit,
+ other types of compromise such as shared fiber cable, shared right of
+ way, shared optical ring, shared office without power sharing, etc.
+ The span of an SRG, such as the length of the sharing for compromised
+ outside plant, needs to be considered on a per fault basis. The
+ concept of SRG can be extended to represent a "risk domain" and its
+ associated capabilities and summarization for traffic engineering
+ purposes. See [7] for further discussion.
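+

As a rough illustration of how SRG information might be used, the sketch below checks whether a working path and a candidate protection path share any risk group. The link names, the SRG labels, and the helpers `path_srgs` and `shared_srgs` are hypothetical and are not taken from this document or from [7].

```python
from typing import Dict, List, Set


def path_srgs(path_links: List[str], link_srgs: Dict[str, Set[str]]) -> Set[str]:
    """Return the union of SRG identifiers over all links of a path."""
    srgs: Set[str] = set()
    for link in path_links:
        # A link with no recorded SRG contributes nothing.
        srgs |= link_srgs.get(link, set())
    return srgs


def shared_srgs(working: List[str], protection: List[str],
                link_srgs: Dict[str, Set[str]]) -> Set[str]:
    """Return the SRGs common to both paths; an empty set means SRG-disjoint."""
    return path_srgs(working, link_srgs) & path_srgs(protection, link_srgs)


# Hypothetical topology data: each link lists the risk groups it belongs to.
LINK_SRGS = {
    "A-B": {"conduit-1"},
    "B-C": {"conduit-2"},
    "A-D": {"conduit-1", "right-of-way-7"},  # shares a conduit with A-B
    "D-C": {"conduit-3"},
    "A-E": {"conduit-4"},
    "E-C": {"conduit-5"},
}
```

With this data, the path A-D-C shares `conduit-1` with the working path A-B-C, so a single conduit cut could take down both, whereas A-E-C is SRG-disjoint from it.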
+
+ Normalization is the sequence of events and actions taken by a
+ network that returns the network to the preferred state upon
+ completing repair of a failure. This could include the switching or
+ rerouting of affected traffic to the original repaired working
+ entities or new routes. Revertive mode refers to the case where
+ traffic is automatically returned to a repaired working entity (also
+ called switch back).
+
+ Recovery is the sequence of events and actions taken by a network
+ after the detection of a failure to maintain the required performance
+ level for existing services (e.g., according to service level
+ agreements) and to allow normalization of the network. The actions
+ include notification of the failure followed by two parallel
+ processes: (1) a repair process with fault isolation and repair of
+ the failed components, and (2) a reconfiguration process using
+ survivability mechanisms to maintain service continuity. In
+ protection, reconfiguration involves switching the affected traffic
+ from a working entity to a protection entity. In restoration,
+ reconfiguration involves path selection and rerouting for the
+ affected traffic.
+
+ Revertive mode is a procedure in which revertive action, i.e., switch
+ back from the protection entity to the working entity, is taken once
+ the failed working entity has been repaired. In non-revertive mode,
+ such action is not taken. To minimize service interruption, switch-
+ back in revertive mode should be performed at a time when there is
+ the least impact on the traffic concerned, or by using the make-
+ before-break concept.
+
+ Non-revertive mode is the case where there is no preferred path or it
+ may be desirable to minimize further disruption of the service
+ brought on by a revertive switching operation. A switch-back to the
+ original working path is not desired or not possible since the
+ original path may no longer exist after the occurrence of a fault on
+ that path.
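+

The difference between the two modes can be sketched as a small state holder. This is only an illustration under simplifying assumptions: the class `ProtectedConnection` and its method names are hypothetical, and the timing of switch-back (least impact, make-before-break) is not modeled.

```python
class ProtectedConnection:
    """Tracks which entity carries traffic for one protected connection."""

    def __init__(self, revertive: bool) -> None:
        self.revertive = revertive
        self.active = "working"      # entity currently carrying traffic
        self.working_ok = True

    def on_working_failure(self) -> None:
        # Recovery: traffic moves from the working to the protection entity.
        self.working_ok = False
        self.active = "protection"

    def on_working_repaired(self) -> None:
        # Normalization: only revertive mode switches back automatically.
        self.working_ok = True
        if self.revertive:
            self.active = "working"
```

After a failure followed by a repair, a connection created with `revertive=True` is back on the working entity, while one created with `revertive=False` stays on the protection entity.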
+
+2.2.3 Survivability Techniques
+
+ Protection, also called protection switching, is a survivability
+ technique based on predetermined failure recovery: as the working
+ entity is established, a protection entity is also established.
+ Protection techniques can be implemented by several architectures:
+ 1+1, 1:1, 1:n, and m:n. In the context of SDH/SONET, they are
+ referred to as Automatic Protection Switching (APS).
+
+ In the 1+1 protection architecture, a protection entity is dedicated
+ to each working entity. The dual-feed mechanism is used whereby the
+ working entity is permanently bridged onto the protection entity at
+ the source of the protected domain. In normal operation mode,
+ identical traffic is transmitted simultaneously on both the working
+ and protection entities. At the other end (sink) of the protected
+ domain, both feeds are monitored for alarms and maintenance signals.
+ A selection between the working and protection entity is made based
+ on some predetermined criteria, such as the transmission performance
+ requirements or defect indication.
+
+ In the 1:1 protection architecture, a protection entity is also
+ dedicated to each working entity. The protected traffic is normally
+ transmitted by the working entity. When the working entity fails,
+ the protected traffic is switched to the protection entity. The two
+ ends of the protected domain must signal detection of the fault and
+ initiate the switchover.
+
+ In the 1:n protection architecture, a dedicated protection entity is
+ shared by n working entities. In this case, not all of the affected
+ traffic may be protected.
+
+ The m:n architecture is a generalization of the 1:n architecture.
+ Typically m <= n, where m dedicated protection entities are shared by
+ n working entities.
+
+ Restoration, also referred to as recovery by rerouting [4], is a
+ survivability technique that establishes new paths or path segments
+ on demand, for restoring affected traffic after the occurrence of a
+ fault. The resources in these alternate paths are the currently
+ unassigned (unreserved) resources in the same layer. Preemption of
+ extra traffic may also be used if spare resources are not available
+ to carry the higher-priority protected traffic. As initiated by
+ detection of a fault on the working path, the selection of a recovery
+ path may be based on preplanned configurations, network routing
+ policies, or current network status such as network topology and
+ fault information. Signaling is used for establishing the new paths
+ to bypass the fault. Thus, restoration involves a path selection
+ process followed by rerouting of the affected traffic from the
+ working entity to the recovery entity.
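+

The path selection step of restoration can be sketched as a search over links that still have enough unreserved capacity and are not affected by the fault. The function `restoration_path`, the link dictionary layout, and the choice of a fewest-hops breadth-first search are assumptions made for illustration; the document does not prescribe a selection algorithm.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]


def restoration_path(links: Dict[Link, int], src: str, dst: str,
                     failed: Set[Link], demand: int) -> Optional[List[str]]:
    """Fewest-hop path from src to dst over usable links, or None.

    links maps an undirected link (u, v) to its unreserved capacity.
    A link is usable if it is not failed and can carry the demand.
    """
    adjacency: Dict[str, List[str]] = {}
    for (u, v), unreserved in links.items():
        if unreserved >= demand and (u, v) not in failed and (v, u) not in failed:
            adjacency.setdefault(u, []).append(v)
            adjacency.setdefault(v, []).append(u)

    previous: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor chain back to the source.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None
```

In a real network the selected path would then be signaled to bypass the fault; that step is outside this sketch.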
+
+2.2.4 Survivability Performance
+
+ Protection switch time is the time interval from the occurrence of a
+ network fault until the completion of the protection-switching
+ operations. It includes the detection time necessary to initiate the
+ protection switch, any hold-off time to allow for the interworking of
+ protection schemes, and the switch completion time.
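+

The components of protection switch time named above simply add up. The sketch below makes that explicit; the class `SwitchTiming`, its field names, and the millisecond values in the usage note are hypothetical and are not figures from this document.

```python
from dataclasses import dataclass


@dataclass
class SwitchTiming:
    detection_ms: float          # time to detect the fault
    hold_off_ms: float           # wait that lets another protection scheme act first
    switch_completion_ms: float  # time to complete the switch operation

    def protection_switch_time_ms(self) -> float:
        """Interval from fault occurrence to completion of the switch."""
        return self.detection_ms + self.hold_off_ms + self.switch_completion_ms
```

For example, `SwitchTiming(10, 50, 20).protection_switch_time_ms()` evaluates to 80, which shows how a hold-off time directly lengthens the switch time.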
+
+ Restoration time is the time interval from the occurrence of a
+ network fault to the instant when the affected traffic is either
+ completely restored, or until spare resources are exhausted, and/or
+ no more extra traffic exists that can be preempted to make room.
+
+ Restoration priority is a method of giving preference to protect
+ higher-priority traffic ahead of lower-priority traffic. Its use is
+ to help determine the order of restoring traffic after a failure has
+ occurred. The purpose is to differentiate service restoration time
+ as well as to control access to available spare capacity for
+ different classes of traffic.
+
+ Preemption priority is a method of determining which traffic can be
+ disconnected in the event that not all traffic with a higher
+ restoration priority is restored after the occurrence of a failure.
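+

One possible reading of how restoration priority and preemption interact is sketched below. It assumes a single pool of spare capacity, treats a lower number as a higher restoration priority, and preempts extra traffic in list order only when that actually allows a demand to be restored. The function `restore` and these conventions are illustrative assumptions, not requirements stated in this document.

```python
from typing import List, Tuple


def restore(demands: List[Tuple[str, int, int]], spare: int,
            extra_traffic: List[Tuple[str, int]]):
    """Restore demands in priority order, preempting extra traffic if needed.

    demands: (name, restoration_priority, bandwidth); lower priority value first.
    extra_traffic: (name, bandwidth) of preemptable traffic.
    Returns (restored, unrestored, preempted) lists of names.
    """
    preemptable = list(extra_traffic)
    restored, unrestored, preempted = [], [], []
    for name, _priority, bandwidth in sorted(demands, key=lambda d: d[1]):
        reclaimable = sum(bw for _, bw in preemptable)
        if spare + reclaimable < bandwidth:
            # Even preempting everything would not make enough room.
            unrestored.append(name)
            continue
        while spare < bandwidth:
            victim, victim_bw = preemptable.pop(0)
            preempted.append(victim)
            spare += victim_bw
        spare -= bandwidth
        restored.append(name)
    return restored, unrestored, preempted
```

With three demands of 10 units each at priorities 1, 2, and 3, 10 units of spare capacity, and two extra-traffic flows of 5 units each, the first demand uses the spare capacity, the second is restored by preempting both extra flows, and the third remains unrestored.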
+
+2.3 Survivability Mechanisms: Comparison
+
+ In a survivable network design, spare capacity and diversity must be
+ built into the network from the beginning to support some degree of
+ self-healing whenever failures occur. A common strategy is to
+ associate each working entity with a protection entity having either
+ dedicated resources or shared resources that are pre-reserved or
+ reserved-on-demand. According to the methods of setting up a
+ protection entity, different approaches to providing survivability
+ can be classified. Generally, protection techniques are based on
+ having a dedicated protection entity set up prior to failure. Such
+ is not the case in restoration techniques, which mainly rely on the
+ use of spare capacity in the network. Hence, in terms of trade-offs,
+ protection techniques usually offer fast recovery from failure with
+ enhanced availability, while restoration techniques usually achieve
+ better resource utilization.
+
+ A 1+1 protection architecture is rather expensive since resource
+ duplication is required for the working and protection entities. It
+ is generally used for specific services that need a very high
+ availability.
+
+ A 1:1 architecture is inherently slower in recovering from failure
+ than a 1+1 architecture since communication between both ends of the
+ protection domain is required to perform the switch-over operation.
+ An advantage is that the protection entity can optionally be used to
+ carry low-priority extra traffic in normal operation, if traffic
+ preemption is allowed. Packet networks can pre-establish a
+ protection path for later use with pre-planned but not pre-reserved
+ capacity. That is, if no packets are sent onto a protection path,
+ then no bandwidth is consumed. This is not the case in transmission
+ networks like optical or TDM where path establishment and resource
+ reservation cannot be decoupled.
+
+ In the 1:n protection architecture, traffic is normally sent on the
+ working entities. When multiple working entities have failed
+ simultaneously, only one of them can be restored by the common
+ protection entity. This contention could be resolved by assigning a
+ different preemptive priority to each working entity. As in the 1:1
+ case, the protection entity can optionally be used to carry
+ preemptable traffic in normal operation.
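+

The contention described above can be sketched as a priority-ordered assignment. The function `assign_protection` is hypothetical; it takes the number of protection entities as a parameter `m`, so that `m=1` corresponds to the 1:n case, and it assumes that a lower number means a higher priority.

```python
from typing import List, Tuple


def assign_protection(failed: List[Tuple[str, int]], m: int = 1):
    """Decide which failed working entities get a protection entity.

    failed: (name, priority) of working entities that failed simultaneously;
            a lower priority value wins the contention.
    m: number of shared protection entities (1 for the 1:n architecture).
    Returns (protected, unprotected) lists of names.
    """
    ordered = sorted(failed, key=lambda entry: entry[1])
    protected = [name for name, _ in ordered[:m]]
    unprotected = [name for name, _ in ordered[m:]]
    return protected, unprotected
```

For instance, with simultaneous failures of `w1` (priority 2), `w2` (priority 1), and `w3` (priority 3) and a single protection entity, only `w2` is protected.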
+
+ While the m:n architecture can improve system availability with small
+ cost increases, it has rarely been implemented or standardized.
+
+ When compared with protection mechanisms, restoration mechanisms are
+ generally more frugal as no resources are committed until after the
+ fault occurs and the location of the fault is known. However,
+ restoration mechanisms are inherently slower, since more must be done
+ following the detection of a fault. Also, the time it takes for the
+ dynamic selection and establishment of alternate paths may vary,
+ depending on the amount of traffic and connections to be restored,
+ and is influenced by the network topology, technology employed, and
+ the type and severity of the fault. As a result, restoration time
+ tends to be more variable than the protection switch time needed with
+ pre-selected protection entities. Hence, in using restoration
+ mechanisms, it is essential to use restoration priority to ensure
+ that service objectives are met cost-effectively.
+
+ Once the network routing algorithms have converged after a fault, it
+ may be preferable in some cases to reoptimize the network by
+ performing a reroute based on the current state of the network and
+ network policies.
+
+3. Survivability
+
+3.1 Scope
+
+ Interoperable approaches to network survivability were determined to
+ be an immediate requirement in packet networks as well as in
+ SDH/SONET framed TDM networks. Not as pressing at this time were
+ techniques that would cover all-optical networks (e.g., where framing
+ is unknown), as the control of these networks in a multi-vendor
+ environment appeared to have some other hurdles to first deal with.
+ Also, not of immediate interest were approaches to coordinate or
+ explicitly communicate survivability mechanisms across network layers
+ (such as from a TDM or optical network to/from an IP network).
+ However, a capability should be provided for a network operator to
+ perform fault notification and to control the operation of
+ survivability mechanisms among different layers. This may require
+ the development of corresponding OAM functionality. However, such
+ issues and those related to OAM are currently outside the scope of
+ this document. (For proposed MPLS OAM requirements, see [8, 9]).
+
+ The initial scope is to address only "backhoe failures" in the
+ inter-office connections of a service provider network. A link
+ connection in the router layer is typically comprised of multiple
+ spans in the lower layers. Therefore, the types of network failures
+ that cause a recovery to be performed include link/span failures.
+ However, linecard and node failures may not need to be treated any
+ differently than their respective link/span failures, as a router
+ failure may be represented as a set of simultaneous link failures.
+
+ Depending on the actual network configuration, a drop-side interface
+ (e.g., between a customer and an access router, or between a router
+ and an optical cross-connect) may be considered either inter-domain
+ or inter-layer. Another inter-domain scenario is the use of intra-
+ office links for interconnecting a metro network and a core network,
+ with both networks being administered by the same service provider.
+ Failures at such interfaces may be similarly protected by the
+ mechanisms of this section.
+
+ Other more complex failure mechanisms such as systematic control-
+ plane failure, configuration error, or breach of security are not
+ within the scope of the survivability mechanisms discussed in this
+ document. Network impairments such as congestion that result in
+ lower throughput are also not covered.
+
+3.2 Required initial set of survivability mechanisms
+
+3.2.1 1:1 Path Protection with Pre-Established Capacity
+
+ In this protection mode, the head end of a working connection
+ establishes a protection connection to the destination. There should
+ be the ability to maintain relative restoration priorities between
+ working and protection connections, as well as between different
+ classes of protection connections.
+
+ In normal operation, traffic is only sent on the working connection,
+ though the ability to signal that traffic will be sent on both
+ connections (1+1 Path for signaling purposes) would be valuable in
+ non-packet networks. Some distinction between working and protection
+ connections is likely, either through explicit objects, or preferably
+ through implicit methods such as general classes or priorities. Head
+ ends need the ability to create connections that are as failure
+ disjoint as possible from each other. This requires SRG information
+ that can be generally assigned to either nodes or links and
+ propagated through the control or management plane. In this
+ mechanism, capacity in the protection connection is pre-established;
+ however, it should be capable of carrying preemptable extra traffic in
+ non-packet networks. When protection capacity is called into service
+ during recovery, there should be the ability to promote the
+ protection connection to working status (for non-revertive mode
+ operation) with some form of make-before-break capability.
+
+3.2.2 1:1 Path Protection with Pre-Planned Capacity
+
+ Similar to the above 1:1 protection with pre-established capacity,
+ the protection connection in this case is also pre-signaled. The
+ difference is in the way protection capacity is assigned. With pre-
+ planned capacity, the mechanism supports the ability for the
+ protection capacity to be shared, or "double-booked". Operators need
+ the ability to provision different amounts of protection capacity
+ according to expected failure modes and service level agreements.
+ Thus, an operator may wish to provision sufficient restoration
+ capacity to handle a single failure affecting all connections in an
+ SRG, or may wish to provision less or more restoration capacity.
+ Mechanisms should be provided to allow restoration capacity on each
+ link to be shared by SRG-disjoint failures. In a sense, this is 1:1
+ from a path perspective; however, the protection capacity in the
+ network (on a link by link basis) is shared in a 1:n fashion, e.g.,
+ see the proposals in [10, 11]. If capacity is planned but not
+ allocated, some form of signaling could be required before traffic
+ may be sent on protection connections, especially in TDM networks.
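+

To make the sharing of protection capacity among SRG-disjoint failures concrete, the sketch below computes, per link, the backup capacity needed to survive any single SRG failure: the maximum over SRGs of the bandwidth that would be switched onto that link. The function `shared_backup_capacity` and the connection record layout are hypothetical, and this is not the method of the proposals in [10, 11].

```python
from collections import defaultdict
from typing import Dict, List


def shared_backup_capacity(connections: List[dict]) -> Dict[str, int]:
    """Per-link protection capacity sufficient for any single SRG failure.

    Each connection is a dict with:
      'bw'               bandwidth of the connection
      'working_srgs'     set of SRGs traversed by its working path
      'protection_links' links used by its protection path
    """
    # needed[link][srg] = bandwidth arriving on link if that SRG fails
    needed = defaultdict(lambda: defaultdict(int))
    for conn in connections:
        for link in conn["protection_links"]:
            for srg in conn["working_srgs"]:
                needed[link][srg] += conn["bw"]
    # Only one SRG is assumed to fail at a time, so take the worst case.
    return {link: max(by_srg.values()) for link, by_srg in needed.items()}
```

If two 10-unit connections with SRG-disjoint working paths both route their protection over link `L1`, the sketch reserves 10 units on `L1` rather than the 20 units that dedicated (non-shared) protection would need.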
+
+ The use of this approach improves network resource utilization, but
+ may require more careful planning. So, initial deployment might be
+ based on 1:1 path protection with pre-established capacity and the
+ local restoration mechanism to be described next.
+
+3.2.3 Local Restoration
+
+ Due to the time impact of signal propagation, dynamic recovery of an
+ entire path may not meet the service requirements of some networks.
+ The solution to this is to restore connectivity of the link or span
+ in immediate proximity to the fault, e.g., see the proposals in [12,
+ 13]. At a minimum, this approach should be able to protect against
+ connectivity-type SRGs, though protecting against node-based SRGs
+ might be worthwhile. Also, this approach is applicable to support
+ restoration on the inter-domain and inter-layer interconnection
+ scenarios using intra-office links as described in the Scope Section.
+
+ Head end systems must have some control as to whether their
+ connections are candidates for or excluded from local restoration.
+ For example, best-effort and preemptable traffic may be excluded from
+ local restoration; they only get restored if there is bandwidth
+ available. This type of control may require the definition of an
+ object in signaling.
+
+ Since local restoration may be suboptimal, a means for head end
+ systems to later perform path-level re-grooming must be supported for
+ this approach.
+
+3.2.4 Path Restoration
+
+ In this approach, connections that are impacted by a fault are
+ rerouted by the originating network element upon notification of
+ connection failure. Such a source-based approach is efficient for
+ network resources, but typically takes longer to accomplish
+ restoration. It does not involve any new mechanisms; it is mentioned
+ here merely as another common approach to protecting against faults
+ in a network.
+
+3.3 Applications Supported
+
+ With service continuity under failure as a goal, a network is
+ "survivable" if, in the face of a network failure, connectivity is
+ interrupted for a "brief" period and then recovered before the
+ network failure ends. The length of this interrupted period is
+ dependent upon the application supported. Here are some typical
+ applications and considerations that drive the requirements for an
+ acceptable protection switch time or restoration time:
+
+ - Best-effort data: recovery of network connectivity by rerouting at
+ the IP layer would be sufficient
+ - Premium data service: need to meet TCP timeout or application
+ protocol timer requirements
+ - Voice: call cutoff is in the range of 140 msec to 2 sec (the time
+ that a person waits after interruption of the speech path before
+ hanging up or the time that a telephone switch will disconnect a
+ call)
+ - Other real-time service (e.g., streaming, fax) where an
+ interruption would cause the session to terminate
+ - Mission-critical applications that cannot tolerate even brief
+ interruptions, for example, real-time financial transactions
+
+3.4 Timing Bounds for Survivability Mechanisms
+
+ The approach to picking the types of survivability mechanisms
+ recommended was to consider a spectrum of mechanisms that can be used
+ to protect traffic with varying characteristics of survivability and
+ speed of protection/restoration, and then attempt to select a few
+ general points that provide some coverage across that spectrum. The
+ focus of this work is to provide requirements to which a small set of
+ detailed proposals may be developed, allowing the operator some
+ (limited) flexibility in approaches to meeting their design goals in
+ engineering multi-vendor networks. Requirements of different
+ applications as listed in the previous sub-section were discussed
+ generally; however, no one on the team would likely attest, on
+ scientific grounds, that the timing bounds below meet any specific
+ application's needs. The bounds rest on a few assumptions, including:
+
+ 1. Approaches in which protection switching occurs without
+ propagation of information are likely to be faster than those that
+ require some form of fault notification to some or all elements in
+ a network.
+
+ 2. Approaches that require some form of signaling after a fault will
+ also likely suffer some timing impact.
+
+ Proposed timing bounds for different survivability mechanisms are as
+ follows (all bounds are exclusive of signal propagation):
+
+ 1:1 path protection with pre-established capacity: 100-500 ms
+ 1:1 path protection with pre-planned capacity: 100-750 ms
+ Local restoration: 50 ms
+ Path restoration: 1-5 seconds
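+
+ The proposed bounds can be compared against the interruption an
+ application can tolerate. The sketch below is only an illustration:
+ it uses the upper end of each range above as a worst case, and the
+ table and function names are hypothetical. The two sample tolerances
+ are the ends of the voice call cutoff range given in Section 3.3.
+
```python
# Upper ends of the proposed timing bounds, in milliseconds
# (exclusive of signal propagation, as stated in the text).
WORST_CASE_MS = {
    "1:1 path protection, pre-established capacity": 500,
    "1:1 path protection, pre-planned capacity": 750,
    "local restoration": 50,
    "path restoration": 5000,
}


def candidate_mechanisms(tolerable_outage_ms):
    """Mechanisms whose worst-case bound fits the tolerance, fastest first."""
    fits = [(bound, name) for name, bound in WORST_CASE_MS.items()
            if bound <= tolerable_outage_ms]
    return [name for bound, name in sorted(fits)]


strict_voice = candidate_mechanisms(140)     # lower end of call cutoff range
relaxed_voice = candidate_mechanisms(2000)   # upper end of call cutoff range
```
+
+ With the stricter tolerance only local restoration qualifies; with
+ the relaxed one, both 1:1 path protection variants qualify as well.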
+
+ To ensure that the service requirements for different applications
+ can be met within the above timing bounds, restoration priority must
+ be implemented to determine the order in which connections are
+ restored (to minimize service restoration time as well as to gain
+ access to available spare capacity on the best paths). For example,
+ mission critical applications may require high restoration priority.
+ At the fiber layer, instead of specific applications, it may be
+ possible that priority be given to certain classifications of
+ customers with their traffic types enclosed within the customer
+ aggregate. Preemption priority should only be used in the event that
+ not all connections can be restored, in which case connections with
+ lower preemption priority should be released. Depending on a service
+ provider's strategy in provisioning network resources for backup,
+ preemption may or may not be needed in the network.
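+
+ The interplay of the two priorities can be sketched as follows. This
+ is a simplified model under stated assumptions, not a specification:
+ capacity is a single pool, lower numbers mean "restored earlier" and
+ "released first" respectively, and all names are hypothetical.
+
```python
from dataclasses import dataclass


@dataclass
class Conn:
    name: str
    bandwidth: int
    restoration_priority: int  # lower value = restored earlier (assumed)
    preemption_priority: int   # lower value = released first (assumed)


def plan_recovery(failed, spare, preemptable=()):
    """Return (restored, released, unrestored) lists of connection names.

    Failed connections are served in restoration-priority order.
    Preemption is used only when spare capacity is insufficient, and it
    only releases connections whose preemption priority is lower than
    that of the connection being restored.
    """
    victims = sorted(preemptable, key=lambda c: c.preemption_priority)
    restored, released, unrestored = [], [], []
    for conn in sorted(failed, key=lambda c: c.restoration_priority):
        freed, chosen = 0, []
        for victim in victims:
            if spare + freed >= conn.bandwidth:
                break  # enough capacity already, stop preempting
            if victim.preemption_priority < conn.preemption_priority:
                chosen.append(victim)
                freed += victim.bandwidth
        if spare + freed >= conn.bandwidth:
            for victim in chosen:
                victims.remove(victim)
                released.append(victim.name)
            spare += freed - conn.bandwidth
            restored.append(conn.name)
        else:
            unrestored.append(conn.name)
    return restored, released, unrestored


# Hypothetical example: 10 units of spare capacity, one preemptable
# connection carrying extra traffic, three failed connections.
failed = [
    Conn("bronze", 10, restoration_priority=2, preemption_priority=1),
    Conn("gold", 10, restoration_priority=0, preemption_priority=3),
    Conn("silver", 10, restoration_priority=1, preemption_priority=2),
]
extra = [Conn("extra", 10, restoration_priority=9, preemption_priority=0)]
restored, released, unrestored = plan_recovery(failed, 10, extra)
```
+
+ Here "gold" uses the spare capacity, "silver" is restored by
+ releasing "extra", and "bronze" remains unrestored.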
+
+3.5 Coordination Among Layers
+
+ A common design goal for networks with multiple technological layers
+ is to provide the desired level of service in the most cost-effective
+ manner. Multilayer survivability may allow the optimization of spare
+ resources through the improvement of resource utilization by sharing
+ spare capacity across different layers, though further investigations
+ are needed. Coordination during recovery among different network
+ layers (e.g., IP, SDH/SONET, optical layer) might necessitate
+ development of vertical hierarchy. The benefits of providing
+ survivability mechanisms at multiple layers, and the optimization of
+ the overall approach, must be weighed against the associated cost and
+ service impacts.
+
+ A default coordination mechanism for inter-layer interaction could be
+ the use of nested timers and current SDH/SONET fault monitoring, as
+ has been done traditionally for backward compatibility. Thus, when
+ lower-layer recovery happens in a longer time period than higher-
+ layer recovery, a hold-off timer is utilized to avoid contention
+ between the different single-layer survivability schemes. In other
+ words, multilayer interaction is addressed by having successively
+ higher multiplexing levels operate at a protection/restoration time
+ scale greater than that of the next lower layer. This can impact the
+ overall time to recover service. For example, if SDH/SONET
+ protection switching is used, MPLS recovery timers must wait until
+ SDH/SONET has had time to switch. Setting such timers involves a
+ tradeoff between rapid recovery and creation of a race condition
+ where multiple layers are responding to the same fault, potentially
+ allocating resources in an inefficient manner.
+
+ In other configurations where the lower layer does not have a
+ restoration capability or is not expected to protect, say an
+ unprotected SDH/SONET linear circuit, then there must be a mechanism
+ for the lower layer to trigger the higher layer to take recovery
+ actions immediately. This difference in network configuration means
+ that implementations must allow for adjustment of hold-off timer
+ values and/or a means for a lower layer to immediately indicate to a
+ higher layer that a fault has occurred so that the higher layer can
+ take restoration or protection actions.
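+
+ The choice between waiting out a hold-off timer and reacting
+ immediately can be sketched as below. The function name, its
+ parameters, and the numeric values are hypothetical and serve only to
+ illustrate the behavior described in the preceding paragraphs.
+
```python
def higher_layer_reaction(lower_layer_protects, lower_recovery_ms, hold_off_ms):
    """Return (action, start_ms) for the higher layer after a lower-layer fault.

    start_ms is the time, measured from fault detection, at which the
    higher layer begins its own recovery, or None if it never has to act.
    """
    if not lower_layer_protects:
        # Unprotected lower layer: it indicates the fault at once and the
        # higher layer starts recovery without waiting for a timer.
        return ("recover-immediately", 0)
    if lower_recovery_ms <= hold_off_ms:
        # The lower layer repaired the fault before the timer expired.
        return ("no-action", None)
    # The timer expired with the fault still present.
    return ("recover-after-hold-off", hold_off_ms)


# Illustrative values only (milliseconds).
protected_ring = higher_layer_reaction(True, lower_recovery_ms=50, hold_off_ms=100)
unprotected_linear = higher_layer_reaction(False, lower_recovery_ms=0, hold_off_ms=100)
slow_lower_layer = higher_layer_reaction(True, lower_recovery_ms=300, hold_off_ms=100)
```
+
+ A longer hold-off value avoids more races between layers but delays
+ higher-layer recovery whenever the lower layer fails to recover.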
+
+ Furthermore, faults at higher layers should not trigger restoration
+ or protection actions at lower layers [3, 4].
+
+ It was felt that the current approaches to coordinating survivability
+ mechanisms did not have significant operational shortfalls. These
+ approaches include protecting traffic
+ solely at one layer (e.g., at the IP layer over linear WDM, or at the
+ SDH/SONET layer). Where survivability mechanisms might be deployed
+ at several layers, such as when a routed network rides an SDH/SONET
+ protected network, it was felt that current coordination approaches
+ were sufficient in many cases. One exception is the hold-off of MPLS
+ recovery until the completion of SDH/SONET protection switching as
+ described above. This limits the recovery time of fast MPLS
+ restoration. Also, by design, the operations and mechanisms within a
+ given layer tend to be invisible to other layers.
+
+3.6 Evolution Toward IP Over Optical
+
+ As more pressing requirements for survivability and horizontal
+ hierarchy for edge-to-edge signaling are met with technical
+ proposals, it is believed that the benefits of merging (in some
+ manner) the control planes of multiple layers will be outlined. When
+ these benefits are self-evident, it would then seem to be the right
+ time to review whether vertical hierarchy mechanisms are needed, and
+ what the requirements might be. For example, a future requirement
+ might be to provide a better match between the recovery requirements
+ of IP networks and the recovery capability of optical transport.
+ One such proposal is described in [14].
+
+4. Hierarchy Requirements
+
+ Efforts in the area of network hierarchy should focus on mechanisms
+ that would allow more scalable edge-to-edge signaling, or signaling
+ across networks with existing network hierarchy (such as multi-area
+ OSPF). This appears to be a more urgent need than mechanisms that
+ might be needed to interconnect networks at different layers.
+
+4.1 Historical Context
+
+ One reason for horizontal hierarchy is functionality (e.g., metro
+ versus backbone). Geographic "islands" or partitions reduce the need
+ for interoperability and make administration and operations less
+ complex. Using a simpler, more interoperable, survivability scheme
+ at metro/backbone boundaries is natural for many provider network
+ architectures. In transmission networks, creating geographic islands
+ of different vendor equipment has been done for a long time because
+ multi-vendor interoperability has been difficult to achieve.
+ Traditionally, providers have had to coordinate the equipment on either
+ end of a "connection," and making this interoperable reduces
+ complexity. A provider should be able to concatenate survivability
+ mechanisms in order to provide a "protected link" to the next higher
+ level. Think of SDH/SONET rings connecting to TDM DXCs with 1+1
+ line-layer protection between the ADM and the DXC port. The TDM
+ connection, e.g., a DS3, is protected but usually all equipment on
+ each SDH/SONET ring is from a single vendor. The DXC cross
+ connections are controlled by the provider and the ports are
+ physically protected resulting in a highly available design. Thus,
+ concatenation of survivability approaches can be used to cascade
+ across a horizontal hierarchy. While not perfect, it is workable in
+ the near to mid-term until multi-vendor interoperability is achieved.
+
+ While the problems associated with multi-vendor interoperability may
+ necessitate horizontal hierarchy as a practical matter in the near to
+ mid-term (at least this has been the case in TDM networks), there
+ should not be a technical reason for it in the standards developed by
+ the IETF for core networks, or even most access networks.
+ Establishing interoperability of survivability mechanisms between
+ multi-vendor equipment in core IP networks is urgently required to
+ enable adoption of IP as a viable core transport technology and to
+ facilitate the traffic engineering of future multi-service IP
+ networks [3].
+
+ Some of the largest service provider networks currently run a single
+ area/level IGP. Some service providers, as well as many large
+ enterprise networks, run multi-area Open Shortest Path First (OSPF)
+ to gain increases in scalability. Often, the hierarchy was part of the
+ original design, so it is difficult to say if the network truly
+ required the hierarchy to reach its current size.
+
+ Some proposals on improved mechanisms to address network hierarchy
+ have been suggested [15, 16, 17, 18, 19]. This document aims to
+ provide the concrete requirements so that these and other proposals
+ can first aim to meet some limited objectives.
+
+4.2 Applications for Horizontal Hierarchy
+
+ A primary driver for intra-domain horizontal hierarchy is signaling
+ capabilities in the context of edge-to-edge VPNs, potentially across
+ traffic-engineered data networks. There are a number of different
+ approaches to layer 2 and layer 3 VPNs and they are currently being
+ addressed by different emerging protocols in the provider-provisioned
+ VPNs (e.g., virtual routers) and Pseudo Wire Edge-to-Edge Emulation
+ (PWE3) efforts based on MPLS and/or IP tunnels. These may or
+ may not need explicit signaling from edge to edge, but it is a common
+ perception that in order to meet SLAs, some form of edge-to-edge
+ signaling may be required.
+
+ With a large number of edges (N), scalability is concerned with
+ avoiding the O(N^2) properties of edge-to-edge signaling. However,
+ the main issue here is not with the scalability of large amounts of
+ signaling, such as in O(N^2) meshes with a "connection" between every
+ edge-pair. This is because, even if establishing and maintaining
+ connections is feasible in a large network, there might be an impact
+ on core survivability mechanisms which would cause
+ protection/restoration times to grow with N^2, which would be
+ undesirable. While some value of N may be inevitable, approaches to
+ reduce N (e.g., to pull in from the edge to aggregation points) might
+ be of value.
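+
+ To make the O(N^2) growth concrete, the sketch below counts the
+ connections in a full mesh and shows the effect of terminating
+ signaling at aggregation points instead of at every edge. The
+ function names and the example sizes are hypothetical; connections
+ are assumed to be unidirectional, as MPLS LSPs are.
+
```python
def full_mesh_connections(n_edges, unidirectional=True):
    """Number of edge-to-edge connections in a full mesh of n_edges nodes."""
    pairs = n_edges * (n_edges - 1) // 2
    return 2 * pairs if unidirectional else pairs


def aggregated_mesh_connections(n_edges, edges_per_aggregation_point):
    """Full-mesh size when signaling terminates at aggregation points."""
    # Ceiling division: a partially filled aggregation point still counts.
    n_agg = -(-n_edges // edges_per_aggregation_point)
    return full_mesh_connections(n_agg)


edge_mesh = full_mesh_connections(1000)            # 1000 * 999
agg_mesh = aggregated_mesh_connections(1000, 20)   # 50 points: 50 * 49
```
+
+ In this example, pulling 1000 edges in to 50 aggregation points
+ reduces the mesh from 999000 to 2450 unidirectional connections.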
+
+ Thus, most service providers feel that O(N^2) meshes are not
+ necessary for VPNs, and that the number of tunnels to support VPNs
+ would be within the scalability bounds of current protocols and
+ implementations. That may be the case, as there is currently a lack
+ of ability to signal MPLS tunnels from edge to edge across IGP
+ hierarchy, such as OSPF areas. This may require the development of
+ signaling standards that support dynamic establishment and
+ potentially the restoration of LSPs across a 2-level IGP hierarchy.
+
+ For routing scalability, especially in data applications, a major
+ concern is the amount of processing/state that is required in the
+ variety of network elements. If some nodes might not be able to
+ communicate and process the state of every other node, it might be
+ preferable to limit the information. There is one school of thought
+ that says that the amount of information contained by a horizontal
+ barrier should be significant, and that the impacts this might have
+ on optimality in route selection and ability to provide global
+ survivability are accepted tradeoffs.
+
+4.3 Horizontal Hierarchy Requirements
+
+ Mechanisms are required to allow for edge-to-edge signaling of
+ connections through a network. One network scenario includes medium
+ to large networks that currently have hierarchical interior routing
+ such as multi-area OSPF or multi-level Intermediate System to
+ Intermediate System (IS-IS). The primary context of this is edge-
+ to-edge signaling, which is thought to be required to assure the SLAs
+ for the layer 2 and layer 3 VPNs that are being carried across the
+ network. Another possible context would be edge-to-edge signaling in
+ TDM SDH/SONET networks with IP control, where metro and core networks
+ again might be in a hierarchical interior routing domain.
+
+ To support edge-to-edge signaling in the above network scenarios
+ within the framework of existing horizontal hierarchies, current
+ traffic engineering (TE) methods [20, 6] may need to be extended.
+ Requirements for multi-area TE need to be developed to provide
+ guidance for any necessary protocol extensions.
+
+5. Survivability and Hierarchy
+
+ When horizontal hierarchy exists in a network technology layer, a
+ question arises as to how survivability can be provided along a
+ connection that crosses hierarchical boundaries.
+
+ In designing protocols to meet the requirements of hierarchy, an
+ approach to consider is that boundaries are either clean, or are of
+ minimal value. However, the concept of network elements that
+ participate on both sides of a boundary might be a consideration
+ (e.g., OSPF ABRs). That would allow for devices on either side to
+ take an intra-area approach within their region of knowledge, and for
+ the ABR to do this in both areas, and splice the two protected
+ connections together at a common point (granted, it is now a common
+ point of failure). If the limitations of this approach start to appear
+ in operational settings, then perhaps it would be time to start
+ thinking about route-servers and signaling-propagated directives.
+ However, one initial approach might be to signal through a common
+ border router, and to consider the service as protected as it
+ consists of a concatenated set of connections which are each
+ protected within their area. Another approach might be to have a
+ least common denominator mechanism at the boundary, e.g., 1+1 port
+ protection. There should also be some standardized means for a
+ survivability scheme on one side of such a boundary to communicate
+ with the scheme on the other side regarding the success or failure of
+ the recovery action. For example, if a part of a "connection" is
+ down on one side of such a boundary, there is no need for the other
+ side to recover from failures.
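+
+ The concatenation idea, and the benefit of reporting recovery
+ failure across a boundary, can be sketched as follows. The Segment
+ record and both functions are hypothetical and only model the
+ reasoning above; they do not describe any existing protocol.
+
```python
from dataclasses import dataclass


@dataclass
class Segment:
    area: str
    protected: bool            # an intra-area survivability scheme exists
    working_up: bool = True
    protection_up: bool = True

    def is_up(self):
        """A segment carries traffic on its working or protection path."""
        return self.working_up or (self.protected and self.protection_up)


def end_to_end_protected(segments):
    """The service counts as protected if every concatenated segment is."""
    return all(s.protected for s in segments)


def areas_that_should_recover(segments):
    """Areas in which local recovery is still worthwhile.

    If any segment is irrecoverably down, recovery elsewhere cannot bring
    the end-to-end connection back, so no area needs to act.
    """
    if any(not s.is_up() for s in segments):
        return []
    return [s.area for s in segments if not s.working_up]


# Hypothetical connection spliced at border routers across three areas.
segments = [
    Segment("area 1", protected=True, working_up=False),
    Segment("area 0", protected=True),
    Segment("area 2", protected=True),
]
protected = end_to_end_protected(segments)
acting = areas_that_should_recover(segments)

# Area 2 now loses both its working and protection paths.
segments[2].working_up = False
segments[2].protection_up = False
acting_after = areas_that_should_recover(segments)
```
+
+ Before the second failure only area 1 needs to recover; afterwards,
+ no area gains anything by recovering its own segment.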
+
+ In summary, at this time, approaches as described above that allow
+ concatenation of survivability schemes across hierarchical boundaries
+ seem sufficient.
+
+6. Security Considerations
+
+ The set of SRGs that are defined for a network under a common
+ administrative control and the corresponding assignment of these SRGs
+ to nodes and links within the administrative control is sensitive
+ information and needs to be protected. An SRG is an acknowledgement
+ that nodes and links that belong to an SRG are susceptible to a
+ common threat. An adversary with access to the information contained
+ in an SRG could use it to design an attack, determine the scope of
+ damage the attack would cause and, therefore, maximize the effect of
+ the attack.
+
+ The label used to refer to a particular SRG must allow for an
+ encoding such that sensitive information such as physical location,
+ function, purpose, customer, fault type, etc. is not readily
+ discernable by unauthorized users.
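+
+ One possible encoding, given purely as an assumption and not as a
+ mechanism specified by this document, is to derive each label from
+ the operator's descriptive name with a keyed hash, so that the label
+ reveals nothing without the key. The function name, key, and example
+ name below are hypothetical.
+
```python
import hashlib
import hmac


def opaque_srg_label(secret_key: bytes, descriptive_name: str, length: int = 8) -> str:
    """Derive a fixed-length hexadecimal label that hides the descriptive name."""
    digest = hmac.new(secret_key, descriptive_name.encode("utf-8"), hashlib.sha256)
    # Truncation keeps labels short; shorter labels collide more easily.
    return digest.hexdigest()[:length]


key = b"operator-held secret (example only)"
label = opaque_srg_label(key, "conduit/site-A-to-site-B/customer-X")
```
+
+ The operator would keep the key and the name-to-label mapping
+ private; only the opaque labels would be propagated.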
+
+ SRG information that is propagated through the control and management
+ plane should allow for an encryption mechanism. An example of an
+ approach would be to use IPsec [21] on all packets carrying SRG
+ information.
+
+7. References
+
+ [1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP
+ 9, RFC 2026, October 1996.
+
+ [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+ [3] K. Owens, V. Sharma, and M. Oommen, "Network Survivability
+ Considerations for Traffic Engineered IP Networks", Work in
+ Progress.
+
+ [4] V. Sharma, B. Crane, S. Makam, K. Owens, C. Huang, F.
+ Hellstrand, J. Weil, L. Andersson, B. Jamoussi, B. Cain, S.
+ Civanlar, and A. Chiu, "Framework for MPLS-based Recovery", Work
+ in Progress.
+
+ [5] M. Thorup, "Fortifying OSPF/ISIS Against Link Failure",
+ http://www.research.att.com/~mthorup/PAPERS/lf_ospf.ps
+
+ [6] Awduche, D., Chiu, A., Elwalid, A., Widjaja, I. and X. Xiao,
+ "Overview and Principles of Internet Traffic Engineering", RFC
+ 3272, May 2002.
+
+ [7] S. Dharanikota, R. Jain, D. Papadimitriou, R. Hartani, G.
+ Bernstein, V. Sharma, C. Brownmiller, Y. Xue, and J. Strand,
+ "Inter-domain routing with Shared Risk Groups", Work in
+ Progress.
+
+ [8] N. Harrison, P. Willis, S. Davari, E. Cuevas, B. Mack-Crane, E.
+ Franze, H. Ohta, T. So, S. Goldfless, and F. Chen, "Requirements
+ for OAM in MPLS Networks", Work in Progress.
+
+ [9] D. Allan and M. Azad, "A Framework for MPLS User Plane OAM",
+ Work in Progress.
+
+ [10] S. Kini, M. Kodialam, T.V. Lakshman, S. Sengupta, and C.
+ Villamizar, "Shared Backup Label Switched Path Restoration",
+ Work in Progress.
+
+ [11] G. Li, C. Kalmanek, J. Yates, G. Bernstein, F. Liaw, and V.
+ Sharma, "RSVP-TE Extensions For Shared-Mesh Restoration in
+ Transport Networks", Work in Progress.
+
+ [12] P. Pan (Editor), D.H. Gan, G. Swallow, J. Vasseur, D. Cooper, A.
+ Atlas, and M. Jork, "Fast Reroute Extensions to RSVP-TE for LSP
+ Tunnels", Work in Progress.
+
+ [13] A. Atlas, C. Villamizar, and C. Litvanyi, "MPLS RSVP-TE
+ Interoperability for Local Protection/Fast Reroute", Work in
+ Progress.
+
+ [14] A. Chiu and J. Strand, "Joint IP/Optical Layer Restoration after
+ a Router Failure", Proc. OFC'2001, Anaheim, CA, March 2001.
+
+ [15] K. Kompella and Y. Rekhter, "Multi-area MPLS Traffic
+ Engineering", Work in Progress.
+
+ [16] G. Ash, et al., "Requirements for Multi-Area TE", Work in
+ Progress.
+
+ [17] A. Iwata, N. Fujita, G.R. Ash, and A. Farrel, "Crankback Routing
+ Extensions for MPLS Signaling", Work in Progress.
+
+ [18] C-Y Lee, A. Celer, N. Gammage, S. Ghanti, G. Ash, "Distributed
+ Route Exchangers", Work in Progress.
+
+ [19] C-Y Lee and S. Ghanti, "Path Request and Path Reply Message",
+ Work in Progress.
+
+ [20] Awduche, D., Malcolm, J., Agogbua, J., O'Dell, M. and J.
+ McManus, "Requirements for Traffic Engineering Over MPLS", RFC
+ 2702, September 1999.
+
+ [21] Kent, S. and R. Atkinson, "Security Architecture for the
+ Internet Protocol", RFC 2401, November 1998.
+
+8. Acknowledgments
+
+ A lot of the direction taken in this document, and by the team in its
+ initial effort, was steered by the insightful questions provided by
+ Bala Rajagoplan, Greg Bernstein, Yangguang Xu, and Avri Doria. The
+ set of questions is attached as Appendix A in this document.
+
+ After the release of the first draft, a number of comments were
+ received. Thanks go to Jerry Ash, Sudheer Dharanikota, Chuck
+ Kalmanek, Dan Koller, Lyndon Ong, Steve Plote, and Yong Xue for their
+ inputs.
+
+9. Contributing Authors
+
+ Jim Boyle (PDNets), Rob Coltun (Movaz), Tim Griffin (AT&T), Ed Kern,
+ Tom Reddington (Lucent) and Malin Carlzon.
+
+Appendix A: Questions used to help develop requirements
+
+ A. Definitions
+
+ 1. In determining the specific requirements, the design team should
+ precisely define the concepts "survivability", "restoration",
+ "protection", "protection switching", "recovery", "re-routing",
+ etc., and their relations. This would enable the requirements doc
+ to describe precisely which of these will be addressed. In the
+ following, the term "restoration" is used to indicate the broad
+ set of policies and mechanisms used to ensure survivability.
+
+ B. Network types and protection modes
+
+ 1. What is the scope of the requirements with regard to the types of
+ networks covered? Specifically, are the following in scope:
+
+ Restoration of connections in mesh optical networks (opaque or
+ transparent)
+ Restoration of connections in hybrid mesh-ring networks
+ Restoration of LSPs in MPLS networks (composed of LSRs overlaid on
+ a transport network, e.g., optical)
+ Any other types of networks?
+ Is commonality of approach, or optimization of approach more
+ important?
+
+ 2. What are the requirements with regard to the protection modes to
+ be supported in each network type covered? (Examples of protection
+ modes include 1+1, M:N, shared mesh, UPSR, BLSR, newly defined
+ modes such as P-cycles, etc.)
+
+ 3. What are the requirements on local span (i.e., link by link)
+ protection and end-to-end protection, and the interaction between
+ them? E.g., what should be the granularity of connections for
+ each type (single connection, bundle of connections, etc.)?
+
+ C. Hierarchy
+
+ 1. Vertical (between two network layers):
+ What are the requirements for the interaction between restoration
+ procedures across two network layers, when these features are
+ offered in both layers? (Example, MPLS network realized over pt-
+ to-pt optical connections.) Under such a case,
+
+ (a) Are there any criteria to choose which layer should provide
+ protection?
+
+ (b) If both layers provide survivability features, what are the
+ requirements to coordinate these mechanisms?
+
+ (c) How is lack of current functionality of cross-layer
+ coordination currently hampering operations?
+
+ (d) Would the benefits be worth additional complexity associated
+ with routing isolation (e.g., VPN, areas), security, address
+ isolation and policy / authentication processes?
+
+ 2. Horizontal (between two areas or administrative subdivisions
+ within the same network layer):
+
+ (a) What are the criteria that trigger the creation of protocol or
+ administrative boundaries pertaining to restoration (e.g.,
+ scalability, multi-vendor interoperability, multi-provider
+ operation)? What are the practical issues? Should multi-vendor
+ operation necessitate hierarchical separation?
+
+ When such boundaries are defined:
+
+ (b) What are the requirements on how protection/restoration is
+ performed end-to-end across such boundaries?
+
+ (c) If different restoration mechanisms are implemented on two
+ sides of a boundary, what are the requirements on their
+ interaction?
+
+ What is the primary driver of horizontal hierarchy? (select one)
+ - functionality (e.g., metro versus backbone)
+ - routing scalability
+ - signaling scalability
+ - current network architecture, trying to layer on TE on top
+ of an already hierarchical network architecture
+ - routing and signaling
+
+ For signaling scalability, is it
+ - manageability
+ - processing/state of network
+ - edge-to-edge N^2 type issue
+
+ For routing scalability, is it
+ - processing/state of network
+ - are you flat and want to go hierarchical
+ - or already hierarchical?
+ - data or TDM application?
+
+ D. Policy
+
+ 1. What are the requirements for policy support during
+ protection/restoration, e.g., restoration priority, preemption,
+ etc.?
+
+ E. Signaling Mechanisms
+
+ 1. What are the requirements on the signaling transport mechanism
+ (e.g., in-band over SDH/SONET overhead bytes, out-of-band over an
+ IP network, etc.) used to communicate restoration protocol
+ messages between network elements? What are the bandwidth and
+ other requirements on the signaling channels?
+
+ 2. What are the requirements on fault detection/localization
+ mechanisms (which are the prelude to performing restoration
+ procedures) in the case of opaque and transparent optical
+ networks? What are the requirements in the case of MPLS
+ restoration?
+
+ 3. What are the requirements on signaling protocols to be used in
+ restoration procedures (e.g., high priority processing, security,
+ etc.)?
+
+ 4. Are there any requirements on the operation of restoration
+ protocols?
+
+ F. Quantitative
+
+ 1. What are the quantitative requirements (e.g., latency) for
+ completing restoration under different protection modes (for both
+ local and end-to-end protection)?
+
+ G. Management
+
+ 1. What information should be measured/maintained by the control
+ plane at each network element pertaining to restoration events?
+
+ 2. What are the requirements for the correlation between control
+ plane and data plane failures from the restoration point of view?
+
+Editors' Addresses
+
+ Wai Sum Lai
+ AT&T
+ 200 Laurel Avenue
+ Middletown, NJ 07748, USA
+
+ Phone: +1 732-420-3712
+ EMail: wlai@att.com
+
+
+ Dave McDysan
+ WorldCom
+ 22001 Loudoun County Pkwy
+ Ashburn, VA 20147, USA
+
+ EMail: dave.mcdysan@wcom.com
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2002). All Rights Reserved.
+
+ This document and translations of it may be copied and furnished to
+ others, and derivative works that comment on or otherwise explain it
+ or assist in its implementation may be prepared, copied, published
+ and distributed, in whole or in part, without restriction of any
+ kind, provided that the above copyright notice and this paragraph are
+ included on all such copies and derivative works. However, this
+ document itself may not be modified in any way, such as by removing
+ the copyright notice or references to the Internet Society or other
+ Internet organizations, except as needed for the purpose of
+ developing Internet standards in which case the procedures for
+ copyrights defined in the Internet Standards process must be
+ followed, or as required to translate it into languages other than
+ English.
+
+ The limited permissions granted above are perpetual and will not be
+ revoked by the Internet Society or its successors or assigns.
+
+ This document and the information contained herein is provided on an
+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is currently provided by the
+ Internet Society.