Network Working Group                                              D. Li
Request for Comments: 5495                                        J. Gao
Category: Informational                                           Huawei
                                                        A. Satyanarayana
                                                                   Cisco
                                                             S. Bardalai
                                                                 Fujitsu
                                                              March 2009
Description of the
Resource Reservation Protocol - Traffic-Engineered (RSVP-TE)
Graceful Restart Procedures
Status of This Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.
This document may contain material from IETF Documents or IETF
Contributions published or made publicly available before November
10, 2008. The person(s) controlling the copyright in some of this
material may not have granted the IETF Trust the right to allow
modifications of such material outside the IETF Standards Process.
Without obtaining an adequate license from the person(s) controlling
the copyright in such materials, this document may not be modified
outside the IETF Standards Process, and derivative works of it may
not be created outside the IETF Standards Process, except to format
it for publication as an RFC or to translate it into languages other
than English.
Li, et al. Informational [Page 1]
^L
RFC 5495 RSVP-TE Graceful Restart Procedures February 2009
Abstract
The Hello message for the Resource Reservation Protocol (RSVP) has
been defined to establish and maintain basic signaling node
adjacencies for Label Switching Routers (LSRs) participating in a
Multiprotocol Label Switching (MPLS) traffic-engineered (TE) network.
The Hello message has been extended for use in Generalized MPLS
(GMPLS) networks for state recovery of control channel or nodal
faults.
The GMPLS protocol definitions for RSVP also allow a restarting node
to learn which label it previously allocated for use on a Label
Switched Path (LSP).
Further RSVP protocol extensions have been defined to enable a
restarting node to recover full control plane state by exchanging
RSVP messages with its upstream and downstream neighbors.
This document provides an informational clarification of the control
plane procedures for a GMPLS network when there are multiple node
failures, and describes how full control plane state can be recovered
in different scenarios where the order in which the nodes restart is
different.
This document does not define any new processes or procedures. All
protocol mechanisms are already defined in the referenced documents.
Table of Contents
1. Introduction ....................................................3
2. Existing Procedures for Single Node Restart .....................4
2.1. Procedures Defined in RFC 3473 .............................4
2.2. Procedures Defined in RFC 5063 .............................5
3. Multiple Node Restart Scenarios .................................6
4. RSVP State ......................................................7
5. Procedures for Multiple Node Restart ............................7
5.1. Procedures for the Normal Node .............................8
5.2. Procedures for the Restarting Node .........................8
5.2.1. Procedures for Scenario 1 ...........................8
5.2.2. Procedures for Scenario 2 ...........................9
5.2.3. Procedures for Scenario 3 ..........................11
5.2.4. Procedures for Scenario 4 ..........................12
5.2.5. Procedures for Scenario 5 ..........................12
5.3. Consideration of the Reuse of Data Plane Resources ........12
5.4. Consideration of Management Plane Intervention ............13
6. Clarification of Restarting Node Procedure .....................13
7. Security Considerations ........................................15
8. Acknowledgments ................................................16
9. References .....................................................17
9.1. Normative References ......................................17
9.2. Informative References ....................................17
1. Introduction
The Hello message for the Resource Reservation Protocol (RSVP) has
been defined to establish and maintain basic signaling node
adjacencies for Label Switching Routers (LSRs) participating in a
Multiprotocol Label Switching (MPLS) traffic-engineered (TE) network
[RFC3209]. The Hello message has been extended for use in
Generalized MPLS (GMPLS) networks for state recovery of control
channel or nodal faults through the exchange of the Restart_Cap
Object [RFC3473].
The GMPLS protocol definitions for RSVP [RFC3473] also allow a
restarting node to learn which label it previously allocated for use
on a Label Switched Path (LSP) through the Recovery_Label Object
carried on a Path message sent to a restarting node from its upstream
neighbor.
Further RSVP protocol extensions have been defined [RFC5063] to
perform graceful restart and to enable a restarting node to recover
full control plane state by exchanging RSVP messages with its
upstream and downstream neighbors. State previously transmitted to
the upstream neighbor (principally, the downstream label) is
recovered from the upstream neighbor on a Path message (using the
Recovery_Label Object as described in [RFC3473]). State previously
transmitted to the downstream neighbor (including the upstream label,
interface identifiers, and the explicit route) is recovered from the
downstream neighbor using a RecoveryPath message.
[RFC5063] also extends the Hello message to exchange information
about the ability to support the RecoveryPath message.
The examples and procedures in [RFC3473] and [RFC5063] focus on the
description of a single node restart when adjacent network nodes are
operative. Although the procedures are equally applicable to multi-
node restarts, no detailed explanation is provided for such a case.
This document provides an informational clarification of the control
plane procedures for a GMPLS network when there are multiple node
failures, and describes how full control plane state can be recovered
in different scenarios where the order in which the nodes restart is
different.
This document does not define any new processes or procedures. All
protocol mechanisms already defined in [RFC3473] and [RFC5063] are
definitive.
2. Existing Procedures for Single Node Restart
This section documents for information the existing procedures
defined in [RFC3473] and [RFC5063]. Those documents are definitive,
and the description here is non-normative. It is provided for
informational clarification only.
2.1. Procedures Defined in RFC 3473
In the case of nodal faults, the procedures for the restarting node
and the procedures for the neighbor of a restarting node are applied
to the corresponding nodes. These procedures, described in
[RFC3473], are summarized as follows:
For the Restarting Node:
1) Tells its neighbors that state recovery is supported using the
Hello message.
2) Recovers its RSVP state with the help of a Path message, received
from its upstream neighbor, that carries the Recovery_Label
Object.
3) For bidirectional LSPs, uses the Upstream_Label Object on the
received Path message to recover the corresponding RSVP state.
4) If the corresponding forwarding state in the data plane does not
exist, the node treats this as a setup for a new LSP. If the
forwarding state in the data plane does exist, the forwarding
state is bound to the LSP associated with the message, and the
related forwarding state should be considered as valid and
refreshed. In addition, if the node is not the tail-end of the
LSP, the incoming label on the downstream interface is retrieved
from the forwarding state on the restarting node and set in the
Upstream_Label Object in the Path message sent to the downstream
neighbor.
For the Neighbor of a Restarting Node:
1) Sends a Path message with the Recovery_Label Object containing a
label value corresponding to the label value received in the most
recently received corresponding Resv message.
2) Resumes refreshing Path state with the restarting node.
3) Resumes refreshing Resv state with the restarting node.
2.2. Procedures Defined in RFC 5063
A new message is introduced in [RFC5063] called the RecoveryPath
message. This message is sent by the downstream neighbor of a
restarting node to convey the contents of the last received Path
message back to the restarting node.
The restarting node will receive the Path message with the
Recovery_Label Object from its upstream neighbor and/or the
RecoveryPath message from its downstream neighbor. The full RSVP
state of the restarting node can be recovered from these two
messages.
The following state can be recovered from the received Path message:
o Upstream data interface (from RSVP_Hop Object)
o Label on the upstream data interface (from Recovery_Label Object)
o Upstream label for bidirectional LSP (from Upstream_Label Object)
The following state can be recovered from the received RecoveryPath
message:
o Downstream data interface (from RSVP_Hop Object)
o Label on the downstream data interface (from Recovery_Label Object)
o Upstream direction label for bidirectional LSP (from Upstream_Label
Object)
The other objects originally exchanged on Path and Resv messages can
be recovered from the regular Path and Resv refresh messages, or from
the RecoveryPath.
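The lists above can be pictured with the following non-normative Python sketch, which fills one per-LSP record from a Path message and from a RecoveryPath message. The class name, field names, and the use of plain dictionaries keyed by object name are assumptions made for illustration; they are not defined in [RFC3473] or [RFC5063].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveredLspState:
    """Hypothetical per-LSP record rebuilt by a restarting node."""
    upstream_interface: Optional[str] = None        # RSVP_Hop in Path
    upstream_label: Optional[int] = None            # Recovery_Label in Path
    upstream_reverse_label: Optional[int] = None    # Upstream_Label in Path
    downstream_interface: Optional[str] = None      # RSVP_Hop in RecoveryPath
    downstream_label: Optional[int] = None          # Recovery_Label in RecoveryPath
    downstream_reverse_label: Optional[int] = None  # Upstream_Label in RecoveryPath

    def apply_path(self, msg: dict) -> None:
        """Record what a Path message from the upstream neighbor reveals."""
        self.upstream_interface = msg.get("RSVP_Hop")
        self.upstream_label = msg.get("Recovery_Label")
        # Only present for bidirectional LSPs.
        self.upstream_reverse_label = msg.get("Upstream_Label")

    def apply_recovery_path(self, msg: dict) -> None:
        """Record what a RecoveryPath from the downstream neighbor reveals."""
        self.downstream_interface = msg.get("RSVP_Hop")
        self.downstream_label = msg.get("Recovery_Label")
        # Only present for bidirectional LSPs.
        self.downstream_reverse_label = msg.get("Upstream_Label")

    def upstream_recovered(self) -> bool:
        return (self.upstream_interface is not None
                and self.upstream_label is not None)

    def downstream_recovered(self) -> bool:
        return (self.downstream_interface is not None
                and self.downstream_label is not None)
```

In this sketch, a transit node would treat an LSP as fully recovered only when both upstream_recovered() and downstream_recovered() return True.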
3. Multiple Node Restart Scenarios
We define the following terms for the different node types:
Restarting - The node has restarted. Communication with its neighbor
nodes is restored, and its RSVP state is under recovery.
Delayed Restarting - The node has restarted, but the communication
with a neighbor node is interrupted (for example, the neighbor
node needs to restart).
Normal - The normal node is the fully operational neighbor of a
restarting or delayed restarting node.
There are five scenarios for multi-node restart. We will focus on
the different positions of a restarting node. As shown in Figure 1,
an LSP starts from Node A, traverses Nodes B and C, and ends at Node
D.
+-----+  Path  +-----+  Path  +-----+  Path  +-----+
| PSB |------->| PSB |------->| PSB |------->| PSB |
|     |        |     |        |     |        |     |
| RSB |<-------| RSB |<-------| RSB |<-------| RSB |
+-----+  Resv  +-----+  Resv  +-----+  Resv  +-----+
Node A         Node B         Node C         Node D
Figure 1: Two Neighbor Nodes Restart
1) A restarting node with downstream delayed restarting node. For
example, in Figure 1, Nodes A and D are normal nodes, Node B is a
restarting node, and Node C is a delayed restarting node.
2) A restarting node with upstream delayed restarting node. For
example, in Figure 1, Nodes A and D are normal nodes, Node B is a
delayed restarting node, and Node C is a restarting node.
3) A restarting node with downstream and upstream delayed restarting
nodes. For example, in Figure 1, Node A is a normal node, Nodes B
and D are delayed restarting nodes, and Node C is a restarting
node.
4) A restarting ingress node with downstream delayed restarting node.
For example, in Figure 1, Node A is a restarting node and Node B
is a delayed restarting node. Nodes C and D are normal nodes.
5) A restarting egress node with upstream delayed restarting node.
For example, in Figure 1, Nodes A and B are normal nodes, Node C
is a delayed restarting node, and Node D is a restarting node.
If the communication between two nodes is interrupted, the upstream
node may think the downstream node is a delayed restarting node, or
vice versa.
Note that if multiple nodes that are not neighbors are restarted, the
restart procedures could be applied as multiple separated restart
procedures that are exactly the same as the procedures described in
[RFC3473] and [RFC5063]. Therefore, these scenarios are not
described in this document. For example, in Figure 1, Node A and
Node C are normal nodes, and Node B and Node D are restarting nodes;
therefore, Node B could be restarted through Node A and Node C, while
Node D could be restarted through Node C separately.
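As a non-normative illustration, the five scenarios can be expressed as a small classification function over the node states along an LSP. The enum, the function name, and the list-based representation of the LSP are assumptions introduced for this sketch only.

```python
from enum import Enum
from typing import List, Optional


class NodeState(Enum):
    NORMAL = "normal"
    RESTARTING = "restarting"
    DELAYED = "delayed restarting"


def classify_scenario(path: List[NodeState], index: int) -> Optional[int]:
    """Return the scenario number (1-5) for the restarting node at
    path[index], or None when no neighbor is a delayed restarting node
    (the single node restart case of [RFC3473] and [RFC5063])."""
    if path[index] is not NodeState.RESTARTING:
        raise ValueError("path[index] must be a restarting node")
    upstream = path[index - 1] if index > 0 else None
    downstream = path[index + 1] if index < len(path) - 1 else None
    up_delayed = upstream is NodeState.DELAYED
    down_delayed = downstream is NodeState.DELAYED
    if upstream is None:                 # restarting ingress node
        return 4 if down_delayed else None
    if downstream is None:               # restarting egress node
        return 5 if up_delayed else None
    if up_delayed and down_delayed:
        return 3
    if down_delayed:
        return 1
    if up_delayed:
        return 2
    return None
```

For the first example of Figure 1 (Nodes A and D normal, Node B restarting, Node C delayed restarting), calling the function with index 1 selects scenario 1.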
4. RSVP State
For each scenario, the RSVP state that needs to be recovered at the
restarting nodes consists of the Path State Block (PSB) and the Resv
State Block (RSB), which are created when the node receives the
corresponding Path message and Resv message.
According to [RFC2209], how to construct the PSB and RSB is really an
implementation issue. In fact, there is no requirement to maintain
separate PSB and RSB data structures. In GMPLS, there is a much
closer tie between Path and Resv state so it is possible to combine
the information into a single state block (the LSP state block). On
the other hand, if point-to-multipoint is supported, it may be
convenient to maintain separate upstream and downstream state. Note
that the PSB and RSB are not upstream and downstream state since the
PSB is responsible for receiving a Path from upstream and sending a
Path to downstream.
Regardless of how the RSVP state is implemented, on recovery there
are two logical pieces of state to be recovered and these correspond
to the PSB and RSB.
5. Procedures for Multiple Node Restart
In this document, all the nodes are assumed to have the graceful
restart capabilities that are described in [RFC3473] and [RFC5063].
5.1. Procedures for the Normal Node
When the downstream normal node detects its neighbor restarting, it
must send a RecoveryPath message for each LSP associated with the
restarting node for which it has previously sent a Resv message and
which has not been torn down.
When the upstream normal node detects its neighbor restarting, it
must send a Path message with a Recovery_Label Object containing a
label value corresponding to the label value received in the most
recently received corresponding Resv message.
This document does not modify the procedures for the normal node,
which are described in [RFC3473] and [RFC5063].
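The two rules for the normal node can be sketched, non-normatively, as a function that lists the messages to send once a neighbor is detected as restarting. The record layout, the function name, and the tuple-based message representation are hypothetical. The case in which no Resv message has yet been received for an LSP is deliberately left out of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LspRecord:
    """Hypothetical LSP state held by a normal (non-restarted) node."""
    lsp_id: str
    upstream_neighbor: Optional[str]     # None at the ingress
    downstream_neighbor: Optional[str]   # None at the egress
    resv_sent_upstream: bool             # a Resv was previously sent upstream
    label_from_resv: Optional[int]       # label in the most recent Resv received
    torn_down: bool = False


def messages_for_restarting_neighbor(
    lsps: List[LspRecord], neighbor: str
) -> List[Tuple[str, str, Optional[int]]]:
    """Return (message, lsp_id, recovery_label) tuples to send to neighbor."""
    out: List[Tuple[str, str, Optional[int]]] = []
    for lsp in lsps:
        if lsp.torn_down:
            continue
        # Acting as the downstream normal node: one RecoveryPath per LSP
        # for which a Resv was previously sent to the restarting node.
        if lsp.upstream_neighbor == neighbor and lsp.resv_sent_upstream:
            out.append(("RecoveryPath", lsp.lsp_id, None))
        # Acting as the upstream normal node: Path with a Recovery_Label
        # taken from the most recently received Resv.
        if lsp.downstream_neighbor == neighbor and lsp.label_from_resv is not None:
            out.append(("Path", lsp.lsp_id, lsp.label_from_resv))
    return out
```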
5.2. Procedures for the Restarting Node
This document does not modify the procedures for the restarting node,
which are described in [RFC3473] and [RFC5063].
5.2.1. Procedures for Scenario 1
After the restarting node restarts, it starts a Recovery Timer. Any
RSVP state that has not been resynchronized when the Recovery Timer
expires should be cleared.
At the restarting node (Node B in the example), full
resynchronization with the upstream neighbor (Node A) is possible
because Node A is a normal node. The upstream Path information is
recovered from the Path message received from Node A. Node B also
recovers the upstream Resv information (that it had previously sent
to Node A) from the Recovery_Label Object carried in the Path message
received from Node A, but, obviously, some information (like the
Recorded_Route Object) will be missing from the new Resv message
generated by Node B and cannot be supplied until the downstream
delayed restarting node (Node C) restarts and sends a Resv.
After the upstream Path information and upstream Resv information
have been recovered by Node B, the normal refresh procedure with
upstream Node A should be started.
As per [RFC5063], the restarting node (Node B) would normally expect
to receive a RecoveryPath message from its downstream neighbor (Node
C). It would use this to recover the downstream Path information,
and would subsequently send a Path message to its downstream neighbor
and receive a Resv message. But in this scenario, because the
downstream neighbor has not restarted yet, Node B detects that the
communication with
Node C is interrupted and must wait before resynchronizing with its
downstream neighbor.
In this case, the restarting node (Node B) follows the procedures in
Section 9.3 of [RFC3473] and may run a Restart Timer to wait for the
downstream neighbor (Node C) to restart. If its downstream neighbor
(Node C) has not restarted before the timer expires, the
corresponding LSPs may be torn down according to local policy
[RFC3473]. Note, however, that the Restart Time value suggested in
[RFC3473] is based on the previous Hello message exchanged with the
node that has not restarted yet (Node C). Since this time value is
unlikely to be available to the restarting node (Node B), a
configured time value must be used if the timer is operated.
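The choice of Restart Timer value described above can be sketched as follows. This is a non-normative illustration; the function and parameter names are hypothetical, and the assumption that a remembered value (if one happens to have survived the restart) is preferred over the configured value is a design choice of the sketch, not a rule from [RFC3473].

```python
from typing import Optional


def restart_timer_value(
    remembered_restart_time: Optional[float],
    configured_restart_time: Optional[float],
    run_timer: bool = True,
) -> Optional[float]:
    """Return the number of seconds to wait for the delayed restarting
    neighbor, or None when local policy does not operate the timer."""
    if not run_timer:
        return None
    # The Restart Time learned from the neighbor's earlier Hello message
    # is usually lost when the local control plane restarts.
    if remembered_restart_time is not None:
        return remembered_restart_time
    if configured_restart_time is None:
        raise ValueError("a configured Restart Timer value is required")
    return configured_restart_time
```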
The RSVP state must be reconciled with the retained data plane state
if the cross-connect information can be retrieved from the data
plane. In the event of any mismatches, local policy will dictate the
action that must be taken, which could include:
- reprogramming the data plane
- sending an alert to the management plane
- tearing down the control plane state for the LSP
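A minimal, non-normative sketch of this reconciliation step follows. It assumes the data plane can be read back as a set of cross-connects and that a single locally configured policy applies to every mismatch. All names are hypothetical.

```python
from enum import Enum
from typing import Dict, Set, Tuple

# (incoming interface, incoming label, outgoing interface, outgoing label)
CrossConnect = Tuple[str, int, str, int]


class MismatchAction(Enum):
    REPROGRAM_DATA_PLANE = "reprogram the data plane"
    ALERT_MANAGEMENT_PLANE = "send an alert to the management plane"
    TEAR_DOWN_CONTROL_STATE = "tear down the control plane state"


def reconcile(
    recovered: Dict[str, CrossConnect],
    data_plane: Set[CrossConnect],
    policy: MismatchAction,
) -> Dict[str, MismatchAction]:
    """Return the action chosen by local policy for each LSP whose
    recovered RSVP state has no matching cross-connect in the data plane."""
    return {
        lsp_id: policy
        for lsp_id, xc in recovered.items()
        if xc not in data_plane
    }
```

An empty result means that every recovered LSP matches a retained cross-connect and no action is needed.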
In the case that the delayed restarting node never comes back and a
Restart Timer is not used to automatically tear down LSPs, the LSPs
can be tidied up through the control plane using a PathTear from the
upstream node (Node A). Note that if Node C restarts after this
operation, the RecoveryPath message that it sends to Node B will not
be matched with any state on Node B and will receive a PathTear as
its response, resulting in the teardown of the LSP at all downstream
nodes.
5.2.2. Procedures for Scenario 2
In this case, the restarting node (Node C) can recover full
downstream state from its downstream neighbor (Node D), which is a
normal node. The downstream Path state can be recovered from the
RecoveryPath message, which is sent by Node D. This allows Node C to
send a Path refresh message to Node D, and Node D will respond with a
Resv message from which Node C can reconstruct the downstream Resv
state.
After the downstream Path information and downstream Resv information
have been recovered in Node C, the normal refresh procedure with
downstream Node D should be started.
The restarting node would normally expect to resynchronize with its
upstream neighbor to re-learn the upstream Path and Resv state, but
in this scenario, because the upstream neighbor (Node B) has not
restarted yet, the restarting node (Node C) detects that the
communication with upstream neighbor (Node B) is interrupted. The
restarting node (Node C) follows the procedures in Section 9.3 of
[RFC3473] and may run a Restart Timer to wait for the upstream
neighbor (Node B) to restart. If its upstream neighbor (Node B) has
not restarted before the Restart Timer expires, the corresponding
LSPs may be torn down according to local policy [RFC3473]. Note,
however, that the Restart Time value suggested in [RFC3473] is based
on the previous Hello message exchanged with the node that has not
restarted yet (Node B). Since this time value is unlikely to be
available to the restarting node (Node C), a configured time value
must be used if the timer is operated.
Note that no Resv message is sent to the upstream neighbor (Node B),
because it has not restarted.
The RSVP state must be reconciled with the retained data plane state
if the cross-connect information can be retrieved from the data
plane.
In the event of any mismatches, local policy will dictate the action
that must be taken, which could include:
- reprogramming the data plane
- sending an alert to the management plane
- tearing down the control plane state for the LSP
In the case that the delayed restarting node never comes back and a
Restart Timer is not used to automatically tear down LSPs, the LSPs
cannot be tidied up through the control plane using a PathTear from
the upstream node (Node A), because there is no control plane
connectivity to Node C from the upstream direction. There are two
possibilities in [RFC3473]:
- Management action may be taken at the restarting node to tear down
the LSP. This will result in the LSP being removed from Node C and a
PathTear being sent downstream to Node D.
- Management action may be taken at any downstream node (for example,
Node D), resulting in a PathErr message with the Path_State_Removed
flag set being sent to Node C to tear down the LSP state.
Note that if Node B restarts after this operation, the Path message
that it sends to Node C will not be matched with any state on Node C
and will be treated as a new Path message, resulting in LSP setup.
Node C should use the labels carried in the Path message (in the
Upstream_Label Object and in the Recovery_Label Object) to drive its
label allocation, but may use other labels according to normal LSP
setup rules.
5.2.3. Procedures for Scenario 3
In this example, the restarting node (Node C) is isolated. Its
upstream and downstream neighbors have not restarted.
The restarting node (Node C) follows the procedures in Section 9.3 of
[RFC3473] and may run a Restart Timer for each of its neighbors
(Nodes B and D). If a neighbor has not restarted before its Restart
Timer expires, the corresponding LSPs may be torn down according to
local policy [RFC3473]. Note, however, that the Restart Time values
suggested in [RFC3473] are based on the previous Hello message
exchanged with the nodes that have not restarted yet. Since these
time values are unlikely to be available to the restarting node (Node
C), a configured time value must be used if the timer is operated.
During the Recovery Time, if the upstream delayed restarting node has
restarted, the procedure for scenario 1 can be applied.
During the Recovery Time, if the downstream delayed restarting node
has restarted, the procedure for scenario 2 can be applied.
In the case that neither delayed restarting node ever comes back and
a Restart Timer is not used to automatically tear down LSPs,
management intervention is required to tidy up the control plane and
the data plane on the node that is waiting for the failed device to
restart.
If the downstream delayed restarting node restarts after the cleanup
of LSPs at Node C, Node C will answer the RecoveryPath message from
Node D with a PathTear message. If the upstream delayed
restarting node restarts after the cleanup of LSPs at Node C, the
Path message from Node B will be treated as a new LSP setup request,
but the setup will fail because Node D cannot be reached; Node C will
respond with a PathErr message. Since this happens to Node B during
its restart processing, it should follow the rules of [RFC5063] and
tear down the LSP.
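The handling of messages that arrive after the LSP state has been cleaned up can be summarized in a short, non-normative sketch. The function name, the string message types, and the returned descriptions are hypothetical. The branch for a reachable downstream neighbor reflects the behavior described for scenario 2.

```python
def respond_to_unmatched(msg_type: str, downstream_reachable: bool) -> str:
    """Describe the response of a node that has no state for the LSP
    named in a received RecoveryPath or Path message."""
    if msg_type == "RecoveryPath":
        # No matching state: the late-restarting downstream node is
        # told to tear down the LSP.
        return "send PathTear"
    if msg_type == "Path":
        # Treated as a new LSP setup request, which can only succeed
        # if the next hop can be reached.
        if downstream_reachable:
            return "process as new LSP setup"
        return "send PathErr"
    raise ValueError("unsupported message type: " + msg_type)
```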
5.2.4. Procedures for Scenario 4
When the ingress node (Node A) restarts, it does not know which LSPs
it caused to be created. Usually, however, this information is
retrieved from the management plane or from the configuration
requests stored in non-volatile form in the node in order to recover
the LSP state.
Furthermore, if the downstream node (Node B) is a normal node,
according to the procedures in [RFC5063], the ingress will receive a
RecoveryPath message and will understand that it was the ingress of
the LSP.
However, in this scenario, the downstream node is a delayed
restarting node, so Node A must either rely on the information from
the management plane or stored configuration, or it must wait for
Node B to restart.
In the event that Node B never restarts, management plane
intervention is needed at Node A to clean up any LSP control plane
state restored from the management plane or from local configuration,
and to release any data plane resources.
5.2.5. Procedures for Scenario 5
In this scenario, the egress node (Node D) restarts, and its upstream
neighbor (Node C) has not restarted. In this case, the egress node
may have no control plane state relating to the LSPs. It has no
downstream neighbor to help it and no management plane or
configuration information, although there will be data plane state
for the LSP. The egress node must simply wait until its upstream
neighbor restarts and gives it the information in Path messages
carrying Recovery_Label Objects.
5.3. Consideration of the Reuse of Data Plane Resources
Fundamental to the processes described above is an understanding that
data plane resources may remain in use (allocated and cross-
connected) when control plane state has not been fully resynchronized
because some control plane nodes have not restarted.
It is assumed that these data plane resources might be carrying
traffic and should not be reconfigured except through application of
operator-configured policy, or as a direct result of operator action.
In particular, new LSP setup requests from the control plane or the
management plane should not be allowed to use data plane resources
that are still in use. Specific action must first be taken to
release the resources.
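The rule that new LSP setup requests must not reuse retained resources can be illustrated with the following non-normative sketch of a label pool. The class and method names are hypothetical, and labels stand in for any data plane resource.

```python
from typing import Iterable, Set


class ResourceInUse(Exception):
    """Raised when a new LSP asks for a resource that is still in use."""


class LabelPool:
    def __init__(self, retained_labels: Iterable[int]) -> None:
        # Labels still cross-connected in the data plane after the
        # restart, whether or not their control plane state is recovered.
        self._in_use: Set[int] = set(retained_labels)

    def allocate_for_new_lsp(self, label: int) -> None:
        """Reserve a label for a new LSP setup request."""
        if label in self._in_use:
            raise ResourceInUse("label %d is still in use" % label)
        self._in_use.add(label)

    def release(self, label: int) -> None:
        """Explicit release, for example by policy or operator action."""
        self._in_use.discard(label)
```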
5.4. Consideration of Management Plane Intervention
The management plane must always retain the ability to control data
plane resources and to override the control plane. In this context,
the management plane must always be able to release data plane
resources that were previously in place for use by control-plane-
established LSPs. Further, the management plane must always be able
to instruct any control plane node to tear down any LSP.
Operators should be aware of the risks of misconnection that could be
caused by careless manipulation from the management plane of in-use
data plane resources.
6. Clarification of Restarting Node Procedure
According to the current graceful restart procedure [RFC3473], after
a node restarts its control plane, it needs its upstream node to send
a Path message with a recovery label in order to synchronize its RSVP
state. If the restarted control plane becomes operational quickly,
the upstream node may not detect the restarting of the downstream
node and, therefore, may send a Path message without a recovery
label, causing errors and unwanted connection deletion.
N1               N2
|                |
|                X (Restart start)
|     HELLO      |
|--------------->|
|                |
|    SRefresh    |
|--------------->|
|                |
|     HELLO      |
|--------------->|
|                |
|                X (Restart complete)
|    SRefresh    |
|--------------->|
|      NACK      |
|<---------------|
|  Path without  |
| recovery label |
|--------------->|
|                X (resource allocation failed because the
|                |  resources are in use)
|    PathErr     |
|<---------------|
|    PathTear    |
|--------------->|
X (LSP deletion) X (LSP deletion)
|                |
Figure 2: Message Flow for Accidental LSP Deletion
The sequence diagram above depicts one scenario where the LSP may get
deleted.
In this sequence, N1 does not detect Hello failure and continues
sending SRefreshes, which may get NACK'ed by N2 once restart
completes because there is no Path state corresponding to the
SRefresh message. This NACK causes a Path refresh message to be
generated, but it carries no Recovery_Label Object because N1 has not
yet detected that N2 has restarted, as Hello exchanges have not yet
started. The Path message is treated as "new" and fails to allocate
the resources because they are still in use. This causes a PathErr
message to be generated, which may lead to the teardown of the LSP.
To resolve the aforementioned problem, the following procedures,
which are implicit in [RFC3473] and [RFC5063], should be followed.
These procedures work together with the recovery procedures
documented in [RFC3473]. Here, it is assumed that the restarting
node and the neighboring node(s) support the Hello extension as
documented in [RFC3209] as well as the recovery procedures documented
in [RFC3473].
After a node restarts its control plane, it should silently drop all
RSVP-TE messages (except Hello messages) that it receives from any
neighbor with which no Hello session has been established.

The restarting node should follow [RFC3209] to establish Hello
sessions with its neighbors once its control plane becomes
operational.

The restarting node resumes processing RSVP-TE messages from each
neighbor once the Hello session with that neighbor has been
established.
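
The three rules above amount to a per-neighbor gate on message
processing. A minimal sketch follows, assuming a hypothetical
RestartingNode class and string message types; neither comes from
[RFC3209] or [RFC3473], and the Hello handshake itself is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class RestartingNode:
    """Hypothetical model of a node whose control plane has restarted."""

    # Neighbors with which a Hello session has been established.
    hello_sessions: set = field(default_factory=set)
    # (neighbor, message type) pairs that were accepted for processing.
    processed: list = field(default_factory=list)

    def hello_session_established(self, neighbor: str) -> None:
        """Record that the Hello session with this neighbor is up."""
        self.hello_sessions.add(neighbor)

    def receive(self, neighbor: str, msg_type: str) -> bool:
        """Return True if the message is processed, False if dropped."""
        if msg_type == "Hello":
            # Hello messages are always handled so a session can form.
            return True
        if neighbor not in self.hello_sessions:
            # Silently dropped: no NACK or PathErr is sent in reply.
            return False
        self.processed.append((neighbor, msg_type))
        return True


n2 = RestartingNode()
dropped = n2.receive("N1", "SRefresh")    # no Hello session yet
hello_ok = n2.receive("N1", "Hello")      # Hello is never dropped
n2.hello_session_established("N1")
accepted = n2.receive("N1", "Path")       # processing has resumed
```

Because the early SRefresh is dropped rather than NACK'ed, the
upstream node is not prompted to send a Path message before it has
learned of the restart through the Hello exchange.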
7. Security Considerations
This document clarifies the procedures defined in [RFC3473] and
[RFC5063] to be performed on RSVP agents that neighbor one or more
restarting RSVP agents. It does not introduce any new procedures
and, therefore, does not introduce any new security risks or issues.
In the case of the control plane in general, and the RSVP agent in
particular, where one or more nodes carrying one or more LSPs are
restarted due to external attacks, the procedures defined in
[RFC5063] and described in this document provide the ability for the
restarting RSVP agents to recover the RSVP state in each restarting
node corresponding to the LSPs, with the least possible perturbation
to the rest of the network. These procedures can be considered to
provide mechanisms by which the GMPLS network can recover from
physical attacks or from attacks on remotely controlled power
supplies.
The procedures described are such that only the neighboring RSVP
agents should notice the restart of a node, and hence only they need
to perform additional processing. This allows for a network with
active LSPs to recover LSP state gracefully from an external attack,
without perturbing the data/forwarding plane state and without
propagating the error condition in the control or data plane. In
other words, the effect of the restart (which might be the result of
an attack) does not spread into the network.
Note that concern has been expressed about the vulnerability of a
restarting node to false messages received from its neighbors. For
example, a restarting node might receive a false Path message with a
Recovery_Label Object from an upstream neighbor, or a false
RecoveryPath message from its downstream neighbor. This situation
might arise in one of four cases:
- The message is spoofed and does not come from the neighbor at all.
- The message has been modified as it was traveling from the
neighbor.
- The neighbor is defective and has generated a message in error.
- The neighbor has been subverted and has a "rogue" RSVP agent.
The first two cases may be handled using standard RSVP authentication
and integrity procedures [RFC3209], [RFC3473]. If the operator is
particularly worried, the control plane may be operated using IPsec
[RFC4301], [RFC4302], [RFC4835], [RFC4306], and [RFC2411].
Protection against defective or rogue RSVP implementations is
generally difficult, if not impossible. Neighbor-to-neighbor
authentication and integrity validation are, by definition,
ineffective in these situations. For example, if a neighbor node
sends a Resv during
normal LSP setup, and if that message carries a Generalized_Label
Object carrying an incorrect label value, then the receiving LSR will
use the supplied value and the LSP will be set up incorrectly.
Alternatively, if a Path message is modified by an upstream LSR to
change the destination and explicit route, there is no way for the
downstream LSR to detect this, and the LSP may be set up to the wrong
destination. Furthermore, the upstream LSR could disguise this fact
by modifying the recorded route reported in the Resv message. Thus,
these issues are in no way specific to the restart case, do not cause
any greater or different problems from the normal case, and do not
warrant specific security measures applicable to restart scenarios.
Note that the RSVP Policy_Data Object [RFC2205] provides a scope by
which secure end-to-end checks could be applied. However, the use of
this object has been only sparsely defined to date.
See [MPLS-SEC] for a wider discussion of security in MPLS and GMPLS
networks.
8. Acknowledgments
We would like to thank Adrian Farrel, Dimitri Papadimitriou, and Lou
Berger for their useful comments.
9. References
9.1. Normative References
[RFC2209] Braden, R. and L. Zhang, "Resource ReSerVation Protocol
(RSVP) -- Version 1 Message Processing Rules", RFC 2209,
September 1997.
[RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
Tunnels", RFC 3209, December 2001.
[RFC3473] Berger, L., Ed., "Generalized Multi-Protocol Label
Switching (GMPLS) Signaling Resource ReserVation
Protocol-Traffic Engineering (RSVP-TE) Extensions", RFC
3473, January 2003.
[RFC5063] Satyanarayana, A., Ed., and R. Rahman, Ed., "Extensions to
GMPLS Resource Reservation Protocol (RSVP) Graceful
Restart", RFC 5063, October 2007.
9.2. Informative References
[MPLS-SEC] Fang, L., "Security Framework for MPLS and GMPLS
Networks", Work in Progress, November 2008.
[RFC2205] Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S.
Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
Functional Specification", RFC 2205, September 1997.
[RFC2411] Thayer, R., Doraswamy, N., and R. Glenn, "IP Security
Document Roadmap", RFC 2411, November 1998.
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the
Internet Protocol", RFC 4301, December 2005.
[RFC4302] Kent, S., "IP Authentication Header", RFC 4302, December
2005.
[RFC4306] Kaufman, C., Ed., "Internet Key Exchange (IKEv2)
Protocol", RFC 4306, December 2005.
[RFC4835] Manral, V., "Cryptographic Algorithm Implementation
Requirements for Encapsulating Security Payload (ESP) and
Authentication Header (AH)", RFC 4835, April 2007.
Authors' Addresses

Dan Li
Huawei Technologies
F3-5-B R&D Center, Huawei Base,
Shenzhen 518129, China
Phone: +86 755 28970230
EMail: danli@huawei.com

Jianhua Gao
Huawei Technologies
F3-5-B R&D Center, Huawei Base,
Shenzhen 518129, China
Phone: +86 755 28972902
EMail: gjhhit@huawei.com

Arun Satyanarayana
Cisco Systems
170 West Tasman Dr
San Jose, CA 95134, USA
Phone: +1 408 853-3206
EMail: asatyana@cisco.com

Snigdho C. Bardalai
Fujitsu Network Communications
2801 Telecom Parkway
Richardson, Texas 75082, USA
Phone: +1 972 479 2951
EMail: snigdho.bardalai@us.fujitsu.com