RFC: 816
FAULT ISOLATION AND RECOVERY
David D. Clark
MIT Laboratory for Computer Science
Computer Systems and Communications Group
July, 1982
1. Introduction
Occasionally, a network or a gateway will go down, and the sequence
of hops which the packet takes from source to destination must change.
Fault isolation is that action which hosts and gateways collectively
take to determine that something is wrong; fault recovery is the
identification and selection of an alternative route which will serve to
reconnect the source to the destination. In fact, the gateways perform
most of the functions of fault isolation and recovery. There are,
however, a few actions which hosts must take if they wish to provide a
reasonable level of service. This document describes the portion of
fault isolation and recovery which is the responsibility of the host.
2. What Gateways Do
Gateways collectively implement an algorithm which identifies the
best route between all pairs of networks. They do this by exchanging
packets which contain each gateway's latest opinion about the
operational status of its neighbor networks and gateways. Assuming that
this algorithm is operating properly, one can expect the gateways to go
through a period of confusion immediately after some network or gateway
has failed, but one can assume that once a period of negotiation has
passed, the gateways are equipped with a consistent and correct model of
the connectivity of the internet. At present this period of negotiation
may actually take several minutes, and many TCP implementations time out
within that period, but it is a design goal of the eventual algorithm
that the gateway should be able to reconstruct the topology quickly
enough that a TCP connection should be able to survive a failure of the
route.
3. Host Algorithm for Fault Recovery
Since the gateways always attempt to have a consistent and correct
model of the internetwork topology, the host strategy for fault recovery
is very simple. Whenever the host feels that something is wrong, it
asks the gateway for advice, and, assuming the advice is forthcoming, it
believes the advice completely. The advice will be wrong only during
the transient period of negotiation, which immediately follows an
outage, but will otherwise be reliably correct.
In fact, it is never necessary for a host to explicitly ask a
gateway for advice, because the gateway will provide it as appropriate.
When a host sends a datagram to some distant net, the host should be
prepared to receive back either of two advisory messages which the
gateway may send. The ICMP "redirect" message indicates that the
gateway to which the host sent the datagram is no longer the best
gateway to reach the net in question. The gateway will have forwarded
the datagram, but the host should revise its routing table to have a
different immediate address for this net. The ICMP "destination
unreachable" message indicates that as a result of an outage, it is
currently impossible to reach the addressed net or host in any manner.
On receipt of this message, a host can either abandon the connection
immediately without any further retransmission, or resend slowly to see
if the fault is corrected in reasonable time.
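The handling of these two advisory messages can be sketched as follows. This is a minimal illustration only, assuming a hypothetical `HostRoutes` table keyed by destination net; the class, its method names, and the gateway and net labels are invented for the example and are not part of any specification.

```python
from dataclasses import dataclass, field


@dataclass
class HostRoutes:
    """Hypothetical host routing table: destination net -> first-hop gateway."""
    default_gateway: str
    by_net: dict = field(default_factory=dict)
    unreachable: set = field(default_factory=set)

    def next_hop(self, net):
        # Fall back to the default gateway when no advice has been received.
        return self.by_net.get(net, self.default_gateway)

    def on_redirect(self, net, better_gateway):
        # The gateway has already forwarded the datagram, so nothing is
        # resent; only the choice of gateway for future traffic changes.
        self.by_net[net] = better_gateway
        self.unreachable.discard(net)

    def on_destination_unreachable(self, net):
        # Only record the advice. Whether to abandon the connection or to
        # resend slowly is left to the caller.
        self.unreachable.add(net)


routes = HostRoutes(default_gateway="gw-a")
routes.on_redirect("net-10", "gw-b")
routes.on_destination_unreachable("net-20")
```

The sketch deliberately leaves the abandon-or-retry decision outside the table, since that choice belongs to the layer that owns the connection.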
If a host could assume that these two ICMP messages would always
arrive when something was amiss in the network, then no other action on
the part of the host would be required in order to maintain its tables in
an optimal condition. Unfortunately, there are two circumstances under
which the messages will not arrive properly. First, during the
transient following a failure, error messages may arrive that do not
correctly represent the state of the world. Thus, hosts must take an
isolated error message with some scepticism. (This transient period is
discussed more fully below.) Second, if the host has been sending
datagrams to a particular gateway, and that gateway itself crashes, then
all the other gateways in the internet will reconstruct the topology,
but the gateway in question will still be down, and therefore cannot
provide any advice back to the host. As long as the host continues to
direct datagrams at this dead gateway, the datagrams will simply vanish
off the face of the earth, and nothing will come back in return. Hosts
must detect this failure.
If some gateway many hops away fails, this is not of concern to the
host, for then the discovery of the failure is the responsibility of the
immediate neighbor gateways, which will perform this action in a manner
invisible to the host. The problem only arises if the very first
gateway, the one to which the host is immediately sending the datagrams,
fails. We thus identify one single task which the host must perform as
its part of fault isolation in the internet: the host must use some
strategy to detect that a gateway to which it is sending datagrams is
dead.
Let us assume for the moment that the host implements some
algorithm to detect failed gateways; we will return later to discuss
what this algorithm might be. First, let us consider what the host
should do when it has determined that a gateway is down. In fact, with
the exception of one small problem, the action the host should take is
extremely simple. The host should select some other gateway, and try
sending the datagram to it. Assuming that gateway is up, this will
either produce correct results, or some ICMP advice. Since we assume
that, ignoring temporary periods immediately following an outage, any
gateway is capable of giving correct advice, once the host has received
advice from any gateway, that host is in as good a condition as it can
hope to be.
There is always the unpleasant possibility that when the host tries
a different gateway, that gateway too will be down. Therefore, whatever
algorithm the host uses to detect a dead gateway must continuously be
applied, as the host tries every gateway in turn that it knows about.
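The loop described above might look like the following sketch. The `send` and `looks_dead` callables are hypothetical stand-ins supplied by the caller; the second represents whichever dead-gateway detection algorithm the host has chosen.

```python
def send_via_first_live_gateway(datagram, gateways, send, looks_dead):
    """Try each known gateway in turn; return the one used, or None.

    `send(gateway, datagram)` and `looks_dead(gateway)` are hypothetical
    callables provided by the surrounding host software.
    """
    for gateway in gateways:
        if looks_dead(gateway):
            continue  # the detection test is re-applied to every candidate
        send(gateway, datagram)
        return gateway
    return None  # every known gateway appears to be down


sent = []
used = send_via_first_live_gateway(
    b"payload",
    ["gw-a", "gw-b", "gw-c"],
    send=lambda gw, d: sent.append((gw, d)),
    looks_dead=lambda gw: gw == "gw-a",
)
```

Here `used` is the second gateway in the list, because the first is reported dead by the stand-in predicate.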
The only difficult part of this algorithm is to specify the means
by which the host maintains the table of all of the gateways to which it
has immediate access. Currently, the specification of the internet
protocol does not architect any message by which a host can ask to be
supplied with such a table. The reason is that different networks may
provide very different mechanisms by which this table can be filled in.
For example, if the net is a broadcast net, such as an ethernet or a
ringnet, every gateway may simply broadcast such a table from time to
time, and the host need do nothing but listen to obtain the required
information. Alternatively, the network may provide the mechanism of
logical addressing, by which a whole set of machines can be provided
with a single group address, to which a request can be sent for
assistance. Failing those two schemes, the host can build up its table
of neighbor gateways by remembering all the gateways from which it has
ever received a message. Finally, in certain cases, it may be necessary
for this table, or at least the initial entries in the table, to be
constructed manually by a manager or operator at the site. In cases
where the network in question provides absolutely no support for this
kind of host query, at least some manual intervention will be required
to get started, so that the host can find out about at least one
gateway.
4. Host Algorithms for Fault Isolation
We now return to the question raised above. What strategy should
the host use to detect that it is talking to a dead gateway, so that it
can know to switch to some other gateway in the list? In fact, there are
several algorithms which can be used. All are reasonably simple to
implement, but they have very different implications for the overhead on
the host, the gateway, and the network. Thus, to a certain extent, the
algorithm picked must depend on the details of the network and of the
host.
1. NETWORK LEVEL DETECTION
Many networks, particularly the Arpanet, perform precisely the
required function internal to the network. If a host sends a datagram
to a dead gateway on the Arpanet, the network will return a "host dead"
message, which is precisely the information the host needs to know in
order to switch to another gateway. Some early implementations of
Internet on the Arpanet threw these messages away. That is an
exceedingly poor idea.
2. CONTINUOUS POLLING
The ICMP protocol provides an echo mechanism by which a host may
solicit a response from a gateway. A host could simply send this
message at a reasonable rate, to assure itself continuously that the
gateway was still up. This works, but, since the message must be sent
fairly often to detect a fault in a reasonable time, it can imply an
unbearable overhead on the host itself, the network, and the gateway.
This strategy is prohibited except where a specific analysis has
indicated that the overhead is tolerable.
3. TRIGGERED POLLING
If the use of polling could be restricted to only those times when
something seemed to be wrong, then the overhead would be bearable.
Provided that one can get the proper advice from one's higher level
protocols, it is possible to implement such a strategy. For example,
one could program the TCP level so that whenever it retransmitted a
segment more than once, it sent a hint down to the IP layer which
triggered polling. This strategy does not have excessive overhead, but
does have the problem that the host may be somewhat slow to respond to
an error, since only after polling has started will the host be able to
confirm that something has gone wrong, and by then the TCP above may
have already timed out.
Both forms of polling suffer from a minor flaw. Hosts as well as
gateways respond to ICMP echo messages. Thus, polling cannot be used to
detect the error that a foreign address thought to be a gateway is
actually a host. Such a confusion can arise if the physical addresses
of machines are rearranged.
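Triggered polling, as described above, can be sketched as follows. This is an illustration under stated assumptions: `echo` is a hypothetical callable that sends an ICMP echo and reports whether a reply came back, and the threshold of two retransmissions is one reading of "more than once".

```python
class TriggeredPoller:
    """Poll the gateway only after a higher layer reports trouble."""

    def __init__(self, echo, retransmit_threshold=2):
        # Hypothetical callable: echo(gateway) -> True if a reply came back.
        self.echo = echo
        self.retransmit_threshold = retransmit_threshold

    def on_retransmit(self, gateway, count):
        """Hint from the transport layer: `count` retransmissions so far.

        Returns None if no poll was made, otherwise the echo result.
        """
        if count < self.retransmit_threshold:
            return None
        return self.echo(gateway)


polls = []


def fake_echo(gateway):
    # Stand-in for a real echo exchange; always reports "no reply".
    polls.append(gateway)
    return False


poller = TriggeredPoller(fake_echo)
first = poller.on_retransmit("gw-a", 1)
second = poller.on_retransmit("gw-a", 2)
```

A successful echo shows only that something answers at that address; as noted above, it does not show that the machine is a gateway.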
4. TRIGGERED RESELECTION
There is a strategy which makes use of a hint from a higher level,
as did the previous strategy, but which avoids polling altogether.
Whenever a higher level complains that the service seems to be
defective, the Internet layer can pick the next gateway from the list of
available gateways, and switch to it. Assuming that this gateway is up,
no real harm can come of this decision, even if it was wrong, for the
worst that will happen is a redirect message which instructs the host to
return to the gateway originally being used. If, on the other hand, the
original gateway was indeed down, then this immediately provides a new
route, so the period of time until recovery is shortened. This last
strategy seems particularly clever, and is probably the most generally
suitable for those cases where the network itself does not provide fault
isolation. (Regrettably, I have forgotten who suggested this idea to me.
It is not my invention.)
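A minimal sketch of triggered reselection follows. It assumes a single route and a hypothetical `GatewaySelector` class; a real host would presumably keep one such choice per destination net.

```python
class GatewaySelector:
    """Switch gateways on a complaint from above; believe redirects."""

    def __init__(self, gateways):
        if not gateways:
            raise ValueError("at least one gateway is required")
        self.gateways = list(gateways)
        self.index = 0

    @property
    def current(self):
        return self.gateways[self.index]

    def on_complaint(self):
        # A higher level reports defective service: move to the next
        # gateway in the list without polling the current one.
        self.index = (self.index + 1) % len(self.gateways)
        return self.current

    def on_redirect(self, gateway):
        # The new gateway may send us straight back to the original one,
        # which is the worst outcome of a wrong guess.
        if gateway not in self.gateways:
            self.gateways.append(gateway)
        self.index = self.gateways.index(gateway)
        return self.current


selector = GatewaySelector(["gw-a", "gw-b"])
after_complaint = selector.on_complaint()
after_redirect = selector.on_redirect("gw-a")
```

In this run the complaint turns out to be a false alarm: the host switches to the second gateway and is redirected back to the first, at the cost of one redirect message.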
5. Higher Level Fault Detection
The previous discussion has concentrated on fault detection and
recovery at the IP layer. This section considers what the higher layers
such as TCP should do.
TCP has a single fault recovery action; it repeatedly retransmits a
segment until either it gets an acknowledgement or its connection timer
expires. As discussed above, it may use retransmission as an event to
trigger a request for fault recovery to the IP layer. In the other
direction, information may flow up from IP, reporting such things as
ICMP Destination Unreachable or error messages from the attached
network. The only subtle question about TCP and faults is what TCP
should do when such an error message arrives or its connection timer
expires.
The TCP specification discusses the timer. In the description of
the open call, the timeout is described as an optional value that the
client of TCP may specify; if any segment remains unacknowledged for
this period, TCP should abort the connection. The default for the
timeout is 30 seconds. Early TCPs were often implemented with a fixed
timeout interval, but this did not work well in practice, as the
following discussion may suggest.
Clients of TCP can be divided into two classes: those running on
immediate behalf of a human, such as Telnet, and those supporting a
program, such as a mail sender. Humans require a sophisticated response
to errors. Depending on exactly what went wrong, they may want to
abandon the connection at once, or wait for a long time to see if things
get better. Programs do not have this human impatience, but also lack
the power to make complex decisions based on details of the exact error
condition. For them, a simple timeout is reasonable.
Based on these considerations, at least two modes of operation are
needed in TCP. One, for programs, abandons the connection without
exception if the TCP timer expires. The other mode, suitable for
people, never abandons the connection on its own initiative, but reports
to the layer above when the timer expires. Thus, the human user can see
error messages coming from all the relevant layers, TCP and ICMP, and
can request TCP to abort as appropriate. This second mode requires that
TCP be able to send an asynchronous message up to its client to report
the timeout, and it requires that error messages arriving at lower
layers similarly flow up through TCP.
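The two modes can be illustrated with the following sketch. The `Connection` class, the `TimeoutMode` names, and the `notify` upcall are hypothetical; they stand in for whatever asynchronous reporting mechanism a given TCP implementation offers its client.

```python
from enum import Enum


class TimeoutMode(Enum):
    ABORT = "abort"    # for programs: give up when the timer expires
    REPORT = "report"  # for people: tell the client and keep going


class Connection:
    def __init__(self, mode, notify):
        self.mode = mode
        self.notify = notify  # hypothetical upcall to the client
        self.is_open = True

    def on_timer_expired(self):
        if self.mode is TimeoutMode.ABORT:
            self.is_open = False
            self.notify("aborted: timeout")
        else:
            # Never abandon on our own initiative; just report.
            self.notify("timeout")

    def on_lower_layer_error(self, text):
        # Error messages from lower layers flow up through TCP unchanged.
        self.notify(text)

    def abort(self):
        # Explicit request from the client, e.g. an impatient human.
        self.is_open = False


events = []
telnet = Connection(TimeoutMode.REPORT, events.append)
telnet.on_lower_layer_error("ICMP destination unreachable")
telnet.on_timer_expired()
mailer = Connection(TimeoutMode.ABORT, events.append)
mailer.on_timer_expired()
```

After this run the Telnet-style connection is still open and its user has seen both the ICMP advice and the timeout, while the mailer-style connection has been closed.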
At levels above TCP, fault detection is also required. Either of
the following can happen. First, the foreign client of TCP can fail,
even though TCP is still running, so data is still acknowledged and the
timer never expires. Alternatively, the communication path can fail,
without the TCP timer going off, because the local client has no data to
send. Both of these have caused trouble.
Sending mail provides an example of the first case. When sending
mail using SMTP, there is an SMTP level acknowledgement that is returned
when a piece of mail is successfully delivered. Several early mail
receiving programs would crash just at the point where they had received
all of the mail text (so TCP did not detect a timeout due to outstanding
unacknowledged data) but before the mail was acknowledged at the SMTP
level. This failure would cause early mail senders to wait forever for
the SMTP level acknowledgement. The obvious cure was to set a timer at
the SMTP level, but the first attempt to do this did not work, for there
was no simple way to select the timer interval. If the interval
selected was short, it expired in normal operation when sending a
large file to a slow host. An interval of many minutes was needed to
prevent false timeouts, but that meant that failures were detected only
very slowly. The current solution in several mailers is to pick a
timeout interval proportional to the size of the message.
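A size-proportional timeout might be computed as in the sketch below. The rate of one second per kilobyte and the 60 second floor are invented for illustration; the floor in particular is an added assumption, since a strictly proportional interval would approach zero for very small messages.

```python
def smtp_ack_timeout(message_bytes, seconds_per_kilobyte=1.0, floor_seconds=60.0):
    """Seconds to wait for the SMTP level acknowledgement.

    Both constants are hypothetical and would need tuning for a real mailer.
    """
    if message_bytes < 0:
        raise ValueError("message size cannot be negative")
    proportional = seconds_per_kilobyte * message_bytes / 1024
    return max(floor_seconds, proportional)


small = smtp_ack_timeout(1024)        # 1 KB message: the floor applies
large = smtp_ack_timeout(600 * 1024)  # 600 KB message: scales with size
```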
Server telnet provides an example of the other kind of failure. It
can easily happen that the communications link fails while there is
no traffic flowing, perhaps because the user is thinking. Eventually,
the user will attempt to type something, at which time he will discover
that the connection is dead and abort it. But the host end of the
connection, having nothing to send, will not discover anything wrong,
and will remain waiting forever. In some systems there is no way for a
user in a different process to destroy or take over such a hanging
process, so there is no way to recover.
One solution to this would be to have the host server telnet query
the user end now and then, to see if it is still up. (Telnet does not
have an explicit query feature, but the host could negotiate some
unimportant option, which should produce either agreement or
disagreement in return.) The only problem with this is that a
reasonable sample interval, if applied to every user on a large system,
can generate an unacceptable amount of traffic and system overhead. A
smart server telnet would use this query only when something seems
wrong, perhaps when there had been no user activity for some time.
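The decision a smart server telnet makes could be sketched as below. The function name, the time units, and both thresholds are hypothetical; the sketch shows only that the query is made when the user has been idle for some time, and then no more often than a chosen interval.

```python
def should_probe(now, last_user_activity, last_probe, idle_threshold, probe_interval):
    """Decide whether server telnet should query the user end.

    All arguments are times or durations in the same units (say, seconds).
    `last_probe` is None if no query has been sent yet.
    """
    if now - last_user_activity < idle_threshold:
        return False  # the user is active, so nothing seems wrong
    if last_probe is None:
        return True
    return now - last_probe >= probe_interval


active_user = should_probe(100, 95, None, idle_threshold=60, probe_interval=30)
idle_user = should_probe(200, 100, None, idle_threshold=60, probe_interval=30)
just_probed = should_probe(200, 100, 190, idle_threshold=60, probe_interval=30)
```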
In both these cases, the general conclusion is that client level
error detection is needed, and that the details of the mechanism are
very dependent on the application. Application programmers must be made
aware of the problem of failures, and must understand that error
detection at the TCP or lower level cannot solve the whole problem for
them.
6. Knowing When to Give Up
It is not obvious, when error messages such as ICMP Destination
Unreachable arrive, whether TCP should abandon the connection. The
reason that error messages are difficult to interpret is that, as
discussed above, after a failure of a gateway or network, there is a
transient period during which the gateways may have incorrect
information, so that irrelevant or incorrect error messages may
sometimes return. An isolated ICMP Destination Unreachable may arrive
at a host, for example, if a packet is sent during the period when the
gateways are trying to find a new route. To abandon a TCP connection
based on such a message arriving would be to ignore the valuable feature
of the Internet that for many internal failures it reconstructs its
function without any disruption of the end points.
But if failure messages do not imply a failure, what are they for?
In fact, error messages serve several important purposes. First, if
they arrive in response to opening a new connection, they probably are
caused by opening the connection improperly (e.g., to a non-existent
address) rather than by a transient network failure. Second, they
provide valuable information, after the TCP timeout has occurred, as to
the probable cause of the failure. Finally, certain messages, such as
ICMP Parameter Problem, imply a possible implementation problem. In
general, error messages give valuable information about what went wrong,
but are not to be taken as absolutely reliable. A general alerting
mechanism, such as the TCP timeout discussed above, provides a good
indication that whatever is wrong is a serious condition, but without
the advisory messages to augment the timer, there is no way for the
client to know how to respond to the error. The combination of the
timer and the advice from the error messages provides a reasonable set of
facts for the client layer to have. It is important that error messages
from all layers be passed up to the client module in a useful and
consistent way.
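The policy of this section can be summarized in a small sketch. The state names and the returned action labels are hypothetical; the point is only that the same error message is treated differently depending on whether the connection is being opened, is established, or has already timed out.

```python
def react_to_error_message(state, timer_expired):
    """Choose what to do with an advisory error message.

    `state` is "opening" or "established" (hypothetical labels).
    """
    if state not in ("opening", "established"):
        raise ValueError("unknown connection state: %r" % (state,))
    if state == "opening":
        # Probably an improper open, such as a non-existent address.
        return "fail-open"
    if timer_expired:
        # The timer says something is seriously wrong; the message
        # suggests the probable cause.
        return "report-probable-cause"
    # Possibly a transient during gateway negotiation: do not abandon.
    return "record-and-continue"


during_open = react_to_error_message("opening", timer_expired=False)
mid_transfer = react_to_error_message("established", timer_expired=False)
after_timeout = react_to_error_message("established", timer_expired=True)
```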