summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc789.txt
blob: 6acde4a93ddb74680f4d89e9adacdc1728ac7335 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
                                                          RFC 789

















    Vulnerabilities of Network Control Protocols: An Example



                          Eric C. Rosen


                  Bolt Beranek and Newman Inc.
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

     This paper has appeared in the January 1981 edition  of  the

SIGSOFT  Software  Engineering Notes, and will soon appear in the

SIGCOMM Computer Communications Review.  It is  being  circulated

as  an  RFC because it is thought that it may be of interest to a

wider audience, particularly to the internet community.  It is  a

case  study  of  a  particular  kind of problem that can arise in

large distributed systems,  and  of  the  approach  used  in  the

ARPANET to deal with one such problem.


     On  October 27, 1980, there was an unusual occurrence on the

ARPANET.  For a period of several hours, the network appeared  to

be  unusable,  due to what was later diagnosed as a high priority

software  process   running   out   of   control.    Network-wide

disturbances  are  extremely  unusual  in  the  ARPANET (none has

occurred in several years), and as a  result,  many  people  have

expressed  interest  in  learning more about the etiology of this

particular incident.  The purpose of this note is to explain what

the symptoms of the problem  were,  what  the  underlying  causes

were,  and  what  lessons  can  be  drawn.   As we shall see, the

immediate cause of the problem was  a  rather  freakish  hardware

malfunction  (which is not likely to recur) which caused a faulty

sequence of network control packets to be generated.  This faulty

sequence of control packets in turn affected the apportionment of

software resources in the IMPs, causing one of the IMP  processes

to  use  an  excessive  amount  of resources, to the detriment of

other  IMP  processes.   Restoring  the  network  to  operational


                              - 1 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

condition  was  a  relatively straightforward task.  There was no

damage other than the outage itself,  and  no  residual  problems

once  the  network  was  restored.   Nevertheless,  it  is  quite

interesting to see the way  in  which  unusual  (indeed,  unique)

circumstances  can  bring  out vulnerabilities in network control

protocols, and that shall be the focus of this paper.


     The problem began suddenly when  we  discovered  that,  with

very few exceptions, no IMP was able to communicate reliably with

any other IMP.  Attempts to go from a TIP to a host on some other

IMP   only   brought  forth  the  "net  trouble"  error  message,

indicating that no physical path  existed  between  the  pair  of

IMPs.   Connections  which already existed were summarily broken.

A flood of phone calls to the Network Control Center  (NCC)  from

all  around  the  country  indicated  that  the  problem  was not

localized, but rather seemed to be affecting virtually every IMP.


     As a first step towards trying to find out what the state of

the network actually was, we dialed up a number  of  TIPs  around

the  country.  What we generally found was that the TIPs were up,

but  that  their  lines  were  down.   That  is,  the  TIPs  were

communicating  properly  with the user over the dial-up line, but

no connections to other IMPs were possible.


     We tried manually restarting a number of IMPs which  are  in

our own building (after taking dumps, of course).  This procedure

initializes  all  of  the IMPs' dynamic data structures, and will


                              - 2 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

often clear up problems which arise when, as sometimes happens in

most complex software systems, the IMPs'  software  gets  into  a

"funny"  state.   The IMPs which were restarted worked well until

they were connected to the rest of  the  net,  after  which  they

exhibited  the same complex of symptoms as the IMPs which had not

been restarted.


     From the facts so far presented, we  were  able  to  draw  a

number  of  conclusions.   Any  problem  which  affects  all IMPs

throughout the network is usually a routing problem.   Restarting

an  IMP  re-initializes  the routing data structures, so the fact

that restarting an IMP did not alleviate the problem in that  IMP

suggested  that  the problem was due to one or more "bad" routing

updates circulating in the network.  IMPs  which  were  restarted

would  just receive the bad updates from those of their neighbors

which were not restarted.  The fact that IMPs  seemed  unable  to

keep  their lines up was also a significant clue as to the nature

of the problem.  Each  pair  of  neighboring  IMPs  runs  a  line

up/down protocol to determine whether the line connecting them is

of  sufficient  quality  to be put into operation.  This protocol

involves the sending of HELLO and I-HEARD-YOU messages.  We  have

noted  in  the  past that under conditions of extremely heavy CPU

utilization, so many buffers can pile up waiting to be served  by

the  bottleneck  CPU process, that the IMPs are unable to acquire

the  buffers  needed  for  receiving  the  HELLO  or  I-HEARD-YOU

messages.  If a condition like this lasts for any length of time,


                              - 3 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

the  IMPs  may  not be able to run the line up/down protocol, and

lines will be declared down by the IMPs' software.  On the  basis

of  all  these  facts,  our  tentative  conclusion  was that some

malformed update was causing the routing process in the  IMPs  to

use  an excessive amount of CPU time, possibly even to be running

in an infinite loop.  (This would be  quite  a  surprise  though,

since  we  tried very hard to protect ourselves against malformed

updates when we designed the routing process.)  As we shall  see,

this  tentative  conclusion, although on the right track, was not

quite correct, and the actual situation turned  out  to  be  much

more complex.


     When we examined core dumps from several IMPs, we noted that

most,  in  some cases all, of the IMPs' buffers contained routing

updates  waiting  to  be  processed.   Before   describing   this

situation further, it is necessary to explain some of the details

of  the  routing  algorithm's  updating  scheme.   (The following

explanation will of course be very brief and incomplete.  Readers

with a greater  level  of  interest  are  urged  to  consult  the

references.)  Every so often, each IMP generates a routing update

indicating  which  other  IMPs  are  its immediate neighbors over

operational  lines,  and  the  average   per-packet   delay   (in

milliseconds)  over that line.  Every IMP is required to generate

such an update at least once per minute, and no IMP is  permitted

to  generate  more than a dozen such updates over the course of a

minute.  Each  update  has  a  6-bit  sequence  number  which  is


                              - 4 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

advanced by 1 (modulo 64) for each successive update generated by

a  particular IMP.  If two updates generated by the same IMP have

sequence numbers n and m, update n  is  considered  to  be  LATER

(i.e.,  more recently generated) than update m if and only if one

of the following two conditions hold:



         (a) n > m, and n - m <= 32

         (b) n < m, and m - n > 32


(where the comparisons and subtractions treat n and m as unsigned

6-bit numbers, with  no  modulus).   When  an  IMP  generates  an

update,  it sends a copy of the update to each neighbor.  When an

IMP A receives an update u1 which was generated  by  a  different

IMP  B,  it  first  compares  the  sequence number of u1 with the

sequence number of the last update, u2, that it accepted from  B.

If  this  comparison  indicates  that  u2 is LATER than u1, u1 is

simply discarded.  If, on the other hand, u1 appears  to  be  the

LATER  update, IMP A will send u1 to all its neighbors (including

the one from which it was received).  The sequence number  of  u1

will be retained in A's tables as the LATEST received update from

B.   Of  course,  u1 is always accepted if A has seen no previous

update from B.  Note that this procedure is  designed  to  ensure

that  an  update  generated  by  a  particular  IMP  is received,

unchanged, by all other  IMPs  in  the  network,  IN  THE  PROPER

SEQUENCE.    Each routing update is broadcast (or flooded) to all

IMPs, not just to immediate neighbors of the IMP which  generated


                              - 5 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

the update (as in some other routing algorithms).  The purpose of

the  sequence numbers is to ensure that all IMPs will agree as to

which update from a given IMP  is  the  most  recently  generated

update from that IMP.


     For  reliability,  there  is  a  protocol for retransmitting

updates over individual links.  Let X and Y be neighboring  IMPs,

and let A be a third IMP.  Suppose X receives an update which was

generated by A, and transmits it to Y.  Now if in the next 100 ms

or  so, X does not receive from Y an update which originated at A

and whose sequence number is at least as recent as  that  of  the

update  X  sent  to  Y,  X concludes that its transmission of the

update did not get through to Y, and  that  a  retransmission  is

required.   (This  conclusion is warranted, since an update which

is  received  and  adjudged  to  be  the  most  recent  from  its

originating  IMP is sent to all neighbors, including the one from

which it was received.)  The IMPs do not keep the original update

packets  buffered  pending  retransmission.   Rather,   all   the

information  in  the  update  packet  is  kept in tables, and the

packet  is  re-created  from  the  tables  if  necessary  for   a

retransmission.


     This  transmission  protocol  ("flooding")  distributes  the

routing updates  in a  very  rapid  and  reliable  manner.   Once

generated by an IMP, an update will almost always reach all other

IMPs  in  a time period on the order of 100 ms.  Since an IMP can

generate no more than a dozen updates per minute, and  there  are

                              - 6 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

64  possible sequence numbers, sequence number wrap-around is not

a problem.  There is only one exception  to  this.   Suppose  two

IMPs  A  and  B  are  out  of  communication for a period of time

because there is no physical path between them.  (This may be due

either to a network partition, or to a more  mundane  occurrence,

such  as  one  of  the  IMPs  being down.)  When communication is

re-established, A and B have no way of knowing how long they have

been out of communication, or how many times the other's sequence

numbers may have wrapped around.  Comparing the  sequence  number

of  a newly received update with the sequence number of an update

received before the outage may give an incorrect result.  To deal

with this problem, the following scheme is adopted.   Let  t0  be

the time at which IMP A receives update number n generated by IMP

B.   Let  t1 be t0 plus 1 minute.  If by t1, A receives no update

generated by B with a LATER sequence number than n, A will accept

any update from B as being more recent than n.  So  if  two  IMPs

are  out  of  communication  for  a  period of time which is long

enough for the sequence numbers  to  have  wrapped  around,  this

procedure  ensures  that  proper  resynchronization  of  sequence

numbers is effected when communication is re-established.


     There is just one more facet of the updating  process  which

needs  to  be  discussed.   Because  of  the way the line up/down

protocol works, a line cannot be  brought  up  until  60  seconds

after  its performance becomes good enough to warrant operational

use.  (Roughly speaking, this is the time it takes  to  determine


                              - 7 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

that  the  line's  performance  is  good  enough.)   During  this

60-second period, no data is sent  over  the  line,  but  routing

updates are transmitted.  Remember that every node is required to

generate  a  routing update at least once per minute.  Therefore,

this procedure ensures that if two IMPs are out of  communication

because  of  the  failure  of some line, each has the most recent

update  from   the   other   by   the   time   communication   is

re-established.


     This  very  short  introduction  to  the routing algorithm's

updating protocol should provide enough background to enable  the

reader  to  understand  the  particular problem under discussion;

further justification and detail can be found in the  references.


     Let  us  return now to the discussion of the network outage.

I have already mentioned that the core dumps  showed  almost  all

buffers   holding  routing  updates  which  were  waiting  to  be

processed.  Close inspection showed that  all  the  updates  were

from  a  single  IMP, IMP 50.  By a strange "coincidence," IMP 50

had been  malfunctioning  just  before  the  network-wide  outage

occurred,  and  was  off the net during the period of the outage.

Hence it was not generating any updates during the period of  the

outage.   In  addition,  IMP 29, an immediate neighbor of IMP 50,

was also suffering hardware malfunctions (in particular, dropping

bits), but was up (though somewhat flakey) while the network  was

in  bad  shape.  Furthermore, the malfunction in IMP 50 had to do

with its ability to communicate properly with the neighboring IMP

                              - 8 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

29.  Although we did not yet understand how it was  possible  for

so  many updates from one IMP to be extant simultaneously, we did

understand enough to be able to get the network to recover.   All

that was necessary was to patch the IMPs to disregard any updates

from  IMP  50, which after all was down anyway.  When the network

is operating normally, broadcasting a patch to all  IMPs  can  be

done  in  a  matter of minutes.  With the network operating as it

was during the period of the outage, this can take as much  as  3

or  4 hours.  (Remember that the IMPs are generally unmanned, and

that the only way of controlling them from the  NCC  is  via  the

network  itself.   This  is perfectly satisfactory when an outage

affects only a small group of IMPs, but  is  an  obvious  problem

when  the  outage  has network-wide effects.)  This procedure was

fully successful in bringing the network back up.


     When we looked closely at the dumps, we saw  that  not  only

were  all  the updates on the queue from IMP 50, but they all had

one of three sequence numbers (either 8, 40,  or  44),  and  were

ordered        in        the        queue       as       follows:

8, 40, 44, 8, 40, 44, 8, 40, 44, ...  Note that by the definition

of LATER, 44 is LATER than 40 (44 > 40 and 44 - 40 <= 32), 40  is

LATER  than  8  (40 > 8 and 40 - 8 <= 32), and 8 is LATER than 44

(8 < 44 and 44 - 8 > 32).  Given the presence  of  three  updates

from the same IMP with these three sequence numbers, this is what

would  be  expected.   Since each update is LATER than one of the

others, a cycle is formed which keeps the three updates  floating


                              - 9 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

around  the  network  indefinitely.   Thus the IMPs spend most of

their CPU time and buffer space in processing these updates.  The

problem was to figure out how these three updates could  possibly

have  existed at the same time.  After all, getting from update 8

to update 40  should  require  2  or  3  full  minutes,  plus  31

intervening  sequence  numbers.   So  how could 8 still be around

when  40  was  generated,  especially  since  no   updates   with

intervening sequence numbers were present?


     Our  first thought was that maybe the real-time clock in IMP

50 was running one or two orders of magnitude faster than normal,

invalidating our assumptions about the maximum number of  updates

which  could  be  generated  in  a  given  time.   An alternative

hypothesis suggested itself however when we looked at the  binary

representations of the three sequence numbers:


          8 - 001000

         40 - 101000

         44 - 101100


Note  that  44  has only one more bit than 40, which has only one

more bit than 8.  Furthermore, the three different  updates  were

completely  identical,  except  for their sequence numbers.  This

suggests that  there  was  really  only  one  update,  44,  whose

sequence number was twice corrupted by dropped bits.  (Of course,

it's  also  possible  that  the  "real"  update  was  8,  and was

corrupted by added bits.  However, bit-dropping has proven itself


                             - 10 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

to be a much  more  common  sort  of  hardware  malfunction  than

bit-adding,  although  spontaneously  dropped  bits may sometimes

come back on spontaneously.)


     Surely, the reader will object,  there  must  be  protection

against  dropped  bits.   Yes there is protection, but apparently

not enough.  The update packets themselves are checksummed, so  a

dropped  bit  in  an update packet is readily detected.  Remember

though that if  an  update  needs  to  be  retransmitted,  it  is

recreated  from tabled information.  For maximal reliability, the

tables must  be  checksummed  also,  and  the  checksum  must  be

recomputed every time the table is accessed.  However, this would

require  either  a  large  number  of  CPU  cycles  (for frequent

checksumming of a large area of memory)  or  a  large  amount  of

memory  (to store the checksums for a lot of small areas).  Since

CPU cycles and memory are both potentially scarce resources, this

did not seem to us to  be  a  cost-effective  way  to  deal  with

problems  that  arise, say, once per year (this is the first such

problem encountered in a year and a half of running this  routing

algorithm).   Time  and  space  can  be  saved by recomputing the

checksum at  a  somewhat  slower  frequency,  but  this  is  less

reliable,  in  that it allows a certain number of dropped bits to

"fall between the cracks."  It seems likely then that one of  the

malfunctioning  IMPs  had to retransmit update 44 at least twice,

(recreating it each time from tabled information), retransmitting

it at least once with the corrupted sequence number  40,  and  at


                             - 11 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

least  once  with  the  corrupted  sequence number 8.  This would

cause those three sequence numbers to be extant  in  the  network

simultaneously,  even  though protocol is supposed to ensure that

this is impossible.


     Actually, the detection of dropped bits is most  properly  a

hardware function.  The next generation of IMP hardware (the "C30

IMP")  will  be able to detect and correct all single-bit errors,

and will detect all other bit errors.  Uncorrectable  bit  errors

will  cause  the  IMP to go into its "loader/dumper."  (An IMP in

its loader/dumper is not usable for  transferring  data,  and  is

officially   in  the  "down"  state.   However,  an  IMP  in  its

loader/dumper is easily controllable from the  NCC,  and  can  be

restarted  or  reloaded  without  on-site intervention.)  Current

hardware does have parity checking (which  should  detect  single

dropped  bits),  but  this feature has had to be turned off since

(a) there are too many spurious parity "errors,"  i.e.,  most  of

the  time when the machines complain of parity errors there don't

really seem to be any, and (b) parity errors cause  the  machines

to  simply  halt, rather than go into their loader/dumpers, which

means that on-site intervention is required to restart them.


     Pending the introduction of improved hardware, what  can  be

done  to prevent problems like this from recurring in the future?

It is easy to think of many  ways  of  avoiding  this  particular

problem,  especially  if  one does not consider the problems that

may arise from the "fixes."  For example, we  might  be  able  to

                             - 12 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

avoid  this  sort of problem by spending a lot more CPU cycles on

checksumming, but this may be too expensive because of  the  side

effects  it  would  introduce.   (Also,  it is not clear that any

memory checksumming strategy can be totally free of "cracks.")  A

very  simple  and  conservative  fix  to  prevent this particular

problem from recurring is to modify clause (a) of the  definition

of  LATER  so  that  the  "<="  is replaced by "<" (strictly less

than).  We will implement this fix, but it cannot  be  guaranteed

that no related problems will ever arise.


     What  is  really  needed  is  not some particular fix to the

routing algorithm, but a more general fix.  In  some  sense,  the

problem  we  saw  was  not really a routing problem.  The routing

code was working correctly, and the routes  that  were  generated

were correct and consistent.  The real problem is that a freakish

hardware  malfunction caused a high priority process to run wild,

devouring resources needed by other processes, thereby making the

network unusable.  The fact that the wild process was the routing

process is incidental.  In  designing  the  routing  process,  we

carefully  considered the amount of resource utilization it would

require.  By strictly controlling and limiting the rate at  which

updates  can  be  generated, we tried to prevent any situation in

which the routing process would make  excessive  demands  on  the

system.   As  we  have  seen  though, even our carefully designed

mechanisms were unable to protect against every possible sort  of

hardware  failure.  We need a better means of detecting that some


                             - 13 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

high priority process in the IMP, despite all the  safeguards  we

have  built in, is still consuming too many resources.  Once this

is  detected,  the  IMP  can  be  automatically  placed  in   its

loader/dumper.  In the case under discussion, we would have liked

to  have  all  the  IMPs  go  into  their loader/dumpers when the

problem arose.  This would have enabled us to  re-initialize  and

restart  all  the  IMPs  much more quickly.  (Although restarting

individual  IMPs  did  little  good,  restarting  all  the   IMPs

simultaneously would have cleared up the problem instantly, since

all  routing  tables  in  all  IMPs  would  have been initialized

simultaneously.)  It took us no more than an hour to  figure  out

how  to  restore  the  network;  several  additional  hours  were

required because it took so long for us to gain  control  of  the

misbehaving  IMPs  and  get  them  back  to  normal.   A built-in

software alarm system (assuming,  of  course,  that  it  was  not

subject  to  false  alarms)  might have enabled us to restore the

network more quickly, significantly reducing the duration of  the

outage.   This  is  not  to  say  that a better alarm and control

system could ever be a replacement for careful study  and  design

which   attempts   to  properly  distribute  the  utilization  of

important resources, but only that it is a necessary adjunct,  to

handle  the cases that will inevitably fall between the cracks of

even the most careful design.







                             - 14 -
^L
RFC 789                              Bolt Beranek and Newman Inc.
                                                    Eric C. Rosen

                           REFERENCES



"The New Routing Algorithm for the ARPANET," IEEE TRANSACTIONS ON

COMMUNICATIONS, May 1980, J.M. McQuillan, I. Richer, E.C.  Rosen.


"The  Updating  Protocol  of  ARPANET's  New  Routing Algorithm,"

COMPUTER NETWORKS, February 1980, E.C. Rosen.








































                             - 15 -
^L