RFC:  813


              
                WINDOW AND ACKNOWLEDGEMENT STRATEGY IN TCP

                             David D. Clark
                  MIT Laboratory for Computer Science
               Computer Systems and Communications Group
                               July, 1982


     1.  Introduction


     This  document describes implementation strategies to deal with two

mechanisms in TCP, the window and the acknowledgement.  These mechanisms

are described in the specification document, but it is  possible,  while

complying with the specification, to produce implementations which yield

very  bad  performance.    Happily,  the pitfalls possible in window and

acknowledgement strategies are very easy to avoid.


     It is a much more difficult exercise to verify the performance of a

specification than the correctness.  Certainly, we have less  experience

in  this  area,  and  we  certainly  lack  any  useful formal technique.

Nonetheless, it is important to attempt a specification  in  this  area,

because  different  implementors  might  otherwise  choose superficially

reasonable algorithms  which  interact  poorly  with  each  other.  This

document  presents  a  particular  set of algorithms which have received

testing in the field, and which appear to work properly with each other.

With more experience, these algorithms may become  part  of  the  formal

specification:  until such time their use is recommended.


2.  The Mechanisms


     The acknowledgement mechanism is at the heart of TCP.  Very simply,

when  data  arrives at the recipient, the protocol requires that it send

back an acknowledgement of this data.  The protocol specifies  that  the

bytes  of  data  are  sequentially  numbered,  so that the recipient can

acknowledge data by naming the highest numbered  byte  of  data  it  has

received,  which  also  acknowledges  the  previous  bytes (actually, it

identifies the first byte of data which it has  not  yet  received,  but

this is a small detail).  The protocol contains only a general assertion

that  data  should  be acknowledged promptly, but gives no more specific

indication as to how quickly an acknowledgement must  be  sent,  or  how

much data should be acknowledged in each separate acknowledgement.


     The window mechanism is a flow control tool.  Whenever appropriate,

the  recipient of data returns to the sender a number, which is (more or

less) the size of the buffer which the receiver currently has  available

for  additional  data.   This number of bytes, called the window, is the

maximum which the sender is permitted to  transmit  until  the  receiver

returns  some  additional  window.  Sometimes, the receiver will have no

buffer space available, and will return a window value of zero.    Under

these circumstances, the protocol requires the sender to send a small

segment to the receiver now and then, to see if more data  is  accepted.

If  the  window  remains closed at zero for some substantial period, and

the sender can obtain  no  response  from  the  receiver,  the  protocol

requires  the  sender  to  conclude that the receiver has failed, and to

close  the  connection.    Again,  there  is  very  little   performance


information  in  the  specification, describing under what circumstances

the window should be increased, and how the  sender  should  respond  to

such revised information.


     A  bad implementation of the window algorithm can lead to extremely

poor performance overall.  The degradations which  occur  in  throughput

and  CPU  utilizations  can easily be several factors of ten, not just a

fractional increase.  This particular phenomenon is specific enough that

it has been given the name of Silly Window Syndrome, or  SWS.    Happily,

SWS  is  easy  to  avoid  if  a few simple rules are observed.  The most

important function of this memo is to describe SWS, so that implementors

will understand the general nature  of  the  problem,  and  to  describe

algorithms  which  will  prevent  its  occurrence.    This document also

describes   performance   enhancing   algorithms   which    relate    to

acknowledgement,  and  discusses  the  way  acknowledgement  and  window

algorithms interact as part of SWS.


     3.  Silly Window Syndrome


     In order to understand SWS, we must first  define  two  new  terms.

Superficially,  the window mechanism is very simple:  there is a number,

called "the window", which is returned from the receiver to the  sender.

However,  we  must have a more detailed way of talking about the meaning

of this number.  The receiver of data computes a  value  which  we  will

call  the  "offered  window".    In  a  simple  case, the offered window

corresponds to the amount of buffer space  available  in  the  receiver.

This  correspondence  is  not necessarily exact, but is a suitable model

for the discussion to follow.    It  is  the  offered  window  which  is


actually  transmitted  back from the receiver to the sender.  The sender

uses the offered window  to  compute  a  different  value,  the  "usable

window",  which  is  the  offered window minus the amount of outstanding

unacknowledged data.  The usable window is less than  or  equal  to  the

offered window, and can be much smaller.


     Consider  the  following  simple  example.   The receiver initially

provides an offered window of 1,000.  The sender uses up this window  by

sending  five  segments  of 200 bytes each.  The receiver, on processing

the first of these  segments,  returns  an  acknowledgement  which  also

contains  an  updated  window value.  Let us assume that the receiver of

the data has removed the first 200 bytes from the buffer,  so  that  the

receiver once again has 1,000 bytes of available buffer.  Therefore, the

receiver would return, as before, an offered window of 1,000 bytes.  The

sender,  on  receipt  of  this  first  acknowledgement, now computes the

additional number of bytes which may be sent.  In  fact,  of  the  1,000

bytes  which  the recipient is prepared to receive at this time, 800 are

already in transit, having been sent in response to the previous offered

window.  In this case, the usable window is only 200 bytes.

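The arithmetic of this example can be sketched in a few lines of Python
(an illustration only; the variable names are ours, not part of the
protocol):

```python
def usable_window(offered_window, outstanding_bytes):
    # The usable window is the offered window minus the amount of
    # outstanding unacknowledged data; it can never be negative.
    return max(0, offered_window - outstanding_bytes)

# The example above: an offered window of 1,000 bytes with 800 bytes
# already in transit leaves a usable window of only 200 bytes.
print(usable_window(1000, 800))  # -> 200
```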

     Let us now consider how SWS  arises.    To  continue  the  previous

example,  assume  that at some point, when the sender computes a useable

window of 200 bytes, it has only 50 bytes to send  until  it  reaches  a

"push"  point.   It thus sends 50 bytes in one segment, and 150 bytes in

the next segment. Sometime later, this 50-byte segment  will  arrive  at

the recipient, which will process and remove the 50 bytes and once again

return  an  offered window of 1,000 bytes.  However, the sender will now


compute  that there are 950 bytes in transit in the network, so that the

useable window is now only 50 bytes.  Thus, the sender will  once  again

send  a  50  byte  segment,  even  though  there  is no longer a natural

boundary to force it.


     In fact, whenever the acknowledgement  of  a  small  segment  comes

back, the useable window associated with that acknowledgement will cause

another  segment  of  the  same  small  size  to  be  sent,  until  some

abnormality breaks the pattern.  It is easy to see  how  small  segments

arise,  because  natural  boundaries  in the data occasionally cause the

sender to take a computed useable window and divide it  up  between  two

segments.   Once that division has occurred, there is no natural way for

those useable window allocations to be recombined; thus the breaking  up

of the useable window into small pieces will persist.


     Thus,  SWS  is a degeneration in the throughput which develops over

time, during a long data transfer.  If the sender  ever  stops,  as  for

example  when  it runs out of data to send, the receiver will eventually

acknowledge all  the  outstanding  data,  so  that  the  useable  window

computed  by  the  sender  will  equal  the  full  offered window of the

receiver.  At this point the situation will  have  healed,  and  further

data  transmission  over  the  link will occur efficiently.  However, in

large file transfers, which occur without interruption,  SWS  can  cause

appalling  performance.  The network between the sender and the receiver

becomes clogged with  many  small  segments,  and  an  equal  number  of

acknowledgements,  which  in  turn  causes lost segments, which triggers

massive retransmission.  Bad cases of SWS have been seen  in  which  the


average  segment  size was one-tenth of the size the sender and receiver

were prepared to deal with, and the average number of retransmission per

successful segments sent was five.


     Happily, SWS is trivial to avoid.  The following sections  describe

two  algorithms,  one  executed  by the sender, and one by the receiver,

which appear to eliminate SWS completely.  Actually, either algorithm by

itself is sufficient to prevent SWS, and thus  protect  a  host  from  a

foreign  implementation  which  has  failed  to  deal properly with this

problem.  The  two  algorithms  taken  together  produce  an  additional

reduction  in  CPU  consumption, observed in practice to be as high as a

factor of four.


     4.  Improved Window Algorithms


     The receiver of data can take a very simple step to eliminate  SWS.

When  it  disposes of a small amount of data, it can artificially reduce

the offered window in subsequent acknowledgements, so that  the  useable

window computed by the sender does not permit the sending of any further

data.     At  some  later  time,  when  the  receiver  has  processed  a

substantially larger amount of incoming data, the artificial  limitation

on  the  offered  window  can be removed all at once, so that the sender

computes a sudden large jump rather than a sequence of  small  jumps  in

the useable window.


     At  this  level,  the  algorithm  is  quite simple, but in order to

determine exactly when the window should  be  opened  up  again,  it  is

necessary  to  look  at some of the other details of the implementation.


Depending  on whether the window is held artificially closed for a short

or long time, two problems will  develop.    The  one  we  have  already

discussed  -- never closing the window artificially -- will lead to SWS.

On the other hand, if  the  window  is  only  opened  infrequently,  the

pipeline  of data in the network between the sender and the receiver may

have emptied out while the sender was being held off, so that a delay is

introduced before additional data arrives from the sender.   This  delay

does reduce throughput, but it does not consume network resources or CPU

resources  in  the  process, as does SWS.  Thus, it is in this direction

that one ought to overcompensate.  For a simple implementation,  a  rule

of  thumb  that  seems to work in practice is to artificially reduce the

offered window until the reduction constitutes one half of the available

space, at which point increase the window to advertise the entire  space

again.  In any event, one ought to make the chunk by which the window is

opened  at  least permit one reasonably large segment.  (If the receiver

is so short of buffers that it can never advertise a large enough buffer

to permit at least one large segment, it is hopeless to expect any  sort

of high throughput.)

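This receiver-side rule of thumb can be sketched as follows (our own
formulation; the text does not prescribe exact code, and the names are
of our choosing):

```python
def window_to_advertise(available_space, currently_advertised):
    # Receiver-side SWS avoidance: hold the offered window artificially
    # closed until the reduction amounts to one half of the available
    # space, then open it up to the entire space in one large jump.
    reduction = available_space - currently_advertised
    if reduction >= available_space / 2:
        return available_space       # one sudden large jump
    return currently_advertised      # keep the window held down
```

For example, with 1,000 bytes actually available and only 700 still
advertised, the reduction (300) is under half the space, so the window
stays held down; once the advertisement has fallen to 400, the reduction
(600) exceeds half the space and the full 1,000 is advertised at once.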

     There  is  an algorithm that the sender can use to achieve the same

effect described above:  a very simple and elegant rule first  described

by  Michael  Greenwald  at MIT.  The sender of the data uses the offered

window to compute a useable window, and then compares the useable window

to the offered window, and refrains from sending anything if  the  ratio

of  useable to offered is less than a certain fraction.  Clearly, if the

computed useable window is small compared to the  offered  window,  this

means  that a substantial amount of previously sent information is still


in  the  pipeline  from  the sender to the receiver, which in turn means

that the sender can count on being granted a larger  useable  window  in

the  future.    Until  the  useable window reaches a certain amount, the

sender should simply refuse to send anything.


     Simple experiments suggest that the exact value of the ratio is not

very important, but that a value of about 25 percent  is  sufficient  to

avoid  SWS  and  achieve reasonable throughput, even for machines with a

small offered window.    An  additional  enhancement  which  might  help

throughput  would be to attempt to hold off sending until one can send a

maximum size segment.  Another enhancement would be to send anyway, even

if the ratio is small, if the useable window is sufficient to  hold  the

data available up to the next "push point".


     This algorithm at the sender end is very simple.  Notice that it is

not  necessary  to  set  a timer to protect against protocol lockup when

postponing the  send  operation.    Further  acknowledgements,  as  they

arrive,  will  inevitably change the ratio of offered to useable window.

(To see this, note that when all the data in the  catanet  pipeline  has

arrived  at  the  receiver,  the resulting acknowledgement must yield an

offered window and  useable  window  that  equal  each  other.)  If  the

expected  acknowledgements  do  not arrive, the retransmission mechanism

will come into play to assure that something finally happens.  Thus,  to

add  this  algorithm  to an existing TCP implementation usually requires

one line of code.  As part of the send algorithm it is already necessary

to compute the useable window from the offered window.  It is  a  simple

matter  to add a line of code which, if the ratio is less than a certain


percent,  sets  the  useable  window to zero.  The results of SWS are so

devastating that no sender  should  be  without  this  simple  piece  of

insurance.

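The "one line of code" described above might look like this in a modern
sketch (the 25 percent ratio is the value suggested earlier; the
function and constant names are ours):

```python
SWS_RATIO = 0.25  # ratio found sufficient in simple experiments

def effective_usable_window(offered, outstanding):
    # Greenwald's sender-side rule: compute the usable window, and if
    # the ratio of usable to offered window is below the threshold,
    # treat the usable window as zero and refrain from sending.
    usable = offered - outstanding
    if offered > 0 and usable / offered < SWS_RATIO:
        return 0
    return usable

print(effective_usable_window(1000, 950))  # -> 0 (50/1000 is below 25%)
print(effective_usable_window(1000, 200))  # -> 800
```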

     5.  Improved Acknowledgement Algorithms


     In the beginning of this paper, an overly simplistic implementation

of  TCP  was described, which led to SWS.  One of the characteristics of

this implementation was that the  recipient  of  data  sent  a  separate

acknowledgement  for  every  segment  that it received.  This compulsive

acknowledgement  was  one  of  the   causes   of   SWS,   because   each

acknowledgement provided some new useable window, but even if one of the

algorithms  described  above  is  used to eliminate SWS, overly frequent

acknowledgement still has  a  substantial  problem,  which  is  that  it

greatly  increases the processing time at the sender's end.  Measurement

of TCP implementations, especially on large operating systems, indicates

that  most  of  the  overhead  of  dealing  with a segment is not in the

processing at the TCP or IP level, but simply in the scheduling  of  the

handler which is required to deal with the segment.  A steady dribble of

acknowledgements  causes a high overhead in scheduling, with very little

to show for it.  This waste is to be avoided if possible.


     There are two reasons  for  prompt  acknowledgement.    One  is  to

prevent  retransmission.  We will discuss later how to determine whether

unnecessary  retransmission  is  occurring.    The  other   reason   one

acknowledges  promptly  is  to permit further data to be sent.  However,

the previous section makes quite clear that it is not  always  desirable

to send a little bit of data, even though the receiver may have room for


it.    Therefore,  one  can  state  a  general  rule  that  under normal

operation, the receiver of data need not,  and  for  efficiency  reasons

should  not,  acknowledge  the data unless either the acknowledgement is

intended to produce an increased useable window, is necessary  in  order

to  prevent  retransmission,  or  is  being  sent  as  part  of a reverse

direction segment being sent for some other reason.  We will consider an

algorithm to achieve these goals.


     Only the recipient of  the  data  can  control  the  generation  of

acknowledgements.    Once  an  acknowledgement  has  been  sent from the

receiver back to the sender, the sender must process it.   Although  the

extra overhead is incurred at the sender's end, it is entirely under the

receiver's  control.  Therefore, we must now describe an algorithm which

occurs at the receiver's end.  Obviously, the algorithm  must  have  the

following  general form:  sometimes the receiver of data, upon processing

a segment, decides not to send an acknowledgement now, but  to  postpone

the  acknowledgement until some time in the future, perhaps by setting a

timer.  The peril of this approach  is  that  on  many  large  operating

systems  it  is  extremely costly to respond to a timer event, almost as

costly as to respond to an incoming segment.  Clearly, if  the  receiver

of  the data, in order to avoid extra overhead at the sender end, spends

a great deal of time responding to timer interrupts, no overall  benefit

has been achieved, for efficiency at the sender end is achieved by great

thrashing  at  the  receiver end.  We must find an algorithm that avoids

both of these perils.


     The following scheme seems a good compromise.  The receiver of data


will   refrain   from   sending   an   acknowledgement   under   certain

circumstances, in which case it must set a timer which  will  cause  the

acknowledgement  to be sent later.  However, the receiver should do this

only where it is a reasonable guess that some other event will intervene

and prevent the necessity of the timer  interrupt.    The  most  obvious

event  on  which  to depend is the arrival of another segment.  So, if a

segment arrives, postpone sending an  acknowledgement  if  both  of  the

following  conditions  hold.    First,  the  push  bit is not set in the

segment, since it is a reasonable assumption that  there  is  more  data

coming  in  a  subsequent  segment.   Second, there is no revised window

information to be sent back.

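Stated as code, the decision rule might be sketched like this (an
illustration of the two conditions only, with names of our choosing):

```python
def may_withhold_ack(push_bit_set, window_update_pending):
    # Postpone the acknowledgement only if both conditions hold: the
    # push bit is not set (more data is probably coming in a subsequent
    # segment), and there is no revised window information to send
    # back.  A timer must still be set as a backstop whenever the
    # acknowledgement is withheld.
    return not push_bit_set and not window_update_pending
```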

This algorithm will ensure that the timer, although set, is  seldom

used.    The  interval  of  the  timer is related to the expected inter-

segment delay, which is in turn a function  of  the  particular  network

through  which  the  data  is  flowing.    For the Arpanet, a reasonable

interval seems to be 200 to 300 milliseconds.  Appendix A  describes  an

adaptive algorithm for measuring this delay.


     The section on improved window algorithms described both a receiver

algorithm  and  a  sender  algorithm,  and suggested that both should be

used.  The reason for this is now clear.  While the sender algorithm  is

extremely  simple,  and  useful  as insurance, the receiver algorithm is

required in order that this improved acknowledgement strategy work.   If

the  receipt  of every segment causes a new window value to be returned,

then of necessity  an  acknowledgement  will  be  sent  for  every  data

segment.    When, according to the strategy of the previous section, the


receiver  determines  to artificially reduce the offered window, that is

precisely the circumstance under which an acknowledgement  need  not  be

sent.      When   the   receiver   window  algorithm  and  the  receiver

acknowledgement algorithm are  used  together,  it  will  be  seen  that

sending  an  acknowledgement  will  be triggered by one of the following

events.  First, a push bit has been received.  Second, a temporary pause

in the data stream is detected.  Third,  the  offered  window  has  been

artificially reduced to one-half its actual value.


     In the beginning of this section, it was pointed out that there are

two  reasons  why  one must acknowledge data.  Our consideration at this

point has been concerned only with the first,  that  an  acknowledgement

must  be  returned as part of triggering the sending of new data.  It is

also necessary to acknowledge  whenever  the  failure  to  do  so  would

trigger retransmission by the sender.  Since the retransmission interval

is  selected  by  the  sender,  the  receiver  of the data cannot make a

precise  determination  of  when  the  acknowledgement  must  be   sent.

However,   there   is   a  rough  rule  the  sender  can  use  to  avoid

retransmission, provided that the receiver is reasonably well behaved.


     We will assume that the sender of the data uses the optional  algorithm

described  in  the  TCP  specification,  in which the roundtrip delay is

measured using an exponential decay smoothing algorithm.  Retransmission

of a segment occurs if the measured delay for that segment  exceeds  the

smoothed  average  by  some  factor.  To see how retransmission might be

triggered, one must consider the pattern  of  segment  arrivals  at  the

receiver.   The goal of our strategy was that the sender should send off


a  number of segments in close sequence, and receive one acknowledgement

for the whole burst.  The  acknowledgement  will  be  generated  by  the

receiver  at  the time that the last segment in the burst arrives at the

receiver.  (To ensure the prompt  return  of  the  acknowledgement,  the

sender  could  turn on the "push" bit in the last segment of the burst.)

The delay observed at the sender between the initial transmission  of  a

segment  and  the  receipt  of the acknowledgement will include both the

network transit time, plus the  holding  time  at  the  receiver.    The

holding  time  will be greatest for the first segments in the burst, and

smallest for the last segments  in  the  burst.    Thus,  the  smoothing

algorithm  will  measure  a  delay  which is roughly proportional to the

average roundtrip delay for all the segments in  the  burst.    Problems

will  arise  if  the  average  delay  is  substantially smaller than the

maximum delay  and  the  smoothing  algorithm  used  has  a  very  small

threshold  for  triggering retransmission.  The widest variation between

average and maximum delay  will  occur  when  network  transit  time  is

negligible, and all delay is processing time.  In this case, the maximum

will  be  twice  the  average  (by simple algebra) so the threshold that

controls retransmission should be somewhat more than a factor of two.

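The "simple algebra" can be checked numerically (a sketch with made-up
numbers: n segments arriving at a fixed spacing, all acknowledged at
once when the last one arrives):

```python
def holding_times(n_segments, spacing):
    # Segment i of the burst waits at the receiver until the last
    # segment arrives, so its holding time is (n - 1 - i) * spacing.
    return [(n_segments - 1 - i) * spacing for i in range(n_segments)]

times = holding_times(5, 1)        # [4, 3, 2, 1, 0]
average = sum(times) / len(times)  # 2.0
# The maximum holding time is exactly twice the average, which is why
# the threshold that controls retransmission should exceed a factor
# of two when network transit time is negligible.
```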

     In practice, retransmission of the first segments of  a  burst  has

not  been  a  problem because the delay measured consists of the network

roundtrip  delay,  as  well  as  the  delay  due  to   withholding   the

acknowledgement,  and the roundtrip tends to dominate except in very low

roundtrip time situations (such as when sending to one's self  for  test

purposes).    This low roundtrip situation can be covered very simply by

including a minimum value below which  the  roundtrip  estimate  is  not

permitted to drop.
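
A minimal sketch of such an estimator follows (the smoothing constant
and the floor are illustrative values of our own, not taken from the
text or the specification):

```python
def smoothed_rtt(previous_estimate, new_sample, alpha=0.9, floor=0.1):
    # Exponential-decay smoothing of the roundtrip estimate, with a
    # minimum value below which the estimate is not permitted to drop
    # (covering the very-low-roundtrip case described above, such as
    # sending to one's self for test purposes).
    estimate = alpha * previous_estimate + (1 - alpha) * new_sample
    return max(estimate, floor)
```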


     In  our  experiments  with  this  algorithm,  retransmission due to

faulty calculation of the roundtrip delay occurred only once,  when  the

parameters  of  the exponential smoothing algorithm had been misadjusted

so that they were only  taking  into  account  the  last  two  or  three

segments  sent.   Clearly, this will cause trouble since the last two or

three segments of any burst are the  ones  whose  holding  time  at  the

receiver is minimal, so the resulting total estimate was much lower than

appropriate.   Once the parameters of the algorithm had been adjusted so

that the number of segments taken into account was  approximately  twice

the  number  of  segments  in  a burst of average size, with a threshold

factor of 1.5, no further retransmission has ever been identified due to

this problem, including when sending to one's self  and  when  sending  over

high delay nets.


     6.  Conservative Vs. Optimistic Windows


     According  to the TCP specification, the offered window is presumed

to have some relationship to the amount of data which  the  receiver  is

actually  prepared  to receive.  However, it is not necessarily an exact

correspondence.  We will use the term "conservative window" to  describe

the case where the offered window is precisely no larger than the actual

buffering  available.  The drawback to conservative window algorithms is

that they can produce very low throughput in long delay situations.   It

is  easy  to see that the maximum throughput of a conservative window algorithm

is  one  bufferfull  every  roundtrip  delay  in the net, since the next

bufferfull cannot be launched until the  updated  window/acknowledgement

information from the previous transmission has made the roundtrip.
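
This throughput ceiling can be expressed directly (the numbers here are
illustrative values of our own):

```python
def conservative_throughput_limit(buffer_bytes, roundtrip_seconds):
    # A conservative window admits at most one bufferfull per roundtrip
    # delay, since the next bufferfull cannot be launched until the
    # window/acknowledgement update has made the roundtrip.
    return buffer_bytes / roundtrip_seconds

# e.g. a 4,000-byte receive buffer over a 0.5-second roundtrip caps
# throughput at 8,000 bytes per second, regardless of link speed.
print(conservative_throughput_limit(4000, 0.5))  # -> 8000.0
```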


     In  certain  cases,  it  may  be  possible  to increase the overall

throughput of the transmission by increasing the offered window over the

actual buffer available at the receiver.  Such a strategy we  will  call

an  "optimistic  window" strategy.  The optimistic strategy works if the

network delivers the data to the recipient sufficiently slowly  that  it

can  process  the  data fast enough to keep the buffer from overflowing.

If the receiver is faster than the sender, one could, with luck,  permit

an infinitely optimistic window, in which the sender is simply permitted

to send full-speed.  If the sender is faster than the receiver, however,

and the window is too optimistic, then some segments will cause a buffer

overflow,  and  will  be  discarded.  Therefore, the correct strategy to

implement an optimistic window is to  increase  the  window  size  until

segments  start to be lost.  This only works if it is possible to detect

that the segment has been lost.  In  some  cases,  it  is  easy  to  do,

because  the  segment  is  partially processed inside the receiving host

before it is thrown away.  In other cases, overflows may actually  cause

the network interface to be clogged, which will cause the segments to be

lost  elsewhere  in the net.  It is inadvisable to attempt an optimistic

window strategy unless one is certain that the algorithm can detect  the

resulting  lost  segments.  However, the increase in throughput which is

possible from optimistic windows is quite substantial.  Any systems with

small buffer space should seriously consider  the  merit  of  optimistic

windows.


     The  selection  of an appropriate window algorithm is actually more

complicated than even the above  discussion  suggests.    The  following

considerations  are  not  presented  with  the  intention  that  they be


incorporated  in  current  implementations of TCP, but as background for

the sophisticated designer who is attempting to understand how  his  TCP

will  respond  to  a variety of networks, with different speed and delay

characteristics.  The particular pattern of windows and acknowledgements

sent from receiver to sender influences two characteristics of the  data

being  sent.    First, they control the average data rate.  Clearly, the

average rate of the  sender  cannot  exceed  the  average  rate  of  the

receiver,  or  long-term  buffer  overflow  will  occur.    Second, they

influence the burstiness of the data coming from the sender.  Burstiness

has both advantages and disadvantages.  The advantage of  burstiness  is

that  it  reduces  the  CPU processing necessary to send the data.  This

follows from the observed fact, especially on large machines, that  most

of  the  cost  of sending a segment is not the TCP or IP processing, but

the scheduling overhead of getting started.


     On the other hand, the disadvantage of burstiness is  that  it  may

cause  buffers  to overflow, either in the eventual recipient, which was

discussed above, or in an intermediate gateway,  a  problem  ignored  in

this paper.  The algorithms described above attempt to strike a balance

between  excessive  burstiness,  which  in  the  extreme cases can cause

delays because a burst is  not  requested  soon  enough,  and  excessive

fragmentation   of  the  data  stream  into  small  segments,  which  we

identified as Silly Window Syndrome.


     Under conditions of extreme delay  in  the  network,  none  of  the

algorithms   described   above   will   achieve   adequate   throughput.

Conservative window algorithms  have  a  predictable  throughput  limit,

which  is one windowful per roundtrip delay.  Attempts to solve this by

optimistic window strategies may  cause  buffer  overflows  due  to  the

bursty  nature  of the arriving data.  A very sophisticated way to solve

this is for the receiver, having measured by some  means  the  roundtrip

delay  and  intersegment  arrival rate of the actual connection, to open

his window, not in one optimistic increment of gigantic proportion,  but

in  a number of smaller optimistic increments, which have been carefully

spaced using a timer so that the resulting smaller bursts  which  arrive

are each sufficiently small to fit into the existing buffers.  One could

visualize this as a number of requests flowing backwards through the net

which trigger in return a number of bursts which flow back spaced evenly

from  the  sender  to  the  receiver.    The  overall result is that the

receiver uses the window mechanism to  control  the  burstiness  of  the

arrivals, and the average rate.
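The spaced-increment idea can be sketched numerically.  The helper below
is hypothetical (the memo proposes the idea but no concrete algorithm);
the roundtrip delay and buffer size are taken as given inputs, with the
measurement method left open, as in the text.

```python
# Sketch of opening the window in several timed optimistic increments,
# rather than one gigantic one, so that each returning burst fits the
# existing buffers.  Hypothetical helper, not from the memo.

def spaced_window_increments(total_window, buffer_size, rtt):
    """Split a large window increase into increments no larger than the
    receiver's buffer, spaced evenly over one roundtrip delay.
    Returns a list of (send_time_offset, increment) pairs."""
    count = -(-total_window // buffer_size)   # ceiling division
    interval = rtt / count
    schedule = []
    remaining = total_window
    for i in range(count):
        increment = min(buffer_size, remaining)
        schedule.append((i * interval, increment))
        remaining -= increment
    return schedule
```

Each entry is one of the "requests flowing backwards through the net";
the spacing is what keeps the triggered bursts from arriving together.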


     To  my knowledge, no such strategy has been implemented in any TCP.

First, we do not normally have delays high enough to require  this  kind

of  treatment.    Second,  the  strategy described above is probably not

stable unless it is very carefully balanced.  Just as buses on a  single

bus  route tend to bunch up, bursts which start out equally spaced could

well end up piling into each other, and forming the single  large  burst

which  the  receiver was hoping to avoid.  It is important to understand

this extreme case, however, in order to  understand  the  limits  within

which  TCP,  as normally implemented, with either conservative or simple

optimistic windows, can be expected to  deliver  throughput  which  is  a

reasonable percentage of the actual network capacity.


     7.  Conclusions


     This  paper  describes  three  simple  algorithms  for  performance

enhancement in TCP, one at the sender end and two at the receiver.   The

sender  algorithm  is  to  refrain from sending if the useable window is

smaller than 25 percent of the offered window.  The receiver  algorithms

are first, to artificially reduce the offered window when processing new

data  if  the  resulting  reduction  does  not  represent more than some

fraction, say 50 percent, of the actual space available, and second,  to

refrain  from  sending an acknowledgment at all if two simple conditions

hold.
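The sender-side rule restated above lends itself to direct expression.
A minimal sketch follows; the 25 percent threshold is from the text, but
the function name and parameterization are ours.

```python
# Minimal sketch of the sender-side rule: refrain from sending while the
# useable window is smaller than 25 percent of the offered window.
# Function name and default threshold parameter are illustrative.

def sender_may_send(useable_window, offered_window, threshold=0.25):
    """True when the sender should transmit rather than wait."""
    if offered_window <= 0:
        return False            # zero window: nothing may be sent
    return useable_window >= threshold * offered_window
```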


     Any of these algorithms will prevent the worst aspects  of  Silly

Window  Syndrome, and when these algorithms are used together, they will

produce substantial improvement in CPU utilization, by  eliminating  the

processing of excess acknowledgements.


     Preliminary  experiments  with  these  algorithms suggest that they

work, and work very well.  Both the sender and receiver algorithms  have

been  shown  to  eliminate  SWS,  even  when  talking  to  fairly  silly

algorithms at the other end.  The Multics  mailer,  in  particular,  had

suffered substantial attacks of SWS while sending large mail to a number

of  hosts.   We believe that implementation of the sender side algorithm

has  eliminated  every  known  case  of  SWS  detected  in  our  mailer.

Implementation  of  the  receiver  side  algorithm  produced substantial

improvements in CPU time when Multics was the sending system.    Multics

is  a  typical  large  operating system, with scheduling costs which are

large compared to the actual  processing  time  for  protocol  handlers.

Tests were done sending from Multics to a host which implemented the SWS

suppression  algorithm,  and  which  could  either  send or refrain from

sending acknowledgements on each segment.  As predicted, suppressing the

return acknowledgements did not influence the throughput for large  data

transfer  at  all,  since the throttling effect was elsewhere.  However,

the CPU time required to process the data at the Multics end was cut  by

a  factor  of  four.  (In  this experiment, the bursts of data which were

being sent were approximately eight  segments.    Thus,  the  number  of

acknowledgements in the two experiments differed by a factor of eight.)


     An  important  consideration in evaluating these algorithms is that

they must not cause the protocol implementations to deadlock.    All  of

the  recommendations  in this document have the characteristic that they

suggest one refrain  from  doing  something  even  though  the  protocol

specification  permits one to do it.  The possibility exists that if one

refrains from doing something now one may never get to do it later,  and

both  ends will halt, even though it would appear superficially that the

transaction can continue.


     Formally, the idea that things continue to work is referred  to  as

"liveness".    One  of  the  defects  of ad hoc solutions to performance

problems is the possibility that two different approaches will  interact

to  prevent  liveness.   It is believed that the algorithms described in

this paper are always live, and that is one of the reasons why there  is

a strong advantage in uniform use of this particular proposal, except in

cases where it is explicitly demonstrated not to work.


     The  argument  for liveness in these solutions proceeds as follows.

First,  the sender algorithm can be stopped by only one thing:  a refusal

of the receiver to acknowledge sent data.    As  long  as  the  receiver

continues  to  acknowledge  data, the ratio of useable window to offered

window will approach one, and eventually the  sender  must  continue  to

send.    However,  notice  that the receiver algorithm we have advocated

involves refraining from acknowledging.  Therefore, we certainly do have

a situation where improper  operation  of  this  algorithm  can  prevent

liveness.


     What  we  must show is that the receiver of the data, if it chooses

to refrain from acknowledging, will do so only for a short time, and not

forever.  The design of the algorithm described above  was  intended  to

achieve  precisely  this  goal:  whenever the receiver of data refrained

from sending an acknowledgement it was required to set  a  timer.    The

only  event  that  was  permitted to clear that timer was the receipt of

another segment, which essentially reset the timer, and started it going

again.  Thus, an acknowledgement will be sent as  soon  as  the  timer

expires with no further data received.  This has precisely  the  effect

desired:  if the data flow

appears to be disrupted for any reason, the receiver responds by sending

an  up-to-date  acknowledgement.    In  fact,  the receiver algorithm is

designed  to  be  more  robust  than  this,  for  transmission   of   an

acknowledgment is triggered by two events, either a cessation of data or

a  reduction in the amount of offered window to 50 percent of the actual

value.    This  is  the  condition  which  will  normally  trigger   the

transmission of this acknowledgement.
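The liveness argument can be made concrete with a small state sketch:
the timer is restarted by each arriving segment, and an acknowledgement
is forced either by the timer's expiry or by the 50 percent window
condition.  The class name and the simplified deadline arithmetic are
our own illustration, not an implementation from the memo.

```python
# Sketch of the receiver's ack-withholding rule and why it stays live:
# every arriving segment restarts a timer, and either the timer's
# expiry or the offered window falling to half the actual space forces
# an acknowledgement.  The class and its interface are illustrative.

class DelayedAck:
    def __init__(self, actual_space, timeout):
        self.actual_space = actual_space
        self.timeout = timeout
        self.deadline = None          # no timer running initially

    def on_segment(self, now, offered_window):
        """Process an arriving segment; return True if an ack is due."""
        self.deadline = now + self.timeout   # restart the timer
        if offered_window <= self.actual_space * 0.5:
            self.deadline = None
            return True               # window shrank to 50 percent
        return False                  # withhold the acknowledgement

    def on_tick(self, now):
        """Clock tick; return True if the timer has expired."""
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None
            return True               # data flow ceased: ack now
        return False
```

While segments keep arriving, acknowledgements are suppressed only so
long as the window condition holds; once data stops, the first tick past
the deadline forces an acknowledgement, which is the liveness guarantee.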



                               APPENDIX A


     Dynamic Calculation of Acknowledgement Delay


     The  text  suggested  that  when  setting  a  timer to postpone the

sending  of  an  acknowledgement,  a  fixed  interval  of  200  to   300

milliseconds  would  work  properly  in  practice.    This  has not been

verified over a wide variety of network delays, and clearly if there  is

a  very  slow  net  which stretches out the intersegment arrival time, a

fixed interval will fail.  In a sophisticated TCP, which is expected  to

adjust   dynamically   (rather   than   manually)  to  changing  network

conditions, it would be appropriate to measure this interval and respond

dynamically.  The following algorithm, which has been  relegated  to  an

Appendix because it has not been tested, seems sensible.   Whenever  a

segment arrives which does not have the push  bit  on  in  it,  start  a

timer,  which  runs  until  the  next  segment  arrives.   Average these

interarrival intervals, using an exponential  decay  smoothing  function

tuned  to take into account perhaps the last ten or twenty segments that

have come in.  Occasionally, there will be a long  interarrival  period,

even  for  a  segment  which  does not terminate a piece of data being

pushed, perhaps because a window has gone to zero or some glitch in  the

sender  or  the  network  has held up the data.  Therefore, examine each

interarrival interval, and discard it from the smoothing algorithm if it

exceeds the current estimate by some amount, perhaps a ratio of  two  or

four times.  By rejecting the larger intersegment arrival intervals, one

should obtain a smoothed estimate of the interarrival of segments inside

a  burst.   The number need not be exact, since the timer which triggers

acknowledgement can add a fairly generous fudge factor to  this  without

causing  trouble  with  the  sender's  estimate  of  the  retransmission

interval, so long as the fudge factor is constant.
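The smoothing described above might look like the following.  The decay
constant and rejection ratio reflect the text's suggested ballparks (the
last ten or twenty segments, a ratio of two or four); everything else,
including the function name and the seeding rule, is our guesswork.

```python
# Sketch of the appendix's estimator: an exponentially smoothed segment
# interarrival time, with abnormally long gaps (window gone to zero, a
# pushed boundary, a glitch) rejected rather than averaged in.  An
# alpha near 0.1 weights roughly the last ten or twenty segments.

def smooth_interarrival(estimate, sample, alpha=0.1, reject_ratio=2.0):
    """Fold one interarrival sample into the running estimate."""
    if estimate is None:
        return sample                 # first sample seeds the estimate
    if sample > reject_ratio * estimate:
        return estimate               # discard the long outlier
    return (1.0 - alpha) * estimate + alpha * sample
```

The acknowledgement timer can then be set to this estimate plus a
constant fudge factor, as the text notes, without disturbing the
sender's estimate of the retransmission interval.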