1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
|
RFC: 813
WINDOW AND ACKNOWLEDGEMENT STRATEGY IN TCP
David D. Clark
MIT Laboratory for Computer Science
Computer Systems and Communications Group
July, 1982
1. Introduction
This document describes implementation strategies to deal with two
mechanisms in TCP, the window and the acknowledgement. These mechanisms
are described in the specification document, but it is possible, while
complying with the specification, to produce implementations which yield
very bad performance. Happily, the pitfalls possible in window and
acknowledgement strategies are very easy to avoid.
It is a much more difficult exercise to verify the performance of a
specification than the correctness. Certainly, we have less experience
in this area, and we certainly lack any useful formal technique.
Nonetheless, it is important to attempt a specification in this area,
because different implementors might otherwise choose superficially
reasonable algorithms which interact poorly with each other. This
document presents a particular set of algorithms which have received
testing in the field, and which appear to work properly with each other.
With more experience, these algorithms may become part of the formal
specification: until such time their use is recommended.
^L
2
2. The Mechanisms
The acknowledgement mechanism is at the heart of TCP. Very simply,
when data arrives at the recipient, the protocol requires that it send
back an acknowledgement of this data. The protocol specifies that the
bytes of data are sequentially numbered, so that the recipient can
acknowledge data by naming the highest numbered byte of data it has
received, which also acknowledges the previous bytes (actually, it
identifies the first byte of data which it has not yet received, but
this is a small detail). The protocol contains only a general assertion
that data should be acknowledged promptly, but gives no more specific
indication as to how quickly an acknowledgement must be sent, or how
much data should be acknowledged in each separate acknowledgement.
The window mechanism is a flow control tool. Whenever appropriate,
the recipient of data returns to the sender a number, which is (more or
less) the size of the buffer which the receiver currently has available
for additional data. This number of bytes, called the window, is the
maximum which the sender is permitted to transmit until the receiver
returns some additional window. Sometimes, the receiver will have no
buffer space available, and will return a window value of zero. Under
these circumstances,the protocol requires the sender to send a small
segment to the receiver now and then, to see if more data is accepted.
If the window remains closed at zero for some substantial period, and
the sender can obtain no response from the receiver, the protocol
requires the sender to conclude that the receiver has failed, and to
close the connection. Again, there is very little performance
^L
3
information in the specification, describing under what circumstances
the window should be increased, and how the sender should respond to
such revised information.
A bad implementation of the window algorithm can lead to extremely
poor performance overall. The degradations which occur in throughput
and CPU utilizations can easily be several factors of ten, not just a
fractional increase. This particular phenomenon is specific enough that
it has been given the name of Silly Window Syndrome, or SWS. Happily
SWS is easy to avoid if a few simple rules are observed. The most
important function of this memo is to describe SWS, so that implementors
will understand the general nature of the problem, and to describe
algorithms which will prevent its occurrence. This document also
describes performance enhancing algorithms which relate to
acknowledgement, and discusses the way acknowledgement and window
algorithms interact as part of SWS.
3. SILLY WINDOW SYNDROME
In order to understand SWS, we must first define two new terms.
Superficially, the window mechanism is very simple: there is a number,
called "the window", which is returned from the receiver to the sender.
However, we must have a more detailed way of talking about the meaning
of this number. The receiver of data computes a value which we will
call the "offered window". In a simple case, the offered window
corresponds to the amount of buffer space available in the receiver.
This correspondence is not necessarily exact, but is a suitable model
for the discussion to follow. It is the offered window which is
^L
4
actually transmitted back from the receiver to the sender. The sender
uses the offered window to compute a different value, the "usable
window", which is the offered window minus the amount of outstanding
unacknowledged data. The usable window is less than or equal to the
offered window, and can be much smaller.
Consider the following simple example. The receiver initially
provides an offered window of 1,000. The sender uses up this window by
sending five segments of 200 bytes each. The receiver, on processing
the first of these segments, returns an acknowledgement which also
contains an updated window value. Let us assume that the receiver of
the data has removed the first 200 bytes from the buffer, so that the
receiver once again has 1,000 bytes of available buffer. Therefore, the
receiver would return, as before, an offered window of 1,000 bytes. The
sender, on receipt of this first acknowledgement, now computes the
additional number of bytes which may be sent. In fact, of the 1,000
bytes which the recipient is prepared to receive at this time, 800 are
already in transit, having been sent in response to the previous offered
window. In this case, the usable window is only 200 bytes.
Let us now consider how SWS arises. To continue the previous
example, assume that at some point, when the sender computes a useable
window of 200 bytes, it has only 50 bytes to send until it reaches a
"push" point. It thus sends 50 bytes in one segment, and 150 bytes in
the next segment. Sometime later, this 50-byte segment will arrive at
the recipient, which will process and remove the 50 bytes and once again
return an offered window of 1,000 bytes. However, the sender will now
^L
5
compute that there are 950 bytes in transit in the network, so that the
useable window is now only 50 bytes. Thus, the sender will once again
send a 50 byte segment, even though there is no longer a natural
boundary to force it.
In fact, whenever the acknowledgement of a small segment comes
back, the useable window associated with that acknowledgement will cause
another segment of the same small size to be sent, until some
abnormality breaks the pattern. It is easy to see how small segments
arise, because natural boundaries in the data occasionally cause the
sender to take a computed useable window and divide it up between two
segments. Once that division has occurred, there is no natural way for
those useable window allocations to be recombined; thus the breaking up
of the useable window into small pieces will persist.
Thus, SWS is a degeneration in the throughput which develops over
time, during a long data transfer. If the sender ever stops, as for
example when it runs out of data to send, the receiver will eventually
acknowledge all the outstanding data, so that the useable window
computed by the sender will equal the full offered window of the
receiver. At this point the situation will have healed, and further
data transmission over the link will occur efficiently. However, in
large file transfers, which occur without interruption, SWS can cause
appalling performance. The network between the sender and the receiver
becomes clogged with many small segments, and an equal number of
acknowledgements, which in turn causes lost segments, which triggers
massive retransmission. Bad cases of SWS have been seen in which the
^L
6
average segment size was one-tenth of the size the sender and receiver
were prepared to deal with, and the average number of retransmission per
successful segments sent was five.
Happily, SWS is trivial to avoid. The following sections describe
two algorithms, one executed by the sender, and one by the receiver,
which appear to eliminate SWS completely. Actually, either algorithm by
itself is sufficient to prevent SWS, and thus protect a host from a
foreign implementation which has failed to deal properly with this
problem. The two algorithms taken together produce an additional
reduction in CPU consumption, observed in practice to be as high as a
factor of four.
4. Improved Window Algorithms
The receiver of data can take a very simple step to eliminate SWS.
When it disposes of a small amount of data, it can artificially reduce
the offered window in subsequent acknowledgements, so that the useable
window computed by the sender does not permit the sending of any further
data. At some later time, when the receiver has processed a
substantially larger amount of incoming data, the artificial limitation
on the offered window can be removed all at once, so that the sender
computes a sudden large jump rather than a sequence of small jumps in
the useable window.
At this level, the algorithm is quite simple, but in order to
determine exactly when the window should be opened up again, it is
necessary to look at some of the other details of the implementation.
^L
7
Depending on whether the window is held artificially closed for a short
or long time, two problems will develop. The one we have already
discussed -- never closing the window artificially -- will lead to SWS.
On the other hand, if the window is only opened infrequently, the
pipeline of data in the network between the sender and the receiver may
have emptied out while the sender was being held off, so that a delay is
introduced before additional data arrives from the sender. This delay
does reduce throughput, but it does not consume network resources or CPU
resources in the process, as does SWS. Thus, it is in this direction
that one ought to overcompensate. For a simple implementation, a rule
of thumb that seems to work in practice is to artificially reduce the
offered window until the reduction constitutes one half of the available
space, at which point increase the window to advertise the entire space
again. In any event, one ought to make the chunk by which the window is
opened at least permit one reasonably large segment. (If the receiver
is so short of buffers that it can never advertise a large enough buffer
to permit at least one large segment, it is hopeless to expect any sort
of high throughput.)
There is an algorithm that the sender can use to achieve the same
effect described above: a very simple and elegant rule first described
by Michael Greenwald at MIT. The sender of the data uses the offered
window to compute a useable window, and then compares the useable window
to the offered window, and refrains from sending anything if the ratio
of useable to offered is less than a certain fraction. Clearly, if the
computed useable window is small compared to the offered window, this
means that a substantial amount of previously sent information is still
^L
8
in the pipeline from the sender to the receiver, which in turn means
that the sender can count on being granted a larger useable window in
the future. Until the useable window reaches a certain amount, the
sender should simply refuse to send anything.
Simple experiments suggest that the exact value of the ratio is not
very important, but that a value of about 25 percent is sufficient to
avoid SWS and achieve reasonable throughput, even for machines with a
small offered window. An additional enhancement which might help
throughput would be to attempt to hold off sending until one can send a
maximum size segment. Another enhancement would be to send anyway, even
if the ratio is small, if the useable window is sufficient to hold the
data available up to the next "push point".
This algorithm at the sender end is very simple. Notice that it is
not necessary to set a timer to protect against protocol lockup when
postponing the send operation. Further acknowledgements, as they
arrive, will inevitably change the ratio of offered to useable window.
(To see this, note that when all the data in the catanet pipeline has
arrived at the receiver, the resulting acknowledgement must yield an
offered window and useable window that equal each other.) If the
expected acknowledgements do not arrive, the retransmission mechanism
will come into play to assure that something finally happens. Thus, to
add this algorithm to an existing TCP implementation usually requires
one line of code. As part of the send algorithm it is already necessary
to compute the useable window from the offered window. It is a simple
matter to add a line of code which, if the ratio is less than a certain
^L
9
percent, sets the useable window to zero. The results of SWS are so
devastating that no sender should be without this simple piece of
insurance.
5. Improved Acknowledgement Algorithms
In the beginning of this paper, an overly simplistic implementation
of TCP was described, which led to SWS. One of the characteristics of
this implementation was that the recipient of data sent a separate
acknowledgement for every segment that it received. This compulsive
acknowledgement was one of the causes of SWS, because each
acknowledgement provided some new useable window, but even if one of the
algorithms described above is used to eliminate SWS, overly frequent
acknowledgement still has a substantial problem, which is that it
greatly increases the processing time at the sender's end. Measurement
of TCP implementations, especially on large operating systems, indicate
that most of the overhead of dealing with a segment is not in the
processing at the TCP or IP level, but simply in the scheduling of the
handler which is required to deal with the segment. A steady dribble of
acknowledgements causes a high overhead in scheduling, with very little
to show for it. This waste is to be avoided if possible.
There are two reasons for prompt acknowledgement. One is to
prevent retransmission. We will discuss later how to determine whether
unnecessary retransmission is occurring. The other reason one
acknowledges promptly is to permit further data to be sent. However,
the previous section makes quite clear that it is not always desirable
to send a little bit of data, even though the receiver may have room for
^L
10
it. Therefore, one can state a general rule that under normal
operation, the receiver of data need not, and for efficiency reasons
should not, acknowledge the data unless either the acknowledgement is
intended to produce an increased useable window, is necessary in order
to prevent retransmission or is being sent as part of a reverse
direction segment being sent for some other reason. We will consider an
algorithm to achieve these goals.
Only the recipient of the data can control the generation of
acknowledgements. Once an acknowledgement has been sent from the
receiver back to the sender, the sender must process it. Although the
extra overhead is incurred at the sender's end, it is entirely under the
receiver's control. Therefore, we must now describe an algorithm which
occurs at the receiver's end. Obviously, the algorithm must have the
following general form; sometimes the receiver of data, upon processing
a segment, decides not to send an acknowledgement now, but to postpone
the acknowledgement until some time in the future, perhaps by setting a
timer. The peril of this approach is that on many large operating
systems it is extremely costly to respond to a timer event, almost as
costly as to respond to an incoming segment. Clearly, if the receiver
of the data, in order to avoid extra overhead at the sender end, spends
a great deal of time responding to timer interrupts, no overall benefit
has been achieved, for efficiency at the sender end is achieved by great
thrashing at the receiver end. We must find an algorithm that avoids
both of these perils.
The following scheme seems a good compromise. The receiver of data
^L
11
will refrain from sending an acknowledgement under certain
circumstances, in which case it must set a timer which will cause the
acknowledgement to be sent later. However, the receiver should do this
only where it is a reasonable guess that some other event will intervene
and prevent the necessity of the timer interrupt. The most obvious
event on which to depend is the arrival of another segment. So, if a
segment arrives, postpone sending an acknowledgement if both of the
following conditions hold. First, the push bit is not set in the
segment, since it is a reasonable assumption that there is more data
coming in a subsequent segment. Second, there is no revised window
information to be sent back.
This algorithm will insure that the timer, although set, is seldom
used. The interval of the timer is related to the expected inter-
segment delay, which is in turn a function of the particular network
through which the data is flowing. For the Arpanet, a reasonable
interval seems to be 200 to 300 milliseconds. Appendix A describes an
adaptive algorithm for measuring this delay.
The section on improved window algorithms described both a receiver
algorithm and a sender algorithm, and suggested that both should be
used. The reason for this is now clear. While the sender algorithm is
extremely simple, and useful as insurance, the receiver algorithm is
required in order that this improved acknowledgement strategy work. If
the receipt of every segment causes a new window value to be returned,
then of necessity an acknowledgement will be sent for every data
segment. When, according to the strategy of the previous section, the
^L
12
receiver determines to artificially reduce the offered window, that is
precisely the circumstance under which an acknowledgement need not be
sent. When the receiver window algorithm and the receiver
acknowledgement algorithm are used together, it will be seen that
sending an acknowledgement will be triggered by one of the following
events. First, a push bit has been received. Second, a temporary pause
in the data stream is detected. Third, the offered window has been
artificially reduced to one-half its actual value.
In the beginning of this section, it was pointed out that there are
two reasons why one must acknowledge data. Our consideration at this
point has been concerned only with the first, that an acknowledgement
must be returned as part of triggering the sending of new data. It is
also necessary to acknowledge whenever the failure to do so would
trigger retransmission by the sender. Since the retransmission interval
is selected by the sender, the receiver of the data cannot make a
precise determination of when the acknowledgement must be sent.
However, there is a rough rule the sender can use to avoid
retransmission, provided that the receiver is reasonably well behaved.
We will assume that sender of the data uses the optional algorithm
described in the TCP specification, in which the roundtrip delay is
measured using an exponential decay smoothing algorithm. Retransmission
of a segment occurs if the measured delay for that segment exceeds the
smoothed average by some factor. To see how retransmission might be
triggered, one must consider the pattern of segment arrivals at the
receiver. The goal of our strategy was that the sender should send off
^L
13
a number of segments in close sequence, and receive one acknowledgement
for the whole burst. The acknowledgement will be generated by the
receiver at the time that the last segment in the burst arrives at the
receiver. (To ensure the prompt return of the acknowledgement, the
sender could turn on the "push" bit in the last segment of the burst.)
The delay observed at the sender between the initial transmission of a
segment and the receipt of the acknowledgement will include both the
network transit time, plus the holding time at the receiver. The
holding time will be greatest for the first segments in the burst, and
smallest for the last segments in the burst. Thus, the smoothing
algorithm will measure a delay which is roughly proportional to the
average roundtrip delay for all the segments in the burst. Problems
will arise if the average delay is substantially smaller than the
maximum delay and the smoothing algorithm used has a very small
threshold for triggering retransmission. The widest variation between
average and maximum delay will occur when network transit time is
negligible, and all delay is processing time. In this case, the maximum
will be twice the average (by simple algebra) so the threshold that
controls retransmission should be somewhat more than a factor of two.
In practice, retransmission of the first segments of a burst has
not been a problem because the delay measured consists of the network
roundtrip delay, as well as the delay due to withholding the
acknowledgement, and the roundtrip tends to dominate except in very low
roundtrip time situations (such as when sending to one's self for test
purposes). This low roundtrip situation can be covered very simply by
including a minimum value below which the roundtrip estimate is not
permitted to drop.
^L
14
In our experiments with this algorithm, retransmission due to
faulty calculation of the roundtrip delay occurred only once, when the
parameters of the exponential smoothing algorithm had been misadjusted
so that they were only taking into account the last two or three
segments sent. Clearly, this will cause trouble since the last two or
three segments of any burst are the ones whose holding time at the
receiver is minimal, so the resulting total estimate was much lower than
appropriate. Once the parameters of the algorithm had been adjusted so
that the number of segments taken into account was approximately twice
the number of segments in a burst of average size, with a threshold
factor of 1.5, no further retransmission has ever been identified due to
this problem, including when sending to ourself and when sending over
high delay nets.
6. Conservative Vs. Optimistic Windows
According to the TCP specification, the offered window is presumed
to have some relationship to the amount of data which the receiver is
actually prepared to receive. However, it is not necessarily an exact
correspondence. We will use the term "conservative window" to describe
the case where the offered window is precisely no larger than the actual
buffering available. The drawback to conservative window algorithms is
that they can produce very low throughput in long delay situations. It
is easy to see that the maximum input of a conservative window algorithm
is one bufferfull every roundtrip delay in the net, since the next
bufferfull cannot be launched until the updated window/acknowledgement
information from the previous transmission has made the roundtrip.
^L
15
In certain cases, it may be possible to increase the overall
throughput of the transmission by increasing the offered window over the
actual buffer available at the receiver. Such a strategy we will call
an "optimistic window" strategy. The optimistic strategy works if the
network delivers the data to the recipient sufficiently slowly that it
can process the data fast enough to keep the buffer from overflowing.
If the receiver is faster than the sender, one could, with luck, permit
an infinitely optimistic window, in which the sender is simply permitted
to send full-speed. If the sender is faster than the receiver, however,
and the window is too optimistic, then some segments will cause a buffer
overflow, and will be discarded. Therefore, the correct strategy to
implement an optimistic window is to increase the window size until
segments start to be lost. This only works if it is possible to detect
that the segment has been lost. In some cases, it is easy to do,
because the segment is partially processed inside the receiving host
before it is thrown away. In other cases, overflows may actually cause
the network interface to be clogged, which will cause the segments to be
lost elsewhere in the net. It is inadvisable to attempt an optimistic
window strategy unless one is certain that the algorithm can detect the
resulting lost segments. However, the increase in throughput which is
possible from optimistic windows is quite substantial. Any systems with
small buffer space should seriously consider the merit of optimistic
windows.
The selection of an appropriate window algorithm is actually more
complicated than even the above discussion suggests. The following
considerations are not presented with the intention that they be
^L
16
incorporated in current implementations of TCP, but as background for
the sophisticated designer who is attempting to understand how his TCP
will respond to a variety of networks, with different speed and delay
characteristics. The particular pattern of windows and acknowledgements
sent from receiver to sender influences two characteristics of the data
being sent. First, they control the average data rate. Clearly, the
average rate of the sender cannot exceed the average rate of the
receiver, or long-term buffer overflow will occur. Second, they
influence the burstiness of the data coming from the sender. Burstiness
has both advantages and disadvantages. The advantage of burstiness is
that it reduces the CPU processing necessary to send the data. This
follows from the observed fact, especially on large machines, that most
of the cost of sending a segment is not the TCP or IP processing, but
the scheduling overhead of getting started.
On the other hand, the disadvantage of burstiness is that it may
cause buffers to overflow, either in the eventual recipient, which was
discussed above, or in an intermediate gateway, a problem ignored in
this paper. The algorithms described above attempts to strike a balance
between excessive burstiness, which in the extreme cases can cause
delays because a burst is not requested soon enough, and excessive
fragmentation of the data stream into small segments, which we
identified as Silly Window Syndrome.
Under conditions of extreme delay in the network, none of the
algorithms described above will achieve adequate throughput.
Conservative window algorithms have a predictable throughput limit,
^L
17
which is one windowfull per roundtrip delay. Attempts to solve this by
optimistic window strategies may cause buffer overflows due to the
bursty nature of the arriving data. A very sophisticated way to solve
this is for the receiver, having measured by some means the roundtrip
delay and intersegment arrival rate of the actual connection, to open
his window, not in one optimistic increment of gigantic proportion, but
in a number of smaller optimistic increments, which have been carefully
spaced using a timer so that the resulting smaller bursts which arrive
are each sufficiently small to fit into the existing buffers. One could
visualize this as a number of requests flowing backwards through the net
which trigger in return a number of bursts which flow back spaced evenly
from the sender to the receiver. The overall result is that the
receiver uses the window mechanism to control the burstiness of the
arrivals, and the average rate.
To my knowledge, no such strategy has been implemented in any TCP.
First, we do not normally have delays high enough to require this kind
of treatment. Second, the strategy described above is probably not
stable unless it is very carefully balanced. Just as buses on a single
bus route tend to bunch up, bursts which start out equally spaced could
well end up piling into each other, and forming the single large burst
which the receiver was hoping to avoid. It is important to understand
this extreme case, however, in order to understand the limits beyond
which TCP, as normally implemented, with either conservative or simple
optimistic windows can be expected to deliver throughput which is a
reasonable percentage of the actual network capacity.
^L
18
7. Conclusions
This paper describes three simple algorithms for performance
enhancement in TCP, one at the sender end and two at the receiver. The
sender algorithm is to refrain from sending if the useable window is
smaller than 25 percent of the offered window. The receiver algorithms
are first, to artificially reduce the offered window when processing new
data if the resulting reduction does not represent more than some
fraction, say 50 percent, of the actual space available, and second, to
refrain from sending an acknowledgment at all if two simple conditions
hold.
Either of these algorithms will prevent the worst aspects of Silly
Window Syndrome, and when these algorithms are used together, they will
produce substantial improvement in CPU utilization, by eliminating the
process of excess acknowledgements.
Preliminary experiments with these algorithms suggest that they
work, and work very well. Both the sender and receiver algorithms have
been shown to eliminate SWS, even when talking to fairly silly
algorithms at the other end. The Multics mailer, in particular, had
suffered substantial attacks of SWS while sending large mail to a number
of hosts. We believe that implementation of the sender side algorithm
has eliminated every known case of SWS detected in our mailer.
Implementation of the receiver side algorithm produced substantial
improvements of CPU time when Multics was the sending system. Multics
is a typical large operating system, with scheduling costs which are
large compared to the actual processing time for protocol handlers.
^L
19
Tests were done sending from Multics to a host which implemented the SWS
suppression algorithm, and which could either refrain or not from
sending acknowledgements on each segment. As predicted, suppressing the
return acknowledgements did not influence the throughput for large data
transfer at all, since the throttling effect was elsewhere. However,
the CPU time required to process the data at the Multics end was cut by
a factor of four (In this experiment, the bursts of data which were
being sent were approximately eight segments. Thus, the number of
acknowledgements in the two experiments differed by a factor of eight.)
An important consideration in evaluating these algorithms is that
they must not cause the protocol implementations to deadlock. All of
the recommendations in this document have the characteristic that they
suggest one refrain from doing something even though the protocol
specification permits one to do it. The possibility exists that if one
refrains from doing something now one may never get to do it later, and
both ends will halt, even though it would appear superficially that the
transaction can continue.
Formally, the idea that things continue to work is referred to as
"liveness". One of the defects of ad hoc solutions to performance
problems is the possibility that two different approaches will interact
to prevent liveness. It is believed that the algorithms described in
this paper are always live, and that is one of the reasons why there is
a strong advantage in uniform use of this particular proposal, except in
cases where it is explicitly demonstrated not to work.
The argument for liveness in these solutions proceeds as follows.
^L
20
First, the sender algorithm can only be stopped by one thing, a refusal
of the receiver to acknowledge sent data. As long as the receiver
continues to acknowledge data, the ratio of useable window to offered
window will approach one, and eventually the sender must continue to
send. However, notice that the receiver algorithm we have advocated
involves refraining from acknowledging. Therefore, we certainly do have
a situation where improper operation of this algorithm can prevent
liveness.
What we must show is that the receiver of the data, if it chooses
to refrain from acknowledging, will do so only for a short time, and not
forever. The design of the algorithm described above was intended to
achieve precisely this goal: whenever the receiver of data refrained
from sending an acknowledgement it was required to set a timer. The
only event that was permitted to clear that timer was the receipt of
another segment, which essentially reset the timer, and started it going
again. Thus, an acknowledgement will be sent as soon as no data has
been received. This has precisely the effect desired: if the data flow
appears to be disrupted for any reason, the receiver responds by sending
an up-to-date acknowledgement. In fact, the receiver algorithm is
designed to be more robust than this, for transmission of an
acknowledgment is triggered by two events, either a cessation of data or
a reduction in the amount of offered window to 50 percent of the actual
value. This is the condition which will normally trigger the
transmission of this acknowledgement.
^L
21
APPENDIX A
Dynamic Calculation of Acknowledgement Delay
The text suggested that when setting a timer to postpone the
sending of an acknowledgement, a fixed interval of 200 to 300
milliseconds would work properly in practice. This has not been
verified over a wide variety of network delays, and clearly if there is
a very slow net which stretches out the intersegment arrival time, a
fixed interval will fail. In a sophisticated TCP, which is expected to
adjust dynamically (rather than manually) to changing network
conditions, it would be appropriate to measure this interval and respond
dynamically. The following algorithm, which has been relegated to an
Appendix, because it has not been tested, seems sensible. Whenever a
segment arrives which does not have the push bit on in it, start a
timer, which runs until the next segment arrives. Average these
interarrival intervals, using an exponential decay smoothing function
tuned to take into account perhaps the last ten or twenty segments that
have come in. Occasionally, there will be a long interarrival period,
even for a segment which is does not terminate a piece of data being
pushed, perhaps because a window has gone to zero or some glitch in the
sender or the network has held up the data. Therefore, examine each
interarrival interval, and discard it from the smoothing algorithm if it
exceeds the current estimate by some amount, perhaps a ratio of two or
four times. By rejecting the larger intersegment arrival intervals, one
should obtain a smoothed estimate of the interarrival of segments inside
^L
22
a burst. The number need not be exact, since the timer which triggers
acknowledgement can add a fairly generous fudge factor to this without
causing trouble with the sender's estimate of the retransmission
interval, so long as the fudge factor is constant.
|