diff options
Diffstat (limited to 'doc/rfc/rfc889.txt')
-rw-r--r-- | doc/rfc/rfc889.txt | 684 |
1 files changed, 684 insertions, 0 deletions
diff --git a/doc/rfc/rfc889.txt b/doc/rfc/rfc889.txt new file mode 100644 index 0000000..1d478d0 --- /dev/null +++ b/doc/rfc/rfc889.txt @@ -0,0 +1,684 @@ +Network Working Group D.L. Mills +Request for Comments: 889 December 1983 + + + Internet Delay Experiments + +This memo reports on some measurement experiments and suggests some possible +improvements to the TCP retransmission timeout calculation. This memo is +both a status report on the measurements and advice to implementers of TCP. + +1. Introduction + + This memorandum describes two series of experiments designed to explore +the transmission characteristics of the Internet system. One series of +experiments was designed to determine the network delays with respect to +packet length, while the other was designed to assess the effectiveness of the +TCP retransmission-timeout algorithm specified in the standards documents. +Both sets of experiments were conducted during the October - November 1983 +time frame and used many hosts distributed throughout the Internet system. + + The objectives of these experiments were first to accumulate experimental +data on actual network paths that could be used as a benchmark of Internet +system performance, and second to apply these data to refine individual TCP +implementations and improve their performance. + + The experiments were done using a specially instrumented measurement host +called a Fuzzball, which consists of an LSI-11 running IP/TCP and various +application-layer protocols including TELNET, FTP and SMTP mail. Among the +various measurement packages is the original PING (Packet InterNet Groper) +program used over the last six years for numerous tests and measurements of +the Internet system and its client nets. This program contains facilities to +send various kinds of probe packets, including ICMP Echo messages, process the +reply and record elapsed times and other information in a data file, as well +as produce real-time snapshot histograms and traces. + + Following an experiment run, the data collected in the file were reduced +by another set of programs and plotted on a Peritek bit-map display with color +monitor. The plots have been found invaluable in the indentification and +understanding of the causes of netword glitches and other "zoo" phenomena. +Finally, summary data were extracted and presented in this memorandum. The +raw data files, including bit-map image files of the various plots, are +available to other experimenters upon request. + + The Fuzzballs and their local-net architecture, called DCN, have about +two-dozen clones scattered worldwide, including one (DCN1) at the Linkabit +Corporation offices in McLean, Virginia, and another at the Norwegian +Telecommunications Adminstration (NTA) near Oslo, Norway. The DCN1 Fuzzball +is connected to the ARPANET at the Mitre IMP by means of 1822 Error Control +Units operating over a 56-Kbps line. The NTA Fuzzball is connected to the +NTARE Gateway by an 1822 interface and then via VDH/HAP operating over a +9.6-Kbps line to SATNET at the Tanum (Sweden) SIMP. For most experiments +described below, these details of the local connectivity can be ignored, since +only relatively small delays are involved. + + + + +Internet Delay Experiments Page 2 +D.L. Mills + + + The remote test hosts were selected to represent canonical paths in the +Internet system and were scattered all over the world. They included some on +the ARPANET, MILNET, MINET, SATNET, TELENET and numerous local nets reachable +via these long-haul nets. As an example of the richness of the Internet +system connectivity and the experimental data base, data are included for +three different paths from the ARPANET-based measurement host to London hosts, +two via different satellite links and one via an undersea cable. + +2. Packet Length Versus Delay + + This set of experiments was designed to determine whether delays across +the Internet are significantly influenced by packet length. In cases where +the intrinsic propagation delays are high relative to the time to transmit an +individual packet, one would expect that delays would not be strongly affected +by packet length. This is the case with satellite nets, including SATNET and +WIDEBAND, but also with terrestrial nets where the degree of traffic +aggregation is high, so that the measured traffic is a small proportion of the +total traffic on the path. However, in cases where the intrinsic propagation +delays are low and the measured traffic represents the bulk of the traffic on +the path, quite the opposite would be expected. + + The objective of the experiments was to assess the degree to which TCP +performance could be improved by refining the retransmission-timeout algorithm +to include a dependency on packet length. Another objective was to determine +the nature of the delay characteristic versus packet length on tandem paths +spanning networks of widely varying architectures, including local-nets, +terrestrial long-haul nets and satellite nets. + +2.1. Experiment Design + + There were two sets of experiments to measure delays as a function of +packet length. One of these was based at DCN1, while the other was based at +NTA. All experiments used ICMP Echo/Reply messages with embedded timestamps. +A cycle consisted of sending an ICMP Echo message of specified length, waiting +for the corresponding ICMP Reply message to come back and recording the +elapsed time (normalized to one-way delay). An experiment run, resulting in +one line of the table below, consisted of 512 of these volleys. + + The length of each ICMP message was determined by a random-number +generator uniformly distributed between zero and 256. Lengths less than 40 +were rounded up to 40, which is the minimum datagram size for an ICMP message +containing timestamps and just happens to also be the minimum TCP segment +size. The maximum length was chosen to avoid complications due to +fragmentation and reassembly, since ICMP messages are not ordinarily +fragmented or reassembled by the gateways. + + The data collected were first plotted as a scatter diagram on a color +bit-map display. For all paths involving the ARPANET, this immediately +revealed two distinct characteristics, one for short (single-packet) messages +less than 126 octets in length and the other for long (multi-packet) messages + + + +Internet Delay Experiments Page 3 +D.L. Mills + + +longer than this. Linear regression lines were then fitted to each +characteristic with the results shown in the following table. (Only one +characteristic was assumed for ARPANET-exclusive paths.) The table shows for +each host the delays, in milliseconds, for each type of message along with a +rate computed on the basis of these delays. The "Host ID" column designates +the host at the remote end of the path, with a letter suffix used when +necessary to identify a particular run. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Internet Delay Experiments Page 4 +D.L. Mills + + +Host Single-packet Rate Multi-packet Rate Comments +ID 40 125 (bps) 125 256 (bps) +--------------------------------------------------------------------------- +DCN1 to nearby local-net hosts (calibration) +DCN5 9 13 366422 DMA 1822 +DCN8 14 20 268017 Ethernet +IMP17 22 60 45228 56K 1822/ECU +FORD1 93 274 9540 9600 DDCMP base +UMD1 102 473 4663 4800 synch +DCN6 188 550 4782 4800 DDCMP +FACC 243 770 3282 9600/4800 DDCMP +FOE 608 1917 1320 9600/14.4K stat mux + +DCN1 to ARPANET hosts and local nets +MILARP 61 105 15358 133 171 27769 MILNET gateway +ISID-L 166 263 6989 403 472 15029 low-traffic period +SCORE 184 318 5088 541 608 15745 low-traffic period +RVAX 231 398 4061 651 740 11781 Purdue local net +AJAX 322 578 2664 944 1081 7681 MIT local net +ISID-H 333 520 3643 715 889 6029 high-traffic period +BERK 336 967 1078 1188 1403 4879 UC Berkeley +WASH 498 776 2441 1256 1348 11379 U Washington + +DCN1 to MILNET/MINET hosts and local nets +ISIA-L 460 563 6633 1049 1140 11489 low-traffic period +ISIA-H 564 841 2447 1275 1635 2910 high-traffic period +BRL 560 973 1645 1605 1825 4768 BRL local net +LON 585 835 2724 1775 1998 4696 MINET host (London) +HAWAII 679 980 2257 1817 1931 9238 a long way off +OFFICE3 762 1249 1396 2283 2414 8004 heavily loaded host +KOREA 897 1294 1712 2717 2770 19652 a long, long way off + +DCN1 to TELENET hosts via ARPANET +RICE 1456 2358 754 3086 3543 2297 via VAN gateway + +DCN1 to SATNET hosts and local nets via ARPANET +UCL 1089 1240 4514 1426 1548 8558 UCL zoo +NTA-L 1132 1417 2382 1524 1838 3339 low-traffic period +NTA-H 1247 1504 2640 1681 1811 8078 high-traffic period + +NTA to SATNET hosts +TANUM 107 368 6625 9600 bps Tanum line +ETAM 964 1274 5576 Etam channel echo +GOONY 972 1256 6082 Goonhilly channel echo + + + + + + + + + +Internet Delay Experiments Page 5 +D.L. Mills + + +2.2 Analysis of Results + + The data clearly show a strong correlation between delay and length, with +the longest packets showing delays two to three times the shortest. On paths +via ARPANET clones the delay characteristic shows a stonger correlation with +length for single-packet messages than for multi-packet messages, which is +consistent with a design which favors low delays for short messages and high +throughputs for longer ones. + + Most of the runs were made during off-peak hours. In the few cases where +runs were made for a particular host during both on-peak and off-peak hours, +comparison shows a greater dependency on packet length than on traffic shift. + + TCP implementors should be advised that some dependency on packet length +may have to be built into the retransmission-timeout estimation algorithm to +insure good performance over lossy nets like SATNET. They should also be +advised that some Internet paths may require stupendous timeout intervals +ranging to many seconds for the net alone, not to mention additional delays on +host-system queues. + + I call to your attention the fact that the delays (at least for the +larger packets) from ARPANET hosts (e.g. DCN1) to MILNET hosts (e.g. ISIA) +are in the same ballpark as the delays to SATNET hosts (e.g. UCL)! I have +also observed that the packet-loss rates on the MILNET path are at present not +neglible (18 in 512 for ISIA-2). Presumably, the loss is in the gateways; +however, there may well be a host or two out there swamping the gateways with +retransmitted data and which have a funny idea of the "normal" timeout +interval. The recent discovery of a bug in the TOPS-20 TCP implementation, +where spurious ACKs were generated at an alarming rate, would seem to confirm +that suspicion. + +3. Retransmission-Timeout Algorithm + + One of the basic features of TCP which allow it to be used on paths +spanning many nets of widely varying delay and packet-loss characteristics is +the retranansmission-timeout algorithm, sometimes known as the "RSRE +Algorithm" for the original designers. The algorithm operates by recording +the time and initial sequence number when a segment is transmitted, then +computing the elapsed time for that sequence number to be acknowledged. There +are various degrees of sophistication in the implementation of the algorithm, +ranging from allowing only one such computation to be in progress at a time to +allowing one for each segment outstanding at a time on the connection. + + The retransmission-timeout algorithm is basically an estimation process. +It maintains an extimate of the current roundtrip delay time and updates it as +new delay samples are computed. The algorithm smooths these samples and then +establishes a timeout, which if exceeded causes a retransmission. The +selection of the parameters of this algorithm are vitally important in order +to provide effective data transmission and avoid abuse of the Internet system +by excessive retransmissions. I have long been suspicious of the parameters + + + +Internet Delay Experiments Page 6 +D.L. Mills + + +suggested in the specification and used in some implementations, especially in +cases involving long-delay paths involving lossy nets. The experiment was +designed to simulate the operation of the algorithm using data collected from +real paths involving some pretty leaky Internet plumbing. + +3.1. Experiment Design + + The experiment data base was constructed of well over a hundred runs +using ICMP Echo/Reply messages bounced off hosts scattered all over the world. +Most runs, including all those summarized here, consisted of 512 echo/reply +cycles lasting from several seconds to twenty minutes or so. Other runs +designed to detect network glitches lasted several hours. Some runs used +packets of constant length, while others used different lengths distributed +from 40 to 256 octets. The maximum length was chosen to avoid complications +fragmented or reassembled by the gateways. + + The object of the experiment was to simulate the packet delay +distribution seen by TCP over the paths measured. Only the network delay is +of interest here, not the queueing delays within the hosts themselves, which +can be considerable. Also, only a single packet was allowed in flight, so +that stress on the network itself was minimal. Some tests were conducted +during busy periods of network activity, while others were conducted during +quiet hours. + + The 512 data points collected during each run were processed by a program +which plotted on a color bit-map display each data point (x,y), where x +represents the time since initiation of the experiment the and y the measured +delay, normalized to the one-way delay. Then, the simulated +retransmission-timeout algorithm was run on these data and its computed +timeout plotted in the same way. The display immediately reveals how the +algorithm behaves in the face of varying traffic loads, network glitches, lost +packets and superfluous retransmissions. + + Each experiment run also produced summary statistics, which are +summarized in the table below. Each line includes the Host ID, which +identifies the run. The suffix -1 indicates 40-octet packets, -2 indicates +256-octet packets and no suffix indicates uniformly distributed lengths +between 40 and 256. The Lost Packets columns refer to instances when no ICMP +Reply message was received for thirty seconds after transmission of the ICMP +Echo message, indicating probable loss of one or both messages. The RTX +Packets columns refer to instances when the computed timeout is less than the +measured delay, which would result in a superfluous retransmission. For each +of these two types of packets the column indicates the number of instances +and the Time column indicates the total accumulated time required for the +recovery action. + + For reference purposes, the Mean column indicates the computed mean delay +of the echo/reply cycles, excluding those cycles involving packet loss, while +the CoV column indicates the coefficient of variation. Finally, the Eff + + + + +Internet Delay Experiments Page 7 +D.L. Mills + + +column indicates the efficiency, computed as the ratio of the total time +accumulated while sending good data to this time plus the lost-packet and +rtx-packet time. + + Complete sets of runs were made for each of the hosts in the table below +for each of several selections of algorithm parameters. The table itself +reflects values, selected as described later, believed to be a good compromise +for use on existing paths in the Internet system. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Internet Delay Experiments Page 8 +D.L. Mills + + +Host Total Lost Packets RTX Packets Mean CoV Eff +ID Time Time Time +------------------------------------------------------------------- +DCN1 to nearby local-net hosts (calibration) +DCN5 5 0 0 0 0 11 .15 1 +DCN8 8 0 0 0 0 16 .13 1 +IMP17 19 0 0 0 0 38 .33 1 +FORD1 86 0 0 1 .2 167 .33 .99 +UMD1 135 0 0 2 .5 263 .45 .99 +DCN6 177 0 0 0 0 347 .34 1 +FACC 368 196 222.1 6 9.2 267 1.1 .37 +FOE 670 3 7.5 21 73.3 1150 .69 .87 +FOE-1 374 0 0 26 61.9 610 .75 .83 +FOE-2 1016 3 16.7 10 47.2 1859 .41 .93 + +DCN1 to ARPANET hosts and local nets +MILARP 59 0 0 2 .5 115 .39 .99 +ISID 163 0 0 1 1.8 316 .47 .98 +ISID-1 84 0 0 2 1 163 .18 .98 +ISID-2 281 0 0 3 17 516 .91 .93 +ISID * 329 0 0 5 12.9 619 .81 .96 +SCORE 208 0 0 1 .8 405 .46 .99 +RVAX 256 1 1.3 0 0 499 .42 .99 +AJAX 365 0 0 0 0 713 .44 1 +WASH 494 0 0 2 2.8 960 .39 .99 +WASH-1 271 0 0 5 8 514 .34 .97 +WASH-2 749 1 9.8 2 17.5 1411 .4 .96 +BERK 528 20 50.1 4 35 865 1.13 .83 + +DCN1 to MILNET/MINET hosts and local nets +ISIA 436 4 7.4 2 15.7 807 .68 .94 +ISIA-1 197 0 0 0 0 385 .27 1 +ISIA-2 615 0 0 2 15 1172 .36 .97 +ISIA * 595 18 54.1 6 33.3 992 .77 .85 +BRL 644 1 3 1 1.9 1249 .43 .99 +BRL-1 318 0 0 4 13.6 596 .68 .95 +BRL-2 962 2 8.4 0 0 1864 .12 .99 +LON 677 0 0 3 11.7 1300 .51 .98 +LON-1 302 0 0 0 0 589 .06 1 +LON-2 1047 0 0 0 0 2044 .03 1 +HAWAII 709 4 12.9 3 18.5 1325 .55 .95 +OFFICE3 856 3 12.9 3 10.3 1627 .54 .97 +OFF3-1 432 2 4.2 2 6.9 823 .31 .97 +OFF3-2 1277 7 39 3 41.5 2336 .44 .93 +KOREA 1048 3 14.5 2 18.7 1982 .48 .96 +KOREA-1 506 4 8.6 1 2.2 967 .18 .97 +KOREA-2 1493 6 35.5 2 19.3 2810 .19 .96 + +DCN1 to TELENET hosts via ARPANET +RICE 677 2 6.8 3 12.1 1286 .41 .97 + + + +Internet Delay Experiments Page 9 +D.L. Mills + + +RICE-1 368 1 .1 3 2.3 715 .11 .99 +RICE-2 1002 1 4.4 1 9.5 1930 .19 .98 + +DCN1 to SATNET hosts and local nets via ARPANET +UCL 689 9 26.8 0 0 1294 .21 .96 +UCL-1 623 39 92.8 2 5.3 1025 .32 .84 +UCL-2 818 4 13.5 0 0 1571 .15 .98 +NTA 779 12 38.7 1 3.7 1438 .24 .94 +NTA-1 616 24 56.6 2 5.3 1083 .25 .89 +NTA-2 971 19 71.1 0 0 1757 .2 .92 + +NTA to SATNET hosts and local nets +TANUM 110 3 1.6 0 0 213 .41 .98 +GOONY 587 19 44.2 1 2.9 1056 .23 .91 +ETAM 608 32 76.3 1 3.1 1032 .29 .86 +UCL 612 5 12.6 2 8.5 1154 .24 .96 + +Note: * indicates randomly distributed packets during periods of high ARPANET +activity. The same entry without the * indicates randomly distributed packets +during periods of low ARPANET activity. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Internet Delay Experiments Page 10 +D.L. Mills + + +3.2 Discussion of Results + + It is immediately obvious from visual inspection of the bit-map display +that the delay distribution is more-or-less Poissonly distributed about a +relatively narrow range with important exceptions. The exceptions are +characterized by occasional spasms where one or more packets can be delayed +many times the typical value. Such glitches have been commonly noted before +on paths involving ARPANET and SATNET, but the true impact of their occurance +on the timeout algorithm is much greater than I expected. What commonly +happens is that the algorithm, when confronted with a short burst of +long-delay packets after a relatively long interval of well-mannered behavior, +takes much too long to adapt to the spasm, thus inviting many superfluous +retransmissions and leading to congestion. + + The incidence of long-delay bursts, or glitches, varied widely during the +experiments. Some of them were glitch-free, but most had at least one glitch +in 512 echo/reply volleys. Glitches did not seem to correlate well with +increases in baseline delay, which occurs as the result of traffic surges, nor +did they correlate well with instances of packet loss. I did not notice any +particular periodicity, such as might be expected with regular pinging, for +example; however, I did not process the data specially for that. + + There was no correction for packet length used in any of these +experiments, in spite of the results of the first set of experiments described +previously. This may be done in a future set of experiments. The algorithm +does cope well in the case of constant-length packets and in the case of +randomly distributed packet lengths between 40 and 256 octets, as indicated in +the table. Future experiments may involve bursts of short packets followed by +bursts of longer ones, so that the speed of adaptation of the algorithm can be +directly deterimend. + + One particularily interesting experiment involved the FOE host +(FORD-FOE), which is located in London and reached via a 14.4-Kbps undersea +cable and statistical multiplexor. The multiplexor introduces a moderate mean +delay, but with an extremely large delay dispersion. The specified +retransmission-timeout algorithm had a hard time with this circuit, as might +be expected; however, with the improvments described below, TCP performance +was acceptable. It is unlikely that many instances of such ornery circuits +will occur in the Internet system, but it is comforting to know that the +algorithm can deal effectively with them. + +3.3. Improvments to the Algorithm + + The specified retransmission-timeout algorithm, really a first-order +linear recursive filter, is characterized by two parameters, a weighting +factor F and a threshold factor G. For each measured delay sample R the delay +estimator E is updated: + + E = F*E + (1 - F)*R . + + + + +Internet Delay Experiments Page 11 +D.L. Mills + + +Then, if an interval equal to G*E expires after transmitting a packet, the +packet is retransmitted. The current TCP specification suggests values in the +range 0.8 to 0.9 for F and 1.5 to 2.0 for G. These values have been believed +reasonable up to now over ARPANET and SATNET paths. + + I found that a simple change to the algorithm made a worthwhile change in +the efficiency. The change amounts to using two values of F, one (F1) when R +< E in the expression above and the other (F2) when R >= E, with F1 > F2. The +effect is to make the algorithm more responsive to upward-going trends in +delay and less respnsive to downward-going trends. After a number of trials I +concluded that values of F1 = 15/16 and F2 = 3/4 (with G = 2) gave the best +all-around performance. The results on some paths (FOE, ISID, ISIA) were +better by some ten percent in efficiency, as compared to the values now used +in typical implementations where F = 7/8 and G = 2. The results on most paths +were better by five percent, while on a couple (FACC, UCL) the results were +worse by a few percent. + + There was no clear-cut gain in fiddling with G. The value G = 2 seemed +to represent the best overall compromise. Note that increasing G makes +superfluous retransmissions less likely, but increases the total delay when +packets are lost. Also, note that increasing F2 too much tends to cause +overshoot in the case of network glitches and leads to the same result. The +table above was constructed using F1 = 15/16, F2 = 3/4 and G = 2. + + Readers familiar with signal-detection theory will recognize my +suggestion as analogous to an ordinary peak-detector circuit. F1 represents +the discharge time-constant, while F2 represents the charge time-constant. G +represents a "squelch" threshold, as used in voice-operated switches, for +example. Some wag may be even go on to suggest a network glitch should be +called a netspurt. + + + + + + + + + + + + + + + + + + + + + + + +Internet Delay Experiments Page 12 +D.L. Mills + + +Appendix. Index of Test Hosts + +Name Address NIC Host Name +------------------------------------- +DCN1 to nearby local-net hosts (calibration) +DCN5 128.4.0.5 DCN5 +DCN8 128.4.0.8 DCN8 +IMP17 10.3.0.17 DCN-GATEWAY +FORD1 128.5.0.1 FORD1 +UMD1 128.8.0.1 UMD1 +DCN6 128.4.0.6 DCN6 +FACC 128.5.32.1 FORD-WDL1 +FOE 128.5.0.15 FORD-FOE + +DCN1 to ARPANET hosts and local nets +MILARP 10.2.0.28 ARPA-MILNET-GW +ISID 10.0.0.27 USC-ISID +SCORE 10.3.0.11 SU-SCORE +RVAX 128.10.0.2 PURDUE-MORDRED +AJAX 18.10.0.64 MIT-AJAX +WASH 10.0.0.91 WASHINGTON +BERK 10.2.0.78 UCB-VAX + +DCN1 to MILNET/MINET hosts and local nets +ISIA 26.3.0.103 USC-ISIA +BRL 192.5.21.6 BRL-VGR +LON 24.0.0.7 MINET-LON-EM +HAWAII 26.1.0.36 HAWAII-EMH +OFFICE3 26.2.0.43 OFFICE-3 +KOREA 26.0.0.117 KOREA-EMH + +DCN1 to TELENET hosts via ARPANET +RICE 14.0.0.12 RICE + +DCN1 to SATNET hosts and local nets via ARPANET +UCL 128.16.9.0 UCL-SAM +NTA 128.39.0.2 NTARE1 + +NTA to SATNET hosts and local nets +TANUM 4.0.0.64 TANUM-ECHO +GOONY 4.0.0.63 GOONHILLY-ECHO +ETAM 4.0.0.62 ETAM-ECHO + + + + + + + + + + + |