summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc970.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc970.txt')
-rw-r--r--doc/rfc/rfc970.txt513
1 files changed, 513 insertions, 0 deletions
diff --git a/doc/rfc/rfc970.txt b/doc/rfc/rfc970.txt
new file mode 100644
index 0000000..cbe03f0
--- /dev/null
+++ b/doc/rfc/rfc970.txt
@@ -0,0 +1,513 @@
+
+
+Network Working Group John Nagle
+Request for Comments: 970 FACC Palo Alto
+ December 1985
+
+ On Packet Switches With Infinite Storage
+
+
+Status of this Memo
+
+ The purpose of this RFC is to focus discussion on particular problems
+ in the ARPA-Internet and possible methods of solution. No proposed
+ solutions in this document are intended as standards for the
+ ARPA-Internet at this time. Rather, it is hoped that a general
+ consensus will emerge as to the appropriate solution to such
+ problems, leading eventually to the adoption of standards.
+ Distribution of this memo is unlimited.
+
+Abstract
+
+ Most prior work on congestion in datagram systems focuses on buffer
+ management. We find it illuminating to consider the case of a packet
+ switch with infinite storage. Such a packet switch can never run out
+ of buffers. It can, however, still become congested. The meaning of
+ congestion in an infinite-storage system is explored. We demonstrate
+ the unexpected result that a datagram network with infinite storage,
+ first-in-first-out queuing, at least two packet switches, and a
+ finite packet lifetime will, under overload, drop all packets. By
+ attacking the problem of congestion for the infinite-storage case, we
+ discover new solutions applicable to switches with finite storage.
+
+Introduction
+
+ Packet switching was first introduced in an era when computer data
+ storage was several orders of magnitude more expensive than it is
+ today. Strenuous efforts were made in the early days to build packet
+ switches with the absolute minimum of storage required for operation.
+ The problem of congestion control was generally considered to be one
+ of avoiding buffer exhaustion in the packet switches. We take a
+ different view here. We choose to begin our analysis by assuming the
+ availablity of infinite memory. This forces us to look at congestion
+ from a fresh perspective. We no longer worry about when to block or
+ which packets to discard; instead, we must think about how we want
+ the system to perform.
+
+ Pure datagram systems are especially prone to congestion problems.
+ The blocking mechanisms provided by virtual circuit systems are
+ absent. No fully effective solutions to congestion in pure datagram
+ systems are known. Most existing datagram systems behave badly under
+ overload. We will show that substantial progress can be made on the
+
+
+
+
+Nagle [Page 1]
+
+
+
+RFC 970 December 1985
+On Packet Switches With Infinite Storage
+
+
+ congestion control problem even for pure datagram systems when the
+ problem is defined as determining the order of packet transmission,
+ rather than the allocation of buffer space.
+
+A Packet Switch with Infinite Storage
+
+ Let us begin by describing a simple packet switch with infinite
+ storage. A switch has incoming and outgoing links. Each link has a
+ fixed data transfer rate. Not all links need have the same data
+ rate. Packets arrive on incoming links and are immediately assigned
+ an outgoing link by some routing mechanism not examined here. Each
+ outgoing link has a queue. Packets are removed from that queue and
+ sent on its outgoing link at the data rate for that link. Initially,
+ we will assume that queues are managed in a first in, first out
+ manner.
+
+ We assume that packets have a finite lifetime. In the DoD IP
+ protocol, packets have a time-to-live field, which is the number of
+ seconds remaining until the packet must be discarded as
+ uninteresting. As the packet travels through the network, this field
+ is decremented; if it becomes zero, the packet must be discarded.
+ The initial value for this field is fixed; in the DoD IP protocol,
+ this value is by default 15.
+
+ The time-to-live mechanism prevents queues from growing without
+ bound; when the queues become sufficiently long, packets will time
+ out before being sent. This places an upper bound on the total size
+ of all queues; this bound is determined by the total data rate for
+ all incoming links and the upper limit on the time-to-live.
+
+ However, this does not eliminate congestion. Let us see why.
+
+ Consider a simple node, with one incoming link and one outgoing link.
+ Assume that the packet arrival rate at a node exceeds the departure
+ rate. The queue length for the outgoing link will then grow until
+ the transit time through the queue exceeds the time-to-live of the
+ incoming packets. At this point, as the process serving the outgoing
+ link removes packets from the queue, it will sometimes find a packet
+ whose time-to-live field has been decremented to zero. In such a
+ case, it will discard that packet and will try again with the next
+ packet on the queue. Packets with nonzero time-to-live fields will
+ be transmitted on the outgoing link.
+
+ The packets that do get transmitted have nonzero time-to- live
+ values. But once the steady state under overload has been reached,
+ these values will be small, since the packet will have been on the
+ queue for slightly less than the maximum time-to-live. In fact, if
+
+
+Nagle [Page 2]
+
+
+
+RFC 970 December 1985
+On Packet Switches With Infinite Storage
+
+
+ the departure rate is greater than one per time-to-live unit, the
+ time-to-live of any forwarded packet will be exactly one. This
+ follows from the observation that if more than one packet is sent per
+ time-to-live unit, consecutive packets on the queue will have
+ time-to-live values that differ by no more than 1. Thus, as the
+ component of the packet switch that removes packets from the queue
+ and either sends them or discards them as expired operates, it will
+ either find packets with negative or zero time to live values (which
+ it will discard) or packets with values of 1, which it will send.
+
+ So, clearly enough, at the next node of the packet switching system,
+ the arriving packets will all have time-to-live values of 1. Since
+ we always decrement the time-to-live value by at least 1 in each
+ node, to guarantee that the time-to-live value decreases as the
+ packet travels through the network, we will in this case decrement it
+ to zero for each incoming packet and will then discard that packet.
+
+ We have thus shown that a datagram network with infinite storage,
+ first-in-first-out queuing, and a finite packet lifetime will, under
+ overload, drop all packets. This is a rather unexpected result. But
+ it is quite real. It is not an artifact of the infinite-buffer
+ assumption. The problem still occurs in networks with finite
+ storage, but the effects are less clearly seen. Datagram networks
+ are known to behave badly under overload, but analysis of this
+ behavior has been lacking. In the infinite-buffer case, the analysis
+ is quite simple, as we have shown, and we obtain considerable insight
+ into the problem.
+
+ One would expect this phenomenon to have been discovered previously.
+ But previous work on congestion control in packet switching systems
+ almost invariably focuses on buffer management. Analysis of the
+ infinite buffer case is apparently unique with this writer.
+
+ This result is directly applicable to networks with finite resources.
+ The storage required to implement a switch that can never run out of
+ buffers turns out to be quite reasonable. Let us consider a pure
+ datagram switch for an ARPANET-like network. For the case of a
+ packet switch with four 56Kb links, and an upper bound on the
+ time-to-live of 15 seconds, the maximum buffer space that could ever
+ be required is 420K bytes <1>. A switch provided with this rather
+ modest amount of memory need never drop a packet due to buffer
+ exhaustion.
+
+ This problem is not just theoretical. We have demonstrated it
+ experimentally on our own network, using a supermini with several
+ megabytes of memory as a switch. We now have experimental evidence
+ that the phenomenon described above occurs in practice. Our first
+
+
+Nagle [Page 3]
+
+
+
+RFC 970 December 1985
+On Packet Switches With Infinite Storage
+
+
+ experiment, using an Ethernet on one side of the switch and a 9600
+ baud line on the other, resulted in 916 IP datagrams queued in the
+ switch at peak. However, we were applying the load over a TCP
+ transport connection, and the transport connection timed out due to
+ excessive round trip time before the queue reached the time to live
+ limit, so we did not actually reach the stable state with the queue
+ at the maximum length as predicted by our analysis above. It is
+ interesting that we can force this condition from the position of a
+ user application atop the transport layer (TCP), and this deserves
+ further analysis.
+
+Interaction with Transport Protocols
+
+ We have thus far assumed packet sources that emit packets at a fixed
+ rate. This is a valid model for certain sources such as packet voice
+ systems. Systems that use transport protocols of the ISO TP4 or DoD
+ TCP class, however, ought to be better behaved. The key point is
+ that transport protocols used in datagram systems should behave in
+ such a way as to not overload the network, even where the network has
+ no means of keeping them from doing so. This is quite possible. In
+ a previous paper by this writer [1], the behavior of the TCP
+ transport protocol over a congested network is explored. We have
+ shown that a badly behaved transport protocol implementation can
+ cause serious harm to a datagram network, and discussed how such an
+ implementation ought to behave. In that paper we offered some
+ specific guidance on how to implement a well-behaved TCP, and
+ demonstrated that proper behavior could in some cases reduce network
+ load by an order of magnitude. In summary, the conclusions of that
+ paper are that a transport protocol, to be well behaved, should not
+ have a retransmit time shorter than the current round trip time
+ between the hosts involved, and that when informed by the network of
+ congestion, the transport protocol should take steps to reduce the
+ number of packets outstanding on the connection.
+
+ We reference our earlier work here to show that the network load
+ imposed by a transport protocol is not necessarily fixed by the
+ protocol specification. Some existing implementations of transport
+ protocols are well-behaved. Others are not. We have observed a wide
+ variability among existing TCP implementations. We have reason to
+ suspect that ISO TP4 implementations will be more uniform, given the
+ greater rigidity of the specification, but we see enough open space
+ in the TP4 standard to allow for considerable variability. We
+ suspect that there will be marginal TP4 implementations, from a
+ network viewpoint, just as there are marginal TCP implementations
+ today. These implementations will typically work quite well until
+ asked to operate over a heavily loaded network with significant
+ delays. Then we find out which ones are well-behaved.
+
+
+Nagle [Page 4]
+
+
+
+RFC 970 December 1985
+On Packet Switches With Infinite Storage
+
+
+ Even if all hosts are moderately well-behaved, there is potential for
+ trouble. Each host can normally obtain more network bandwidth by
+ transmitting more packets per unit time, since the first in, first
+ out strategy gives the most resources to the sender of the most
+ packets. But collectively, as the hosts overload the network, total
+ throughput drops. As shown above, throughput may drop to zero.
+ Thus, the optimal strategy for each host is strongly suboptimal for
+ the network as a whole.
+
+Game Theoretic Aspects of Network Congestion
+
+ This game-theory view of datagram networks leads us to a digression
+ on the stability of multi-player games. Systems in which the optimal
+ strategy for each player is suboptimal for all players are known to
+ tend towards the suboptimal state. The well-known prisoner's dilemma
+ problem in game theory is an example of a system with this property.
+ But a closer analogue is the tragedy of the commons problem in
+ economics. Where each individual can improve their own position by
+ using more of a free resource, but the total amount of the resource
+ degrades as the number of users increases, self-interest leads to
+ overload of the resource and collapse. Historically this analysis
+ was applied to the use of common grazing lands; it also applies to
+ such diverse resources as air quality and time-sharing systems. In
+ general, experience indicates that many-player systems with this type
+ of instability tend to get into serious trouble.
+
+ Solutions to the tragedy of the commons problem fall into three
+ classes: cooperative, authoritarian, and market solutions.
+ Cooperative solutions, where everyone agrees to be well-behaved, are
+ adequate for small numbers of players, but tend to break down as the
+ number of players increases. Authoritarian solutions are effective
+ when behavior can be easily monitored, but tend to fail if the
+ definition of good behavior is subtle. A market solution is possible
+ only if the rules of the game can be changed so that the optimal
+ strategy for players results in a situation that is optimal for all.
+ Where this is possible, market solutions can be quite effective.
+
+ The above analysis is generally valid for human players. In the
+ network case, we have the interesting situation that the player is a
+ computer executing a preprogrammed strategy. But this alone does not
+ insure good behavior; the strategy in the computer may be programmed
+ to optimize performance for that computer, regardless of network
+ considerations. A similar situation exists with automatic redialing
+ devices in telephony, where the user's equipment attempts to improve
+ performance over an overloaded network by rapidly redialing failed
+ calls. Since call-setup facilities are scarce resources in telephone
+ systems, this can seriously impact the network; there are countries
+
+
+Nagle [Page 5]
+
+
+
+RFC 970 December 1985
+On Packet Switches With Infinite Storage
+
+
+ that have been forced to prohibit such devices. (Brazil, for one).
+ This solution by administrative fiat is sometimes effective and
+ sometimes not, depending on the relative power of the administrative
+ authority and the users.
+
+ As transport protocols become more commercialized and competing
+ systems are available, we should expect to see attempts to tune the
+ protocols in ways that may be optimal from the point of view of a
+ single host but suboptimal from the point of view of the entire
+ network. We already see signs of this in the transport protocol
+ implementation of one popular workstation manufacturer.
+
+ So, to return to our analysis of a pure datagram internetwork, an
+ authoritarian solution would order all hosts to be "well-behaved" by
+ fiat; this might be difficult since the definition of a well-behaved
+ host in terms of its externally observed behavior is subtle. A
+ cooperative solution faces the same problem, along with the difficult
+ additional problem of applying the requisite social pressures in a
+ distributed system. A market solution requires that we make it pay
+ to be well-behaved. To do this, we will have to change the rules of
+ the game.
+
+Fairness in Packet Switching Systems
+
+ We would like to protect the network from hosts that are not
+ well-behaved. More specifically, we would like, in the presence of
+ both well-behaved and badly-behaved hosts, to insure that
+ well-behaved hosts receive better service than badly-behaved hosts.
+ We have devised a means of achieving this.
+
+ Let us consider a network that consists of high-bandwidth
+ pure-datagram local area networks without flow control (Ethernet and
+ most IEEE 802.x datagram systems are of this class, whether based on
+ carrier sensing or token passing), hosts connected to these local
+ area networks, and an interconnected wide area network composed of
+ packet switches and long-haul links. The wide area network may have
+ internal flow control, but has no way of imposing mandatory flow
+ control on the source hosts. The DoD Internet, Xerox Network Systems
+ internetworks, and the networks derived from them fit this model.
+
+ If any host on a local area network generates packets routed to the
+ wide area network at a rate greater than the wide area network can
+ absorb them, congestion will result in the packet switch connecting
+ the local and wide area networks. If the packet switches queue on a
+ strictly first in, first out basis, the badly behaved host will
+ interfere with the transmission of data by other, better-behaved
+ hosts.
+
+
+Nagle [Page 6]
+
+
+
+RFC 970 December 1985
+On Packet Switches With Infinite Storage
+
+
+ We introduce the concept of fairness. We would like to make our
+ packet switches fair; in other words, each source host should be able
+ to obtain an equal fraction of the resources of each packet switch.
+ We can do this by replacing the single first in, first out queue
+ associated with each outgoing link with multiple queues, one for each
+ source host in the entire network. We service these queues in a
+ round- robin fashion, taking one packet from each non-empty queue in
+ turn and transmitting the packets with positive time to live values
+ on the associated outgoing link, while dropping the expired packets.
+ Empty queues are skipped over and lose their turn.
+
+ This mechanism is fair; outgoing link bandwidth is parcelled out
+ equally amongst source hosts. Each source host with packets queued
+ in the switch for the specified outgoing link gets exactly one packet
+ sent on the outgoing link each time the round robin algorithm cycles.
+ So we have implemented a form of load-balancing.
+
+ We have also improved the system from a game theory point of view.
+ The optimal strategy for a given host is no longer to send as many
+ packets as possible. The optimal strategy is now to send packets at
+ a rate that keeps exactly one packet waiting to be sent in each
+ packet switch, since in this way the host will be serviced each time
+ the round-robin algorithm cycles, and the host's packets will
+ experience the minimum transit delay. This strategy is quite
+ acceptable from the network's point of view, since the length of each
+ queue will in general be between 1 and 2.
+
+ The hosts need advisory information from the network to optimize
+ their strategies. The existing Source Quench mechanism in DoD IP,
+ while minimal, is sufficient to provide this. The packet switches
+ should send a Source Quench message to a source host whenever the
+ number of packets in the queue for that source host exceeds some
+ small value, probably 2. If the hosts act to keep their traffic just
+ below the point at which Source Quench messages are received, the
+ network should run with mean queue lengths below 2 for each host.
+
+ Badly-behaved hosts can send all the datagrams they want, but will
+ not thereby increase their share of the network resources. All that
+ will happen is that packets from such hosts will experience long
+ transit times through the network. A sufficiently badly-behaved host
+ can send enough datagrams to push its own transit times up to the
+ time to live limit, in which case none of its datagrams will get
+ through. This effect will happen sooner with fair queuing than with
+ first in, first out queuing, because the badly- behaved host will
+ only obtain a share of the bandwidth inversely proportional to the
+ number of hosts using the packet switch at the moment. This is much
+
+
+
+Nagle [Page 7]
+
+
+
+RFC 970 December 1985
+On Packet Switches With Infinite Storage
+
+
+ less than the share it would have under the old system, where more
+ verbose hosts obtained more bandwidth. This provides a strong
+ incentive for badly-behaved hosts to improve their behavior.
+
+ It is worth noting that malicious, as opposed to merely
+ badly-behaved, hosts, can overload the network by using many
+ different source addresses in their datagrams, thereby impersonating
+ a large number of different hosts and obtaining a larger share of the
+ network bandwidth. This is an attack on the network; it is not likely
+ to happen by accident. It is thus a network security problem, and
+ will not be discussed further here.
+
+ Although we have made the packet switches fair, we have not thereby
+ made the network as a whole fair. This is a weakness of our
+ approach. The strategy outlined here is most applicable to a packet
+ switch at a choke point in a network, such as an entry node of a wide
+ area network or an internetwork gateway. As a strategy applicable to
+ an intermediate node of a large packet switching network, where the
+ packets from many hosts at different locations pass through the
+ switch, it is less applicable. The writer does not claim that the
+ approach described here is a complete solution to the problem of
+ congestion in datagram networks. However, it presents a solution to
+ a serious problem and a direction for future work on the general
+ case.
+
+Implementation
+
+ The problem of maintaining a separate queue for each source host for
+ each outgoing link in each packet switch seems at first to add
+ considerably to the complexity of the queuing mechanism in the packet
+ switches. There is some complexity involved, but the manipulations
+ are simpler than those required with, say, balanced binary trees.
+ One simple implementation involves providing space for pointers as
+ part of the header of each datagram buffer. The queue for each
+ source host need only be singly linked, and the queue heads (which
+ are the first buffer of each queue) need to be doubly linked so that
+ we can delete an entire queue when it is empty. Thus, we need three
+ pointers in each buffer. More elaborate strategies can be devised to
+ speed up the process when the queues are long. But the additional
+ complexity is probably not justified in practice.
+
+ Given a finite buffer supply, we may someday be faced with buffer
+ exhaustion. In such a case, we should drop the packet at the end of
+ the longest queue, since it is the one that would be transmitted
+ last. This, of course, is unfavorable to the host with the most
+ datagrams in the network, which is in keeping with our goal of
+ fairness.
+
+
+Nagle [Page 8]
+
+
+
+RFC 970 December 1985
+On Packet Switches With Infinite Storage
+
+
+Conclusion
+
+ By breaking away from packet switching's historical fixation on
+ buffer management, we have achieved some new insights into congestion
+ control in datagram systems and developed solutions for some known
+ problems in real systems. We hope that others, given this new
+ insight, will go on to make some real progress on the general
+ datagram congestion problem.
+
+References
+
+ [1] Nagle, J. "Congestion Control in IP/TCP Internetworks", ACM
+ Computer Communications Review, October 1984.
+
+Editor's Notes
+
+ <1> The buffer space required for just one 10Mb Ethernet with an
+ upper bound on the time-to-live of 255 is 318 million bytes.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Nagle [Page 9]
+