diff --git a/doc/rfc/rfc8289.txt b/doc/rfc/rfc8289.txt
new file mode 100644
index 0000000..dc58d3e
--- /dev/null
+++ b/doc/rfc/rfc8289.txt
@@ -0,0 +1,1403 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF) K. Nichols
+Request for Comments: 8289 Pollere, Inc.
+Category: Experimental V. Jacobson
+ISSN: 2070-1721 A. McGregor, Ed.
+ J. Iyengar, Ed.
+ Google
+ January 2018
+
+
+ Controlled Delay Active Queue Management
+
+Abstract
+
+ This document describes CoDel (Controlled Delay) -- a general
+ framework that controls bufferbloat-generated excess delay in modern
+ networking environments. CoDel consists of an estimator, a setpoint,
+ and a control loop. It requires no configuration in normal Internet
+ deployments.
+
+Status of This Memo
+
+ This document is not an Internet Standards Track specification; it is
+ published for examination, experimental implementation, and
+ evaluation.
+
+ This document defines an Experimental Protocol for the Internet
+ community. This document is a product of the Internet Engineering
+ Task Force (IETF). It represents the consensus of the IETF
+ community. It has received public review and has been approved for
+ publication by the Internet Engineering Steering Group (IESG). Not
+ all documents approved by the IESG are a candidate for any level of
+ Internet Standard; see Section 2 of RFC 7841.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ https://www.rfc-editor.org/info/rfc8289.
+
+Copyright Notice
+
+ Copyright (c) 2018 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (https://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+   1. Introduction
+   2. Conventions and Terms Used in This Document
+   3. Understanding the Building Blocks of Queue Management
+      3.1. Estimator
+      3.2. Target Setpoint
+      3.3. Control Loop
+   4. Overview of the CoDel AQM
+      4.1. Non-starvation
+      4.2. Setting INTERVAL
+      4.3. Setting TARGET
+      4.4. Use with Multiple Queues
+      4.5. Setting Up CoDel
+   5. Annotated Pseudocode for CoDel AQM
+      5.1. Data Types
+      5.2. Per-Queue State (codel_queue_t Instance Variables)
+      5.3. Constants
+      5.4. Enqueue Routine
+      5.5. Dequeue Routine
+      5.6. Helper Routines
+      5.7. Implementation Considerations
+   6. Further Experimentation
+   7. Security Considerations
+   8. IANA Considerations
+   9. References
+      9.1. Normative References
+      9.2. Informative References
+   Appendix A. Applying CoDel in the Data Center
+   Acknowledgments
+   Authors' Addresses
+
+1. Introduction
+
+ The "persistently full buffer" problem has been discussed in the IETF
+ community since the early 80s [RFC896]. The IRTF's End-to-End
+ Research Group called for the deployment of Active Queue Management
+ (AQM) to solve the problem in 1998 [RFC2309]. Despite this
+ awareness, the problem has only gotten worse as growth in memory
+ density per Moore's Law fueled an exponential increase in buffer pool
+ size. Efforts to deploy AQM have been frustrated by difficult
+ configuration and negative impact on network utilization. This
+ "bufferbloat" problem [BLOAT] has become increasingly important
+ throughout the Internet but particularly at the consumer edge. Queue
+ management has become more critical due to increased consumer use of
+ the Internet, mixing large video transactions with time-critical VoIP
+ and gaming.
+
+ An effective AQM remediates bufferbloat at a bottleneck while "doing
+ no harm" at hops where buffers are not bloated. However, the
+ development and deployment of AQM are frequently subject to
+ misconceptions about the cause of packet queues in network buffers.
+ Network buffers exist to absorb the packet bursts that occur
+ naturally in statistically multiplexed networks. Buffers helpfully
+ absorb the queues created by reasonable packet network behavior such
+ as short-term mismatches in traffic arrival and departure rates that
+ arise from upstream resource contention, transport conversation
+ startup transients, and/or changes in the number of conversations
+ sharing a link. Unfortunately, other less useful network behaviors
+ can cause queues to fill, and their effects are not nearly as benign.
+ Discussion of these issues and the reason why the solution is not
+ simply "smaller buffers" can be found in [RFC2309], [VANQ2006],
+ [REDL1998], and [CODEL2012]. To understand queue management, it is
+ critical to understand the difference between the necessary, useful
+ "good" queue and the counterproductive "bad" queue.
+
+ Several approaches to AQM have been developed over the past two
+ decades, but none have been widely deployed due to performance
+ problems. When designed with the wrong conceptual model for queues,
+ AQMs have limited operational range, require a lot of configuration
+ tweaking, and frequently impair rather than improve performance.
+ Learning from this past history, the CoDel approach is designed to
+ meet the following goals:
+
+ o Make AQM parameterless for normal operation, with no knobs for
+ operators, users, or implementers to adjust.
+
+ o Be able to distinguish "good" queue from "bad" queue and treat
+ them differently, that is, keep delay low while permitting
+ necessary bursts of traffic.
+
+ o Control delay while insensitive (or nearly so) to round-trip
+ delays, link rates, and traffic loads; this goal is to "do no
+ harm" to network traffic while controlling delay.
+
+ o Adapt to dynamically changing link rates with no negative impact
+ on utilization.
+
+ o Allow simple and efficient implementation (can easily span the
+ spectrum from low-end access points and home routers up to high-
+ end router hardware).
+
+ CoDel has five major differences from prior AQMs: use of the local
+ queue minimum to track congestion ("bad" queue), use of an efficient
+ single state variable representation of that tracked statistic, use
+ of packet sojourn time as the observed datum (rather than packets,
+ bytes, or rates), use of a mathematically determined setpoint derived
+ from maximizing network power [KLEIN81], and a modern state-space
+ controller.
+
+ CoDel configures itself based on a round-trip time metric that can be
+ set to 100 ms for the normal, terrestrial Internet. With no changes
+ to parameters, CoDel is expected to work across a wide range of
+ conditions, with varying links and the full range of terrestrial
+ round-trip times.
+
+ CoDel is easily adapted to multiple queue systems as shown by
+ [RFC8290]. Implementers and users SHOULD use the fq_codel multiple-
+ queue approach as it deals with many problems beyond the reach of an
+ AQM on a single queue.
+
+ CoDel was first published in [CODEL2012] and has been implemented in
+ the Linux kernel.
+
+ Note that while this document refers to dropping packets when
+ indicated by CoDel, it may be reasonable to ECN-mark packets instead.
+
+2. Conventions and Terms Used in This Document
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in
+ BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
+ capitals, as shown here.
+
+ The following terms are used in this document and are defined as
+ follows:
+
+ sojourn time: the amount of time a packet has spent in a particular
+ buffer, i.e., the time a packet departs the buffer minus the
+ time the packet arrived at the buffer. A packet can depart a
+ buffer via transmission or drop.
+
+ standing queue: a queue (in packets, bytes, or time delay) in a
+ buffer that persists for a "long" time, where "long" is on the
+ order of the longer round-trip times through the buffer, as
+ discussed in Section 4.2. A standing queue occurs when the
+ minimum queue over the "long" time is non-zero and is usually
+ tolerable and even desirable as long as it does not exceed some
+ target delay.
+
+ bottleneck bandwidth: the limiting bandwidth along a network path.
+
+3. Understanding the Building Blocks of Queue Management
+
+ At the heart of queue management is the notion of "good" queue and
+ "bad" queue and the search for ways to get rid of the "bad" queue
+ (which only adds delay) while preserving the "good" queue (which
+ provides for good utilization). This section explains queueing, both
+ good and bad, and covers the CoDel building blocks that can be used
+ to manage packet buffers to keep their queues in the "good" range.
+
+ Packet queues form in buffers facing bottleneck links, i.e., where
+ the line rate goes from high to low or where many links converge.
+ The well-known bandwidth-delay product (sometimes called "pipe size")
+ is the bottleneck's bandwidth multiplied by the sender-receiver-
+ sender round-trip delay; it is the amount of data that has to be in
+ transit between two hosts in order to run the bottleneck link at 100%
+ utilization. To explore how queues can form, consider a long-lived
+ TCP connection with a 25-packet window sending through a connection
+ with a bandwidth-delay product of 20 packets. After an initial burst
+ of packets, the connection will settle into a 5-packet (+/-1)
+ standing queue; this standing queue size is determined by the
+ mismatch between the window size and the pipe size and is unrelated
+ to the connection's sending rate. The connection has 25 packets in
+ flight at all times, but only 20 packets arrive at the destination
+ over a round-trip time. If the TCP connection has a 30-packet
+ window, the queue will be 10 packets with no change in sending rate.
+ Similarly, if the window is 20 packets, there will be no queue, but
+ the sending rate is the same. Nothing can be inferred about the
+ sending rate from the queue size, and any queue other than transient
+ bursts only creates delays in the network. The sender needs to
+ reduce the number of packets in flight rather than the sending rate.
+
+ In the above example, the 5-packet standing queue can be seen to
+ contribute nothing but delay to the connection and thus is clearly
+ "bad" queue. If, in our example, there is a single bottleneck link
+ and it is much slower than the link that feeds it (say, a high-speed
+ Ethernet link into a limited DSL uplink), then a 20-packet buffer at
+ the bottleneck might be necessary to temporarily hold the 20 packets
+ in flight to keep the bottleneck link's utilization high. The burst
+ of packets should drain completely (to 0 or 1 packets) within a
+ round-trip time, and this transient queue is "good" queue because it
+ allows the connection to keep the 20 packets in flight and the
+ bottleneck link to be fully utilized. In terms of the delay
+ experienced, the "good" queue goes away in about a round-trip time,
+ while "bad" queue hangs around for longer, causing delays.
+
+ Effective queue management detects "bad" queue while ignoring "good"
+ queue and takes action to get rid of the "bad" queue when it is
+ detected. The goal is a queue controller that accomplishes this
+ objective. To control a queue, we need three basic components:
+
+ o Estimator - To figure out what we've got.
+
+ o Target setpoint - To know what we want.
+
+ o Control loop - If what we've got isn't what we want, we need a way
+ to move it there.
+
+3.1. Estimator
+
+ The estimator both observes the queue and detects when "good" queue
+ turns to "bad" queue and vice versa. CoDel has two parts to its
+ estimator: what is observed as an indicator of queue and how the
+ observations are used to detect "good"/"bad" queue.
+
+ Queue length has been widely used as an observed indicator of
+ congestion and is frequently conflated with sending rate. Use of
+ queue length as a metric is sensitive to how and when the length is
+ observed. A high-speed arrival link to a buffer serviced at a much
+ lower rate can rapidly build up a queue that might disperse
+ completely or down to a single packet before a round-trip time has
+ elapsed. If the queue length is monitored at packet arrival (as in
+ original Random Early Detection (RED)) or departure time, every
+ packet will see a queue with one possible exception. If the queue
+ length itself is time sampled (as recommended in [REDL1998]), a truer
+ picture of the queue's occupancy can be gained at the expense of
+ considerable implementation complexity.
+
+ The use of queue length is further complicated in networks that are
+ subject to both short- and long-term changes in available link rate
+ (as in WiFi). Link rate drops can result in a spike in queue length
+ that should be ignored unless it persists. It is not the queue
+ length that should be controlled but the amount of excess delay
+ packets experience due to a persistent or standing queue, which means
+ that the packet sojourn time in the buffer is exactly what we want to
+ track. Tracking the packet sojourn times in the buffer observes the
+ actual delay experienced by each packet. Sojourn time allows queue
+ management to be independent of link rate, gives superior performance
+ to use of buffer size, and is directly related to user-visible
+ performance. It works regardless of line rate changes or link
+ sharing by multiple queues (which the individual queues may
+ experience as changing rates).
+
+ Consider a link shared by two queues with different priorities.
+ Packets that arrive at the high-priority queue are sent as soon as
+ the link is available, while packets in the other queue have to wait
+ until the high-priority queue is empty (i.e., a strict priority
+ scheduler). The number of packets in the high-priority queue might
+ be large, but the queue is emptied quickly, and the amount of time
+ each packet spends enqueued (the sojourn time) is not large. The
+ other queue might have a smaller number of packets, but packet
+ sojourn times will include the waiting time for the high-priority
+ packets to be sent. This makes the sojourn time a good sample of the
+ congestion that each separate queue is experiencing. This example
+ also shows how the metric of sojourn time is independent of the
+ number of queues or the service discipline used and is instead
+ indicative of congestion seen by the individual queues.
+
+ How can observed sojourn time be used to separate "good" queue from
+ "bad" queue? Although averages, especially of queue length, have
+ previously been widely used as an indicator of "bad" queue, their
+ efficacy is questionable. Consider the burst that disperses every
+ round-trip time. The average queue will be one-half the burst size,
+ though this might vary depending on when the average is computed and
+ the timing of arrivals. The average queue sojourn time would be one-
+ half the time it takes to clear the burst. The average then would
+ indicate a persistent queue where there is none. Instead of
+ averages, we recommend tracking the minimum sojourn time; then, if
+ there is one packet that has a zero sojourn time, there is no
+ persistent queue.
+
+ A persistent queue can be detected by tracking the (local) minimum
+ queue delay packets experience. To ensure that this minimum value
+ does not become stale, it has to have been experienced recently,
+ i.e., during an appropriate past time interval. This interval is the
+ maximum amount of time a minimum value is considered to be in effect
+ and is related to the amount of time it takes for the largest
+ expected burst to drain. Conservatively, this interval SHOULD be at
+ least a round-trip time to avoid falsely detecting a persistent queue
+ and not a lot more than a round-trip time to avoid delay in detecting
+ the persistent queue. This suggests that the appropriate interval
+ value is the maximum round-trip time of all the connections sharing
+ the buffer.
+
+ Note that the following key insight makes computation of the local
+ minimum efficient: it is sufficient to keep a single state variable
+ that indicates how long the minimum has been above or below the
+ target value rather than retaining all the local values to compute
+ the minimum, which leads to both storage and computational savings.
+ We use this insight in the pseudocode for CoDel later in the
+ document.
+
+ These two parts, use of sojourn time as the observed value and the
+ local minimum as the statistic to monitor queue congestion, are key
+ to CoDel's estimator building block. The local minimum sojourn time
+ provides an accurate and robust measure of standing queue and has an
+ efficient implementation. In addition, use of the minimum sojourn
+ time has important advantages in implementation. The minimum packet
+ sojourn can only be decreased when a packet is dequeued, which means
+ that all the work of CoDel can take place when packets are dequeued
+ for transmission and that no locks are needed in the implementation.
+ The minimum is the only statistic with this property.
+
+ A more detailed explanation with many pictures can be found in
+ [TSV84].
+
+3.2. Target Setpoint
+
+ Now that we have a robust way of detecting standing queue, we need a
+ target setpoint that tells us when to act. If the controller is set
+ to take action as soon as the estimator has a non-zero value, the
+ average drop rate will be maximized, which minimizes TCP goodput
+ [MACTCP1997]. Also, this policy results in no backlog over time (no
+ persistent queue), which negates much of the value of having a
+ buffer, since it maximizes the bottleneck link bandwidth lost due to
+ normal stochastic variation in packet interarrival time. We want a
+ target that maximizes utilization while minimizing delay. Early in
+ the history of packet networking, Kleinrock developed the analytic
+ machinery to do this using a quantity he called "power", which is the
+ ratio of a normalized throughput to a normalized delay [KLEIN81].
+
+
+ It is straightforward to derive an analytic expression for the
+ average goodput of a TCP conversation at a given round-trip time r
+ and target f (where f is expressed as a fraction of r). Reno TCP,
+ for example, yields:
+
+ goodput = r (3 + 6f - f^2) / (4 (1+f))
+
+ Since the peak queue delay is simply the product of f and r, power is
+ solely a function of f since the r's in the numerator and denominator
+ cancel:
+
+ power is proportional to (1 + 2f - 1/3 f^2) / (1 + f)^2
+
+ As Kleinrock observed, the best operating point (in terms of
+ bandwidth/delay trade-off) is the peak power point, since points off
+ the peak represent a higher cost (in delay) per unit of bandwidth.
+ The power vs. f curve for any Additive Increase Multiplicative
+ Decrease (AIMD) TCP is monotone decreasing. But the curve is very
+ flat for f < 0.1, followed by an increasing curvature with a knee
+ around f = 0.2, then a steep, almost linear fall off [TSV84]. Since
+ the previous equation showed that goodput is monotone increasing with
+ f, the best operating point is near the right edge of the flat top,
+ since that represents the highest goodput achievable for a negligible
+ increase in delay. However, since the r in the model is a
+ conservative upper bound, a target of 0.1r runs the risk of pushing
+ shorter RTT connections over the knee and giving them higher delay
+ for no significant goodput increase. Generally, a more conservative
+ target of 0.05r offers a good utilization vs. delay trade-off while
+ giving enough headroom to work well with a large variation in real
+ RTT.
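+
+ As a quick numerical check of the argument above, the following small
+ program (illustrative only; it merely evaluates the power expression
+ given earlier) shows the flat top below f = 0.1 and the accelerating
+ fall-off beyond the knee near f = 0.2:
+
+    #include <cstdio>
+
+    // Normalized power from the text: (1 + 2f - f^2/3) / (1 + f)^2
+    static double power(double f)
+    {
+        return (1.0 + 2.0 * f - f * f / 3.0) / ((1.0 + f) * (1.0 + f));
+    }
+
+    int main()
+    {
+        const double fs[] = { 0.01, 0.05, 0.10, 0.20, 0.30, 0.50 };
+        for (double f : fs)
+            std::printf("f = %.2f   power = %.4f\n", f, power(f));
+        return 0;   // prints ~0.9999, 0.9970, 0.9890, 0.9630, 0.9290, 0.8519
+    }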
+
+ As the above analysis shows, a very small standing queue gives close
+ to 100% utilization of the bottleneck link. While this result was
+ for Reno TCP, the derivation uses only properties that must hold for
+ any "TCP friendly" transport. We have verified by both analysis and
+ simulation that this result holds for Reno, Cubic, and Westwood
+ [TSV84]. This results in a particularly simple form for the target:
+ the ideal range for the permitted standing queue, or the target
+ setpoint, is between 5% and 10% of the TCP connection's RTT.
+
+ We used simulation to explore the impact when TCPs are mixed with
+ other traffic and with connections of different RTTs. Accordingly,
+ we experimented extensively with values in the 5-10% of RTT range
+ and, overall, used target values between 1 and 20 milliseconds for
+ RTTs from 30 to 500 ms and link bandwidths of 64 Kbps to 100 Mbps to
+ experimentally explore a value for the target that gives consistently
+ high utilization while controlling delay across a range of
+ bandwidths, RTTs, and traffic loads. Our results were notably
+ consistent with the mathematics above.
+
+ A congested (but not overloaded) CoDel link with traffic composed
+ solely or primarily of long-lived TCP flows will have a median delay
+ through the link that will tend to the target. For bursty traffic
+ loads and for overloaded conditions (where it is difficult or
+ impossible for all the arriving flows to be accommodated), the median
+ queues will be longer than the target.
+
+ The non-starvation drop inhibit feature dominates where the link rate
+ becomes very small. By inhibiting drops when there is less than an
+ (outbound link) MTU worth of bytes in the buffer, CoDel adapts to
+ very low bandwidth links, as shown in [CODEL2012].
+
+3.3. Control Loop
+
+ Section 3.1 describes a simple, reliable way to measure "bad"
+ (persistent) queue. Section 3.2 shows that TCP congestion control
+ dynamics gives rise to a target setpoint for this measure that's a
+ provably good balance between enhancing throughput and minimizing
+ delay. Section 3.2 also shows that this target is a constant
+ fraction of the same "largest average RTT" interval used to
+ distinguish persistent from transient queue. The only remaining
+ building block needed for a basic AQM is a "control loop" algorithm
+ to effectively drive the queueing system from any "persistent queue
+ above the target" state to a state where the persistent queue is
+ below the target.
+
+ Control theory provides a wealth of approaches to the design of
+ control loops. Most of classical control theory deals with the
+ control of linear, time-invariant, Single-Input-Single-Output (SISO)
+ systems. Control loops for these systems generally come from a well-
+ understood class known as Proportional-Integral-Derivative (PID)
+ controllers. Unfortunately, a queue is not a linear system, and an
+ AQM operates at the point of maximum non-linearity (where the output
+ link bandwidth saturates, so increased demand creates delay rather
+ than higher utilization). Output queues are also not time invariant
+ since traffic is generally a mix of connections that start and stop
+ at arbitrary times and that can have radically different behaviors
+ ranging from "open-loop" UDP audio/video to "closed-loop" congestion-
+ avoiding TCP. Finally, the constantly changing mix of connections
+ (which can't be converted to a single "lumped parameter" model
+ because of their transfer function differences) makes the system
+ Multi-Input-Multi-Output (MIMO), not SISO.
+
+ Since queueing systems match none of the prerequisites for a
+ classical controller, a better approach is a modern state-space
+ controller with "no persistent queue" and "has persistent queue"
+ states. Since Internet traffic mixtures change rapidly and
+ unpredictably, a noise- and error-tolerant adaptation algorithm like
+ stochastic gradient is a good choice. Since there's essentially no
+ information in the amount of persistent queue [TSV84], the adaptation
+ should be driven by how long it has persisted.
+
+ Consider the two extremes of traffic behavior: a single, open-loop
+ UDP video stream and a single, long-lived TCP bulk data transfer. If
+ the average bandwidth of the UDP video stream is greater than the
+ bottleneck link rate, the link's queue will grow, and the controller
+ will eventually enter "has persistent queue" state and start dropping
+ packets. Since the video stream is open loop, its arrival rate is
+ unaffected by drops, so the queue will persist until the average drop
+ rate is greater than the output bandwidth deficit (= average arrival
+ rate - average departure rate); the job of the adaptation algorithm
+ is to discover this rate. For this example, the adaptation could
+ consist of simply estimating the arrival and departure rates and then
+ dropping at a rate slightly greater than their difference, but this
+ class of algorithm won't work at all for the bulk data TCP stream.
+ TCP runs in closed-loop flow balance [TSV84], so its arrival rate is
+ almost always exactly equal to the departure rate -- the queue isn't
+ the result of a rate imbalance but rather a mismatch between the TCP
+ sender's window and the source-destination-source round-trip path
+ capacity (i.e., the connection's bandwidth-delay product). The
+ sender's TCP congestion avoidance algorithm will slowly increase the
+ send window (one packet per round-trip time) [RFC5681], which will
+ eventually cause the bottleneck to enter "has persistent queue"
+ state. But, since the average input rate is the same as the average
+ output rate, the rate deficit estimation that gave the correct drop
+ rate for the video stream would compute a drop rate of zero for the
+ TCP stream. However, if the output link drops one packet as it
+ enters "has persistent queue" state, when the sender discovers this
+ (via TCP's normal packet loss repair mechanisms), it will reduce its
+ window by a factor of two [RFC5681]; so, one round-trip time after
+ the drop, the persistent queue will go away.
+
+ If there were N TCP conversations sharing the bottleneck, the
+ controller would have to drop O(N) packets (one from each
+ conversation) to make all the conversations reduce their window to
+ get rid of the persistent queue. If the traffic mix consists of
+ short (<= bandwidth-delay product) conversations, the aggregate
+ behavior becomes more like the open-loop video example since each
+ conversation is likely to have already sent all its packets by the
+ time it learns about a drop so each drop has negligible effect on
+ subsequent traffic.
+
+ The controller does not know the number, duration, or kind of
+ conversations creating its queue, so it has to learn the appropriate
+ response. Since single drops can have a large effect if the degree
+ of multiplexing (the number of active conversations) is small,
+ dropping at too high a rate is likely to have a catastrophic effect
+ on throughput. Dropping at a low rate (< 1 packet per round-trip
+ time) and then increasing the drop rate slowly until the persistent
+ queue goes below the target is unlikely to overdrop and is guaranteed
+ to eventually dissipate the persistent queue. This stochastic
+ gradient learning procedure is the core of CoDel's control loop (the
+ gradient exists because a drop always reduces the (instantaneous)
+ queue, so an increasing drop rate always moves the system "down"
+ toward no persistent queue, regardless of traffic mix).
+
+ The "next drop time" is decreased in inverse proportion to the square
+ root of the number of drops since the drop state was entered, using
+ the well-known non-linear relationship of drop rate to throughput to
+ get a linear change in throughput [REDL1998][MACTCP1997].
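+
+ For illustration (assuming an INTERVAL of 100 ms), the spacing between
+ successive drops therefore shrinks as INTERVAL/sqrt(count):
+
+    count:      1      2      3      4      5      9     16
+    spacing:  100.0   70.7   57.7   50.0   44.7   33.3   25.0   (ms)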
+
+ Since the best rate to start dropping is at slightly more than one
+ packet per RTT, the controller's initial drop rate can be directly
+ derived from the estimator's interval. When the minimum sojourn time
+ first crosses the target and CoDel drops a packet, the earliest the
+ controller could see the effect of the drop is the round-trip time
+ (interval) + the local queue wait time (the target). If the next
+ drop happens any earlier than this time (interval + target), CoDel
+ will overdrop. In practice, the local queue waiting time tends to
+ vary, so making the initial drop spacing (i.e., the time to the
+ second drop) be exactly the minimum possible also leads to
+ overdropping. Analysis of simulation and real-world measured data
+ shows that the 75th percentile magnitude of this variation is less
+ than the target, so the initial drop spacing SHOULD be set to the
+ estimator's interval (i.e., initial drop spacing = interval) to
+ ensure that the controller has accounted for acceptable congestion
+ delays.
+
+ Use of the minimum statistic lets the controller be placed in the
+ dequeue routine with the estimator. This means that the control
+ signal (the drop) can be sent at the first sign of "bad" queue (as
+ indicated by the sojourn time) and that the controller can stop
+ acting as soon as the sojourn time falls below the target. Dropping
+ at dequeue has both implementation and control advantages.
+
+4. Overview of the CoDel AQM
+
+ CoDel was initially designed as a bufferbloat solution for the
+ consumer network edge. The CoDel building blocks are able to adapt
+ to different or time-varying link rates, be easily used with multiple
+ queues, have excellent utilization with low delay, and have a simple
+ and efficient implementation.
+
+ The CoDel algorithm described in the rest of this document uses two
+ key variables: TARGET, which is the controller's target setpoint
+ described in Section 3.2, and INTERVAL, which is the estimator's
+ interval described in Section 3.3.
+
+ The only setting CoDel requires is the INTERVAL value, and as 100 ms
+ satisfies that definition for normal Internet usage, CoDel can be
+ parameter-free for consumer use. To ensure that link utilization is
+ not adversely affected, CoDel's estimator sets TARGET to one that
+ optimizes power. CoDel's controller does not drop packets when the
+ drop would leave the queue empty or with fewer than a Maximum
+ Transmission Unit (MTU) worth of bytes in the buffer. Section 3.2
+ shows that an ideal TARGET is 5-10% of the connection round-trip time
+ (RTT). In the open terrestrial-based Internet, especially at the
+ consumer edge, we expect most unbloated RTTs to have a ceiling of 100
+ ms [CHARB2007]. Using this RTT gives a minimum TARGET of 5 ms and
+ INTERVAL of 100 ms. In practice, uncongested links will see sojourn
+ times below TARGET more often than once per RTT, so the estimator is
+ not overly sensitive to the value of INTERVAL.
+
+ When the estimator finds a persistent delay above TARGET, the
+ controller enters the drop state where a packet is dropped, and the
+ next drop time is set. As discussed in Section 3.3, the initial next
+ drop spacing is intended to be long enough to give the endpoints time
+ to react to the single drop and therefore SHOULD be set to a value
+ equal to INTERVAL. If the estimator's output falls below TARGET, the
+ controller cancels the next drop and exits the drop state. (The
+ controller is more sensitive than the estimator to an overly short
+ INTERVAL value, since an unnecessary drop would occur and lower link
+ utilization). If the next drop time is reached while the controller
+ is still in drop state, the packet being dequeued is dropped, and the
+ next drop time is recalculated.
+
+ Additional logic prevents re-entering the drop state too soon after
+ exiting it and resumes the drop state at a recent control level, if
+ one exists. This logic is described more precisely in the pseudocode
+ below. Additional work is required to determine the frequency and
+ importance of re-entering the drop state.
+
+ Note that CoDel AQM only enters its drop state when the local minimum
+ sojourn delay has exceeded TARGET for a time period long enough for
+ normal bursts to dissipate, ensuring that a burst of packets that
+ fits in the pipe will not be dropped.
+
+4.1. Non-starvation
+
+ CoDel's goals are to control delay with little or no impact on link
+ utilization and to be deployed on a wide range of link bandwidths,
+ including variable-rate links, without reconfiguration. To keep from
+ making drops when it would starve the output link, CoDel makes
+ another check before dropping to see if at least an MTU worth of
+ bytes remains in the buffer. If not, the packet SHOULD NOT be
+ dropped; therefore, CoDel exits the drop state. The MTU size can be
+ set adaptively to the largest packet seen so far or can be read from
+ the interface driver.
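+
+ A minimal sketch of this check, using the names from the pseudocode in
+ Section 5 (the adaptive-MTU update assumes the packet descriptor
+ carries a size field, which this document does not require):
+
+    // Optionally, at enqueue: track the largest packet seen so far.
+    if (pkt->size > maxpacket)
+        maxpacket = pkt->size;
+
+    // At dequeue, before a drop: if less than one MTU worth of bytes
+    // remains queued, do not drop; leave the drop state instead.
+    if (bytes() <= maxpacket) {
+        dropping_ = false;   // exit drop state rather than starve the link
+    }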
+
+4.2. Setting INTERVAL
+
+ The INTERVAL value is chosen to give endpoints time to react to a
+ drop without being so long that response times suffer. CoDel's
+ estimator, TARGET, and control loop all use INTERVAL. Understanding
+ their derivation shows that CoDel is the most sensitive to the value
+ of INTERVAL for single long-lived TCPs with a decreased sensitivity
+ for traffic mixes. This is fortunate, as RTTs vary across
+ connections and are not known a priori. The best policy seems to be
+ to use an INTERVAL value slightly larger than the RTT seen by most of
+ the connections using a link, a value that can be determined as the
+ largest RTT seen if the value is not an outlier (use of a 95-99th
+ percentile value should work). In practice, this value is not known
+ or measured (however, see Appendix A for an application where
+ INTERVAL is measured). An INTERVAL setting of 100 ms works well
+ across a range of RTTs from 10 ms to 1 second (excellent performance
+ is achieved in the range from 10 ms to 300 ms). For devices intended
+ for the normal terrestrial Internet, INTERVAL SHOULD have a value of
+ 100 ms. This will only cause overdropping where a long-lived TCP has
+ an RTT longer than 100 ms and there is little or no mixing with other
+ connections through the link.
+
+4.3. Setting TARGET
+
+ TARGET is the maximum acceptable persistent queue delay above which
+ CoDel is dropping or preparing to drop and below which CoDel will not
+ drop. TARGET SHOULD be set to 5 ms for normal Internet traffic.
+
+ The calculations of Section 3.2 show that the best TARGET value is
+ 5-10% of the RTT, with the low end of 5% preferred. Extensive
+ simulations exploring the impact of different TARGET values when used
+ with mixed traffic flows with different RTTs and different bandwidths
+ show that below a TARGET of 5 ms, utilization suffers for some
+ conditions and traffic loads; above 5 ms showed very little or no
+ improvement in utilization.
+
+ Sojourn times must remain above the TARGET for INTERVAL amount of
+ time in order to enter the drop state. Any packet with a sojourn
+ time less than TARGET will reset the time that the queue was last
+ below TARGET. Since Internet traffic has very dynamic
+ characteristics, the actual sojourn delay experienced by packets
+ varies greatly and is often less than TARGET unless the overload is
+ excessive. When a link is not overloaded, it is not a bottleneck,
+ and packet sojourn times will be small or nonexistent. In the usual
+ case, there are only one or two places along a path where packets
+ will encounter a bottleneck (usually at the edge), so the total
+ amount of queueing delay experienced by a packet should be less than
+ 10 ms even under extremely congested conditions. This net delay is
+ substantially lower than common current queueing delays on the
+ Internet that grow to orders of seconds [NETAL2010] [CHARB2007].
+
+ Regarding the roles of TARGET and the minimum-tracking INTERVAL, note
+ that TARGET SHOULD NOT be increased in response to lower layers that
+ have a bursty nature, where packets are transmitted for short periods
+ interspersed with idle periods where the link is waiting for
+ permission to send. CoDel's estimator will "see" the effective
+ transmission rate over an INTERVAL amount of time, and increasing
+ TARGET only leads to longer queue delays. On the other hand, where a
+ significant additional delay is added to the intrinsic RTT of most or
+ all packets due to the waiting time for a transmission, it is
+ necessary to increase INTERVAL by that extra delay. TARGET SHOULD
+ NOT be adjusted for such short-term bursts, but INTERVAL MAY need to
+ be adjusted if the path's intrinsic RTT changes.
+
+4.4. Use with Multiple Queues
+
+ CoDel is easily adapted to multiple queue systems. With other
+ approaches, there is always a question of how to account for the fact
+ that each queue receives less than the full link rate over time and
+ usually sees a varying rate over time. This is what CoDel excels at:
+ using a packet's sojourn time in the buffer completely circumvents
+ this problem. In a multiple-queue setting, a separate CoDel
+ algorithm runs on each queue, but each CoDel instance uses the packet
+ sojourn time the same way a single-queue CoDel does. Just as a
+ single-queue CoDel adapts to changing link bandwidths [CODEL2012], so
+ does a multiple-queue CoDel system. As an optimization to avoid
+ queueing more than necessary, when testing for queue occupancy before
+ dropping, the total occupancy of all queues sharing the same output
+ link SHOULD be used. This property of CoDel has been exploited in
+ fq_codel [RFC8290], which hashes on the packet header fields to
+ determine a specific bin, or sub-queue, for the packet and runs CoDel
+ on each bin or sub-queue, thus creating a well-mixed output flow and
+ obviating issues of reverse path flows (including "ack compression").
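+
+ A rough sketch of the enqueue side of such an arrangement is shown
+ below (hypothetical structure and names; [RFC8290] defines the actual
+ fq_codel scheduler, including how bins are serviced on dequeue):
+
+    codel_queue_t bins[NBINS];    // one CoDel instance per bin
+    uint32_t      total_bytes;    // backlog shared by all bins on the link
+
+    void fq_enqueue(packet_t* pkt)
+    {
+        uint32_t b = hash(pkt) % NBINS;   // hash of the packet header fields
+        total_bytes += pkt->size;         // used for the occupancy check
+        bins[b].enqueue(pkt);
+    }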
+
+4.5. Setting Up CoDel
+
+ CoDel is set for use in devices in the open Internet. An INTERVAL
+ setting of 100 ms is used, TARGET is set to 5% of INTERVAL, and the
+ initial drop spacing is also set to the INTERVAL. These settings
+ have been chosen so that a device, such as a small WiFi router, can
+ be sold without the need for any values to be made adjustable,
+ yielding a parameterless implementation. In addition, CoDel is
+ useful in environments with significantly different characteristics
+ from the normal Internet, for example, in switches used as a cluster
+ interconnect within a data center. Since cluster traffic is entirely
+ internal to the data center, round-trip latencies are low (typically
+ <100 us) but bandwidths are high (1-40 Gbps), so it's relatively easy
+ for the aggregation phase of a distributed computation (e.g., the
+ Reduce part of a Map/Reduce) to persistently fill and then overflow
+ the modest per-port buffering available in most high-speed switches.
+ A CoDel configured for this environment (TARGET and INTERVAL in the
+ microsecond rather than millisecond range) can minimize drops or
+ Explicit Congestion Notification (ECN) marks while keeping throughput
+ high and latency low.
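+
+ As an illustration only (these numbers are assumptions for a
+ hypothetical fabric with RTTs around 100 us, not recommendations, and
+ US2TIME is an assumed microsecond analog of MS2TIME):
+
+    time_t TARGET   = US2TIME(5);     // 5 us: ~5% of a ~100 us fabric RTT
+    time_t INTERVAL = US2TIME(100);   // 100 us: ~ the largest expected RTT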
+
+ Devices destined for these environments MAY use a different value for
+ INTERVAL, where suitable. If appropriate analysis indicates, the
+ TARGET MAY be set to some other value in the 5-10% range of INTERVAL, and
+ the initial drop spacing MAY be set to a value of 1.0 to 1.2 times
+ INTERVAL. But these settings will cause problems, such as
+ overdropping and low throughput, if used on the open Internet, so
+ devices that allow CoDel to be configured SHOULD default to the
+ Internet-appropriate values given in this document.
+
+5. Annotated Pseudocode for CoDel AQM
+
+ What follows is the CoDel algorithm in C++-like pseudocode. Since
+ CoDel adds relatively little new code to a basic tail-drop FIFO
+ queue, we have attempted to highlight just these additions by
+ presenting CoDel as a sub-class of a basic FIFO queue base class.
+ The reference code is included to aid implementers who wish to apply
+ CoDel to queue management as described here or to adapt its
+ principles to other applications.
+
+ Implementers are strongly encouraged to also look at the Linux kernel
+ version of CoDel -- a well-written, well-tested, real-world, C-based
+ implementation. As of this writing, it is available at
+ https://github.com/torvalds/linux/blob/master/net/sched/sch_codel.c.
+
+5.1. Data Types
+
+ time_t is an integer time value in units convenient for the system.
+ The code presented here uses 0 as a flag value to indicate "no time
+ set."
+
+ packet_t* is a pointer to a packet descriptor. We assume it has a
+ tstamp field capable of holding a time_t and that the field is
+ available for use by CoDel (it will be set by the enqueue routine and
+ used by the dequeue routine).
+
+ queue_t is a base class for queue objects (the parent class for
+ codel_queue_t objects). We assume it has enqueue() and dequeue()
+ methods that can be implemented in child classes. We assume it has a
+ bytes() method that returns the current queue size in bytes. This
+ can be an approximate value. The method is invoked in the dequeue()
+ method but shouldn't require a lock with the enqueue() method.
+
+ flag_t is a Boolean.
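+
+ For concreteness, one possible set of declarations matching these
+ assumptions is sketched below (illustrative only; a real
+ implementation would substitute its native packet and queue types;
+ the fixed-width types assume <cstdint>):
+
+    typedef int64_t  time_t;      // e.g., microseconds; 0 means "no time set"
+    typedef bool     flag_t;
+
+    typedef struct packet {
+        struct packet* next;      // FIFO linkage
+        uint32_t       size;      // packet length in bytes
+        time_t         tstamp;    // set by enqueue(), read by dequeue()
+    } packet_t;
+
+    class queue_t {               // base FIFO that CoDel subclasses
+      public:
+        virtual void      enqueue(packet_t* pkt);
+        virtual packet_t* dequeue();
+        uint32_t          bytes();   // current backlog in bytes (may be approximate)
+    };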
+
+5.2. Per-Queue State (codel_queue_t Instance Variables)
+
+ time_t first_above_time_ = 0; // Time to declare sojourn time above
+ // TARGET
+ time_t drop_next_ = 0; // Time to drop next packet
+ uint32_t count_ = 0; // Packets dropped in drop state
+ uint32_t lastcount_ = 0; // Count from previous iteration
+ flag_t dropping_ = false; // Set to true if in drop state
+
+5.3. Constants
+
+ time_t TARGET = MS2TIME(5); // 5 ms TARGET queue delay
+ time_t INTERVAL = MS2TIME(100); // 100 ms sliding-minimum window
+ u_int maxpacket = 512; // Maximum packet size in bytes
+ // (SHOULD use interface MTU)
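+
+ MS2TIME converts milliseconds into time_t units. If, for example,
+ time_t counts microseconds (an assumption, not a requirement of this
+ document), it could be as simple as:
+
+    #define MS2TIME(ms) ((time_t)(ms) * 1000)   // milliseconds -> microseconds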
+
+5.4. Enqueue Routine
+
+ All the work of CoDel is done in the dequeue routine. The only CoDel
+ addition to enqueue is putting the current time in the packet's
+ tstamp field so that the dequeue routine can compute the packet's
+ sojourn time. Note that packets arriving at a full buffer will be
+ dropped, but these drops are not counted towards CoDel's
+ computations.
+
+ void codel_queue_t::enqueue(packet_t* pkt)
+ {
+ pkt->tstamp = clock();
+ queue_t::enqueue(pkt);
+ }
+
+5.5. Dequeue Routine
+
+ This is the heart of CoDel. There are two branches based on whether
+ the controller is in drop state: (i) if the controller is in drop
+ state (that is, the minimum packet sojourn time is greater than
+ TARGET), then the controller checks if it is time to leave drop state
+ or schedules the next drop(s); or (ii) if the controller is not in
+ drop state, it determines if it should enter drop state and do the
+ initial drop.
+
+ packet_t* codel_queue_t::dequeue()
+ {
+ time_t now = clock();
+ dodequeue_result r = dodequeue(now);
+ uint32_t delta;
+
+ if (dropping_) {
+ if (! r.ok_to_drop) {
+ // sojourn time below TARGET - leave drop state
+ dropping_ = false;
+ }
+ // Time for the next drop. Drop current packet and dequeue
+ // next. If the dequeue doesn't take us out of dropping
+ // state, schedule the next drop. A large backlog might
+ // result in drop rates so high that the next drop should
+ // happen now, hence the 'while' loop.
+ while (now >= drop_next_ && dropping_) {
+ drop(r.p);
+ ++count_;
+ r = dodequeue(now);
+ if (! r.ok_to_drop) {
+ // leave drop state
+ dropping_ = false;
+ } else {
+ // schedule the next drop.
+ drop_next_ = control_law(drop_next_, count_);
+ }
+ }
+ // If we get here, we're not in drop state. The 'ok_to_drop'
+ // return from dodequeue means that the sojourn time has been
+ // above 'TARGET' for 'INTERVAL', so enter drop state.
+ } else if (r.ok_to_drop) {
+ drop(r.p);
+ r = dodequeue(now);
+ dropping_ = true;
+
+ // If min went above TARGET close to when it last went
+ // below, assume that the drop rate that controlled the
+ // queue on the last cycle is a good starting point to
+ // control it now. ('drop_next' will be at most 'INTERVAL'
+ // later than the time of the last drop, so 'now - drop_next'
+ // is a good approximation of the time from the last drop
+ // until now.) Implementations vary slightly here; this is
+ // the Linux version, which is more widely deployed and
+ // tested.
+ delta = count_ - lastcount_;
+ count_ = 1;
+ if ((delta > 1) && (now - drop_next_ < 16*INTERVAL))
+ count_ = delta;
+
+ drop_next_ = control_law(now, count_);
+ lastcount_ = count_;
+ }
+ return (r.p);
+ }
+
+5.6. Helper Routines
+
+ Since the degree of multiplexing and nature of the traffic sources is
+ unknown, CoDel acts as a closed-loop servo system that gradually
+ increases the frequency of dropping until the queue is controlled
+ (sojourn time goes below TARGET). This is the control law that
+ governs the servo. It has this form because of the sqrt(p)
+ dependence of TCP throughput on drop probability. Note that for
+ embedded systems or kernel implementation, the inverse sqrt can be
+ computed efficiently using only integer multiplication.
+
+ time_t codel_queue_t::control_law(time_t t, uint32_t count)
+ {
+ return t + INTERVAL / sqrt(count);
+ }
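+
+ For example, a fixed-point Newton-Raphson step for 1/sqrt(count) needs
+ only multiplies and shifts. The sketch below keeps the estimate in
+ Q0.32 fixed point and is illustrative only (the Linux implementation
+ uses a similar, but not identical, routine):
+
+    // One step of y' = y * (3 - count * y^2) / 2, with y stored as y * 2^32.
+    uint32_t inv_sqrt_newton(uint32_t y_fixed, uint32_t count)
+    {
+        uint64_t y  = y_fixed;
+        uint64_t y2 = (y * y) >> 32;                  // y^2, still scaled by 2^32
+        uint64_t t  = (3ULL << 32) - count * y2;      // 3 - count*y^2, scaled by 2^32
+        t >>= 2;                                      // pre-shift to avoid overflow
+        return (uint32_t)((y * t) >> (32 - 2 + 1));   // y*t/2, rescaled to 2^32
+    }
+
+ Applying one such step per drop, seeded with the previous estimate,
+ keeps the value close enough to 1/sqrt(count_) to replace the division
+ and sqrt() in control_law() on platforms without floating point.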
+
+ Next is a helper routine that does the actual packet dequeue and
+ tracks whether the sojourn time is above or below TARGET and, if
+ above, if it has remained above continuously for at least INTERVAL
+ amount of time. It returns two values: a Boolean indicating if it is
+ OK to drop (sojourn time above TARGET for at least INTERVAL) and the
+ packet dequeued.
+
+ typedef struct {
+ packet_t* p;
+ flag_t ok_to_drop;
+ } dodequeue_result;
+
+ dodequeue_result codel_queue_t::dodequeue(time_t now)
+ {
+ dodequeue_result r = { queue_t::dequeue(), false };
+ if (r.p == NULL) {
+ // queue is empty - we can't be above TARGET
+ first_above_time_ = 0;
+ return r;
+ }
+
+ // To span a large range of bandwidths, CoDel runs two
+ // different AQMs in parallel. One is based on sojourn time
+ // and takes effect when the time to send an MTU-sized
+ // packet is less than TARGET. The 1st term of the "if"
+ // below does this. The other is based on backlog and takes
+ // effect when the time to send an MTU-sized packet is >=
+ // TARGET. The goal here is to keep the output link
+ // utilization high by never allowing the queue to get
+ // smaller than the amount that arrives in a typical
+ // interarrival time (MTU-sized packets arriving spaced
+ // by the amount of time it takes to send such a packet on
+ // the bottleneck). The 2nd term of the "if" does this.
+ time_t sojourn_time = now - r.p->tstamp;
+ if (sojourn_time < TARGET || bytes() <= maxpacket) {
+ // went below - stay below for at least INTERVAL
+ first_above_time_ = 0;
+ } else {
+ if (first_above_time_ == 0) {
+ // just went above from below. if still above at
+ // first_above_time, will say it's ok to drop.
+ first_above_time_ = now + INTERVAL;
+ } else if (now >= first_above_time_) {
+ r.ok_to_drop = true;
+ }
+ }
+ return r;
+ }
+
+5.7. Implementation Considerations
+
+ time_t is an integer time value in units convenient for the system.
+ Resolution to at least a millisecond is required, and better
+ resolution is useful up to the minimum possible packet time on the
+ output link; 64- or 32-bit widths are acceptable but with 32 bits the
+ resolution should be no finer than 2^{-16} to leave enough dynamic
+ range to represent a wide range of queue waiting times. Narrower
+ widths also have implementation issues due to overflow (wrapping) and
+ underflow (limit cycles because of truncation to zero) that are not
+ addressed in this pseudocode.
+
+ Since CoDel requires relatively little per-queue state and no direct
+ communication or state sharing between the enqueue and dequeue
+ routines, it is relatively simple to add CoDel to almost any packet
+ processing pipeline, including forwarding engines based on
+ Application-Specific Integrated Circuits (ASICs) or Network
+ Processors (NPUs). One issue to consider is dodequeue()'s use of a
+ 'bytes()' function to determine the current queue size in bytes.
+ This value does not need to be exact. If the enqueue part of the
+ pipeline keeps a running count of the total number of bytes it has
+ put into the queue, and the dequeue routine keeps a running count of
+ the total bytes it has removed from the queue, 'bytes()' is simply
+ the difference between these two counters (32-bit counters should be
+ adequate). Enqueue has to update its counter once per packet queued,
+ but it does not matter when (before, during, or after the packet has
+ been added to the queue). The worst that can happen is a slight,
+ transient underestimate of the queue size, which might cause a drop
+ to be briefly deferred.
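+
+ A minimal sketch of that counter-based bytes() is shown below (the
+ member names are assumptions, not part of the pseudocode above):
+
+    uint32_t bytes_in_  = 0;   // incremented by the enqueue side, per packet
+    uint32_t bytes_out_ = 0;   // incremented by the dequeue side, per packet
+
+    uint32_t codel_queue_t::bytes()
+    {
+        // Unsigned subtraction remains correct across 32-bit wraparound.
+        return bytes_in_ - bytes_out_;
+    }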
+
+6. Further Experimentation
+
+ We encourage experimentation with the recommended values of TARGET
+ and INTERVAL for Internet settings. CoDel provides general,
+ efficient, parameterless building blocks for queue management that
+ can be applied to single or multiple queues in a variety of data
+ networking scenarios. CoDel's settings may be modified for other
+ special-purpose networking applications.
+
+7. Security Considerations
+
+ This document describes an active queue management algorithm for
+ implementation in networked devices. There are no known security
+ exposures associated with CoDel at this time.
+
+8. IANA Considerations
+
+ This document does not require actions by IANA.
+
+9. References
+
+9.1. Normative References
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119,
+ DOI 10.17487/RFC2119, March 1997,
+ <https://www.rfc-editor.org/info/rfc2119>.
+
+ [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
+ 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
+ May 2017, <https://www.rfc-editor.org/info/rfc8174>.
+
+9.2. Informative References
+
+ [BLOAT] Gettys, J. and K. Nichols, "Bufferbloat: Dark Buffers in
+ the Internet", Communications of the ACM, Volume 55, Issue
+ 1, DOI 10.1145/2063176.2063196, January 2012.
+
+ [CHARB2007]
+ Dischinger, M., Haeberlen, A., Gummadi, K., and S. Saroiu,
+ "Characterizing Residential Broadband Networks",
+ Proceedings of the 7th ACM SIGCOMM Conference on Internet
+ Measurement, DOI 10.1145/1298306.1298313, October 2007.
+
+ [CODEL2012]
+ Nichols, K. and V. Jacobson, "Controlling Queue Delay",
+ ACM Queue, Volume 10, Issue 5,
+ DOI 10.1145/2208917.2209336, May 2012.
+
+ [KLEIN81] Kleinrock, L. and R. Gail, "An Invariant Property of
+ Computer Network Power", Proceedings of the International
+ Conference on Communications, June 1981,
+ <http://www.lk.cs.ucla.edu/data/files/Gail/power.pdf>.
+
+ [MACTCP1997]
+ Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The
+ Macroscopic Behavior of the TCP Congestion Avoidance
+ Algorithm", ACM SIGCOMM Computer Communications
+ Review, Volume 27, Issue 3, pp. 67-82,
+ DOI 10.1145/263932.264023, July 1997.
+
+ [NETAL2010]
+ Kreibich, C., Weaver, N., Paxson, V., and B. Nechaev,
+ "Netalyzr: Illuminating the Edge Network", Proceedings of
+ the 10th ACM SIGCOMM Conference on Internet Measurement,
+ DOI 10.1145/1879141.1879173, November 2010.
+
+ [REDL1998] Nichols, K., Jacobson, V., and K. Poduri, "RED in a
+ Different Light", Technical Report, Cisco Systems,
+ September 1999, <http://citeseerx.ist.psu.edu/viewdoc/
+ summary?doi=10.1.1.22.9406>.
+
+ [RFC896] Nagle, J., "Congestion Control in IP/TCP Internetworks",
+ RFC 896, DOI 10.17487/RFC0896, January 1984,
+ <https://www.rfc-editor.org/info/rfc896>.
+
+ [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
+ S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
+ Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
+ S., Wroclawski, J., and L. Zhang, "Recommendations on
+ Queue Management and Congestion Avoidance in the
+ Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998,
+ <https://www.rfc-editor.org/info/rfc2309>.
+
+ [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
+ Control", RFC 5681, DOI 10.17487/RFC5681, September 2009,
+ <https://www.rfc-editor.org/info/rfc5681>.
+
+ [RFC8290] Hoeiland-Joergensen, T., McKenney, P., Taht, D.,
+ Gettys, J., and E. Dumazet, "The Flow Queue CoDel Packet
+ Scheduler and Active Queue Management Algorithm",
+ RFC 8290, DOI 10.17487/RFC8290, January 2018,
+ <https://www.rfc-editor.org/info/rfc8290>.
+
+ [TSV84] Jacobson, V., "CoDel", IETF 84, Transport Area Open
+ Meeting, July 2012,
+ <http://www.ietf.org/proceedings/84/slides/
+ slides-84-tsvarea-4.pdf>.
+
+ [VANQ2006] Jacobson, V., "A Rant on Queues", Talk at MIT Lincoln
+ Labs, Lexington, MA, July 2006,
+ <http://www.pollere.net/Pdfdocs/QrantJul06.pdf>.
+
+Appendix A. Applying CoDel in the Data Center
+
+ Nandita Dukkipati and her group at Google realized that the CoDel
+ building blocks could be applied to bufferbloat problems in data-
+ center servers, not just to Internet routers. The Linux CoDel
+ queueing discipline (qdisc) was adapted in three ways to tackle this
+ bufferbloat problem.
+
+ 1. The default CoDel action was modified to be a direct feedback
+ from qdisc to the TCP layer at dequeue. The direct feedback
+ simply reduces TCP's congestion window just as congestion control
+ would do in the event of drop. The scheme falls back to ECN
+ marking or packet drop if the TCP socket lock could not be
+ acquired at dequeue.
+
+ 2. Being located in the server makes it possible to monitor the
+ actual RTT to use as CoDel's interval rather than making a "best
+ guess" of RTT. The CoDel interval is dynamically adjusted by
+ using the maximum TCP round-trip time (RTT) of those connections
+ sharing the same qdisc/bucket. In particular, there is a history
+ entry of the maximum RTT experienced over the last second. As a
+ packet is dequeued, the RTT estimate is accessed from its TCP
+ socket. If the estimate is larger than the current CoDel
+ interval, the CoDel interval is immediately refreshed to the new
+ value. If the CoDel interval is not refreshed for over a second,
+ it is decreased to the history entry, and the process is
+ repeated. The use of the dynamic TCP RTT estimate allows the
+ interval to adapt to the actual maximum value currently seen and
+ thus lets the controller space its drop intervals appropriately (a
+ rough sketch of this procedure follows the list).
+
+ 3. Since the mathematics of computing the setpoint are invariant, a
+ TARGET of 5% of the RTT or CoDel interval was used here.
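+
+ The interval-refresh procedure of item 2 might look roughly like the
+ sketch below (hypothetical names and structure, written in the style
+ of the pseudocode in Section 5; it is not the actual implementation):
+
+    // Called as each packet is dequeued, with the RTT estimate taken
+    // from that packet's TCP socket.
+    void refresh_interval(time_t now, time_t flow_rtt)
+    {
+        if (flow_rtt > interval_) {
+            interval_ = flow_rtt;              // grow to the new maximum at once
+            last_refresh_ = now;
+        } else if (now - last_refresh_ > SEC2TIME(1)) {
+            interval_ = max_rtt_last_second_;  // decay to the recent maximum
+            last_refresh_ = now;
+        }
+    }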
+
+ Non-data packets were not dropped, as these are typically small and
+ sometimes critical control packets. Being located on the server,
+ there is no concern with misbehaving users as there would be on the
+ public Internet.
+
+ In several data-center workload benchmarks, which are typically
+ bursty, CoDel reduced the queueing latencies at the qdisc and thereby
+ improved the mean and 99th-percentile latencies from several tens of
+ milliseconds to less than one millisecond. The minimum tracking part
+ of the CoDel framework proved useful in disambiguating "good" queue
+ versus "bad" queue, which is particularly helpful in controlling
+ qdisc buffers that are inherently bursty because of TCP Segmentation
+ Offload (TSO).
+
+Acknowledgments
+
+ The authors thank Jim Gettys for the constructive nagging that made
+ us get the work "out there" before we thought it was ready. We thank
+ Dave Taht, Eric Dumazet, and the open source community for showing
+ the value of getting it "out there" and for making it real. We thank
+ Nandita Dukkipati for contributions to Appendix A and for comments
+ that helped to substantially improve this document. We thank the AQM
+ Working Group and the Transport Area Shepherd, Wes Eddy, for
+ patiently prodding this document all the way to publication as an
+ RFC.
+
+Authors' Addresses
+
+ Kathleen Nichols
+ Pollere, Inc.
+ PO Box 370201
+ Montara, CA 94037
+ United States of America
+
+ Email: nichols@pollere.com
+
+
+ Van Jacobson
+ Google
+
+ Email: vanj@google.com
+
+
+ Andrew McGregor (editor)
+ Google
+
+ Email: andrewmcgr@google.com
+
+
+ Janardhan Iyengar (editor)
+ Google
+
+ Email: jri@google.com
+