Diffstat (limited to 'doc/rfc/rfc7141.txt')
-rw-r--r-- doc/rfc/rfc7141.txt 2299
1 files changed, 2299 insertions, 0 deletions
diff --git a/doc/rfc/rfc7141.txt b/doc/rfc/rfc7141.txt
new file mode 100644
index 0000000..8751058
--- /dev/null
+++ b/doc/rfc/rfc7141.txt
@@ -0,0 +1,2299 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF) B. Briscoe
+Request for Comments: 7141 BT
+BCP: 41 J. Manner
+Updates: 2309, 2914 Aalto University
+Category: Best Current Practice February 2014
+ISSN: 2070-1721
+
+
+ Byte and Packet Congestion Notification
+
+Abstract
+
+ This document provides recommendations of best current practice for
+ dropping or marking packets using any active queue management (AQM)
+ algorithm, including Random Early Detection (RED), BLUE, Pre-
+ Congestion Notification (PCN), and newer schemes such as CoDel
+ (Controlled Delay) and PIE (Proportional Integral controller
+ Enhanced). We give three strong recommendations: (1) packet size
+ should be taken into account when transports detect and respond to
+ congestion indications, (2) packet size should not be taken into
+ account when network equipment creates congestion signals (marking,
+ dropping), and therefore (3) in the specific case of RED, the byte-
+ mode packet drop variant that drops fewer small packets should not be
+ used. This memo updates RFC 2309 to deprecate deliberate
+ preferential treatment of small packets in AQM algorithms.
+
+Status of This Memo
+
+ This memo documents an Internet Best Current Practice.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ BCPs is available in Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc7141.
+
+
+
+
+
+
+
+
+
+
+
+
+Briscoe & Manner Best Current Practice [Page 1]
+
+RFC 7141 Byte and Packet Congestion Notification February 2014
+
+
+Copyright Notice
+
+ Copyright (c) 2014 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4
+ 1.1. Terminology and Scoping . . . . . . . . . . . . . . . . . 6
+ 1.2. Example Comparing Packet-Mode Drop and Byte-Mode Drop . . 7
+ 2. Recommendations . . . . . . . . . . . . . . . . . . . . . . . 9
+ 2.1. Recommendation on Queue Measurement . . . . . . . . . . . 9
+ 2.2. Recommendation on Encoding Congestion Notification . . . 10
+ 2.3. Recommendation on Responding to Congestion . . . . . . . 11
+ 2.4. Recommendation on Handling Congestion Indications When
+ Splitting or Merging Packets . . . . . . . . . . . . . . 12
+ 3. Motivating Arguments . . . . . . . . . . . . . . . . . . . . 13
+ 3.1. Avoiding Perverse Incentives to (Ab)use Smaller Packets . 13
+ 3.2. Small != Control . . . . . . . . . . . . . . . . . . . . 14
+ 3.3. Transport-Independent Network . . . . . . . . . . . . . . 14
+ 3.4. Partial Deployment of AQM . . . . . . . . . . . . . . . . 16
+ 3.5. Implementation Efficiency . . . . . . . . . . . . . . . . 17
+ 4. A Survey and Critique of Past Advice . . . . . . . . . . . . 17
+ 4.1. Congestion Measurement Advice . . . . . . . . . . . . . . 18
+ 4.1.1. Fixed-Size Packet Buffers . . . . . . . . . . . . . . 18
+ 4.1.2. Congestion Measurement without a Queue . . . . . . . 19
+ 4.2. Congestion Notification Advice . . . . . . . . . . . . . 20
+ 4.2.1. Network Bias When Encoding . . . . . . . . . . . . . 20
+ 4.2.2. Transport Bias When Decoding . . . . . . . . . . . . 22
+ 4.2.3. Making Transports Robust against Control Packet
+ Losses . . . . . . . . . . . . . . . . . . . . . . . 23
+ 4.2.4. Congestion Notification: Summary of Conflicting
+ Advice . . . . . . . . . . . . . . . . . . . . . . . 24
+ 5. Outstanding Issues and Next Steps . . . . . . . . . . . . . . 25
+ 5.1. Bit-congestible Network . . . . . . . . . . . . . . . . . 25
+ 5.2. Bit- and Packet-Congestible Network . . . . . . . . . . . 26
+ 6. Security Considerations . . . . . . . . . . . . . . . . . . . 26
+ 7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 27
+ 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 28
+ 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 28
+ 9.1. Normative References . . . . . . . . . . . . . . . . . . 28
+ 9.2. Informative References . . . . . . . . . . . . . . . . . 29
+ Appendix A. Survey of RED Implementation Status . . . . . . . . 33
+ Appendix B. Sufficiency of Packet-Mode Drop . . . . . . . . . . 34
+ B.1. Packet-Size (In)Dependence in Transports . . . . . . . . 35
+ B.2. Bit-Congestible and Packet-Congestible Indications . . . 38
+ Appendix C. Byte-Mode Drop Complicates Policing Congestion
+ Response . . . . . . . . . . . . . . . . . . . . . . 39
+
+1. Introduction
+
+ This document provides recommendations of best current practice for
+ how we should correctly scale congestion control functions with
+ respect to packet size for the long term. It also recognises that
+ expediency may be necessary to deal with existing widely deployed
+ protocols that don't live up to the long-term goal.
+
+ When signalling congestion, the problem of how (and whether) to take
+ packet sizes into account has exercised the minds of researchers and
+ practitioners for as long as active queue management (AQM) has been
+ discussed. Indeed, one reason AQM was originally introduced was to
+ reduce the lock-out effects that small packets can have on large
+ packets in tail-drop queues. This memo aims to state the principles
+ we should be using and to outline how these principles will affect
+ future protocol design, taking into account pre-existing deployments.
+
+ The question of whether to take into account packet size arises at
+ three stages in the congestion notification process:
+
+ Measuring congestion: When a congested resource measures locally how
+ congested it is, should it measure its queue length in time,
+ bytes, or packets?
+
+ Encoding congestion notification into the wire protocol: When a
+ congested network resource signals its level of congestion, should
+ the probability that it drops/marks each packet depend on the size
+ of the particular packet in question?
+
+ Decoding congestion notification from the wire protocol: When a
+ transport interprets the notification in order to decide how much
+ to respond to congestion, should it take into account the size of
+ each missing or marked packet?
+
+ Consensus has emerged over the years concerning the first stage,
+ which Section 2.1 records in the RFC Series. In summary: If
+ possible, it is best to measure congestion by time in the queue;
+ otherwise, the choice between bytes and packets solely depends on
+ whether the resource is congested by bytes or packets.
+
+ The controversy is mainly around the last two stages: whether to
+ allow for the size of the specific packet notifying congestion i)
+ when the network encodes or ii) when the transport decodes the
+ congestion notification.
+
+ Currently, the RFC series is silent on this matter other than a paper
+ trail of advice referenced from [RFC2309], which conditionally
+ recommends byte-mode (packet-size dependent) drop [pktByteEmail].
+
+ Reducing the number of small packets dropped certainly has some
+ tempting advantages: i) it drops fewer control packets, which tend to
+ be small and ii) it makes TCP's bit rate less dependent on packet
+ size. However, there are ways of addressing these issues at the
+ transport layer, rather than reverse engineering network forwarding
+ to fix the problems.
+
+ This memo updates [RFC2309] to deprecate deliberate preferential
+ treatment of packets in AQM algorithms solely because of their size.
+ It recommends that packet size (1) should be taken into account when
+ transports detect and respond to congestion indications, but (2)
+ should not be taken into account when network equipment creates them.
+ This memo also adds to the congestion control principles enumerated
+ in BCP 41 [RFC2914].
+
+ In the particular case of Random Early Detection (RED), this means
+ that the byte-mode packet drop variant should not be used to drop
+ fewer small packets, because that creates a perverse incentive for
+ transports to use tiny segments, consequently also opening up a DoS
+ vulnerability. Fortunately, none of the RED implementers who
+ responded to our admittedly limited survey (Section 4.2.4) followed
+ the earlier advice to use byte-mode drop, so the position this memo
+ argues for already seems to be reflected in implementations.
+
+ However, at the transport layer, TCP congestion control is a widely
+ deployed protocol that doesn't scale with packet size (i.e., its
+ reduction in rate does not take into account the size of a lost
+ packet). To date, this hasn't been a significant problem because
+ most TCP implementations have been used with similar packet sizes.
+ But, as we design new congestion control mechanisms, this memo
+ recommends that we build in scaling with packet size rather than
+ assuming that we should follow TCP's example.
+
+ This memo continues as follows. First, it discusses terminology and
+ scoping. Section 2 gives concrete formal recommendations, followed
+ by motivating arguments in Section 3. We then critically survey the
+ advice given previously in the RFC Series and the research literature
+ (Section 4), referring to an assessment of whether or not this advice
+ has been followed in production networks (Appendix A). To wrap up,
+ outstanding issues are discussed that will need resolution both to
+ inform future protocol designs and to handle legacy AQM deployments
+ (Section 5). Then security issues are collected together in
+ Section 6 before conclusions are drawn in Section 7. The interested
+ reader can find discussion of more detailed issues on the theme of
+ byte vs. packet in the appendices.
+
+ This memo intentionally includes a non-negligible amount of material
+ on the subject. For the busy reader, Section 2 summarises the
+ recommendations for the Internet community.
+
+1.1. Terminology and Scoping
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in [RFC2119].
+
+ This memo applies to the design of all AQM algorithms, for example,
+ Random Early Detection (RED) [RFC2309], BLUE [BLUE02], Pre-Congestion
+ Notification (PCN) [RFC5670], Controlled Delay (CoDel) [CoDel], and
+ the Proportional Integral controller Enhanced (PIE) [PIE].
+ Throughout, RED is used as a concrete example because it is a widely
+ known and deployed AQM algorithm. There is no intention to imply
+ that the advice is any less applicable to the other algorithms, nor
+ that RED is preferred.
+
+ Congestion Notification: Congestion notification is a changing
+ signal that aims to communicate the probability that the network
+ resource(s) will not be able to forward the level of traffic load
+ offered (or that there is an impending risk that they will not be
+ able to).
+
+ The 'impending risk' qualifier is added, because AQM systems set a
+ virtual limit smaller than the actual limit to the resource, then
+ notify the transport when this virtual limit is exceeded in order
+ to avoid uncontrolled congestion of the actual capacity.
+
+ Congestion notification communicates a real number bounded by the
+ range [ 0 , 1 ]. This ties in with the most well-understood
+ measure of congestion notification: drop probability.
+
+ Explicit and Implicit Notification: The byte vs. packet dilemma
+ concerns congestion notification irrespective of whether it is
+ signalled implicitly by drop or explicitly using ECN [RFC3168] or
+ PCN [RFC5670]. Throughout this document, unless clear from the
+ context, the term 'marking' will be used to mean notifying
+ congestion explicitly, while 'congestion notification' will be
+ used to mean notifying congestion either implicitly by drop or
+ explicitly by marking.
+
+ Bit-congestible vs. Packet-congestible: If the load on a resource
+ depends on the rate at which packets arrive, it is called 'packet-
+ congestible'. If the load depends on the rate at which bits
+ arrive, it is called 'bit-congestible'.
+
+ Examples of packet-congestible resources are route look-up engines
+ and firewalls, because load depends on how many packet headers
+ they have to process. Examples of bit-congestible resources are
+ transmission links, radio power, and most buffer memory, because
+ the load depends on how many bits they have to transmit or store.
+ Some machine architectures use fixed-size packet buffers, so
+ buffer memory in these cases is packet-congestible (see
+ Section 4.1.1).
+
+ The path through a machine will typically encounter both packet-
+ congestible and bit-congestible resources. However, currently, a
+ design goal of network processing equipment such as routers and
+ firewalls is to size the packet-processing engine(s) relative to
+ the lines in order to keep packet processing uncongested, even
+ under worst-case packet rates with runs of minimum-size packets.
+ Therefore, packet congestion is currently rare (see Section 3.3 of
+ [RFC6077]), but there is no guarantee that it will not become more
+ common in the future.
+
+ Note that information is generally processed or transmitted with a
+ minimum granularity greater than a bit (e.g., octets). The
+ appropriate granularity for the resource in question should be
+ used, but for the sake of brevity we will talk in terms of bytes
+ in this memo.
+
+ Coarser Granularity: Resources may be congestible at coarser levels
+ of granularity than bits or packets; for instance, stateful
+ firewalls are flow-congestible and call-servers are session-
+ congestible. This memo focuses on congestion of connectionless
+ resources, but the same principles may be applicable for
+ congestion notification protocols controlling per-flow and per-
+ session processing or state.
+
+ RED Terminology: In RED, whether to use packets or bytes when
+ measuring queues is called, respectively, 'packet-mode queue
+ measurement' or 'byte-mode queue measurement'. And whether the
+ probability of dropping a particular packet is independent or
+ dependent on its size is called, respectively, 'packet-mode drop'
+ or 'byte-mode drop'. The terms 'byte-mode' and 'packet-mode'
+ should not be used without specifying whether they apply to queue
+ measurement or to drop.
+
+1.2. Example Comparing Packet-Mode Drop and Byte-Mode Drop
+
+ Taking RED as a well-known example algorithm, a central question
+ addressed by this document is whether to recommend RED's packet-mode
+ drop variant and to deprecate byte-mode drop. Table 1 compares how
+ packet-mode and byte-mode drop affect two flows of different size
+ packets. For each it gives the expected number of packets and of
+ bits dropped in one second. Each example flow runs at the same bit
+ rate of 48 Mbps, but one is broken up into small 60 byte packets and
+ the other into large 1,500 byte packets.
+
+ To keep up the same bit rate, in one second there are about 25 times
+ more small packets because they are 25 times smaller. As can be seen
+ from the table, the packet rate is 100,000 small packets versus 4,000
+ large packets per second (pps).
+
+ Parameter            Formula         Small packets Large packets
+ -------------------- --------------- ------------- -------------
+ Packet size          s/8                      60 B       1,500 B
+ Packet size          s                       480 b      12,000 b
+ Bit rate             x                     48 Mbps       48 Mbps
+ Packet rate          u = x/s              100 kpps        4 kpps
+
+ Packet-mode Drop
+ Pkt-loss probability p                        0.1%          0.1%
+ Pkt-loss rate        p*u                   100 pps         4 pps
+ Bit-loss rate        p*u*s                 48 kbps       48 kbps
+
+ Byte-mode Drop       MTU, M=12,000 b
+ Pkt-loss probability b = p*s/M              0.004%          0.1%
+ Pkt-loss rate        b*u                     4 pps         4 pps
+ Bit-loss rate        b*u*s               1.92 kbps       48 kbps
+
+ Table 1: Example Comparing Packet-Mode and Byte-Mode Drop
+
+ For packet-mode drop, we illustrate the effect of a drop probability
+ of 0.1%, which the algorithm applies to all packets irrespective of
+ size. Because there are 25 times more small packets in one second,
+ it naturally drops 25 times more small packets, that is, 100 small
+ packets but only 4 large packets. But if we count how many bits it
+ drops, there are 48,000 bits in 100 small packets and 48,000 bits in
+ 4 large packets -- the same number of bits of small packets as large.
+
+ The packet-mode drop algorithm drops any bit with the same
+ probability whether the bit is in a small or a large packet.
+
+ For byte-mode drop, again we use an example drop probability of 0.1%,
+ but only for maximum size packets (assuming the link maximum
+ transmission unit (MTU) is 1,500 B or 12,000 b). The byte-mode
+ algorithm reduces the drop probability of smaller packets
+ proportional to their size, making the probability that it drops a
+ small packet 25 times smaller at 0.004%. But there are 25 times more
+ small packets, so dropping them with 25 times lower probability
+ results in dropping the same number of packets: 4 drops in both
+ cases. The 4 small dropped packets contain 25 times fewer bits than
+ the 4 large dropped packets: 1,920 compared to 48,000.
+
+ The byte-mode drop algorithm drops any bit with a probability
+ proportionate to the size of the packet it is in.
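+
+ The arithmetic behind Table 1 can be reproduced with a short Python
+ sketch. The function and variable names are illustrative assumptions
+ and are not taken from any AQM implementation; only the formulae
+ (u = x/s and b = p*s/M) come from the table.
+

```python
def drop_stats(bit_rate_bps, size_bits, p, mtu_bits=None):
    """Return (packet-loss probability, packets lost/s, bits lost/s).

    With mtu_bits=None, packet-mode drop is modelled: every packet is
    dropped with probability p.  Otherwise byte-mode drop is modelled:
    the probability is scaled by size_bits / mtu_bits (b = p*s/M).
    """
    u = bit_rate_bps / size_bits                      # packet rate, u = x/s
    prob = p if mtu_bits is None else p * size_bits / mtu_bits
    return prob, prob * u, prob * u * size_bits


X = 48_000_000   # bit rate x: 48 Mbps
P = 0.001        # drop probability p: 0.1%
M = 12_000       # MTU M in bits (1,500 B)

for label, s in (("small", 480), ("large", 12_000)):
    for mode, mtu in (("packet-mode", None), ("byte-mode", M)):
        prob, pps, bps = drop_stats(X, s, P, mtu)
        print(f"{label:5} {mode:11} p={prob:.4%} {pps:6.1f} pps {bps:8.0f} bps")
```

+
+ Under these assumptions, packet-mode drop yields the same bit-loss
+ rate for both flows, while byte-mode drop yields the same packet-loss
+ rate for both flows, as in Table 1.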
+
+2. Recommendations
+
+ This section gives recommendations related to network equipment in
+ Sections 2.1 and 2.2, and we discuss the implications on transport
+ protocols in Sections 2.3 and 2.4.
+
+2.1. Recommendation on Queue Measurement
+
+ Ideally, an AQM would measure the service time of the queue to
+ measure congestion of a resource. However, service time can only be
+ measured as packets leave the queue, where it is not always expedient
+ to implement a full AQM algorithm. To predict the service time as
+ packets join the queue, an AQM algorithm needs to measure the length
+ of the queue.
+
+ In this case, if the resource is bit-congestible, the AQM
+ implementation SHOULD measure the length of the queue in bytes and,
+ if the resource is packet-congestible, the implementation SHOULD
+ measure the length of the queue in packets. Subject to the
+ exceptions below, no other choice makes sense, because the number of
+ packets waiting in the queue isn't relevant if the resource gets
+ congested by bytes and vice versa. For example, the length of the
+ queue into a transmission line would be measured in bytes, while the
+ length of the queue into a firewall would be measured in packets.
+
+ To avoid the pathological effects of tail drop, the AQM can then
+ transform this service time or queue length into the probability of
+ dropping or marking a packet (e.g., RED's piecewise linear function
+ between thresholds).
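+
+ As a minimal sketch of the two steps above, assuming hypothetical
+ threshold values and omitting the queue averaging that a real RED
+ implementation performs, the queue can be measured in the unit that
+ matches the resource and then mapped to a probability. Treating any
+ queue at or above the upper threshold as "drop or mark everything"
+ follows classic (non-gentle) RED and is an assumption of this sketch.
+

```python
def queue_length(packet_sizes_bytes, bit_congestible):
    """Measure the queue in bytes for a bit-congestible resource,
    otherwise in packets."""
    if bit_congestible:
        return sum(packet_sizes_bytes)
    return len(packet_sizes_bytes)


def red_probability(queue, min_th, max_th, max_p):
    """Piecewise-linear mapping from queue length to drop/mark
    probability.  queue, min_th and max_th must share one unit
    (all bytes or all packets)."""
    if queue < min_th:
        return 0.0
    if queue >= max_th:
        return 1.0
    return max_p * (queue - min_th) / (max_th - min_th)


# Hypothetical queue into a transmission line (bit-congestible).
queue = [1500, 60, 60, 1500, 1500, 1500]
length = queue_length(queue, bit_congestible=True)
print(length, red_probability(length, min_th=3000, max_th=9000, max_p=0.1))
```

+
+ Note that the resulting probability is applied to each arriving
+ packet without reference to that packet's own size.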
+
+ What this advice means for RED as a specific example:
+
+ 1. A RED implementation SHOULD use byte-mode queue measurement for
+ measuring the congestion of bit-congestible resources and packet-
+ mode queue measurement for packet-congestible resources.
+
+ 2. An implementation SHOULD NOT make it possible to configure the
+ way a queue measures itself, because whether a queue is bit-
+ congestible or packet-congestible is an inherent property of the
+ queue.
+
+ Exceptions to these recommendations might be necessary, for instance
+ where a packet-congestible resource has to be configured as a proxy
+ bottleneck for a bit-congestible resource in an adjacent box that
+ does not support AQM.
+
+ The recommended approach in less straightforward scenarios, such as
+ fixed-size packet buffers, resources without a queue, and buffers
+ comprising a mix of packet and bit-congestible resources, is
+ discussed in Section 4.1. For instance, Section 4.1.1 explains that
+ the queue into a line should be measured in bytes even if the queue
+ consists of fixed-size packet buffers, because the root cause of any
+ congestion is bytes arriving too fast for the line -- packets filling
+ buffers are merely a symptom of the underlying congestion of the
+ line.
+
+2.2. Recommendation on Encoding Congestion Notification
+
+ When encoding congestion notification (e.g., by drop, ECN, or PCN),
+ the probability that network equipment drops or marks a particular
+ packet to notify congestion SHOULD NOT depend on the size of the
+ packet in question. As the example in Section 1.2 illustrates, to
+ drop any bit with probability 0.1%, it is only necessary to drop
+ every packet with probability 0.1% without regard to the size of each
+ packet.
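+
+ This property can be checked with a small Python sketch. It computes
+ the expected fraction of bits dropped for an arbitrary, hypothetical
+ mix of packet sizes; the helper name and the mixes are assumptions
+ chosen purely for illustration.
+

```python
from fractions import Fraction


def bit_loss_fraction(packet_sizes_bits, drop_prob):
    """Expected fraction of bits dropped when a packet of size s is
    dropped with probability drop_prob(s)."""
    expected_lost = sum(drop_prob(s) * s for s in packet_sizes_bits)
    return expected_lost / sum(packet_sizes_bits)


p = Fraction(1, 1000)                       # 0.1% per-packet drop
mix_a = [480] * 100 + [12_000] * 4          # mostly small packets
mix_b = [480] * 3 + [4_000] * 10 + [12_000] * 50

# Packet-mode drop: the probability ignores the packet's size.
print(bit_loss_fraction(mix_a, lambda s: p))
print(bit_loss_fraction(mix_b, lambda s: p))
```

+
+ With a size-independent drop probability p, the expected bit-loss
+ fraction is p for any mix, because p factors out of the sum.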
+
+ This approach ensures the network layer offers sufficient congestion
+ information for all known and future transport protocols and also
+ ensures no perverse incentives are created that would encourage
+ transports to use inappropriately small packet sizes.
+
+ What this advice means for RED as a specific example:
+
+ 1. The RED AQM algorithm SHOULD NOT use byte-mode drop, i.e., it
+ ought to use packet-mode drop. Byte-mode drop is more complex,
+ it creates the perverse incentive to fragment segments into tiny
+ pieces and it is vulnerable to floods of small packets.
+
+ 2. If a vendor has implemented byte-mode drop, and an operator has
+ turned it on, it is RECOMMENDED that the operator use packet-mode
+ drop instead, after establishing if there are any implications on
+ the relative performance of applications using different packet
+ sizes. The unlikely possibility of some application-specific
+ legacy use of byte-mode drop is the only reason that all the
+ above recommendations on encoding congestion notification are not
+ phrased more strongly.
+
+ RED as a whole SHOULD NOT be switched off. Without RED, a tail-
+ drop queue biases against large packets and is vulnerable to
+ floods of small packets.
+
+ Note well that RED's byte-mode drop is completely orthogonal to
+ byte-mode queue measurement and should not be confused with it. If a
+ RED implementation has a byte-mode but does not specify what sort of
+ byte-mode, it is most probably byte-mode queue measurement, which is
+ fine. However, if in doubt, the vendor should be consulted.
+
+ A survey (Appendix A) showed that there appears to be little, if any,
+ installed base of the byte-mode drop variant of RED. This suggests
+ that deprecating byte-mode drop will have little, if any, incremental
+ deployment impact.
+
+2.3. Recommendation on Responding to Congestion
+
+ When a transport detects that a packet has been lost or congestion
+ marked, it SHOULD consider the strength of the congestion indication
+ as proportionate to the size in octets (bytes) of the missing or
+ marked packet.
+
+ In other words, when a packet indicates congestion (by being lost or
+ marked), it can be considered conceptually as if there is a
+ congestion indication on every octet of the packet, not just one
+ indication per packet.
+
+ To be clear, the above recommendation solely describes how a
+ transport should interpret the meaning of a congestion indication, as
+ a long term goal. It makes no recommendation on whether a transport
+ should act differently based on this interpretation. It merely aids
+ interoperability between transports, if they choose to make their
+ actions depend on the strength of congestion indications.
+
+ This definition will be useful as the IETF transport area continues
+ its programme of:
+
+ o updating host-based congestion control protocols to take packet
+ size into account, and
+
+ o making transports less sensitive to losing control packets like
+ SYNs and pure ACKs.
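+
+ The byte-weighted interpretation recommended in this section can be
+ sketched as follows. This is only an illustration of how a transport
+ might measure the strength of congestion indications; the function
+ name and input format are assumptions, and, as stated above, no
+ recommendation is made on how a transport should act on the result.
+

```python
def congestion_level(packets):
    """Fraction of octets carrying a congestion indication.

    packets: iterable of (size_bytes, indicated) pairs, where
    indicated is True if the packet was lost or congestion-marked.
    """
    total = 0
    indicated_octets = 0
    for size, indicated in packets:
        total += size
        if indicated:
            # Treat the indication as if it were on every octet.
            indicated_octets += size
    return indicated_octets / total if total else 0.0


# One indication on a large packet weighs more than one on a small one.
print(congestion_level([(1500, True), (60, False), (60, False)]))
print(congestion_level([(1500, False), (60, True), (60, False)]))
```
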
+
+ What this advice means for the case of TCP:
+
+ 1. If two TCP flows with different packet sizes are required to run
+ at equal bit rates under the same path conditions, this SHOULD be
+ done by altering TCP (Section 4.2.2), not network equipment (the
+ latter affects other transports besides TCP).
+
+ 2. If it is desired to improve TCP performance by reducing the
+ chance that a SYN or a pure ACK will be dropped, this SHOULD be
+ done by modifying TCP (Section 4.2.3), not network equipment.
+
+ To be clear, we are not recommending at all that TCPs under
+ equivalent conditions should aim for equal bit rates. We are merely
+ saying that anyone trying to do such a thing should modify their TCP
+ algorithm, not the network.
+
+ These recommendations are phrased as 'SHOULD' rather than 'MUST',
+ because there may be cases where expediency dictates that
+ compatibility with pre-existing versions of a transport protocol make
+ the recommendations impractical.
+
+2.4. Recommendation on Handling Congestion Indications When Splitting
+ or Merging Packets
+
+ Packets carrying congestion indications may be split or merged in
+ some circumstances (e.g., at an RTP / RTP Control Protocol (RTCP)
+ transcoder or during IP fragment reassembly). Splitting and merging
+ only make sense in the context of ECN, not loss.
+
+ The general rule to follow is that the number of octets in packets
+ with congestion indications SHOULD be equivalent before and after
+ merging or splitting. This is based on the principle used above:
+ that an indication of congestion on a packet can be considered as an
+ indication of congestion on each octet of the packet.
+
+ The above rule is not phrased with the word 'MUST' to allow the
+ following exception. There are cases in which pre-existing protocols
+ were not designed to conserve congestion-marked octets (e.g., IP
+ fragment reassembly [RFC3168] or loss statistics in RTCP receiver
+ reports [RFC3550] before ECN was added [RFC6679]). When any such
+ protocol is updated, it SHOULD comply with the above rule to conserve
+ marked octets. However, the rule may be relaxed if it would
+ otherwise become too complex to interoperate with pre-existing
+ implementations of the protocol.
+
+ One can think of a splitting or merging process as if all the
+ incoming congestion-marked octets increment a counter and all the
+ outgoing marked octets decrement the same counter. In order to
+ ensure that congestion indications remain timely, even the smallest
+ positive remainder in the conceptual counter should trigger the next
+ outgoing packet to be marked (causing the counter to go negative).
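+
+ The conceptual counter described above can be sketched in Python as
+ follows. The class and method names are hypothetical; the sketch
+ assumes only the behaviour stated in the text: marked incoming octets
+ increment the counter, marked outgoing octets decrement it, and any
+ positive remainder causes the next outgoing packet to be marked.
+

```python
class MarkedOctetCounter:
    """Conceptual counter for conserving congestion-marked octets
    across a splitting or merging process."""

    def __init__(self):
        self.balance = 0   # marked octets received but not yet sent on

    def on_input(self, size_octets, marked):
        """Account for an incoming packet."""
        if marked:
            self.balance += size_octets

    def on_output(self, size_octets):
        """Return True if the outgoing packet should be marked."""
        if self.balance > 0:
            # Even the smallest positive remainder triggers a mark,
            # which may drive the balance negative.
            self.balance -= size_octets
            return True
        return False


# Merging: one marked 500-octet packet and two unmarked ones are
# merged into a single 1,500-octet packet.
merge = MarkedOctetCounter()
for marked in (True, False, False):
    merge.on_input(500, marked)
print(merge.on_output(1500), merge.balance)
```

+
+ In the merging example, the outgoing packet is marked promptly and
+ the negative balance then suppresses marks until a further 1,000
+ marked octets have arrived, so marked octets are conserved over time.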
+
+3. Motivating Arguments
+
+ This section is informative. It justifies the recommendations made
+ in the previous section.
+
+3.1. Avoiding Perverse Incentives to (Ab)use Smaller Packets
+
+ Increasingly, it is being recognised that a protocol design must take
+ care not to cause unintended consequences by giving the parties in
+ the protocol exchange perverse incentives [Evol_cc] [RFC3426]. Given
+ there are many good reasons why larger path maximum transmission
+ units (PMTUs) would help solve a number of scaling issues, we do not
+ want to create any bias against large packets that is greater than
+ their true cost.
+
+ Consider a link where a given bit rate contributes the same amount to
+ bit congestion whether it is sent as fewer larger packets or as more
+ smaller packets. A protocol design that made larger packets more
+ likely to be dropped than smaller ones would be dangerous in both of
+ the following cases:
+
+ Malicious transports: A queue that gives an advantage to small
+ packets can be used to amplify the force of a flooding attack. By
+ sending a flood of small packets, the attacker can get the queue
+ to discard more large-packet traffic, allowing more attack traffic
+ to get through to cause further damage. Such a queue allows
+ attack traffic to have a disproportionately large effect on
+ regular traffic without the attacker having to do much work.
+
+ Non-malicious transports: Even if an application designer is not
+ actually malicious, if over time it is noticed that small packets
+ tend to go faster, designers will act in their own interest and
+ use smaller packets. Queues that give advantage to small packets
+ create an evolutionary pressure for applications or transports to
+ send at the same bit rate but break their data stream down into
+ tiny segments to reduce their drop rate. Encouraging a high
+ volume of tiny packets might in turn unnecessarily overload a
+ completely unrelated part of the system, perhaps more limited by
+ header processing than bandwidth.
+
+ Imagine that two unresponsive flows arrive at a bit-congestible
+ transmission link each with the same bit rate, say 1 Mbps, but one
+ consists of 1,500 B and the other 60 B packets, which are 25x
+ smaller. Consider a scenario where gentle RED [gentle_RED] is used,
+ along with the variant of RED we advise against, i.e., where the RED
+ algorithm is configured to adjust the drop probability of packets in
+ proportion to each packet's size (byte-mode packet drop). In this
+ case, RED aims to drop 25x more of the larger packets than the
+ smaller ones. Thus, for example, if RED drops 25% of the larger
+ packets, it will aim to drop 1% of the smaller packets (but, in
+ practice, it may drop more as congestion increases; see Appendix B.4
+ of [RFC4828]). Even though both flows arrive with the same bit rate,
+ the bit rate the RED queue aims to pass to the line will be 750 kbps
+ for the flow of larger packets but 990 kbps for the smaller packets
+ (because of rate variations, it will actually be a little less than
+ this target).
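+
+ The target rates in this example follow directly from scaling the
+ drop probability by packet size. A minimal Python sketch, in which
+ the function name and the 1,500 B reference size are assumptions
+ matching the example rather than any particular RED implementation:
+

```python
def passed_rate_bps(arrival_bps, size_bytes, p_max_size, max_size_bytes=1500):
    """Bit rate that byte-mode drop aims to pass for an unresponsive
    flow, given the drop probability applied to maximum-size packets."""
    drop = p_max_size * size_bytes / max_size_bytes   # scaled by size
    return arrival_bps * (1 - drop)


large = passed_rate_bps(1_000_000, 1500, 0.25)   # 25% drop
small = passed_rate_bps(1_000_000, 60, 0.25)     # 1% drop
print(f"{large / 1000:.0f} kbps vs {small / 1000:.0f} kbps")
```
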
+
+ Note that, although the byte-mode drop variant of RED amplifies
+ small-packet attacks, tail-drop queues amplify small-packet attacks
+ even more (see Security Considerations in Section 6). Wherever
+ possible, neither should be used.
+
+3.2. Small != Control
+
+ Dropping fewer control packets considerably improves performance. It
+ is tempting to drop small packets with lower probability in order to
+ improve performance, because many control packets tend to be smaller
+ (TCP SYNs and ACKs, DNS queries and responses, SIP messages, HTTP
+ GETs, etc.). However, we must not give control packets preference
+ purely by virtue of their smallness, otherwise it is too easy for any
+ data source to get the same preferential treatment simply by sending
+ data in smaller packets. Again, we should not create perverse
+ incentives to favour small packets rather than to favour control
+ packets, which is what we intend.
+
+ Just because many control packets are small does not mean all small
+ packets are control packets.
+
+ So, rather than fix these problems in the network, we argue that the
+ transport should be made more robust against losses of control
+ packets (see Section 4.2.3).
+
+3.3. Transport-Independent Network
+
+ TCP congestion control ensures that flows competing for the same
+ resource each maintain the same number of segments in flight,
+ irrespective of segment size. So under similar conditions, flows
+ with different segment sizes will get different bit rates.
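+
+ As a rough illustration, the following Python sketch computes the bit
+ rate of two flows that hold the same number of segments in flight.
+ The window of 10 segments and the 100 ms round-trip time are
+ hypothetical values, not taken from this memo.
+
```python
def bit_rate_bps(segments_in_flight, segment_bytes, rtt_seconds):
    """Bit rate of a flow that keeps a fixed number of segments in flight."""
    return segments_in_flight * segment_bytes * 8 / rtt_seconds


WINDOW = 10   # segments in flight (hypothetical)
RTT = 0.1     # round-trip time in seconds (hypothetical)

rate_1500 = bit_rate_bps(WINDOW, 1500, RTT)
rate_60 = bit_rate_bps(WINDOW, 60, RTT)

# With equal windows, the ratio of bit rates equals the ratio of segment sizes.
ratio = rate_1500 / rate_60
print(ratio)
```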
+
+ To counter this effect, it seems tempting not to follow our
+ recommendation, and instead for the network to bias congestion
+ notification by packet size in order to equalise the bit rates of
+
+
+
+Briscoe & Manner Best Current Practice [Page 14]
+
+RFC 7141 Byte and Packet Congestion Notification February 2014
+
+
+ flows with different packet sizes. However, in order to do this, the
+ queuing algorithm has to make assumptions about the transport, which
+ become embedded in the network. Specifically:
+
+ o The queuing algorithm has to assume how aggressively the transport
+ will respond to congestion (see Section 4.2.4). If the network
+ assumes the transport responds as aggressively as TCP NewReno, it
+ will be wrong for Compound TCP and differently wrong for Cubic
+ TCP, etc. To achieve equal bit rates, each transport then has to
+ guess what assumption the network made, and work out how to
+ replace this assumed aggressiveness with its own aggressiveness.
+
+ o Also, if the network biases congestion notification by packet
+ size, it has to assume a baseline packet size -- all proposed
+ algorithms use the local MTU (for example, see the byte-mode loss
+ probability formula in Table 1). Then if the non-Reno transports
+ mentioned above are trying to reverse engineer what the network
+ assumed, they also have to guess the MTU of the congested link.
+
+ Even though reducing the drop probability of small packets (e.g.,
+ RED's byte-mode drop) helps ensure TCP flows with different packet
+ sizes will achieve similar bit rates, we argue that this correction
+ should be made to any future transport protocols based on TCP, not to
+ the network in order to fix one transport, no matter how predominant
+ it is. Effectively, favouring small packets is reverse engineering
+ of network equipment around one particular transport protocol (TCP),
+ contrary to the excellent advice in [RFC3426], which asks designers
+ to question "Why are you proposing a solution at this layer of the
+ protocol stack, rather than at another layer?"
+
+ In contrast, if the network never takes packet size into account, the
+ transport can be certain it will never need to guess any assumptions
+ that the network has made. And the network passes two pieces of
+ information to the transport that are sufficient in all cases: i)
+ congestion notification on the packet and ii) the size of the packet.
+ Both are available for the transport to combine (by taking packet
+ size into account when responding to congestion) or not. Appendix B
+ checks that these two pieces of information are sufficient for all
+ relevant scenarios.
+
+ When the network does not take packet size into account, it allows
+ transport protocols to choose whether or not to take packet size into
+ account. However, if the network were to bias congestion
+ notification by packet size, transport protocols would have no
+ choice; those that did not take into account packet size themselves
+ would unwittingly become dependent on packet size, and those that
+ already took packet size into account would end up taking it into
+ account twice.
+
+
+
+Briscoe & Manner Best Current Practice [Page 15]
+
+RFC 7141 Byte and Packet Congestion Notification February 2014
+
+
+3.4. Partial Deployment of AQM
+
+ In overview, the argument in this section runs as follows:
+
+ o Because the network does not and cannot always drop packets in
+ proportion to their size, it shouldn't be given the task of making
+ drop signals depend on packet size at all.
+
+ o Transports on the other hand don't always want to make their rate
+ response proportional to the size of dropped packets, but if they
+ want to, they always can.
+
+ The argument is similar to the end-to-end argument that says "Don't
+ do X in the network if end systems can do X by themselves, and they
+ want to be able to choose whether to do X anyway". Actually the
+ following argument is stronger; in addition it says "Don't give the
+ network task X that could be done by the end systems, if X is not
+ deployed on all network nodes, and end systems won't be able to tell
+ whether their network is doing X, or whether they need to do X
+ themselves." In this case, the X in question is "making the response
+ to congestion depend on packet size".
+
+ We will now re-run this argument reviewing each step in more depth.
+ The argument applies solely to drop, not to ECN marking.
+
+ A queue drops packets for either of two reasons: a) to signal to host
+ congestion controls that they should reduce the load and b) because
+ there is no buffer left to store the packets. Active queue
+ management tries to use drops as a signal for hosts to slow down
+ (case a) so that drops due to buffer exhaustion (case b) should not
+ be necessary.
+
+ AQM is not universally deployed in every queue in the Internet; many
+ cheap Ethernet bridges, software firewalls, NATs on consumer devices,
+ etc., implement simple tail-drop buffers. Even if AQM were universal,
+ it has to be able to cope with buffer exhaustion (by switching to a
+ behaviour like tail drop), in order to cope with unresponsive or
+ excessive transports. For these reasons networks will sometimes be
+ dropping packets as a last resort (case b) rather than under AQM
+ control (case a).
+
+ When buffers are exhausted (case b), they don't naturally drop
+ packets in proportion to their size. The network can only reduce the
+ probability of dropping smaller packets if it has enough space to
+ store them somewhere while it waits for a larger packet that it can
+ drop. If the buffer is exhausted, it does not have this choice.
+ Admittedly tail drop does naturally drop somewhat fewer small
+ packets, but exactly how few depends more on the mix of sizes than
+
+
+
+Briscoe & Manner Best Current Practice [Page 16]
+
+RFC 7141 Byte and Packet Congestion Notification February 2014
+
+
+ the size of the packet in question. Nonetheless, in general, if we
+ wanted networks to do size-dependent drop, we would need universal
+ deployment of (packet-size dependent) AQM code, which is currently
+ unrealistic.
+
+ A host transport cannot know whether any particular drop was a
+ deliberate signal from an AQM or a sign of a queue shedding packets
+ due to buffer exhaustion. Therefore, because the network cannot
+ universally do size-dependent drop, it should not do it at all.
+
+ Whereas universality is desirable in the network, diversity is
+ desirable between different transport-layer protocols -- some, like
+ standards track TCP congestion control [RFC5681], may not choose to
+ make their rate response proportionate to the size of each dropped
+ packet, while others will (e.g., TCP-Friendly Rate Control for Small
+ Packets (TFRC-SP) [RFC4828]).
+
+3.5. Implementation Efficiency
+
+ Biasing against large packets typically requires an extra multiply
+ and divide in the network (see the example byte-mode drop formula in
+ Table 1). Taking packet size into account at the transport rather
+ than in the network ensures that neither the network nor the
+ transport needs to do a multiply operation -- multiplication by
+ packet size is effectively achieved as a repeated add when the
+ transport adds to its count of marked bytes as each congestion event
+ is fed to it. Also, the work to do the biasing is spread over many
+ hosts, rather than concentrated in just the congested network
+ element. These aren't principled reasons in themselves, but they are
+ a happy consequence of the other principled reasons.
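+
+ The "repeated add" can be sketched as follows in Python. The class
+ and method names are hypothetical; the sketch only shows that a
+ transport can weight congestion events by packet size using additions
+ as each event arrives, with no per-packet multiply.
+
```python
class MarkedByteCounter:
    """Accumulates total and congestion-marked bytes by repeated addition."""

    def __init__(self):
        self.total_bytes = 0
        self.marked_bytes = 0

    def on_packet(self, size_bytes, congestion_marked):
        # One add per packet; one further add per congestion event.
        self.total_bytes += size_bytes
        if congestion_marked:
            self.marked_bytes += size_bytes

    def marked_fraction(self):
        """Fraction of bytes that carried a congestion indication."""
        if self.total_bytes == 0:
            return 0.0
        return self.marked_bytes / self.total_bytes


counter = MarkedByteCounter()
for size, marked in [(1500, True), (60, True), (1500, False), (60, False)]:
    counter.on_packet(size, marked)
print(counter.marked_bytes, counter.total_bytes, counter.marked_fraction())
```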
+
+4. A Survey and Critique of Past Advice
+
+ This section is informative, not normative.
+
+ The original 1993 paper on RED [RED93] proposed two options for the
+ RED active queue management algorithm: packet mode and byte mode.
+ Packet mode measured the queue length in packets and dropped (or
+ marked) individual packets with a probability independent of their
+ size. Byte mode measured the queue length in bytes and marked an
+ individual packet with probability in proportion to its size
+ (relative to the maximum packet size). In the paper's outline of
+ further work, it was stated that no recommendation had been made on
+ whether the queue size should be measured in bytes or packets, but
+ noted that the difference could be significant.
+
+
+
+
+
+
+Briscoe & Manner Best Current Practice [Page 17]
+
+RFC 7141 Byte and Packet Congestion Notification February 2014
+
+
+ When RED was recommended for general deployment in 1998 [RFC2309],
+ the two modes were mentioned implying the choice between them was a
+ question of performance, referring to a 1997 email [pktByteEmail] for
+ advice on tuning. A later addendum to this email introduced the
+ insight that there are in fact two orthogonal choices:
+
+ o whether to measure queue length in bytes or packets (Section 4.1),
+ and
+
+ o whether the drop probability of an individual packet should depend
+ on its own size (Section 4.2).
+
+ The rest of this section is structured accordingly.
+
+4.1. Congestion Measurement Advice
+
+ The choice of which metric to use to measure queue length was left
+ open in RFC 2309. It is now well understood that queues for bit-
+ congestible resources should be measured in bytes, and queues for
+ packet-congestible resources should be measured in packets
+ [pktByteEmail].
+
+ Congestion in some legacy bit-congestible buffers is only measured in
+ packets not bytes. In such cases, the operator has to take into
+ account a typical mix of packet sizes when setting the thresholds.
+ Any AQM algorithm on such a buffer will be oversensitive to high
+ proportions of small packets, e.g., a DoS attack, and under-sensitive
+ to high proportions of large packets. However, there is no need to
+ make allowances for the possibility of such a legacy in future
+ protocol design. This is safe because any under-sensitivity during
+ unusual traffic mixes cannot lead to congestion collapse given that
+ the buffer will eventually revert to tail drop, which discards
+ proportionately more large packets.
+
+4.1.1. Fixed-Size Packet Buffers
+
+ The question of whether to measure queues in bytes or packets seems
+ to be well understood. However, measuring congestion is confusing
+ when the resource is bit-congestible but the queue into the resource
+ is packet-congestible. This section outlines the approach to take.
+
+ Some, mostly older, queuing hardware allocates fixed-size buffers in
+ which to store each packet in the queue. This hardware forwards
+ packets to the line in one of two ways:
+
+ o With some hardware, any fixed-size buffers not completely filled
+ by a packet are padded when transmitted to the wire. This case
+ should clearly be treated as packet-congestible, because both
+
+
+
+Briscoe & Manner Best Current Practice [Page 18]
+
+RFC 7141 Byte and Packet Congestion Notification February 2014
+
+
+ queuing and transmission are in fixed MTU-size units. Therefore,
+ the queue length in packets is a good model of congestion of the
+ link.
+
+ o More commonly, hardware with fixed-size packet buffers transmits
+ packets to the line without padding. This implies a hybrid
+ forwarding system with transmission congestion dependent on the
+ size of packets but queue congestion dependent on the number of
+ packets, irrespective of their size.
+
+ Nonetheless, there would be no queue at all unless the line had
+ become congested -- the root cause of any congestion is too many
+ bytes arriving for the line. Therefore, the AQM should measure
+ the queue length as the sum of all the packet sizes in bytes that
+ are queued up waiting to be serviced by the line, irrespective of
+ whether each packet is held in a fixed-size buffer.
+
+ In the (unlikely) first case where use of padding means the queue
+ should be measured in packets, further confusion is likely because
+ the fixed buffers are rarely all one size. Typically, pools of
+ different-sized buffers are provided (Cisco uses the term 'buffer
+ carving' for the process of dividing up memory into these pools
+ [IOSArch]). Usually, if the pool of small buffers is exhausted,
+ arriving small packets can borrow space in the pool of large buffers,
+ but not vice versa. However, there is no need to consider all this
+ complexity, because the root cause of any congestion is still line
+ overload -- buffer consumption is only the symptom. Therefore, the
+ length of the queue should be measured as the sum of the bytes in the
+ queue that will be transmitted to the line, including any padding.
+ In the (unusual) case of transmission with padding, this means the
+ sum of the sizes of the small buffers queued plus the sum of the
+ sizes of the large buffers queued.
+
+ We will return to borrowing of fixed-size buffers when we discuss
+ biasing the drop/marking probability of a specific packet because of
+ its size in Section 4.2.1. But here, we can repeat the simple rule
+ for how to measure the length of queues of fixed buffers: no matter
+ how complicated the buffering scheme is, ultimately a transmission
+ line is nearly always bit-congestible so the number of bytes queued
+ up waiting for the line measures how congested the line is, and it is
+ rarely important to measure how congested the buffering system is.
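+
+ The rule can be sketched in Python as follows. The buffer sizes
+ (256 B and 2048 B) and the function name are hypothetical; the point
+ is only that the measure is the number of bytes that will be sent to
+ the line, which includes padding only if padding is transmitted.
+
```python
def queue_length_bytes(queued, padded_on_wire=False):
    """Bytes waiting for the line.

    queued is a list of (packet_size, buffer_size) pairs in bytes.
    If buffers are padded when transmitted, whole buffers are counted;
    otherwise only the packet sizes are counted.
    """
    if padded_on_wire:
        return sum(buffer_size for _, buffer_size in queued)
    return sum(packet_size for packet_size, _ in queued)


# Two small packets in 256 B buffers and one large packet in a 2048 B buffer.
queued = [(60, 256), (1500, 2048), (200, 256)]

unpadded = queue_length_bytes(queued)
padded = queue_length_bytes(queued, padded_on_wire=True)
print(unpadded, padded)
```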
+
+4.1.2. Congestion Measurement without a Queue
+
+ AQM algorithms are nearly always described assuming there is a queue
+ for a congested resource and the algorithm can use the queue length
+ to determine the probability that it will drop or mark each packet.
+ But not all congested resources lead to queues. For instance, power-
+
+
+
+Briscoe & Manner Best Current Practice [Page 19]
+
+RFC 7141 Byte and Packet Congestion Notification February 2014
+
+
+ limited resources are usually bit-congestible if energy is primarily
+ required for transmission rather than header processing, but it is
+ rare for a link protocol to build a queue as it approaches maximum
+ power.
+
+ Nonetheless, AQM algorithms do not require a queue in order to work.
+ For instance, spectrum congestion can be modelled by signal quality
+ using the target bit-energy-to-noise-density ratio. And, to model
+ radio power exhaustion, transmission-power levels can be measured and
+ compared to the maximum power available. [ECNFixedWireless] proposes
+ a practical and theoretically sound way to combine congestion
+ notification for different bit-congestible resources at different
+ layers along an end-to-end path, whether wireless or wired, and
+ whether with or without queues.
+
+ In wireless protocols that use request to send / clear to send
+ (RTS / CTS) control, such as some variants of IEEE 802.11, it is
+ reasonable to base an AQM on the time spent waiting for transmission
+ opportunities (TXOPs) even though the wireless spectrum is usually
+ regarded as congested by bits (for a given coding scheme). This is
+ because requests for TXOPs queue up as the spectrum gets congested by
+ all the bits being transferred. So the time that TXOPs are queued
+ directly reflects bit congestion of the spectrum.
+
+4.2. Congestion Notification Advice
+
+4.2.1. Network Bias When Encoding
+
+4.2.1.1. Advice on Packet-Size Bias in RED
+
+ The previously mentioned email [pktByteEmail] referred to by
+ [RFC2309] advised that most scarce resources in the Internet were
+ bit-congestible, which is still believed to be true (Section 1.1).
+ But it went on to offer advice that is updated by this memo. It said
+ that drop probability should depend on the size of the packet being
+ considered for drop if the resource is bit-congestible, but not if it
+ is packet-congestible. The argument continued that if packet drops
+ were inflated by packet size (byte-mode dropping), "a flow's fraction
+ of the packet drops is then a good indication of that flow's fraction
+ of the link bandwidth in bits per second". This was consistent with
+ a referenced policing mechanism being worked on at the time for
+ detecting unusually high bandwidth flows, eventually published in
+ 1999 [pBox]. However, the problem could and should have been solved
+ by making the policing mechanism count the volume of bytes randomly
+ dropped, not the number of packets.
+
+
+
+
+
+
+Briscoe & Manner Best Current Practice [Page 20]
+
+RFC 7141 Byte and Packet Congestion Notification February 2014
+
+
+ A few months before RFC 2309 was published, an addendum was added to
+ the above archived email referenced from the RFC, in which the final
+ paragraph seemed to partially retract what had previously been said.
+ It clarified that the question of whether the probability of
+ dropping/marking a packet should depend on its size was not related
+ to whether the resource itself was bit-congestible, but a completely
+ orthogonal question. However, the only example given had the queue
+ measured in packets but packet drop depended on the size of the
+ packet in question. No example was given the other way round.
+
+ In 2000, Cnodder et al. [REDbyte] pointed out that there was an error
+ in the part of the original 1993 RED algorithm that aimed to
+ distribute drops uniformly, because it didn't correctly take into
+ account the adjustment for packet size. They recommended an
+ algorithm called RED_4 to fix this. But they also recommended a
+ further change, RED_5, to adjust the drop rate dependent on the
+ square of the relative packet size. This was indeed consistent with
+ one implied motivation behind RED's byte-mode drop -- that we should
+ reverse engineer the network to improve the performance of dominant
+ end-to-end congestion control mechanisms. This memo makes
+ different recommendations in Section 2.
+
+ By 2003, a further change had been made to the adjustment for packet
+ size, this time in the RED algorithm of the ns2 simulator. Instead
+ of taking each packet's size relative to a 'maximum packet size', it
+ was taken relative to a 'mean packet size', intended to be a static
+ value representative of the 'typical' packet size on the link. We
+ have not been able to find a justification in the literature for this
+ change; however, Eddy and Allman conducted experiments [REDbias] that
+ assessed how sensitive RED was to this parameter, amongst other
+ things. This changed algorithm can often lead to drop probabilities
+ of greater than 1 (which gives a hint that there is probably a
+ mistake in the theory somewhere).
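+
+ A short Python sketch shows how this can happen. It assumes the
+ adjustment is a simple linear scaling of the base drop probability by
+ packet size over a reference size; the 500 B mean packet size and the
+ base probability of 0.5 are hypothetical values.
+
```python
def scaled_drop_prob(base_prob, pkt_size, ref_size):
    """Base drop probability scaled by packet size over a reference size."""
    return base_prob * pkt_size / ref_size


BASE_PROB = 0.5   # hypothetical base drop probability under heavy load
PKT_SIZE = 1500   # bytes

# Relative to the maximum packet size, the result can never exceed BASE_PROB.
relative_to_max = scaled_drop_prob(BASE_PROB, PKT_SIZE, 1500)

# Relative to a smaller 'mean packet size', large packets can exceed 1.
relative_to_mean = scaled_drop_prob(BASE_PROB, PKT_SIZE, 500)

print(relative_to_max, relative_to_mean)
```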
+
+ On 10-Nov-2004, this variant of byte-mode packet drop was made the
+ default in the ns2 simulator. It seems unlikely that byte-mode drop
+ has ever been implemented in production networks (Appendix A);
+ therefore, ns2 simulations that use RED without disabling byte-mode
+ drop are likely to behave very differently from RED in production
+ networks, which calls into question any conclusions based on them.
+
+4.2.1.2. Packet-Size Bias Regardless of AQM
+
+ The byte-mode drop variant of RED (or a similar variant of other AQM
+ algorithms) is not the only possible bias towards small packets in
+ queuing systems. We have already mentioned that tail-drop queues
+ naturally tend to lock out large packets once they are full.
+
+
+
+
+Briscoe & Manner Best Current Practice [Page 21]
+
+RFC 7141 Byte and Packet Congestion Notification February 2014
+
+
+ But also, queues with fixed-size buffers reduce the probability that
+ small packets will be dropped if (and only if) they allow small
+ packets to borrow buffers from the pools for larger packets (see
+ Section 4.1.1). Borrowing effectively makes the maximum queue size
+ for small packets greater than that for large packets, because more
+ buffers can be used by small packets while fewer will fit large
+ packets. Incidentally, the bias towards small packets from buffer
+ borrowing is nothing like as large as that of RED's byte-mode drop.
+
+ Nonetheless, fixed-buffer memory with tail drop is still prone to
+ lock out large packets, purely because of the tail-drop aspect. So,
+ fixed-size packet buffers should be augmented with a good AQM
+ algorithm and packet-mode drop. If an AQM is too complicated to
+ implement with multiple fixed buffer pools, the minimum necessary to
+ prevent large-packet lockout is to ensure that smaller packets never
+ use the last available buffer in any of the pools for larger packets.
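+
+ This minimum safeguard might look like the following Python sketch.
+ The class, its two-pool layout, and the 256 B small-buffer limit are
+ hypothetical, and releasing buffers on transmission is omitted for
+ brevity.
+
```python
class BufferPools:
    """Two fixed-size buffer pools where small packets may borrow large buffers."""

    def __init__(self, small_free, large_free, small_limit):
        self.small_free = small_free    # free buffers in the small pool
        self.large_free = large_free    # free buffers in the large pool
        self.small_limit = small_limit  # largest packet (bytes) fitting a small buffer

    def admit(self, pkt_size):
        """Return the pool used ('small' or 'large'), or None if dropped."""
        if pkt_size <= self.small_limit:
            if self.small_free > 0:
                self.small_free -= 1
                return "small"
            # Borrow from the large pool, but never take its last buffer.
            if self.large_free > 1:
                self.large_free -= 1
                return "large"
            return None
        if self.large_free > 0:
            self.large_free -= 1
            return "large"
        return None


pools = BufferPools(small_free=0, large_free=2, small_limit=256)
results = [pools.admit(size) for size in (60, 60, 1500, 1500)]
print(results)
```
+
+ In this run, the second small packet is dropped rather than being
+ allowed to take the last large buffer, so the following large packet
+ can still be queued.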
+
+4.2.2. Transport Bias When Decoding
+
+ The above proposals to alter the network equipment to bias towards
+ smaller packets have largely carried on outside the IETF process.
+ Within the IETF, by contrast, there are many different proposals to alter
+ transport protocols to achieve the same goals, i.e., either to make
+ the flow bit rate take into account packet size, or to protect
+ control packets from loss. This memo argues that altering transport
+ protocols is the more principled approach.
+
+ A recently approved experimental RFC adapts its transport-layer
+ protocol to take into account packet sizes relative to typical TCP
+ packet sizes. This proposes a new small-packet variant of TCP-
+ friendly rate control (TFRC [RFC5348]), which is called TFRC-SP
+ [RFC4828]. Essentially, it proposes a rate equation that inflates
+ the flow rate by the ratio of a typical TCP segment size (1,500 B
+ including TCP header) over the actual segment size [PktSizeEquCC].
+ (There are also other important differences of detail relative to
+ TFRC, such as using virtual packets [CCvarPktSize] to avoid
+ responding to multiple losses per round trip and using a minimum
+ inter-packet interval.)
+
+ Section 4.5.1 of the TFRC-SP specification discusses the implications
+ of operating in an environment where queues have been configured to
+ drop smaller packets with proportionately lower probability than
+ larger ones. But it only discusses TCP operating in such an
+ environment, only mentioning TFRC-SP briefly when discussing how to
+ define fairness with TCP. And it only discusses the byte-mode
+ dropping version of RED as it was before Cnodder et al. pointed out
+ that it didn't sufficiently bias towards small packets to make TCP
+ independent of packet size.
+
+
+
+Briscoe & Manner Best Current Practice [Page 22]
+
+RFC 7141 Byte and Packet Congestion Notification February 2014
+
+
+ So the TFRC-SP specification doesn't address the issue of whether the
+ network or the transport _should_ handle fairness between different
+ packet sizes. In Appendix B.4 of RFC 4828, it discusses the
+ possibility of both TFRC-SP and some network buffers duplicating each
+ other's attempts to deliberately bias towards small packets. But the
+ discussion is not conclusive, instead reporting simulations of many
+ of the possibilities in order to assess performance but not
+ recommending any particular course of action.
+
+ The paper originally proposing TFRC with virtual packets (VP-TFRC)
+ [CCvarPktSize] proposed that there should perhaps be two variants to
+ cater for the different variants of RED. However, as the TFRC-SP
+ authors point out, there is no way for a transport to know whether
+ some queues on its path have deployed RED with byte-mode packet drop
+ (except if an exhaustive survey found that no one has deployed it! --
+ see Appendix A). Incidentally, VP-TFRC also proposed that byte-mode
+ RED dropping should really square the packet-size compensation factor
+ (like that of Cnodder's RED_5, but apparently unaware of it).
+
+ Pre-congestion notification [RFC5670] is an IETF technology to use a
+ virtual queue for AQM marking for packets within one Diffserv class
+ in order to give early warning prior to any real queuing. The PCN-
+ marking algorithms have been designed not to take into account packet
+ size when forwarding through queues. Instead, the general principle
+ has been to take the sizes of marked packets into account when
+ monitoring the fraction of marking at the edge of the network, as
+ recommended here.
+
+4.2.3. Making Transports Robust against Control Packet Losses
+
+ Recently, two RFCs have defined changes to TCP that make it more
+ robust against losing small control packets [RFC5562] [RFC5690]. In
+ both cases, they note that the case for these two TCP changes would
+ be weaker if RED were biased against dropping small packets. We
+ argue here that these two proposals are a safer and more principled
+ way to achieve TCP performance improvements than reverse engineering
+ RED to benefit TCP.
+
+ Although there are no known proposals, it would also be possible and
+ perfectly valid to make control packets robust against drop by
+ requesting a scheduling class with lower drop probability, which
+ would be achieved by re-marking to a Diffserv code point [RFC2474]
+ within the same behaviour aggregate.
+
+ Although not brought to the IETF, a simple proposal from Wischik
+ [DupTCP] suggests that the first three packets of every TCP flow
+ should be routinely duplicated after a short delay. It shows that
+ this would greatly improve the chances of short flows completing
+
+
+
+Briscoe & Manner Best Current Practice [Page 23]
+
+RFC 7141 Byte and Packet Congestion Notification February 2014
+
+
+ quickly, but it would hardly increase traffic levels on the Internet,
+ because Internet bytes have always been concentrated in the large
+ flows. It further shows that the performance of many typical
+ applications depends on completion of long serial chains of short
+ messages. It argues that, given most of the value people get from
+ the Internet is concentrated within short flows, this simple
+ expedient would greatly increase the value of the best-effort
+ Internet at minimal cost. A similar but more extensive approach has
+ been evaluated on Google servers [GentleAggro].
+
+ The proposals discussed in this sub-section are experimental
+ approaches that are not yet in wide operational use, but they are
+ existence proofs that transports can make themselves robust against
+ loss of control packets. The examples are all TCP-based, but
+ applications over non-TCP transports could mitigate loss of control
+ packets by making similar use of Diffserv, data duplication, FEC,
+ etc.
+
+4.2.4. Congestion Notification: Summary of Conflicting Advice
+
+ +-----------+-----------------+-----------------+-------------------+
+ | transport | RED_1 (packet-  | RED_4 (linear   | RED_5 (square     |
+ | cc        | mode drop)      | byte-mode drop) | byte-mode drop)   |
+ +-----------+-----------------+-----------------+-------------------+
+ | TCP or    | s/sqrt(p)       | sqrt(s/p)       | 1/sqrt(p)         |
+ | TFRC      |                 |                 |                   |
+ | TFRC-SP   | 1/sqrt(p)       | 1/sqrt(s*p)     | 1/(s*sqrt(p))     |
+ +-----------+-----------------+-----------------+-------------------+
+
+ Table 2: Dependence of flow bit rate per RTT on packet size, s, and
+ drop probability, p, when there is network and/or transport bias
+ towards small packets to varying degrees
+
+ Table 2 aims to summarise the potential effects of all the advice
+ from different sources. Each column shows a different possible AQM
+ behaviour in different queues in the network, using the terminology
+ of Cnodder et al. outlined earlier (RED_1 is basic RED with packet-
+ mode drop). Each row shows a different transport behaviour: TCP
+ [RFC5681] and TFRC [RFC5348] on the top row with TFRC-SP [RFC4828]
+ below. Each cell shows how the bits per round trip of a flow depends
+ on packet size, s, and drop probability, p. In order to declutter
+ the formulae to focus on packet-size dependence, they are all given
+ per round trip, which removes any RTT term.
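+
+ The cells of Table 2 can be reproduced with the Python sketch below.
+ It rests on assumptions that are not stated in the table itself: s is
+ treated as a relative packet size, the network is modelled as scaling
+ drop probability by s^0, s^1, or s^2, and TFRC-SP is modelled as
+ inflating the packet rate by 1/s. The function and dictionary names
+ are illustrative.
+
```python
from math import sqrt

# Exponent of relative packet size s applied to drop probability by each AQM.
SIZE_EXPONENT = {"RED_1": 0, "RED_4": 1, "RED_5": 2}


def bits_per_rtt(transport, aqm, s, p):
    """Flow bits per round trip, up to a constant factor (assumed model)."""
    p_eff = p * s ** SIZE_EXPONENT[aqm]   # drop probability seen by this flow
    packets_per_rtt = 1 / sqrt(p_eff)     # TCP-like response to drop
    if transport == "TCP":
        return s * packets_per_rtt
    if transport == "TFRC-SP":
        # Packet rate inflated by 1/s, so packet size cancels out.
        return s * (1 / s) * packets_per_rtt
    raise ValueError(f"unknown transport: {transport}")


s, p = 0.25, 0.01
for transport in ("TCP", "TFRC-SP"):
    print(transport, [bits_per_rtt(transport, aqm, s, p) for aqm in SIZE_EXPONENT])
```
+
+ With this model, only the top-right and bottom-left cells give a
+ result that does not depend on s.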
+
+ Let us assume that the goal is for the bit rate of a flow to be
+ independent of packet size. Suppressing all inessential details, the
+ table shows that this should either be achievable by not altering the
+ TCP transport in a RED_5 network, or using the small packet TFRC-SP
+
+
+
+Briscoe & Manner Best Current Practice [Page 24]
+
+RFC 7141 Byte and Packet Congestion Notification February 2014
+
+
+ transport (or similar) in a network without any byte-mode dropping
+ RED (top right and bottom left). Top left is the 'do nothing'
+ scenario, while bottom right is the 'do both' scenario in which the
+ bit rate would become far too biased towards small packets. Of
+ course, if any form of byte-mode dropping RED has been deployed on a
+ subset of queues that congest, each path through the network will
+ present a different hybrid scenario to its transport.
+
+ Whatever the case, we can see that the linear byte-mode drop column
+ in the middle would considerably complicate the Internet. Even if
+ one believes the network should be doing the biasing, linear byte-
+ mode drop is a half-way house that doesn't bias enough towards small
+ packets. Section 2 recommends that _all_ bias in network equipment
+ towards small packets should be turned off -- if indeed any equipment
+ vendors have implemented it -- leaving packet-size bias solely as the
+ preserve of the transport layer (solely the leftmost, packet-mode
+ drop column).
+
+ In practice, it seems that no deliberate bias towards small packets
+ has been implemented for production networks. Of the 19% of vendors
+ who responded to a survey of 84 equipment vendors, none had
+ implemented byte-mode drop in RED (see Appendix A for details).
+
+5. Outstanding Issues and Next Steps
+
+5.1. Bit-congestible Network
+
+ For a connectionless network with nearly all resources being bit-
+ congestible, the recommended position is clear -- the network should
+ not make allowance for packet sizes and the transport should. This
+ leaves two outstanding issues:
+
+ o The question of how to handle any legacy AQM deployments using
+ byte-mode drop;
+
+ o The need to start a programme to update transport congestion
+ control protocol standards to take packet size into account.
+
+ A survey of equipment vendors (Section 4.2.4) found no evidence that
+ byte-mode packet drop had been implemented, so deployment will be
+ sparse at best. A migration strategy is not really needed to remove
+ an algorithm that may not even be deployed.
+
+ A programme of experimental updates to take packet size into account
+ in transport congestion control protocols has already started with
+ TFRC-SP [RFC4828].
+
+
+
+
+
+Briscoe & Manner Best Current Practice [Page 25]
+
+RFC 7141 Byte and Packet Congestion Notification February 2014
+
+
+5.2. Bit- and Packet-Congestible Network
+
+ The position is much less clear-cut if the Internet becomes populated
+ by a more even mix of both packet-congestible and bit-congestible
+ resources (see Appendix B.2). This problem is not pressing, because
+ most Internet resources are designed to be bit-congestible before
+ packet processing starts to congest (see Section 1.1).
+
+ The IRTF's Internet Congestion Control Research Group (ICCRG) has set
+ itself the task of reaching consensus on generic forwarding
+ mechanisms that are necessary and sufficient to support the
+ Internet's future congestion control requirements (the first
+ challenge in [RFC6077]). The research question of whether packet
+ congestion might become common and what to do if it does may in the
+ future be explored in the IRTF (the "Challenge 3: Packet Size" in
+ [RFC6077]).
+
+ Note that sometimes it seems that resources might be congested by
+ neither bits nor packets, e.g., where the queue for access to a
+ wireless medium is in units of transmission opportunities. However,
+ the root cause of congestion of the underlying spectrum is overload
+ of bits (see Section 4.1.2).
+
+6. Security Considerations
+
+ This memo recommends that queues do not bias drop probability
+ according to packet size. For instance, dropping small packets less
+ often than large ones creates a perverse incentive for transports to
+ break down their flows into tiny segments. One of the intended
+ benefits of implementing AQM was to remove the perverse incentive
+ that tail-drop queues gave to small packets.
+
+ In practice, transports cannot all be trusted to respond to
+ congestion. So another reason for recommending that queues not bias
+ drop probability towards small packets is to avoid the vulnerability
+ to small-packet DDoS attacks that would otherwise result. One of the
+ benefits of implementing AQM was meant to be to remove tail drop's
+ DoS vulnerability to small packets, so we shouldn't add it back
+ again.
+
+ If most queues implemented AQM with byte-mode drop, the resulting
+ network would amplify the potency of a small-packet DDoS attack. At
+ the first queue, the stream of packets would push aside a greater
+ proportion of large packets, so more of the small packets would
+ survive to attack the next queue. Thus a flood of small packets
+ would continue on towards the destination, pushing regular traffic
+ with large packets out of the way in one queue after the next, but
+ suffering much less drop itself.
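+
+ The amplification effect can be illustrated with a minimal sketch. It
+ assumes, for illustration only, that byte-mode drop scales a queue's
+ base drop probability by packet size divided by a 1,500 B MTU, and it
+ uses a hypothetical base drop probability of 20% at each of 5 queues
+ in series. The function names are not taken from any implementation.
+
```python
def drop_probability(base_p, size_bytes, mtu_bytes=1500, byte_mode=False):
    """Per-packet drop probability at one queue.

    Packet-mode: every packet sees base_p, whatever its size.
    Byte-mode (assumed form): base_p scaled by size / MTU.
    """
    if byte_mode:
        return min(1.0, base_p * size_bytes / mtu_bytes)
    return base_p


def surviving_fraction(base_p, size_bytes, hops, byte_mode):
    """Fraction of packets of one size that survive `hops` queues in series."""
    p = drop_probability(base_p, size_bytes, byte_mode=byte_mode)
    return (1.0 - p) ** hops


if __name__ == "__main__":
    for size in (60, 1500):
        for mode in (False, True):
            frac = surviving_fraction(0.2, size, hops=5, byte_mode=mode)
            label = "byte-mode" if mode else "packet-mode"
            print(f"{size:>5} B  {label:<11}  survives {frac:.3f}")
```
+
+ Under these assumptions, about 96% of 60 B packets survive five
+ byte-mode queues, against about 33% of 1,500 B packets, whereas
+ packet-mode drop leaves both sizes with the same 33% survival.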
+
+ Appendix C explains why the ability of networks to police the
+ response of _any_ transport to congestion depends on bit-congestible
+ network resources only doing packet-mode drop, not byte-mode drop.
+ In summary, it says that making drop probability depend on the size
+ of the packets that bits happen to be divided into simply encourages
+ the bits to be divided into smaller packets. Byte-mode drop would
+ therefore irreversibly complicate any attempt to fix the Internet's
+ incentive structures.
+
+7. Conclusions
+
+ This memo identifies the three distinct stages of the congestion
+ notification process where implementations need to decide whether to
+ take packet size into account. The recommendations provided in
+ Section 2 of this memo are different in each case:
+
+ o When network equipment measures the length of a queue, if it is
+ not feasible to use time, it is recommended to count in bytes if
+ the network resource is congested by bytes, or to count in packets
+ if it is congested by packets.
+
+ o When network equipment decides whether to drop (or mark) a packet,
+ it is recommended that the size of the particular packet should
+ not be taken into account.
+
+ o However, when a transport algorithm responds to a dropped or
+ marked packet, the size of the rate reduction should be
+ proportionate to the size of the packet.
+
+ In summary, the answers are 'it depends', 'no', and 'yes',
+ respectively.
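+
+ The three stages can be sketched as three separate functions. This is
+ an illustrative outline only; the function names and signatures are
+ assumptions made here and do not describe any particular AQM or
+ transport implementation.
+
```python
import random


def queue_length(queued_packet_sizes, bit_congestible=True):
    """Stage 1 ('it depends'): when time is not feasible, measure the
    queue in bytes if the resource is bit-congestible, else in packets."""
    if bit_congestible:
        return sum(queued_packet_sizes)
    return len(queued_packet_sizes)


def should_drop(drop_probability, packet_size_bytes, rng=random):
    """Stage 2 ('no'): the drop (or mark) decision deliberately ignores
    packet_size_bytes, so every packet sees the same probability."""
    return rng.random() < drop_probability


def congestion_weight_bytes(lost_packet_size_bytes):
    """Stage 3 ('yes'): the transport weights each congestion indication
    by the size of the packet that carried it."""
    return lost_packet_size_bytes
```
+
+ The size argument of should_drop() is kept in the signature only to
+ make explicit that it is available to the queue but unused.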
+
+ For the specific case of RED, this means that byte-mode queue
+ measurement will often be appropriate, but the use of byte-mode drop
+ is very strongly discouraged.
+
+ At the transport layer, the IETF should continue updating congestion
+ control protocols to take into account the size of each packet that
+ indicates congestion. Also, the IETF should continue to make
+ protocols less sensitive to losing control packets like SYNs, pure
+ ACKs, and DNS exchanges. Although many control packets happen to be
+ small, the alternative of network equipment favouring all small
+ packets would be dangerous. That would create perverse incentives to
+ split data transfers into smaller packets.
+
+ The memo develops these recommendations from principled arguments
+ concerning scaling, layering, incentives, inherent efficiency,
+ security, and 'policeability'. It also addresses practical issues
+ such as specific buffer architectures and incremental deployment.
+ Indeed, a limited survey of RED implementations is discussed, which
+ shows there appears to be little, if any, installed base of RED's
+ byte-mode drop. Therefore, it can be deprecated with little, if any,
+ incremental deployment complications.
+
+ The recommendations have been developed on the well-founded basis
+ that most Internet resources are bit-congestible, not packet-
+ congestible. We need to know the likelihood that this assumption
+ will prevail in the longer term and, if it might not, what protocol
+ changes will be needed to cater for a mix of the two. The IRTF
+ Internet Congestion Control Research Group (ICCRG) is currently
+ working on these problems [RFC6077].
+
+8. Acknowledgements
+
+ Thank you to Sally Floyd, who gave extensive and useful review
+ comments. Also thanks for the reviews from Philip Eardley, David
+ Black, Fred Baker, David Taht, Toby Moncaster, Arnaud Jacquet, and
+ Mirja Kuehlewind, as well as helpful explanations of different
+ hardware approaches from Larry Dunn and Fred Baker. We are grateful
+ to Bruce Davie and his colleagues for providing a timely and
+ efficient survey of RED implementation in Cisco's product range.
+ Also, grateful thanks to Toby Moncaster, Will Dormann, John Regnault,
+ Simon Carter, and Stefaan De Cnodder who further helped survey the
+ current status of RED implementation and deployment, and, finally,
+ thanks to the anonymous individuals who responded.
+
+ Bob Briscoe and Jukka Manner were partly funded by Trilogy and
+ Trilogy 2, research projects (ICT-216372, ICT-317756) supported by
+ the European Community under its Seventh Framework Programme. The
+ views expressed here are those of the authors only.
+
+9. References
+
+9.1. Normative References
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+ [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
+ S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
+ Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
+ S., Wroclawski, J., and L. Zhang, "Recommendations on
+ Queue Management and Congestion Avoidance in the
+ Internet", RFC 2309, April 1998.
+
+ [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, RFC
+ 2914, September 2000.
+
+ [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
+ of Explicit Congestion Notification (ECN) to IP", RFC
+ 3168, September 2001.
+
+9.2. Informative References
+
+ [BLUE02] Feng, W-c., Shin, K., Kandlur, D., and D. Saha, "The BLUE
+ active queue management algorithms", IEEE/ACM Transactions
+ on Networking 10(4) 513-528, August 2002,
+ <http://dx.doi.org/10.1109/TNET.2002.801399>.
+
+ [CCvarPktSize]
+ Widmer, J., Boutremans, C., and J-Y. Le Boudec, "End-to-
+ end congestion control for TCP-friendly flows with
+ variable packet size", ACM CCR 34(2) 137-151, April 2004,
+ <http://doi.acm.org/10.1145/997150.997162>.
+
+ [CHOKe_Var_Pkt]
+ Psounis, K., Pan, R., and B. Prabhaker, "Approximate Fair
+ Dropping for Variable-Length Packets", IEEE Micro
+ 21(1):48-56, January-February 2001,
+ <http://ieeexplore.ieee.org/xpl/
+ articleDetails.jsp?arnumber=903061>.
+
+ [CoDel] Nichols, K. and V. Jacobson, "Controlled Delay Active
+ Queue Management", Work in Progress, February 2013.
+
+ [DRQ] Shin, M., Chong, S., and I. Rhee, "Dual-Resource TCP/AQM
+ for Processing-Constrained Networks", IEEE/ACM
+ Transactions on Networking Vol 16, issue 2, April 2008,
+ <http://dx.doi.org/10.1109/TNET.2007.900415>.
+
+ [DupTCP] Wischik, D., "Short messages", Philosophical Transactions
+ of the Royal Society A 366(1872):1941-1953, June 2008,
+ <http://rsta.royalsocietypublishing.org/content/366/1872/
+ 1941.full.pdf+html>.
+
+ [ECNFixedWireless]
+ Siris, V., "Resource Control for Elastic Traffic in CDMA
+ Networks", Proc. ACM MOBICOM'02 , September 2002,
+ <http://www.ics.forth.gr/netlab/publications/
+ resource_control_elastic_cdma.html>.
+
+ [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the
+ evolution of congestion control", Automatica
+ 35(12)1969-1985, December 1999,
+ <http://www.sciencedirect.com/science/article/pii/
+ S0005109899001351>.
+
+ [GentleAggro]
+ Flach, T., Dukkipati, N., Terzis, A., Raghavan, B.,
+ Cardwell, N., Cheng, Y., Jain, A., Hao, S., Katz-Bassett,
+ E., and R. Govindan, "Reducing web latency: the virtue of
+ gentle aggression", ACM SIGCOMM CCR 43(4)159-170, August
+ 2013, <http://doi.acm.org/10.1145/2486001.2486014>.
+
+ [IOSArch] Bollapragada, V., White, R., and C. Murphy, "Inside Cisco
+ IOS Software Architecture", Cisco Press: CCIE Professional
+ Development ISBN13: 978-1-57870-181-0, July 2000.
+
+ [PIE] Pan, R., Natarajan, P., Piglione, C., Prabhu, M.,
+ Subramanian, V., Baker, F., and B. Steeg, "PIE: A
+ Lightweight Control Scheme To Address the Bufferbloat
+ Problem", Work in Progress, February 2014.
+
+ [PktSizeEquCC]
+ Vasallo, P., "Variable Packet Size Equation-Based
+ Congestion Control", ICSI Technical Report tr-00-008,
+ 2000, <http://http.icsi.berkeley.edu/ftp/global/pub/
+ techreports/2000/tr-00-008.pdf>.
+
+ [RED93] Floyd, S. and V. Jacobson, "Random Early Detection (RED)
+ gateways for Congestion Avoidance", IEEE/ACM Transactions
+ on Networking 1(4) 397--413, August 1993,
+ <http://ieeexplore.ieee.org/xpls/
+ abs_all.jsp?arnumber=251892>.
+
+ [REDbias] Eddy, W. and M. Allman, "A Comparison of RED's Byte and
+ Packet Modes", Computer Networks 42(3) 261--280, June
+ 2003,
+ <http://www.ir.bbn.com/documents/articles/redbias.ps>.
+
+ [REDbyte] De Cnodder, S., Elloumi, O., and K. Pauwels, "Effect of
+ different packet sizes on RED performance", Proc. 5th IEEE
+ Symposium on Computers and Communications (ISCC) 793-799,
+ July 2000, <http://ieeexplore.ieee.org/xpls/
+ abs_all.jsp?arnumber=860741>.
+
+ [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black,
+ "Definition of the Differentiated Services Field (DS
+ Field) in the IPv4 and IPv6 Headers", RFC 2474, December
+ 1998.
+
+ [RFC3426] Floyd, S., "General Architectural and Policy
+ Considerations", RFC 3426, November 2002.
+
+ [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
+ Jacobson, "RTP: A Transport Protocol for Real-Time
+ Applications", STD 64, RFC 3550, July 2003.
+
+ [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion
+ Control for Voice Traffic in the Internet", RFC 3714,
+ March 2004.
+
+ [RFC4828] Floyd, S. and E. Kohler, "TCP Friendly Rate Control
+ (TFRC): The Small-Packet (SP) Variant", RFC 4828, April
+ 2007.
+
+ [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
+ Friendly Rate Control (TFRC): Protocol Specification", RFC
+ 5348, September 2008.
+
+ [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K.
+ Ramakrishnan, "Adding Explicit Congestion Notification
+ (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, June
+ 2009.
+
+ [RFC5670] Eardley, P., "Metering and Marking Behaviour of PCN-
+ Nodes", RFC 5670, November 2009.
+
+ [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
+ Control", RFC 5681, September 2009.
+
+ [RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding
+ Acknowledgement Congestion Control to TCP", RFC 5690,
+ February 2010.
+
+ [RFC6077] Papadimitriou, D., Welzl, M., Scharf, M., and B. Briscoe,
+ "Open Research Issues in Internet Congestion Control", RFC
+ 6077, February 2011.
+
+ [RFC6679] Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
+ and K. Carlberg, "Explicit Congestion Notification (ECN)
+ for RTP over UDP", RFC 6679, August 2012.
+
+ [RFC6789] Briscoe, B., Woundy, R., and A. Cooper, "Congestion
+ Exposure (ConEx) Concepts and Use Cases", RFC 6789,
+ December 2012.
+
+ [Rate_fair_Dis]
+ Briscoe, B., "Flow Rate Fairness: Dismantling a Religion",
+ ACM CCR 37(2)63-74, April 2007,
+ <http://portal.acm.org/citation.cfm?id=1232926>.
+
+ [gentle_RED]
+ Floyd, S., "Recommendation on using the "gentle_" variant
+ of RED", Web page , March 2000,
+ <http://www.icir.org/floyd/red/gentle.html>.
+
+ [pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End
+ Congestion Control", IEEE/ACM Transactions on Networking
+ 7(4) 458--472, August 1999, <http://ieeexplore.ieee.org/
+ xpls/abs_all.jsp?arnumber=793002>.
+
+ [pktByteEmail]
+ Floyd, S., "RED: Discussions of Byte and Packet Modes",
+ email, March 1997,
+ <http://ee.lbl.gov/floyd/REDaveraging.txt>.
+
+Appendix A. Survey of RED Implementation Status
+
+ This Appendix is informative, not normative.
+
+ In May 2007, a survey of 84 vendors was conducted to assess how widely
+ drop probability based on packet size had been implemented in RED
+ (Table 3). About 19% of those surveyed replied, giving a sample size
+ of 16. Although in most cases we do not have permission to identify
+ the respondents, we can say that those that have responded include
+ most of the larger equipment vendors, covering a large fraction of
+ the market. The two who gave permission to be identified were Cisco
+ and Alcatel-Lucent. The others range across the large network
+ equipment vendors at L3 & L2, firewall vendors, wireless equipment
+ vendors, as well as large software businesses with a small selection
+ of networking products. All those who responded confirmed that they
+ have not implemented the variant of RED with drop dependent on packet
+ size (2 were fairly sure they had not but needed to check more
+ thoroughly). At the time the survey was conducted, Linux did not
+ implement RED with packet-size bias of drop, although we have not
+ investigated a wider range of open source code.
+
+ +-------------------------------+----------------+--------------+
+ | Response                      | No. of vendors | % of vendors |
+ +-------------------------------+----------------+--------------+
+ | Not implemented               |             14 |          17% |
+ | Not implemented (probably)    |              2 |           2% |
+ | Implemented                   |              0 |           0% |
+ | No response                   |             68 |          81% |
+ | Total companies/orgs surveyed |             84 |         100% |
+ +-------------------------------+----------------+--------------+
+
+ Table 3: Vendor Survey on byte-mode drop variant of RED (lower drop
+ probability for small packets)
+
+ Where reasons were given for why the byte-mode drop variant had not
+ been implemented, the extra complexity of packet-bias code was most
+ prevalent, though one vendor had a more principled reason for
+ avoiding it -- similar to the argument of this document.
+
+ Our survey was of vendor implementations, so we cannot be certain
+ about operator deployment. But we believe many queues in the
+ Internet are still tail drop. The company of one of the co-authors
+ (BT) has widely deployed RED; however, many tail-drop queues are
+ bound to still exist, particularly in access network equipment and on
+ middleboxes like firewalls, where RED is not always available.
+
+ Routers using a memory architecture based on fixed-size buffers with
+ borrowing may also still be prevalent in the Internet. As explained
+ in Section 4.2.1, these also provide a marginal (but legitimate) bias
+ towards small packets. So even though RED byte-mode drop is not
+ prevalent, it is likely there is still some bias towards small
+ packets in the Internet due to tail-drop and fixed-buffer borrowing.
+
+Appendix B. Sufficiency of Packet-Mode Drop
+
+ This Appendix is informative, not normative.
+
+ Here we check that packet-mode drop (or marking) in the network gives
+ sufficiently generic information for the transport layer to use. We
+ check against a 2x2 matrix of four scenarios that may occur now or in
+ the future (Table 4). Checking the two scenarios in each of the
+ horizontal and vertical dimensions tests the extremes of sensitivity
+ to packet size in the transport and in the network respectively.
+
+ Note that this section does not consider byte-mode drop at all.
+ Having deprecated byte-mode drop, the goal here is to check that
+ packet-mode drop will be sufficient in all cases.
+
+ +-------------------------------+-----------------+-----------------+
+ | Transport ->                  | a) Independent  | b) Dependent on |
+ | ----------------------------- | of packet size  | packet size of  |
+ | Network                       | of congestion   | congestion      |
+ |                               | notifications   | notifications   |
+ +-------------------------------+-----------------+-----------------+
+ | 1) Predominantly bit-         | Scenario a1)    | Scenario b1)    |
+ |    congestible network        |                 |                 |
+ | 2) Mix of bit-congestible and | Scenario a2)    | Scenario b2)    |
+ |    pkt-congestible network    |                 |                 |
+ +-------------------------------+-----------------+-----------------+
+
+ Table 4: Four Possible Congestion Scenarios
+
+ Appendix B.1 focuses on the horizontal dimension of Table 4, checking
+ that packet-mode drop (or marking) gives sufficient information,
+ whether or not the transport uses it -- scenarios b) and a)
+ respectively.
+
+ Appendix B.2 focuses on the vertical dimension of Table 4, checking
+ that packet-mode drop gives sufficient information to the transport
+ whether resources in the network are bit-congestible or packet-
+ congestible (these terms are defined in Section 1.1).
+
+ Notation: To be concrete, we will compare two flows with different
+ packet sizes, s_1 and s_2. As an example, we will take
+ s_1 = 60 B = 480 b and s_2 = 1,500 B = 12,000 b.
+
+ A flow's bit rate, x [bps], is related to its packet rate, u
+ [pps], by
+
+ x(t) = s*u(t).
+
+ In the bit-congestible case, path congestion will be denoted by
+ p_b, and in the packet-congestible case by p_p. When either case
+ is implied, the letter p alone will denote path congestion.
+
+B.1. Packet-Size (In)Dependence in Transports
+
+ In all cases, we consider a packet-mode drop queue that indicates
+ congestion by dropping (or marking) packets with probability p
+ irrespective of packet size. We use an example value of loss
+ (marking) probability, p=0.1%.
+
+ A transport like TCP as specified in RFC 5681 treats a congestion
+ notification on any packet whatever its size as one event. However,
+ a network with just the packet-mode drop algorithm gives more
+ information if the transport chooses to use it. We will use Table 5
+ to illustrate this.
+
+ We will set aside the last column until later. The columns labelled
+ 'Flow 1' and 'Flow 2' compare two flows consisting of 60 B and
+ 1,500 B packets respectively. The body of the table considers two
+ separate cases, one where the flows have an equal bit rate and the
+ other with equal packet rates. In both cases, the two flows fill a
+ 96 Mbps link. Therefore, in the equal bit rate case, they each have
+ half the bit rate (48 Mbps). With equal packet rates, by contrast,
+ Flow 1 uses packets 25 times smaller, so it gets 25 times less bit
+ rate: only 1/(1+25) of the link capacity (96 Mbps / 26 = 4 Mbps after
+ rounding). Flow 2 gets 25 times more bit rate (92 Mbps) in the equal
+ packet rate case because its packets are 25 times larger. The packet
+ rate shown for each flow could easily be derived
+ once the bit rate was known by dividing the bit rate by packet size,
+ as shown in the column labelled 'Formula'.
+
+ Parameter                Formula      Flow 1    Flow 2    Combined
+ -----------------------  -----------  --------  --------  --------
+ Packet size              s/8          60 B      1,500 B   (Mix)
+ Packet size              s            480 b     12,000 b  (Mix)
+ Pkt loss probability     p            0.1%      0.1%      0.1%
+
+ EQUAL BIT RATE CASE
+ Bit rate                 x            48 Mbps   48 Mbps   96 Mbps
+ Packet rate              u = x/s      100 kpps  4 kpps    104 kpps
+ Absolute pkt-loss rate   p*u          100 pps   4 pps     104 pps
+ Absolute bit-loss rate   p*u*s        48 kbps   48 kbps   96 kbps
+ Ratio of lost/sent pkts  p*u/u        0.1%      0.1%      0.1%
+ Ratio of lost/sent bits  p*u*s/(u*s)  0.1%      0.1%      0.1%
+
+ EQUAL PACKET RATE CASE
+ Bit rate                 x            4 Mbps    92 Mbps   96 Mbps
+ Packet rate              u = x/s      8 kpps    8 kpps    15 kpps
+ Absolute pkt-loss rate   p*u          8 pps     8 pps     15 pps
+ Absolute bit-loss rate   p*u*s        4 kbps    92 kbps   96 kbps
+ Ratio of lost/sent pkts  p*u/u        0.1%      0.1%      0.1%
+ Ratio of lost/sent bits  p*u*s/(u*s)  0.1%      0.1%      0.1%
+
+ Table 5: Absolute Loss Rates and Loss Ratios for Flows of Small and
+ Large Packets and Both Combined
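+
+ The figures in Table 5 can be reproduced with a short sketch. The
+ function and constant names are illustrative assumptions; the input
+ values (packet sizes, link rate and p) are those of the example.
+
```python
S1_BITS, S2_BITS = 480, 12_000   # 60 B and 1,500 B packets, in bits
LINK_BPS = 96_000_000            # 96 Mbps link filled by the two flows
P = 0.001                        # loss (marking) probability, 0.1%


def flow_metrics(bit_rate_bps, size_bits, p=P):
    """Apply the 'Formula' column of Table 5 to one flow."""
    u = bit_rate_bps / size_bits             # packet rate, u = x/s
    return {
        "bit_rate_bps": bit_rate_bps,        # x
        "packet_rate_pps": u,                # u
        "pkt_loss_pps": p * u,               # p*u
        "bit_loss_bps": p * u * size_bits,   # p*u*s
    }


def equal_bit_rate_case():
    """Each flow gets half of the link bit rate."""
    x = LINK_BPS / 2
    return flow_metrics(x, S1_BITS), flow_metrics(x, S2_BITS)


def equal_packet_rate_case():
    """Both flows send the same number of packets per second."""
    u = LINK_BPS / (S1_BITS + S2_BITS)       # shared packet rate
    return flow_metrics(u * S1_BITS, S1_BITS), flow_metrics(u * S2_BITS, S2_BITS)
```
+
+ In the equal packet rate case the unrounded shared packet rate is
+ about 7.69 kpps per flow, which Table 5 shows as 8 kpps per flow and
+ 15 kpps combined.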
+
+ So far, we have merely set up the scenarios. We now consider
+ congestion notification in the scenario. Two TCP flows with the same
+ round-trip time aim to equalise their packet-loss rates over time;
+ that is, the number of packets lost in a second, which is the packets
+ per second (u) multiplied by the probability that each one is dropped
+ (p). Thus, TCP converges on the case labelled 'Equal packet rate' in
+ the table, where both flows aim for the same absolute packet-loss
+ rate (both 8 pps in the table).
+
+ Packet-mode drop actually gives flows sufficient information to
+ measure their loss rate in bits per second, if they choose, not just
+ packets per second. Each flow can count the size of a lost or marked
+ packet and scale its rate response in proportion (as TFRC-SP does).
+ The result is shown in the row entitled 'Absolute bit-loss rate',
+ where the bits lost in a second is the packets per second (u)
+ multiplied by the probability of losing a packet (p) multiplied by
+ the packet size (s). Such an algorithm would try to remove any
+ imbalance in the bit-loss rate, such as the wide disparity in the case
+ labelled 'Equal packet rate' (4 kbps vs. 92 kbps). Instead, a
+ packet-size-dependent algorithm would aim for equal bit-loss rates
+ (both 48 kbps in this example), which would drive both flows towards
+ the case labelled 'Equal bit rate'.
+
+ The explanation so far has assumed that each flow consists of packets
+ of only one constant size. Nonetheless, it extends naturally to
+ flows with mixed packet sizes. In the right-most column of Table 5,
+ a flow of mixed-size packets is created simply by considering Flow 1
+ and Flow 2 as a single aggregated flow. There is no need for a flow
+ to maintain an average packet size. It is only necessary for the
+ transport to scale its response to each congestion indication by the
+ size of each individual lost (or marked) packet. Taking, for
+ example, the case labelled 'Equal packet rate', in one second about 8
+ small packets and 8 large packets are lost (making closer to 15 than
+ 16 losses per second due to rounding). If the transport multiplies
+ each loss by its size, in one second it responds to 8*480 and
+ 8*12,000 lost bits, adding up to 96,000 lost bits in a second. This
+ double checks correctly, being the same as 0.1% of the total bit rate
+ of 96 Mbps. For completeness, the formula for absolute bit-loss rate
+ is p(u1*s1+u2*s2).
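+
+ As a check on the arithmetic above, the following sketch evaluates
+ the formula p(u1*s1+u2*s2) and also sums the sizes of individual lost
+ packets, as a size-aware transport might. The names bit_loss_rate,
+ lost_bits and U are introduced here for illustration only.
+
```python
def bit_loss_rate(p, flows):
    """Absolute bit-loss rate, p * sum(u_i * s_i).

    `flows` is a list of (packet_rate_pps, packet_size_bits) pairs.
    """
    return p * sum(u * s for u, s in flows)


def lost_bits(lost_packet_sizes_bits):
    """Congestion seen by a transport that weights each loss by the size
    of the individual lost packet; no average packet size is kept."""
    return sum(lost_packet_sizes_bits)


# Shared packet rate in the 'Equal packet rate' case, about 7.69 kpps.
U = 96_000_000 / (480 + 12_000)
```
+
+ With the unrounded rate U, bit_loss_rate(0.001, [(U, 480),
+ (U, 12000)]) gives 96,000 b/s. Using the rounded figure of 8 losses
+ per flow, lost_bits([480] * 8 + [12000] * 8) gives 99,840 bits, which
+ is why the rounded loss counts slightly overstate the total.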
+
+ Incidentally, a transport will always measure the loss probability
+ the same, irrespective of whether it measures in packets or in bytes.
+ In other words, the ratio of lost packets to sent packets will be the
+ same as the ratio of lost bytes to sent bytes. (This is why TCP's
+ bit rate is still proportional to packet size, even when byte
+ counting is used, as recommended for TCP in [RFC5681], mainly for
+ orthogonal security reasons.) This is intuitively obvious by
+ comparing two example flows; one with 60 B packets, the other with
+ 1,500 B packets. If both flows pass through a queue with drop
+ probability 0.1%, each flow will lose 1 in 1,000 packets. In the
+ stream of 60 B packets, the ratio of lost bytes to sent bytes will be
+ 60 B in every 60,000 B; and in the stream of 1,500 B packets, the
+ loss ratio will be 1,500 B out of 1,500,000 B. When the transport
+ responds to the ratio of lost to sent packets, it will measure the
+ same ratio whether it measures in packets or bytes: 0.1% in both
+ cases. The fact that this ratio is the same whether measured in
+ packets or bytes can be seen in Table 5, where the ratio of lost
+ packets to sent packets and the ratio of lost bytes to sent bytes is
+ always 0.1% in all cases (recall that the scenario was set up with
+ p=0.1%).
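+
+ The equality of the two ratios for a flow of constant packet size can
+ be checked exactly. This is a small sketch using exact fractions; the
+ function name loss_ratios is an assumption made for the example.
+
```python
from fractions import Fraction


def loss_ratios(size_bytes, sent_packets, lost_packets):
    """Return (lost/sent in packets, lost/sent in bytes) for a flow whose
    packets all have the same size."""
    packet_ratio = Fraction(lost_packets, sent_packets)
    byte_ratio = Fraction(lost_packets * size_bytes, sent_packets * size_bytes)
    return packet_ratio, byte_ratio
```
+
+ For 1 loss in 1,000 packets, both loss_ratios(60, 1000, 1) and
+ loss_ratios(1500, 1000, 1) return 1/1000 for each ratio, because the
+ packet size cancels.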
+
+ This discussion of how the ratio can be measured in packets or bytes
+ is only raised here to highlight that it is irrelevant to this memo!
+ Whether or not a transport depends on packet size depends on how this
+ ratio is used within the congestion control algorithm.
+
+ So far, we have shown that packet-mode drop passes sufficient
+ information to the transport layer so that the transport can take bit
+ congestion into account, by using the sizes of the packets that
+ indicate congestion. We have also shown that the transport can
+ choose not to take packet size into account if it wishes. We will
+ now consider whether the transport can know which to do.
+
+B.2. Bit-Congestible and Packet-Congestible Indications
+
+ As a thought-experiment, imagine an idealised congestion notification
+ protocol that supports both bit-congestible and packet-congestible
+ resources. It would require at least two ECN flags, one for each of
+ the bit-congestible and packet-congestible resources.
+
+ 1. A packet-congestible resource trying to code congestion level p_p
+ into a packet stream should mark the idealised 'packet
+ congestion' field in each packet with probability p_p
+ irrespective of the packet's size. The transport should then
+ take a packet with the packet congestion field marked to mean
+ just one mark, irrespective of the packet size.
+
+ 2. A bit-congestible resource trying to code time-varying byte-
+ congestion level p_b into a packet stream should mark the 'byte
+ congestion' field in each packet with probability p_b, again
+ irrespective of the packet's size. Unlike before, the transport
+ should take a packet with the byte congestion field marked to
+ count as a mark on each byte in the packet.
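+
+ The two interpretations in the thought-experiment above can be
+ sketched from the transport's point of view. The two boolean
+ arguments stand for the idealised 'packet congestion' and 'byte
+ congestion' fields; they are hypothetical and do not exist in any
+ deployed protocol.
+
```python
def congestion_units(size_bytes, pkt_congestion_marked, byte_congestion_marked):
    """Return (packet_marks, byte_marks) a transport would count for one
    received packet in the idealised two-flag protocol."""
    # A packet-congestion mark counts once, whatever the packet size.
    packet_marks = 1 if pkt_congestion_marked else 0
    # A byte-congestion mark counts as a mark on every byte in the packet.
    byte_marks = size_bytes if byte_congestion_marked else 0
    return packet_marks, byte_marks
```
+
+ For example, a 1,500 B packet with only the packet congestion field
+ set counts as (1, 0), while a 60 B packet with only the byte
+ congestion field set counts as (0, 60).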
+
+ This hides a fundamental problem -- much more fundamental than
+ whether we can magically create header space for yet another ECN
+ flag, or whether it would work while being deployed incrementally.
+ Distinguishing drop from delivery naturally provides just one
+ implicit bit of congestion indication information -- the packet is
+ either dropped or not. It is hard to drop a packet in two ways that
+ are distinguishable remotely. This is a similar problem to that of
+ distinguishing wireless transmission losses from congestive losses.
+
+ This problem would not be solved, even if ECN were universally
+ deployed. A congestion notification protocol must survive a
+ transition from low levels of congestion to high. Marking two states
+ is feasible with explicit marking, but it is much harder if packets
+ are dropped. Also, it will not always be cost-effective to implement
+ AQM at every low-level resource, so drop will often have to suffice.
+
+ We are not saying two ECN fields will be needed (and we are not
+ saying that somehow a resource should be able to drop a packet in one
+ of two different ways so that the transport can distinguish which
+ sort of drop it was!). These two congestion notification channels
+ are a conceptual device to illustrate a dilemma we could face in the
+ future. Section 3 gives four good reasons why it would be a bad idea
+ to allow for packet size by biasing drop probability in favour of
+ small packets within the network. The impracticality of our thought
+ experiment shows that it will be hard to give transports a practical
+ way to know whether or not to take into account the size of
+ congestion indication packets.
+
+ Fortunately, this dilemma is not pressing because by design most
+ equipment becomes bit-congested before its packet processing becomes
+ congested (as already outlined in Section 1.1). Therefore,
+ transports can be designed on the relatively sound assumption that a
+ congestion indication will usually imply bit congestion.
+
+ Nonetheless, although the above idealised protocol isn't intended for
+ implementation, we do want to emphasise that research is needed to
+ predict whether there are good reasons to believe that packet
+ congestion might become more common, and if so, to find a way to
+ somehow distinguish between bit and packet congestion [RFC3714].
+
+ Recently, the dual resource queue (DRQ) proposal [DRQ] has been made
+ on the premise that, as network processors become more cost-
+ effective, per-packet operations will become more complex
+ (irrespective of whether more function in the network is desirable).
+ Consequently the premise is that CPU congestion will become more
+ common. DRQ is a proposed modification to the RED algorithm that
+ folds both bit congestion and packet congestion into one signal
+ (either loss or ECN).
+
+ Finally, we note one further complication. Strictly, packet-
+ congestible resources are often cycle-congestible. For instance, for
+ routing lookups, load depends on the complexity of each lookup and
+ whether or not the pattern of arrivals is amenable to caching. This
+ also reminds us that any solution must not require a forwarding
+ engine to use excessive processor cycles in order to decide how to
+ say it has no spare processor cycles.
+
+Appendix C. Byte-Mode Drop Complicates Policing Congestion Response
+
+ This Appendix is informative, not normative.
+
+ There are two main classes of approach to policing congestion
+ response: (i) policing at each bottleneck link or (ii) policing at
+ the edges of networks. Packet-mode drop in RED is compatible with
+ either, while byte-mode drop precludes edge policing.
+
+ The simplicity of an edge policer relies on one dropped or marked
+ packet being equivalent to another of the same size without having to
+ know which link the drop or mark occurred at. However, the byte-mode
+ drop algorithm has to depend on the local MTU of the line -- it needs
+ to use some concept of a 'normal' packet size. Therefore, one
+ dropped or marked packet from a byte-mode drop algorithm is not
+ necessarily equivalent to another from a different link. A policing
+ function local to the link can know the local MTU where the
+ congestion occurred. However, a policer at the edge of the network
+ cannot, at least not without a lot of complexity.
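+
+ The dependence on the local MTU can be made concrete with a minimal
+ sketch. It assumes, for illustration, that byte-mode drop scales the
+ base drop probability by packet size divided by the local MTU; the
+ function names and the example MTUs of 1,500 B and 9,000 B are
+ hypothetical.
+
```python
def byte_mode_drop_probability(base_p, size_bytes, local_mtu_bytes):
    """Assumed byte-mode form: base probability scaled by size / local MTU."""
    return min(1.0, base_p * size_bytes / local_mtu_bytes)


def implied_congestion_level(observed_drop_p, size_bytes, local_mtu_bytes):
    """Invert the scaling to recover the base probability from an observed
    per-packet drop probability. The MTU of the dropping link is required."""
    return observed_drop_p * local_mtu_bytes / size_bytes
```
+
+ With a base probability of 1%, a 600 B packet is dropped with
+ probability 0.4% on the 1,500 B link but only about 0.067% on the
+ 9,000 B link. A policer that assumes the wrong MTU when inverting the
+ scaling misjudges the congestion level by the same factor of 6.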
+
+ The early research proposals for type (i) policing at a bottleneck
+ link [pBox] used byte-mode drop, then detected flows that contributed
+ disproportionately to the number of packets dropped. However, with
+ no extra complexity, later proposals used packet-mode drop and looked
+ for flows that contributed a disproportionate amount of dropped bytes
+ [CHOKe_Var_Pkt].
+
+ Work is progressing on the Congestion Exposure (ConEx) protocol
+ [RFC6789], which enables a type (ii) edge policer located at a user's
+ attachment point. The idea is to be able to take an integrated view
+ of the effect of all a user's traffic on any link in the
+ internetwork. However, byte-mode drop would effectively preclude
+ such edge policing because of the MTU issue above.
+
+ Indeed, making drop probability depend on the size of the packets
+ that bits happen to be divided into would simply encourage the bits
+ to be divided into smaller packets in order to confuse policing. In
+ contrast, as long as a dropped/marked packet is taken to mean that
+ all the bytes in the packet are dropped/marked, a policer can remain
+ robust against sequences of bits being re-divided into different size
+ packets or across different size flows [Rate_fair_Dis].
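+
+ The robustness argument can be illustrated by comparing the expected
+ number of bytes in dropped or marked packets when the same 30,000 B
+ are sent as large or as small packets. This is a sketch only: the
+ byte-mode formula (probability scaled by size / MTU), the 1,500 B MTU
+ and all names are assumptions made for the example.
+
```python
def expected_marked_bytes(packet_sizes, p, byte_mode=False, mtu_bytes=1500):
    """Expected bytes carried in dropped/marked packets for one burst.

    Packet-mode: each packet is hit with probability p.
    Byte-mode (assumed form): each packet is hit with probability
    p * size / mtu_bytes.
    """
    total = 0.0
    for size in packet_sizes:
        hit_p = p * size / mtu_bytes if byte_mode else p
        total += hit_p * size
    return total


LARGE = [1500] * 20   # 30,000 B sent as 20 large packets
SMALL = [60] * 500    # the same 30,000 B re-divided into 500 small packets
```
+
+ With p = 0.1%, packet-mode drop gives an expected 30 marked bytes for
+ both LARGE and SMALL, so a byte-counting policer sees no difference.
+ Under the assumed byte-mode formula the small-packet version yields
+ only 1.2 marked bytes, 25 times fewer, which rewards re-division.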
+
+Authors' Addresses
+
+ Bob Briscoe
+ BT
+ B54/77, Adastral Park
+ Martlesham Heath
+ Ipswich IP5 3RE
+ UK
+
+ Phone: +44 1473 645196
+ EMail: bob.briscoe@bt.com
+ URI: http://bobbriscoe.net/
+
+ Jukka Manner
+ Aalto University
+ Department of Communications and Networking (Comnet)
+ P.O. Box 13000
+ FIN-00076 Aalto
+ Finland
+
+ Phone: +358 9 470 22481
+ EMail: jukka.manner@aalto.fi
+ URI: http://www.netlab.tkk.fi/~jmanner/