summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc3247.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc3247.txt')
-rw-r--r--doc/rfc/rfc3247.txt1347
1 files changed, 1347 insertions, 0 deletions
diff --git a/doc/rfc/rfc3247.txt b/doc/rfc/rfc3247.txt
new file mode 100644
index 0000000..42a4582
--- /dev/null
+++ b/doc/rfc/rfc3247.txt
@@ -0,0 +1,1347 @@
+
+
+
+
+
+
+Network Working Group A. Charny
+Request for Comments: 3247 Cisco Systems, Inc.
+Category: Informational J.C.R. Bennett
+ Motorola
+ K. Benson
+ Tellabs
+ J.Y. Le Boudec
+ EPFL
+ A. Chiu
+ Celion Networks
+ W. Courtney
+ TRW
+ S. Davari
+ PMC-Sierra
+ V. Firoiu
+ Nortel Networks
+ C. Kalmanek
+ AT&T Research
+ K.K. Ramakrishnan
+ TeraOptic Networks
+ March 2002
+
+
+ Supplemental Information for the New Definition
+ of the EF PHB (Expedited Forwarding Per-Hop Behavior)
+
+Status of this Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2001). All Rights Reserved.
+
+Abstract
+
+ This document was written during the process of clarification of
+ RFC2598 "An Expedited Forwarding PHB" that led to the publication of
+ revised specification of EF "An Expedited Forwarding PHB". Its
+ primary motivation is providing additional explanation to the revised
+ EF definition and its properties. The document also provides
+ additional implementation examples and gives some guidance for
+ computation of the numerical parameters of the new definition for
+ several well known schedulers and router architectures.
+
+
+
+
+
+Charny, et. al. Informational [Page 1]
+
+RFC 3247 Supplemental Information March 2002
+
+
+Table of Contents
+
+ 1 Introduction ........................................... 2
+ 2 Definition of EF PHB ................................... 3
+ 2.1 The formal definition .................................. 3
+ 2.2 Relation to Packet Scale Rate Guarantee ................ 6
+ 2.3 The need for dual characterization of EF PHB ........... 7
+ 3 Per Packet delay ....................................... 9
+ 3.1 Single hop delay bound ................................. 9
+ 3.2 Multi-hop worst case delay ............................. 10
+ 4 Packet loss ............................................ 10
+ 5 Implementation considerations .......................... 11
+ 5.1 The output buffered model with EF FIFO at the output. .. 12
+ 5.1.1 Strict Non-preemptive Priority Queue ................... 12
+ 5.1.2 WF2Q ................................................... 13
+ 5.1.3 Deficit Round Robin (DRR) .............................. 13
+ 5.1.4 Start-Time Fair Queuing and Self-Clocked Fair Queuing .. 13
+ 5.2 Router with Internal Delay and EF FIFO at the output ... 13
+ 6 Security Considerations ................................ 14
+ 7 References ............................................. 14
+ Appendix A. Difficulties with the RFC 2598 EF PHB Definition .. 16
+ Appendix B. Alternative Characterization of Packet Scale Rate
+ Guarantee ......................................... 20
+ Acknowledgements .............................................. 22
+ Authors' Addresses ............................................ 22
+ Full Copyright Statement ...................................... 24
+
+1. Introduction
+
+ The Expedited Forwarding (EF) Per-Hop Behavior (PHB) was designed to
+ be used to build a low-loss, low-latency, low-jitter, assured
+ bandwidth service. The potential benefits of this service, and
+ therefore the EF PHB, are enormous. Because of the great value of
+ this PHB, it is critical that the forwarding behavior required of and
+ delivered by an EF-compliant node be specific, quantifiable, and
+ unambiguous.
+
+ Unfortunately, the definition of EF PHB in the original RFC2598 [10]
+ was not sufficiently precise (see Appendix A and [4]). A more
+ precise definition is given in [6]. This document is intended to aid
+ in the understanding of the properties of the new definition and
+ provide supplemental information not included in the text of [6] for
+ sake of brevity.
+
+ This document is outlined as follows. In section 2, we briefly
+ restate the definition for EF PHB of [6]. We then provide some
+ additional discussion of this definition and describe some of its
+ properties. We discuss the issues associated with per-packet delay
+
+
+
+Charny, et. al. Informational [Page 2]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ and loss in sections 3 and 4. In section 5 we discuss the impact of
+ known scheduling architectures on the critical parameters of the new
+ definition. We also discuss the impact of deviation of real devices
+ from the ideal output-buffered model on the magnitude of the critical
+ parameters in the definition.
+
+2. Definition of EF PHB
+
+2.1. The formal definition
+
+ An intuitive explanation of the new EF definition is described in
+ [6]. Here we restate the formal definition from [6] verbatim.
+
+ A node that supports EF on an interface I at some configured rate R
+ MUST satisfy the following equations:
+
+ d_j <= f_j + E_a for all j>0 (eq_1)
+
+ where f_j is defined iteratively by
+
+ f_0 = 0, d_0 = 0
+
+ f_j = max(a_j, min(d_j-1, f_j-1)) + l_j/R, for all j > 0 (eq_2)
+
+ In this definition:
+
+ - d_j is the time that the last bit of the j-th EF packet to
+ depart actually leaves the node from the interface I.
+
+ - f_j is the target departure time for the j-th EF packet to
+ depart from I, the "ideal" time at or before which the last bit
+ of that packet should leave the node.
+
+ - a_j is the time that the last bit of the j-th EF packet
+ destined to the output I actually arrives at the node.
+
+ - l_j is the size (bits) of the j-th EF packet to depart from I.
+ l_j is measured on the IP datagram (IP header plus payload) and
+ does not include any lower layer (e.g. MAC layer) overhead.
+
+ - R is the EF configured rate at output I (in bits/second).
+
+ - E_a is the error term for the treatment of the EF aggregate.
+ Note that E_a represents the worst case deviation between
+ actual departure time of an EF packet and ideal departure time
+ of the same packet, i.e. E_a provides an upper bound on (d_j -
+ f_j) for all j.
+
+
+
+
+Charny, et. al. Informational [Page 3]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ - d_0 and f_0 do not refer to a real packet departure but are
+ used purely for the purposes of the recursion. The time origin
+ should be chosen such that no EF packets are in the system at
+ time 0.
+
+ - for the definitions of a_j and d_j, the "last bit" of the
+ packet includes the layer 2 trailer if present, because a
+ packet cannot generally be considered available for forwarding
+ until such a trailer has been received.
+
+ An EF-compliant node MUST be able to be characterized by the range of
+ possible R values that it can support on each of its interfaces while
+ conforming to these equations, and the value of E_a that can be met
+ on each interface. R may be line rate or less. E_a MAY be specified
+ as a worst-case value for all possible R values or MAY be expressed
+ as a function of R.
+
+ Note also that, since a node may have multiple inputs and complex
+ internal scheduling, the j-th EF packet to arrive at the node
+ destined for a certain interface may not be the j-th EF packet to
+ depart from that interface. It is in this sense that eq_1 and eq_2
+ are unaware of packet identity.
+
+ In addition, a node that supports EF on an interface I at some
+ configured rate R MUST satisfy the following equations:
+
+ D_j <= F_j + E_p for all j>0 (eq_3)
+
+ where F_j is defined iteratively by
+
+ F_0 = 0, D_0 = 0
+
+ F_j = max(A_j, min(D_j-1, F_j-1)) + L_j/R, for all j > 0 (eq_4)
+
+ In this definition:
+
+ - D_j is the actual departure time of the individual EF packet
+ that arrived at the node destined for interface I at time A_j,
+ i.e., given a packet which was the j-th EF packet destined for
+ I to arrive at the node via any input, D_j is the time at which
+ the last bit of that individual packet actually leaves the node
+ from the interface I.
+
+ - F_j is the target departure time for the individual EF packet
+ that arrived at the node destined for interface I at time A_j.
+
+
+
+
+
+
+Charny, et. al. Informational [Page 4]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ - A_j is the time that the last bit of the j-th EF packet
+ destined to the output I to arrive actually arrives at the
+ node.
+
+ - L_j is the size (bits) of the j-th EF packet to arrive at the
+ node that is destined to output I. L_j is measured on the IP
+ datagram (IP header plus payload) and does not include any
+ lower layer (e.g. MAC layer) overhead.
+
+ - R is the EF configured rate at output I (in bits/second).
+
+ - E_p is the error term for the treatment of individual EF
+ packets. Note that E_p represents the worst case deviation
+ between the actual departure time of an EF packet and the ideal
+ departure time of the same packet, i.e. E_p provides an upper
+ bound on (D_j - F_j) for all j.
+
+ - D_0 and F_0 do not refer to a real packet departure but are
+ used purely for the purposes of the recursion. The time origin
+ should be chosen such that no EF packets are in the system at
+ time 0.
+
+ - for the definitions of A_j and D_j, the "last bit" of the
+ packet includes the layer 2 trailer if present, because a
+ packet cannot generally be considered available for forwarding
+ until such a trailer has been received.
+
+ It is the fact that D_j and F_j refer to departure times for the j-th
+ packet to arrive that makes eq_3 and eq_4 aware of packet identity.
+ This is the critical distinction between the last two equations and
+ the first two.
+
+ An EF-compliant node SHOULD be able to be characterized by the range
+ of possible R values that it can support on each of its interfaces
+ while conforming to these equations, and the value of E_p that can be
+ met on each interface. E_p MAY be specified as a worst-case value
+ for all possible R values or MAY be expressed as a function of R. An
+ E_p value of "undefined" MAY be specified.
+
+ Finally, there is an additional recommendation in [6] that an EF
+ compliant node SHOULD NOT reorder packets within a micorflow.
+
+ The definitions described in this section are referred to as
+ aggregate and packet-identity-aware packet scale rate guarantee
+ [4],[2]. An alternative mathematical characterization of packet
+ scale rate guarantee is given in Appendix B.
+
+
+
+
+
+Charny, et. al. Informational [Page 5]
+
+RFC 3247 Supplemental Information March 2002
+
+
+2.2. Relation to Packet Scale Rate Guarantee
+
+ Consider the case of an ideal output-buffered device with an EF FIFO
+ at the output. For such a device, the i-th packet to arrive to the
+ device is also the i-th packet to depart from the device. Therefore,
+ in this ideal model the aggregate behavior and packet-identity-aware
+ characteristics are identical, and E_a = E_p. In this section we
+ therefore omit the subscript and refer to the latency term simply as
+ E.
+
+ It could be shown that for such an ideal device the definition of
+ section 2.1 is stronger than the well-known rate-latency curve [2] in
+ the sense that if a scheduler satisfies the EF definition it also
+ satisfies the rate-latency curve. As a result, all the properties
+ known for the rate-latency curve also apply to the modified EF
+ definition. However, we argue below that the definition of section
+ 2.1 is more suitable to reflect the intent of EF PHB than the rate-
+ latency curve.
+
+ It is shown in [2] that the rate-latency curve is equivalent to the
+ following definition:
+
+ Definition of Rate Latency Curve (RLC):
+
+ D(j) <= F'(j) + E (eq_5)
+
+ where
+
+ F'(0)=0, F'(j)=max(a(j), F'(j-1))+ L(j)/R for all j>0 (eq_6)
+
+ It can be easily verified that the EF definition of section 2.1 is
+ stronger than RLC by noticing that for all j, F'(j) >= F(j).
+
+ It is easy to see that F'(j) in the definition of RLC corresponds to
+ the time the j-th departure should have occurred should the EF
+ aggregate be constantly served exactly at its configured rate R.
+ Following the common convention, we refer to F'(j) as the "fluid
+ finish time" of the j-th packet to depart.
+
+ The intuitive meaning of the rate-latency curve of RLC is that any
+ packet is served at most time E later than this packet would finish
+ service in the fluid model.
+
+ For RLC (and hence for the stronger EF definition) it holds that in
+ any interval (0,t) the EF aggregate gets close to the desired service
+ rate R (as long as there is enough traffic to sustain this rate).
+ The discrepancy between the ideal and the actual service in this
+ interval depends on the latency term E, which in turn depends on the
+
+
+
+Charny, et. al. Informational [Page 6]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ scheduling implementation. The smaller E, the smaller the difference
+ between the configured rate and the actual rate achieved by the
+ scheduler.
+
+ While RLC guarantees the desired rate to the EF aggregate in all
+ intervals (0,t) to within a specified error, it may nevertheless
+ result in large gaps in service. For example, suppose that (a large
+ number) N of identical EF packets of length L arrived from different
+ interfaces to the EF queue in the absence of any non-EF traffic.
+ Then any work-conserving scheduler will serve all N packets at link
+ speed. When the last packet is sent at time NL/C, where C is the
+ capacity of output link, F'(N) will be equal to NL/R. That is, the
+ scheduler is running ahead of ideal, since NL/C < NL/R for R < C.
+ Suppose now that at time NL/C a large number of non-EF packets
+ arrive, followed by a single EF packet. Then the scheduler can
+ legitimately delay starting to send the EF packet until time
+ F'(N+1)=(N+1)L/R + E - L/C. This means that the EF aggregate will
+ have no service at all in the interval (NL/C, (N+1)L/R + E - L/C).
+ This interval can be quite large if R is substantially smaller than
+ C. In essence, the EF aggregate can be "punished" by a gap in
+ service for receiving faster service than its configured rate at the
+ beginning.
+
+ The new EF definition alleviates this problem by introducing the term
+ min(D(j-1), F(j-1)) in the recursion. Essentially, this means that
+ the fluid finishing time is "reset" if that packet is sent before its
+ "ideal" departure time. As a consequence of that, for the case where
+ the EF aggregate is served in the FIFO order, suppose a packet
+ arrives at time t to a server satisfying the EF definition. The
+ packet will be transmitted no later than time t + Q(t)/R + E, where
+ Q(t) is the EF queue size at time t (including the packet under
+ discussion)[4].
+
+2.3. The need for dual characterization of EF PHB
+
+ In a more general case, where either the output scheduler does not
+ serve the EF packets in a FIFO order, or the variable internal delay
+ in the device reorders packets while delivering them to the output
+ (or both), the i-th packet destined to a given output interface to
+ arrive to the device may no longer be the i-th packet to depart from
+ that interface. In that case the packet-identity-aware and the
+ aggregate definitions are no longer identical.
+
+ The aggregate behavior definition can be viewed as a truly aggregate
+ characteristic of the service provided to EF packets. For an
+ analogy, consider a dark reservoir to which all arriving packets are
+ placed. A scheduler is allowed to pick a packet from the reservoir
+ in a random order, without any knowledge of the order of packet
+
+
+
+Charny, et. al. Informational [Page 7]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ arrivals. The aggregate part of the definition measures the accuracy
+ of the output rate provided to the EF aggregate as a whole. The
+ smaller E_a, the more accurate is the assurance that the reservoir is
+ drained at least at the configured rate.
+
+ Note that in this reservoir analogy packets of EF aggregate may be
+ arbitrarily reordered. However, the definition of EF PHB given in
+ [6] explicitly requires that no packet reordering occur within a
+ microflow. This requirement restricts the scheduling
+ implementations, or, in the reservoir analogy, the order of pulling
+ packets out of the reservoir to make sure that packets within a
+ microflow are not reordered, but it still allows reordering at the
+ aggregate level.
+
+ Note that reordering within the aggregate, as long as there is no
+ flow-level reordering, does not necessarily reflect a "bad" service.
+ Consider for example a scheduler that arbitrates among 10 different
+ EF "flows" with diverse rates. A scheduler that is aware of the rate
+ requirements may choose to send a packet of the faster flow before a
+ packet of the slower flow to maintain lower jitter at the flow level.
+ In particular, an ideal "flow"-aware WFQ scheduler will cause
+ reordering within the aggregate, while maintaining packet ordering
+ and small jitter at the flow level.
+
+ It is intuitively clear that for such a scheduler, as well as for a
+ simpler FIFO scheduler, the "accuracy" of the service rate is crucial
+ for minimizing "flow"-level jitter. The packet-identity-aware
+ definition quantifies this accuracy of the service rate.
+
+ However, the small value of E_a does not give any assurances about
+ the absolute value of per-packet delay. In fact, if the input rate
+ exceeds the configured rate, the aggregate behavior definition may
+ result in arbitrarily large delay of a subset of packets. This is
+ the primary motivation for the packet-identity-aware definition.
+
+ The primary goal of the packet-aware characterization of the EF
+ implementation is that, unlike the aggregate behavior
+ characterization, it provides a way to find a per-packet delay bound
+ as a function of input traffic parameters.
+
+ While the aggregate behavior definition characterizes the accuracy of
+ the service rate of the entire EF aggregate, the packet-identity-
+ aware part of the definition characterizes the deviation of the
+ device from an ideal server that serves the EF aggregate in FIFO
+ order at least at the configured rate.
+
+ The value of E_p in the packet-identity-aware definition is therefore
+ affected by two factors: the accuracy of the aggregate rate service
+
+
+
+Charny, et. al. Informational [Page 8]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ and the degree of packet reordering within the EF aggregate (under
+ the constraint that packets within the same microflow are not
+ reordered). Therefore, a sub-aggregate aware device that provides an
+ ideal service rate to the aggregate, and also provides an ideal rate
+ service for each of the sub-aggregates, may nevertheless have a very
+ large value of E_p (in this case E_p must be at least equal to the
+ ratio of the maximum packet size divided by the smallest rate of any
+ sub aggregate). As a result, a large value of E_p does not
+ necessarily mean that the service provided to EF aggregate is bad -
+ rather it may be an indication that the service is good, but non-
+ FIFO. On the other hand, a large value of E_p may also mean that the
+ aggregate service is very inaccurate (bursty), and hence in this case
+ the large value of E_p reflects a poor quality of implementation.
+
+ As a result, a large number of E_p does not necessarily provide any
+ guidance on the quality of the EF implementation. However, a small
+ value of E_p does indicate a high quality FIFO implementation.
+
+ Since E_p and E_a relate to different aspects of the EF
+ implementation, they should be considered together to determine the
+ quality of the implementation.
+
+3. Per Packet delay
+
+ The primary motivation for the packet-identity-aware definition is
+ that it allows quantification of the per-packet delay bound. This
+ section discusses the issues with computing per-packet delay.
+
+3.1. Single hop delay bound
+
+ If the total traffic arriving to an output port I from all inputs is
+ constrained by a leaky bucket with parameters (R, B), where R is the
+ configured rate at I, and B is the bucket depth (burst), then the
+ delay of any packet departing from I is bounded by D_p, given by
+
+ D_p = B/R + E_p (eq_7)
+
+ (see appendix B).
+
+ Because the delay bound depends on the configured rate R and the
+ input burstiness B, it is desirable for both of these parameters to
+ be visible to a user of the device. A PDB desiring a particular
+ delay bound may need to limit the range of configured rates and
+ allowed burstiness that it can support in order to deliver such
+ bound. Equation (eq_7) provides a means for determining an
+ acceptable operating region for the device with a given E_p. It may
+ also be useful to limit the total offered load to a given output to
+ some rate R_1 < R (e.g. to obtain end-to-end delay bounds [5]). It
+
+
+
+Charny, et. al. Informational [Page 9]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ is important to realize that, while R_1 may also be a configurable
+ parameter of the device, the delay bound in (eq_7) does not depend on
+ it. It may be possible to get better bounds explicitly using the
+ bound on the input rate, but the bound (eq_7) does not take advantage
+ of this information.
+
+3.2. Multi-hop worst case delay
+
+ Although the PHB defines inherently local behavior, in this section
+ we briefly discuss the issue of per-packet delay as the packet
+ traverses several hops implementing EF PHB. Given a delay bound
+ (eq_7) at a single hop, it is tempting to conclude that per-packet
+ bound across h hops is simply h times the bound (eq_7). However,
+ this is not necessarily the case, unless B represents the worst case
+ input burstiness across all nodes in the network.
+
+ Unfortunately, obtaining such a worst case value of B is not trivial.
+ If EF PHB is implemented using aggregate class-based scheduling where
+ all EF packets share a single FIFO, the effect of jitter accumulation
+ may result in an increase in burstiness from hop to hop. In
+ particular, it can be shown that unless severe restrictions on EF
+ utilization are imposed, even if all EF flows are ideally shaped at
+ the ingress, then for any value of delay D it is possible to
+ construct a network where EF utilization on any link is bounded not
+ to exceed a given factor, no flow traverses more than a specified
+ number of hops, but there exists a packet that experiences a delay
+ more than D [5]. This result implies that the ability to limit the
+ worst case burstiness and the resulting end-to-end delay across
+ several hops may require not only limiting EF utilization on all
+ links, but also constraining the global network topology. Such
+ topology constraints would need to be specified in the definition of
+ any PDB built on top of EF PHB, if such PDB requires a strict worst
+ case delay bound.
+
+4. Packet loss
+
+ Any device with finite buffering may need to drop packets if the
+ input burstiness becomes sufficiently high. To meet the low loss
+ objective of EF, a node may be characterized by the operating region
+ in which loss of EF due to congestion will not occur. This may be
+ specified as a token bucket of rate r <= R and burst size B that can
+ be offered from all inputs to a given output interface without loss.
+
+ However, as discussed in the previous section, the phenomenon of
+ jitter accumulation makes it generally difficult to guarantee that
+ the input burstiness never exceeds the specified operating region for
+ nodes internal to the DiffServ domain. A no-loss guarantee across
+ multiple hops may require specification of constraints on network
+
+
+
+Charny, et. al. Informational [Page 10]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ topology which are outside the scope of inherently local definition
+ of a PHB. Thus, it must be possible to establish whether a device
+ conforms to the EF definition even when some packets are lost.
+
+ This can be done by performing an "off-line" test of conformance to
+ equations (eq_1)- (eq_4). After observing a sequence of packets
+ entering and leaving the node, the packets which did not leave are
+ assumed lost and are notionally removed from the input stream. The
+ remaining packets now constitute the arrival stream and the packets
+ which left the node constitute the departure stream. Conformance to
+ the equations can thus be verified by considering only those packets
+ that successfully passed through the node.
+
+ Note that specification of which packets are lost in the case when
+ loss does occur is beyond the scope of the definition of EF PHB.
+ However, those packets that were not lost must conform to the
+ equations definition of EF PHB in section 2.1.
+
+5. Implementation considerations
+
+ A packet passing through a router will experience delay for a number
+ of reasons. Two familiar components of this delay are the time the
+ packet spends in a buffer at an outgoing link waiting for the
+ scheduler to select it and the time it takes to actually transmit the
+ packet on the outgoing line.
+
+ There may be other components of a packet's delay through a router,
+ however. A router might have to do some amount of header processing
+ before the packet can be given to the correct output scheduler, for
+ example. In another case a router may have a FIFO buffer (called a
+ transmission queue in [7]) where the packet sits after being selected
+ by the output scheduler but before it is transmitted. In cases such
+ as these, the extra delay a packet may experience can be accounted
+ for by absorbing it into the latency terms E_a and E_p.
+
+ Implementing EF on a router with a multi-stage switch fabric requires
+ special attention. A packet may experience additional delays due to
+ the fact that it must compete with other traffic for forwarding
+ resources at multiple contention points in the switch core. The
+ delay an EF packet may experience before it even reaches the output-
+ link scheduler should be included in the latency term. Input-
+ buffered and input/output-buffered routers based on crossbar design
+ may also require modification of their latency terms. The factors
+ such as the speedup factor and the choice of crossbar arbitration
+ algorithms may affect the latency terms substantially.
+
+
+
+
+
+
+Charny, et. al. Informational [Page 11]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ Delay in the switch core comes from two sources, both of which must
+ be considered. The first part of this delay is the fixed delay a
+ packet experiences regardless of the other traffic. This component
+ of the delay includes the time it takes for things such as packet
+ segmentation and reassembly in cell based cores, enqueueing and
+ dequeuing at each stage, and transmission between stages. The second
+ part of the switch core delay is variable and depends on the type and
+ amount of other traffic traversing the core. This delay comes about
+ if the stages in the core mix traffic flowing between different
+ input/output port pairs. Thus, EF packets must compete against other
+ traffic for forwarding resources in the core. Some of this
+ competing traffic may even be traffic from other, non-EF aggregates.
+ This introduces extra delay, that can also be absorbed by the latency
+ term in the definition.
+
+ To capture these considerations, in this section we will consider two
+ simplified implementation examples. The first is an ideal output
+ buffered node where packets entering the device from an input
+ interface are immediately delivered to the output scheduler. In this
+ model the properties of the output scheduler fully define the values
+ of the parameters E_a and E_p. We will consider the case where the
+ output scheduler implements aggregate class-based queuing, so that
+ all EF packets share a single queue. We will discuss the values of
+ E_a and E_p for a variety of class-based schedulers widely
+ considered.
+
+ The second example will consider a router modeled as a black box with
+ a known bound on the variable delay a packet can experience from the
+ time it arrives to an input to the time it is delivered to its
+ destination output. The output scheduler in isolation is assumed to
+ be an aggregate scheduler where all EF packets share a single FIFO
+ queue, with a known value of E_a(S)=E_p(S)=E(S). This model provides
+ a reasonable abstraction to a large class of router implementations.
+
+5.1. The output buffered model with EF FIFO at the output.
+
+ As has been mentioned earlier, in this model E_a = E_p, so we shall
+ omit the subscript and refer to both terms as latency E. The
+ remainder of this subsection discusses E for a number of scheduling
+ implementations.
+
+5.1.1. Strict Non-preemptive Priority Queue
+
+ A Strict Priority scheduler in which all EF packets share a single
+ FIFO queue which is served at strict non-preemptive priority over
+ other queues satisfies the EF definition with the latency term E =
+ MTU/C where MTU is the maximum packet size and C is the speed of the
+ output link.
+
+
+
+Charny, et. al. Informational [Page 12]
+
+RFC 3247 Supplemental Information March 2002
+
+
+5.1.2. WF2Q
+
+ Another scheduler that satisfies the EF definition with a small
+ latency term is WF2Q described in [1]. A class-based WF2Q scheduler,
+ in which all EF traffic shares a single queue with the weight
+ corresponding to the configured rate of the EF aggregate satisfies
+ the EF definition with the latency term E = MTU/C+MTU/R.
+
+5.1.3. Deficit Round Robin (DRR)
+
+ For DRR [12], E can be shown to grow linearly with
+ N*(r_max/r_min)*MTU, where r_min and r_max denote the smallest and
+ the largest rate among the rate assignments of all queues in the
+ scheduler, and N is the number of queues in the scheduler.
+
+5.1.4. Start-Time Fair Queuing and Self-Clocked Fair Queuing
+
+ For Start-Time Fair Queuing (SFQ) [9] and Self-Clocked Fair Queuing
+ (SCFQ) [8] E can be shown to grow linearly with the number of queues
+ in the scheduler.
+
+5.2. Router with Internal Delay and EF FIFO at the output
+
+ In this section we consider a router which is modeled as follows. A
+ packet entering the router may experience a variable delay D_v with a
+ known upper bound D. That is, 0<=D_v<=D. At the output all EF
+ packets share a single class queue. Class queues are scheduled by a
+ scheduler with a known value E_p(S)=E(S) (where E(S) corresponds to
+ the model where this scheduler is implemented in an ideal output
+ buffered device).
+
+ The computation of E_p is more complicated in this case. For such
+ device, it can be shown that E_p = E(S)+2D+2B/R (see [13]).
+
+ Recall from the discussion of section 3 that bounding input
+ burstiness B may not be easy in a general topology. In the absence
+ of the knowledge of a bound on B one can bound E_p as E_p = E(S) +
+ D*C_inp/R (see [13]).
+
+ Note also that the bounds in this section are derived using only the
+ bound on the variable portion of the interval delay and the error
+ bound of the output scheduler. If more details about the
+ architecture of a device are available, it may be possible to compute
+ better bounds.
+
+
+
+
+
+
+
+Charny, et. al. Informational [Page 13]
+
+RFC 3247 Supplemental Information March 2002
+
+
+6. Security Considerations
+
+ This informational document provides additional information to aid in
+ understanding of the EF PHB described in [6]. It adds no new
+ functions to it. As a result, it adds no security issues to those
+ described in that specification.
+
+7. References
+
+ [1] J.C.R. Bennett and H. Zhang, "WF2Q: Worst-case Fair Weighted
+ Fair Queuing", INFOCOM'96, March 1996.
+
+ [2] J.-Y. Le Boudec, P. Thiran, "Network Calculus", Springer Verlag
+ Lecture Notes in Computer Science volume 2050, June 2001
+ (available online at http://lcawww.epfl.ch).
+
+ [3] Bradner, S., "Key Words for Use in RFCs to Indicate Requirement
+ Levels", BCP 14, RFC 2119, March 1997.
+
+ [4] J.C.R. Bennett, K. Benson, A. Charny, W. Courtney, J.Y. Le
+ Boudec, "Delay Jitter Bounds and Packet Scale Rate Guarantee
+ for Expedited Forwarding", Proc. Infocom 2001, April 2001.
+
+ [5] A. Charny, J.-Y. Le Boudec "Delay Bounds in a Network with
+ Aggregate Scheduling". Proc. of QoFIS'2000, September 25-26,
+ 2000, Berlin, Germany.
+
+ [6] Davie, B., Charny, A., Baker, F., Bennett, J.C.R., Benson, K.,
+ Boudec, J., Chiu, A., Courtney, W., Davari, S., Firoiu, V.,
+ Kalmanek, C., Ramakrishnan, K.K. and D. Stiliadis, "An
+ Expedited Forwarding PHB (Per-Hop Behavior)", RFC 3246, March
+ 2002.
+
+ [7] T. Ferrari and P. F. Chimento, "A Measurement-Based Analysis of
+ Expedited Forwarding PHB Mechanisms," Eighth International
+ Workshop on Quality of Service, Pittsburgh, PA, June 2000.
+
+ [8] S.J. Golestani. "A Self-clocked Fair Queuing Scheme for Broad-
+ band Applications". In Proceedings of IEEE INFOCOM'94, pages
+ 636-646, Toronto, CA, April 1994.
+
+ [9] P. Goyal, H.M. Vin, and H. Chen. "Start-time Fair Queuing: A
+ Scheduling Algorithm for Integrated Services". In Proceedings
+ of the ACM-SIGCOMM 96, pages 157-168, Palo Alto, CA, August
+ 1996.
+
+ [10] Jacobson, V., Nichols, K. and K. Poduri, "An Expedited
+ Forwarding PHB", RFC 2598, June 1999.
+
+
+
+Charny, et. al. Informational [Page 14]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ [11] Jacobson, V., Nichols, K. and K. Poduri, "The 'Virtual Wire'
+ Behavior Aggregate", Work in Progress.
+
+ [12] M. Shreedhar and G. Varghese. "Efficient Fair Queuing Using
+ Deficit Round Robin". In Proceedings of SIGCOMM'95, pages
+ 231-243, Boston, MA, September 1995.
+
+ [13] Le Boudec, J.-Y., Charny, A. "Packet Scale Rate Guarantee for
+ non-FIFO Nodes", Infocom 2002, New York, June 2002.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Charny, et. al. Informational [Page 15]
+
+RFC 3247 Supplemental Information March 2002
+
+
+Appendix A. Difficulties with the RFC 2598 EF PHB Definition
+
+ The definition of the EF PHB as given in [10] states:
+
+ "The EF PHB is defined as a forwarding treatment for a particular
+ diffserv aggregate where the departure rate of the aggregate's
+ packets from any diffserv node must equal or exceed a configurable
+ rate. The EF traffic SHOULD receive this rate independent of the
+ intensity of any other traffic attempting to transit the node. It
+ [the EF PHB departure rate] SHOULD average at least the configured
+ rate when measured over any time interval equal to or longer than the
+ time it takes to send an output link MTU sized packet at the
+ configured rate."
+
+ A literal interpretation of the definition would consider the
+ behaviors given in the next two subsections as non-compliant. The
+ definition also unnecessarily constrains the maximum configurable
+ rate of an EF aggregate.
+
+A.1 Perfectly-Clocked Forwarding
+
+ Consider the following stream forwarded from a router with EF-
+ configured rate R=C/2, where C is the output line rate. In the
+ illustration, E is an MTU-sized EF packet while x is a non-EF packet
+ or unused capacity, also of size MTU.
+
+ E x E x E x E x E x E x...
+ |-----|
+
+ The interval between the vertical bars is 3*MTU/C, which is greater
+ than MTU/(C/2), and so is subject to the EF PHB definition. During
+ this interval, 3*MTU/2 bits of the EF aggregate should be forwarded,
+ but only MTU bits are forwarded. Therefore, while this forwarding
+ pattern should be considered compliant under any reasonable
+ interpretation of the EF PHB, it actually does not formally comply
+ with the definition of RFC 2598.
+
+ Note that this forwarding pattern can occur in any work-conserving
+ scheduler in an ideal output-buffered architecture where EF packets
+ arrive in a perfectly clocked manner according to the above pattern
+ and are forwarded according to exactly the same pattern in the
+ absence of any non-EF traffic.
+
+ Trivial as this example may be, it reveals the lack of mathematical
+ precision in the formal definition. The fact that no work-conserving
+ scheduler can formally comply with the definition is unfortunate, and
+ appears to warrant some changes to the definition that would correct
+ this problem.
+
+
+
+Charny, et. al. Informational [Page 16]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ The underlying reason for the problem described here is quite simple
+ - one can only expect that the EF aggregate is served at configured
+ rate in some interval where there is enough backlog of EF packets to
+ sustain that rate. In the example above the packets come in exactly
+ at the rate at which they are served, and so there is no persistent
+ backlog. Certainly, if the input rate is even smaller than the
+ configured rate of the EF aggregate, there will be no backlog as
+ well, and a similar formal difficulty will occur.
+
+ A seemingly simple solution to this difficulty might be to require
+ that the EF aggregate is served at its configured rate only when the
+ queue is backlogged. However, as we show in the remainder of this
+ section, this solution does not suffice.
+
+A.2 Router Internal Delay
+
+ We now argue that the example considered in the previous section is
+ not as trivial as it may seem at first glance.
+
+ Consider a router with EF configured rate R = C/2 as in the previous
+ example, but with an internal delay of 3T (where T = MTU/C) between
+ the time that a packet arrives at the router and the time that it is
+ first eligible for forwarding at the output link. Such things as
+ header processing, route look-up, and delay in switching through a
+ multi-layer fabric could cause this delay. Now suppose that EF
+ traffic arrives regularly at a rate of (2/3)R = C/3. The router will
+ perform as shown below.
+
+ EF Packet Number 1 2 3 4 5 6 ...
+
+ Arrival (at router) 0 3T 6T 9T 12T 15T ...
+
+ Arrival (at scheduler) 3T 6T 9T 12T 15T 18T ...
+
+ Departure 4T 7T 10T 13T 16T 19T ...
+
+ Again, the output does not satisfy the RFC 2598 definition of EF PHB.
+ As in the previous example, the underlying reason for this problem is
+ that the scheduler cannot forward EF traffic faster than it arrives.
+ However, it can be easily seen that the existence of internal delay
+ causes one packet to be inside the router at all times. An external
+ observer will rightfully conclude that the number of EF packets that
+ arrived to the router is always at least one greater than the number
+ of EF packets that left the router, and therefore the EF aggregate is
+ constantly backlogged. However, while the EF aggregate is
+ continuously backlogged, the observed output rate is nevertheless
+ strictly less that the configured rate.
+
+
+
+
+Charny, et. al. Informational [Page 17]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ This example indicates that the simple addition of the condition that
+ EF aggregate must receive its configured rate only when the EF
+ aggregate is backlogged does not suffice in this case.
+
+ Yet, the problem described here is of fundamental importance in
+ practice. Most routers have a certain amount of internal delay. A
+ vendor declaring EF compliance is not expected to simultaneously
+ declare the details of the internals of the router. Therefore, the
+ existence of internal delay may cause a perfectly reasonable EF
+ implementation to display seemingly non-conformant behavior, which is
+ clearly undesirable.
+
+A.3 Maximum Configurable Rate and Provisioning Efficiency
+
+ It is well understood that with any non-preemptive scheduler, the
+ RFC-2598-compliant configurable rate for an EF aggregate cannot
+ exceed C/2 [11]. This is because an MTU-sized EF packet may arrive
+ to an empty queue at time t just as an MTU-sized non-EF packet begins
+ service. The maximum number of EF bits that could be forwarded
+ during the interval [t, t + 2*MTU/C] is MTU. But if configured rate
+ R > C/2, then this interval would be of length greater than MTU/R,
+ and more than MTU EF bits would have to be served during this
+ interval for the router to be compliant. Thus, R must be no greater
+ than C/2.
+
+ It can be shown that for schedulers other than PQ, such as various
+ implementations of WFQ, the maximum compliant configured rate may be
+ much smaller than 50%. For example, for SCFQ [8] the maximum
+ configured rate cannot exceed C/N, where N is the number of queues in
+ the scheduler. For WRR, mentioned as compliant in section 2.2 of RFC
+ 2598, this limitation is even more severe. This is because in these
+ schedulers a packet arriving to an empty EF queue may be forced to
+ wait until one packet from each other queue (in the case of SCFQ) or
+ until several packets from each other queue (in the case of WRR) are
+ served before it will finally be forwarded.
+
+ While it is frequently assumed that the configured rate of EF traffic
+ will be substantially smaller than the link bandwidth, the
+ requirement that this rate should never exceed 50% of the link
+ bandwidth appears unnecessarily limiting. For example, in a fully
+ connected mesh network, where any flow traverses a single link on its
+ way from source to its destination there seems no compelling reason
+ to limit the amount of EF traffic to 50% (or an even smaller
+ percentage for some schedulers) of the link bandwidth.
+
+ Another, perhaps even more striking example is the fact that even a
+ TDM circuit with dedicated slots cannot be configured to forward EF
+ packets at more than 50% of the link speed without violating RFC 2598
+
+
+
+Charny, et. al. Informational [Page 18]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ (unless the entire link is configured for EF). If the configured
+ rate of EF traffic is greater than 50% (but less than the link
+ speed), there will always exist an interval longer than MTU/R in
+ which less than the configured rate is achieved. For example,
+ suppose the configured rate of the EF aggregate is 2C/3. Then the
+ forwarding pattern of the TDM circuit might be
+
+ E E x E E x E E x ...
+ |---|
+
+ where only one packet is served in the marked interval of length 2T =
+ 2MTU/C. But at least 4/3 MTU would have to be served during this
+ interval by a router in compliance with the definition in RFC 2598.
+ The fact that even a TDM line cannot be booked over 50% by EF traffic
+ indicates that the restriction is artificial and unnecessary.
+
+A.4 The Non-trivial Nature of the Difficulties
+
+ One possibility to correct the problems discussed in the previous
+ sections might be to attempt to clarify the definition of the
+ intervals to which the definition applied or by averaging over
+ multiple intervals. However, an attempt to do so meets with
+ considerable analytical and implementation difficulties. For
+ example, attempting to align interval start times with some epochs of
+ the forwarded stream appears to require a certain degree of global
+ clock synchronization and is fraught with the risk of
+ misinterpretation and mistake in practice.
+
+ Another approach might be to allow averaging of the rates over some
+ larger time scale. However, it is unclear exactly what finite time
+ scale would suffice in all reasonable cases. Furthermore, this
+ approach would compromise the notion of very short-term time scale
+ guarantees that are the essence of EF PHB.
+
+ We also explored a combination of two simple fixes. The first is the
+ addition of the condition that the only intervals subject to the
+ definition are those that fall inside a period during which the EF
+ aggregate is continuously backlogged in the router (i.e., when an EF
+ packet is in the router). The second is the addition of an error
+ (latency) term that could serve as a figure-of-merit in the
+ advertising of EF services.
+
+ With the addition of these two changes the candidate definition
+ becomes as follows:
+
+
+
+
+
+
+
+Charny, et. al. Informational [Page 19]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ In any interval of time (t1, t2) in which EF traffic is continuously
+ backlogged, at least R(t2 - t1 - E) bits of EF traffic must be
+ served, where R is the configured rate for the EF aggregate and E is
+ an implementation-specific latency term.
+
+ The "continuously backlogged" condition eliminates the insufficient-
+ packets-to-forward difficulty while the addition of the latency term
+ of size MTU/C resolves the perfectly-clocked forwarding example
+ (section A.1), and also removes the limitation on EF configured rate.
+
+ However, neither fix (nor the two of them together) resolves the
+ example of section A.2. To see this, recall that in the example of
+ section A.2 the EF aggregate is continuously backlogged, but the
+ service rate of the EF aggregate is consistently smaller than the
+ configured rate, and therefore no finite latency term will suffice to
+ bring the example into conformance.
+
+Appendix B. Alternative Characterization of Packet Scale Rate Guarantee
+
+ The proofs of several bounds in this document can be found in [13].
+ These proofs use an algebraic characterization of the aggregate
+ definition given by (eq_1), (eq_2), and packet identity aware
+ definition given by (eq_3), (eq_4). Since this characterization is
+ of interest on its own, we present it in this section.
+
+Theorem B1. Characterization of the aggregate definition without
+ f_n.
+
+ Consider a system where packets are numbered 1, 2, ... in order of
+ arrival. As in the aggregate definition, call a_n the n-th arrival
+ time, d_n - the n-th departure time, and l_n the size of the n-th
+ packet to depart. Define by convention d_0=0. The aggregate
+ definition with rate R and latency E_a is equivalent to saying that
+ for all n and all 0<=j<= n-1:
+
+ d_n <= E_a + d_j + (l_(j+1) + ... + l_n)/R (eq_b1)
+
+ or
+
+ there exists some j+1 <= k <= n such that
+
+ d_n <= E_a + a_k + (l_k + ... + l_n)/R (eq_b2)
+
+
+
+
+
+
+
+
+
+Charny, et. al. Informational [Page 20]
+
+RFC 3247 Supplemental Information March 2002
+
+
+Theorem B2. Characterization of packet-identity-aware definition
+ without F_n.
+
+ Consider a system where packets are numbered 1, 2, ... in order of
+ arrival. As in the packet-identity-aware definition, call A_n, D_n
+ the arrival and departure times for the n-th packet, and L_n the size
+ of this packet. Define by convention D_0=0. The packet identity
+ aware definition with rate R and latency E_p is equivalent to saying
+ that for all n and all 0<=j<= n-1:
+
+ D_n <= E_p + D_j + (L_{j+1} + ... + L_n)/R (eq_b3)
+
+ or
+
+ there exists some j+1 <= k <= n such that
+
+ D_n <= E_p + A_k + (L_k + ... + L_n)/R (eq_b4)
+
+ For the proofs of both Theorems, see [13].
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Charny, et. al. Informational [Page 21]
+
+RFC 3247 Supplemental Information March 2002
+
+
+Acknowledgements
+
+ This document could not have been written without Fred Baker, Bruce
+ Davie and Dimitrios Stiliadis. Their time, support and insightful
+ comments were invaluable.
+
+Authors' Addresses
+
+ Anna Charny
+ Cisco Systems
+ 300 Apollo Drive
+ Chelmsford, MA 01824
+
+ EMail: acharny@cisco.com
+
+ Jon Bennett
+ Motorola
+ 3 Highwood Drive East
+ Tewksbury, MA 01876
+
+ EMail: jcrb@motorola.com
+
+ Kent Benson
+ Tellabs Research Center
+ 3740 Edison Lake Parkway #101
+ Mishawaka, IN 46545
+
+ EMail: Kent.Benson@tellabs.com
+
+ Jean-Yves Le Boudec
+ ICA-EPFL, INN
+ Ecublens, CH-1015
+ Lausanne-EPFL, Switzerland
+
+ EMail: jean-yves.leboudec@epfl.ch
+
+ Angela Chiu
+ Celion Networks
+ 1 Sheila Drive, Suite 2
+ Tinton Falls, NJ 07724
+
+ EMail: angela.chiu@celion.com
+
+
+
+
+
+
+
+
+
+Charny, et. al. Informational [Page 22]
+
+RFC 3247 Supplemental Information March 2002
+
+
+ Bill Courtney
+ TRW
+ Bldg. 201/3702
+ One Space Park
+ Redondo Beach, CA 90278
+
+ EMail: bill.courtney@trw.com
+
+ Shahram Davari
+ PMC-Sierra Inc
+ 411 Legget Drive
+ Ottawa, ON K2K 3C9, Canada
+
+ EMail: shahram_davari@pmc-sierra.com
+
+ Victor Firoiu
+ Nortel Networks
+ 600 Tech Park
+ Billerica, MA 01821
+
+ EMail: vfiroiu@nortelnetworks.com
+
+ Charles Kalmanek
+ AT&T Labs-Research
+ 180 Park Avenue, Room A113,
+ Florham Park NJ
+
+ EMail: crk@research.att.com
+
+ K.K. Ramakrishnan
+ TeraOptic Networks, Inc.
+ 686 W. Maude Ave
+ Sunnyvale, CA 94086
+
+ EMail: kk@teraoptic.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Charny, et. al. Informational [Page 23]
+
+RFC 3247 Supplemental Information March 2002
+
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2001). All Rights Reserved.
+
+ This document and translations of it may be copied and furnished to
+ others, and derivative works that comment on or otherwise explain it
+ or assist in its implementation may be prepared, copied, published
+ and distributed, in whole or in part, without restriction of any
+ kind, provided that the above copyright notice and this paragraph are
+ included on all such copies and derivative works. However, this
+ document itself may not be modified in any way, such as by removing
+ the copyright notice or references to the Internet Society or other
+ Internet organizations, except as needed for the purpose of
+ developing Internet standards in which case the procedures for
+ copyrights defined in the Internet Standards process must be
+ followed, or as required to translate it into languages other than
+ English.
+
+ The limited permissions granted above are perpetual and will not be
+ revoked by the Internet Society or its successors or assigns.
+
+ This document and the information contained herein is provided on an
+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is currently provided by the
+ Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Charny, et. al. Informational [Page 24]
+