1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
|
Network Working Group A. Charny
Request for Comments: 3247 Cisco Systems, Inc.
Category: Informational J.C.R. Bennett
Motorola
K. Benson
Tellabs
J.Y. Le Boudec
EPFL
A. Chiu
Celion Networks
W. Courtney
TRW
S. Davari
PMC-Sierra
V. Firoiu
Nortel Networks
C. Kalmanek
AT&T Research
K.K. Ramakrishnan
TeraOptic Networks
March 2002
Supplemental Information for the New Definition
of the EF PHB (Expedited Forwarding Per-Hop Behavior)
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2001). All Rights Reserved.
Abstract
This document was written during the process of clarification of
RFC2598 "An Expedited Forwarding PHB" that led to the publication of
revised specification of EF "An Expedited Forwarding PHB". Its
primary motivation is providing additional explanation to the revised
EF definition and its properties. The document also provides
additional implementation examples and gives some guidance for
computation of the numerical parameters of the new definition for
several well known schedulers and router architectures.
Charny, et. al. Informational [Page 1]
^L
RFC 3247 Supplemental Information March 2002
Table of Contents
1 Introduction ........................................... 2
2 Definition of EF PHB ................................... 3
2.1 The formal definition .................................. 3
2.2 Relation to Packet Scale Rate Guarantee ................ 6
2.3 The need for dual characterization of EF PHB ........... 7
3 Per Packet delay ....................................... 9
3.1 Single hop delay bound ................................. 9
3.2 Multi-hop worst case delay ............................. 10
4 Packet loss ............................................ 10
5 Implementation considerations .......................... 11
5.1 The output buffered model with EF FIFO at the output. .. 12
5.1.1 Strict Non-preemptive Priority Queue ................... 12
5.1.2 WF2Q ................................................... 13
5.1.3 Deficit Round Robin (DRR) .............................. 13
5.1.4 Start-Time Fair Queuing and Self-Clocked Fair Queuing .. 13
5.2 Router with Internal Delay and EF FIFO at the output ... 13
6 Security Considerations ................................ 14
7 References ............................................. 14
Appendix A. Difficulties with the RFC 2598 EF PHB Definition .. 16
Appendix B. Alternative Characterization of Packet Scale Rate
Guarantee ......................................... 20
Acknowledgements .............................................. 22
Authors' Addresses ............................................ 22
Full Copyright Statement ...................................... 24
1. Introduction
The Expedited Forwarding (EF) Per-Hop Behavior (PHB) was designed to
be used to build a low-loss, low-latency, low-jitter, assured
bandwidth service. The potential benefits of this service, and
therefore the EF PHB, are enormous. Because of the great value of
this PHB, it is critical that the forwarding behavior required of and
delivered by an EF-compliant node be specific, quantifiable, and
unambiguous.
Unfortunately, the definition of EF PHB in the original RFC2598 [10]
was not sufficiently precise (see Appendix A and [4]). A more
precise definition is given in [6]. This document is intended to aid
in the understanding of the properties of the new definition and
provide supplemental information not included in the text of [6] for
sake of brevity.
This document is outlined as follows. In section 2, we briefly
restate the definition for EF PHB of [6]. We then provide some
additional discussion of this definition and describe some of its
properties. We discuss the issues associated with per-packet delay
Charny, et. al. Informational [Page 2]
^L
RFC 3247 Supplemental Information March 2002
and loss in sections 3 and 4. In section 5 we discuss the impact of
known scheduling architectures on the critical parameters of the new
definition. We also discuss the impact of deviation of real devices
from the ideal output-buffered model on the magnitude of the critical
parameters in the definition.
2. Definition of EF PHB
2.1. The formal definition
An intuitive explanation of the new EF definition is described in
[6]. Here we restate the formal definition from [6] verbatim.
A node that supports EF on an interface I at some configured rate R
MUST satisfy the following equations:
d_j <= f_j + E_a for all j>0 (eq_1)
where f_j is defined iteratively by
f_0 = 0, d_0 = 0
f_j = max(a_j, min(d_j-1, f_j-1)) + l_j/R, for all j > 0 (eq_2)
In this definition:
- d_j is the time that the last bit of the j-th EF packet to
depart actually leaves the node from the interface I.
- f_j is the target departure time for the j-th EF packet to
depart from I, the "ideal" time at or before which the last bit
of that packet should leave the node.
- a_j is the time that the last bit of the j-th EF packet
destined to the output I actually arrives at the node.
- l_j is the size (bits) of the j-th EF packet to depart from I.
l_j is measured on the IP datagram (IP header plus payload) and
does not include any lower layer (e.g. MAC layer) overhead.
- R is the EF configured rate at output I (in bits/second).
- E_a is the error term for the treatment of the EF aggregate.
Note that E_a represents the worst case deviation between
actual departure time of an EF packet and ideal departure time
of the same packet, i.e. E_a provides an upper bound on (d_j -
f_j) for all j.
Charny, et. al. Informational [Page 3]
^L
RFC 3247 Supplemental Information March 2002
- d_0 and f_0 do not refer to a real packet departure but are
used purely for the purposes of the recursion. The time origin
should be chosen such that no EF packets are in the system at
time 0.
- for the definitions of a_j and d_j, the "last bit" of the
packet includes the layer 2 trailer if present, because a
packet cannot generally be considered available for forwarding
until such a trailer has been received.
An EF-compliant node MUST be able to be characterized by the range of
possible R values that it can support on each of its interfaces while
conforming to these equations, and the value of E_a that can be met
on each interface. R may be line rate or less. E_a MAY be specified
as a worst-case value for all possible R values or MAY be expressed
as a function of R.
Note also that, since a node may have multiple inputs and complex
internal scheduling, the j-th EF packet to arrive at the node
destined for a certain interface may not be the j-th EF packet to
depart from that interface. It is in this sense that eq_1 and eq_2
are unaware of packet identity.
In addition, a node that supports EF on an interface I at some
configured rate R MUST satisfy the following equations:
D_j <= F_j + E_p for all j>0 (eq_3)
where F_j is defined iteratively by
F_0 = 0, D_0 = 0
F_j = max(A_j, min(D_j-1, F_j-1)) + L_j/R, for all j > 0 (eq_4)
In this definition:
- D_j is the actual departure time of the individual EF packet
that arrived at the node destined for interface I at time A_j,
i.e., given a packet which was the j-th EF packet destined for
I to arrive at the node via any input, D_j is the time at which
the last bit of that individual packet actually leaves the node
from the interface I.
- F_j is the target departure time for the individual EF packet
that arrived at the node destined for interface I at time A_j.
Charny, et. al. Informational [Page 4]
^L
RFC 3247 Supplemental Information March 2002
- A_j is the time that the last bit of the j-th EF packet
destined to the output I to arrive actually arrives at the
node.
- L_j is the size (bits) of the j-th EF packet to arrive at the
node that is destined to output I. L_j is measured on the IP
datagram (IP header plus payload) and does not include any
lower layer (e.g. MAC layer) overhead.
- R is the EF configured rate at output I (in bits/second).
- E_p is the error term for the treatment of individual EF
packets. Note that E_p represents the worst case deviation
between the actual departure time of an EF packet and the ideal
departure time of the same packet, i.e. E_p provides an upper
bound on (D_j - F_j) for all j.
- D_0 and F_0 do not refer to a real packet departure but are
used purely for the purposes of the recursion. The time origin
should be chosen such that no EF packets are in the system at
time 0.
- for the definitions of A_j and D_j, the "last bit" of the
packet includes the layer 2 trailer if present, because a
packet cannot generally be considered available for forwarding
until such a trailer has been received.
It is the fact that D_j and F_j refer to departure times for the j-th
packet to arrive that makes eq_3 and eq_4 aware of packet identity.
This is the critical distinction between the last two equations and
the first two.
An EF-compliant node SHOULD be able to be characterized by the range
of possible R values that it can support on each of its interfaces
while conforming to these equations, and the value of E_p that can be
met on each interface. E_p MAY be specified as a worst-case value
for all possible R values or MAY be expressed as a function of R. An
E_p value of "undefined" MAY be specified.
Finally, there is an additional recommendation in [6] that an EF
compliant node SHOULD NOT reorder packets within a micorflow.
The definitions described in this section are referred to as
aggregate and packet-identity-aware packet scale rate guarantee
[4],[2]. An alternative mathematical characterization of packet
scale rate guarantee is given in Appendix B.
Charny, et. al. Informational [Page 5]
^L
RFC 3247 Supplemental Information March 2002
2.2. Relation to Packet Scale Rate Guarantee
Consider the case of an ideal output-buffered device with an EF FIFO
at the output. For such a device, the i-th packet to arrive to the
device is also the i-th packet to depart from the device. Therefore,
in this ideal model the aggregate behavior and packet-identity-aware
characteristics are identical, and E_a = E_p. In this section we
therefore omit the subscript and refer to the latency term simply as
E.
It could be shown that for such an ideal device the definition of
section 2.1 is stronger than the well-known rate-latency curve [2] in
the sense that if a scheduler satisfies the EF definition it also
satisfies the rate-latency curve. As a result, all the properties
known for the rate-latency curve also apply to the modified EF
definition. However, we argue below that the definition of section
2.1 is more suitable to reflect the intent of EF PHB than the rate-
latency curve.
It is shown in [2] that the rate-latency curve is equivalent to the
following definition:
Definition of Rate Latency Curve (RLC):
D(j) <= F'(j) + E (eq_5)
where
F'(0)=0, F'(j)=max(a(j), F'(j-1))+ L(j)/R for all j>0 (eq_6)
It can be easily verified that the EF definition of section 2.1 is
stronger than RLC by noticing that for all j, F'(j) >= F(j).
It is easy to see that F'(j) in the definition of RLC corresponds to
the time the j-th departure should have occurred should the EF
aggregate be constantly served exactly at its configured rate R.
Following the common convention, we refer to F'(j) as the "fluid
finish time" of the j-th packet to depart.
The intuitive meaning of the rate-latency curve of RLC is that any
packet is served at most time E later than this packet would finish
service in the fluid model.
For RLC (and hence for the stronger EF definition) it holds that in
any interval (0,t) the EF aggregate gets close to the desired service
rate R (as long as there is enough traffic to sustain this rate).
The discrepancy between the ideal and the actual service in this
interval depends on the latency term E, which in turn depends on the
Charny, et. al. Informational [Page 6]
^L
RFC 3247 Supplemental Information March 2002
scheduling implementation. The smaller E, the smaller the difference
between the configured rate and the actual rate achieved by the
scheduler.
While RLC guarantees the desired rate to the EF aggregate in all
intervals (0,t) to within a specified error, it may nevertheless
result in large gaps in service. For example, suppose that (a large
number) N of identical EF packets of length L arrived from different
interfaces to the EF queue in the absence of any non-EF traffic.
Then any work-conserving scheduler will serve all N packets at link
speed. When the last packet is sent at time NL/C, where C is the
capacity of output link, F'(N) will be equal to NL/R. That is, the
scheduler is running ahead of ideal, since NL/C < NL/R for R < C.
Suppose now that at time NL/C a large number of non-EF packets
arrive, followed by a single EF packet. Then the scheduler can
legitimately delay starting to send the EF packet until time
F'(N+1)=(N+1)L/R + E - L/C. This means that the EF aggregate will
have no service at all in the interval (NL/C, (N+1)L/R + E - L/C).
This interval can be quite large if R is substantially smaller than
C. In essence, the EF aggregate can be "punished" by a gap in
service for receiving faster service than its configured rate at the
beginning.
The new EF definition alleviates this problem by introducing the term
min(D(j-1), F(j-1)) in the recursion. Essentially, this means that
the fluid finishing time is "reset" if that packet is sent before its
"ideal" departure time. As a consequence of that, for the case where
the EF aggregate is served in the FIFO order, suppose a packet
arrives at time t to a server satisfying the EF definition. The
packet will be transmitted no later than time t + Q(t)/R + E, where
Q(t) is the EF queue size at time t (including the packet under
discussion)[4].
2.3. The need for dual characterization of EF PHB
In a more general case, where either the output scheduler does not
serve the EF packets in a FIFO order, or the variable internal delay
in the device reorders packets while delivering them to the output
(or both), the i-th packet destined to a given output interface to
arrive to the device may no longer be the i-th packet to depart from
that interface. In that case the packet-identity-aware and the
aggregate definitions are no longer identical.
The aggregate behavior definition can be viewed as a truly aggregate
characteristic of the service provided to EF packets. For an
analogy, consider a dark reservoir to which all arriving packets are
placed. A scheduler is allowed to pick a packet from the reservoir
in a random order, without any knowledge of the order of packet
Charny, et. al. Informational [Page 7]
^L
RFC 3247 Supplemental Information March 2002
arrivals. The aggregate part of the definition measures the accuracy
of the output rate provided to the EF aggregate as a whole. The
smaller E_a, the more accurate is the assurance that the reservoir is
drained at least at the configured rate.
Note that in this reservoir analogy packets of EF aggregate may be
arbitrarily reordered. However, the definition of EF PHB given in
[6] explicitly requires that no packet reordering occur within a
microflow. This requirement restricts the scheduling
implementations, or, in the reservoir analogy, the order of pulling
packets out of the reservoir to make sure that packets within a
microflow are not reordered, but it still allows reordering at the
aggregate level.
Note that reordering within the aggregate, as long as there is no
flow-level reordering, does not necessarily reflect a "bad" service.
Consider for example a scheduler that arbitrates among 10 different
EF "flows" with diverse rates. A scheduler that is aware of the rate
requirements may choose to send a packet of the faster flow before a
packet of the slower flow to maintain lower jitter at the flow level.
In particular, an ideal "flow"-aware WFQ scheduler will cause
reordering within the aggregate, while maintaining packet ordering
and small jitter at the flow level.
It is intuitively clear that for such a scheduler, as well as for a
simpler FIFO scheduler, the "accuracy" of the service rate is crucial
for minimizing "flow"-level jitter. The packet-identity-aware
definition quantifies this accuracy of the service rate.
However, the small value of E_a does not give any assurances about
the absolute value of per-packet delay. In fact, if the input rate
exceeds the configured rate, the aggregate behavior definition may
result in arbitrarily large delay of a subset of packets. This is
the primary motivation for the packet-identity-aware definition.
The primary goal of the packet-aware characterization of the EF
implementation is that, unlike the aggregate behavior
characterization, it provides a way to find a per-packet delay bound
as a function of input traffic parameters.
While the aggregate behavior definition characterizes the accuracy of
the service rate of the entire EF aggregate, the packet-identity-
aware part of the definition characterizes the deviation of the
device from an ideal server that serves the EF aggregate in FIFO
order at least at the configured rate.
The value of E_p in the packet-identity-aware definition is therefore
affected by two factors: the accuracy of the aggregate rate service
Charny, et. al. Informational [Page 8]
^L
RFC 3247 Supplemental Information March 2002
and the degree of packet reordering within the EF aggregate (under
the constraint that packets within the same microflow are not
reordered). Therefore, a sub-aggregate aware device that provides an
ideal service rate to the aggregate, and also provides an ideal rate
service for each of the sub-aggregates, may nevertheless have a very
large value of E_p (in this case E_p must be at least equal to the
ratio of the maximum packet size divided by the smallest rate of any
sub aggregate). As a result, a large value of E_p does not
necessarily mean that the service provided to EF aggregate is bad -
rather it may be an indication that the service is good, but non-
FIFO. On the other hand, a large value of E_p may also mean that the
aggregate service is very inaccurate (bursty), and hence in this case
the large value of E_p reflects a poor quality of implementation.
As a result, a large number of E_p does not necessarily provide any
guidance on the quality of the EF implementation. However, a small
value of E_p does indicate a high quality FIFO implementation.
Since E_p and E_a relate to different aspects of the EF
implementation, they should be considered together to determine the
quality of the implementation.
3. Per Packet delay
The primary motivation for the packet-identity-aware definition is
that it allows quantification of the per-packet delay bound. This
section discusses the issues with computing per-packet delay.
3.1. Single hop delay bound
If the total traffic arriving to an output port I from all inputs is
constrained by a leaky bucket with parameters (R, B), where R is the
configured rate at I, and B is the bucket depth (burst), then the
delay of any packet departing from I is bounded by D_p, given by
D_p = B/R + E_p (eq_7)
(see appendix B).
Because the delay bound depends on the configured rate R and the
input burstiness B, it is desirable for both of these parameters to
be visible to a user of the device. A PDB desiring a particular
delay bound may need to limit the range of configured rates and
allowed burstiness that it can support in order to deliver such
bound. Equation (eq_7) provides a means for determining an
acceptable operating region for the device with a given E_p. It may
also be useful to limit the total offered load to a given output to
some rate R_1 < R (e.g. to obtain end-to-end delay bounds [5]). It
Charny, et. al. Informational [Page 9]
^L
RFC 3247 Supplemental Information March 2002
is important to realize that, while R_1 may also be a configurable
parameter of the device, the delay bound in (eq_7) does not depend on
it. It may be possible to get better bounds explicitly using the
bound on the input rate, but the bound (eq_7) does not take advantage
of this information.
3.2. Multi-hop worst case delay
Although the PHB defines inherently local behavior, in this section
we briefly discuss the issue of per-packet delay as the packet
traverses several hops implementing EF PHB. Given a delay bound
(eq_7) at a single hop, it is tempting to conclude that per-packet
bound across h hops is simply h times the bound (eq_7). However,
this is not necessarily the case, unless B represents the worst case
input burstiness across all nodes in the network.
Unfortunately, obtaining such a worst case value of B is not trivial.
If EF PHB is implemented using aggregate class-based scheduling where
all EF packets share a single FIFO, the effect of jitter accumulation
may result in an increase in burstiness from hop to hop. In
particular, it can be shown that unless severe restrictions on EF
utilization are imposed, even if all EF flows are ideally shaped at
the ingress, then for any value of delay D it is possible to
construct a network where EF utilization on any link is bounded not
to exceed a given factor, no flow traverses more than a specified
number of hops, but there exists a packet that experiences a delay
more than D [5]. This result implies that the ability to limit the
worst case burstiness and the resulting end-to-end delay across
several hops may require not only limiting EF utilization on all
links, but also constraining the global network topology. Such
topology constraints would need to be specified in the definition of
any PDB built on top of EF PHB, if such PDB requires a strict worst
case delay bound.
4. Packet loss
Any device with finite buffering may need to drop packets if the
input burstiness becomes sufficiently high. To meet the low loss
objective of EF, a node may be characterized by the operating region
in which loss of EF due to congestion will not occur. This may be
specified as a token bucket of rate r <= R and burst size B that can
be offered from all inputs to a given output interface without loss.
However, as discussed in the previous section, the phenomenon of
jitter accumulation makes it generally difficult to guarantee that
the input burstiness never exceeds the specified operating region for
nodes internal to the DiffServ domain. A no-loss guarantee across
multiple hops may require specification of constraints on network
Charny, et. al. Informational [Page 10]
^L
RFC 3247 Supplemental Information March 2002
topology which are outside the scope of inherently local definition
of a PHB. Thus, it must be possible to establish whether a device
conforms to the EF definition even when some packets are lost.
This can be done by performing an "off-line" test of conformance to
equations (eq_1)- (eq_4). After observing a sequence of packets
entering and leaving the node, the packets which did not leave are
assumed lost and are notionally removed from the input stream. The
remaining packets now constitute the arrival stream and the packets
which left the node constitute the departure stream. Conformance to
the equations can thus be verified by considering only those packets
that successfully passed through the node.
Note that specification of which packets are lost in the case when
loss does occur is beyond the scope of the definition of EF PHB.
However, those packets that were not lost must conform to the
equations definition of EF PHB in section 2.1.
5. Implementation considerations
A packet passing through a router will experience delay for a number
of reasons. Two familiar components of this delay are the time the
packet spends in a buffer at an outgoing link waiting for the
scheduler to select it and the time it takes to actually transmit the
packet on the outgoing line.
There may be other components of a packet's delay through a router,
however. A router might have to do some amount of header processing
before the packet can be given to the correct output scheduler, for
example. In another case a router may have a FIFO buffer (called a
transmission queue in [7]) where the packet sits after being selected
by the output scheduler but before it is transmitted. In cases such
as these, the extra delay a packet may experience can be accounted
for by absorbing it into the latency terms E_a and E_p.
Implementing EF on a router with a multi-stage switch fabric requires
special attention. A packet may experience additional delays due to
the fact that it must compete with other traffic for forwarding
resources at multiple contention points in the switch core. The
delay an EF packet may experience before it even reaches the output-
link scheduler should be included in the latency term. Input-
buffered and input/output-buffered routers based on crossbar design
may also require modification of their latency terms. The factors
such as the speedup factor and the choice of crossbar arbitration
algorithms may affect the latency terms substantially.
Charny, et. al. Informational [Page 11]
^L
RFC 3247 Supplemental Information March 2002
Delay in the switch core comes from two sources, both of which must
be considered. The first part of this delay is the fixed delay a
packet experiences regardless of the other traffic. This component
of the delay includes the time it takes for things such as packet
segmentation and reassembly in cell based cores, enqueueing and
dequeuing at each stage, and transmission between stages. The second
part of the switch core delay is variable and depends on the type and
amount of other traffic traversing the core. This delay comes about
if the stages in the core mix traffic flowing between different
input/output port pairs. Thus, EF packets must compete against other
traffic for forwarding resources in the core. Some of this
competing traffic may even be traffic from other, non-EF aggregates.
This introduces extra delay, that can also be absorbed by the latency
term in the definition.
To capture these considerations, in this section we will consider two
simplified implementation examples. The first is an ideal output
buffered node where packets entering the device from an input
interface are immediately delivered to the output scheduler. In this
model the properties of the output scheduler fully define the values
of the parameters E_a and E_p. We will consider the case where the
output scheduler implements aggregate class-based queuing, so that
all EF packets share a single queue. We will discuss the values of
E_a and E_p for a variety of class-based schedulers widely
considered.
The second example will consider a router modeled as a black box with
a known bound on the variable delay a packet can experience from the
time it arrives to an input to the time it is delivered to its
destination output. The output scheduler in isolation is assumed to
be an aggregate scheduler where all EF packets share a single FIFO
queue, with a known value of E_a(S)=E_p(S)=E(S). This model provides
a reasonable abstraction to a large class of router implementations.
5.1. The output buffered model with EF FIFO at the output.
As has been mentioned earlier, in this model E_a = E_p, so we shall
omit the subscript and refer to both terms as latency E. The
remainder of this subsection discusses E for a number of scheduling
implementations.
5.1.1. Strict Non-preemptive Priority Queue
A Strict Priority scheduler in which all EF packets share a single
FIFO queue which is served at strict non-preemptive priority over
other queues satisfies the EF definition with the latency term E =
MTU/C where MTU is the maximum packet size and C is the speed of the
output link.
Charny, et. al. Informational [Page 12]
^L
RFC 3247 Supplemental Information March 2002
5.1.2. WF2Q
Another scheduler that satisfies the EF definition with a small
latency term is WF2Q described in [1]. A class-based WF2Q scheduler,
in which all EF traffic shares a single queue with the weight
corresponding to the configured rate of the EF aggregate satisfies
the EF definition with the latency term E = MTU/C+MTU/R.
5.1.3. Deficit Round Robin (DRR)
For DRR [12], E can be shown to grow linearly with
N*(r_max/r_min)*MTU, where r_min and r_max denote the smallest and
the largest rate among the rate assignments of all queues in the
scheduler, and N is the number of queues in the scheduler.
5.1.4. Start-Time Fair Queuing and Self-Clocked Fair Queuing
For Start-Time Fair Queuing (SFQ) [9] and Self-Clocked Fair Queuing
(SCFQ) [8] E can be shown to grow linearly with the number of queues
in the scheduler.
5.2. Router with Internal Delay and EF FIFO at the output
In this section we consider a router which is modeled as follows. A
packet entering the router may experience a variable delay D_v with a
known upper bound D. That is, 0<=D_v<=D. At the output all EF
packets share a single class queue. Class queues are scheduled by a
scheduler with a known value E_p(S)=E(S) (where E(S) corresponds to
the model where this scheduler is implemented in an ideal output
buffered device).
The computation of E_p is more complicated in this case. For such
device, it can be shown that E_p = E(S)+2D+2B/R (see [13]).
Recall from the discussion of section 3 that bounding input
burstiness B may not be easy in a general topology. In the absence
of the knowledge of a bound on B one can bound E_p as E_p = E(S) +
D*C_inp/R (see [13]).
Note also that the bounds in this section are derived using only the
bound on the variable portion of the interval delay and the error
bound of the output scheduler. If more details about the
architecture of a device are available, it may be possible to compute
better bounds.
Charny, et. al. Informational [Page 13]
^L
RFC 3247 Supplemental Information March 2002
6. Security Considerations
This informational document provides additional information to aid in
understanding of the EF PHB described in [6]. It adds no new
functions to it. As a result, it adds no security issues to those
described in that specification.
7. References
[1] J.C.R. Bennett and H. Zhang, "WF2Q: Worst-case Fair Weighted
Fair Queuing", INFOCOM'96, March 1996.
[2] J.-Y. Le Boudec, P. Thiran, "Network Calculus", Springer Verlag
Lecture Notes in Computer Science volume 2050, June 2001
(available online at http://lcawww.epfl.ch).
[3] Bradner, S., "Key Words for Use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[4] J.C.R. Bennett, K. Benson, A. Charny, W. Courtney, J.Y. Le
Boudec, "Delay Jitter Bounds and Packet Scale Rate Guarantee
for Expedited Forwarding", Proc. Infocom 2001, April 2001.
[5] A. Charny, J.-Y. Le Boudec "Delay Bounds in a Network with
Aggregate Scheduling". Proc. of QoFIS'2000, September 25-26,
2000, Berlin, Germany.
[6] Davie, B., Charny, A., Baker, F., Bennett, J.C.R., Benson, K.,
Boudec, J., Chiu, A., Courtney, W., Davari, S., Firoiu, V.,
Kalmanek, C., Ramakrishnan, K.K. and D. Stiliadis, "An
Expedited Forwarding PHB (Per-Hop Behavior)", RFC 3246, March
2002.
[7] T. Ferrari and P. F. Chimento, "A Measurement-Based Analysis of
Expedited Forwarding PHB Mechanisms," Eighth International
Workshop on Quality of Service, Pittsburgh, PA, June 2000.
[8] S.J. Golestani. "A Self-clocked Fair Queuing Scheme for Broad-
band Applications". In Proceedings of IEEE INFOCOM'94, pages
636-646, Toronto, CA, April 1994.
[9] P. Goyal, H.M. Vin, and H. Chen. "Start-time Fair Queuing: A
Scheduling Algorithm for Integrated Services". In Proceedings
of the ACM-SIGCOMM 96, pages 157-168, Palo Alto, CA, August
1996.
[10] Jacobson, V., Nichols, K. and K. Poduri, "An Expedited
Forwarding PHB", RFC 2598, June 1999.
Charny, et. al. Informational [Page 14]
^L
RFC 3247 Supplemental Information March 2002
[11] Jacobson, V., Nichols, K. and K. Poduri, "The 'Virtual Wire'
Behavior Aggregate", Work in Progress.
[12] M. Shreedhar and G. Varghese. "Efficient Fair Queuing Using
Deficit Round Robin". In Proceedings of SIGCOMM'95, pages
231-243, Boston, MA, September 1995.
[13] Le Boudec, J.-Y., Charny, A. "Packet Scale Rate Guarantee for
non-FIFO Nodes", Infocom 2002, New York, June 2002.
Charny, et. al. Informational [Page 15]
^L
RFC 3247 Supplemental Information March 2002
Appendix A. Difficulties with the RFC 2598 EF PHB Definition
The definition of the EF PHB as given in [10] states:
"The EF PHB is defined as a forwarding treatment for a particular
diffserv aggregate where the departure rate of the aggregate's
packets from any diffserv node must equal or exceed a configurable
rate. The EF traffic SHOULD receive this rate independent of the
intensity of any other traffic attempting to transit the node. It
[the EF PHB departure rate] SHOULD average at least the configured
rate when measured over any time interval equal to or longer than the
time it takes to send an output link MTU sized packet at the
configured rate."
A literal interpretation of the definition would consider the
behaviors given in the next two subsections as non-compliant. The
definition also unnecessarily constrains the maximum configurable
rate of an EF aggregate.
A.1 Perfectly-Clocked Forwarding
Consider the following stream forwarded from a router with EF-
configured rate R=C/2, where C is the output line rate. In the
illustration, E is an MTU-sized EF packet while x is a non-EF packet
or unused capacity, also of size MTU.
E x E x E x E x E x E x...
|-----|
The interval between the vertical bars is 3*MTU/C, which is greater
than MTU/(C/2), and so is subject to the EF PHB definition. During
this interval, 3*MTU/2 bits of the EF aggregate should be forwarded,
but only MTU bits are forwarded. Therefore, while this forwarding
pattern should be considered compliant under any reasonable
interpretation of the EF PHB, it actually does not formally comply
with the definition of RFC 2598.
Note that this forwarding pattern can occur in any work-conserving
scheduler in an ideal output-buffered architecture where EF packets
arrive in a perfectly clocked manner according to the above pattern
and are forwarded according to exactly the same pattern in the
absence of any non-EF traffic.
Trivial as this example may be, it reveals the lack of mathematical
precision in the formal definition. The fact that no work-conserving
scheduler can formally comply with the definition is unfortunate, and
appears to warrant some changes to the definition that would correct
this problem.
Charny, et. al. Informational [Page 16]
^L
RFC 3247 Supplemental Information March 2002
The underlying reason for the problem described here is quite simple
- one can only expect that the EF aggregate is served at configured
rate in some interval where there is enough backlog of EF packets to
sustain that rate. In the example above the packets come in exactly
at the rate at which they are served, and so there is no persistent
backlog. Certainly, if the input rate is even smaller than the
configured rate of the EF aggregate, there will be no backlog as
well, and a similar formal difficulty will occur.
A seemingly simple solution to this difficulty might be to require
that the EF aggregate is served at its configured rate only when the
queue is backlogged. However, as we show in the remainder of this
section, this solution does not suffice.
A.2 Router Internal Delay
We now argue that the example considered in the previous section is
not as trivial as it may seem at first glance.
Consider a router with EF configured rate R = C/2 as in the previous
example, but with an internal delay of 3T (where T = MTU/C) between
the time that a packet arrives at the router and the time that it is
first eligible for forwarding at the output link. Such things as
header processing, route look-up, and delay in switching through a
multi-layer fabric could cause this delay. Now suppose that EF
traffic arrives regularly at a rate of (2/3)R = C/3. The router will
perform as shown below.
EF Packet Number 1 2 3 4 5 6 ...
Arrival (at router) 0 3T 6T 9T 12T 15T ...
Arrival (at scheduler) 3T 6T 9T 12T 15T 18T ...
Departure 4T 7T 10T 13T 16T 19T ...
Again, the output does not satisfy the RFC 2598 definition of EF PHB.
As in the previous example, the underlying reason for this problem is
that the scheduler cannot forward EF traffic faster than it arrives.
However, it can be easily seen that the existence of internal delay
causes one packet to be inside the router at all times. An external
observer will rightfully conclude that the number of EF packets that
arrived to the router is always at least one greater than the number
of EF packets that left the router, and therefore the EF aggregate is
constantly backlogged. However, while the EF aggregate is
continuously backlogged, the observed output rate is nevertheless
strictly less that the configured rate.
Charny, et. al. Informational [Page 17]
^L
RFC 3247 Supplemental Information March 2002
This example indicates that the simple addition of the condition that
EF aggregate must receive its configured rate only when the EF
aggregate is backlogged does not suffice in this case.
Yet, the problem described here is of fundamental importance in
practice. Most routers have a certain amount of internal delay. A
vendor declaring EF compliance is not expected to simultaneously
declare the details of the internals of the router. Therefore, the
existence of internal delay may cause a perfectly reasonable EF
implementation to display seemingly non-conformant behavior, which is
clearly undesirable.
A.3 Maximum Configurable Rate and Provisioning Efficiency
It is well understood that with any non-preemptive scheduler, the
RFC-2598-compliant configurable rate for an EF aggregate cannot
exceed C/2 [11]. This is because an MTU-sized EF packet may arrive
to an empty queue at time t just as an MTU-sized non-EF packet begins
service. The maximum number of EF bits that could be forwarded
during the interval [t, t + 2*MTU/C] is MTU. But if configured rate
R > C/2, then this interval would be of length greater than MTU/R,
and more than MTU EF bits would have to be served during this
interval for the router to be compliant. Thus, R must be no greater
than C/2.
It can be shown that for schedulers other than PQ, such as various
implementations of WFQ, the maximum compliant configured rate may be
much smaller than 50%. For example, for SCFQ [8] the maximum
configured rate cannot exceed C/N, where N is the number of queues in
the scheduler. For WRR, mentioned as compliant in section 2.2 of RFC
2598, this limitation is even more severe. This is because in these
schedulers a packet arriving to an empty EF queue may be forced to
wait until one packet from each other queue (in the case of SCFQ) or
until several packets from each other queue (in the case of WRR) are
served before it will finally be forwarded.
While it is frequently assumed that the configured rate of EF traffic
will be substantially smaller than the link bandwidth, the
requirement that this rate should never exceed 50% of the link
bandwidth appears unnecessarily limiting. For example, in a fully
connected mesh network, where any flow traverses a single link on its
way from source to its destination there seems no compelling reason
to limit the amount of EF traffic to 50% (or an even smaller
percentage for some schedulers) of the link bandwidth.
Another, perhaps even more striking example is the fact that even a
TDM circuit with dedicated slots cannot be configured to forward EF
packets at more than 50% of the link speed without violating RFC 2598
Charny, et. al. Informational [Page 18]
^L
RFC 3247 Supplemental Information March 2002
(unless the entire link is configured for EF). If the configured
rate of EF traffic is greater than 50% (but less than the link
speed), there will always exist an interval longer than MTU/R in
which less than the configured rate is achieved. For example,
suppose the configured rate of the EF aggregate is 2C/3. Then the
forwarding pattern of the TDM circuit might be
E E x E E x E E x ...
|---|
where only one packet is served in the marked interval of length 2T =
2MTU/C. But at least 4/3 MTU would have to be served during this
interval by a router in compliance with the definition in RFC 2598.
The fact that even a TDM line cannot be booked over 50% by EF traffic
indicates that the restriction is artificial and unnecessary.
A.4 The Non-trivial Nature of the Difficulties
One possibility to correct the problems discussed in the previous
sections might be to attempt to clarify the definition of the
intervals to which the definition applied or by averaging over
multiple intervals. However, an attempt to do so meets with
considerable analytical and implementation difficulties. For
example, attempting to align interval start times with some epochs of
the forwarded stream appears to require a certain degree of global
clock synchronization and is fraught with the risk of
misinterpretation and mistake in practice.
Another approach might be to allow averaging of the rates over some
larger time scale. However, it is unclear exactly what finite time
scale would suffice in all reasonable cases. Furthermore, this
approach would compromise the notion of very short-term time scale
guarantees that are the essence of EF PHB.
We also explored a combination of two simple fixes. The first is the
addition of the condition that the only intervals subject to the
definition are those that fall inside a period during which the EF
aggregate is continuously backlogged in the router (i.e., when an EF
packet is in the router). The second is the addition of an error
(latency) term that could serve as a figure-of-merit in the
advertising of EF services.
With the addition of these two changes the candidate definition
becomes as follows:
Charny, et. al. Informational [Page 19]
^L
RFC 3247 Supplemental Information March 2002
In any interval of time (t1, t2) in which EF traffic is continuously
backlogged, at least R(t2 - t1 - E) bits of EF traffic must be
served, where R is the configured rate for the EF aggregate and E is
an implementation-specific latency term.
The "continuously backlogged" condition eliminates the insufficient-
packets-to-forward difficulty while the addition of the latency term
of size MTU/C resolves the perfectly-clocked forwarding example
(section A.1), and also removes the limitation on EF configured rate.
However, neither fix (nor the two of them together) resolves the
example of section A.2. To see this, recall that in the example of
section A.2 the EF aggregate is continuously backlogged, but the
service rate of the EF aggregate is consistently smaller than the
configured rate, and therefore no finite latency term will suffice to
bring the example into conformance.
Appendix B. Alternative Characterization of Packet Scale Rate Guarantee
The proofs of several bounds in this document can be found in [13].
These proofs use an algebraic characterization of the aggregate
definition given by (eq_1), (eq_2), and packet identity aware
definition given by (eq_3), (eq_4). Since this characterization is
of interest on its own, we present it in this section.
Theorem B1. Characterization of the aggregate definition without
f_n.
Consider a system where packets are numbered 1, 2, ... in order of
arrival. As in the aggregate definition, call a_n the n-th arrival
time, d_n - the n-th departure time, and l_n the size of the n-th
packet to depart. Define by convention d_0=0. The aggregate
definition with rate R and latency E_a is equivalent to saying that
for all n and all 0<=j<= n-1:
d_n <= E_a + d_j + (l_(j+1) + ... + l_n)/R (eq_b1)
or
there exists some j+1 <= k <= n such that
d_n <= E_a + a_k + (l_k + ... + l_n)/R (eq_b2)
Charny, et. al. Informational [Page 20]
^L
RFC 3247 Supplemental Information March 2002
Theorem B2. Characterization of packet-identity-aware definition
without F_n.
Consider a system where packets are numbered 1, 2, ... in order of
arrival. As in the packet-identity-aware definition, call A_n, D_n
the arrival and departure times for the n-th packet, and L_n the size
of this packet. Define by convention D_0=0. The packet identity
aware definition with rate R and latency E_p is equivalent to saying
that for all n and all 0<=j<= n-1:
D_n <= E_p + D_j + (L_{j+1} + ... + L_n)/R (eq_b3)
or
there exists some j+1 <= k <= n such that
D_n <= E_p + A_k + (L_k + ... + L_n)/R (eq_b4)
For the proofs of both Theorems, see [13].
Charny, et. al. Informational [Page 21]
^L
RFC 3247 Supplemental Information March 2002
Acknowledgements
This document could not have been written without Fred Baker, Bruce
Davie and Dimitrios Stiliadis. Their time, support and insightful
comments were invaluable.
Authors' Addresses
Anna Charny
Cisco Systems
300 Apollo Drive
Chelmsford, MA 01824
EMail: acharny@cisco.com
Jon Bennett
Motorola
3 Highwood Drive East
Tewksbury, MA 01876
EMail: jcrb@motorola.com
Kent Benson
Tellabs Research Center
3740 Edison Lake Parkway #101
Mishawaka, IN 46545
EMail: Kent.Benson@tellabs.com
Jean-Yves Le Boudec
ICA-EPFL, INN
Ecublens, CH-1015
Lausanne-EPFL, Switzerland
EMail: jean-yves.leboudec@epfl.ch
Angela Chiu
Celion Networks
1 Sheila Drive, Suite 2
Tinton Falls, NJ 07724
EMail: angela.chiu@celion.com
Charny, et. al. Informational [Page 22]
^L
RFC 3247 Supplemental Information March 2002
Bill Courtney
TRW
Bldg. 201/3702
One Space Park
Redondo Beach, CA 90278
EMail: bill.courtney@trw.com
Shahram Davari
PMC-Sierra Inc
411 Legget Drive
Ottawa, ON K2K 3C9, Canada
EMail: shahram_davari@pmc-sierra.com
Victor Firoiu
Nortel Networks
600 Tech Park
Billerica, MA 01821
EMail: vfiroiu@nortelnetworks.com
Charles Kalmanek
AT&T Labs-Research
180 Park Avenue, Room A113,
Florham Park NJ
EMail: crk@research.att.com
K.K. Ramakrishnan
TeraOptic Networks, Inc.
686 W. Maude Ave
Sunnyvale, CA 94086
EMail: kk@teraoptic.com
Charny, et. al. Informational [Page 23]
^L
RFC 3247 Supplemental Information March 2002
Full Copyright Statement
Copyright (C) The Internet Society (2001). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.
Charny, et. al. Informational [Page 24]
^L
|