diff --git a/doc/rfc/rfc8238.txt b/doc/rfc/rfc8238.txt
new file mode 100644
index 0000000..7259ab8
--- /dev/null
+++ b/doc/rfc/rfc8238.txt
@@ -0,0 +1,1123 @@
+Internet Engineering Task Force (IETF) L. Avramov
+Request for Comments: 8238 Google
+Category: Informational J. Rapp
+ISSN: 2070-1721 VMware
+ August 2017
+
+
+ Data Center Benchmarking Terminology
+
+Abstract
+
+ The purposes of this informational document are to establish
+ definitions and describe measurement techniques for data center
+ benchmarking, as well as to introduce new terminology applicable to
+ performance evaluations of data center network equipment. This
+ document establishes the important concepts for benchmarking network
+ switches and routers in the data center and is a prerequisite for the
+ test methodology document (RFC 8239). Many of these terms and
+ methods may be applicable to network equipment beyond the scope of
+ this document as the technologies originally applied in the data
+ center are deployed elsewhere.
+
+Status of This Memo
+
+ This document is not an Internet Standards Track specification; it is
+ published for informational purposes.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Not all documents
+   approved by the IESG are candidates for any level of Internet
+ Standard; see Section 2 of RFC 7841.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc8238.
+
+Copyright Notice
+
+ Copyright (c) 2017 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+   1. Introduction
+      1.1. Requirements Language
+      1.2. Definition Format
+   2. Latency
+      2.1. Definition
+      2.2. Discussion
+      2.3. Measurement Units
+   3. Jitter
+      3.1. Definition
+      3.2. Discussion
+      3.3. Measurement Units
+   4. Calibration of the Physical Layer
+      4.1. Definition
+      4.2. Discussion
+      4.3. Measurement Units
+   5. Line Rate
+      5.1. Definition
+      5.2. Discussion
+      5.3. Measurement Units
+   6. Buffering
+      6.1. Buffer
+         6.1.1. Definition
+         6.1.2. Discussion
+         6.1.3. Measurement Units
+      6.2. Incast
+         6.2.1. Definition
+         6.2.2. Discussion
+         6.2.3. Measurement Units
+   7. Application Throughput: Data Center Goodput
+      7.1. Definition
+      7.2. Discussion
+      7.3. Measurement Units
+   8. Security Considerations
+   9. IANA Considerations
+   10. References
+      10.1. Normative References
+      10.2. Informative References
+   Acknowledgments
+   Authors' Addresses
+
+1. Introduction
+
+ Traffic patterns in the data center are not uniform and are
+ constantly changing. They are dictated by the nature and variety of
+ applications utilized in the data center. They can be largely
+ east-west traffic flows (server to server inside the data center) in
+ one data center and north-south (from the outside of the data center
+ to the server) in another, while some may combine both. Traffic
+ patterns can be bursty in nature and contain many-to-one,
+ many-to-many, or one-to-many flows. Each flow may also be small and
+ latency sensitive or large and throughput sensitive while containing
+ a mix of UDP and TCP traffic. All of these may coexist in a single
+ cluster and flow through a single network device simultaneously.
+ Benchmarking tests for network devices have long used [RFC1242],
+ [RFC2432], [RFC2544], [RFC2889], and [RFC3918]. These benchmarks
+   have largely focused on various latency attributes and the maximum
+   throughput of the Device Under Test (DUT) being benchmarked. These
+   standards are good at measuring theoretical maximum throughput,
+ forwarding rates, and latency under testing conditions, but they do
+ not represent real traffic patterns that may affect these networking
+ devices. The data center networking devices covered are switches and
+ routers.
+
+ Currently, typical data center networking devices are
+ characterized by:
+
+ - High port density (48 ports or more).
+
+   - High speed (currently, up to 100 Gb/s per port).
+
+ - High throughput (line rate on all ports for Layer 2 and/or
+ Layer 3).
+
+ - Low latency (in the microsecond or nanosecond range).
+
+ - Low amount of buffer (in the MB range per networking device).
+
+ - Layer 2 and Layer 3 forwarding capability (Layer 3 not mandatory).
+
+   This document provides a set of definitions, metrics, and new
+ terminology, including congestion scenarios and switch buffer
+ analysis, and redefines basic definitions in order to represent a
+ wide mix of traffic conditions. The test methodologies are defined
+ in [RFC8239].
+
+
+1.1. Requirements Language
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in
+ BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
+ capitals, as shown here.
+
+1.2. Definition Format
+
+ - Term to be defined (e.g., "latency").
+
+ - Definition: The specific definition for the term.
+
+ - Discussion: A brief discussion about the term, its application,
+ and any restrictions on measurement procedures.
+
+ - Measurement Units: Methodology for measurements and units used to
+ report measurements of the term in question, if applicable.
+
+2. Latency
+
+2.1. Definition
+
+ Latency is the amount of time it takes a frame to transit the DUT.
+ Latency is measured in units of time (seconds, milliseconds,
+ microseconds, and so on). The purpose of measuring latency is to
+ understand the impact of adding a device in the communication path.
+
+ The latency interval can be assessed between different combinations
+   of events, regardless of the type of switching device (a
+   bit-forwarding, aka cut-through, device or a store-and-forward
+   device).
+ [RFC1242] defined latency differently for each of these types of
+ devices.
+
+ Traditionally, the latency measurement definitions are:
+
+ - FILO (First In Last Out):
+
+ The time interval starting when the end of the first bit of the
+ input frame reaches the input port and ending when the last bit of
+ the output frame is seen on the output port.
+
+ - FIFO (First In First Out):
+
+ The time interval starting when the end of the first bit of the
+ input frame reaches the input port and ending when the start of
+ the first bit of the output frame is seen on the output port.
+ Latency (as defined in [RFC1242]) for bit-forwarding devices uses
+ these events.
+
+ - LILO (Last In Last Out):
+
+      The time interval starting when the last bit of the input frame
+      reaches the input port and ending when the last bit of the output
+      frame is seen on the output port.
+
+ - LIFO (Last In First Out):
+
+ The time interval starting when the last bit of the input frame
+ reaches the input port and ending when the first bit of the output
+ frame is seen on the output port. Latency (as defined in
+ [RFC1242]) for store-and-forward devices uses these events.
+
+ Another possible way to summarize the four definitions above is to
+ refer to the bit positions as they normally occur: input to output.
+
+ - FILO is FL (First bit Last bit).
+
+ - FIFO is FF (First bit First bit).
+
+ - LILO is LL (Last bit Last bit).
+
+ - LIFO is LF (Last bit First bit).
+
+   The definitions above, given in the context of data center switch
+   benchmarking, are used in lieu of the definition of "latency"
+   provided in RFC 1242, Section 3.8, which is quoted here:
+
+ For store and forward devices: The time interval starting when the
+ last bit of the input frame reaches the input port and ending when
+ the first bit of the output frame is seen on the output port.
+
+ For bit forwarding devices: The time interval starting when the
+ end of the first bit of the input frame reaches the input port and
+ ending when the start of the first bit of the output frame is seen
+ on the output port.
+
+ To accommodate both types of network devices and hybrids of the two
+ types that have emerged, switch latency measurements made according
+ to this document MUST be measured with the FILO events. FILO will
+ include the latency of the switch and the latency of the frame as
+ well as the serialization delay. It is a picture of the "whole"
+ latency going through the DUT. For applications that are latency
+ sensitive and can function with initial bytes of the frame, FIFO
+ (or, for bit-forwarding devices, latency per RFC 1242) MAY be used.
+ In all cases, the event combinations used in latency measurements
+ MUST be reported.
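+
+   As an illustration (not part of the terminology itself), the four
+   event combinations can be computed from four timestamps reported by
+   a traffic generator/analyzer. The following minimal Python sketch
+   assumes a hypothetical tool that reports first-bit and last-bit
+   times on the input and output ports:
+
+      # Hypothetical timestamps (in seconds) for one frame:
+      #   in_first / in_last  : first/last bit on the input port
+      #   out_first / out_last: first/last bit on the output port
+      def latency_measurements(in_first, in_last, out_first, out_last):
+          """Return the four latency definitions from Section 2.1."""
+          return {
+              "FILO": out_last - in_first,   # MUST be used
+              "FIFO": out_first - in_first,  # MAY be used (e.g., FPGA)
+              "LILO": out_last - in_last,
+              "LIFO": out_first - in_last,   # MUST NOT be compared
+          }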
+
+2.2. Discussion
+
+   As mentioned in Section 2.1, FILO is the most important measurement
+   definition.
+
+ Not all DUTs are exclusively cut-through or store-and-forward.
+ Data center DUTs are frequently store-and-forward for smaller packet
+ sizes and then change to cut-through behavior at specific larger
+ packet sizes. The value of the packet size at which the behavior
+ changes MAY be configurable, depending on the DUT manufacturer. FILO
+ covers both scenarios: store-and-forward and cut-through. The
+ threshold for the change in behavior does not matter for
+ benchmarking, since FILO covers both possible scenarios.
+
+ The LIFO mechanism can be used with store-and-forward switches
+ but not with cut-through switches, as it will provide negative
+ latency values for larger packet sizes because LIFO removes the
+ serialization delay. Therefore, this mechanism MUST NOT be used when
+ comparing the latencies of two different DUTs.
+
+2.3. Measurement Units
+
+ The measuring methods to use for benchmarking purposes are as
+ follows:
+
+ 1) FILO MUST be used as a measuring method, as this will include the
+ latency of the packet; today, the application commonly needs to
+ read the whole packet to process the information and take an
+ action.
+
+ 2) FIFO MAY be used for certain applications able to process the data
+ as the first bits arrive -- for example, with a Field-Programmable
+ Gate Array (FPGA).
+
+ 3) LIFO MUST NOT be used because, unlike all the other methods, it
+ subtracts the latency of the packet.
+
+3. Jitter
+
+3.1. Definition
+
+ In the context of the data center, jitter is synonymous with the
+ common term "delay variation". It is derived from multiple
+ measurements of one-way delay, as described in RFC 3393. The
+ mandatory definition of "delay variation" is the Packet Delay
+   Variation (PDV) as defined in Section 4.2 of [RFC5481]. When
+   considering a stream of packets, the minimum delay over all packets
+   in the stream is subtracted from the delay of each packet.
+ This facilitates the assessment of the range of delay variation
+ (Max - Min) or a high percentile of PDV (99th percentile, for
+ robustness against outliers).
+
+ When First-bit to Last-bit timestamps are used for delay measurement,
+ then delay variation MUST be measured using packets or frames of the
+ same size, since the definition of latency includes the serialization
+ time for each packet. Otherwise, if using First-bit to First-bit,
+ the size restriction does not apply.
+
+3.2. Discussion
+
+ In addition to a PDV range and/or a high percentile of PDV,
+ Inter-Packet Delay Variation (IPDV) as defined in Section 4.1 of
+ [RFC5481] (differences between two consecutive packets) MAY be used
+ for the purpose of determining how packet spacing has changed during
+ transfer -- for example, to see if a packet stream has become closely
+ spaced or "bursty". However, the absolute value of IPDV SHOULD NOT
+ be used, as this "collapses" the "bursty" and "dispersed" sides of
+ the IPDV distribution together.
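+
+   For illustration, the PDV and IPDV series described above can be
+   derived from a list of one-way delay samples as in the following
+   Python sketch (the helper names are hypothetical, not defined by
+   [RFC5481]):
+
+      def pdv_series(delays):
+          """PDV (RFC 5481, Sec. 4.2): delay minus stream minimum."""
+          d_min = min(delays)
+          return [d - d_min for d in delays]
+
+      def ipdv_series(delays):
+          """IPDV (RFC 5481, Sec. 4.1): differences between two
+          consecutive packets. The sign is kept; taking the absolute
+          value would collapse the "bursty" and "dispersed" sides."""
+          return [b - a for a, b in zip(delays, delays[1:])]
+
+      # Range of delay variation (Max - Min) and 99th percentile:
+      def pdv_summary(delays):
+          pdv = sorted(pdv_series(delays))
+          p99 = pdv[int(0.99 * (len(pdv) - 1))]  # nearest-rank approx.
+          return {"range": pdv[-1] - pdv[0], "p99": p99}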
+
+3.3. Measurement Units
+
+ The measurement of delay variation is expressed in units of seconds.
+ A PDV histogram MAY be provided for the population of packets
+ measured.
+
+4. Calibration of the Physical Layer
+
+4.1. Definition
+
+ Calibration of the physical layer consists of defining and measuring
+ the latency of the physical devices used to perform tests on the DUT.
+
+ It includes the list of all physical-layer components used, as
+ specified here:
+
+ - Type of device used to generate traffic / measure traffic.
+
+ - Type of line cards used on the traffic generator.
+
+ - Type of transceivers on the traffic generator.
+
+ - Type of transceivers on the DUT.
+
+ - Type of cables.
+
+ - Length of cables.
+
+ - Software name and version of the traffic generator and DUT.
+
+ - A list of enabled features on the DUT MAY be provided and is
+ recommended (especially in the case of control-plane protocols,
+ such as the Link Layer Discovery Protocol and Spanning Tree). A
+ comprehensive configuration file MAY be provided to this effect.
+
+4.2. Discussion
+
+ Calibration of the physical layer contributes to end-to-end latency
+ and should be taken into account when evaluating the DUT. Small
+ variations in the physical components of the test may impact the
+ latency being measured; therefore, they MUST be described when
+ presenting results.
+
+4.3. Measurement Units
+
+ It is RECOMMENDED that all cables used for testing (1) be of the same
+ type and length and (2) come from the same vendor whenever possible.
+ It is a MUST to document the cable specifications listed in
+ Section 4.1, along with the test results. The test report MUST
+ specify whether or not the cable latency has been subtracted from the
+ test measurements. The accuracy of the traffic-generator
+ measurements MUST be provided (for current test equipment, this is
+ usually a value within a range of 20 ns).
+
+5. Line Rate
+
+5.1. Definition
+
+ The transmit timing, or maximum transmitted data rate, is controlled
+ by the "transmit clock" in the DUT. The receive timing (maximum
+ ingress data rate) is derived from the transmit clock of the
+ connected interface.
+
+ The line rate or physical-layer frame rate is the maximum capacity to
+ send frames of a specific size at the transmit clock frequency of
+ the DUT.
+
+ The term "nominal value of line rate" defines the maximum speed
+ capability for the given port -- for example (expressed as Gigabit
+ Ethernet), 1 GE, 10 GE, 40 GE, 100 GE.
+
+ The frequency ("clock rate") of the transmit clock in any two
+ connected interfaces will never be precisely the same; therefore, a
+ tolerance is needed. This will be expressed by a Parts Per Million
+ (PPM) value. The IEEE standards allow a specific +/- variance in the
+ transmit clock rate, and Ethernet is designed to allow for small,
+ normal variations between the two clock rates. This results in a
+ tolerance of the line-rate value when traffic is generated from test
+ equipment to a DUT.
+
+ Line rate SHOULD be measured in frames per second (FPS).
+
+5.2. Discussion
+
+ For a transmit clock source, most Ethernet switches use "clock
+ modules" (also called "oscillator modules") that are sealed,
+ internally temperature-compensated, and very accurate. The output
+ frequency of these modules is not adjustable because it is not
+ necessary. Many test sets, however, offer a software-controlled
+ adjustment of the transmit clock rate. These adjustments SHOULD be
+ used to "compensate" the test equipment in order to not send more
+ than the line rate of the DUT.
+
+ To allow for the minor variations typically found in the clock rate
+ of commercially available clock modules and other crystal-based
+ oscillators, Ethernet standards specify the maximum transmit
+ clock-rate variation to be not more than +/- 100 PPM from a
+ calculated center frequency. Therefore, a DUT must be able to accept
+ frames at a rate within +/- 100 PPM to comply with the standards.
+
+ Very few clock circuits are precisely +/- 0.0 PPM because:
+
+ 1. The Ethernet standards allow a maximum variance of +/- 100 PPM
+ over time. Therefore, it is normal for the frequency of the
+ oscillator circuits to experience variation over time and over a
+ wide temperature range, among other external factors.
+
+ 2. The crystals, or clock modules, usually have a specific +/- PPM
+ variance that is significantly better than +/- 100 PPM.
+ Oftentimes, this is +/- 30 PPM or better in order to be considered
+ a "certification instrument".
+
+ When testing an Ethernet switch throughput at "line rate", any
+ specific switch will have a clock-rate variance. If a test set is
+   running +1 PPM faster than the switch under test and a sustained
+   line-rate test is performed, one can observe a gradual increase in
+   latency and, eventually, packet drops as the switch's buffers fill
+   and overflow. Depending on how much clock variance there is
+ between the two connected systems, the effect may be seen after the
+ traffic stream has been running for a few hundred microseconds, a few
+ milliseconds, or seconds. The same low latency, and no packet loss,
+ can be demonstrated by setting the test set's link occupancy to
+ slightly less than 100 percent link occupancy. Typically, 99 percent
+ link occupancy produces excellent low latency and no packet loss. No
+ Ethernet switch or router will have a transmit clock rate of exactly
+ +/- 0.0 PPM. Very few (if any) test sets have a clock rate that is
+ precisely +/- 0.0 PPM.
+
+ Test-set equipment manufacturers are well aware of the standards and
+ allow a software-controlled +/- 100 PPM "offset" (clock-rate
+ adjustment) to compensate for normal variations in the clock speed of
+ DUTs. This offset adjustment allows engineers to determine the
+ approximate speed at which the connected device is operating and
+ verify that it is within parameters allowed by standards.
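+
+   As a rough, assumption-laden illustration of the overflow behavior
+   described above (not a test procedure), the time for a buffer to
+   fill when the test set runs slightly faster than the DUT can be
+   estimated as follows:
+
+      def time_to_overflow(buffer_bytes, line_rate_bps, ppm_offset):
+          """Seconds until a buffer overflows when the sender runs
+          ppm_offset PPM faster than the DUT at sustained line rate."""
+          excess_bps = line_rate_bps * ppm_offset / 1_000_000
+          return buffer_bytes / (excess_bps / 8)  # bits/s -> bytes/s
+
+      # Example (hypothetical 1 MB buffer on a 10 Gb/s port):
+      time_to_overflow(1_000_000, 10_000_000_000, 1)    # ~800 s
+      time_to_overflow(1_000_000, 10_000_000_000, 100)  # ~8 s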
+
+5.3. Measurement Units
+
+ "Line rate" can be measured in terms of "frame rate":
+
+ Frame Rate = Transmit-Clock-Frequency /
+ (Frame-Length*8 + Minimum_Gap + Preamble + Start-Frame Delimiter)
+
+ Minimum_Gap represents the interframe gap. This formula "scales up"
+   or "scales down" to represent 1 Gigabit Ethernet, 10 Gigabit
+   Ethernet, and so on.
+
+   Example for 1 Gigabit Ethernet speed with 64-byte frames:
+
+ Frame Rate = 1,000,000,000 / (64*8 + 96 + 56 + 8)
+
+ = 1,000,000,000 / 672
+
+ = 1,488,095.2 FPS
+
+ Considering the allowance of +/- 100 PPM, a switch may "legally"
+ transmit traffic at a frame rate between 1,487,946.4 FPS and
+ 1,488,244 FPS. Each 1 PPM variation in clock rate will translate to
+ a frame-rate increase or decrease of 1.488 FPS.
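+
+   The frame-rate formula and the tolerance band above can be
+   reproduced with a short calculation; for example, in Python:
+
+      def frame_rate(clock_hz, frame_len_bytes, min_gap_bits=96,
+                     preamble_bits=56, sfd_bits=8):
+          """Frames per second, per the formula in Section 5.3."""
+          return clock_hz / (frame_len_bytes * 8 + min_gap_bits
+                             + preamble_bits + sfd_bits)
+
+      nominal = frame_rate(1_000_000_000, 64)  # 1,488,095.2 FPS
+      tolerance = nominal * 100 / 1_000_000    # 100 PPM -> 148.8 FPS
+      # nominal - tolerance = 1,487,946.4 FPS
+      # nominal + tolerance = 1,488,244.0 FPS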
+
+ In a production network, it is very unlikely that one would see
+ precise line rate over a very brief period. There is no observable
+ difference between dropping packets at 99% of line rate and 100% of
+ line rate.
+
+ Line rate can be measured at 100% of line rate with a -100 PPM
+ adjustment.
+
+ Line rate SHOULD be measured at 99.98% with a 0 PPM adjustment.
+
+ The PPM adjustment SHOULD only be used for a line-rate measurement.
+
+6. Buffering
+
+6.1. Buffer
+
+6.1.1. Definition
+
+ Buffer Size: The term "buffer size" represents the total amount of
+ frame-buffering memory available on a DUT. This size is expressed
+ in B (bytes), KB (kilobytes), MB (megabytes), or GB (gigabytes).
+ When the buffer size is expressed, an indication of the frame MTU
+ (Maximum Transmission Unit) used for that measurement is also
+ necessary, as well as the CoS (Class of Service) or DSCP
+ (Differentiated Services Code Point) value set, as oftentimes the
+ buffers are carved by a quality-of-service implementation. Please
+ refer to Section 3 of [RFC8239] for further details.
+
+ Example: The Buffer Size of the DUT when sending 1518-byte frames
+ is 18 MB.
+
+ Port Buffer Size: The port buffer size is the amount of buffer for
+ a single ingress port, a single egress port, or a combination of
+ ingress and egress buffering locations for a single port. We
+ mention the three locations for the port buffer because the DUT's
+ buffering scheme can be unknown or untested, so knowing the buffer
+ location helps clarify the buffer architecture and, consequently,
+ the total buffer size. The Port Buffer Size is an informational
+ value that MAY be provided by the DUT vendor. It is not a value
+ that is tested by benchmarking. Benchmarking will be done using
+ the Maximum Port Buffer Size or Maximum Buffer Size methodology.
+
+ Maximum Port Buffer Size: In most cases, this is the same as the Port
+ Buffer Size. In a certain type of switch architecture called
+ "SoC" (switch on chip), there is a port buffer and a shared buffer
+ pool available for all ports. The Maximum Port Buffer Size, in
+ terms of an SoC buffer, represents the sum of the port buffer and
+ the maximum value of shared buffer allowed for this port, defined
+ in terms of B (bytes), KB (kilobytes), MB (megabytes), or GB
+ (gigabytes). The Maximum Port Buffer Size needs to be expressed
+ along with the frame MTU used for the measurement and the CoS or
+ DSCP bit value set for the test.
+
+ Example: A DUT has been measured to have 3 KB of port buffer for
+ 1518-byte frames, and a total of 4.7 MB of maximum port buffer for
+ 1518-byte frames and a CoS of 0.
+
+ Maximum DUT Buffer Size: This is the total buffer size that a DUT can
+      be measured to have. It is most likely different from the Maximum
+      Port Buffer Size. It can also be different from the sum of the
+      Maximum Port Buffer Sizes of all ports. The Maximum Buffer Size
+      needs to be
+ expressed along with the frame MTU used for the measurement and
+ along with the CoS or DSCP value set during the test.
+
+ Example: A DUT has been measured to have 3 KB of port buffer for
+ 1518-byte frames and a total of 4.7 MB of maximum port buffer for
+ 1518-byte frames. The DUT has a Maximum Buffer Size of 18 MB at
+ 1500 B and a CoS of 0.
+
+ Burst: A burst is a fixed number of packets sent over a percentage of
+      line rate for a defined port speed. The number of frames sent is
+      evenly distributed across a time interval T. A constant, C, can be
+ defined to provide the average time between two evenly spaced
+ consecutive packets.
+
+   Microburst: A microburst is a type of burst where packet drops occur
+      even though there is no sustained or noticeable congestion on a
+      link or device. One characteristic of a microburst is that the
+      burst is not evenly distributed over T and that the spacing
+      between some consecutive packets is less than the constant C
+      (C = the average time between two evenly spaced consecutive
+      packets).
+
+   Intensity of Microburst: This is a percentage and represents the
+      level, between 1 and 100%, of the microburst. The higher the
+      number, the more intense the microburst is.
+
+      I = [1 - [(Tp2-Tp1) + (Tp3-Tp2) + ... + (TpN-Tp(N-1))] /
+          Sum(packets)] * 100
+
+      where Tp(i) is the arrival time of packet i.
+
+ The above definitions are not meant to comment on the ideal sizing of
+ a buffer but rather on how to measure it. A larger buffer is not
+ necessarily better and can cause issues with bufferbloat.
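+
+   To make the intensity formula concrete, the following Python sketch
+   computes I from arrival timestamps. It assumes (this is one reading
+   of the formula, not stated in the text) that Sum(packets) denotes
+   the duration the burst would occupy if evenly spaced, i.e.,
+   (N-1) * C, so that an evenly spaced burst yields I = 0 and a fully
+   compressed burst approaches I = 100:
+
+      def microburst_intensity(timestamps, c):
+          """Intensity I of a microburst, per Section 6.1.1.
+          timestamps: arrival times Tp1..TpN (seconds).
+          c: average time between two evenly spaced packets.
+          ASSUMPTION: Sum(packets) is read as (N - 1) * c."""
+          gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
+          return (1 - sum(gaps) / ((len(timestamps) - 1) * c)) * 100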
+
+6.1.2. Discussion
+
+ When measuring buffering on a DUT, it is important to understand the
+ behavior of each and every port. This provides data for the total
+   amount of buffering available on the switch. The notion of buffer
+   efficiency helps one understand the optimum packet size for the
+   buffer, as well as the real volume of the buffer available for a
+   specific packet
+ size. This section does not discuss how to conduct the test
+ methodology; instead, it explains the buffer definitions and what
+ metrics should be provided for comprehensive data center
+ device-buffering benchmarking.
+
+6.1.3. Measurement Units
+
+ When the DUT buffer is measured:
+
+ - The buffer size MUST be measured.
+
+ - The port buffer size MAY be provided for each port.
+
+ - The maximum port buffer size MUST be measured.
+
+ - The maximum DUT buffer size MUST be measured.
+
+ - The intensity of the microburst MAY be mentioned when a microburst
+ test is performed.
+
+ - The CoS or DSCP value set during the test SHOULD be provided.
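+
+   A report capturing the required and optional values above might be
+   structured as follows (an illustrative Python sketch; this document
+   does not define a reporting format):
+
+      from dataclasses import dataclass
+      from typing import Dict, Optional
+
+      @dataclass
+      class BufferReport:
+          frame_mtu_bytes: int              # context for all sizes
+          cos_or_dscp: str                  # SHOULD be provided
+          buffer_size_bytes: int            # MUST be measured
+          max_port_buffer_bytes: int        # MUST be measured
+          max_dut_buffer_bytes: int         # MUST be measured
+          port_buffers_bytes: Optional[Dict[str, int]] = None  # MAY
+          microburst_intensity_pct: Optional[float] = None     # MAY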
+
+6.2. Incast
+
+6.2.1. Definition
+
+ The term "Incast", very commonly utilized in the data center, refers
+ to the many-to-one or many-to-many traffic patterns. As defined in
+ this section, it measures the number of ingress and egress ports and
+ the percentage of synchronization attributed to them. Typically, in
+ the data center, it would refer to many different ingress server
+ ports (many), sending traffic to a common uplink (many-to-one), or
+ multiple uplinks (many-to-many). This pattern is generalized for any
+ network as many incoming ports sending traffic to one or a few
+ uplinks.
+
+ Synchronous arrival time: When two or more frames of sizes L1 and L2
+ arrive at their respective ingress port or multiple ingress ports
+ and there is an overlap of arrival times for any of the bits on
+ the DUT, then the L1 and L2 frames have synchronous arrival times.
+ This is called "Incast", regardless of whether the pattern is
+ many-to-one (simpler) or many-to-many.
+
+ Asynchronous arrival time: This is any condition not defined by
+ "synchronous arrival time".
+
+   Percentage of synchronization: This defines the level of overlap
+      (number of bits) between frames of sizes L1, L2 ... Ln (see the
+      sketch after this list).
+
+ Example: Two 64-byte frames of length L1 and L2 arrive at ingress
+ port 1 and port 2 of the DUT. There is an overlap of 6.4 bytes
+ between the two, where the L1 and L2 frames were on their
+ respective ingress ports at the same time. Therefore, the
+ percentage of synchronization is 10%.
+
+ Stateful traffic: Stateful traffic is packets exchanged with a
+ stateful protocol, such as TCP.
+
+ Stateless traffic: Stateless traffic is packets exchanged with a
+ stateless protocol, such as UDP.
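+
+   As an illustrative sketch (not a procedure defined by this
+   document), the percentage of synchronization between two frames can
+   be computed from the first-bit/last-bit times on their respective
+   ingress ports; expressing the overlap relative to the shorter
+   frame's time on the wire is an assumption that matches the
+   equal-size example above:
+
+      def synchronization_percentage(frame_a, frame_b):
+          """Each frame is (first_bit_time, last_bit_time) as seen on
+          its ingress port, in seconds.
+          ASSUMPTION: overlap is normalized to the shorter frame."""
+          overlap = max(0.0, min(frame_a[1], frame_b[1])
+                        - max(frame_a[0], frame_b[0]))
+          shortest = min(frame_a[1] - frame_a[0],
+                         frame_b[1] - frame_b[0])
+          return 100 * overlap / shortest  # 0 => asynchronous arrival
+
+      # Section 6.2.1 example: two 64-byte frames overlapping for the
+      # duration of 6.4 bytes on the wire -> 10% synchronization.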
+
+6.2.2. Discussion
+
+ In this scenario, buffers are used on the DUT. In an ingress
+ buffering mechanism, the ingress port buffers would be used along
+ with virtual output queues, when available, whereas in an egress
+ buffering mechanism, the egress buffer of the one outgoing port would
+ be used.
+
+ In either case, regardless of where the buffer memory is located in
+ the switch architecture, the Incast creates buffer utilization.
+
+   When two or more frames have synchronous arrival times at the DUT,
+ they are considered to be forming an Incast.
+
+6.2.3. Measurement Units
+
+ It is a MUST to measure the number of ingress and egress ports.
+
+ It is a MUST to have a non-null percentage of synchronization, which
+ MUST be specified.
+
+7. Application Throughput: Data Center Goodput
+
+7.1. Definition
+
+ In data center networking, a balanced network is a function of
+ maximal throughput and minimal loss at any given time. This is
+ captured by the Goodput [TCP-INCAST]. Goodput is the
+ application-level throughput. For standard TCP applications, a very
+ small loss can have a dramatic effect on application throughput.
+ [RFC2647] provides a definition of Goodput; the definition in this
+ document is a variant of that definition.
+
+ Goodput is the number of bits per unit of time forwarded to the
+ correct destination interface of the DUT, minus any bits
+ retransmitted.
+
+7.2. Discussion
+
+ In data center benchmarking, the goodput is a value that SHOULD be
+ measured. It provides a realistic idea of the usage of the available
+ bandwidth. A goal in data center environments is to maximize the
+ goodput while minimizing loss.
+
+7.3. Measurement Units
+
+   The Goodput, G, is then measured by the following formula:
+
+      G = (S/F) x V
+
+ - S represents the payload bytes, not including packet or
+ TCP headers.
+
+ - F is the frame size.
+
+   - V is the speed of the media; G is expressed in the same units as
+     V (e.g., bits per second, as in the example below).
+
+   Example: A TCP file transfer over HTTP on 10 Gb/s media.
+
+ The file cannot be transferred over Ethernet as a single
+ continuous stream. It must be broken down into individual frames
+ of 1500 B when the standard MTU is used. Each packet requires
+ 20 B of IP header information and 20 B of TCP header information;
+ therefore, 1460 B are available per packet for the file transfer.
+ Linux-based systems are further limited to 1448 B, as they also
+      carry a 12 B timestamp. Finally, in this example the data is
+      transmitted over Ethernet, which adds 26 B of overhead per packet
+ to 1500 B, increasing it to 1526 B.
+
+ G = 1460/1526 x 10 Gbit/s, which is 9.567 Gbit/s or 1.196 GB/s.
+
+ Please note: This example does not take into consideration the
+ additional Ethernet overhead, such as the interframe gap (a
+ minimum of 96 bit times), nor does it account for collisions
+ (which have a variable impact, depending on the network load).
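+
+   The worked example above reduces to a one-line calculation; a
+   minimal Python sketch:
+
+      def goodput_bps(payload_bytes, frame_bytes, media_speed_bps):
+          """G = (S/F) x V, per Section 7.3."""
+          return payload_bytes / frame_bytes * media_speed_bps
+
+      g = goodput_bps(1460, 1526, 10_000_000_000)
+      # g ~= 9.567e9 bit/s, i.e., 9.567 Gbit/s or 1.196 GB/s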
+
+ When conducting Goodput measurements, please document, in addition to
+ the items listed in Section 4.1, the following information:
+
+ - The TCP stack used.
+
+ - OS versions.
+
+ - Network Interface Card (NIC) firmware version and model.
+
+ For example, Windows TCP stacks and different Linux versions can
+ influence TCP-based test results.
+
+8. Security Considerations
+
+ Benchmarking activities as described in this memo are limited to
+ technology characterization using controlled stimuli in a laboratory
+ environment, with dedicated address space and the constraints
+ specified in the sections above.
+
+ The benchmarking network topology will be an independent test setup
+ and MUST NOT be connected to devices that may forward the test
+ traffic into a production network or misroute traffic to the test
+ management network.
+
+ Further, benchmarking is performed on a "black-box" basis, relying
+ solely on measurements observable external to the DUT.
+
+ Special capabilities SHOULD NOT exist in the DUT specifically for
+ benchmarking purposes. Any implications for network security arising
+ from the DUT SHOULD be identical in the lab and in production
+ networks.
+
+9. IANA Considerations
+
+ This document does not require any IANA actions.
+
+10. References
+
+10.1. Normative References
+
+ [RFC1242] Bradner, S., "Benchmarking Terminology for Network
+ Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242,
+ July 1991, <https://www.rfc-editor.org/info/rfc1242>.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119,
+ DOI 10.17487/RFC2119, March 1997,
+ <https://www.rfc-editor.org/info/rfc2119>.
+
+ [RFC2544] Bradner, S. and J. McQuaid, "Benchmarking Methodology for
+ Network Interconnect Devices", RFC 2544,
+ DOI 10.17487/RFC2544, March 1999,
+ <https://www.rfc-editor.org/info/rfc2544>.
+
+ [RFC5481] Morton, A. and B. Claise, "Packet Delay Variation
+ Applicability Statement", RFC 5481, DOI 10.17487/RFC5481,
+ March 2009, <https://www.rfc-editor.org/info/rfc5481>.
+
+ [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in
+ RFC 2119 Key Words", BCP 14, RFC 8174,
+ DOI 10.17487/RFC8174, May 2017,
+ <https://www.rfc-editor.org/info/rfc8174>.
+
+ [RFC8239] Avramov, L. and J. Rapp, "Data Center Benchmarking
+ Methodology", RFC 8239, DOI 10.17487/RFC8239, August 2017,
+ <https://www.rfc-editor.org/info/rfc8239>.
+
+10.2. Informative References
+
+ [RFC2432] Dubray, K., "Terminology for IP Multicast Benchmarking",
+ RFC 2432, DOI 10.17487/RFC2432, October 1998,
+ <https://www.rfc-editor.org/info/rfc2432>.
+
+ [RFC2647] Newman, D., "Benchmarking Terminology for Firewall
+ Performance", RFC 2647, DOI 10.17487/RFC2647, August 1999,
+ <https://www.rfc-editor.org/info/rfc2647>.
+
+ [RFC2889] Mandeville, R. and J. Perser, "Benchmarking Methodology
+ for LAN Switching Devices", RFC 2889,
+ DOI 10.17487/RFC2889, August 2000,
+ <https://www.rfc-editor.org/info/rfc2889>.
+
+ [RFC3918] Stopp, D. and B. Hickman, "Methodology for IP Multicast
+ Benchmarking", RFC 3918, DOI 10.17487/RFC3918,
+ October 2004, <https://www.rfc-editor.org/info/rfc3918>.
+
+ [TCP-INCAST]
+ Chen, Y., Griffith, R., Zats, D., Joseph, A., and R. Katz,
+ "Understanding TCP Incast and Its Implications for Big
+ Data Workloads", April 2012, <http://yanpeichen.com/
+ professional/usenixLoginIncastReady.pdf>.
+
+Acknowledgments
+
+ The authors would like to thank Al Morton, Scott Bradner, Ian Cox,
+ and Tim Stevenson for their reviews and feedback.
+
+Authors' Addresses
+
+ Lucien Avramov
+ Google
+ 1600 Amphitheatre Parkway
+ Mountain View, CA 94043
+ United States of America
+
+ Email: lucien.avramov@gmail.com
+
+
+ Jacob Rapp
+ VMware
+ 3401 Hillview Ave.
+ Palo Alto, CA 94304
+ United States of America
+
+ Email: jhrapp@gmail.com