author    Thomas Voss <mail@thomasvoss.com>    2024-11-27 20:54:24 +0100
committer Thomas Voss <mail@thomasvoss.com>    2024-11-27 20:54:24 +0100
commit    4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
tree      e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc6576.txt
parent    ea76e11061bda059ae9f9ad130a9895cc85607db (diff)
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc6576.txt')
-rw-r--r--  doc/rfc/rfc6576.txt  2075
1 files changed, 2075 insertions, 0 deletions
diff --git a/doc/rfc/rfc6576.txt b/doc/rfc/rfc6576.txt
new file mode 100644
index 0000000..3bd939f
--- /dev/null
+++ b/doc/rfc/rfc6576.txt
@@ -0,0 +1,2075 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF) R. Geib, Ed.
+Request for Comments: 6576 Deutsche Telekom
+BCP: 176 A. Morton
+Category: Best Current Practice AT&T Labs
+ISSN: 2070-1721 R. Fardid
+ Cariden Technologies
+ A. Steinmitz
+ Deutsche Telekom
+ March 2012
+
+
+ IP Performance Metrics (IPPM) Standard Advancement Testing
+
+Abstract
+
+ This document specifies tests to determine if multiple independent
+ instantiations of a performance-metric RFC have implemented the
+ specifications in the same way. This is the performance-metric
+ equivalent of interoperability, required to advance RFCs along the
+ Standards Track. Results from different implementations of metric
+ RFCs will be collected under the same underlying network conditions
+ and compared using statistical methods. The goal is an evaluation of
+ the metric RFC itself to determine whether its definitions are clear
+ and unambiguous to implementors and therefore a candidate for
+ advancement on the IETF Standards Track. This document is an
+ Internet Best Current Practice.
+
+Status of This Memo
+
+ This memo documents an Internet Best Current Practice.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ BCPs is available in Section 2 of RFC 5741.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ http://www.rfc-editor.org/info/rfc6576.
+
+
+
+
+
+
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 1]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+Copyright Notice
+
+ Copyright (c) 2012 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (http://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Simplified BSD License text as described in Section 4.e of
+ the Trust Legal Provisions and are provided without warranty as
+ described in the Simplified BSD License.
+
+Table of Contents
+
+ 1. Introduction ....................................................3
+ 1.1. Requirements Language ......................................5
+ 2. Basic Idea ......................................................5
+ 3. Verification of Conformance to a Metric Specification ...........7
+ 3.1. Tests of an Individual Implementation against a Metric
+ Specification ..............................................8
+ 3.2. Test Setup Resulting in Identical Live Network
+ Testing Conditions .........................................9
+ 3.3. Tests of Two or More Different Implementations
+ against a Metric Specification ............................15
+ 3.4. Clock Synchronization .....................................16
+ 3.5. Recommended Metric Verification Measurement Process .......17
+ 3.6. Proposal to Determine an Equivalence Threshold for
+ Each Metric Evaluated .....................................20
+ 4. Acknowledgements ...............................................21
+ 5. Contributors ...................................................21
+ 6. Security Considerations ........................................21
+ 7. References .....................................................21
+ 7.1. Normative References ......................................21
+ 7.2. Informative References ....................................23
+ Appendix A. An Example on a One-Way Delay Metric Validation ......24
+ A.1. Compliance to Metric Specification Requirements ...........24
+ A.2. Examples Related to Statistical Tests for One-Way Delay ...25
+ Appendix B. Anderson-Darling K-sample Reference and 2 Sample
+ C++ Code .............................................27
+ Appendix C. Glossary .............................................36
+
+
+
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 2]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+1. Introduction
+
+ The Internet Standards Process as updated by RFC 6410 [RFC6410]
+ specifies that widespread deployment and use is sufficient to show
+ interoperability as a condition for advancement to Internet Standard.
+ The previous requirement of interoperability tests prior to advancing
+ an RFC to the Standard maturity level specified in RFC 2026 [RFC2026]
+ and RFC 5657 [RFC5657] has been removed. While the modified
+ requirement is applicable to protocols, wide deployment of different
+ measurement systems does not prove that the implementations measure
+ metrics in a standard way. Section 5.3 of RFC 5657 [RFC5657]
+ explicitly mentions the special case of Standards that are not "on-
+ the-wire" protocols. While this special case is not explicitly
+ mentioned by RFC 6410 [RFC6410], the four criteria in Section 2.2 of
+ RFC 6410 [RFC6410] are augmented by this document for RFCs that
+ specify performance metrics. This document takes the position that
+ flexible metric definitions can be proven to be clear and unambiguous
+ through tests that compare the results from independent
+ implementations. It describes tests that infer whether metric
+ specifications are sufficient using a definition of metric
+ "interoperability": measuring equivalent results (in a statistical
+ sense) under the same network conditions. The document expands on
+ this problem and its solution.
+
+ In the case of a protocol specification, the notion of
+ "interoperability" is reasonably intuitive -- the implementations
+ must successfully "talk to each other", while exercising all features
+ and options. To achieve interoperability, two implementors need to
+ interpret the protocol specifications in equivalent ways. In the
+ case of IP Performance Metrics (IPPM), this definition of
+ interoperability is only useful for test and control protocols like
+ the One-Way Active Measurement Protocol (OWAMP) [RFC4656] and the
+ Two-Way Active Measurement Protocol (TWAMP) [RFC5357].
+
+ A metric specification RFC describes one or more metric definitions,
+ methods of measurement, and a way to report the results of
+ measurement. One example would be a way to test and report the one-
+ way delay that data packets incur while being sent from one network
+ location to another, using the One-Way Delay Metric.
+
+ In the case of metric specifications, the conditions that satisfy the
+ "interoperability" requirement are less obvious, and there is a need
+ for IETF agreement on practices to judge metric specification
+ "interoperability" in the context of the IETF Standards Process.
+ This memo provides methods that should be suitable to evaluate metric
+ specifications for Standards Track advancement. The methods proposed
+ here MAY be generally applicable to metric specification RFCs beyond
+ those developed under the IPPM Framework [RFC2330].
+
+
+
+Geib, et al. Best Current Practice [Page 3]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ Since many implementations of IP metrics are embedded in measurement
+ systems that do not interact with one another (they were built before
+ OWAMP and TWAMP), the interoperability evaluation called for in the
+ IETF Standards Process cannot be determined by observing that
+ independent implementations interact properly for various protocol
+ exchanges. Instead, verifying that different implementations give
+ statistically equivalent results under controlled measurement
+ conditions takes the place of interoperability observations. Even
+ when evaluating OWAMP and TWAMP RFCs for Standards Track advancement,
+ the methods described here are useful to evaluate the measurement
+ results because their validity would not be ascertained in protocol
+ interoperability testing.
+
+ The Standards advancement process aims at producing confidence that
+ the metric definitions and supporting material are clearly worded and
+ unambiguous, or reveals ways in which the metric definitions can be
+ revised to achieve clarity. The process also permits identification
+ of options that were not implemented, so that they can be removed
+ from the advancing specification. Thus, the product of this process
+ is information about the metric specification RFC itself:
+ determination of the specifications or definitions that are clear and
+ unambiguous and those that are not (as opposed to an evaluation of
+ the implementations that assist in the process).
+
+ This document defines a process to verify that implementations (or
+ practically, measurement systems) have interpreted the metric
+ specifications in equivalent ways and produce equivalent results.
+
+ Testing for statistical equivalence requires ensuring identical test
+ setups (or awareness of differences) to the best possible extent.
+ Thus, producing identical test conditions is a core goal of this
+ memo. Another important aspect of this process is to test individual
+ implementations against specific requirements in the metric
+ specifications using customized tests for each requirement. These
+ tests can distinguish equivalent interpretations of each specific
+ requirement.
+
+ Conclusions on equivalence are reached by two measures.
+
+ First, implementations are compared against individual metric
+ specifications to make sure that differences in implementation are
+ minimized or at least known.
+
+ Second, a test setup is proposed ensuring identical networking
+ conditions so that unknowns are minimized and comparisons are
+ simplified. The resulting separate data sets may be seen as samples
+ taken from the same underlying distribution. Using statistical
+ methods, the equivalence of the results is verified. To illustrate
+
+
+
+Geib, et al. Best Current Practice [Page 4]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ application of the process and methods defined here, evaluation of
+ the One-Way Delay Metric [RFC2679] is provided in Appendix A. While
+ test setups will vary with the metrics to be validated, the general
+ methodology of determining equivalent results will not. Documents
+ defining test setups to evaluate other metrics should be developed
+ once the process proposed here has been agreed and approved.
+
+ The metric RFC advancement process begins with a request for protocol
+ action accompanied by a memo that documents the supporting tests and
+ results. The procedures of [RFC2026] are expanded in [RFC5657],
+ including sample implementation and interoperability reports.
+ [TESTPLAN] can serve as a template for a metric RFC report that
+ accompanies the protocol action request to the Area Director,
+ including a description of the test setup, procedures, results for
+ each implementation, and conclusions.
+
+1.1. Requirements Language
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [RFC2119].
+
+2. Basic Idea
+
+   The implementation of a standards-compliant metric is expected to meet
+ the requirements of the related metric specification. So, before
+ comparing two metric implementations, each metric implementation is
+ individually compared against the metric specification.
+
+ Most metric specifications leave freedom to implementors on non-
+ fundamental aspects of an individual metric (or options). Comparing
+ different measurement results using a statistical test with the
+ assumption of identical test path and testing conditions requires
+ knowledge of all differences in the overall test setup. Metric
+ specification options chosen by implementors have to be documented.
+ It is RECOMMENDED to use identical metric options for any test
+ proposed here (an exception would be if a variable parameter of the
+ metric definition is not configurable in one or more
+ implementations). Calibrations specified by metric standards SHOULD
+ be performed to further identify (and possibly reduce) potential
+ sources of error in the test setup.
+
+ The IPPM Framework [RFC2330] expects that a "methodology for a metric
+ should have the property that it is repeatable: if the methodology is
+ used multiple times under identical conditions, it should result in
+ consistent measurements". This means an implementation is expected
+ to repeatedly measure a metric with consistent results (repeatability
+ with the same result). Small deviations in the test setup are
+
+
+
+Geib, et al. Best Current Practice [Page 5]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ expected to lead to small deviations in results only. To
+ characterize statistical equivalence in the case of small deviations,
+   [RFC2330] and [RFC2679] suggest applying a 95% confidence interval.
+ Quoting RFC 2679, "95 percent was chosen because ... a particular
+ confidence level should be specified so that the results of
+ independent implementations can be compared".
+
+ Two different implementations are expected to produce statistically
+ equivalent results if they both measure a metric under the same
+   networking conditions. Formulated in statistical terms: separate
+ metric implementations collect separate samples from the same
+ underlying statistical process (the same network conditions). The
+ statistical hypothesis to be tested is the expectation that both
+ samples do not expose statistically different properties. This
+ requires careful test design:
+
+ o The measurement test setup must be self-consistent to the largest
+ possible extent. To minimize the influence of the test and
+ measurement setup on the result, network conditions and paths MUST
+ be identical for the compared implementations to the largest
+ possible degree. This includes both the stability and non-
+ ambiguity of routes taken by the measurement packets. See
+ [RFC2330] for a discussion on self-consistency.
+
+ o To minimize the influence of implementation options on the result,
+ metric implementations SHOULD use identical options and parameters
+ for the metric under evaluation.
+
+ o The sample size must be large enough to minimize its influence on
+ the consistency of the test results. This consideration may be
+ especially important if two implementations measure with different
+ average packet transmission rates.
+
+ o The implementation with the lowest average packet transmission
+ rate determines the smallest temporal interval for which samples
+ can be compared.
+
+ o Repeat comparisons with several independent metric samples to
+ avoid random indications of compatibility (or the lack of it).
+
+ The metric specifications themselves are the primary focus of
+ evaluation, rather than the implementations of metrics. The
+ documentation produced by the advancement process should identify
+ which metric definitions and supporting material were found to be
+ clearly worded and unambiguous, OR it should identify ways in which
+ the metric specification text should be revised to achieve clarity
+ and unified interpretation.
+
+
+
+
+Geib, et al. Best Current Practice [Page 6]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ The process should also permit identification of options that were
+ not implemented, so that they can be removed from the advancing
+ specification (this is an aspect more typical of protocol advancement
+ along the Standards Track).
+
+ Note that this document does not propose to base interoperability
+ indications of performance-metric implementations on comparisons of
+ individual singletons. Individual singletons may be impacted by many
+ statistical effects while they are measured. Comparing two
+ singletons of different implementations may result in failures with
+ higher probability than comparing samples.
+
+3. Verification of Conformance to a Metric Specification
+
+ This section specifies how to verify compliance of two or more IPPM
+ implementations against a metric specification. This document only
+ proposes a general methodology. Compliance criteria to a specific
+ metric implementation need to be defined for each individual metric
+ specification. The only exception is the statistical test comparing
+ two metric implementations that are simultaneously tested. This test
+ is applicable without metric-specific decision criteria.
+
+ Several testing options exist to compare two or more implementations:
+
+ o Use a single test lab to compare the implementations and emulate
+ the Internet with an impairment generator.
+
+ o Use a single test lab to compare the implementations and measure
+ across the Internet.
+
+ o Use remotely separated test labs to compare the implementations
+ and emulate the Internet with two "identically" configured
+ impairment generators.
+
+ o Use remotely separated test labs to compare the implementations
+ and measure across the Internet.
+
+ o Use remotely separated test labs to compare the implementations,
+ measure across the Internet, and include a single impairment
+ generator to impact all measurement flows in a non-discriminatory
+ way.
+
+ The first two approaches work, but involve higher expenses than the
+ others (due to travel and/or shipping plus installation). For the
+ third option, ensuring two identically configured impairment
+ generators requires well-defined test cases and possibly identical
+ hardware and software.
+
+
+
+
+Geib, et al. Best Current Practice [Page 7]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ As documented in a test report [TESTPLAN], the last option was
+ required to prove compatibility of two delay metric implementations.
+ An impairment generator is probably required when testing
+ compatibility of most other metrics, and it is therefore RECOMMENDED
+ to include an impairment generator in metric test setups.
+
+3.1. Tests of an Individual Implementation against a Metric
+ Specification
+
+ A metric implementation is compliant with a metric specification if
+ it supports the requirements classified as "MUST" and "REQUIRED" in
+ the related metric specification. An implementation that implements
+ all requirements is fully compliant with the specification, and the
+ degree of compliance SHOULD be noted in the conclusions of the
+ report.
+
+ Further, supported options of a metric implementation SHOULD be
+ documented in sufficient detail to evaluate whether the specification
+ was correctly interpreted. The documentation of chosen options
+ should minimize (and recognize) differences in the test setup if two
+ metric implementations are compared. Further, this documentation is
+ used to validate or clarify the wording of the metric specification
+ option, to remove options that saw no implementation or that are
+ badly specified from the metric specification. This documentation
+ SHOULD be included for all implementation-relevant specifications of
+ a metric picked for a comparison, even those that are not explicitly
+ marked as "MUST" or "REQUIRED" in the RFC text. This applies for the
+ following sections of all metric specifications:
+
+ o Singleton Definition of the Metric.
+
+ o Sample Definition of the Metric.
+
+ o Statistics Definition of the Metric. As statistics are compared
+ by the test specified here, this documentation is required even in
+ the case that the metric specification does not contain a
+ Statistics Definition.
+
+ o Timing- and Synchronization-related specification (if relevant for
+ the Metric).
+
+ o Any other technical part present or missing in the metric
+ specification, which is relevant for the implementation of the
+ Metric.
+
+
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 8]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ [RFC2330] and [RFC2679] emphasize precision as an aim of IPPM metric
+ implementations. A single IPPM-conforming implementation should
+ under otherwise identical network conditions produce precise results
+ for repeated measurements of the same metric.
+
+   RFC 2330 prefers the "empirical distribution function" (EDF) to
+   describe collections of measurements. RFC 2330 states that "unless
+   otherwise stated, IPPM goodness-of-fit tests are done using 5%
+   significance". The goodness-of-fit test determines the precision
+   with which two or more samples of a metric implementation can be
+   said to belong to the same underlying distribution (of measured
+   network performance events). The goodness-of-fit test suggested for
+   the metric test is the Anderson-Darling K-sample test (ADK test,
+   where K stands for the number of samples to be compared) [ADK].
+   Please note that RFC 2330 and RFC 2679 apply an Anderson-Darling
+   goodness-of-fit test, too.
+
+ The results of a repeated test with a single implementation MUST pass
+ an ADK sample test with a confidence level of 95%. The conditions
+ for which the ADK test has been passed with the specified confidence
+ level MUST be documented. To formulate this differently, the
+ requirement is to document the set of parameters with the smallest
+ deviation at which the results of the tested metric implementation
+ pass an ADK test with a confidence level of 95%. The minimum
+ resolution available in the reported results from each implementation
+ MUST be taken into account in the ADK test.
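+
+   As an illustration only (not part of the original test procedure),
+   the check above can be scripted with SciPy's k-sample
+   Anderson-Darling implementation. Python, the helper names, and the
+   quantization step are editorial assumptions; the ADK variant in
+   SciPy may differ in detail from the one described in [ADK].
+
+      import numpy as np
+      from scipy.stats import anderson_ksamp
+
+      def quantize(delays_s, resolution_s):
+          # Round delays to the resolution reported by the
+          # implementation (e.g., 1 ms timestamps), so that the
+          # minimum resolution is taken into account in the test.
+          return (np.round(np.asarray(delays_s) / resolution_s)
+                  * resolution_s)
+
+      def passes_adk_95(sample_a, sample_b, resolution_s):
+          # True if both samples are consistent with one underlying
+          # distribution at 95% confidence (5% significance).
+          res = anderson_ksamp([quantize(sample_a, resolution_s),
+                                quantize(sample_b, resolution_s)])
+          # critical_values[2] is the 5% significance threshold.
+          return res.statistic < res.critical_values[2]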
+
+ The test conditions to be documented for a passed metric test
+ include:
+
+ o The metric resolution at which a test was passed (e.g., the
+ resolution of timestamps).
+
+ o The parameters modified by an impairment generator.
+
+ o The impairment generator parameter settings.
+
+3.2. Test Setup Resulting in Identical Live Network Testing Conditions
+
+ Two major issues complicate tests for metric compliance across live
+ networks under identical testing conditions. One is the general
+ point that metric definition implementations cannot be conveniently
+ examined in field measurement scenarios. The other one is more
+ broadly described as "parallelism in devices and networks", including
+ mechanisms like those that achieve load balancing (see [RFC4928]).
+
+ This section proposes two measures to deal with both issues.
+ Tunneling mechanisms can be used to avoid parallel processing of
+ different flows in the network. Measuring by separate parallel probe
+
+
+
+Geib, et al. Best Current Practice [Page 9]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ flows results in repeated collection of data. If both measures are
+ combined, Wide Area Network (WAN) conditions are identical for a
+ number of independent measurement flows, no matter what the network
+ conditions are in detail.
+
+   Any measurement setup must be designed so that the probing traffic
+   itself does not impede the metric measurement. The created measurement
+ load must not result in congestion at the access link connecting the
+ measurement implementation to the WAN. The created measurement load
+ must not overload the measurement implementation itself, e.g., by
+ causing a high CPU load or by causing timestamp imprecision due to
+ unwanted queuing while transmitting or receiving test packets.
+
+ Tunneling multiple flows destined for a single physical port of a
+ network element allows transmission of all packets via the same path.
+ Applying tunnels to avoid undesired influence of standard routing for
+   measurement purposes is a concept known from the literature; see, e.g.,
+ GRE-encapsulated multicast probing [GU-Duffield]. An existing
+ IP-in-IP tunnel protocol can be applied to avoid Equal-Cost Multi-
+ Path (ECMP) routing of different measurement streams if it meets the
+ following criteria:
+
+ o Inner IP packets from different measurement implementations are
+ mapped into a single tunnel with a single outer IP origin and
+ destination address as well as origin and destination port numbers
+ that are identical for all packets.
+
+ o An easily accessible tunneling protocol allows for carrying out a
+ metric test from more test sites.
+
+ o A low operational overhead may enable a broader audience to set up
+ a metric test with the desired properties.
+
+ o The tunneling protocol should be reliable and stable in setup and
+ operation to avoid disturbances or influence on the test results.
+
+ o The tunneling protocol should not incur any extra cost for those
+ interested in setting up a metric test.
+
+ An illustration of a test setup with two layer 2 tunnels and two
+ flows between two linecards of one implementation is given in
+ Figure 1.
+
+
+
+
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 10]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ Implementation ,---. +--------+
+ +~~~~~~~~~~~/ \~~~~~~| Remote |
+ +------->-----F2->-| / \ |->---+ |
+ | +---------+ | Tunnel 1( ) | | |
+ | | transmit|-F1->-| ( ) |->+ | |
+ | | LC1 | +~~~~~~~~~| |~~~~| | | |
+ | | receive |-<--+ ( ) | F1 F2 |
+ | +---------+ | |Internet | | | | |
+ *-------<-----+ F2 | | | | | |
+ +---------+ | | +~~~~~~~~~| |~~~~| | | |
+ | transmit|-* *-| | | |--+<-* |
+ | LC2 | | Tunnel 2( ) | | |
+ | receive |-<-F1-| \ / |<-* |
+ +---------+ +~~~~~~~~~~~\ /~~~~~~| Router |
+ `-+-' +--------+
+
+ For simplicity, only two linecards of one implementation and two
+ flows F between them are shown.
+
+ Figure 1: Illustration of a Test Setup with Two Layer 2 Tunnels
+
+ Figure 2 shows the network elements required to set up layer 2
+ tunnels as shown by Figure 1.
+
+ Implementation
+
+ +-----+ ,---.
+ | LC1 | / \
+ +-----+ / \ +------+
+ | +-------+ ( ) +-------+ |Remote|
+ +--------+ | | | | | | | |
+ |Ethernet| | Tunnel| |Internet | | Tunnel| | |
+ |Switch |--| Head |--| |--| Head |--| |
+ +--------+ | Router| | | | Router| | |
+ | | | ( ) | | |Router|
+ +-----+ +-------+ \ / +-------+ +------+
+ | LC2 | \ /
+ +-----+ `-+-'
+
+ Figure 2: Illustration of a Hardware Setup to Realize the Test Setup
+ Illustrated by Figure 1 with Layer 2 Tunnels or Pseudowires
+
+
+
+
+
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 11]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ The test setup successfully used during a delay metric test
+ [TESTPLAN] is given as an example in Figure 3. Note that the shown
+ setup allows a metric test between two remote sites.
+
+ +----+ +----+ +----+ +----+
+ |LC10| |LC11| ,---. |LC20| |LC21|
+ +----+ +----+ / \ +-------+ +----+ +----+
+ | V10 | V11 / \ | Tunnel| | V20 | V21
+ | | ( ) | Head | | |
+ +--------+ +------+ | | | Router|__+----------+
+ |Ethernet| |Tunnel| |Internet | +---B---+ |Ethernet |
+ |Switch |--|Head |-| | | |Switch |
+ +-+--+---+ |Router| | | +---+---+ +--+--+----+
+ |__| +--A---+ ( )--|Option.| |__|
+ \ / |Impair.|
+ Bridge \ / |Gener. | Bridge
+ V20 to V21 `-+-' +-------+ V10 to V11
+
+ Figure 3: Example of Test Setup Successfully Used during a Delay
+   Metric Test
+
+ In Figure 3, LC10 identifies measurement clients / linecards. V10
+ and the others denote VLANs. All VLANs are using the same tunnel
+ from A to B and in the reverse direction. The remote site VLANs are
+ U-bridged at the local site Ethernet switch. The measurement packets
+ of site 1 travel tunnel A->B first, are U-bridged at site 2, and
+ travel tunnel B->A second. Measurement packets of site 2 travel
+ tunnel B->A first, are U-bridged at site 1, and travel tunnel A->B
+ second. So, all measurement packets pass the same tunnel segments,
+ but in different segment order.
+
+ If tunneling is applied, two tunnels MUST carry all test traffic in
+ between the test site and the remote site. For example, if 802.1Q
+ Virtual LANs (VLANs) are applied and the measurement streams are
+ carried in different VLANs, the IP tunnel or pseudowires respectively
+   are set up in physical port mode to avoid setup of pseudowires per
+ VLAN (which may see different paths due to ECMP routing); see
+ [RFC4448]. The remote router and the Ethernet switch shown in
+ Figure 3 have to support 802.1Q in this setup.
+
+ The IP packet size of the metric implementation SHOULD be chosen
+ small enough to avoid fragmentation due to the added Ethernet and
+ tunnel headers. Otherwise, the impact of tunnel overhead on
+ fragmentation and interface MTU size must be understood and taken
+ into account (see [RFC4459]).
+
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 12]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ An Ethernet port mode IP tunnel carrying several 802.1Q VLANs each
+ containing measurement traffic of a single measurement system was
+ successfully applied when testing compatibility of two metric
+ implementations [TESTPLAN]. Ethernet over Layer 2 Tunneling Protocol
+ Version 3 (L2TPv3) [RFC4719] was picked for this test.
+
+   The following headers may have to be accounted for when calculating
+   the total packet length if VLANs and Ethernet over L2TPv3 tunnels
+   are applied (a rough calculation is sketched after this list):
+
+ o Ethernet 802.1Q: 22 bytes.
+
+ o L2TPv3 Header: 4-16 bytes for L2TPv3 data messages over IP; 16-28
+ bytes for L2TPv3 data messages over UDP.
+
+ o IPv4 Header (outer IP header): 20 bytes.
+
+ o MPLS Labels may be added by a carrier. Each MPLS Label has a
+ length of 4 bytes. At the time of this writing, between 1 and 4
+ Labels seems to be a fair guess of what's expected.
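+
+   A rough sketch of the resulting size calculation (the 1500-byte MTU,
+   the number of labels, and the function name are assumptions for the
+   example; the header sizes are the ones listed above):
+
+      def max_inner_ip_packet(path_mtu=1500, mpls_labels=2,
+                              l2tp_over_udp=True):
+          # Upper bound on the probe IP packet size that avoids
+          # fragmentation inside an Ethernet-over-L2TPv3 tunnel.
+          overhead = 22                     # Ethernet 802.1Q framing
+          overhead += 28 if l2tp_over_udp else 16  # worst-case L2TPv3
+          overhead += 20                    # outer IPv4 header
+          overhead += 4 * mpls_labels       # labels added by a carrier
+          return path_mtu - overhead
+
+      # Example: 1500 - (22 + 28 + 20 + 8) = 1422 bytes remain for the
+      # inner IP packet.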
+
+ The applicability of one or more of the following tunneling protocols
+ may be investigated by interested parties if Ethernet over L2TPv3 is
+ felt to be unsuitable: IP in IP [RFC2003] or Generic Routing
+ Encapsulation (GRE) [RFC2784]. RFC 4928 [RFC4928] proposes measures
+   to avoid ECMP treatment in MPLS networks.
+
+ L2TP is a commodity tunneling protocol [RFC2661]. At the time of
+ this writing, L2TPv3 [RFC3931] is the latest version of L2TP. If
+ L2TPv3 is applied, software-based implementations of this protocol
+ are not suitable for the test setup, as such implementations may
+ cause incalculable delay shifts.
+
+ Ethernet pseudowires may also be set up on MPLS networks [RFC4448].
+ While there is no technical issue with this solution, MPLS interfaces
+ are mostly found in the network provider domain. Hence, not all of
+ the above criteria for selecting a tunneling protocol are met.
+
+ Note that setting up a metric test environment is not a plug-and-play
+ issue. Skilled networking engineers should be consulted and involved
+ if a setup between remote sites is preferred.
+
+ Passing or failing an ADK test with 2 samples could be a random
+ result (note that [RFC2330] defines a sample as a set of singleton
+ metric values produced by a measurement stream, and we continue to
+ use this terminology here). The error margin of a statistical test
+ is higher if the number of samples it is based on is low (the number
+ of samples taken influences the so-called "degree of freedom" of a
+
+
+
+Geib, et al. Best Current Practice [Page 13]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ statistical test, and a higher degree of freedom produces more
+ reliable results). To pass an ADK with higher probability, the
+ number of samples collected per implementation under identical
+ networking conditions SHOULD be greater than 2. Hardware and load
+ constraints may enforce an upper limit on the number of simultaneous
+ measurement streams. The ADK test allows one to combine different
+ samples (see Section 9 of [ADK]) and then to run a 2-sample test
+   between combined samples. Capturing at least 4 samples per
+   implementation under identical networking conditions is RECOMMENDED
+   when comparing different metric implementations by a statistical
+   test.
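+
+   A minimal sketch of such a multi-sample comparison, again using
+   SciPy's anderson_ksamp (the function names are illustrative, and a
+   joint k-sample test is used as a simple stand-in for the sample
+   combination procedure of Section 9 of [ADK], which is not
+   reproduced here):
+
+      from scipy.stats import anderson_ksamp
+
+      def consistent_at_95(samples):
+          # k-sample ADK: True if all samples are consistent with one
+          # underlying distribution at the 5% significance level.
+          res = anderson_ksamp(samples)
+          return res.statistic < res.critical_values[2]
+
+      def compare_implementations(flows_impl1, flows_impl2):
+          # flows_impl1/2: four or more per-flow delay samples per
+          # implementation, captured simultaneously over the same
+          # tunnel. Each implementation is checked against itself
+          # first, then both sets of flows are checked jointly.
+          return (consistent_at_95(flows_impl1),
+                  consistent_at_95(flows_impl2),
+                  consistent_at_95(list(flows_impl1)
+                                   + list(flows_impl2)))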
+
+ It is RECOMMENDED that tests be carried out by establishing N
+ different parallel measurement flows. Two or three linecards per
+ implementation serving to send or receive measurement flows should be
+ sufficient to create 4 or more parallel measurement flows. Other
+ options are to separate flows by DiffServ marks (without deploying
+ any Quality of Service (QoS) in the inner or outer tunnel) or to use
+ a single Constant Bitrate (CBR) flow and evaluate whether every n-th
+ singleton belongs to a specific measurement flow. Note that a
+ practical test indeed showed that ADK passed with 4 samples even if a
+ 2-sample test failed [TESTPLAN].
+
+ Some additional guidelines to calculate and compare samples to
+ perform a metric test are:
+
+ o Comparing different probes of a common underlying distribution in
+ terms of metrics characterizing a communication network requires
+ respecting the temporal nature for which the assumption of a
+ common underlying distribution may hold. Any singletons or
+ samples to be compared must be captured within the same time
+ interval.
+
+   o  If statistical events like rates are used to characterize
+      measured metrics of a time interval, a minimum of 5 singletons of
+      a relevant metric should be picked to ensure a minimum confidence
+      in the reported value. The error margin of the determined rate
+      depends on the number of singletons (refer to statistical
+      textbooks on Student's t-test). As an example, any packet loss
+      measurement interval to be compared with the results of another
+      implementation should contain at least five lost packets to have
+      some confidence that the observed loss rate wasn't caused by a
+      small number of random packet drops (see the sketch after this
+      list).
+
+ o The minimum number of singletons or samples to be compared by an
+ Anderson-Darling test should be 100 per tested metric
+ implementation. Note that the Anderson-Darling test detects small
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 14]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ differences in distributions fairly well and will fail for a high
+ number of compared results (RFC 2330 mentions an example with 8192
+ measurements where an Anderson-Darling test always failed).
+
+ o Generally, the Anderson-Darling test is sensitive to differences
+ in the accuracy or bias associated with varying implementations or
+ test conditions. These dissimilarities may result in differing
+ averages of samples to be compared. An example may be different
+ packet sizes, resulting in a constant delay difference between
+ compared samples. Therefore, samples to be compared by an
+ Anderson-Darling test MAY be calibrated by the difference of the
+ average values of the samples. Any calibration of this kind MUST
+ be documented in the test result.
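+
+   The interval alignment and minimum-singleton guidance above can be
+   sketched as follows (Python; the interval length, the threshold of
+   five losses, and the data layout are assumptions for the example):
+
+      import math
+
+      def comparable_loss_rates(times_a, lost_a, times_b, lost_b,
+                                interval_s=300.0, min_lost=5):
+          # Per-interval loss rates for two implementations, computed
+          # only for time intervals both measured and in which each
+          # saw at least min_lost lost packets.
+          def per_interval(times, lost):
+              buckets = {}
+              for t, l in zip(times, lost):   # l = 1 if packet lost
+                  b = buckets.setdefault(math.floor(t / interval_s),
+                                         [0, 0])
+                  b[0] += l                   # lost packets
+                  b[1] += 1                   # packets sent
+              return buckets
+          a = per_interval(times_a, lost_a)
+          b = per_interval(times_b, lost_b)
+          rates = {}
+          for k in sorted(set(a) & set(b)):   # identical intervals only
+              if a[k][0] >= min_lost and b[k][0] >= min_lost:
+                  rates[k] = (a[k][0] / a[k][1], b[k][0] / b[k][1])
+          return rates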
+
+3.3. Tests of Two or More Different Implementations against a Metric
+ Specification
+
+ [RFC2330] expects that "a methodology for a given metric exhibits
+ continuity if, for small variations in conditions, it results in
+ small variations in the resulting measurements. Slightly more
+ precisely, for every positive epsilon, there exists a positive delta,
+ such that if two sets of conditions are within delta of each other,
+ then the resulting measurements will be within epsilon of each
+ other". A small variation in conditions in the context of the metric
+ test proposed here can be seen as different implementations measuring
+ the same metric along the same path.
+
+ IPPM metric specifications, however, allow for implementor options to
+ the largest possible degree. It cannot be expected that two
+ implementors allow 100% identical options in their implementations.
+ Testers SHOULD pick the same metric measurement configurations for
+ their systems when comparing their implementations by a metric test.
+
+ In some cases, a goodness-of-fit test may not be possible or show
+ disappointing results. To clarify the difficulties arising from
+ different metric implementation options, the individual options
+ picked for every compared metric implementation should be documented
+ as specified in Section 3.5. If the cause of the failure is a lack
+ of specification clarity or multiple legitimate interpretations of
+ the definition text, the text should be modified and the resulting
+ memo proposed for consensus and (possible) advancement to Internet
+ Standard.
+
+   The same statistical test that is used to quantify the precision of
+   a single metric implementation must be used to compare metric
+   result equivalence for different implementations. To document
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 15]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ compatibility, the smallest measurement resolution at which the
+ compared implementations passed the ADK sample test must be
+ documented.
+
+ For different implementations of the same metric, "variations in
+ conditions" are reasonably expected. The ADK test comparing samples
+ of the different implementations may result in a lower precision than
+ the test for precision in the same-implementation comparison.
+
+3.4. Clock Synchronization
+
+ Clock synchronization effects require special attention. Accuracy of
+ one-way active delay measurements for any metric implementation
+ depends on clock synchronization between the source and destination
+ of tests. Ideally, one-way active delay measurement [RFC2679] test
+ endpoints either have direct access to independent GPS or CDMA-based
+ time sources or indirect access to nearby NTP primary (stratum 1)
+ time sources, equipped with GPS receivers. Access to these time
+ sources may not be available at all test locations associated with
+ different Internet paths, for a variety of reasons out of scope of
+ this document.
+
+ When secondary (stratum 2 and above) time sources are used with NTP
+ running across the same network, whose metrics are subject to
+ comparative implementation tests, network impairments can affect
+ clock synchronization and distort sample one-way values and their
+ interval statistics. Discarding sample one-way delay values for any
+ implementation is recommended when one of the following reliability
+ conditions is met:
+
+ o Delay is measured and is finite in one direction but not the
+ other.
+
+ o Absolute value of the difference between the sum of one-way
+ measurements in both directions and the round-trip measurement is
+ greater than X% of the latter value.
+
+ Examination of the second condition requires round-trip time (RTT)
+ measurement for reference, e.g., based on TWAMP [RFC5357] in
+ conjunction with one-way delay measurement.
+
+ Specification of X% to strike a balance between identification of
+ unreliable one-way delay samples and misidentification of reliable
+ samples under a wide range of Internet path RTTs requires further
+ study.
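+
+   A small sketch of this reliability check (Python; the function name
+   and the representation of unmeasured directions as float('inf') are
+   assumptions, and X remains an open parameter as noted above):
+
+      import math
+
+      def discard_one_way_sample(owd_fwd_s, owd_rev_s, rtt_s,
+                                 x_percent):
+          # rtt_s is a reference round-trip measurement (e.g., TWAMP).
+          # Condition 1: finite in one direction but not the other.
+          if math.isfinite(owd_fwd_s) != math.isfinite(owd_rev_s):
+              return True
+          # Condition 2: |(fwd + rev) - RTT| exceeds X% of the RTT.
+          if math.isfinite(owd_fwd_s) and math.isfinite(owd_rev_s):
+              return (abs((owd_fwd_s + owd_rev_s) - rtt_s)
+                      > (x_percent / 100.0) * rtt_s)
+          return False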
+
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 16]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ An IPPM-compliant metric implementation of an RFC that requires
+ synchronized clocks is expected to provide precise measurement
+ results.
+
+ IF an implementation publishes a specification of its precision, such
+ as "a precision of 1 ms (+/- 500 us) with a confidence of 95%", then
+ the specification should be met over a useful measurement duration.
+ For example, if the metric is measured along an Internet path that is
+ stable and not congested, then the precision specification should be
+ met over durations of an hour or more.
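+
+   One possible reading of such a precision statement, sketched for
+   illustration only (using the sample median as the reference value
+   is an assumption not taken from the text):
+
+      import numpy as np
+
+      def meets_precision_spec(delays_s, half_width_s=500e-6,
+                               conf=0.95):
+          # "1 ms (+/- 500 us) with a confidence of 95%": at least 95%
+          # of the singletons collected over the measurement duration
+          # lie within +/- 500 us of their median.
+          delays_s = np.asarray(delays_s)
+          within = (np.abs(delays_s - np.median(delays_s))
+                    <= half_width_s)
+          return within.mean() >= conf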
+
+3.5. Recommended Metric Verification Measurement Process
+
+ In order to meet their obligations under the IETF Standards Process,
+ the IESG must be convinced that each metric specification advanced to
+ Internet Standard status is clearly written, that there are a
+ sufficient number of verified equivalent implementations, and that
+ options that have been implemented are documented.
+
+   In the context of this document, metrics are designed to measure
+   some characteristic of a data network. An aim of any metric
+   definition should be that it is specified in a way that allows the
+   specific characteristic to be measured reliably and repeatably
+   across multiple independent implementations.
+
+ Each metric, statistic, or option of those to be validated MUST be
+ compared against a reference measurement or another implementation as
+ specified in this document.
+
+ Finally, the metric definitions, embodied in the text of the RFCs,
+ are the objects that require evaluation and possible revision in
+ order to advance to Internet Standard.
+
+ IF two (or more) implementations do not measure an equivalent metric
+ as specified by this document,
+
+ AND sources of measurement error do not adequately explain the lack
+ of agreement,
+
+ THEN the details of each implementation should be audited along with
+ the exact definition text to determine if there is a lack of clarity
+ that has caused the implementations to vary in a way that affects the
+ correspondence of the results.
+
+ IF there was a lack of clarity or multiple legitimate interpretations
+ of the definition text,
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 17]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ THEN the text should be modified and the resulting memo proposed for
+ consensus and (possible) advancement along the Standards Track.
+
+ Finally, all the findings MUST be documented in a report that can
+ support advancement to Internet Standard, as described here (similar
+ to the reports described in [RFC5657]). The list of measurement
+ devices used in testing satisfies the implementation requirement,
+ while the test results provide information on the quality of each
+ specification in the metric RFC (the surrogate for feature
+ interoperability).
+
+ The complete process of advancing a metric specification to a
+ Standard as defined by this document is illustrated in Figure 4.
+
+ ,---.
+ / \
+ ( Start )
+ \ / Implementations
+ `-+-' +-------+
+ | /| 1 `.
+ +---+----+ / +-------+ `.-----------+ ,-------.
+ | RFC | / |Check for | ,' was RFC `. YES
+ | | / |Equivalence.... clause x ------+
+ | |/ +-------+ |under | `. clear? ,' |
+ | Metric \.....| 2 ....relevant | `---+---' +----+-----+
+ | Metric |\ +-------+ |identical | No | |Report |
+ | Metric | \ |network | +--+----+ |results + |
+ | ... | \ |conditions | |Modify | |Advance |
+ | | \ +-------+ | | |Spec +--+RFC |
+ +--------+ \| n |.'+-----------+ +-------+ |request |
+ +-------+ +----------+
+
+ Figure 4: Illustration of the Metric Standardization Process
+
+ Any recommendation for the advancement of a metric specification MUST
+ be accompanied by an implementation report. The implementation
+ report needs to include the tests performed, the applied test setup,
+ the specific metrics in the RFC, and reports of the tests performed
+ with two or more implementations. The test plan needs to specify the
+ precision reached for each measured metric and thus define the
+ meaning of "statistically equivalent" for the specific metrics being
+ tested.
+
+ Ideally, the test plan would co-evolve with the development of the
+ metric, since that's when participants have the clearest context in
+ their minds regarding the different subtleties that can arise.
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 18]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ In particular, the implementation report MUST include the following
+ at minimum:
+
+ o The metric compared and the RFC specifying it. This includes
+ statements as required by Section 3.1 ("Tests of an Individual
+ Implementation against a Metric Specification") of this document.
+
+ o The measurement configuration and setup.
+
+ o A complete specification of the measurement stream (mean rate,
+ statistical distribution of packets, packet size or mean packet
+ size, and their distribution), Differentiated Services Code Point
+ (DSCP), and any other measurement stream properties that could
+ result in deviating results. Deviations in results can also be
+ caused if chosen IP addresses and ports of different
+ implementations result in different layer 2 or layer 3 paths due
+ to operation of Equal Cost Multi-Path routing in an operational
+ network.
+
+ o The duration of each measurement to be used for a metric
+ validation, the number of measurement points collected for each
+ metric during each measurement interval (i.e., the probe size),
+ and the level of confidence derived from this probe size for each
+ measurement interval.
+
+ o The result of the statistical tests performed for each metric
+ validation as required by Section 3.3 ("Tests of Two or More
+ Different Implementations against a Metric Specification") of this
+ document.
+
+ o A parameterization of laboratory conditions and applied traffic
+ and network conditions allowing reproduction of these laboratory
+ conditions for readers of the implementation report.
+
+   o  The documentation defined by this section that helps to improve
+      the metric specification.
+
+ All of the tests for each set SHOULD be run in a test setup as
+ specified in Section 3.2 ("Test Setup Resulting in Identical Live
+   Network Testing Conditions").
+
+   If a different test setup is chosen, it is recommended to avoid
+   effects of real data networks (like parallelism in devices and
+   networks) that falsify the results of validation measurements. Data
+   networks may forward packets differently in the case of:
+
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 19]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ o Different packet sizes chosen for different metric
+ implementations. A proposed countermeasure is selecting the same
+ packet size when validating results of two samples or a sample
+ against an original distribution.
+
+ o Selection of differing IP addresses and ports used by different
+ metric implementations during metric validation tests. If ECMP is
+ applied on the IP or MPLS level, different paths can result (note
+ that it may be impossible to detect an MPLS ECMP path from an IP
+ endpoint). A proposed countermeasure is to connect the
+ measurement equipment to be compared by a NAT device or establish
+ a single tunnel to transport all measurement traffic. The aim is
+ to have the same IP addresses and port for all measurement packets
+ or to avoid ECMP-based local routing diversion by using a layer 2
+ tunnel.
+
+ o Different IP options.
+
+ o Different DSCP.
+
+ o If the N measurements are captured using sequential measurements
+ instead of simultaneous ones, then the following factors come into
+ play: time varying paths and load conditions.
+
+3.6. Proposal to Determine an Equivalence Threshold for Each Metric
+ Evaluated
+
+ This section describes a proposal for maximum error of equivalence,
+ based on performance comparison of identical implementations. This
+ comparison may be useful for both ADK and non-ADK comparisons.
+
+ Each metric is tested by two or more implementations (cross-
+ implementation testing).
+
+ Each metric is also tested twice simultaneously by the *same*
+ implementation, using different Src/Dst Address pairs and other
+ differences such that the connectivity differences of the cross-
+ implementation tests are also experienced and measured by the same
+ implementation.
+
+ Comparative results for the same implementation represent a bound on
+ cross-implementation equivalence. This should be particularly useful
+ when the metric does *not* produce a continuous distribution of
+ singleton values, such as with a loss metric or a duplication metric.
+ Appendix A indicates how the ADK will work for one-way delay and
+ should be likewise applicable to distributions of delay variation.
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 20]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ Appendix B discusses two possible ways to perform the ADK analysis:
+ the R statistical language [Rtool] with ADK package [Radk] and C++
+ code.
+
+   Conclusion: the largest difference observed in homogeneous
+   (same-implementation) comparison results is the lower bound on the
+   equivalence threshold, noting that there may be other systematic
+   errors to account for when comparing implementations.
+
+ Thus, when evaluating equivalence in cross-implementation results:
+
+ Maximum_Error = Same_Implementation_Error + Systematic_Error
+
+ and only the systematic error need be decided beforehand.
+
+ In the case of ADK comparison, the largest same-implementation
+ resolution of distribution equivalence can be used as a limit on
+ cross-implementation resolutions (at the same confidence level).
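+
+   As a sketch of this rule (the numbers in the comment are invented
+   for illustration):
+
+      def max_equivalence_error(same_impl_errors, systematic_error):
+          # Maximum_Error = Same_Implementation_Error + Systematic_Error
+          # same_impl_errors: differences observed when each
+          # implementation is compared against itself; the largest one
+          # is the lower bound on the equivalence threshold.
+          return max(same_impl_errors) + systematic_error
+
+      # Example: same-implementation differences of 0.2 ms and 0.5 ms
+      # plus an agreed systematic error of 0.1 ms give a threshold of
+      # 0.6 ms.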
+
+4. Acknowledgements
+
+   Gerhard Hasslinger commented on a first draft version of this document;
+ he suggested statistical tests and the evaluation of time series
+ information. Matthias Wieser's thesis on a metric test resulted in
+ new input for this document. Henk Uijterwaal and Lars Eggert have
+ encouraged and helped to organize this work. Mike Hamilton, Scott
+ Bradner, David Mcdysan, and Emile Stephan commented on this document.
+ Carol Davids reviewed a version of the document before it became a WG
+ item.
+
+5. Contributors
+
+ Scott Bradner, Vern Paxson, and Allison Mankin drafted [METRICTEST],
+ and major parts of it are included in this document.
+
+6. Security Considerations
+
+ This memo does not raise any specific security issues.
+
+7. References
+
+7.1. Normative References
+
+ [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003,
+ October 1996.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+
+
+Geib, et al. Best Current Practice [Page 21]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ [RFC2330] Paxson, V., Almes, G., Mahdavi, J., and M. Mathis,
+ "Framework for IP Performance Metrics", RFC 2330,
+ May 1998.
+
+ [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G.,
+ Zorn, G., and B. Palter, "Layer Two Tunneling Protocol
+ "L2TP"", RFC 2661, August 1999.
+
+ [RFC2679] Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
+ Delay Metric for IPPM", RFC 2679, September 1999.
+
+ [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
+ Traina, "Generic Routing Encapsulation (GRE)",
+ RFC 2784, March 2000.
+
+ [RFC3931] Lau, J., Townsley, M., and I. Goyret, "Layer Two
+ Tunneling Protocol - Version 3 (L2TPv3)", RFC 3931,
+ March 2005.
+
+ [RFC4448] Martini, L., Rosen, E., El-Aawar, N., and G. Heron,
+ "Encapsulation Methods for Transport of Ethernet over
+ MPLS Networks", RFC 4448, April 2006.
+
+ [RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and
+ M. Zekauskas, "A One-way Active Measurement Protocol
+ (OWAMP)", RFC 4656, September 2006.
+
+ [RFC4719] Aggarwal, R., Townsley, M., and M. Dos Santos,
+ "Transport of Ethernet Frames over Layer 2 Tunneling
+ Protocol Version 3 (L2TPv3)", RFC 4719, November 2006.
+
+ [RFC4928] Swallow, G., Bryant, S., and L. Andersson, "Avoiding
+ Equal Cost Multipath Treatment in MPLS Networks",
+ BCP 128, RFC 4928, June 2007.
+
+ [RFC5657] Dusseault, L. and R. Sparks, "Guidance on
+ Interoperation and Implementation Reports for
+ Advancement to Draft Standard", BCP 9, RFC 5657,
+ September 2009.
+
+ [RFC6410] Housley, R., Crocker, D., and E. Burger, "Reducing the
+ Standards Track to Two Maturity Levels", BCP 9,
+ RFC 6410, October 2011.
+
+
+
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 22]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+7.2. Informative References
+
+ [ADK] Scholz, F. and M. Stephens, "K-sample Anderson-Darling
+ Tests of Fit, for Continuous and Discrete Cases",
+ University of Washington, Technical Report No. 81,
+ May 1986.
+
+ [GU-Duffield] Gu, Y., Duffield, N., Breslau, L., and S. Sen, "GRE
+ Encapsulated Multicast Probing: A Scalable Technique
+ for Measuring One-Way Loss", SIGMETRICS'07 San Diego,
+ California, USA, June 2007.
+
+ [METRICTEST] Bradner, S. and V. Paxson, "Advancement of metrics
+ specifications on the IETF Standards Track", Work
+ in Progress, August 2007.
+
+ [RFC2026] Bradner, S., "The Internet Standards Process --
+ Revision 3", BCP 9, RFC 2026, October 1996.
+
+ [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the-
+ Network Tunneling", RFC 4459, April 2006.
+
+ [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and
+ J. Babiarz, "A Two-Way Active Measurement Protocol
+ (TWAMP)", RFC 5357, October 2008.
+
+ [Radk] Scholz, F., "adk: Anderson-Darling K-Sample Test and
+ Combinations of Such Tests. R package version 1.0",
+ 2008.
+
+ [Rtool] R Development Core Team, "R: A language and
+ environment for statistical computing. R Foundation
+ for Statistical Computing, Vienna, Austria. ISBN
+ 3-900051-07-0", 2011, <http://www.R-project.org/>.
+
+ [TESTPLAN] Ciavattone, L., Geib, R., Morton, A., and M. Wieser,
+ "Test Plan and Results for Advancing RFC 2679 on the
+ Standards Track", Work in Progress, March 2012.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Geib, et al. Best Current Practice [Page 23]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+Appendix A. An Example on a One-Way Delay Metric Validation
+
+ The text of this appendix is not binding. It is an example of what
+ parts of a One-Way Delay Metric test could look like.
+
+A.1. Compliance to Metric Specification Requirements
+
+ One-Way Delay, Loss Threshold, RFC 2679
+
+ This test determines if implementations use the same configured
+ maximum waiting time delay from one measurement to another under
+ different delay conditions and correctly declare packets arriving in
+ excess of the waiting time threshold as lost. See Sections 3.5 (3rd
+ bullet point) and 3.8.2 of [RFC2679].
+
+ (1) Configure a path with 1-second one-way constant delay.
+
+ (2) Measure one-way delay with 2 or more implementations, using
+ identical waiting time thresholds for loss set at 2 seconds.
+
+ (3) Configure the path with 3-second one-way delay.
+
+ (4) Repeat measurements.
+
+   (5) Observe that the increase measured in step 4 caused all packets
+       to be declared lost and that all packets that arrived
+       successfully in step 2 were assigned a valid one-way delay.
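+
+   A minimal sketch of the expected loss-threshold behaviour (the
+   function name and time units are illustrative assumptions):
+
+      def classify(send_time_s, arrival_time_s, loss_threshold_s=2.0):
+          # Packets arriving later than the configured waiting time,
+          # or not at all, are declared lost; others get a valid
+          # one-way delay.
+          if arrival_time_s is None:
+              return ("lost", None)
+          delay = arrival_time_s - send_time_s
+          if delay > loss_threshold_s:      # e.g., 3 s path delay
+              return ("lost", None)
+          return ("ok", delay)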
+
+ One-Way Delay, First Bit to Last Bit, RFC 2679
+
+ This test determines if implementations register the same relative
+ increase in delay from one measurement to another under different
+ delay conditions. This test tends to cancel the sources of error
+ that may be present in an implementation. See Section 3.7.2 of
+ [RFC2679] and Section 10.2 of [RFC2330].
+
+ (1) Configure a path with X ms one-way constant delay and ideally
+ include a low-speed link.
+
+ (2) Measure one-way delay with 2 or more implementations, using
+ identical options and equal size small packets (e.g., 100 octet
+ IP payload).
+
+ (3) Maintain the same path with X ms one-way delay.
+
+ (4) Measure one-way delay with 2 or more implementations, using
+ identical options and equal size large packets (e.g., 1500 octet
+ IP payload).
+
+
+
+Geib, et al. Best Current Practice [Page 24]
+
+RFC 6576 IPPM Standard Advancement Testing March 2012
+
+
+ (5) Observe that the increase measured in steps 2 and 4 is
+ equivalent to the increase in ms expected due to the larger
+ serialization time for each implementation. Most of the
+ measurement errors in each system should cancel, if they are
+ stationary.
+
+ One-Way Delay, RFC 2679
+
+ This test determines if implementations register the same relative
+ increase in delay from one measurement to another under different
+ delay conditions. This test tends to cancel the sources of error
+ that may be present in an implementation. This test is intended to
+ evaluate measurements in Sections 3 and 4 of [RFC2679].
+
+ (1) Configure a path with X ms one-way constant delay.
+
+ (2) Measure one-way delay with 2 or more implementations, using
+ identical options.
+
+ (3) Configure the path with X+Y ms one-way delay.
+
+ (4) Repeat measurements.
+
+ (5) Observe that the increase measured in steps 2 and 4 is ~Y ms for
+ each implementation. Most of the measurement errors in each
+ system should cancel, if they are stationary.
+
+ Error Calibration, RFC 2679
+
+ This is a simple check to determine if an implementation reports the
+ error calibration as required in Section 4.8 of [RFC2679]. Note that
+ the context (Type-P) must also be reported.
+
+A.2. Examples Related to Statistical Tests for One-Way Delay
+
+   A one-way delay measurement may pass an ADK test with a timestamp
+   resolution of 1 ms.  The same test may fail if timestamps with a
+   resolution of 100 microseconds are evaluated.  The implementation
+   then conforms to the metric specification up to a timestamp
+   resolution of 1 ms.
+
+   Let's assume another one-way delay measurement comparison between
+   implementation 1 probing with a frequency of 2 probes per second and
+   implementation 2 probing at a rate of 2 probes every 3 minutes.  To
+   ensure reasonable confidence in results, sample metrics are
+   calculated from at least 5 singletons per compared time interval.
+   This means that sample delay values are calculated for each system
+   for identical 9-minute intervals for the duration of the whole test.
+
+   Per 9-minute interval, the sample metric is calculated from 1080
+   singletons for implementation 1 and from 6 singletons for
+   implementation 2.  Note that if outliers are not filtered, moving
+   averages are also an option for the evaluation; the minimum shift of
+   the averaging interval is three minutes in this example.
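+
+   A short sketch of this aggregation step is given below; the
+   structure names, timestamps, and delay values are illustrative
+   assumptions.  Singletons are grouped into fixed time intervals, and
+   one sample average per interval is produced, as in Table 1.
+
+      #include <iostream>
+      #include <map>
+      #include <vector>
+
+      struct Singleton { double t_s; double delay_us; };
+
+      // Average the delay singletons over fixed intervals of length
+      // interval_s seconds (e.g., 540 s for 9-minute intervals).
+      std::map<long, double> interval_averages(
+            const std::vector<Singleton>& s, double interval_s) {
+         std::map<long, double> sum;
+         std::map<long, int> cnt;
+         for (const auto& x : s) {
+            long idx = static_cast<long>(x.t_s / interval_s);
+            sum[idx] += x.delay_us;
+            ++cnt[idx];
+         }
+         for (auto& kv : sum)
+            kv.second /= cnt[kv.first];
+         return sum;
+      }
+
+      int main() {
+         // Hypothetical singletons (time in s, delay in us).
+         std::vector<Singleton> impl2 = {{10, 6549}, {100, 6555},
+                                         {190, 6560}, {280, 6570},
+                                         {370, 6575}, {460, 6580}};
+         for (const auto& kv : interval_averages(impl2, 540.0))
+            std::cout << "interval " << kv.first << ": "
+                      << kv.second << " us\n";
+         return 0;
+      }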
+
+   The data in Table 1 may result from measuring one-way delay with
+   implementation 1 (see column Implemnt_1) and implementation 2 (see
+   column Implemnt_2).  Each data point in the table represents a
+   (rounded) average of the sampled delay values per interval,
+   expressed in microseconds.  The resolution of the clock is one
+   microsecond.  The difference in the delay values may result, e.g.,
+   from different probe packet sizes.
+
+ +------------+------------+-----------------------------+
+ | Implemnt_1 | Implemnt_2 | Implemnt_2 - Delta_Averages |
+ +------------+------------+-----------------------------+
+ | 5000 | 6549 | 4997 |
+ | 5008 | 6555 | 5003 |
+ | 5012 | 6564 | 5012 |
+ | 5015 | 6565 | 5013 |
+ | 5019 | 6568 | 5016 |
+ | 5022 | 6570 | 5018 |
+ | 5024 | 6573 | 5021 |
+ | 5026 | 6575 | 5023 |
+ | 5027 | 6577 | 5025 |
+ | 5029 | 6580 | 5028 |
+ | 5030 | 6585 | 5033 |
+ | 5032 | 6586 | 5034 |
+ | 5034 | 6587 | 5035 |
+ | 5036 | 6588 | 5036 |
+ | 5038 | 6589 | 5037 |
+ | 5039 | 6591 | 5039 |
+ | 5041 | 6592 | 5040 |
+ | 5043 | 6599 | 5047 |
+ | 5046 | 6606 | 5054 |
+ | 5054 | 6612 | 5060 |
+ +------------+------------+-----------------------------+
+
+ Table 1
+
+ Average values of sample metrics captured during identical time
+ intervals are compared. This excludes random differences caused by
+ differing probing intervals or differing temporal distance of
+ singletons resulting from their Poisson-distributed sending times.
+
+ In the example, 20 values have been picked (note that at least 100
+ values are recommended for a single run of a real test). Data must
+ be ordered by ascending rank. The data of Implemnt_1 and Implemnt_2
+ as shown in the first two columns of Table 1 clearly fails an ADK
+ test with 95% confidence.
+
+ The results of Implemnt_2 are now reduced by the difference of the
+ averages of column 2 (rounded to 6581 us) and column 1 (rounded to
+ 5029 us), which is 1552 us. The result may be found in column 3 of
+ Table 1. Comparing column 1 and column 3 of the table by an ADK test
+ shows that the data contained in these columns passes an ADK test
+ with 95% confidence.
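+
+   The shift applied above can be written compactly as a non-normative
+   sketch; for brevity, only the first five rows of Table 1 are used,
+   so the resulting mean difference deviates slightly from the 1552 us
+   obtained for the full table.
+
+      #include <iostream>
+      #include <numeric>
+      #include <vector>
+
+      int main() {
+         // First rows of Table 1 only (microseconds).
+         std::vector<double> impl1 = {5000, 5008, 5012, 5015, 5019};
+         std::vector<double> impl2 = {6549, 6555, 6564, 6565, 6568};
+
+         double mean1 = std::accumulate(impl1.begin(), impl1.end(), 0.0)
+                        / impl1.size();
+         double mean2 = std::accumulate(impl2.begin(), impl2.end(), 0.0)
+                        / impl2.size();
+         double shift = mean2 - mean1;
+
+         // Column 3 of Table 1: Implemnt_2 minus the average difference.
+         for (double& v : impl2)
+            v -= shift;
+
+         std::cout << "shift = " << shift << " us\n";
+         std::cout << "first shifted value = " << impl2.front() << " us\n";
+         return 0;
+      }
+
+   The shifted column and the Implemnt_1 column can then be given to an
+   ADK implementation, for example the one in Appendix B, to repeat the
+   comparison described above.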
+
+ Comment: Extensive averaging was used in this example because of the
+ vastly different sampling frequencies. As a result, the
+ distributions compared do not exactly align with a metric in
+ [RFC2679] but illustrate the ADK process adequately.
+
+Appendix B. Anderson-Darling K-sample Reference and 2 Sample C++ Code
+
+ There are many statistical tools available, and this appendix
+ describes two that are familiar to the authors.
+
+ The "R tool" is a language and command-line environment for
+ statistical computing and plotting [Rtool]. With the optional "adk"
+ package installed [Radk], it can perform individual and combined
+ sample ADK computations. The user must consult the package
+ documentation and the original paper [ADK] to interpret the results,
+ but this is as it should be.
+
+   The C++ code below will perform an Anderson-Darling 2-sample
+   comparison when compiled and presented with two column vectors in a
+   file (using white space as separation).  This version contains
+   modifications made by Wes Eddy in Sept 2011 to use the vectors and
+   run as a stand-alone module.  The status of the comparison can be
+   checked on the command line with "$ echo $?" (an exit status of 1
+   means the two samples pass the ADK criterion; 0 means they fail), or
+   the last line can be replaced with a printf statement for adk_result
+   instead.
+
+ /*
+
+ Copyright (c) 2012 IETF Trust and the persons identified
+ as authors of the code. All rights reserved.
+
+ Redistribution and use in source and binary forms, with
+ or without modification, is permitted pursuant to, and subject
+ to the license terms contained in, the Simplified BSD License
+ set forth in Section 4.c of the IETF Trust's Legal Provisions
+ Relating to IETF Documents (http://trustee.ietf.org/license-info).
+
+ */
+
+ /* Routines for computing the Anderson-Darling 2 sample
+ * test statistic.
+ *
+ * Implemented based on the description in
+ * "Anderson-Darling K Sample Test" Heckert, Alan and
+ * Filliben, James, editors, Dataplot Reference Manual,
+ * Chapter 15 Auxiliary, NIST, 2004.
+    * Official reference as of 2010:
+ * Heckert, N. A. (2001). Dataplot website at the
+ * National Institute of Standards and Technology:
+ * http://www.itl.nist.gov/div898/software/dataplot.html/
+ * June 2001.
+ */
+
+ #include <iostream>
+ #include <fstream>
+ #include <vector>
+ #include <sstream>
+
+ using namespace std;
+
+ int main() {
+ vector<double> vec1, vec2;
+ double adk_result;
+ static int k, val_st_z_samp1, val_st_z_samp2,
+ val_eq_z_samp1, val_eq_z_samp2,
+ j, n_total, n_sample1, n_sample2, L,
+ max_number_samples, line, maxnumber_z;
+ static int column_1, column_2;
+ static double adk, n_value, z, sum_adk_samp1,
+ sum_adk_samp2, z_aux;
+ static double H_j, F1j, hj, F2j, denom_1_aux, denom_2_aux;
+ static bool next_z_sample2, equal_z_both_samples;
+ static int stop_loop1, stop_loop2, stop_loop3,old_eq_line2,
+ old_eq_line1;
+
+ static double adk_criterium = 1.993;
+
+ /* vec1 and vec2 to be initialized with sample 1 and
+ * sample 2 values in ascending order */
+      /* Read whitespace-separated value pairs until the input ends;
+       * the first pair read provides the dummy entries at index 0. */
+      double f1, f2;
+      while (cin >> f1 >> f2) {
+         vec1.push_back(f1);
+         vec2.push_back(f2);
+ }
+
+ k = 2;
+ n_sample1 = vec1.size() - 1;
+ n_sample2 = vec2.size() - 1;
+
+ // -1 because vec[0] is a dummy value
+ n_total = n_sample1 + n_sample2;
+
+ /* value equal to the line with a value = zj in sample 1.
+ * Here j=1, so the line is 1.
+ */
+ val_eq_z_samp1 = 1;
+
+ /* value equal to the line with a value = zj in sample 2.
+ * Here j=1, so the line is 1.
+ */
+ val_eq_z_samp2 = 1;
+
+ /* value equal to the last line with a value < zj
+ * in sample 1. Here j=1, so the line is 0.
+ */
+ val_st_z_samp1 = 0;
+
+      /* value equal to the last line with a value < zj
+       * in sample 2.  Here j=1, so the line is 0.
+       */
+ val_st_z_samp2 = 0;
+
+ sum_adk_samp1 = 0;
+ sum_adk_samp2 = 0;
+ j = 1;
+
+ // as mentioned above, j=1
+ equal_z_both_samples = false;
+
+ next_z_sample2 = false;
+
+ //assuming the next z to be of sample 1
+ stop_loop1 = n_sample1 + 1;
+
+ // + 1 because vec[0] is a dummy, see n_sample1 declaration
+ stop_loop2 = n_sample2 + 1;
+ stop_loop3 = n_total + 1;
+
+ /* The required z values are calculated until all values
+ * of both samples have been taken into account. See the
+ * lines above for the stoploop values. Construct required
+ * to avoid a mathematical operation in the while condition.
+ */
+ while (((stop_loop1 > val_eq_z_samp1)
+ || (stop_loop2 > val_eq_z_samp2)) && stop_loop3 > j)
+ {
+ if(val_eq_z_samp1 < n_sample1+1)
+ {
+ /* here, a preliminary zj value is set.
+ * See below how to calculate the actual zj.
+ */
+ z = vec1[val_eq_z_samp1];
+
+ /* this while sequence calculates the number of values
+ * equal to z.
+ */
+ while ((val_eq_z_samp1+1 < n_sample1)
+ && z == vec1[val_eq_z_samp1+1] )
+ {
+ val_eq_z_samp1++;
+ }
+ }
+ else
+ {
+ val_eq_z_samp1 = 0;
+ val_st_z_samp1 = n_sample1;
+
+ // this should be val_eq_z_samp1 - 1 = n_sample1
+ }
+
+ if(val_eq_z_samp2 < n_sample2+1)
+ {
+            z_aux = vec2[val_eq_z_samp2];
+
+ /* this while sequence calculates the number of values
+ * equal to z_aux
+ */
+
+ while ((val_eq_z_samp2+1 < n_sample2)
+ && z_aux == vec2[val_eq_z_samp2+1] )
+ {
+ val_eq_z_samp2++;
+ }
+
+ /* the smaller of the two actual data values is picked
+ * as the next zj.
+ */
+
+ if(z > z_aux)
+ {
+ z = z_aux;
+ next_z_sample2 = true;
+ }
+ else
+ {
+ if (z == z_aux)
+ {
+ equal_z_both_samples = true;
+ }
+
+ /* This is the case if the last value of column1 is
+ * smaller than the remaining values of column2.
+ */
+ if (val_eq_z_samp1 == 0)
+ {
+ z = z_aux;
+ next_z_sample2 = true;
+ }
+ }
+ }
+ else
+ {
+ val_eq_z_samp2 = 0;
+ val_st_z_samp2 = n_sample2;
+
+ // this should be val_eq_z_samp2 - 1 = n_sample2
+
+ }
+
+ /* in the following, sum j = 1 to L is calculated for
+ * sample 1 and sample 2.
+ */
+ if (equal_z_both_samples)
+ {
+
+ /* hj is the number of values in the combined sample
+ * equal to zj
+ */
+ hj = val_eq_z_samp1 - val_st_z_samp1
+ + val_eq_z_samp2 - val_st_z_samp2;
+
+ /* H_j is the number of values in the combined sample
+ * smaller than zj plus one half the number of
+ * values in the combined sample equal to zj
+ * (that's hj/2).
+ */
+ H_j = val_st_z_samp1 + val_st_z_samp2
+ + hj / 2;
+
+ /* F1j is the number of values in the 1st sample
+ * that are less than zj plus one half the number
+ * of values in this sample that are equal to zj.
+ */
+
+ F1j = val_st_z_samp1 + (double)
+ (val_eq_z_samp1 - val_st_z_samp1) / 2;
+
+            /* F2j is the number of values in the 2nd sample
+ * that are less than zj plus one half the number
+ * of values in this sample that are equal to zj.
+ */
+ F2j = val_st_z_samp2 + (double)
+ (val_eq_z_samp2 - val_st_z_samp2) / 2;
+
+ /* set the line of values equal to zj to the
+ * actual line of the last value picked for zj.
+ */
+ val_st_z_samp1 = val_eq_z_samp1;
+
+ /* Set the line of values equal to zj to the actual
+ * line of the last value picked for zj of each
+ * sample. This is required as data smaller than zj
+ * is accounted differently than values equal to zj.
+ */
+ val_st_z_samp2 = val_eq_z_samp2;
+
+ /* next the lines of the next values z, i.e., zj+1
+ * are addressed.
+ */
+ val_eq_z_samp1++;
+
+ /* next the lines of the next values z, i.e.,
+ * zj+1 are addressed
+ */
+ val_eq_z_samp2++;
+ }
+ else
+ {
+
+ /* the smaller z value was contained in sample 2;
+ * hence, this value is the zj to base the following
+ * calculations on.
+ */
+ if (next_z_sample2)
+ {
+
+ /* hj is the number of values in the combined
+ * sample equal to zj; in this case, these are
+ * within sample 2 only.
+ */
+ hj = val_eq_z_samp2 - val_st_z_samp2;
+
+ /* H_j is the number of values in the combined sample
+ * smaller than zj plus one half the number of
+ * values in the combined sample equal to zj
+ * (that's hj/2).
+ */
+ H_j = val_st_z_samp1 + val_st_z_samp2
+ + hj / 2;
+
+ /* F1j is the number of values in the 1st sample that
+ * are less than zj plus one half the number of values in
+ * this sample that are equal to zj.
+               * Since zj stems from sample 2 only, no values of
+               * sample 1 equal zj; hence F1j is val_st_z_samp1 only.
+ */
+ F1j = val_st_z_samp1;
+
+               /* F2j is the number of values in the 2nd sample that
+ * are less than zj plus one half the number of values in
+ * this sample that are equal to zj. The latter are from
+ * sample 2 only in this case.
+ */
+
+ F2j = val_st_z_samp2 + (double)
+ (val_eq_z_samp2 - val_st_z_samp2) / 2;
+
+ /* Set the line of values equal to zj to the actual line
+ * of the last value picked for zj of sample 2 only in
+ * this case.
+ */
+ val_st_z_samp2 = val_eq_z_samp2;
+
+ /* next the line of the next value z, i.e., zj+1 is
+ * addressed. Here, only sample 2 must be addressed.
+ */
+
+ val_eq_z_samp2++;
+ if (val_eq_z_samp1 == 0)
+ {
+ val_eq_z_samp1 = stop_loop1;
+ }
+ }
+
+            /* the smaller z value was contained in sample 1;
+ * hence, this value is the zj to base the following
+ * calculations on.
+ */
+
+ else
+ {
+
+ /* hj is the number of values in the combined
+ * sample equal to zj; in this case, these are
+ * within sample 1 only.
+ */
+ hj = val_eq_z_samp1 - val_st_z_samp1;
+
+ /* H_j is the number of values in the combined
+ * sample smaller than zj plus one half the number
+ * of values in the combined sample equal to zj
+ * (that's hj/2).
+ */
+
+ H_j = val_st_z_samp1 + val_st_z_samp2
+ + hj / 2;
+
+               /* F1j is the number of values in the 1st sample that
+                * are less than zj plus one half the number of values
+                * in this sample that are equal to zj.  The latter are
+                * from sample 1 only in this case.
+ */
+
+ F1j = val_st_z_samp1 + (double)
+ (val_eq_z_samp1 - val_st_z_samp1) / 2;
+
+               /* F2j is the number of values in the 2nd sample that
+                * are less than zj plus one half the number of values
+                * in this sample that are equal to zj.  Since zj stems
+                * from sample 1 only, no values of sample 2 equal zj;
+                * hence F2j is val_st_z_samp2 only.
+ */
+
+ F2j = val_st_z_samp2;
+
+ /* Set the line of values equal to zj to the actual line
+ * of the last value picked for zj of sample 1 only in
+ * this case.
+ */
+
+ val_st_z_samp1 = val_eq_z_samp1;
+
+ /* next the line of the next value z, i.e., zj+1 is
+ * addressed. Here, only sample 1 must be addressed.
+ */
+ val_eq_z_samp1++;
+
+ if (val_eq_z_samp2 == 0)
+ {
+ val_eq_z_samp2 = stop_loop2;
+ }
+ }
+ }
+
+ denom_1_aux = n_total * F1j - n_sample1 * H_j;
+ denom_2_aux = n_total * F2j - n_sample2 * H_j;
+
+ sum_adk_samp1 = sum_adk_samp1 + hj
+ * (denom_1_aux * denom_1_aux) /
+ (H_j * (n_total - H_j)
+ - n_total * hj / 4);
+ sum_adk_samp2 = sum_adk_samp2 + hj
+ * (denom_2_aux * denom_2_aux) /
+ (H_j * (n_total - H_j)
+ - n_total * hj / 4);
+
+ next_z_sample2 = false;
+ equal_z_both_samples = false;
+
+         /* index counting the z values.  It is only required to
+          * prevent the while loop from executing endlessly.
+ */
+ j++;
+ }
+
+ // calculating the adk value is the final step.
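+      /* For k = 2, this computes the (midrank) Anderson-Darling
+       * k-sample statistic from the Dataplot reference cited above:
+       *   ADK = (N-1) / (N^2*(k-1))
+       *         * sum over samples i of (1/n_i)
+       *           * sum over distinct z_j of
+       *             h_j*(N*F_ij - n_i*H_j)^2
+       *             / (H_j*(N-H_j) - N*h_j/4)
+       * with N = n_total; the inner sums were accumulated above in
+       * sum_adk_samp1 and sum_adk_samp2.
+       */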
+ adk_result = (double) (n_total - 1) / (n_total
+ * n_total * (k - 1))
+ * (sum_adk_samp1 / n_sample1
+ + sum_adk_samp2 / n_sample2);
+
+
+ /* if(adk_result <= adk_criterium)
+ * adk_2_sample test is passed
+ */
+ return adk_result <= adk_criterium;
+ }
+
+Appendix C. Glossary
+
+ +-------------+-----------------------------------------------------+
+ | ADK | Anderson-Darling K-Sample test, a test used to |
+ | | check whether two samples have the same statistical |
+ | | distribution. |
+ | ECMP | Equal Cost Multipath, a load-balancing mechanism |
+   |             | evaluating MPLS label stacks, IP addresses, and    |
+ | | ports. |
+ | EDF | The "empirical distribution function" of a set of |
+ | | scalar measurements is a function F(x), which for |
+ | | any x gives the fractional proportion of the total |
+ | | measurements that were smaller than or equal to x. |
+ | Metric | A measured quantity related to the performance and |
+ | | reliability of the Internet, expressed by a value. |
+ | | This could be a singleton (single value), a sample |
+ | | of single values, or a statistic based on a sample |
+ | | of singletons. |
+ | OWAMP | One-Way Active Measurement Protocol, a protocol for |
+ | | communication between IPPM measurement systems |
+ | | specified by IPPM. |
+ | OWD | One-Way Delay, a performance metric specified by |
+ | | IPPM. |
+ | Sample | A sample metric is derived from a given singleton |
+ | metric | metric by evaluating a number of distinct instances |
+ | | together. |
+ | Singleton | A singleton metric is, in a sense, one atomic |
+ | metric | measurement of this metric. |
+ | Statistical | A 'statistical' metric is derived from a given |
+ | metric | sample metric by computing some statistic of the |
+ | | values defined by the singleton metric on the |
+ | | sample. |
+ | TWAMP | Two-way Active Measurement Protocol, a protocol for |
+ | | communication between IPPM measurement systems |
+ | | specified by IPPM. |
+ +-------------+-----------------------------------------------------+
+
+Authors' Addresses
+
+ Ruediger Geib (editor)
+ Deutsche Telekom
+ Heinrich Hertz Str. 3-7
+ Darmstadt 64295
+ Germany
+
+ Phone: +49 6151 58 12747
+ EMail: Ruediger.Geib@telekom.de
+
+
+ Al Morton
+ AT&T Labs
+ 200 Laurel Avenue South
+ Middletown, NJ 07748
+ USA
+
+ Phone: +1 732 420 1571
+ Fax: +1 732 368 1192
+ EMail: acmorton@att.com
+ URI: http://home.comcast.net/~acmacm/
+
+
+ Reza Fardid
+ Cariden Technologies
+ 888 Villa Street, Suite 500
+ Mountain View, CA 94041
+ USA
+
+ Phone:
+ EMail: rfardid@cariden.com
+
+
+ Alexander Steinmitz
+ Deutsche Telekom
+ Memmelsdorfer Str. 209b
+ Bamberg 96052
+ Germany
+
+ Phone:
+ EMail: Alexander.Steinmitz@telekom.de
+