Diffstat (limited to 'doc/rfc/rfc2330.txt')
-rw-r--r--  doc/rfc/rfc2330.txt  2243
1 files changed, 2243 insertions, 0 deletions
diff --git a/doc/rfc/rfc2330.txt b/doc/rfc/rfc2330.txt
new file mode 100644
index 0000000..38a7a5f
--- /dev/null
+++ b/doc/rfc/rfc2330.txt
@@ -0,0 +1,2243 @@
+
+
+
+
+
+
+Network Working Group V. Paxson
+Request for Comments: 2330 Lawrence Berkeley National Lab
+Category: Informational G. Almes
+ Advanced Network & Services
+ J. Mahdavi
+ M. Mathis
+ Pittsburgh Supercomputer Center
+ May 1998
+
+
+ Framework for IP Performance Metrics
+
+
+1. Status of this Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+
+2. Copyright Notice
+
+ Copyright (C) The Internet Society (1998). All Rights Reserved.
+
+Table of Contents
+
+ 1. STATUS OF THIS MEMO.............................................1
+ 2. COPYRIGHT NOTICE................................................1
+ 3. INTRODUCTION....................................................2
+ 4. CRITERIA FOR IP PERFORMANCE METRICS.............................3
+ 5. TERMINOLOGY FOR PATHS AND CLOUDS................................4
+ 6. FUNDAMENTAL CONCEPTS............................................5
+ 6.1 Metrics......................................................5
+ 6.2 Measurement Methodology......................................6
+ 6.3 Measurements, Uncertainties, and Errors......................7
+ 7. METRICS AND THE ANALYTICAL FRAMEWORK............................8
+ 8. EMPIRICALLY SPECIFIED METRICS..................................11
+ 9. TWO FORMS OF COMPOSITION.......................................12
+ 9.1 Spatial Composition of Metrics..............................12
+ 9.2 Temporal Composition of Formal Models and Empirical Metrics.13
+ 10. ISSUES RELATED TO TIME........................................14
+ 10.1 Clock Issues...............................................14
+ 10.2 The Notion of "Wire Time"..................................17
+ 11. SINGLETONS, SAMPLES, AND STATISTICS............................19
+ 11.1 Methods of Collecting Samples..............................20
+ 11.1.1 Poisson Sampling........................................21
+ 11.1.2 Geometric Sampling......................................22
+ 11.1.3 Generating Poisson Sampling Intervals...................22
+
+
+
+Paxson, et. al. Informational [Page 1]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ 11.2 Self-Consistency...........................................24
+ 11.3 Defining Statistical Distributions.........................25
+ 11.4 Testing For Goodness-of-Fit................................27
+ 12. AVOIDING STOCHASTIC METRICS....................................28
+ 13. PACKETS OF TYPE P..............................................29
+ 14. INTERNET ADDRESSES VS. HOSTS...................................30
+ 15. STANDARD-FORMED PACKETS........................................30
+ 16. ACKNOWLEDGEMENTS...............................................31
+ 17. SECURITY CONSIDERATIONS........................................31
+ 18. APPENDIX.......................................................32
+ 19. REFERENCES.....................................................38
+ 20. AUTHORS' ADDRESSES.............................................39
+ 21. FULL COPYRIGHT STATEMENT.......................................40
+
+
+3. Introduction
+
+ The purpose of this memo is to define a general framework for
+ particular metrics to be developed by the IETF's IP Performance
+ Metrics effort, begun by the Benchmarking Methodology Working Group
+ (BMWG) of the Operational Requirements Area, and being continued by
+ the IP Performance Metrics Working Group (IPPM) of the Transport
+ Area.
+
+ We begin by laying out several criteria for the metrics that we
+ adopt. These criteria are designed to promote an IPPM effort that
+ will maximize an accurate common understanding by Internet users and
+ Internet providers of the performance and reliability both of end-
+ to-end paths through the Internet and of specific 'IP clouds' that
+ comprise portions of those paths.
+
+ We next define some Internet vocabulary that will allow us to speak
+ clearly about Internet components such as routers, paths, and clouds.
+
+ We then define the fundamental concepts of 'metric' and 'measurement
+ methodology', which allow us to speak clearly about measurement
+ issues. Given these concepts, we proceed to discuss the important
+ issue of measurement uncertainties and errors, and develop a key,
+ somewhat subtle notion of how they relate to the analytical framework
+ shared by many aspects of the Internet engineering discipline. We
+ then introduce the notion of empirically defined metrics, and finish
+ this part of the document with a general discussion of how metrics
+ can be 'composed'.
+
+ The remainder of the document deals with a variety of issues related
+ to defining sound metrics and methodologies: how to deal with
+ imperfect clocks; the notion of 'wire time' as distinct from 'host
+ time'; how to aggregate sets of singleton metrics into samples and
+
+
+
+Paxson, et. al. Informational [Page 2]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ derive sound statistics from those samples; why it is recommended to
+ avoid thinking about Internet properties in probabilistic terms (such
+ as the probability that a packet is dropped), since these terms often
+ include implicit assumptions about how the network behaves; the
+ utility of defining metrics in terms of packets of a generic type;
+ the benefits of preferring IP addresses to DNS host names; and the
+ notion of 'standard-formed' packets. An appendix discusses the
+ Anderson-Darling test for gauging whether a set of values matches a
+ given statistical distribution, and gives C code for an
+ implementation of the test.
+
+ In some sections of the memo, we will surround some commentary text
+ with the brackets {Comment: ... }. We stress that this commentary is
+ only commentary, and is not itself part of the framework document or
+ a proposal of particular metrics. In some cases this commentary will
+ discuss some of the properties of metrics that might be envisioned,
+ but the reader should assume that any such discussion is intended
+ only to shed light on points made in the framework document, and not
+ to suggest any specific metrics.
+
+
+4. Criteria for IP Performance Metrics
+
+ The overarching goal of the IP Performance Metrics effort is to
+ achieve a situation in which users and providers of Internet
+ transport service have an accurate common understanding of the
+ performance and reliability of the Internet component 'clouds' that
+ they use/provide.
+
+ To achieve this, performance and reliability metrics for paths
+ through the Internet must be developed. In several IETF meetings
+ criteria for these metrics have been specified:
+
+ + The metrics must be concrete and well-defined,
+ + A methodology for a metric should have the property that it is
+ repeatable: if the methodology is used multiple times under
+ identical conditions, it should result in the same measurements.
+ + The metrics must exhibit no bias for IP clouds implemented with
+ identical technology,
+ + The metrics must exhibit understood and fair bias for IP clouds
+ implemented with non-identical technology,
+ + The metrics must be useful to users and providers in understanding
+ the performance they experience or provide,
+
+
+
+
+
+
+
+Paxson, et. al. Informational [Page 3]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ + The metrics must avoid inducing artificial performance goals.
+
+
+5. Terminology for Paths and Clouds
+
+ The following list defines terms that need to be precise in the
+ development of path metrics. We begin with low-level notions of
+ 'host', 'router', and 'link', then proceed to define the notions of
+ 'path', 'IP cloud', and 'exchange' that allow us to segment a path
+ into relevant pieces.
+
+ host A computer capable of communicating using the Internet
+ protocols; includes "routers".
+
+ link A single link-level connection between two (or more) hosts;
+ includes leased lines, ethernets, frame relay clouds, etc.
+
+ router A host which facilitates network-level communication between
+ hosts by forwarding IP packets.
+
+ path A sequence of the form < h0, l1, h1, ..., ln, hn >, where n >=
+ 0, each hi is a host, each li is a link between hi-1 and hi,
+ each h1...hn-1 is a router. A pair <li, hi> is termed a 'hop'.
+ In an appropriate operational configuration, the links and
+ routers in the path facilitate network-layer communication of
+ packets from h0 to hn. Note that path is a unidirectional
+ concept.
+
+ subpath
+ Given a path, a subpath is any subsequence of the given path
+ which is itself a path. (Thus, the first and last element of a
+ subpath is a host.)
+
+ cloud An undirected (possibly cyclic) graph whose vertices are routers
+ and whose edges are links that connect pairs of routers.
+ Formally, ethernets, frame relay clouds, and other links that
+ connect more than two routers are modelled as fully-connected
+ meshes of graph edges. Note that to connect to a cloud means to
+ connect to a router of the cloud over a link; this link is not
+ itself part of the cloud.
+
+ exchange
+ A special case of a link, an exchange directly connects either a
+ host to a cloud or one cloud to another cloud.
+
+ cloud subpath
+ A subpath of a given path, all of whose hosts are routers of a
+ given cloud.
+
+
+
+Paxson, et. al. Informational [Page 4]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ path digest
+ A sequence of the form < h0, e1, C1, ..., en, hn >, where n >=
+ 0, h0 and hn are hosts, each e1 ... en is an exchange, and each
+ C1 ... Cn-1 is a cloud subpath.
+
+6. Fundamental Concepts
+
+
+6.1. Metrics
+
+ In the operational Internet, there are several quantities related to
+ the performance and reliability of the Internet that we'd like to
+ know the value of. When such a quantity is carefully specified, we
+ term the quantity a metric. We anticipate that there will be
+ separate RFCs for each metric (or for each closely related group of
+ metrics).
+
+ In some cases, there might be no obvious means to effectively measure
+ the metric; this is allowed, and even understood to be very useful in
+ some cases. It is required, however, that the specification of the
+ metric be as clear as possible about what quantity is being
+ specified. Thus, difficulty in practical measurement is sometimes
+ allowed, but ambiguity in meaning is not.
+
+ Each metric will be defined in terms of standard units of
+ measurement. The international metric system will be used, with the
+ following points specifically noted:
+
+ + When a unit is expressed in simple meters (for distance/length) or
+ seconds (for duration), appropriate related units based on
+ thousands or thousandths of acceptable units are acceptable.
+ Thus, distances expressed in kilometers (km), durations expressed
+ in milliseconds (ms), or microseconds (us) are allowed, but not
+ centimeters (because the prefix is not in terms of thousands or
+ thousandths).
+ + When a unit is expressed in a combination of units, appropriate
+ related units based on thousands or thousandths of acceptable
+ units are acceptable, but all such thousands/thousandths must be
+ grouped at the beginning. Thus, kilometers per second (km/s) is
+ allowed, but meters per millisecond is not.
+ + The unit of information is the bit.
+ + When metric prefixes are used with bits or with combinations
+ including bits, those prefixes will have their metric meaning
+ (related to decimal 1000), and not the meaning conventional with
+ computer storage (related to decimal 1024). In any RFC that
+ defines a metric whose units include bits, this convention will be
+ followed and will be repeated to ensure clarity for the reader.
+
+
+
+
+Paxson, et. al. Informational [Page 5]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ + When a time is given, it will be expressed in UTC.
+
+ Note that these points apply to the specifications for metrics and
+ not, for example, to packet formats where octets will likely be used
+ in preference/addition to bits.
+
+ Finally, we note that some metrics may be defined purely in terms of
+ other metrics; such metrics are called 'derived metrics'.
+
+
+6.2. Measurement Methodology
+
+ For a given set of well-defined metrics, a number of distinct
+ measurement methodologies may exist. A partial list includes:
+
+ + Direct measurement of a performance metric using injected test
+ traffic. Example: measurement of the round-trip delay of an IP
+ packet of a given size over a given route at a given time.
+ + Projection of a metric from lower-level measurements. Example:
+ given accurate measurements of propagation delay and bandwidth for
+ each step along a path, projection of the complete delay for the
+ path for an IP packet of a given size.
+ + Estimation of a constituent metric from a set of more aggregated
+ measurements. Example: given accurate measurements of delay for a
+ given one-hop path for IP packets of different sizes, estimation
+ of propagation delay for the link of that one-hop path.
+ + Estimation of a given metric at one time from a set of related
+ metrics at other times. Example: given an accurate measurement of
+ flow capacity at a past time, together with a set of accurate
+ delay measurements for that past time and the current time, and
+ given a model of flow dynamics, estimate the flow capacity that
+ would be observed at the current time.
+
+ This list is by no means exhaustive. The purpose is to point out the
+ variety of measurement techniques.
+
+ When a given metric is specified, a given measurement approach might
+ be noted and discussed. That approach, however, is not formally part
+ of the specification.
+
+ A methodology for a metric should have the property that it is
+ repeatable: if the methodology is used multiple times under identical
+ conditions, it should result in consistent measurements.
+
+ Backing off a little from the word 'identical' in the previous
+ paragraph, we could more accurately use the word 'continuity' to
+ describe a property of a given methodology: a methodology for a given
+ metric exhibits continuity if, for small variations in conditions, it
+
+
+
+Paxson, et. al. Informational [Page 6]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ results in small variations in the resulting measurements. Slightly
+ more precisely, for every positive epsilon, there exists a positive
+ delta, such that if two sets of conditions are within delta of each
+ other, then the resulting measurements will be within epsilon of each
+ other. At this point, this should be taken as a heuristic driving
+ our intuition about one kind of robustness property rather than as a
+ precise notion.
+
+ A metric that has at least one methodology that exhibits continuity
+ is said itself to exhibit continuity.
+
+ Note that some metrics, such as hop-count along a path, are integer-
+ valued and therefore cannot exhibit continuity in quite the sense
+ given above.
+
+ Note further that, in practice, it may not be practical to know (or
+ be able to quantify) the conditions relevant to a measurement at a
+ given time. For example, since the instantaneous load (in packets to
+ be served) at a given router in a high-speed wide-area network can
+ vary widely over relatively brief periods and will be very hard for
+ an external observer to quantify, various statistics of a given
+ metric may be more repeatable, or may better exhibit continuity. In
+ that case those particular statistics should be specified when the
+ metric is specified.
+
+ Finally, some measurement methodologies may be 'conservative' in the
+ sense that the act of measurement does not modify, or only slightly
+ modifies, the value of the performance metric the methodology
+ attempts to measure. {Comment: for example, in a wide-area high-speed
+ network under modest load, a test using several small 'ping' packets
+ to measure delay would likely not interfere (much) with the delay
+ properties of that network as observed by others. The corresponding
+ statement about tests using a large flow to measure flow capacity
+ would likely fail.}
+
+
+6.3. Measurements, Uncertainties, and Errors
+
+ Even the very best measurement methodologies for the very most well
+ behaved metrics will exhibit errors. Those who develop such
+ measurement methodologies, however, should strive to:
+
+
+
+
+
+
+
+
+
+
+Paxson, et. al. Informational [Page 7]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ + minimize their uncertainties/errors,
+ + understand and document the sources of uncertainty/error, and
+ + quantify the amounts of uncertainty/error.
+
+ For example, when developing a method for measuring delay, understand
+ how any errors in your clocks introduce errors into your delay
+ measurement, and quantify this effect as well as you can. In some
+ cases, this will result in a requirement that a clock be at least up
+ to a certain quality if it is to be used to make a certain
+ measurement.
+
+ As a second example, consider the timing error due to measurement
+ overheads within the computer making the measurement, as opposed to
+ delays due to the Internet component being measured. The former is a
+ measurement error, while the latter reflects the metric of interest.
+ Note that one technique that can help avoid this overhead is the use
+ of a packet filter/sniffer, running on a separate computer that
+ records network packets and timestamps them accurately (see the
+ discussion of 'wire time' below). The resulting trace can then be
+ analyzed to assess the test traffic, minimizing the effect of
+ measurement host delays, or at least allowing those delays to be
+ accounted for. We note that this technique may prove beneficial even
+ if the packet filter/sniffer runs on the same machine, because such
+ measurements generally provide 'kernel-level' timestamping as opposed
+ to less-accurate 'application-level' timestamping.
+
+ Finally, we note that derived metrics (defined above) or metrics that
+ exhibit spatial or temporal composition (defined below) offer
+ particular occasion for the analysis of measurement uncertainties,
+ namely how the uncertainties propagate (conceptually) due to the
+ derivation or composition.
+
+
+7. Metrics and the Analytical Framework
+
+ As the Internet has evolved from the early packet-switching studies
+ of the 1960s, the Internet engineering community has evolved a common
+ analytical framework of concepts. This analytical framework, or A-
+ frame, used by designers and implementers of protocols, by those
+ involved in measurement, and by those who study computer network
+ performance using the tools of simulation and analysis, has great
+ advantage to our work. A major objective here is to generate network
+ characterizations that are consistent in both analytical and
+ practical settings, since this will maximize the chances that non-
+ empirical network study can be better correlated with, and used to
+ further our understanding of, real network behavior.
+
+
+
+
+
+Paxson, et. al. Informational [Page 8]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ Whenever possible, therefore, we would like to develop and leverage
+ off of the A-frame. Thus, whenever a metric to be specified is
+ understood to be closely related to concepts within the A-frame, we
+ will attempt to specify the metric in the A-frame's terms. In such a
+ specification we will develop the A-frame by precisely defining the
+ concepts needed for the metric, then leverage off of the A-frame by
+ defining the metric in terms of those concepts.
+
+ Such a metric will be called an 'analytically specified metric' or,
+ more simply, an analytical metric.
+
+ {Comment: Examples of such analytical metrics might include:
+
+propagation time of a link
+ The time, in seconds, required by a single bit to travel from the
+ output port on one Internet host across a single link to another
+ Internet host.
+
+bandwidth of a link for packets of size k
+ The capacity, in bits/second, where only those bits of the IP
+ packet are counted, for packets of size k bytes.
+
+route The path, as defined in Section 5, from A to B at a given time.
+
+hop count of a route
+ The value 'n' of the route path.
+ }
+
+ Note that we make no a priori list of just what A-frame concepts
+ will emerge in these specifications, but we do encourage their use
+ and urge that they be carefully specified so that, as our set of
+ metrics develops, so will a specified set of A-frame concepts
+ technically consistent with each other and consonant with the
+ common understanding of those concepts within the general Internet
+ community.
+
+ These A-frame concepts will be intended to abstract from actual
+ Internet components in such a way that:
+
+ + the essential function of the component is retained,
+ + properties of the component relevant to the metrics we aim to
+ create are retained,
+ + a subset of these component properties are potentially defined as
+ analytical metrics, and
+
+
+
+
+
+
+
+Paxson, et. al. Informational [Page 9]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ + those properties of actual Internet components not relevant to
+ defining the metrics we aim to create are dropped.
+
+ For example, when considering a router in the context of packet
+ forwarding, we might model the router as a component that receives
+ packets on an input link, queues them on a FIFO packet queue of
+ finite size, employs tail-drop when the packet queue is full, and
+ forwards them on an output link. The transmission speed (in
+ bits/second) of the input and output links, the latency in the router
+ (in seconds), and the maximum size of the packet queue (in bits) are
+ relevant analytical metrics.
+
+ In some cases, such analytical metrics used in relation to a router
+ will be very closely related to specific metrics of the performance
+ of Internet paths. For example, an obvious formula (L + P/B)
+ involving the latency in the router (L), the packet size (in bits)
+ (P), and the transmission speed of the output link (B) might closely
+ approximate the increase in packet delay due to the insertion of a
+ given router along a path.
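+
+ {Comment: as a purely illustrative sketch, the following C fragment
+ evaluates the L + P/B approximation for a hypothetical router with
+ 100 microseconds of latency forwarding a 100-byte (800-bit) packet
+ onto a 1.5 Mbit/s output link. The numbers are invented and the code
+ is not part of any metric definition.
+
+    #include <stdio.h>
+
+    /* Approximate added per-packet delay (in seconds) from inserting
+     * a router: L = router latency (s), P = packet size (bits),
+     * B = output link speed (bits/s). */
+    static double router_delay(double L, double P, double B)
+    {
+        return L + P / B;
+    }
+
+    int main(void)
+    {
+        double d = router_delay(100e-6, 800.0, 1.5e6);
+        printf("approximate added delay: %.6f s\n", d); /* ~0.000633 */
+        return 0;
+    }
+
+ Compiled and run, this prints an added delay of roughly 633
+ microseconds for the example values.}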
+
+ We stress, however, that well-chosen and well-specified A-frame
+ concepts and their analytical metrics will support more general
+ metric creation efforts in less obvious ways.
+
+ {Comment: for example, when considering the flow capacity of a path,
+ it may be of real value to be able to model each of the routers along
+ the path as packet forwarders as above. Techniques for estimating
+ the flow capacity of a path might use the maximum packet queue size
+ as a parameter in decidedly non-obvious ways. For example, as the
+ maximum queue size increases, so will the ability of the router to
+ continuously move traffic along an output link despite fluctuations
+ in traffic from an input link. Estimating this increase, however,
+ remains a research topic.}
+
+ Note that, when we specify A-frame concepts and analytical metrics,
+ we will inevitably make simplifying assumptions. The key role of
+ these concepts is to abstract the properties of the Internet
+ components relevant to given metrics. Judgement is required to avoid
+ making assumptions that bias the modeling and metric effort toward
+ one kind of design.
+
+ {Comment: for example, routers might not use tail-drop, even though
+ tail-drop might be easier to model analytically.}
+
+ Finally, note that different elements of the A-frame might well make
+ different simplifying assumptions. For example, the abstraction of a
+ router used to further the definition of path delay might treat the
+ router's packet queue as a single FIFO queue, but the abstraction of
+
+
+
+Paxson, et. al. Informational [Page 10]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ a router used to further the definition of the handling of an RSVP-
+ enabled packet might treat the router's packet queue as supporting
+ bounded delay -- a contradictory assumption. This is not to say that
+ we make contradictory assumptions at the same time, but that two
+ different parts of our work might refine the simpler base concept in
+ two divergent ways for different purposes.
+
+ {Comment: in more mathematical terms, we would say that the A-frame
+ taken as a whole need not be consistent; but the set of particular
+ A-frame elements used to define a particular metric must be.}
+
+
+8. Empirically Specified Metrics
+
+ There are useful performance and reliability metrics that do not fit
+ so neatly into the A-frame, usually because the A-frame lacks the
+ detail or power for dealing with them. For example, "the best flow
+ capacity achievable along a path using an RFC-2001-compliant TCP"
+ would be good to be able to measure, but we have no analytical
+ framework of sufficient richness to allow us to cast that flow
+ capacity as an analytical metric.
+
+ These notions can still be well specified by instead describing a
+ reference methodology for measuring them.
+
+ Such a metric will be called an 'empirically specified metric', or
+ more simply, an empirical metric.
+
+ Such empirical metrics should have three properties:
+
+ + we should have a clear definition for each in terms of Internet
+ components,
+ + we should have at least one effective means to measure them, and
+ + to the extent possible, we should have an (necessarily incomplete)
+ understanding of the metric in terms of the A-frame so that we can
+ use our measurements to reason about the performance and
+ reliability of A-frame components and of aggregations of A-frame
+ components.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Paxson, et. al. Informational [Page 11]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+9. Two Forms of Composition
+
+
+9.1. Spatial Composition of Metrics
+
+ In some cases, it may be realistic and useful to define metrics in
+ such a fashion that they exhibit spatial composition.
+
+ By spatial composition, we mean a characteristic of some path
+ metrics, in which the metric as applied to a (complete) path can also
+ be defined for various subpaths, and in which the appropriate A-frame
+ concepts for the metric suggest useful relationships between the
+ metric applied to these various subpaths (including the complete
+ path, the various cloud subpaths of a given path digest, and even
+ single routers along the path). The effectiveness of spatial
+ composition depends:
+
+ + on the usefulness in analysis of these relationships as applied to
+ the relevant A-frame components, and
+ + on the practical use of the corresponding relationships as applied
+ to metrics and to measurement methodologies.
+
+ {Comment: for example, consider some metric for delay of a 100-byte
+ packet across a path P, and consider further a path digest <h0, e1,
+ C1, ..., en, hn> of P. The definition of such a metric might include
+ a conjecture that the delay across P is very nearly the sum of the
+ corresponding metric across the exchanges (ei) and clouds (Ci) of the
+ given path digest. The definition would further include a note on
+ how a corresponding relation applies to relevant A-frame components,
+ both for the path P and for the exchanges and clouds of the path
+ digest.}
+
+ When the definition of a metric includes a conjecture that the metric
+ across the path is related to the metric across the subpaths of the
+ path, that conjecture constitutes a claim that the metric exhibits
+ spatial composition. The definition should then include:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Paxson, et. al. Informational [Page 12]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ + the specific conjecture applied to the metric,
+ + a justification of the practical utility of the composition in
+ terms of making accurate measurements of the metric on the path,
+ + a justification of the usefulness of the composition in terms of
+ making analysis of the path using A-frame concepts more effective,
+ and
+ + an analysis of how the conjecture could be incorrect.
+
+
+9.2. Temporal Composition of Formal Models and Empirical Metrics
+
+ In some cases, it may be realistic and useful to define metrics in
+ such a fashion that they exhibit temporal composition.
+
+ By temporal composition, we mean a characteristic of some path
+ metric, in which the metric as applied to a path at a given time T is
+ also defined for various times t0 < t1 < ... < tn < T, and in which
+ the appropriate A-frame concepts for the metric suggest useful
+ relationships between the metric applied at times t0, ..., tn and the
+ metric applied at time T. The effectiveness of temporal composition
+ depends:
+
+ + on the usefulness in analysis of these relationships as applied to
+ the relevant A-frame components, and
+ + on the practical use of the corresponding relationships as applied
+ to metrics and to measurement methodologies.
+
+ {Comment: for example, consider a metric for the expected flow
+ capacity across a path P during the five-minute period surrounding
+ the time T, and suppose further that we have the corresponding values
+ for each of the four previous five-minute periods t0, t1, t2, and t3.
+ The definition of such a metric might include a conjecture that the
+ flow capacity at time T can be estimated from a certain kind of
+ extrapolation from the values of t0, ..., t3. The definition would
+ further include a note on how a corresponding relation applies to
+ relevant A-frame components.
+
+ Note: any (spatial or temporal) compositions involving flow capacity
+ are likely to be subtle, and temporal compositions are generally more
+ subtle than spatial compositions, so the reader should understand
+ that the foregoing example is intentionally naive.}
+
+ When the definition of a metric includes a conjecture that the metric
+ across the path at a given time T is related to the metric across the
+ path for a set of other times, that conjecture constitutes a claim
+ that the metric exhibits temporal composition. The definition should
+ then include:
+
+
+
+
+Paxson, et. al. Informational [Page 13]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ + the specific conjecture applied to the metric,
+ + a justification of the practical utility of the composition in
+ terms of making accurate measurements of the metric on the path,
+ and
+ + a justification of the usefulness of the composition in terms of
+ making analysis of the path using A-frame concepts more effective.
+
+
+10. Issues related to Time
+
+
+10.1. Clock Issues
+
+ Measurements of time lie at the heart of many Internet metrics.
+ Because of this, it will often be crucial when designing a
+ methodology for measuring a metric to understand the different types
+ of errors and uncertainties introduced by imperfect clocks. In this
+ section we define terminology for discussing the characteristics of
+ clocks and touch upon related measurement issues which need to be
+ addressed by any sound methodology.
+
+ The Network Time Protocol (NTP; RFC 1305) defines a nomenclature for
+ discussing clock characteristics, which we will also use when
+ appropriate [Mi92]. The main goal of NTP is to provide accurate
+ timekeeping over fairly long time scales, such as minutes to days,
+ while for measurement purposes often what is more important is
+ short-term accuracy, between the beginning of the measurement and the
+ end, or over the course of gathering a body of measurements (a
+ sample). This difference in goals sometimes leads to different
+ definitions of terminology as well, as discussed below.
+
+ To begin, we define a clock's "offset" at a particular moment as the
+ difference between the time reported by the clock and the "true" time
+ as defined by UTC. If the clock reports a time Tc and the true time
+ is Tt, then the clock's offset is Tc - Tt.
+
+ We will refer to a clock as "accurate" at a particular moment if the
+ clock's offset is zero, and more generally a clock's "accuracy" is
+ how close the absolute value of the offset is to zero. For NTP,
+ accuracy also includes a notion of the frequency of the clock; for
+ our purposes, we instead incorporate this notion into that of "skew",
+ because we define accuracy in terms of a single moment in time rather
+ than over an interval of time.
+
+ A clock's "skew" at a particular moment is the frequency difference
+ (first derivative of its offset with respect to true time) between
+ the clock and true time.
+
+
+
+
+Paxson, et. al. Informational [Page 14]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ As noted in RFC 1305, real clocks exhibit some variation in skew.
+ That is, the second derivative of the clock's offset with respect to
+ true time is generally non-zero. In keeping with RFC 1305, we define
+ this quantity as the clock's "drift".
+
+ A clock's "resolution" is the smallest unit by which the clock's time
+ is updated. It gives a lower bound on the clock's uncertainty.
+ (Note that clocks can have very fine resolutions and yet be wildly
+ inaccurate.) Resolution is defined in terms of seconds. However,
+ resolution is relative to the clock's reported time and not to true
+ time, so for example a resolution of 10 ms only means that the clock
+ updates its notion of time in 0.01 second increments, not that this
+ is the true amount of time between updates.
+
+ {Comment: Systems differ on how an application interface to the clock
+ reports the time on subsequent calls during which the clock has not
+ advanced. Some systems simply return the same unchanged time as
+ given for previous calls. Others may add a small increment to the
+ reported time to maintain monotone-increasing timestamps. For
+ systems that do the latter, we do *not* consider these small
+ increments when defining the clock's resolution. They are instead an
+ impediment to assessing the clock's resolution, since a natural
+ method for doing so is to repeatedly query the clock to determine the
+ smallest non-zero difference in reported times.}
+
+ It is expected that a clock's resolution changes only rarely (for
+ example, due to a hardware upgrade).
+
+ There are a number of interesting metrics for which some natural
+ measurement methodologies involve comparing times reported by two
+ different clocks. An example is one-way packet delay [AK97]. Here,
+ the time required for a packet to travel through the network is
+ measured by comparing the time reported by a clock at one end of the
+ packet's path, corresponding to when the packet first entered the
+ network, with the time reported by a clock at the other end of the
+ path, corresponding to when the packet finished traversing the
+ network.
+
+ We are thus also interested in terminology for describing how two
+ clocks C1 and C2 compare. To do so, we introduce terms related to
+ those above in which the notion of "true time" is replaced by the
+ time as reported by clock C1. For example, clock C2's offset
+ relative to C1 at a particular moment is Tc2 - Tc1, the instantaneous
+ difference in time reported by C2 and C1. To disambiguate between
+ the use of the terms to compare two clocks versus the use of the
+ terms to compare to true time, we will in the former case use the
+ phrase "relative". So the offset defined earlier in this paragraph
+ is the "relative offset" between C2 and C1.
+
+
+
+Paxson, et. al. Informational [Page 15]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ When comparing clocks, the analog of "resolution" is not "relative
+ resolution", but instead "joint resolution", which is the sum of the
+ resolutions of C1 and C2. The joint resolution then indicates a
+ conservative lower bound on the accuracy of any time intervals
+ computed by subtracting timestamps generated by one clock from those
+ generated by the other.
+
+ If two clocks are "accurate" with respect to one another (their
+ relative offset is zero), we will refer to the pair of clocks as
+ "synchronized". Note that clocks can be highly synchronized yet
+ arbitrarily inaccurate in terms of how well they tell true time.
+ This point is important because for many Internet measurements,
+ synchronization between two clocks is more important than the
+ accuracy of the clocks. The same is somewhat true of skew, too: as long
+ as the absolute skew is not too great, then minimal relative skew is
+ more important, as it can induce systematic trends in packet transit
+ times measured by comparing timestamps produced by the two clocks.
+
+ These distinctions arise because for Internet measurement what is
+ often most important are differences in time as computed by comparing
+ the output of two clocks. The process of computing the difference
+ removes any error due to clock inaccuracies with respect to true
+ time; but it is crucial that the differences themselves accurately
+ reflect differences in true time.
+
+ Measurement methodologies will often begin with the step of assuring
+ that two clocks are synchronized and have minimal skew and drift.
+ {Comment: An effective way to assure these conditions (and also clock
+ accuracy) is by using clocks that derive their notion of time from an
+ external source, rather than only the host computer's clock. (These
+ latter are often subject to large errors.) It is further preferable
+ that the clocks directly derive their time, for example by having
+ immediate access to a GPS (Global Positioning System) unit.}
+
+ Two important concerns arise if the clocks indirectly derive their
+ time using a network time synchronization protocol such as NTP:
+
+ + First, NTP's accuracy depends in part on the properties
+ (particularly delay) of the Internet paths used by the NTP peers,
+ and these might be exactly the properties that we wish to measure,
+ so it would be unsound to use NTP to calibrate such measurements.
+ + Second, NTP focuses on clock accuracy, which can come at the
+ expense of short-term clock skew and drift. For example, when a
+ host's clock is indirectly synchronized to a time source, if the
+ synchronization intervals occur infrequently, then the host will
+ sometimes be faced with the problem of how to adjust its current,
+ incorrect time, Ti, with a considerably different, more accurate
+ time it has just learned, Ta. Two general ways in which this is
+
+
+
+Paxson, et. al. Informational [Page 16]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ done are to either immediately set the current time to Ta, or to
+ adjust the local clock's update frequency (hence, its skew) so
+ that at some point in the future the local time Ti' will agree
+ with the more accurate time Ta'. The first mechanism introduces
+ discontinuities and can also violate common assumptions that
+ timestamps are monotone increasing. If the host's clock is set
+ backward in time, sometimes this can be easily detected. If the
+ clock is set forward in time, this can be harder to detect. The
+ skew induced by the second mechanism can lead to considerable
+ inaccuracies when computing differences in time, as discussed
+ above.
+
+ To illustrate why skew is a crucial concern, consider samples of
+ one-way delays between two Internet hosts made at one minute
+ intervals. The true transmission delay between the hosts might
+ plausibly be on the order of 50 ms for a transcontinental path. If
+ the skew between the two clocks is 0.01%, that is, 1 part in 10,000,
+ then after 10 minutes of observation the error introduced into the
+ measurement is 60 ms. Unless corrected, this error is enough to
+ completely wipe out any accuracy in the transmission delay
+ measurement. Finally, we note that assessing skew errors between
+ unsynchronized network clocks is an open research area. (See [Pa97]
+ for a discussion of detecting and compensating for these sorts of
+ errors.) This shortcoming makes use of a solid, independent clock
+ source such as GPS especially desirable.
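+
+ {Comment: a minimal C sketch of the arithmetic in the example above,
+ with the skew and observation period as stated there.
+
+    #include <stdio.h>
+
+    int main(void)
+    {
+        double skew     = 1e-4;    /* 0.01%, i.e. 1 part in 10,000 */
+        double observed = 600.0;   /* 10 minutes, in seconds       */
+
+        /* Error accumulated between the two clocks, assuming the
+         * relative skew stays constant over the observation period. */
+        double error = skew * observed;
+        printf("accumulated error: %.0f ms\n", error * 1000.0);
+        return 0;
+    }
+
+ This prints 60 ms, matching the figure in the text.}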
+
+
+10.2. The Notion of "Wire Time"
+
+ Internet measurement is often complicated by the use of Internet
+ hosts themselves to perform the measurement. These hosts can
+ introduce delays, bottlenecks, and the like that are due to hardware
+ or operating system effects and have nothing to do with the network
+ behavior we would like to measure. This problem is particularly
+ acute when timestamping of network events occurs at the application
+ level.
+
+ In order to provide a general way of talking about these effects, we
+ introduce two notions of "wire time". These notions are only defined
+ in terms of an Internet host H observing an Internet link L at a
+ particular location:
+
+ + For a given packet P, the 'wire arrival time' of P at H on L is
+ the first time T at which any bit of P has appeared at H's
+ observational position on L.
+
+
+
+
+
+
+Paxson, et. al. Informational [Page 17]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ + For a given packet P, the 'wire exit time' of P at H on L is the
+ first time T at which all the bits of P have appeared at H's
+ observational position on L.
+
+ Note that intrinsic to the definition is the notion of where on the
+ link we are observing. This distinction is important because for
+ large-latency links, we may obtain very different times depending on
+ exactly where we are observing the link. We could allow the
+ observational position to be an arbitrary location along the link;
+ however, we define it to be in terms of an Internet host because we
+ anticipate in practice that, for IPPM metrics, all such timing will
+ be constrained to be performed by Internet hosts, rather than
+ specialized hardware devices that might be able to monitor a link at
+ locations where a host cannot. This definition also takes care of
+ the problem of links that are comprised of multiple physical
+ channels. Because these multiple channels are not visible at the IP
+ layer, they cannot be individually observed in terms of the above
+ definitions.
+
+ It is possible, though one hopes uncommon, that a packet P might make
+ multiple trips over a particular link L, due to a forwarding loop.
+ These trips might even overlap, depending on the link technology.
+ Whenever this occurs, we define a separate wire time associated with
+ each instance of P seen at H's position on the link. This definition
+ is worth making because it serves as a reminder that notions like
+ *the* unique time a packet passes a point in the Internet are
+ inherently slippery.
+
+ The term wire time has historically been used to loosely denote the
+ time at which a packet appeared on a link, without exactly specifying
+ whether this refers to the first bit, the last bit, or some other
+ consideration. This informal definition is generally already very
+ useful, as it is usually used to make a distinction between when the
+ packet's propagation delays begin and cease to be due to the network
+ rather than the endpoint hosts.
+
+ When appropriate, metrics should be defined in terms of wire times
+ rather than host endpoint times, so that the metric's definition
+ highlights the issue of separating delays due to the host from those
+ due to the network.
+
+ We note that one potential difficulty when dealing with wire times
+ concerns IP fragments. It may be the case that, due to
+ fragmentation, only a portion of a particular packet passes by H's
+ location. Such fragments are themselves legitimate packets and have
+ well-defined wire times associated with them; but the larger IP
+ packet corresponding to their aggregate may not.
+
+
+
+
+Paxson, et. al. Informational [Page 18]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ We also note that these notions have not, to our knowledge, been
+ previously defined in exact terms for Internet traffic.
+ Consequently, we may find with experience that these definitions
+ require some adjustment in the future.
+
+ {Comment: It can sometimes be difficult to measure wire times. One
+ technique is to use a packet filter to monitor traffic on a link.
+ The architecture of these filters often attempts to associate with
+ each packet a timestamp as close to the wire time as possible. We
+ note however that one common source of error is to run the packet
+ filter on one of the endpoint hosts. In this case, it has been
+ observed that some packet filters receive for some packets timestamps
+ corresponding to when the packet was *scheduled* to be injected into
+ the network, rather than when it actually was *sent* out onto the
+ network (wire time). There can be a substantial difference between
+ these two times. A technique for dealing with this problem is to run
+ the packet filter on a separate host that passively monitors the
+ given link. This can be problematic however for some link
+ technologies. See [Pa97] for a discussion of the sorts of errors
+ packet filters can exhibit. Finally, we note that packet filters
+ will often only capture the first fragment of a fragmented IP packet,
+ due to the use of filtering on fields in the IP and transport
+ protocol headers. As we generally desire our measurement
+ methodologies to avoid the complexity of creating fragmented traffic,
+ one strategy for dealing with their presence as detected by a packet
+ filter is to flag that the measured traffic has an unusual form and
+ abandon further analysis of the packet timing.}
+
+
+11. Singletons, Samples, and Statistics
+
+ With experience we have found it useful to introduce a separation
+ between three distinct -- yet related -- notions:
+
+ + By a 'singleton' metric, we refer to metrics that are, in a sense,
+ atomic. For example, a single instance of "bulk throughput
+ capacity" from one host to another might be defined as a singleton
+ metric, even though the instance involves measuring the timing of
+ a number of Internet packets.
+ + By a 'sample' metric, we refer to metrics derived from a given
+ singleton metric by taking a number of distinct instances
+ together. For example, we might define a sample metric of one-way
+ delays from one host to another as an hour's worth of
+ measurements, each made at Poisson intervals with a mean spacing
+ of one second.
+
+
+
+
+
+
+Paxson, et. al. Informational [Page 19]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ + By a 'statistical' metric, we refer to metrics derived from a
+ given sample metric by computing some statistic of the values
+ defined by the singleton metric on the sample. For example, the
+ mean of all the one-way delay values on the sample given above
+ might be defined as a statistical metric.
+
+ By applying these notions of singleton, sample, and statistic in a
+ consistent way, we will be able to reuse lessons learned about how to
+ define samples and statistics on various metrics. The orthogonality
+ among these three notions will thus make all our work more effective
+ and more intelligible by the community.
+
+ In the remainder of this section, we will cover some topics in
+ sampling and statistics that we believe will be important to a
+ variety of metric definitions and measurement efforts.
+
+
+11.1. Methods of Collecting Samples
+
+ The main reason for collecting samples is to see what sort of
+ variations and consistencies are present in the metric being
+ measured. These variations might be with respect to different points
+ in the Internet, or different measurement times. When assessing
+ variations based on a sample, one generally makes an assumption that
+ the sample is "unbiased", meaning that the process of collecting the
+ measurements in the sample did not skew the sample so that it no
+ longer accurately reflects the metric's variations and consistencies.
+
+ One common way of collecting samples is to make measurements
+ separated by fixed amounts of time: periodic sampling. Periodic
+ sampling is particularly attractive because of its simplicity, but it
+ suffers from two potential problems:
+
+ + If the metric being measured itself exhibits periodic behavior,
+ then there is a possibility that the sampling will observe only
+ part of the periodic behavior if the periods happen to agree
+ (either directly, or if one is a multiple of the other). Related
+ to this problem is the notion that periodic sampling can be easily
+ anticipated. Predictable sampling is susceptible to manipulation
+ if there are mechanisms by which a network component's behavior
+ can be temporarily changed such that the sampling only sees the
+ modified behavior.
+ + The act of measurement can perturb what is being measured (for
+ example, injecting measurement traffic into a network alters the
+ congestion level of the network), and repeated periodic
+ perturbations can drive a network into a state of synchronization
+ (cf. [FJ94]), greatly magnifying what might individually be minor
+ effects.
+
+
+
+Paxson, et. al. Informational [Page 20]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ A more sound approach is based on "random additive sampling": samples
+ are separated by independent, randomly generated intervals that have
+ a common statistical distribution G(t) [BM92]. The quality of this
+ sampling depends on the distribution G(t). For example, if G(t)
+ generates a constant value g with probability one, then the sampling
+ reduces to periodic sampling with a period of g.
+
+ Random additive sampling gains significant advantages. In general,
+ it avoids synchronization effects and yields an unbiased estimate of
+ the property being sampled. The only significant drawbacks with it
+ are:
+
+ + it complicates frequency-domain analysis, because the samples do
+ not occur at fixed intervals such as assumed by Fourier-transform
+ techniques; and
+ + unless G(t) is the exponential distribution (see below), sampling
+ still remains somewhat predictable, as discussed for periodic
+ sampling above.
+
+
+11.1.1. Poisson Sampling
+
+ It can be proved that if G(t) is an exponential distribution with
+ rate lambda, that is
+
+ G(t) = 1 - exp(-lambda * t)
+
+ then the arrival of new samples *cannot* be predicted (and, again,
+ the sampling is unbiased). Furthermore, the sampling is
+ asymptotically unbiased even if the act of sampling affects the
+ network's state. Such sampling is referred to as "Poisson sampling".
+ It is not prone to inducing synchronization, it can be used to
+ accurately collect measurements of periodic behavior, and it is not
+ prone to manipulation by anticipating when new samples will occur.
+
+ Because of these valuable properties, we in general prefer that
+ samples of Internet measurements are gathered using Poisson sampling.
+ {Comment: We note, however, that there may be circumstances that
+ favor use of a different G(t). For example, the exponential
+ distribution is unbounded, so its use will on occasion generate
+ lengthy spaces between sampling times. We might instead desire to
+ bound the longest such interval to a maximum value dT, to speed the
+ convergence of the estimation derived from the sampling. This could
+ be done by using
+
+ G(t) = Unif(0, dT)
+
+
+
+
+
+Paxson, et. al. Informational [Page 21]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ that is, the uniform distribution between 0 and dT. This sampling,
+ of course, becomes highly predictable if an interval of nearly length
+ dT has elapsed without a sample occurring.}
+
+ In its purest form, Poisson sampling is done by generating
+ independent, exponentially distributed intervals and gathering a
+ single measurement after each interval has elapsed. It can be shown
+ that if starting at time T one performs Poisson sampling over an
+ interval dT, during which a total of N measurements happen to be
+ made, then those measurements will be uniformly distributed over the
+ interval [T, T+dT]. So another way of conducting Poisson sampling is
+ to pick dT and N and generate N random sampling times uniformly over
+ the interval [T, T+dT]. The two approaches are equivalent, except if
+ N and dT are externally known. In that case, the property of not
+ being able to predict measurement times is weakened (the other
+ properties still hold). The N/dT approach has an advantage that
+ dealing with fixed values of N and dT can be simpler than dealing
+ with a fixed lambda but variable numbers of measurements over
+ variably-sized intervals.
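+
+ {Comment: a small C sketch of the N/dT variant, assuming the POSIX
+ drand48() generator supplies uniform variates; N, dT, and T are
+ invented example values.
+
+    #include <stdio.h>
+    #include <stdlib.h>
+
+    static int cmp(const void *a, const void *b)
+    {
+        double x = *(const double *)a, y = *(const double *)b;
+        return (x > y) - (x < y);
+    }
+
+    int main(void)
+    {
+        double T = 0.0, dT = 300.0, times[10];
+        int n = 10, i;
+
+        /* Draw N sampling times uniformly over [T, T+dT], then sort
+         * them into a measurement schedule. */
+        for (i = 0; i < n; i++)
+            times[i] = T + dT * drand48();
+        qsort(times, n, sizeof(times[0]), cmp);
+        for (i = 0; i < n; i++)
+            printf("sample %d at t = %.1f s\n", i + 1, times[i]);
+        return 0;
+    }
+
+ A real measurement tool would seed the generator and anchor T to the
+ clock actually used for scheduling.}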
+
+
+11.1.2. Geometric Sampling
+
+ Closely related to Poisson sampling is "geometric sampling", in which
+ external events are measured with a fixed probability p. For
+ example, one might capture all the packets over a link but only
+ record the packet to a trace file if a randomly generated number
+ uniformly distributed between 0 and 1 is less than a given p.
+ Geometric sampling has the same properties of being unbiased and not
+ predictable in advance as Poisson sampling, so if it fits a
+ particular Internet measurement task, it too is sound. See [CPB93]
+ for more discussion.
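+
+ {Comment: a minimal C sketch of geometric sampling, assuming the
+ POSIX drand48() generator; the probability p and the event loop are
+ invented for illustration.
+
+    #include <stdio.h>
+    #include <stdlib.h>
+
+    /* Record an observed event (e.g., a captured packet) with fixed
+     * probability p. */
+    static int record_event(double p)
+    {
+        return drand48() < p;
+    }
+
+    int main(void)
+    {
+        double p = 0.01;            /* keep roughly 1 in 100 events */
+        int i, kept = 0;
+
+        for (i = 0; i < 100000; i++)
+            if (record_event(p))
+                kept++;
+        printf("kept %d of 100000 events\n", kept);
+        return 0;
+    }
+
+ Over many events, the fraction of events kept converges to p.}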
+
+
+11.1.3. Generating Poisson Sampling Intervals
+
+ To generate Poisson sampling intervals, one first determines the rate
+ lambda at which the singleton measurements will on average be made
+ (e.g., for an average sampling interval of 30 seconds, we have lambda
+ = 1/30, if the units of time are seconds). One then generates a
+ series of exponentially-distributed (pseudo) random numbers E1, E2,
+ ..., En. The first measurement is made at time E1, the next at time
+ E1+E2, and so on.
+
+ One technique for generating exponentially-distributed (pseudo)
+ random numbers is based on the ability to generate U1, U2, ..., Un,
+ (pseudo) random numbers that are uniformly distributed between 0 and
+ 1. Many computers provide libraries that can do this. Given such
+
+
+
+Paxson, et. al. Informational [Page 22]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ Ui, to generate Ei one uses:
+
+ Ei = -log(Ui) / lambda
+
+ where log(Ui) is the natural logarithm of Ui. {Comment: This
+ technique is an instance of the more general "inverse transform"
+ method for generating random numbers with a given distribution.}
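+
+ {Comment: a minimal C sketch of this inverse-transform technique,
+ assuming the POSIX drand48() generator supplies the uniform Ui; the
+ lambda value is the 30-second example from the text.
+
+    #include <math.h>
+    #include <stdio.h>
+    #include <stdlib.h>
+
+    /* One exponentially distributed interval with mean 1/lambda,
+     * via Ei = -log(Ui) / lambda. */
+    static double exp_interval(double lambda)
+    {
+        double u;
+        do {
+            u = drand48();
+        } while (u == 0.0);         /* avoid log(0) */
+        return -log(u) / lambda;
+    }
+
+    int main(void)
+    {
+        double lambda = 1.0 / 30.0; /* mean spacing of 30 seconds */
+        double t = 0.0;
+        int i;
+
+        for (i = 0; i < 5; i++) {
+            t += exp_interval(lambda);
+            printf("measurement %d at t = %.1f s\n", i + 1, t);
+        }
+        return 0;
+    }
+
+ Compile with the math library (-lm). The loop prints the first few
+ scheduled measurement times E1, E1+E2, and so on.}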
+
+ Implementation details:
+
+ There are at least three different methods for approximating Poisson
+ sampling, which we describe here as Methods 1 through 3. Method 1 is
+ the easiest to implement and has the most error, and Method 3 is the
+ most difficult to implement and has the least error (potentially
+ none).
+
+ Method 1 is to proceed as follows:
+
+ 1. Generate E1 and wait that long.
+ 2. Perform a measurement.
+ 3. Generate E2 and wait that long.
+ 4. Perform a measurement.
+ 5. Generate E3 and wait that long.
+ 6. Perform a measurement ...
+
+ The problem with this approach is that the "Perform a measurement"
+ steps themselves take time, so the sampling is not done at times E1,
+ E1+E2, etc., but rather at E1, E1+M1+E2, etc., where Mi is the amount
+ of time required for the i'th measurement. If Mi is very small
+ compared to 1/lambda then the potential error introduced by this
+ technique is likewise small. As Mi becomes a non-negligible fraction
+ of 1/lambda, the potential error increases.
+
+ Method 2 attempts to correct this error by taking into account the
+ amount of time required by the measurements (i.e., the Mi's) and
+ adjusting the waiting intervals accordingly:
+
+ 1. Generate E1 and wait that long.
+ 2. Perform a measurement and measure M1, the time it took to do so.
+ 3. Generate E2 and wait for a time E2-M1.
+ 4. Perform a measurement and measure M2 ...
+
+ This approach works fine as long as E{i+1} >= Mi. But if E{i+1} < Mi
+ then it is impossible to wait the proper amount of time. (Note that
+ this case corresponds to needing to perform two measurements
+ simultaneously.)
+
+
+
+
+
+Paxson, et. al. Informational [Page 23]
+
+RFC 2330 Framework for IP Performance Metrics May 1998
+
+
+ Method 3 is to generate a schedule of measurement times E1, E1+E2,
+ etc., and then stick to it:
+
+ 1. Generate E1, E2, ..., En.
+ 2. Compute measurement times T1, T2, ..., Tn, as Ti = E1 + ... + Ei.
+ 3. Arrange that at times T1, T2, ..., Tn, a measurement is made.
+
+ By allowing simultaneous measurements, Method 3 avoids the
+ shortcomings of Methods 1 and 2. If, however, simultaneous
+ measurements interfere with one another, then Method 3 does not gain
+ any benefit and may actually prove worse than Methods 1 or 2.
+
+ For Internet phenomena, it is not known to what degree the
+ inaccuracies of these methods are significant. If the Mi's are much
+ less than 1/lambda, then any of the three should suffice. If the
+ Mi's are less than 1/lambda but perhaps not greatly less, then Method
+ 2 is preferred to Method 1. If simultaneous measurements do not
+ interfere with one another, then Method 3 is preferred, though it can
+ be considerably harder to implement.
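+
+   {Comment: For illustration only, the following sketch approximates
+   Method 3 using the POSIX clock_gettime()/clock_nanosleep() interface
+   with absolute deadlines; perform_measurement() is a hypothetical
+   routine, and exponential_interval() is the sketch given earlier.
+   Because this sketch runs the measurements sequentially, truly
+   simultaneous measurements would additionally require dispatching
+   each one asynchronously, for example in its own thread or process.}
+
+      #include <time.h>
+
+      extern double exponential_interval(double lambda);
+      extern void perform_measurement(void);   /* hypothetical */
+
+      /* Sketch of Method 3: each absolute deadline T1, T2, ... is
+       * computed before waiting for it, so the time consumed by the
+       * measurements themselves does not shift later sample times.
+       */
+      void
+      poisson_measurements(double lambda, int n)
+      {
+          int i;
+          double e;
+          struct timespec deadline;
+
+          clock_gettime(CLOCK_REALTIME, &deadline);
+
+          for (i = 0; i < n; ++i) {
+              e = exponential_interval(lambda);
+              deadline.tv_sec += (time_t) e;
+              deadline.tv_nsec += (long) ((e - (time_t) e) * 1e9);
+              if (deadline.tv_nsec >= 1000000000L) {
+                  deadline.tv_sec += 1;
+                  deadline.tv_nsec -= 1000000000L;
+              }
+              clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
+                              &deadline, NULL);
+              perform_measurement();
+          }
+      }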
+
+
+11.2. Self-Consistency
+
+ A fundamental requirement for a sound measurement methodology is that
+ measurement be made using as few unconfirmed assumptions as possible.
+ Experience has painfully shown how easy it is to make an (often
+ implicit) assumption that turns out to be incorrect. An example is
+ incorporating into a measurement the reading of a clock synchronized
+ to a highly accurate source. It is easy to assume that the clock is
+ therefore accurate; but due to software bugs, a loss of power in the
+ source, or a loss of communication between the source and the clock,
+ the clock could actually be quite inaccurate.
+
+ This is not to argue that one must not make *any* assumptions when
+ measuring, but rather that, to the extent which is practical,
+ assumptions should be tested. One powerful way for doing so involves
+ checking for self-consistency. Such checking applies both to the
+ observed value(s) of the measurement *and the values used by the
+ measurement process itself*. A simple example of the former is that
+ when computing a round trip time, one should check to see if it is
+   negative.  Since negative time intervals are non-physical, a negative
+   value immediately flags an error.  *These sorts of errors should then
+   be investigated!*  It is crucial to determine where
+ the error lies, because only by doing so diligently can we build up
+ faith in a methodology's fundamental soundness. For example, it
+ could be that the round trip time is negative because during the
+ measurement the clock was set backward in the process of
+ synchronizing it with another source. But it could also be that the
+ measurement program accesses uninitialized memory in one of its
+ computations and, only very rarely, that leads to a bogus
+   computation.  This second error is more serious if the same program
+ is used by others to perform the same measurement, since then they
+ too will suffer from incorrect results. Furthermore, once uncovered
+ it can be completely fixed.
+
+ A more subtle example of testing for self-consistency comes from
+ gathering samples of one-way Internet delays. If one has a large
+ sample of such delays, it may well be highly telling to, for example,
+ fit a line to the pairs of (time of measurement, measured delay), to
+ see if the resulting line has a clearly non-zero slope. If so, a
+ possible interpretation is that one of the clocks used in the
+ measurements is skewed relative to the other. Another interpretation
+ is that the slope is actually due to genuine network effects.
+ Determining which is indeed the case will often be highly
+ illuminating. (See [Pa97] for a discussion of distinguishing between
+ relative clock skew and genuine network effects.) Furthermore, if
+ making this check is part of the methodology, then a finding that the
+ long-term slope is very near zero is positive evidence that the
+ measurements are probably not biased by a difference in skew.
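+
+   {Comment: A minimal sketch of such a check is the ordinary
+   least-squares slope of delay against measurement time, shown below.
+   A long-term slope clearly different from zero flags a possible
+   problem but, as noted above, cannot by itself distinguish relative
+   clock skew from genuine network effects.}
+
+      /* Sketch: least-squares slope of measured one-way delay y[i]
+       * against time of measurement x[i], for n >= 2 pairs (with at
+       * least two distinct measurement times).
+       */
+      double
+      delay_trend_slope(const double x[], const double y[], int n)
+      {
+          int i;
+          double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
+
+          for (i = 0; i < n; ++i) {
+              sx  += x[i];
+              sy  += y[i];
+              sxx += x[i] * x[i];
+              sxy += x[i] * y[i];
+          }
+          return (n * sxy - sx * sy) / (n * sxx - sx * sx);
+      }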
+
+ A final example illustrates checking the measurement process itself
+ for self-consistency. Above we outline Poisson sampling techniques,
+ based on generating exponentially-distributed intervals. A sound
+ measurement methodology would include testing the generated intervals
+ to see whether they are indeed exponentially distributed (and also to
+ see if they suffer from correlation). In the appendix we discuss and
+ give C code for one such technique, a general-purpose, well-regarded
+ goodness-of-fit test called the Anderson-Darling test.
+
+ Finally, we note that what is truly relevant for Poisson sampling of
+ Internet metrics is often not when the measurements began but the
+ wire times corresponding to the measurement process. These could
+ well be different, due to complications on the hosts used to perform
+ the measurement. Thus, even those with complete faith in their
+ pseudo-random number generators and subsequent algorithms are
+ encouraged to consider how they might test the assumptions of each
+ measurement procedure as much as possible.
+
+
+11.3. Defining Statistical Distributions
+
+ One way of describing a collection of measurements (a sample) is as a
+ statistical distribution -- informally, as percentiles. There are
+   several slightly different ways of doing so.  In this section we give
+   a standard definition to lend uniformity to these descriptions.
+
+ The "empirical distribution function" (EDF) of a set of scalar
+ measurements is a function F(x) which for any x gives the fractional
+ proportion of the total measurements that were <= x. If x is less
+   than the minimum value observed, then F(x) is 0.  If it is greater
+   than or equal to the maximum value observed, then F(x) is 1.
+
+ For example, given the 6 measurements:
+
+ -2, 7, 7, 4, 18, -5
+
+ Then F(-8) = 0, F(-5) = 1/6, F(-5.0001) = 0, F(-4.999) = 1/6, F(7) =
+ 5/6, F(18) = 1, F(239) = 1.
+
+ Note that we can recover the different measured values and how many
+ times each occurred from F(x) -- no information regarding the range
+ in values is lost. Summarizing measurements using histograms, on the
+ other hand, in general loses information about the different values
+ observed, so the EDF is preferred.
+
+ Using either the EDF or a histogram, however, we do lose information
+ regarding the order in which the values were observed. Whether this
+ loss is potentially significant will depend on the metric being
+ measured.
+
+ We will use the term "percentile" to refer to the smallest value of x
+ for which F(x) >= a given percentage. So the 50th percentile of the
+ example above is 4, since F(4) = 3/6 = 50%; the 25th percentile is
+ -2, since F(-5) = 1/6 < 25%, and F(-2) = 2/6 >= 25%; the 100th
+   percentile is 18; the 15th percentile is -5, since F(-5) = 1/6 >=
+   15%; and the 0th percentile is -infinity, since every x satisfies
+   F(x) >= 0%.
+
+ Care must be taken when using percentiles to summarize a sample,
+ because they can lend an unwarranted appearance of more precision
+ than is really available. Any such summary must include the sample
+ size N, because any percentile difference finer than 1/N is below the
+ resolution of the sample.
+
+ See [DS86] for more details regarding EDF's.
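+
+   {Comment: As an illustrative sketch (not part of any metric
+   definition), the following C routines compute F(x) and the
+   percentile just defined, for a sample already sorted into ascending
+   order and a percentage p expressed as a fraction (e.g., 0.25 for the
+   25th percentile).  For the example above, sorted to -5, -2, 4, 7, 7,
+   18, percentile(sample, 6, 0.50) returns 4 and
+   percentile(sample, 6, 0.25) returns -2.}
+
+      /* Sketch: the EDF F(x) of a sorted sample of n measurements. */
+      double
+      edf(const double sample[], int n, double x)
+      {
+          int i = 0;
+
+          while (i < n && sample[i] <= x)
+              ++i;
+          return (double) i / n;
+      }
+
+      /* Sketch: the smallest sample value whose EDF is >= p, for
+       * 0 < p <= 1.  (Per the definition above, the 0th percentile
+       * is -infinity, so p == 0 is not handled here.)
+       */
+      double
+      percentile(const double sample[], int n, double p)
+      {
+          int i;
+
+          for (i = 0; i < n - 1; ++i)
+              if ((double) (i + 1) / n >= p)
+                  break;
+          return sample[i];
+      }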
+
+ We close with a note on the common (and important!) notion of median.
+ In statistics, the median of a distribution is defined to be the
+ point X for which the probability of observing a value <= X is equal
+ to the probability of observing a value > X. When estimating the
+ median of a set of observations, the estimate depends on whether the
+ number of observations, N, is odd or even:
+
+ + If N is odd, then the 50th percentile as defined above is used as
+ the estimated median.
+ + If N is even, then the estimated median is the average of the
+ central two observations; that is, if the observations are sorted
+ in ascending order and numbered from 1 to N, where N = 2*K, then
+ the estimated median is the average of the (K)'th and (K+1)'th
+ observations.
+
+ Usually the term "estimated" is dropped from the phrase "estimated
+ median" and this value is simply referred to as the "median".
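+
+   {Comment: A minimal sketch of this estimate, again assuming a sample
+   already sorted into ascending order, follows.}
+
+      /* Sketch: estimated median of a sorted sample of n > 0
+       * observations, using the odd/even rule above.
+       */
+      double
+      estimated_median(const double sample[], int n)
+      {
+          if (n % 2 == 1)
+              return sample[n / 2];    /* the 50th percentile */
+          return (sample[n / 2 - 1] + sample[n / 2]) / 2.0;
+      }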
+
+
+11.4. Testing For Goodness-of-Fit
+
+ For some forms of measurement calibration we need to test whether a
+ set of numbers is consistent with those numbers having been drawn
+ from a particular distribution. An example is that to apply a self-
+ consistency check to measurements made using a Poisson process, one
+ test is to see whether the spacing between the sampling times does
+ indeed reflect an exponential distribution; or if the dT/N approach
+ discussed above was used, whether the times are uniformly distributed
+   across [T, T+dT].
+
+ {Comment: There are at least three possible sets of values we could
+ test: the scheduled packet transmission times, as determined by use
+ of a pseudo-random number generator; user-level timestamps made just
+ before or after the system call for transmitting the packet; and wire
+ times for the packets as recorded using a packet filter. All three
+ of these are potentially informative: failures for the scheduled
+ times to match an exponential distribution indicate inaccuracies in
+ the random number generation; failures for the user-level times
+ indicate inaccuracies in the timers used to schedule transmission;
+ and failures for the wire times indicate inaccuracies in actually
+ transmitting the packets, perhaps due to contention for a shared
+ resource.}
+
+ There are a large number of statistical goodness-of-fit techniques
+ for performing such tests. See [DS86] for a thorough discussion.
+ That reference recommends the Anderson-Darling EDF test as being a
+ good all-purpose test, as well as one that is especially good at
+ detecting deviations from a given distribution in the lower and upper
+ tails of the EDF.
+
+ It is important to understand that the nature of goodness-of-fit
+ tests is that one first selects a "significance level", which is the
+ probability that the test will erroneously declare that the EDF of a
+ given set of measurements fails to match a particular distribution
+ when in fact the measurements do indeed reflect that distribution.
+
+ Unless otherwise stated, IPPM goodness-of-fit tests are done using 5%
+   significance.  This means that even if all of the samples are in fact
+   consistent with the distribution being tested, we expect the test to
+   (erroneously) deem about 5 out of every 100 such samples to have
+   failed it.  If significantly more of the samples fail the test, then
+   the assumption
+ that the samples are consistent with the distribution being tested
+ must be rejected. If significantly fewer of the samples fail the
+ test, then the samples have potentially been doctored too well to fit
+ the distribution. Similarly, some goodness-of-fit tests (including
+ Anderson-Darling) can detect whether it is likely that a given sample
+ was doctored. We also use a significance of 5% for this case; that
+ is, the test will report that a given honest sample is "too good to
+ be true" 5% of the time, so if the test reports this finding
+ significantly more often than one time out of twenty, it is an
+ indication that something unusual is occurring.
+
+ The appendix gives sample C code for implementing the Anderson-
+ Darling test, as well as further discussing its use.
+
+ See [Pa94] for a discussion of goodness-of-fit and closeness-of-fit
+ tests in the context of network measurement.
+
+
+12. Avoiding Stochastic Metrics
+
+ When defining metrics applying to a path, subpath, cloud, or other
+ network element, we in general do not define them in stochastic terms
+ (probabilities). We instead prefer a deterministic definition. So,
+ for example, rather than defining a metric about a "packet loss
+ probability between A and B", we would define a metric about a
+ "packet loss rate between A and B". (A measurement given by the
+ first definition might be "0.73", and by the second "73 packets out
+ of 100".)
+
+ We emphasize that the above distinction concerns the *definitions* of
+ *metrics*. It is not intended to apply to what sort of techniques we
+ might use to analyze the results of measurements.
+
+ The reason for this distinction is as follows. When definitions are
+ made in terms of probabilities, there are often hidden assumptions in
+ the definition about a stochastic model of the behavior being
+ measured. The fundamental goal with avoiding probabilities in our
+ metric definitions is to avoid biasing our definitions by these
+ hidden assumptions.
+
+ For example, an easy hidden assumption to make is that packet loss in
+ a network component due to queueing overflows can be described as
+ something that happens to any given packet with a particular
+ probability. In today's Internet, however, queueing drops are
+ actually usually *deterministic*, and assuming that they should be
+ described probabilistically can obscure crucial correlations between
+ queueing drops among a set of packets. So it's better to explicitly
+ note stochastic assumptions, rather than have them sneak into our
+ definitions implicitly.
+
+ This does *not* mean that we abandon stochastic models for
+ *understanding* network performance! It only means that when defining
+   IP metrics we avoid terms such as "probability" in favor of terms
+   like "proportion" or "rate".  We will still use, for example, random
+ sampling in order to estimate probabilities used by stochastic models
+ related to the IP metrics. We also do not rule out the possibility
+ of stochastic metrics when they are truly appropriate (for example,
+ perhaps to model transmission errors caused by certain types of line
+ noise).
+
+
+13. Packets of Type P
+
+ A fundamental property of many Internet metrics is that the value of
+ the metric depends on the type of IP packet(s) used to make the
+ measurement. Consider an IP-connectivity metric: one obtains
+ different results depending on whether one is interested in
+ connectivity for packets destined for well-known TCP ports or
+ unreserved UDP ports, or those with invalid IP checksums, or those
+ with TTL's of 16, for example. In some circumstances these
+ distinctions will be highly interesting (for example, in the presence
+ of firewalls, or RSVP reservations).
+
+ Because of this distinction, we introduce the generic notion of a
+ "packet of type P", where in some contexts P will be explicitly
+ defined (i.e., exactly what type of packet we mean), partially
+ defined (e.g., "with a payload of B octets"), or left generic. Thus
+ we may talk about generic IP-type-P-connectivity or more specific
+ IP-port-HTTP-connectivity. Some metrics and methodologies may be
+ fruitfully defined using generic type P definitions which are then
+ made specific when performing actual measurements.
+
+ Whenever a metric's value depends on the type of the packets involved
+ in the metric, the metric's name will include either a specific type
+ or a phrase such as "type-P". Thus we will not define an "IP-
+ connectivity" metric but instead an "IP-type-P-connectivity" metric
+ and/or perhaps an "IP-port-HTTP-connectivity" metric. This naming
+ convention serves as an important reminder that one must be conscious
+ of the exact type of traffic being measured.
+
+ A closely related note: it would be very useful to know if a given
+ Internet component treats equally a class C of different types of
+ packets. If so, then any one of those types of packets can be used
+ for subsequent measurement of the component. This suggests we devise
+ a metric or suite of metrics that attempt to determine C.
+
+
+14. Internet Addresses vs. Hosts
+
+ When considering a metric for some path through the Internet, it is
+ often natural to think about it as being for the path from Internet
+ host H1 to host H2. A definition in these terms, though, can be
+ ambiguous, because Internet hosts can be attached to more than one
+ network. In this case, the result of the metric will depend on which
+ of these networks is actually used.
+
+ Because of this ambiguity, usually such definitions should instead be
+ defined in terms of Internet IP addresses. For the common case of a
+ unidirectional path through the Internet, we will use the term "Src"
+ to denote the IP address of the beginning of the path, and "Dst" to
+ denote the IP address of the end.
+
+
+15. Standard-Formed Packets
+
+ Unless otherwise stated, all metric definitions that concern IP
+ packets include an implicit assumption that the packet is *standard
+ formed*. A packet is standard formed if it meets all of the
+ following criteria:
+
+ + Its length as given in the IP header corresponds to the size of
+ the IP header plus the size of the payload.
+ + It includes a valid IP header: the version field is 4 (later, we
+ will expand this to include 6); the header length is >= 5; the
+ checksum is correct.
+ + It is not an IP fragment.
+ + The source and destination addresses correspond to the hosts in
+ question.
+ + Either the packet possesses sufficient TTL to travel from the
+ source to the destination if the TTL is decremented by one at each
+ hop, or it possesses the maximum TTL of 255.
+ + It does not contain IP options unless explicitly noted.
+ + If a transport header is present, it too contains a valid checksum
+ and other valid fields.
+
+ We further require that if a packet is described as having a "length
+ of B octets", then 0 <= B <= 65535; and if B is the payload length in
+ octets, then B <= (65535-IP header size in octets).
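+
+   {Comment: For illustration only, the following sketch checks a
+   subset of these criteria (length, version, header length, checksum,
+   and fragmentation) against a captured IPv4 header.  The header
+   layout, the routine names, and the use of ntohs() are assumptions of
+   the sketch; real code would use the platform's own definitions, and
+   the TTL, address, option, and transport-layer criteria are omitted.}
+
+      #include <stddef.h>
+      #include <stdint.h>
+      #include <arpa/inet.h>    /* ntohs() */
+
+      /* Assumed IPv4 header layout (20 octets, no options); all
+       * multi-octet fields are in network byte order.
+       */
+      struct ipv4_hdr {
+          uint8_t  ver_ihl;    /* version (4 bits), hdr len (4 bits) */
+          uint8_t  tos;
+          uint16_t total_len;
+          uint16_t id;
+          uint16_t frag;       /* flags (3 bits), fragment offset */
+          uint8_t  ttl;
+          uint8_t  proto;
+          uint16_t checksum;
+          uint32_t src;
+          uint32_t dst;
+      };
+
+      /* Internet checksum over the header (len even, as IP headers
+       * are a multiple of 4 octets); a valid header yields 0.
+       */
+      static uint16_t
+      ip_checksum(const void *buf, size_t len)
+      {
+          const uint16_t *p = buf;
+          uint32_t sum = 0;
+
+          for ( ; len > 1; len -= 2)
+              sum += *p++;
+          while (sum >> 16)
+              sum = (sum & 0xffff) + (sum >> 16);
+          return (uint16_t) ~sum;
+      }
+
+      /* Sketch: 1 if the packet passes the checks named above, 0
+       * otherwise.  h points at the start of a captured packet of
+       * pkt_len octets.
+       */
+      int
+      passes_basic_checks(const struct ipv4_hdr *h, size_t pkt_len)
+      {
+          size_t ihl = (size_t) (h->ver_ihl & 0x0f) * 4;
+
+          if ((h->ver_ihl >> 4) != 4)           /* version is 4 */
+              return 0;
+          if (ihl < 20 || ihl > pkt_len)        /* header len >= 5 */
+              return 0;
+          if (ntohs(h->total_len) != pkt_len)   /* length matches */
+              return 0;
+          if (ip_checksum(h, ihl) != 0)         /* checksum correct */
+              return 0;
+          if ((ntohs(h->frag) & 0x3fff) != 0)   /* not a fragment */
+              return 0;
+          return 1;
+      }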
+
+ So, for example, one might imagine defining an IP connectivity metric
+ as "IP-type-P-connectivity for standard-formed packets with the IP
+ TOS field set to 0", or, more succinctly, "IP-type-P-connectivity
+ with the IP TOS field set to 0", since standard-formed is already
+ implied by convention.
+
+ A particular type of standard-formed packet often useful to consider
+   is the "minimal IP packet from A to B" -- this is an IP packet with
+ the following properties:
+
+ + It is standard-formed.
+ + Its data payload is 0 octets.
+ + It contains no options.
+
+ (Note that we do not define its protocol field, as different values
+ may lead to different treatment by the network.)
+
+ When defining IP metrics we keep in mind that no packet smaller or
+ simpler than this can be transmitted over a correctly operating IP
+ network.
+
+
+16. Acknowledgements
+
+   The comments of Brian Carpenter, Bill Cerveny, Padma Krishnaswamy,
+   Jeff Sedayao, and Howard Stanislevic are appreciated.
+
+
+17. Security Considerations
+
+ This document concerns definitions and concepts related to Internet
+ measurement. We discuss measurement procedures only in high-level
+ terms, regarding principles that lend themselves to sound
+ measurement. As such, the topics discussed do not affect the
+ security of the Internet or of applications which run on it.
+
+ That said, it should be recognized that conducting Internet
+ measurements can raise both security and privacy concerns. Active
+ techniques, in which traffic is injected into the network, can be
+ abused for denial-of-service attacks disguised as legitimate
+ measurement activity. Passive techniques, in which existing traffic
+ is recorded and analyzed, can expose the contents of Internet traffic
+ to unintended recipients. Consequently, the definition of each
+ metric and methodology must include a corresponding discussion of
+ security considerations.
+
+
+18. Appendix
+
+ Below we give routines written in C for computing the Anderson-
+ Darling test statistic (A2) for determining whether a set of values
+ is consistent with a given statistical distribution. Externally, the
+ two main routines of interest are:
+
+ double exp_A2_known_mean(double x[], int n, double mean)
+ double unif_A2_known_range(double x[], int n,
+ double min_val, double max_val)
+
+ Both take as their first argument, x, the array of n values to be
+ tested. (Upon return, the elements of x are sorted.) The remaining
+ parameters characterize the distribution to be used: either the mean
+ (1/lambda), for an exponential distribution, or the lower and upper
+ bounds, for a uniform distribution. The names of the routines stress
+ that these values must be known in advance, and *not* estimated from
+ the data (for example, by computing its sample mean). Estimating the
+ parameters from the data *changes* the significance level of the test
+ statistic. While [DS86] gives alternate significance tables for some
+ instances in which the parameters are estimated from the data, for
+ our purposes we expect that we should indeed know the parameters in
+ advance, since what we will be testing are generally values such as
+ packet sending times that we wish to verify follow a known
+ distribution.
+
+ Both routines return a significance level, as described earlier. This
+ is a value between 0 and 1. The correct use of the routines is to
+ pick in advance the threshold for the significance level to test;
+ generally, this will be 0.05, corresponding to 5%, as also described
+ above. Subsequently, if the routines return a value strictly less
+ than this threshold, then the data are deemed to be inconsistent with
+ the presumed distribution, *subject to an error corresponding to the
+ significance level*. That is, for a significance level of 5%, 5% of
+ the time data that is indeed drawn from the presumed distribution
+ will be erroneously deemed inconsistent.
+
+ Thus, it is important to bear in mind that if these routines are used
+ frequently, then one will indeed encounter occasional failures, even
+ if the data is unblemished.
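+
+   {Comment: For example, a caller checking generated Poisson sampling
+   intervals against their intended mean might be sketched as follows,
+   with the 0.05 threshold corresponding to the 5% significance level
+   discussed above; intervals_look_exponential() is a name invented for
+   this sketch.}
+
+      /* Sketch: 1 if the n intervals are consistent, at the 5%
+       * significance level, with an exponential distribution of the
+       * given (known, not estimated) mean; 0 otherwise.  Note that
+       * exp_A2_known_mean() sorts the array, and returns a negative
+       * value when the test is not meaningful; we treat that as a
+       * failure to confirm consistency.
+       */
+      int
+      intervals_look_exponential(double intervals[], int n, double mean)
+      {
+          double sig = exp_A2_known_mean(intervals, n, mean);
+
+          return sig >= 0.05;
+      }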
+
+ Another important point concerning significance levels is that it is
+ unsound to compare them in order to determine which of two sets of
+ values is a "better" fit to a presumed distribution. Such testing
+ should instead be done using "closeness-of-fit metrics" such as the
+ lambda^2 metric described in [Pa94].
+
+ While the routines provided are for exponential and uniform
+   distributions with known parameters, it is generally straightforward
+ to write comparable routines for any distribution with known
+ parameters. The heart of the A2 tests lies in a statistic computed
+ for testing whether a set of values is consistent with a uniform
+ distribution between 0 and 1, which we term Unif(0, 1). If we wish
+ to test whether a set of values, X, is consistent with a given
+   distribution whose cumulative distribution function is G(x), we first
+   compute
+       Y = G(X)
+   that is, we apply G to each of the values.  If X is indeed
+   distributed according to G(x), then Y will be
+ distributed according to Unif(0, 1); so by testing Y for consistency
+ with Unif(0, 1), we also test X for consistency with G(x).
+
+ We note, however, that the process of computing Y above might yield
+ values of Y outside the range (0..1). Such values should not occur
+ if X is indeed distributed according to G(x), but easily can occur if
+ it is not. In the latter case, we need to avoid computing the
+ central A2 statistic, since floating-point exceptions may occur if
+ any of the values lie outside (0..1). Accordingly, the routines
+ check for this possibility, and if encountered, return a raw A2
+ statistic of -1. The routine that converts the raw A2 statistic to a
+ significance level likewise propagates this value, returning a
+ significance level of -1. So, any use of these routines must be
+ prepared for a possible negative significance level.
+
+   The last important point regarding use of the A2 statistic concerns
+   n, the number of values being tested.  If n < 5 then the test is not
+ meaningful, and in this case a significance level of -1 is returned.
+
+ On the other hand, for "real" data the test *gains* power as n
+ becomes larger. It is well known in the statistics community that
+ real data almost never exactly matches a theoretical distribution,
+ even in cases such as rolling dice a great many times (see [Pa94] for
+ a brief discussion and references). The A2 test is sensitive enough
+ that, for sufficiently large sets of real data, the test will almost
+ always fail, because it will manage to detect slight imperfections in
+ the fit of the data to the distribution.
+
+ For example, we have found that when testing 8,192 measured wire
+ times for packets sent at Poisson intervals, the measurements almost
+ always fail the A2 test. On the other hand, testing 128 measurements
+ failed at 5% significance only about 5% of the time, as expected.
+ Thus, in general, when the test fails, care must be taken to
+ understand why it failed.
+
+ The remainder of this appendix gives C code for the routines
+ mentioned above.
+
+ /* Routines for computing the Anderson-Darling A2 test statistic.
+ *
+ * Implemented based on the description in "Goodness-of-Fit
+ * Techniques," R. D'Agostino and M. Stephens, editors,
+ * Marcel Dekker, Inc., 1986.
+ */
+
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <math.h>
+
+ /* Returns the raw A^2 test statistic for n sorted samples
+ * z[0] .. z[n-1], for z ~ Unif(0,1).
+ */
+ extern double compute_A2(double z[], int n);
+
+ /* Returns the significance level associated with a A^2 test
+ * statistic value of A2, assuming no parameters of the tested
+ * distribution were estimated from the data.
+ */
+ extern double A2_significance(double A2);
+
+ /* Returns the A^2 significance level for testing n observations
+ * x[0] .. x[n-1] against an exponential distribution with the
+ * given mean.
+ *
+ * SIDE EFFECT: the x[0..n-1] are sorted upon return.
+ */
+ extern double exp_A2_known_mean(double x[], int n, double mean);
+
+ /* Returns the A^2 significance level for testing n observations
+ * x[0] .. x[n-1] against the uniform distribution [min_val, max_val].
+ *
+ * SIDE EFFECT: the x[0..n-1] are sorted upon return.
+ */
+ extern double unif_A2_known_range(double x[], int n,
+ double min_val, double max_val);
+
+ /* Returns a pseudo-random number distributed according to an
+ * exponential distribution with the given mean.
+ */
+ extern double random_exponential(double mean);
+
+
+ /* Helper function used by qsort() to sort double-precision
+ * floating-point values.
+ */
+ static int
+ compare_double(const void *v1, const void *v2)
+ {
+ double d1 = *(double *) v1;
+ double d2 = *(double *) v2;
+
+ if (d1 < d2)
+ return -1;
+ else if (d1 > d2)
+ return 1;
+ else
+ return 0;
+ }
+
+ double
+ compute_A2(double z[], int n)
+ {
+ int i;
+ double sum = 0.0;
+
+ if ( n < 5 )
+ /* Too few values. */
+ return -1.0;
+
+ /* If any of the values are outside the range (0, 1) then
+ * fail immediately (and avoid a possible floating point
+ * exception in the code below).
+ */
+ for (i = 0; i < n; ++i)
+ if ( z[i] <= 0.0 || z[i] >= 1.0 )
+ return -1.0;
+
+ /* Page 101 of D'Agostino and Stephens. */
+ for (i = 1; i <= n; ++i) {
+ sum += (2 * i - 1) * log(z[i-1]);
+ sum += (2 * n + 1 - 2 * i) * log(1.0 - z[i-1]);
+ }
+ return -n - (1.0 / n) * sum;
+ }
+
+ double
+ A2_significance(double A2)
+ {
+ /* Page 105 of D'Agostino and Stephens. */
+ if (A2 < 0.0)
+ return A2; /* Bogus A2 value - propagate it. */
+
+ /* Check for possibly doctored values. */
+ if (A2 <= 0.201)
+ return 0.99;
+ else if (A2 <= 0.240)
+ return 0.975;
+ else if (A2 <= 0.283)
+ return 0.95;
+ else if (A2 <= 0.346)
+ return 0.90;
+ else if (A2 <= 0.399)
+ return 0.85;
+
+ /* Now check for possible inconsistency. */
+ if (A2 <= 1.248)
+ return 0.25;
+ else if (A2 <= 1.610)
+ return 0.15;
+ else if (A2 <= 1.933)
+ return 0.10;
+ else if (A2 <= 2.492)
+ return 0.05;
+ else if (A2 <= 3.070)
+ return 0.025;
+ else if (A2 <= 3.880)
+ return 0.01;
+ else if (A2 <= 4.500)
+ return 0.005;
+ else if (A2 <= 6.000)
+ return 0.001;
+ else
+ return 0.0;
+ }
+
+ double
+ exp_A2_known_mean(double x[], int n, double mean)
+ {
+ int i;
+ double A2;
+
+ /* Sort the first n values. */
+ qsort(x, n, sizeof(x[0]), compare_double);
+
+ /* Assuming they match an exponential distribution, transform
+ * them to Unif(0,1).
+ */
+ for (i = 0; i < n; ++i) {
+ x[i] = 1.0 - exp(-x[i] / mean);
+ }
+
+ /* Now make the A^2 test to see if they're truly uniform. */
+ A2 = compute_A2(x, n);
+ return A2_significance(A2);
+ }
+
+ double
+ unif_A2_known_range(double x[], int n, double min_val, double max_val)
+ {
+ int i;
+ double A2;
+ double range = max_val - min_val;
+
+ /* Sort the first n values. */
+ qsort(x, n, sizeof(x[0]), compare_double);
+
+ /* Transform Unif(min_val, max_val) to Unif(0,1). */
+ for (i = 0; i < n; ++i)
+ x[i] = (x[i] - min_val) / range;
+
+ /* Now make the A^2 test to see if they're truly uniform. */
+ A2 = compute_A2(x, n);
+ return A2_significance(A2);
+ }
+
+ double
+ random_exponential(double mean)
+ {
+ return -mean * log1p(-drand48());
+ }
+
+
+19. References
+
+ [AK97] G. Almes and S. Kalidindi, "A One-way Delay Metric for IPPM",
+ Work in Progress, November 1997.
+
+ [BM92] I. Bilinskis and A. Mikelsons, Randomized Signal Processing,
+ Prentice Hall International, 1992.
+
+ [DS86] R. D'Agostino and M. Stephens, editors, Goodness-of-Fit
+ Techniques, Marcel Dekker, Inc., 1986.
+
+ [CPB93] K. Claffy, G. Polyzos, and H-W. Braun, "Application of
+ Sampling Methodologies to Network Traffic Characterization," Proc.
+ SIGCOMM '93, pp. 194-203, San Francisco, September 1993.
+
+ [FJ94] S. Floyd and V. Jacobson, "The Synchronization of Periodic
+ Routing Messages," IEEE/ACM Transactions on Networking, 2(2), pp.
+ 122-136, April 1994.
+
+ [Mi92] Mills, D., "Network Time Protocol (Version 3) Specification,
+ Implementation and Analysis", RFC 1305, March 1992.
+
+ [Pa94] V. Paxson, "Empirically-Derived Analytic Models of Wide-Area
+ TCP Connections," IEEE/ACM Transactions on Networking, 2(4), pp.
+ 316-336, August 1994.
+
+ [Pa96] V. Paxson, "Towards a Framework for Defining Internet
+ Performance Metrics," Proceedings of INET '96,
+ ftp://ftp.ee.lbl.gov/papers/metrics-framework-INET96.ps.Z
+
+ [Pa97] V. Paxson, "Measurements and Analysis of End-to-End Internet
+ Dynamics," Ph.D. dissertation, U.C. Berkeley, 1997,
+ ftp://ftp.ee.lbl.gov/papers/vp-thesis/dis.ps.gz.
+
+
+20. Authors' Addresses
+
+ Vern Paxson
+ MS 50B/2239
+ Lawrence Berkeley National Laboratory
+ University of California
+ Berkeley, CA 94720
+ USA
+
+ Phone: +1 510/486-7504
+ EMail: vern@ee.lbl.gov
+
+
+ Guy Almes
+ Advanced Network & Services, Inc.
+ 200 Business Park Drive
+ Armonk, NY 10504
+ USA
+
+ Phone: +1 914/765-1120
+ EMail: almes@advanced.org
+
+
+ Jamshid Mahdavi
+ Pittsburgh Supercomputing Center
+ 4400 5th Avenue
+ Pittsburgh, PA 15213
+ USA
+
+ Phone: +1 412/268-6282
+ EMail: mahdavi@psc.edu
+
+
+ Matt Mathis
+ Pittsburgh Supercomputing Center
+ 4400 5th Avenue
+ Pittsburgh, PA 15213
+ USA
+
+ Phone: +1 412/268-3319
+ EMail: mathis@psc.edu
+
+
+21. Full Copyright Statement
+
+ Copyright (C) The Internet Society (1998). All Rights Reserved.
+
+ This document and translations of it may be copied and furnished to
+ others, and derivative works that comment on or otherwise explain it
+ or assist in its implementation may be prepared, copied, published
+ and distributed, in whole or in part, without restriction of any
+ kind, provided that the above copyright notice and this paragraph are
+ included on all such copies and derivative works. However, this
+ document itself may not be modified in any way, such as by removing
+ the copyright notice or references to the Internet Society or other
+ Internet organizations, except as needed for the purpose of
+ developing Internet standards in which case the procedures for
+ copyrights defined in the Internet Standards process must be
+ followed, or as required to translate it into languages other than
+ English.
+
+ The limited permissions granted above are perpetual and will not be
+ revoked by the Internet Society or its successors or assigns.
+
+ This document and the information contained herein is provided on an
+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+