Network Working Group                                          V. Paxson
Request for Comments: 2330                Lawrence Berkeley National Lab
Category: Informational                                         G. Almes
                                             Advanced Network & Services
                                                              J. Mahdavi
                                                               M. Mathis
                                         Pittsburgh Supercomputer Center
                                                                May 1998


                  Framework for IP Performance Metrics


1. Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard of any kind.  Distribution of this
   memo is unlimited.

2. Copyright Notice

   Copyright (C) The Internet Society (1998).  All Rights Reserved.

Table of Contents

   1. STATUS OF THIS MEMO.............................................1
   2. COPYRIGHT NOTICE................................................1
   3. INTRODUCTION....................................................2
   4. CRITERIA FOR IP PERFORMANCE METRICS.............................3
   5. TERMINOLOGY FOR PATHS AND CLOUDS................................4
   6. FUNDAMENTAL CONCEPTS............................................5
   6.1 Metrics......................................................5
   6.2 Measurement Methodology......................................6
   6.3 Measurements, Uncertainties, and Errors......................7
   7. METRICS AND THE ANALYTICAL FRAMEWORK............................8
   8. EMPIRICALLY SPECIFIED METRICS..................................11
   9. TWO FORMS OF COMPOSITION.......................................12
   9.1 Spatial Composition of Metrics..............................12
   9.2 Temporal Composition of Formal Models and Empirical Metrics.13
   10. ISSUES RELATED TO TIME........................................14
   10.1 Clock Issues...............................................14
   10.2 The Notion of "Wire Time"..................................17
   11. SINGLETONS, SAMPLES, AND STATISTICS............................19
   11.1 Methods of Collecting Samples..............................20
   11.1.1 Poisson Sampling........................................21
   11.1.2 Geometric Sampling......................................22
   11.1.3 Generating Poisson Sampling Intervals...................22


Paxson, et. al.              Informational                      [Page 1]

RFC 2330           Framework for IP Performance Metrics        May 1998


   11.2 Self-Consistency...........................................24
   11.3 Defining Statistical Distributions.........................25
   11.4 Testing For Goodness-of-Fit................................27
   12. AVOIDING STOCHASTIC METRICS....................................28
   13. PACKETS OF TYPE P..............................................29
   14. INTERNET ADDRESSES VS. HOSTS...................................30
   15. STANDARD-FORMED PACKETS........................................30
   16. ACKNOWLEDGEMENTS...............................................31
   17. SECURITY CONSIDERATIONS........................................31
   18. APPENDIX.......................................................32
   19. REFERENCES.....................................................38
   20. AUTHORS' ADDRESSES.............................................39
   21. FULL COPYRIGHT STATEMENT.......................................40


3. Introduction

   The purpose of this memo is to define a general framework for
   particular metrics to be developed by the IETF's IP Performance
   Metrics effort, begun by the Benchmarking Methodology Working Group
   (BMWG) of the Operational Requirements Area, and being continued by
   the IP Performance Metrics Working Group (IPPM) of the Transport
   Area.

   We begin by laying out several criteria for the metrics that we
   adopt.  These criteria are designed to promote an IPPM effort that
   will maximize an accurate common understanding by Internet users and
   Internet providers of the performance and reliability both of end-
   to-end paths through the Internet and of specific 'IP clouds' that
   comprise portions of those paths.

   We next define some Internet vocabulary that will allow us to speak
   clearly about Internet components such as routers, paths, and
   clouds.

   We then define the fundamental concepts of 'metric' and 'measurement
   methodology', which allow us to speak clearly about measurement
   issues.  Given these concepts, we proceed to discuss the important
   issue of measurement uncertainties and errors, and develop a key,
   somewhat subtle notion of how they relate to the analytical
   framework shared by many aspects of the Internet engineering
   discipline.  We then introduce the notion of empirically defined
   metrics, and finish this part of the document with a general
   discussion of how metrics can be 'composed'.

   The remainder of the document deals with a variety of issues related
   to defining sound metrics and methodologies: how to deal with
   imperfect clocks; the notion of 'wire time' as distinct from 'host
   time'; how to aggregate sets of singleton metrics into samples and


Paxson, et. al.              Informational                      [Page 2]

RFC 2330           Framework for IP Performance Metrics        May 1998


   derive sound statistics from those samples; why it is recommended to
   avoid thinking about Internet properties in probabilistic terms
   (such as the probability that a packet is dropped), since these
   terms often include implicit assumptions about how the network
   behaves; the utility of defining metrics in terms of packets of a
   generic type; the benefits of preferring IP addresses to DNS host
   names; and the notion of 'standard-formed' packets.  An appendix
   discusses the Anderson-Darling test for gauging whether a set of
   values matches a given statistical distribution, and gives C code
   for an implementation of the test.

   In some sections of the memo, we will surround some commentary text
   with the brackets {Comment: ... }.  We stress that this commentary
   is only commentary, and is not itself part of the framework document
   or a proposal of particular metrics.  In some cases this commentary
   will discuss some of the properties of metrics that might be
   envisioned, but the reader should assume that any such discussion is
   intended only to shed light on points made in the framework
   document, and not to suggest any specific metrics.

4. Criteria for IP Performance Metrics

   The overarching goal of the IP Performance Metrics effort is to
   achieve a situation in which users and providers of Internet
   transport service have an accurate common understanding of the
   performance and reliability of the Internet component 'clouds' that
   they use/provide.

   To achieve this, performance and reliability metrics for paths
   through the Internet must be developed.
   In several IETF meetings, criteria for these metrics have been
   specified:

   +    The metrics must be concrete and well-defined,

   +    A methodology for a metric should have the property that it is
        repeatable: if the methodology is used multiple times under
        identical conditions, the same measurements should result.

   +    The metrics must exhibit no bias for IP clouds implemented with
        identical technology,

   +    The metrics must exhibit understood and fair bias for IP clouds
        implemented with non-identical technology,

   +    The metrics must be useful to users and providers in
        understanding the performance they experience or provide,


Paxson, et. al.              Informational                      [Page 3]

RFC 2330           Framework for IP Performance Metrics        May 1998


   +    The metrics must avoid inducing artificial performance goals.

5. Terminology for Paths and Clouds

   The following list defines terms that need to be precise in the
   development of path metrics.  We begin with low-level notions of
   'host', 'router', and 'link', then proceed to define the notions of
   'path', 'IP cloud', and 'exchange' that allow us to segment a path
   into relevant pieces.

   host A computer capable of communicating using the Internet
        protocols; includes "routers".

   link A single link-level connection between two (or more) hosts;
        includes leased lines, ethernets, frame relay clouds, etc.

   router
        A host which facilitates network-level communication between
        hosts by forwarding IP packets.

   path A sequence of the form < h0, l1, h1, ..., ln, hn >, where n >=
        0, each hi is a host, each li is a link between hi-1 and hi,
        and each h1...hn-1 is a router.  A pair <li, hi> is termed a
        'hop'.  In an appropriate operational configuration, the links
        and routers in the path facilitate network-layer communication
        of packets from h0 to hn.  Note that path is a unidirectional
        concept.

   subpath
        Given a path, a subpath is any subsequence of the given path
        which is itself a path.  (Thus, the first and last elements of
        a subpath are hosts.)

   cloud
        An undirected (possibly cyclic) graph whose vertices are
        routers and whose edges are links that connect pairs of
        routers.  Formally, ethernets, frame relay clouds, and other
        links that connect more than two routers are modelled as
        fully-connected meshes of graph edges.  Note that to connect to
        a cloud means to connect to a router of the cloud over a link;
        this link is not itself part of the cloud.

   exchange
        A special case of a link, an exchange directly connects either
        a host to a cloud or one cloud to another cloud.

   cloud subpath
        A subpath of a given path, all of whose hosts are routers of a
        given cloud.


Paxson, et. al.              Informational                      [Page 4]

RFC 2330           Framework for IP Performance Metrics        May 1998


   path digest
        A sequence of the form < h0, e1, C1, ..., en, hn >, where n >=
        0, h0 and hn are hosts, each e1 ... en is an exchange, and each
        C1 ... Cn-1 is a cloud subpath.

6. Fundamental Concepts

6.1. Metrics

   In the operational Internet, there are several quantities related
   to the performance and reliability of the Internet whose values we
   would like to know.  When such a quantity is carefully specified, we
   term the quantity a metric.  We anticipate that there will be
   separate RFCs for each metric (or for each closely related group of
   metrics).
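   {Comment: Looking back at the Section 5 terminology, the 'path' and
   'path digest' notions map directly onto simple data structures.  The
   following C sketch is purely illustrative -- the type and field
   names are ours and are not part of any metric specification:

      #include <stddef.h>

      struct host { const char *addr;  int is_router; };
      struct link { const char *descr; };

      /* A path is < h0, l1, h1, ..., ln, hn >, n >= 0; each
         <li, hi> pair is a hop. */
      struct hop  { struct link link; struct host host; };
      struct path { struct host h0;   struct hop *hops; size_t n; };

      /* The hop count of a route is the value n of the path. */
      size_t hop_count(const struct path *p) { return p->n; }

   A path digest would be represented analogously, with exchanges and
   cloud subpaths in place of the links and intermediate hosts.}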
   In some cases, there might be no obvious means to effectively
   measure the metric; this is allowed, and even understood to be very
   useful in some cases.  It is required, however, that the
   specification of the metric be as clear as possible about what
   quantity is being specified.  Thus, difficulty in practical
   measurement is sometimes allowed, but ambiguity in meaning is not.

   Each metric will be defined in terms of standard units of
   measurement.  The international metric system will be used, with the
   following points specifically noted:

   +    When a unit is expressed in simple meters (for distance/length)
        or seconds (for duration), appropriate related units based on
        thousands or thousandths of acceptable units are acceptable.
        Thus, distances expressed in kilometers (km), and durations
        expressed in milliseconds (ms) or microseconds (us), are
        allowed, but not centimeters (because the prefix is not in
        terms of thousands or thousandths).

   +    When a unit is expressed in a combination of units, appropriate
        related units based on thousands or thousandths of acceptable
        units are acceptable, but all such thousands/thousandths must
        be grouped at the beginning.  Thus, kilometers per second
        (km/s) is allowed, but meters per millisecond is not.

   +    The unit of information is the bit.

   +    When metric prefixes are used with bits or with combinations
        including bits, those prefixes will have their metric meaning
        (related to decimal 1000), and not the meaning conventional
        with computer storage (related to decimal 1024).  In any RFC
        that defines a metric whose units include bits, this convention
        will be followed and will be repeated to ensure clarity for the
        reader.


Paxson, et. al.              Informational                      [Page 5]

RFC 2330           Framework for IP Performance Metrics        May 1998


   +    When a time is given, it will be expressed in UTC.

   Note that these points apply to the specifications for metrics and
   not, for example, to packet formats where octets will likely be used
   in preference/addition to bits.

   Finally, we note that some metrics may be defined purely in terms of
   other metrics; such metrics are called 'derived metrics'.

6.2. Measurement Methodology

   For a given set of well-defined metrics, a number of distinct
   measurement methodologies may exist.  A partial list includes:

   +    Direct measurement of a performance metric using injected test
        traffic.  Example: measurement of the round-trip delay of an IP
        packet of a given size over a given route at a given time.

   +    Projection of a metric from lower-level measurements.  Example:
        given accurate measurements of propagation delay and bandwidth
        for each step along a path, projection of the complete delay
        for the path for an IP packet of a given size.

   +    Estimation of a constituent metric from a set of more
        aggregated measurements.  Example: given accurate measurements
        of delay for a given one-hop path for IP packets of different
        sizes, estimation of propagation delay for the link of that
        one-hop path.

   +    Estimation of a given metric at one time from a set of related
        metrics at other times.  Example: given an accurate measurement
        of flow capacity at a past time, together with a set of
        accurate delay measurements for that past time and the current
        time, and given a model of flow dynamics, estimate the flow
        capacity that would be observed at the current time.

   This list is by no means exhaustive.  The purpose is to point out
   the variety of measurement techniques.
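   {Comment: As an illustration of the first technique above, direct
   measurement of round-trip delay might be sketched in C as follows.
   The probe primitives are hypothetical placeholders, and the
   timestamps are host times, not wire times (see Section 10.2):

      #include <time.h>

      extern void send_probe(void);   /* inject one test packet */
      extern void await_reply(void);  /* block until the reply  */

      /* Round-trip delay, in seconds, as seen by this host. */
      double measure_rtt(void)
      {
          struct timespec t0, t1;

          clock_gettime(CLOCK_MONOTONIC, &t0);
          send_probe();
          await_reply();
          clock_gettime(CLOCK_MONOTONIC, &t1);

          return (t1.tv_sec - t0.tv_sec)
               + (t1.tv_nsec - t0.tv_nsec) / 1e9;
      }

   A real methodology would also have to specify the packet type and
   size, the treatment of lost probes, and the quality of the clock
   used.}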
   When a given metric is specified, a given measurement approach might
   be noted and discussed.  That approach, however, is not formally
   part of the specification.

   A methodology for a metric should have the property that it is
   repeatable: if the methodology is used multiple times under
   identical conditions, it should result in consistent measurements.

   Backing off a little from the word 'identical' in the previous
   paragraph, we could more accurately use the word 'continuity' to
   describe a property of a given methodology: a methodology for a
   given metric exhibits continuity if, for small variations in
   conditions, it


Paxson, et. al.              Informational                      [Page 6]

RFC 2330           Framework for IP Performance Metrics        May 1998


   results in small variations in the resulting measurements.  Slightly
   more precisely, for every positive epsilon, there exists a positive
   delta, such that if two sets of conditions are within delta of each
   other, then the resulting measurements will be within epsilon of
   each other.  At this point, this should be taken as a heuristic
   driving our intuition about one kind of robustness property rather
   than as a precise notion.

   A metric that has at least one methodology that exhibits continuity
   is said itself to exhibit continuity.

   Note that some metrics, such as hop-count along a path, are integer-
   valued and therefore cannot exhibit continuity in quite the sense
   given above.

   Note further that, in practice, it may not be practical to know (or
   be able to quantify) the conditions relevant to a measurement at a
   given time.  For example, since the instantaneous load (in packets
   to be served) at a given router in a high-speed wide-area network
   can vary widely over relatively brief periods and will be very hard
   for an external observer to quantify, various statistics of a given
   metric may be more repeatable, or may better exhibit continuity.  In
   that case those particular statistics should be specified when the
   metric is specified.

   Finally, some measurement methodologies may be 'conservative' in the
   sense that the act of measurement does not modify, or only slightly
   modifies, the value of the performance metric the methodology
   attempts to measure.  {Comment: for example, in a wide-area high-
   speed network under modest load, a test using several small 'ping'
   packets to measure delay would likely not interfere (much) with the
   delay properties of that network as observed by others.  The
   corresponding statement about tests using a large flow to measure
   flow capacity would likely fail.}

6.3. Measurements, Uncertainties, and Errors

   Even the very best measurement methodologies for the very most well
   behaved metrics will exhibit errors.  Those who develop such
   measurement methodologies, however, should strive to:


Paxson, et. al.              Informational                      [Page 7]

RFC 2330           Framework for IP Performance Metrics        May 1998


   +    minimize their uncertainties/errors,

   +    understand and document the sources of uncertainty/error, and

   +    quantify the amounts of uncertainty/error.

   For example, when developing a method for measuring delay,
   understand how any errors in your clocks introduce errors into your
   delay measurement, and quantify this effect as well as you can.  In
   some cases, this will result in a requirement that a clock be at
   least up to a certain quality if it is to be used to make a certain
   measurement.
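   {Comment: To illustrate quantifying clock-induced error, consider a
   one-way delay computed as the difference of timestamps from two
   clocks.  The following C sketch is entirely ours (it anticipates the
   clock terminology of Section 10.1) and simply carries a crude error
   bound alongside the value:

      /* One-way delay from two clocks' timestamps, in seconds.
         offset_bound is an assumed bound on the relative offset
         between the clocks; resolution_sum is the sum of their
         resolutions.  Both enter the uncertainty of the difference. */
      struct owd { double delay; double max_error; };

      struct owd one_way_delay(double t_send, double t_recv,
                               double offset_bound,
                               double resolution_sum)
      {
          struct owd r;
          r.delay     = t_recv - t_send;
          r.max_error = offset_bound + resolution_sum;
          return r;
      }

   A 50 ms measured delay with a 10 ms offset bound is a very different
   result from the same delay with a 100 us bound.}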
   As a second example, consider the timing error due to measurement
   overheads within the computer making the measurement, as opposed to
   delays due to the Internet component being measured.  The former is
   a measurement error, while the latter reflects the metric of
   interest.  Note that one technique that can help avoid this overhead
   is the use of a packet filter/sniffer, running on a separate
   computer that records network packets and timestamps them accurately
   (see the discussion of 'wire time' below).  The resulting trace can
   then be analyzed to assess the test traffic, minimizing the effect
   of measurement host delays, or at least allowing those delays to be
   accounted for.  We note that this technique may prove beneficial
   even if the packet filter/sniffer runs on the same machine, because
   such measurements generally provide 'kernel-level' timestamping as
   opposed to less-accurate 'application-level' timestamping.

   Finally, we note that derived metrics (defined above) or metrics
   that exhibit spatial or temporal composition (defined below) offer
   particular occasion for the analysis of measurement uncertainties,
   namely how the uncertainties propagate (conceptually) due to the
   derivation or composition.

7. Metrics and the Analytical Framework

   As the Internet has evolved from the early packet-switching studies
   of the 1960s, the Internet engineering community has evolved a
   common analytical framework of concepts.  This analytical framework,
   or A-frame, used by designers and implementers of protocols, by
   those involved in measurement, and by those who study computer
   network performance using the tools of simulation and analysis, has
   great advantage to our work.  A major objective here is to generate
   network characterizations that are consistent in both analytical and
   practical settings, since this will maximize the chances that non-
   empirical network study can be better correlated with, and used to
   further our understanding of, real network behavior.


Paxson, et. al.              Informational                      [Page 8]

RFC 2330           Framework for IP Performance Metrics        May 1998


   Whenever possible, therefore, we would like to develop and leverage
   off of the A-frame.  Thus, whenever a metric to be specified is
   understood to be closely related to concepts within the A-frame, we
   will attempt to specify the metric in the A-frame's terms.  In such
   a specification we will develop the A-frame by precisely defining
   the concepts needed for the metric, then leverage off of the A-frame
   by defining the metric in terms of those concepts.

   Such a metric will be called an 'analytically specified metric' or,
   more simply, an analytical metric.

   {Comment: Examples of such analytical metrics might include:

   propagation time of a link
        The time, in seconds, required by a single bit to travel from
        the output port on one Internet host across a single link to
        another Internet host.

   bandwidth of a link for packets of size k
        The capacity, in bits/second, where only those bits of the IP
        packet are counted, for packets of size k bytes.

   route
        The path, as defined in Section 5, from A to B at a given time.

   hop count of a route
        The value 'n' of the route path.
+ } + + Note that we make no a priori list of just what A-frame concepts + will emerge in these specifications, but we do encourage their use + and urge that they be carefully specified so that, as our set of + metrics develops, so will a specified set of A-frame concepts + technically consistent with each other and consonant with the + common understanding of those concepts within the general Internet + community. + + These A-frame concepts will be intended to abstract from actual + Internet components in such a way that: + + + the essential function of the component is retained, + + properties of the component relevant to the metrics we aim to + create are retained, + + a subset of these component properties are potentially defined as + analytical metrics, and + + + + + + + +Paxson, et. al. Informational [Page 9] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + + those properties of actual Internet components not relevant to + defining the metrics we aim to create are dropped. + + For example, when considering a router in the context of packet + forwarding, we might model the router as a component that receives + packets on an input link, queues them on a FIFO packet queue of + finite size, employs tail-drop when the packet queue is full, and + forwards them on an output link. The transmission speed (in + bits/second) of the input and output links, the latency in the router + (in seconds), and the maximum size of the packet queue (in bits) are + relevant analytical metrics. + + In some cases, such analytical metrics used in relation to a router + will be very closely related to specific metrics of the performance + of Internet paths. For example, an obvious formula (L + P/B) + involving the latency in the router (L), the packet size (in bits) + (P), and the transmission speed of the output link (B) might closely + approximate the increase in packet delay due to the insertion of a + given router along a path. + + We stress, however, that well-chosen and well-specified A-frame + concepts and their analytical metrics will support more general + metric creation efforts in less obvious ways. + + {Comment: for example, when considering the flow capacity of a path, + it may be of real value to be able to model each of the routers along + the path as packet forwarders as above. Techniques for estimating + the flow capacity of a path might use the maximum packet queue size + as a parameter in decidedly non-obvious ways. For example, as the + maximum queue size increases, so will the ability of the router to + continuously move traffic along an output link despite fluctuations + in traffic from an input link. Estimating this increase, however, + remains a research topic.} + + Note that, when we specify A-frame concepts and analytical metrics, + we will inevitably make simplifying assumptions. The key role of + these concepts is to abstract the properties of the Internet + components relevant to given metrics. Judgement is required to avoid + making assumptions that bias the modeling and metric effort toward + one kind of design. + + {Comment: for example, routers might not use tail-drop, even though + tail-drop might be easier to model analytically.} + + Finally, note that different elements of the A-frame might well make + different simplifying assumptions. For example, the abstraction of a + router used to further the definition of path delay might treat the + router's packet queue as a single FIFO queue, but the abstraction of + + + +Paxson, et. al. 
Informational [Page 10] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + a router used to further the definition of the handling of an RSVP- + enabled packet might treat the router's packet queue as supporting + bounded delay -- a contradictory assumption. This is not to say that + we make contradictory assumptions at the same time, but that two + different parts of our work might refine the simpler base concept in + two divergent ways for different purposes. + + {Comment: in more mathematical terms, we would say that the A-frame + taken as a whole need not be consistent; but the set of particular + A-frame elements used to define a particular metric must be.} + + +8. Empirically Specified Metrics + + There are useful performance and reliability metrics that do not fit + so neatly into the A-frame, usually because the A-frame lacks the + detail or power for dealing with them. For example, "the best flow + capacity achievable along a path using an RFC-2001-compliant TCP" + would be good to be able to measure, but we have no analytical + framework of sufficient richness to allow us to cast that flow + capacity as an analytical metric. + + These notions can still be well specified by instead describing a + reference methodology for measuring them. + + Such a metric will be called an 'empirically specified metric', or + more simply, an empirical metric. + + Such empirical metrics should have three properties: + + + we should have a clear definition for each in terms of Internet + components, + + we should have at least one effective means to measure them, and + + to the extent possible, we should have an (necessarily incomplete) + understanding of the metric in terms of the A-frame so that we can + use our measurements to reason about the performance and + reliability of A-frame components and of aggregations of A-frame + components. + + + + + + + + + + + + + +Paxson, et. al. Informational [Page 11] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + +9. Two Forms of Composition + + +9.1. Spatial Composition of Metrics + + In some cases, it may be realistic and useful to define metrics in + such a fashion that they exhibit spatial composition. + + By spatial composition, we mean a characteristic of some path + metrics, in which the metric as applied to a (complete) path can also + be defined for various subpaths, and in which the appropriate A-frame + concepts for the metric suggest useful relationships between the + metric applied to these various subpaths (including the complete + path, the various cloud subpaths of a given path digest, and even + single routers along the path). The effectiveness of spatial + composition depends: + + + on the usefulness in analysis of these relationships as applied to + the relevant A-frame components, and + + on the practical use of the corresponding relationships as applied + to metrics and to measurement methodologies. + + {Comment: for example, consider some metric for delay of a 100-byte + packet across a path P, and consider further a path digest <h0, e1, + C1, ..., en, hn> of P. The definition of such a metric might include + a conjecture that the delay across P is very nearly the sum of the + corresponding metric across the exchanges (ei) and clouds (Ci) of the + given path digest. 
The definition would further include a note on + how a corresponding relation applies to relevant A-frame components, + both for the path P and for the exchanges and clouds of the path + digest.} + + When the definition of a metric includes a conjecture that the metric + across the path is related to the metric across the subpaths of the + path, that conjecture constitutes a claim that the metric exhibits + spatial composition. The definition should then include: + + + + + + + + + + + + + + + +Paxson, et. al. Informational [Page 12] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + + the specific conjecture applied to the metric, + + a justification of the practical utility of the composition in + terms of making accurate measurements of the metric on the path, + + a justification of the usefulness of the composition in terms of + making analysis of the path using A-frame concepts more effective, + and + + an analysis of how the conjecture could be incorrect. + + +9.2. Temporal Composition of Formal Models and Empirical Metrics + + In some cases, it may be realistic and useful to define metrics in + such a fashion that they exhibit temporal composition. + + By temporal composition, we mean a characteristic of some path + metric, in which the metric as applied to a path at a given time T is + also defined for various times t0 < t1 < ... < tn < T, and in which + the appropriate A-frame concepts for the metric suggests useful + relationships between the metric applied at times t0, ..., tn and the + metric applied at time T. The effectiveness of temporal composition + depends: + + + on the usefulness in analysis of these relationships as applied to + the relevant A-frame components, and + + on the practical use of the corresponding relationships as applied + to metrics and to measurement methodologies. + + {Comment: for example, consider a metric for the expected flow + capacity across a path P during the five-minute period surrounding + the time T, and suppose further that we have the corresponding values + for each of the four previous five-minute periods t0, t1, t2, and t3. + The definition of such a metric might include a conjecture that the + flow capacity at time T can be estimated from a certain kind of + extrapolation from the values of t0, ..., t3. The definition would + further include a note on how a corresponding relation applies to + relevant A-frame components. + + Note: any (spatial or temporal) compositions involving flow capacity + are likely to be subtle, and temporal compositions are generally more + subtle than spatial compositions, so the reader should understand + that the foregoing example is intentionally naive.} + + When the definition of a metric includes a conjecture that the metric + across the path at a given time T is related to the metric across the + path for a set of other times, that conjecture constitutes a claim + that the metric exhibits temporal composition. The definition should + then include: + + + + +Paxson, et. al. Informational [Page 13] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + + the specific conjecture applied to the metric, + + a justification of the practical utility of the composition in + terms of making accurate measurements of the metric on the path, + and + + a justification of the usefulness of the composition in terms of + making analysis of the path using A-frame concepts more effective. + + +10. Issues related to Time + + +10.1. 
Clock Issues + + Measurements of time lie at the heart of many Internet metrics. + Because of this, it will often be crucial when designing a + methodology for measuring a metric to understand the different types + of errors and uncertainties introduced by imperfect clocks. In this + section we define terminology for discussing the characteristics of + clocks and touch upon related measurement issues which need to be + addressed by any sound methodology. + + The Network Time Protocol (NTP; RFC 1305) defines a nomenclature for + discussing clock characteristics, which we will also use when + appropriate [Mi92]. The main goal of NTP is to provide accurate + timekeeping over fairly long time scales, such as minutes to days, + while for measurement purposes often what is more important is + short-term accuracy, between the beginning of the measurement and the + end, or over the course of gathering a body of measurements (a + sample). This difference in goals sometimes leads to different + definitions of terminology as well, as discussed below. + + To begin, we define a clock's "offset" at a particular moment as the + difference between the time reported by the clock and the "true" time + as defined by UTC. If the clock reports a time Tc and the true time + is Tt, then the clock's offset is Tc - Tt. + + We will refer to a clock as "accurate" at a particular moment if the + clock's offset is zero, and more generally a clock's "accuracy" is + how close the absolute value of the offset is to zero. For NTP, + accuracy also includes a notion of the frequency of the clock; for + our purposes, we instead incorporate this notion into that of "skew", + because we define accuracy in terms of a single moment in time rather + than over an interval of time. + + A clock's "skew" at a particular moment is the frequency difference + (first derivative of its offset with respect to true time) between + the clock and true time. + + + + +Paxson, et. al. Informational [Page 14] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + As noted in RFC 1305, real clocks exhibit some variation in skew. + That is, the second derivative of the clock's offset with respect to + true time is generally non-zero. In keeping with RFC 1305, we define + this quantity as the clock's "drift". + + A clock's "resolution" is the smallest unit by which the clock's time + is updated. It gives a lower bound on the clock's uncertainty. + (Note that clocks can have very fine resolutions and yet be wildly + inaccurate.) Resolution is defined in terms of seconds. However, + resolution is relative to the clock's reported time and not to true + time, so for example a resolution of 10 ms only means that the clock + updates its notion of time in 0.01 second increments, not that this + is the true amount of time between updates. + + {Comment: Systems differ on how an application interface to the clock + reports the time on subsequent calls during which the clock has not + advanced. Some systems simply return the same unchanged time as + given for previous calls. Others may add a small increment to the + reported time to maintain monotone-increasing timestamps. For + systems that do the latter, we do *not* consider these small + increments when defining the clock's resolution. 
   They are instead an impediment to assessing the clock's resolution,
   since a natural method for doing so is to repeatedly query the clock
   to determine the smallest non-zero difference in reported times.}

   It is expected that a clock's resolution changes only rarely (for
   example, due to a hardware upgrade).

   There are a number of interesting metrics for which some natural
   measurement methodologies involve comparing times reported by two
   different clocks.  An example is one-way packet delay [AK97].  Here,
   the time required for a packet to travel through the network is
   measured by comparing the time reported by a clock at one end of the
   packet's path, corresponding to when the packet first entered the
   network, with the time reported by a clock at the other end of the
   path, corresponding to when the packet finished traversing the
   network.

   We are thus also interested in terminology for describing how two
   clocks C1 and C2 compare.  To do so, we introduce terms related to
   those above in which the notion of "true time" is replaced by the
   time as reported by clock C1.  For example, clock C2's offset
   relative to C1 at a particular moment is Tc2 - Tc1, the
   instantaneous difference in time reported by C2 and C1.  To
   disambiguate between the use of the terms to compare two clocks
   versus the use of the terms to compare to true time, we will in the
   former case use the phrase "relative".  So the offset defined
   earlier in this paragraph is the "relative offset" between C2 and
   C1.


Paxson, et. al.              Informational                     [Page 15]

RFC 2330           Framework for IP Performance Metrics        May 1998


   When comparing clocks, the analog of "resolution" is not "relative
   resolution", but instead "joint resolution", which is the sum of the
   resolutions of C1 and C2.  The joint resolution then indicates a
   conservative lower bound on the accuracy of any time intervals
   computed by subtracting timestamps generated by one clock from those
   generated by the other.

   If two clocks are "accurate" with respect to one another (their
   relative offset is zero), we will refer to the pair of clocks as
   "synchronized".  Note that clocks can be highly synchronized yet
   arbitrarily inaccurate in terms of how well they tell true time.
   This point is important because for many Internet measurements,
   synchronization between two clocks is more important than the
   accuracy of the clocks.  This is somewhat true of skew, too: as long
   as the absolute skew is not too great, minimal relative skew is more
   important, since non-zero relative skew can induce systematic trends
   in packet transit times measured by comparing timestamps produced by
   the two clocks.

   These distinctions arise because for Internet measurement what is
   often most important are differences in time as computed by
   comparing the output of two clocks.  The process of computing the
   difference removes any error due to clock inaccuracies with respect
   to true time; but it is crucial that the differences themselves
   accurately reflect differences in true time.

   Measurement methodologies will often begin with the step of assuring
   that two clocks are synchronized and have minimal skew and drift.
   {Comment: An effective way to assure these conditions (and also
   clock accuracy) is by using clocks that derive their notion of time
   from an external source, rather than only the host computer's clock.
   (These latter are often subject to large errors.)  It is further
   preferable
It is further preferable + that the clocks directly derive their time, for example by having + immediate access to a GPS (Global Positioning System) unit.} + + Two important concerns arise if the clocks indirectly derive their + time using a network time synchronization protocol such as NTP: + + + First, NTP's accuracy depends in part on the properties + (particularly delay) of the Internet paths used by the NTP peers, + and these might be exactly the properties that we wish to measure, + so it would be unsound to use NTP to calibrate such measurements. + + Second, NTP focuses on clock accuracy, which can come at the + expense of short-term clock skew and drift. For example, when a + host's clock is indirectly synchronized to a time source, if the + synchronization intervals occur infrequently, then the host will + sometimes be faced with the problem of how to adjust its current, + incorrect time, Ti, with a considerably different, more accurate + time it has just learned, Ta. Two general ways in which this is + + + +Paxson, et. al. Informational [Page 16] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + done are to either immediately set the current time to Ta, or to + adjust the local clock's update frequency (hence, its skew) so + that at some point in the future the local time Ti' will agree + with the more accurate time Ta'. The first mechanism introduces + discontinuities and can also violate common assumptions that + timestamps are monotone increasing. If the host's clock is set + backward in time, sometimes this can be easily detected. If the + clock is set forward in time, this can be harder to detect. The + skew induced by the second mechanism can lead to considerable + inaccuracies when computing differences in time, as discussed + above. + + To illustrate why skew is a crucial concern, consider samples of + one-way delays between two Internet hosts made at one minute + intervals. The true transmission delay between the hosts might + plausibly be on the order of 50 ms for a transcontinental path. If + the skew between the two clocks is 0.01%, that is, 1 part in 10,000, + then after 10 minutes of observation the error introduced into the + measurement is 60 ms. Unless corrected, this error is enough to + completely wipe out any accuracy in the transmission delay + measurement. Finally, we note that assessing skew errors between + unsynchronized network clocks is an open research area. (See [Pa97] + for a discussion of detecting and compensating for these sorts of + errors.) This shortcoming makes use of a solid, independent clock + source such as GPS especially desirable. + + +10.2. The Notion of "Wire Time" + + Internet measurement is often complicated by the use of Internet + hosts themselves to perform the measurement. These hosts can + introduce delays, bottlenecks, and the like that are due to hardware + or operating system effects and have nothing to do with the network + behavior we would like to measure. This problem is particularly + acute when timestamping of network events occurs at the application + level. + + In order to provide a general way of talking about these effects, we + introduce two notions of "wire time". These notions are only defined + in terms of an Internet host H observing an Internet link L at a + particular location: + + + For a given packet P, the 'wire arrival time' of P at H on L is + the first time T at which any bit of P has appeared at H's + observational position on L. + + + + + + +Paxson, et. al. 
Informational [Page 17] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + + For a given packet P, the 'wire exit time' of P at H on L is the + first time T at which all the bits of P have appeared at H's + observational position on L. + + Note that intrinsic to the definition is the notion of where on the + link we are observing. This distinction is important because for + large-latency links, we may obtain very different times depending on + exactly where we are observing the link. We could allow the + observational position to be an arbitrary location along the link; + however, we define it to be in terms of an Internet host because we + anticipate in practice that, for IPPM metrics, all such timing will + be constrained to be performed by Internet hosts, rather than + specialized hardware devices that might be able to monitor a link at + locations where a host cannot. This definition also takes care of + the problem of links that are comprised of multiple physical + channels. Because these multiple channels are not visible at the IP + layer, they cannot be individually observed in terms of the above + definitions. + + It is possible, though one hopes uncommon, that a packet P might make + multiple trips over a particular link L, due to a forwarding loop. + These trips might even overlap, depending on the link technology. + Whenever this occurs, we define a separate wire time associated with + each instance of P seen at H's position on the link. This definition + is worth making because it serves as a reminder that notions like + *the* unique time a packet passes a point in the Internet are + inherently slippery. + + The term wire time has historically been used to loosely denote the + time at which a packet appeared on a link, without exactly specifying + whether this refers to the first bit, the last bit, or some other + consideration. This informal definition is generally already very + useful, as it is usually used to make a distinction between when the + packet's propagation delays begin and cease to be due to the network + rather than the endpoint hosts. + + When appropriate, metrics should be defined in terms of wire times + rather than host endpoint times, so that the metric's definition + highlights the issue of separating delays due to the host from those + due to the network. + + We note that one potential difficulty when dealing with wire times + concerns IP fragments. It may be the case that, due to + fragmentation, only a portion of a particular packet passes by H's + location. Such fragments are themselves legitimate packets and have + well-defined wire times associated with them; but the larger IP + packet corresponding to their aggregate may not. + + + + +Paxson, et. al. Informational [Page 18] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + We also note that these notions have not, to our knowledge, been + previously defined in exact terms for Internet traffic. + Consequently, we may find with experience that these definitions + require some adjustment in the future. + + {Comment: It can sometimes be difficult to measure wire times. One + technique is to use a packet filter to monitor traffic on a link. + The architecture of these filters often attempts to associate with + each packet a timestamp as close to the wire time as possible. We + note however that one common source of error is to run the packet + filter on one of the endpoint hosts. 
In this case, it has been + observed that some packet filters receive for some packets timestamps + corresponding to when the packet was *scheduled* to be injected into + the network, rather than when it actually was *sent* out onto the + network (wire time). There can be a substantial difference between + these two times. A technique for dealing with this problem is to run + the packet filter on a separate host that passively monitors the + given link. This can be problematic however for some link + technologies. See [Pa97] for a discussion of the sorts of errors + packet filters can exhibit. Finally, we note that packet filters + will often only capture the first fragment of a fragmented IP packet, + due to the use of filtering on fields in the IP and transport + protocol headers. As we generally desire our measurement + methodologies to avoid the complexity of creating fragmented traffic, + one strategy for dealing with their presence as detected by a packet + filter is to flag that the measured traffic has an unusual form and + abandon further analysis of the packet timing.} + + +11. Singletons, Samples, and Statistics + + With experience we have found it useful to introduce a separation + between three distinct -- yet related -- notions: + + + By a 'singleton' metric, we refer to metrics that are, in a sense, + atomic. For example, a single instance of "bulk throughput + capacity" from one host to another might be defined as a singleton + metric, even though the instance involves measuring the timing of + a number of Internet packets. + + By a 'sample' metric, we refer to metrics derived from a given + singleton metric by taking a number of distinct instances + together. For example, we might define a sample metric of one-way + delays from one host to another as an hour's worth of + measurements, each made at Poisson intervals with a mean spacing + of one second. + + + + + + +Paxson, et. al. Informational [Page 19] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + + By a 'statistical' metric, we refer to metrics derived from a + given sample metric by computing some statistic of the values + defined by the singleton metric on the sample. For example, the + mean of all the one-way delay values on the sample given above + might be defined as a statistical metric. + + By applying these notions of singleton, sample, and statistic in a + consistent way, we will be able to reuse lessons learned about how to + define samples and statistics on various metrics. The orthogonality + among these three notions will thus make all our work more effective + and more intelligible by the community. + + In the remainder of this section, we will cover some topics in + sampling and statistics that we believe will be important to a + variety of metric definitions and measurement efforts. + + +11.1. Methods of Collecting Samples + + The main reason for collecting samples is to see what sort of + variations and consistencies are present in the metric being + measured. These variations might be with respect to different points + in the Internet, or different measurement times. When assessing + variations based on a sample, one generally makes an assumption that + the sample is "unbiased", meaning that the process of collecting the + measurements in the sample did not skew the sample so that it no + longer accurately reflects the metric's variations and consistencies. + + One common way of collecting samples is to make measurements + separated by fixed amounts of time: periodic sampling. 
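   {Comment: The singleton/sample/statistic layering above can be made
   concrete with a small C sketch (ours, purely illustrative):
   singleton one-way delay values are gathered into a sample, and a
   statistic such as the mean is then computed over that sample:

      #include <stddef.h>

      /* A statistical metric: the mean over a sample of singleton
         one-way delay values, in seconds. */
      double mean_delay(const double *singletons, size_t n)
      {
          double sum = 0.0;
          size_t i;

          for (i = 0; i < n; i++)
              sum += singletons[i];
          return (n > 0) ? sum / (double) n : 0.0;
      }

   Defining the singleton and the sampling discipline is the hard part;
   the statistic itself is often this simple.}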
Periodic + sampling is particularly attractive because of its simplicity, but it + suffers from two potential problems: + + + If the metric being measured itself exhibits periodic behavior, + then there is a possibility that the sampling will observe only + part of the periodic behavior if the periods happen to agree + (either directly, or if one is a multiple of the other). Related + to this problem is the notion that periodic sampling can be easily + anticipated. Predictable sampling is susceptible to manipulation + if there are mechanisms by which a network component's behavior + can be temporarily changed such that the sampling only sees the + modified behavior. + + The act of measurement can perturb what is being measured (for + example, injecting measurement traffic into a network alters the + congestion level of the network), and repeated periodic + perturbations can drive a network into a state of synchronization + (cf. [FJ94]), greatly magnifying what might individually be minor + effects. + + + +Paxson, et. al. Informational [Page 20] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + A more sound approach is based on "random additive sampling": samples + are separated by independent, randomly generated intervals that have + a common statistical distribution G(t) [BM92]. The quality of this + sampling depends on the distribution G(t). For example, if G(t) + generates a constant value g with probability one, then the sampling + reduces to periodic sampling with a period of g. + + Random additive sampling gains significant advantages. In general, + it avoids synchronization effects and yields an unbiased estimate of + the property being sampled. The only significant drawbacks with it + are: + + + it complicates frequency-domain analysis, because the samples do + not occur at fixed intervals such as assumed by Fourier-transform + techniques; and + + unless G(t) is the exponential distribution (see below), sampling + still remains somewhat predictable, as discussed for periodic + sampling above. + + +11.1.1. Poisson Sampling + + It can be proved that if G(t) is an exponential distribution with + rate lambda, that is + + G(t) = 1 - exp(-lambda * t) + + then the arrival of new samples *cannot* be predicted (and, again, + the sampling is unbiased). Furthermore, the sampling is + asymptotically unbiased even if the act of sampling affects the + network's state. Such sampling is referred to as "Poisson sampling". + It is not prone to inducing synchronization, it can be used to + accurately collect measurements of periodic behavior, and it is not + prone to manipulation by anticipating when new samples will occur. + + Because of these valuable properties, we in general prefer that + samples of Internet measurements are gathered using Poisson sampling. + {Comment: We note, however, that there may be circumstances that + favor use of a different G(t). For example, the exponential + distribution is unbounded, so its use will on occasion generate + lengthy spaces between sampling times. We might instead desire to + bound the longest such interval to a maximum value dT, to speed the + convergence of the estimation derived from the sampling. This could + be done by using + + G(t) = Unif(0, dT) + + + + + +Paxson, et. al. Informational [Page 21] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + that is, the uniform distribution between 0 and dT. 
This sampling, + of course, becomes highly predictable if an interval of nearly length + dT has elapsed without a sample occurring.} + + In its purest form, Poisson sampling is done by generating + independent, exponentially distributed intervals and gathering a + single measurement after each interval has elapsed. It can be shown + that if starting at time T one performs Poisson sampling over an + interval dT, during which a total of N measurements happen to be + made, then those measurements will be uniformly distributed over the + interval [T, T+dT]. So another way of conducting Poisson sampling is + to pick dT and N and generate N random sampling times uniformly over + the interval [T, T+dT]. The two approaches are equivalent, except if + N and dT are externally known. In that case, the property of not + being able to predict measurement times is weakened (the other + properties still hold). The N/dT approach has an advantage that + dealing with fixed values of N and dT can be simpler than dealing + with a fixed lambda but variable numbers of measurements over + variably-sized intervals. + + +11.1.2. Geometric Sampling + + Closely related to Poisson sampling is "geometric sampling", in which + external events are measured with a fixed probability p. For + example, one might capture all the packets over a link but only + record the packet to a trace file if a randomly generated number + uniformly distributed between 0 and 1 is less than a given p. + Geometric sampling has the same properties of being unbiased and not + predictable in advance as Poisson sampling, so if it fits a + particular Internet measurement task, it too is sound. See [CPB93] + for more discussion. + + +11.1.3. Generating Poisson Sampling Intervals + + To generate Poisson sampling intervals, one first determines the rate + lambda at which the singleton measurements will on average be made + (e.g., for an average sampling interval of 30 seconds, we have lambda + = 1/30, if the units of time are seconds). One then generates a + series of exponentially-distributed (pseudo) random numbers E1, E2, + ..., En. The first measurement is made at time E1, the next at time + E1+E2, and so on. + + One technique for generating exponentially-distributed (pseudo) + random numbers is based on the ability to generate U1, U2, ..., Un, + (pseudo) random numbers that are uniformly distributed between 0 and + 1. Many computers provide libraries that can do this. Given such + + + +Paxson, et. al. Informational [Page 22] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + Ui, to generate Ei one uses: + + Ei = -log(Ui) / lambda + + where log(Ui) is the natural logarithm of Ui. {Comment: This + technique is an instance of the more general "inverse transform" + method for generating random numbers with a given distribution.} + + Implementation details: + + There are at least three different methods for approximating Poisson + sampling, which we describe here as Methods 1 through 3. Method 1 is + the easiest to implement and has the most error, and method 3 is the + most difficult to implement and has the least error (potentially + none). + + Method 1 is to proceed as follows: + + 1. Generate E1 and wait that long. + 2. Perform a measurement. + 3. Generate E2 and wait that long. + 4. Perform a measurement. + 5. Generate E3 and wait that long. + 6. Perform a measurement ... 
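   {Comment: A minimal C sketch of Method 1, with drand48() standing in
   for a source of uniform (pseudo) random numbers and
   perform_measurement() a hypothetical placeholder:

      #include <math.h>
      #include <stdlib.h>
      #include <unistd.h>

      extern void perform_measurement(void);

      /* Approximate Poisson sampling at mean rate lambda (per
         second): generate an exponential interval, wait, measure,
         repeat n times. */
      void poisson_sample(double lambda, int n)
      {
          int i;

          for (i = 0; i < n; i++) {
              /* Ei = -log(Ui) / lambda; we use 1 - drand48() so the
                 argument of log() lies in (0, 1]. */
              double ei = -log(1.0 - drand48()) / lambda;

              usleep((unsigned int)(ei * 1e6));  /* wait Ei seconds */
              perform_measurement();
          }
      }

   Here usleep() is merely one way to wait; any timer with adequate
   resolution would do.}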
+ + The problem with this approach is that the "Perform a measurement" + steps themselves take time, so the sampling is not done at times E1, + E1+E2, etc., but rather at E1, E1+M1+E2, etc., where Mi is the amount + of time required for the i'th measurement. If Mi is very small + compared to 1/lambda then the potential error introduced by this + technique is likewise small. As Mi becomes a non-negligible fraction + of 1/lambda, the potential error increases. + + Method 2 attempts to correct this error by taking into account the + amount of time required by the measurements (i.e., the Mi's) and + adjusting the waiting intervals accordingly: + + 1. Generate E1 and wait that long. + 2. Perform a measurement and measure M1, the time it took to do so. + 3. Generate E2 and wait for a time E2-M1. + 4. Perform a measurement and measure M2 .. + + This approach works fine as long as E{i+1} >= Mi. But if E{i+1} < Mi + then it is impossible to wait the proper amount of time. (Note that + this case corresponds to needing to perform two measurements + simultaneously.) + + + + + +Paxson, et. al. Informational [Page 23] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + Method 3 is generating a schedule of measurement times E1, E1+E2, + etc., and then sticking to it: + + 1. Generate E1, E2, ..., En. + 2. Compute measurement times T1, T2, ..., Tn, as Ti = E1 + ... + Ei. + 3. Arrange that at times T1, T2, ..., Tn, a measurement is made. + + By allowing simultaneous measurements, Method 3 avoids the + shortcomings of Methods 1 and 2. If, however, simultaneous + measurements interfere with one another, then Method 3 does not gain + any benefit and may actually prove worse than Methods 1 or 2. + + For Internet phenomena, it is not known to what degree the + inaccuracies of these methods are significant. If the Mi's are much + less than 1/lambda, then any of the three should suffice. If the + Mi's are less than 1/lambda but perhaps not greatly less, then Method + 2 is preferred to Method 1. If simultaneous measurements do not + interfere with one another, then Method 3 is preferred, though it can + be considerably harder to implement. + + +11.2. Self-Consistency + + A fundamental requirement for a sound measurement methodology is that + measurement be made using as few unconfirmed assumptions as possible. + Experience has painfully shown how easy it is to make an (often + implicit) assumption that turns out to be incorrect. An example is + incorporating into a measurement the reading of a clock synchronized + to a highly accurate source. It is easy to assume that the clock is + therefore accurate; but due to software bugs, a loss of power in the + source, or a loss of communication between the source and the clock, + the clock could actually be quite inaccurate. + + This is not to argue that one must not make *any* assumptions when + measuring, but rather that, to the extent which is practical, + assumptions should be tested. One powerful way for doing so involves + checking for self-consistency. Such checking applies both to the + observed value(s) of the measurement *and the values used by the + measurement process itself*. A simple example of the former is that + when computing a round trip time, one should check to see if it is + negative. Since negative time intervals are non-physical, if it ever + is negative that finding immediately flags an error. 
11.2. Self-Consistency

A fundamental requirement for a sound measurement methodology is that measurements be made using as few unconfirmed assumptions as possible.  Experience has painfully shown how easy it is to make an (often implicit) assumption that turns out to be incorrect.  An example is incorporating into a measurement the reading of a clock synchronized to a highly accurate source.  It is easy to assume that the clock is therefore accurate; but due to software bugs, a loss of power in the source, or a loss of communication between the source and the clock, the clock could actually be quite inaccurate.

This is not to argue that one must not make *any* assumptions when measuring, but rather that, to the extent practical, assumptions should be tested.  One powerful way of doing so involves checking for self-consistency.  Such checking applies both to the observed value(s) of the measurement *and to the values used by the measurement process itself*.  A simple example of the former is that when computing a round trip time, one should check to see if it is negative.  Since negative time intervals are non-physical, if the value ever is negative, that finding immediately flags an error.  *These sorts of errors should then be investigated!*  It is crucial to determine where the error lies, because only by doing so diligently can we build up faith in a methodology's fundamental soundness.  For example, it could be that the round trip time is negative because during the measurement the clock was set backward in the process of synchronizing it with another source.  But it could also be that the

Paxson, et. al.              Informational                      [Page 24]

RFC 2330          Framework for IP Performance Metrics          May 1998

measurement program accesses uninitialized memory in one of its computations and, only very rarely, that leads to a bogus computation.  This second error is the more serious of the two if the same program is used by others to perform the same measurement, since then they too will suffer from incorrect results.  Furthermore, once uncovered, it can be completely fixed.

A more subtle example of testing for self-consistency comes from gathering samples of one-way Internet delays.  If one has a large sample of such delays, it may well be highly telling to, for example, fit a line to the pairs of (time of measurement, measured delay), to see if the resulting line has a clearly non-zero slope.  If so, a possible interpretation is that one of the clocks used in the measurements is skewed relative to the other.  Another interpretation is that the slope is actually due to genuine network effects.  Determining which is indeed the case will often be highly illuminating.  (See [Pa97] for a discussion of distinguishing between relative clock skew and genuine network effects.)  Furthermore, if making this check is part of the methodology, then a finding that the long-term slope is very near zero is positive evidence that the measurements are probably not biased by a difference in skew.
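{Comment: A simple least-squares fit suffices for a first version of this check, as in the following sketch; note that [Pa97] uses more robust techniques, so this is only an illustration.  Here t[] holds measurement times and d[] the corresponding one-way delays, both in seconds:}

    /* Sketch: least-squares slope of delay versus time, usable as
     * a crude indicator of relative clock skew.  For long traces,
     * subtracting the mean time from each t[i] first improves
     * numerical stability.
     */
    double
    fitted_slope(const double t[], const double d[], int n)
    {
        double st = 0.0, sd = 0.0, stt = 0.0, std = 0.0;
        int i;

        for (i = 0; i < n; ++i) {
            st += t[i];
            sd += d[i];
            stt += t[i] * t[i];
            std += t[i] * d[i];
        }
        return (n * std - st * sd) / (n * stt - st * st);
    }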
A final example illustrates checking the measurement process itself for self-consistency.  Above we outline Poisson sampling techniques, based on generating exponentially-distributed intervals.  A sound measurement methodology would include testing the generated intervals to see whether they are indeed exponentially distributed (and also to see if they suffer from correlation).  In the appendix we discuss and give C code for one such technique, a general-purpose, well-regarded goodness-of-fit test called the Anderson-Darling test.

Finally, we note that what is truly relevant for Poisson sampling of Internet metrics is often not when the measurements began but rather the wire times corresponding to the measurement process.  These could well be different, due to complications on the hosts used to perform the measurement.  Thus, even those with complete faith in their pseudo-random number generators and subsequent algorithms are encouraged to consider how they might test the assumptions of each measurement procedure as much as possible.

11.3. Defining Statistical Distributions

One way of describing a collection of measurements (a sample) is as a statistical distribution -- informally, as percentiles.  There are several slightly different ways of doing so.  In this section we adopt a standard definition to give uniformity to these descriptions.

Paxson, et. al.              Informational                      [Page 25]

RFC 2330          Framework for IP Performance Metrics          May 1998

The "empirical distribution function" (EDF) of a set of scalar measurements is a function F(x) which for any x gives the fractional proportion of the total measurements that were <= x.  If x is less than the minimum value observed, then F(x) is 0.  If it is greater than or equal to the maximum value observed, then F(x) is 1.

For example, given the 6 measurements:

    -2, 7, 7, 4, 18, -5

we have F(-8) = 0, F(-5) = 1/6, F(-5.0001) = 0, F(-4.999) = 1/6, F(7) = 5/6, F(18) = 1, and F(239) = 1.

Note that we can recover the different measured values, and how many times each occurred, from F(x) -- no information regarding the range in values is lost.  Summarizing measurements using histograms, on the other hand, in general loses information about the different values observed, so the EDF is preferred.

Using either the EDF or a histogram, however, we do lose information regarding the order in which the values were observed.  Whether this loss is potentially significant will depend on the metric being measured.

We will use the term "percentile" to refer to the smallest value of x for which F(x) >= a given percentage.  So the 50th percentile of the example above is 4, since F(4) = 3/6 = 50%; the 25th percentile is -2, since F(-5) = 1/6 < 25% and F(-2) = 2/6 >= 25%; the 100th percentile is 18; and the 0th percentile is -infinity, as is the 15th percentile.

Care must be taken when using percentiles to summarize a sample, because they can lend an unwarranted appearance of more precision than is really available.  Any such summary must include the sample size N, because any percentile difference finer than 1/N is below the resolution of the sample.

See [DS86] for more details regarding EDF's.

We close with a note on the common (and important!) notion of median.  In statistics, the median of a distribution is defined to be the point X for which the probability of observing a value <= X is equal to the probability of observing a value > X.  When estimating the median of a set of observations, the estimate depends on whether the number of observations, N, is odd or even:

Paxson, et. al.              Informational                      [Page 26]

RFC 2330          Framework for IP Performance Metrics          May 1998

If N is odd, then the 50th percentile as defined above is used as the estimated median.

If N is even, then the estimated median is the average of the central two observations; that is, if the observations are sorted in ascending order and numbered from 1 to N, where N = 2*K, then the estimated median is the average of the (K)'th and (K+1)'th observations.

Usually the term "estimated" is dropped from the phrase "estimated median", and this value is simply referred to as the "median".

11.4. Testing For Goodness-of-Fit

For some forms of measurement calibration we need to test whether a set of numbers is consistent with those numbers having been drawn from a particular distribution.  For example, to apply a self-consistency check to measurements made using a Poisson process, one test is to see whether the spacing between the sampling times does indeed reflect an exponential distribution; or, if the N/dT approach discussed above was used, whether the times are uniformly distributed across [T, T+dT].
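{Comment: As an illustration of the latter check, the appendix routine unif_A2_known_range() (given below) could be applied directly to the recorded sampling times; a sketch:}

    extern double unif_A2_known_range(double x[], int n,
                                      double min_val, double max_val);

    /* Sketch: test whether the n sampling times in t[] are
     * consistent with being uniform over [T, T+dT].  NB: t[] is
     * sorted as a side effect.  A negative return from the appendix
     * routine means the test could not be applied (n < 5, or times
     * outside the presumed range); a value below 0.05 means the
     * times are inconsistent with uniformity at 5% significance.
     */
    int
    sampling_times_look_uniform(double t[], int n, double T, double dT)
    {
        double sig = unif_A2_known_range(t, n, T, T + dT);

        return sig >= 0.05;
    }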
{Comment: There are at least three possible sets of values we could test: the scheduled packet transmission times, as determined by use of a pseudo-random number generator; user-level timestamps made just before or after the system call for transmitting the packet; and wire times for the packets as recorded using a packet filter.  All three of these are potentially informative: failures for the scheduled times to match an exponential distribution indicate inaccuracies in the random number generation; failures for the user-level times indicate inaccuracies in the timers used to schedule transmission; and failures for the wire times indicate inaccuracies in actually transmitting the packets, perhaps due to contention for a shared resource.}

There are a large number of statistical goodness-of-fit techniques for performing such tests.  See [DS86] for a thorough discussion.  That reference recommends the Anderson-Darling EDF test as being a good all-purpose test, as well as one that is especially good at detecting deviations from a given distribution in the lower and upper tails of the EDF.

It is important to understand that the nature of goodness-of-fit tests is that one first selects a "significance level", which is the probability that the test will erroneously declare that the EDF of a given set of measurements fails to match a particular distribution when in fact the measurements do indeed reflect that distribution.

Paxson, et. al.              Informational                      [Page 27]

RFC 2330          Framework for IP Performance Metrics          May 1998

Unless otherwise stated, IPPM goodness-of-fit tests are done using 5% significance.  Since an honest sample will erroneously fail such a test 5% of the time, this means that if the test is applied to 100 samples and about 5 of those samples are deemed to have failed, then the samples as a whole are still consistent with the distribution being tested.  If significantly more of the samples fail the test, then the assumption that the samples are consistent with the distribution being tested must be rejected.  If significantly fewer of the samples fail the test, then the samples have potentially been doctored too well to fit the distribution.  Similarly, some goodness-of-fit tests (including Anderson-Darling) can detect whether it is likely that a given sample was doctored.  We also use a significance of 5% for this case; that is, the test will report that a given honest sample is "too good to be true" 5% of the time, so if the test reports this finding significantly more often than one time out of twenty, it is an indication that something unusual is occurring.

The appendix gives sample C code for implementing the Anderson-Darling test, as well as further discussing its use.

See [Pa94] for a discussion of goodness-of-fit and closeness-of-fit tests in the context of network measurement.
12. Avoiding Stochastic Metrics

When defining metrics applying to a path, subpath, cloud, or other network element, we in general do not define them in stochastic terms (probabilities).  We instead prefer a deterministic definition.  So, for example, rather than defining a metric about a "packet loss probability between A and B", we would define a metric about a "packet loss rate between A and B".  (A measurement given by the first definition might be "0.73", and by the second "73 packets out of 100".)

We emphasize that the above distinction concerns the *definitions* of *metrics*.  It is not intended to apply to what sort of techniques we might use to analyze the results of measurements.

The reason for this distinction is as follows.  When definitions are made in terms of probabilities, there are often hidden assumptions in the definition about a stochastic model of the behavior being measured.  The fundamental goal in avoiding probabilities in our metric definitions is to avoid biasing our definitions by these hidden assumptions.

Paxson, et. al.              Informational                      [Page 28]

RFC 2330          Framework for IP Performance Metrics          May 1998

For example, an easy hidden assumption to make is that packet loss in a network component due to queueing overflows can be described as something that happens to any given packet with a particular probability.  In today's Internet, however, queueing drops are in fact usually *deterministic*, and assuming that they should be described probabilistically can obscure crucial correlations between queueing drops among a set of packets.  So it's better to explicitly note stochastic assumptions, rather than have them sneak into our definitions implicitly.

This does *not* mean that we abandon stochastic models for *understanding* network performance!  It only means that when defining IP metrics we avoid terms such as "probability" in favor of terms like "proportion" or "rate".  We will still use, for example, random sampling in order to estimate probabilities used by stochastic models related to the IP metrics.  We also do not rule out the possibility of stochastic metrics when they are truly appropriate (for example, perhaps to model transmission errors caused by certain types of line noise).

13. Packets of Type P

A fundamental property of many Internet metrics is that the value of the metric depends on the type of IP packet(s) used to make the measurement.  Consider an IP-connectivity metric: one obtains different results depending on whether one is interested in connectivity for packets destined for well-known TCP ports or unreserved UDP ports, or those with invalid IP checksums, or those with TTL's of 16, for example.  In some circumstances these distinctions will be highly interesting (for example, in the presence of firewalls, or RSVP reservations).

Because of this distinction, we introduce the generic notion of a "packet of type P", where in some contexts P will be explicitly defined (i.e., exactly what type of packet we mean), partially defined (e.g., "with a payload of B octets"), or left generic.  Thus we may talk about generic IP-type-P-connectivity or more specific IP-port-HTTP-connectivity.  Some metrics and methodologies may be fruitfully defined using generic type P definitions which are then made specific when performing actual measurements.

Whenever a metric's value depends on the type of the packets involved in the metric, the metric's name will include either a specific type or a phrase such as "type-P".  Thus we will not define an "IP-

Paxson, et. al.              Informational                      [Page 29]

RFC 2330          Framework for IP Performance Metrics          May 1998

connectivity" metric but instead an "IP-type-P-connectivity" metric and/or perhaps an "IP-port-HTTP-connectivity" metric.  This naming convention serves as an important reminder that one must be conscious of the exact type of traffic being measured.

A closely related note: it would be very useful to know if a given Internet component treats equally a class C of different types of packets.  If so, then any one of those types of packets can be used for subsequent measurement of the component.  This suggests we devise a metric or suite of metrics that attempt to determine C.

14. Internet Addresses vs. Hosts

When considering a metric for some path through the Internet, it is often natural to think about it as being for the path from Internet host H1 to host H2.  A definition in these terms, though, can be ambiguous, because Internet hosts can be attached to more than one network.  In this case, the result of the metric will depend on which of these networks is actually used.

Because of this ambiguity, usually such definitions should instead be defined in terms of Internet IP addresses.  For the common case of a unidirectional path through the Internet, we will use the term "Src" to denote the IP address of the beginning of the path, and "Dst" to denote the IP address of the end.

15. Standard-Formed Packets

Unless otherwise stated, all metric definitions that concern IP packets include an implicit assumption that the packet is *standard formed*.  A packet is standard formed if it meets all of the following criteria (a sketch of a partial checker follows the list):

+ Its length as given in the IP header corresponds to the size of the IP header plus the size of the payload.

+ It includes a valid IP header: the version field is 4 (later, we will expand this to include 6); the header length is >= 5; the checksum is correct.

+ It is not an IP fragment.

+ The source and destination addresses correspond to the hosts in question.

Paxson, et. al.              Informational                      [Page 30]

RFC 2330          Framework for IP Performance Metrics          May 1998

+ Either the packet possesses sufficient TTL to travel from the source to the destination if the TTL is decremented by one at each hop, or it possesses the maximum TTL of 255.

+ It does not contain IP options unless explicitly noted.

+ If a transport header is present, it too contains a valid checksum and other valid fields.

We further require that if a packet is described as having a "length of B octets", then 0 <= B <= 65535; and if B is the payload length in octets, then B <= (65535-IP header size in octets).
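{Comment: A sketch, not part of the framework, of checking a few of these criteria directly against a raw IPv4 header; the checksum computation, TTL-sufficiency, transport-header, and address checks are omitted for brevity:}

    #include <stdint.h>
    #include <stddef.h>

    /* Sketch: partial standard-formed check.  hdr points to the
     * first octet of the IP header; caplen is the number of octets
     * available.  Returns 1 if the checks pass, 0 otherwise.
     */
    static int
    partial_standard_formed_check(const uint8_t *hdr, size_t caplen)
    {
        size_t ihl, total_len;
        uint16_t frag_field;

        if (caplen < 20)
            return 0;
        if ((hdr[0] >> 4) != 4)                /* version must be 4 */
            return 0;
        ihl = (size_t) (hdr[0] & 0x0f) * 4;    /* header length >= 5 words */
        if (ihl < 20 || caplen < ihl)
            return 0;
        total_len = ((size_t) hdr[2] << 8) | hdr[3];
        if (total_len < ihl)                   /* must cover the header */
            return 0;
        frag_field = (uint16_t) ((hdr[6] << 8) | hdr[7]);
        if ((frag_field & 0x3fff) != 0)        /* MF set or offset != 0 */
            return 0;                          /* => an IP fragment */
        return 1;
    }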
So, for example, one might imagine defining an IP connectivity metric as "IP-type-P-connectivity for standard-formed packets with the IP TOS field set to 0", or, more succinctly, "IP-type-P-connectivity with the IP TOS field set to 0", since standard-formed is already implied by convention.

A particular type of standard-formed packet often useful to consider is the "minimal IP packet from A to B" -- this is an IP packet with the following properties:

+ It is standard-formed.

+ Its data payload is 0 octets.

+ It contains no options.

(Note that we do not define its protocol field, as different values may lead to different treatment by the network.)

When defining IP metrics we keep in mind that no packet smaller or simpler than this can be transmitted over a correctly operating IP network.

16. Acknowledgements

The comments of Brian Carpenter, Bill Cerveny, Padma Krishnaswamy, Jeff Sedayao, and Howard Stanislevic are appreciated.

17. Security Considerations

This document concerns definitions and concepts related to Internet measurement.  We discuss measurement procedures only in high-level terms, regarding principles that lend themselves to sound measurement.  As such, the topics discussed do not affect the security of the Internet or of applications which run on it.

Paxson, et. al.              Informational                      [Page 31]
RFC 2330          Framework for IP Performance Metrics          May 1998

That said, it should be recognized that conducting Internet measurements can raise both security and privacy concerns.  Active techniques, in which traffic is injected into the network, can be abused for denial-of-service attacks disguised as legitimate measurement activity.  Passive techniques, in which existing traffic is recorded and analyzed, can expose the contents of Internet traffic to unintended recipients.  Consequently, the definition of each metric and methodology must include a corresponding discussion of security considerations.

18. Appendix

Below we give routines written in C for computing the Anderson-Darling test statistic (A2) for determining whether a set of values is consistent with a given statistical distribution.  Externally, the two main routines of interest are:

    double exp_A2_known_mean(double x[], int n, double mean)
    double unif_A2_known_range(double x[], int n,
                               double min_val, double max_val)

Both take as their first argument, x, the array of n values to be tested.  (Upon return, the elements of x are sorted.)  The remaining parameters characterize the distribution to be used: either the mean (1/lambda), for an exponential distribution, or the lower and upper bounds, for a uniform distribution.  The names of the routines stress that these values must be known in advance, and *not* estimated from the data (for example, by computing its sample mean).  Estimating the parameters from the data *changes* the significance level of the test statistic.  While [DS86] gives alternate significance tables for some instances in which the parameters are estimated from the data, for our purposes we expect that we should indeed know the parameters in advance, since what we will be testing are generally values such as packet sending times that we wish to verify follow a known distribution.

Both routines return a significance level, as described earlier.  This is a value between 0 and 1.  The correct use of the routines is to pick in advance the threshold for the significance level to test; generally, this will be 0.05, corresponding to 5%, as also described above.  Subsequently, if the routines return a value strictly less than this threshold, then the data are deemed to be inconsistent with the presumed distribution, *subject to an error corresponding to the significance level*.  That is, for a significance level of 5%, 5% of the time data that are indeed drawn from the presumed distribution will be erroneously deemed inconsistent.

Paxson, et. al.              Informational                      [Page 32]

RFC 2330          Framework for IP Performance Metrics          May 1998

Thus, it is important to bear in mind that if these routines are used frequently, then one will indeed encounter occasional failures, even if the data are unblemished.

Another important point concerning significance levels is that it is unsound to compare them in order to determine which of two sets of values is a "better" fit to a presumed distribution.  Such testing should instead be done using "closeness-of-fit metrics" such as the lambda^2 metric described in [Pa94].

While the routines provided are for exponential and uniform distributions with known parameters, it is generally straightforward to write comparable routines for any distribution with known parameters.
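{Comment: For instance, a comparable routine for a Pareto distribution with known scale xm and shape a -- a sketch, not part of the appendix code, relying on the CDF transform explained next -- might read as follows, assuming it is compiled together with the compute_A2(), A2_significance(), and compare_double() routines given below:}

    /* Sketch: A^2 significance level for testing x[0..n-1] against
     * a Pareto distribution with known scale xm and shape a, whose
     * CDF is G(x) = 1 - (xm/x)^a for x >= xm.
     *
     * SIDE EFFECT: the x[0..n-1] are sorted upon return.
     */
    double
    pareto_A2_known_params(double x[], int n, double xm, double a)
    {
        int i;
        double A2;

        qsort(x, n, sizeof(x[0]), compare_double);

        /* Transform via the presumed CDF; if the presumption holds,
         * the transformed values are Unif(0,1).  Values at or below
         * xm fall outside (0,1), causing compute_A2 to return -1.
         */
        for (i = 0; i < n; ++i)
            x[i] = 1.0 - pow(xm / x[i], a);

        A2 = compute_A2(x, n);
        return A2_significance(A2);
    }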
The heart of the A2 tests lies in a statistic computed for testing whether a set of values is consistent with a uniform distribution between 0 and 1, which we term Unif(0, 1).  If we wish to test whether a set of values, X, is consistent with a given distribution G(x), we first compute

    Y = G_inverse(X)

If X is indeed distributed according to G(x), then Y will be distributed according to Unif(0, 1); so by testing Y for consistency with Unif(0, 1), we also test X for consistency with G(x).

We note, however, that the process of computing Y above might yield values of Y outside the range (0..1).  Such values should not occur if X is indeed distributed according to G(x), but easily can occur if it is not.  In the latter case, we need to avoid computing the central A2 statistic, since floating-point exceptions may occur if any of the values lie outside (0..1).  Accordingly, the routines check for this possibility, and if it is encountered, return a raw A2 statistic of -1.  The routine that converts the raw A2 statistic to a significance level likewise propagates this value, returning a significance level of -1.  So, any use of these routines must be prepared for a possible negative significance level.

The last important point regarding use of the A2 statistic concerns n, the number of values being tested.  If n < 5 then the test is not meaningful, and in this case a significance level of -1 is returned.

On the other hand, for "real" data the test *gains* power as n becomes larger.  It is well known in the statistics community that real data almost never exactly matches a theoretical distribution, even in cases such as rolling dice a great many times (see [Pa94] for a brief discussion and references).  The A2 test is sensitive enough that, for sufficiently large sets of real data, the test will almost always fail, because it will manage to detect slight imperfections in the fit of the data to the distribution.

Paxson, et. al.              Informational                      [Page 33]

RFC 2330          Framework for IP Performance Metrics          May 1998

For example, we have found that when testing 8,192 measured wire times for packets sent at Poisson intervals, the measurements almost always fail the A2 test.  On the other hand, testing 128 measurements failed at 5% significance only about 5% of the time, as expected.  Thus, in general, when the test fails, care must be taken to understand why it failed.

The remainder of this appendix gives C code for the routines mentioned above.

    /* Routines for computing the Anderson-Darling A2 test statistic.
     *
     * Implemented based on the description in "Goodness-of-Fit
     * Techniques," R. D'Agostino and M. Stephens, editors,
     * Marcel Dekker, Inc., 1986.
     */

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* Returns the raw A^2 test statistic for n sorted samples
     * z[0] .. z[n-1], for z ~ Unif(0,1).
     */
    extern double compute_A2(double z[], int n);

    /* Returns the significance level associated with an A^2 test
     * statistic value of A2, assuming no parameters of the tested
     * distribution were estimated from the data.
     */
    extern double A2_significance(double A2);

    /* Returns the A^2 significance level for testing n observations
     * x[0] .. x[n-1] against an exponential distribution with the
     * given mean.
     *
     * SIDE EFFECT: the x[0..n-1] are sorted upon return.
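     *
     * Example (hypothetical usage): for 128 generated sampling
     * intervals gap[0..127] intended to have a mean of 30 seconds,
     *
     *     double sig = exp_A2_known_mean(gap, 128, 30.0);
     *
     * a result 0 <= sig < 0.05 deems the intervals inconsistent
     * with the intended exponential distribution at 5% significance.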
+ */ + extern double exp_A2_known_mean(double x[], int n, double mean); + + /* Returns the A^2 significance level for testing n observations + * x[0] .. x[n-1] against the uniform distribution [min_val, max_val]. + * + * SIDE EFFECT: the x[0..n-1] are sorted upon return. + */ + extern double unif_A2_known_range(double x[], int n, + double min_val, double max_val); + + + + +Paxson, et. al. Informational [Page 34] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + /* Returns a pseudo-random number distributed according to an + * exponential distribution with the given mean. + */ + extern double random_exponential(double mean); + + + /* Helper function used by qsort() to sort double-precision + * floating-point values. + */ + static int + compare_double(const void *v1, const void *v2) + { + double d1 = *(double *) v1; + double d2 = *(double *) v2; + + if (d1 < d2) + return -1; + else if (d1 > d2) + return 1; + else + return 0; + } + + double + compute_A2(double z[], int n) + { + int i; + double sum = 0.0; + + if ( n < 5 ) + /* Too few values. */ + return -1.0; + + /* If any of the values are outside the range (0, 1) then + * fail immediately (and avoid a possible floating point + * exception in the code below). + */ + for (i = 0; i < n; ++i) + if ( z[i] <= 0.0 || z[i] >= 1.0 ) + return -1.0; + + /* Page 101 of D'Agostino and Stephens. */ + for (i = 1; i <= n; ++i) { + sum += (2 * i - 1) * log(z[i-1]); + sum += (2 * n + 1 - 2 * i) * log(1.0 - z[i-1]); + } + return -n - (1.0 / n) * sum; + } + + + +Paxson, et. al. Informational [Page 35] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + double + A2_significance(double A2) + { + /* Page 105 of D'Agostino and Stephens. */ + if (A2 < 0.0) + return A2; /* Bogus A2 value - propagate it. */ + + /* Check for possibly doctored values. */ + if (A2 <= 0.201) + return 0.99; + else if (A2 <= 0.240) + return 0.975; + else if (A2 <= 0.283) + return 0.95; + else if (A2 <= 0.346) + return 0.90; + else if (A2 <= 0.399) + return 0.85; + + /* Now check for possible inconsistency. */ + if (A2 <= 1.248) + return 0.25; + else if (A2 <= 1.610) + return 0.15; + else if (A2 <= 1.933) + return 0.10; + else if (A2 <= 2.492) + return 0.05; + else if (A2 <= 3.070) + return 0.025; + else if (A2 <= 3.880) + return 0.01; + else if (A2 <= 4.500) + return 0.005; + else if (A2 <= 6.000) + return 0.001; + else + return 0.0; + } + + double + exp_A2_known_mean(double x[], int n, double mean) + { + int i; + double A2; + + /* Sort the first n values. */ + qsort(x, n, sizeof(x[0]), compare_double); + + + +Paxson, et. al. Informational [Page 36] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + + /* Assuming they match an exponential distribution, transform + * them to Unif(0,1). + */ + for (i = 0; i < n; ++i) { + x[i] = 1.0 - exp(-x[i] / mean); + } + + /* Now make the A^2 test to see if they're truly uniform. */ + A2 = compute_A2(x, n); + return A2_significance(A2); + } + + double + unif_A2_known_range(double x[], int n, double min_val, double max_val) + { + int i; + double A2; + double range = max_val - min_val; + + /* Sort the first n values. */ + qsort(x, n, sizeof(x[0]), compare_double); + + /* Transform Unif(min_val, max_val) to Unif(0,1). */ + for (i = 0; i < n; ++i) + x[i] = (x[i] - min_val) / range; + + /* Now make the A^2 test to see if they're truly uniform. 
*/ + A2 = compute_A2(x, n); + return A2_significance(A2); + } + + double + random_exponential(double mean) + { + return -mean * log1p(-drand48()); + } + + + + + + + + + + + + + + + +Paxson, et. al. Informational [Page 37] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + +19. References + + [AK97] G. Almes and S. Kalidindi, "A One-way Delay Metric for IPPM", + Work in Progress, November 1997. + + [BM92] I. Bilinskis and A. Mikelsons, Randomized Signal Processing, + Prentice Hall International, 1992. + + [DS86] R. D'Agostino and M. Stephens, editors, Goodness-of-Fit + Techniques, Marcel Dekker, Inc., 1986. + + [CPB93] K. Claffy, G. Polyzos, and H-W. Braun, "Application of + Sampling Methodologies to Network Traffic Characterization," Proc. + SIGCOMM '93, pp. 194-203, San Francisco, September 1993. + + [FJ94] S. Floyd and V. Jacobson, "The Synchronization of Periodic + Routing Messages," IEEE/ACM Transactions on Networking, 2(2), pp. + 122-136, April 1994. + + [Mi92] Mills, D., "Network Time Protocol (Version 3) Specification, + Implementation and Analysis", RFC 1305, March 1992. + + [Pa94] V. Paxson, "Empirically-Derived Analytic Models of Wide-Area + TCP Connections," IEEE/ACM Transactions on Networking, 2(4), pp. + 316-336, August 1994. + + [Pa96] V. Paxson, "Towards a Framework for Defining Internet + Performance Metrics," Proceedings of INET '96, + ftp://ftp.ee.lbl.gov/papers/metrics-framework-INET96.ps.Z + + [Pa97] V. Paxson, "Measurements and Analysis of End-to-End Internet + Dynamics," Ph.D. dissertation, U.C. Berkeley, 1997, + ftp://ftp.ee.lbl.gov/papers/vp-thesis/dis.ps.gz. + + + + + + + + + + + + + + + + + + +Paxson, et. al. Informational [Page 38] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + +20. Authors' Addresses + + Vern Paxson + MS 50B/2239 + Lawrence Berkeley National Laboratory + University of California + Berkeley, CA 94720 + USA + + Phone: +1 510/486-7504 + EMail: vern@ee.lbl.gov + + + Guy Almes + Advanced Network & Services, Inc. + 200 Business Park Drive + Armonk, NY 10504 + USA + + Phone: +1 914/765-1120 + EMail: almes@advanced.org + + + Jamshid Mahdavi + Pittsburgh Supercomputing Center + 4400 5th Avenue + Pittsburgh, PA 15213 + USA + + Phone: +1 412/268-6282 + EMail: mahdavi@psc.edu + + + Matt Mathis + Pittsburgh Supercomputing Center + 4400 5th Avenue + Pittsburgh, PA 15213 + USA + + Phone: +1 412/268-3319 + EMail: mathis@psc.edu + + + + + + + + + + +Paxson, et. al. Informational [Page 39] + +RFC 2330 Framework for IP Performance Metrics May 1998 + + +21. Full Copyright Statement + + Copyright (C) The Internet Society (1998). All Rights Reserved. + + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. 
+ + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + + + + + + + + + + + + + + + + + + + + + + + + +Paxson, et. al. Informational [Page 40] + |