summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc5475.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc5475.txt')
-rw-r--r--doc/rfc/rfc5475.txt2579
1 files changed, 2579 insertions, 0 deletions
diff --git a/doc/rfc/rfc5475.txt b/doc/rfc/rfc5475.txt
new file mode 100644
index 0000000..1168e89
--- /dev/null
+++ b/doc/rfc/rfc5475.txt
@@ -0,0 +1,2579 @@
+
+
+
+
+
+
+Network Working Group T. Zseby
+Request for Comments: 5475 Fraunhofer FOKUS
+Category: Standards Track M. Molina
+ DANTE
+ N. Duffield
+ AT&T Labs - Research
+ S. Niccolini
+ NEC Europe Ltd.
+ F. Raspall
+ EPSC-UPC
+ March 2009
+
+
+ Sampling and Filtering Techniques for IP Packet Selection
+
+Status of This Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (c) 2009 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents in effect on the date of
+ publication of this document (http://trustee.ietf.org/license-info).
+ Please review these documents carefully, as they describe your rights
+ and restrictions with respect to this document.
+
+ This document may contain material from IETF Documents or IETF
+ Contributions published or made publicly available before November
+ 10, 2008. The person(s) controlling the copyright in some of this
+ material may not have granted the IETF Trust the right to allow
+ modifications of such material outside the IETF Standards Process.
+ Without obtaining an adequate license from the person(s) controlling
+ the copyright in such materials, this document may not be modified
+ outside the IETF Standards Process, and derivative works of it may
+ not be created outside the IETF Standards Process, except to format
+ it for publication as an RFC or to translate it into languages other
+ than English.
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 1]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+Abstract
+
+ This document describes Sampling and Filtering techniques for IP
+ packet selection. It provides a categorization of schemes and
+ defines what parameters are needed to describe the most common
+ selection schemes. Furthermore, it shows how techniques can be
+ combined to build more elaborate packet Selectors. The document
+ provides the basis for the definition of information models for
+ configuring selection techniques in Metering Processes and for
+ reporting the technique in use to a Collector.
+
+Table of Contents
+
+ 1. Introduction ....................................................3
+ 1.1. Conventions Used in This Document ..........................4
+ 2. PSAMP Documents Overview ........................................4
+ 3. Terminology .....................................................4
+ 3.1. Observation Points, Packet Streams, and Packet Content .....4
+ 3.2. Selection Process ..........................................5
+ 3.3. Reporting ..................................................7
+ 3.4. Metering Process ...........................................7
+ 3.5. Exporting Process ..........................................8
+ 3.6. PSAMP Device ...............................................8
+ 3.7. Collector ..................................................8
+ 3.8. Selection Methods ..........................................8
+ 4. Categorization of Packet Selection Techniques ..................11
+ 5. Sampling .......................................................12
+ 5.1. Systematic Sampling .......................................13
+ 5.2. Random Sampling ...........................................14
+ 5.2.1. n-out-of-N Sampling ................................14
+ 5.2.2. Probabilistic Sampling .............................14
+ 6. Filtering ......................................................16
+ 6.1. Property Match Filtering ..................................16
+ 6.2. Hash-Based Filtering ......................................19
+ 6.2.1. Application Examples for Coordinated Packet
+ Selection ..........................................19
+ 6.2.2. Desired Properties of Hash Functions ...............21
+ 6.2.3. Security Considerations for Hash Functions .........22
+ 6.2.4. Choice of Hash Function ............................26
+ 7. Parameters for the Description of Selection Techniques .........29
+ 7.1. Description of Sampling Techniques ........................30
+ 7.2. Description of Filtering Techniques .......................31
+ 8. Composite Techniques ...........................................34
+ 8.1. Cascaded Filtering->Sampling or Sampling->Filtering .......34
+ 8.2. Stratified Sampling .......................................34
+ 9. Security Considerations ........................................35
+ 10. Contributors ..................................................36
+ 11. Acknowledgments ...............................................36
+
+
+
+Zseby, et al. Standards Track [Page 2]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ 12. References ....................................................36
+ 12.1. Normative References .....................................36
+ 12.2. Informative References ...................................36
+ Appendix A. Hash Functions ........................................40
+ A.1 IP Shift-XOR (IPSX) Hash Function..............................40
+ A.2 BOB Hash Function..............................................41
+
+1. Introduction
+
+ There are two main drivers for the evolution in measurement
+ infrastructures and their underlying technology. First, network data
+ rates are increasing, with a concomitant growth in measurement data.
+ Second, the growth is compounded by the demand of measurement-based
+ applications for increasingly fine-grained traffic measurements.
+ Devices that perform the measurements, require increasingly
+ sophisticated and resource-intensive measurement capabilities,
+ including the capture of packet headers or even parts of the payload,
+ and classification for flow analysis. All these factors can lead to
+ an overwhelming amount of measurement data, resulting in high demands
+ on resources for measurement, storage, transfer, and post processing.
+
+ The sustained capture of network traffic at line rate can be
+ performed by specialized measurement hardware. However, the cost of
+ the hardware and the measurement infrastructure required to
+ accommodate the measurements preclude this as a ubiquitous approach.
+ Instead, some form of data reduction at the point of measurement is
+ necessary.
+
+ This can be achieved by an intelligent packet selection through
+ Sampling or Filtering. Another way to reduce the amount of data is
+ to use aggregation techniques (not addressed in this document). The
+ motivation for Sampling is to select a representative subset of
+ packets that allow accurate estimates of properties of the unsampled
+ traffic to be formed. The motivation for Filtering is to remove all
+ packets that are not of interest. Aggregation combines data and
+ allows compact pre-defined views of the traffic. Examples of
+ applications that benefit from packet selection are given in
+ [RFC5474]. Aggregation techniques are out of scope of this document.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 3]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+1.1. Conventions Used in This Document
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [RFC2119].
+
+2. PSAMP Documents Overview
+
+ This document is one out of a series of documents from the PSAMP
+ group.
+
+ [RFC5474]: "A Framework for Packet Selection and Reporting" describes
+ the PSAMP framework for network elements to select subsets of packets
+ by statistical and other methods, and to export a stream of reports
+ on the selected packets to a Collector.
+
+ RFC 5475 (this document): "Sampling and Filtering Techniques for IP
+ Packet Selection" describes the set of packet selection techniques
+ supported by PSAMP.
+
+ [RFC5476]: "Packet Sampling (PSAMP) Protocol Specifications"
+ specifies the export of packet information from a PSAMP Exporting
+ Process to a PSAMP Collecting Process.
+
+ [RFC5477]: "Information Model for Packet Sampling Exports" defines an
+ information and data model for PSAMP.
+
+3. Terminology
+
+ The PSAMP terminology defined here is fully consistent with all terms
+ listed in [RFC5474] but includes additional terms required for the
+ description of packet selection methods. An architecture overview
+ and possible configurations of PSAMP elements can be found in
+ [RFC5474]. PSAMP terminology also aims at consistency with terms
+ used in [RFC3917]. The relationship between PSAMP and IPFIX terms is
+ described in [RFC5474].
+
+ In the PSAMP documents, all defined PSAMP terms are written
+ capitalized. This document uses the same convention.
+
+3.1. Observation Points, Packet Streams, and Packet Content
+
+ * Observation Point
+
+ An Observation Point [RFC5101] is a location in the network where
+ packets can be observed. Examples include:
+
+ (i) A line to which a probe is attached;
+
+
+
+Zseby, et al. Standards Track [Page 4]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ (ii) a shared medium, such as an Ethernet-based LAN;
+
+ (iii) a single port of a router, or set of interfaces (physical
+ or logical) of a router;
+
+ (iv) an embedded measurement subsystem within an interface.
+
+ Note that one Observation Point may be a superset of several other
+ Observation Points. For example, one Observation Point can be an
+ entire line card. This would be the superset of the individual
+ Observation Points at the line card's interfaces.
+
+ * Observed Packet Stream
+
+ The Observed Packet Stream is the set of all packets observed at
+ the Observation Point.
+
+ * Packet Stream
+
+ A Packet Stream denotes a set of packets from the Observed Packet
+ Stream that flows past some specified point within the Metering
+ Process. An example of a Packet Stream is the output of the
+ selection process. Note that packets selected from a stream,
+ e.g., by Sampling, do not necessarily possess a property by which
+ they can be distinguished from packets that have not been
+ selected. For this reason, the term "stream" is favored over
+ "flow", which is defined as a set of packets with common
+ properties [RFC3917].
+
+ * Packet Content
+
+ The Packet Content denotes the union of the packet header (which
+ includes link layer, network layer, and other encapsulation
+ headers) and the packet payload. At some Observation Points, the
+ link header information may not be available.
+
+3.2. Selection Process
+
+ * Selection Process
+
+ A Selection Process takes the Observed Packet Stream as its input
+ and selects a subset of that stream as its output.
+
+
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 5]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ * Selection State
+
+ A Selection Process may maintain state information for use by the
+ Selection Process. At a given time, the Selection State may
+ depend on packets observed at and before that time, and other
+ variables. Examples include:
+
+ (i) sequence numbers of packets at the input of Selectors;
+
+ (ii) a timestamp of observation of the packet at the Observation
+ Point;
+
+ (iii) iterators for pseudorandom number generators;
+
+ (iv) hash values calculated during selection;
+
+ (v) indicators of whether the packet was selected by a given
+ Selector.
+
+ Selection Processes may change portions of the Selection State as
+ a result of processing a packet. Selection State for a packet is
+ to reflect the state after processing the packet.
+
+ * Selector
+
+ A Selector defines what kind of action a Selection Process
+ performs on a single packet of its input. If selected, the packet
+ becomes an element of the output Packet Stream.
+
+ The Selector can make use of the following information in
+ determining whether a packet is selected:
+
+ (i) the Packet Content;
+
+ (ii) information derived from the packet's treatment at the
+ Observation Point;
+
+ (iii) any Selection State that may be maintained by the Selection
+ Process.
+
+ * Composite Selector
+
+ A Composite Selector is an ordered composition of Selectors, in
+ which the output Packet Stream issuing from one Selector forms the
+ input Packet Stream to the succeeding Selector.
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 6]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ * Primitive Selector
+
+ A Selector is primitive if it is not a Composite Selector.
+
+ * Selection Sequence
+
+ From all the packets observed at an Observation Point, only a few
+ packets are selected by one or more Selectors. The Selection
+ Sequence is a unique value per Observation Domain describing the
+ Observation Point and the Selector IDs through which the packets
+ are selected.
+
+3.3. Reporting
+
+ * Packet Reports
+
+ Packet Reports comprise a configurable subset of a packet's input
+ to the Selection Process, including the Packet's Content,
+ information relating to its treatment (for example, the output
+ interface), and its associated Selection State (for example, a
+ hash of the Packet's Content).
+
+ * Report Interpretation
+
+ Report Interpretation comprises subsidiary information, relating
+ to one or more packets, that is used for interpretation of their
+ Packet Reports. Examples include configuration parameters of the
+ Selection Process.
+
+ * Report Stream
+
+ The Report Stream is the output of a Metering Process, comprising
+ two distinguished types of information: Packet Reports and Report
+ Interpretation.
+
+3.4. Metering Process
+
+ A Metering Process selects packets from the Observed Packet Stream
+ using a Selection Process, and produces as output a Report Stream
+ concerning the selected packets.
+
+ The PSAMP Metering Process can be viewed as analogous to the IPFIX
+ Metering Process [RFC5101], which produces Flow Records as its
+ output, with the difference that the PSAMP Metering Process always
+ contains a Selection Process. The relationship between PSAMP and
+ IPFIX is further described in [RFC5477] and [RFC5474].
+
+
+
+
+
+Zseby, et al. Standards Track [Page 7]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+3.5. Exporting Process
+
+ * Exporting Process
+
+ An Exporting Process sends, in the form of Export Packets, the
+ output of one or more Metering Processes to one or more
+ Collectors.
+
+ * Export Packet
+
+ An Export Packet is a combination of Report Interpretations and/or
+ one or more Packet Reports that are bundled by the Exporting
+ Process into an Export Packet for exporting to a Collector.
+
+3.6. PSAMP Device
+
+ * PSAMP Device
+
+ A PSAMP Device is a device hosting at least an Observation Point,
+ a Metering Process (which includes a Selection Process), and an
+ Exporting Process. Typically, corresponding Observation Point(s),
+ Metering Process(es), and Exporting Process(es) are colocated at
+ this device, for example, at a router.
+
+3.7. Collector
+
+ * Collector
+
+ A Collector receives a Report Stream exported by one or more
+ Exporting Processes. In some cases, the host of the Metering
+ and/or Exporting Processes may also serve as the Collector.
+
+3.8. Selection Methods
+
+ * Filtering
+
+ A filter is a Selector that selects a packet deterministically
+ based on the Packet Content, or its treatment, or functions of
+ these occurring in the Selection State. Two examples are:
+
+ (i) Property Match Filtering: A packet is selected if a
+ specific field in the packet equals a predefined value.
+
+ (ii) Hash-based Selection: A Hash Function is applied to the
+ Packet Content, and the packet is selected if the result
+ falls in a specified range.
+
+
+
+
+
+Zseby, et al. Standards Track [Page 8]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ * Sampling
+
+ A Selector that is not a filter is called a Sampling operation.
+ This reflects the intuitive notion that if the selection of a
+ packet cannot be determined from its content alone, there must be
+ some type of Sampling taking place. Sampling operations can be
+ divided into two subtypes:
+
+ (i) Content-independent Sampling, which does not use Packet
+ Content in reaching Sampling decisions. Examples include
+ systematic Sampling, and uniform pseudorandom Sampling
+ driven by a pseudorandom number whose generation is
+ independent of Packet Content. Note that in content-
+ independent Sampling, it is not necessary to access the
+ Packet Content in order to make the selection decision.
+
+ (ii) Content-dependent Sampling, in which the Packet Content is
+ used in reaching selection decisions. An application is
+ pseudorandom selection according to a probability that
+ depends on the contents of a packet field, e.g., Sampling
+ packets with a probability dependent on their TCP/UDP port
+ numbers. Note that this is not a Filter.
+
+ * Hash Domain
+
+ A Hash Domain is a subset of the Packet Content and the packet
+ treatment, viewed as an N-bit string for some positive integer N.
+
+ * Hash Range
+
+ A Hash Range is a set of M-bit strings for some positive integer M
+ that defines the range of values that the result of the hash
+ operation can take.
+
+ * Hash Function
+
+ A Hash Function defines a deterministic mapping from the Hash
+ Domain into the Hash Range.
+
+ * Hash Selection Range
+
+ A Hash Selection Range is a subset of the Hash Range. The packet
+ is selected if the action of the Hash Function on the Hash Domain
+ for the packet yields a result in the Hash Selection Range.
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 9]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ * Hash-based Selection
+
+ A Hash-based Selection is Filtering specified by a Hash Domain, a
+ Hash Function, a Hash Range, and a Hash Selection Range.
+
+ * Approximative Selection
+
+ Selectors in any of the above categories may be approximated by
+ operations in the same or another category for the purposes of
+ implementation. For example, uniform pseudorandom Sampling may be
+ approximated by Hash-based Selection, using a suitable Hash
+ Function and Hash Domain. In this case, the closeness of the
+ approximation depends on the choice of Hash Function and Hash
+ Domain.
+
+ * Population
+
+ A Population is a Packet Stream or a subset of a Packet Stream. A
+ Population can be considered as a base set from which packets are
+ selected. An example is all packets in the Observed Packet Stream
+ that are observed within some specified time interval.
+
+ * Population Size
+
+ The Population Size is the number of all packets in the
+ Population.
+
+ * Sample Size
+
+ The Sample Size is a number of packets selected from the
+ Population by a Selector.
+
+ * Configured Selection Fraction
+
+ The Configured Selection Fraction is the expected ratio of the
+ Sample Size to the Population Size, as based on the configured
+ selection parameters.
+
+ * Attained Selection Fraction
+
+ The Attained Selection Fraction is the ratio of the actual Sample
+ Size to the Population Size. For some Sampling methods, the
+ Attained Selection Fraction can differ from the Configured
+ Selection Fraction due to, for example, the inherent statistical
+ variability in Sampling decisions of probabilistic Sampling and
+ Hash-based Selection. Nevertheless, for large Population Sizes
+ and properly configured Selectors, the Attained Selection Fraction
+ usually approaches the Configured Selection Fraction.
+
+
+
+Zseby, et al. Standards Track [Page 10]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+4. Categorization of Packet Selection Techniques
+
+ Packet selection techniques generate a subset of packets from an
+ Observed Packet Stream at an Observation Point. We distinguish
+ between Sampling and Filtering.
+
+ Sampling is targeted at the selection of a representative subset of
+ packets. The subset is used to infer knowledge about the whole set
+ of observed packets without processing them all. The selection can
+ depend on packet position, and/or on Packet Content, and/or on
+ (pseudo) random decisions.
+
+ Filtering selects a subset with common properties. This is used if
+ only a subset of packets is of interest. The properties can be
+ directly derived from the Packet Content, or depend on the treatment
+ given by the router to the packet. Filtering is a deterministic
+ operation. It depends on Packet Content or router treatment. It
+ never depends on packet position or on (pseudo) random decisions.
+
+ Note that a common technique to select packets is to compute a Hash
+ Function on some bits of the packet header and/or content and to
+ select it if the hash value falls in the Hash Selection Range. Since
+ hashing is a deterministic operation on the Packet Content, it is a
+ Filtering technique according to our categorization. Nevertheless,
+ Hash Functions are sometimes used to emulate random Sampling.
+ Depending on the chosen input bits, the Hash Function, and the Hash
+ Selection Range, this technique can be used to emulate the random
+ selection of packets with a given probability p. It is also a
+ powerful technique to consistently select the same packet subset at
+ multiple Observation Points [DuGr00].
+
+ The following table gives an overview of the schemes described in
+ this document and their categorization. X means that the
+ characteristic applies to the selection scheme. (X) denotes schemes
+ for which content-dependent and content-independent variants exist.
+ For instance, Property Match Filtering is typically based on Packet
+ Content and therefore is content dependent. But as explained in
+ Section 6.1, it may also depend on router state and then would be
+ independent of the content. It easily can be seen that only schemes
+ with both properties, content dependence and deterministic selection,
+ are considered as Filters.
+
+
+
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 11]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ Selection Scheme | Deterministic | Content -| Category
+ | Selection | Dependent|
+ ------------------------+---------------+----------+----------
+ Systematic | X | _ | Sampling
+ Count-based | | |
+ ------------------------+---------------+----------+----------
+ Systematic | X | - | Sampling
+ Time-based | | |
+ ------------------------+---------------+----------+----------
+ Random | - | - | Sampling
+ n-out-of-N | | |
+ ------------------------+---------------+----------+----------
+ Random | - | - | Sampling
+ uniform probabilistic | | |
+ ------------------------+---------------+----------+----------
+ Random | - | (X) | Sampling
+ non-uniform probabil. | | |
+ ------------------------+---------------+----------+----------
+ Random | - | (X) | Sampling
+ non-uniform Flow-State | | |
+ ------------------------+---------------+----------+----------
+ Property Match | X | (X) | Filtering
+ Filtering | | |
+ ------------------------+---------------+----------+----------
+ Hash function | X | X | Filtering
+ ------------------------+---------------+----------+----------
+
+ The categorization just introduced is mainly useful for the
+ definition of an information model describing Primitive Selectors.
+ More complex selection techniques can be described through the
+ composition of cascaded Sampling and Filtering operations. For
+ example, a packet selection that weights the selection probability on
+ the basis of the packet length can be described as a cascade of a
+ Filtering and a Sampling scheme. However, this descriptive approach
+ is not intended to be rigid: if a common and consolidated selection
+ practice turns out to be too complex to be described as a composition
+ of the mentioned building blocks, an ad hoc description can be
+ specified instead and added as a new scheme to the information model.
+
+5. Sampling
+
+ The deployment of Sampling techniques aims at the provisioning of
+ information about a specific characteristic of the parent Population
+ at a lower cost than a full census would demand. In order to plan a
+ suitable Sampling strategy, it is therefore crucial to determine the
+ needed type of information and the desired degree of accuracy in
+ advance.
+
+
+
+
+Zseby, et al. Standards Track [Page 12]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ First of all, it is important to know the type of metric that should
+ be estimated. The metric of interest can range from simple packet
+ counts [JePP92] up to the estimation of whole distributions of flow
+ characteristics (e.g., packet sizes) [ClPB93].
+
+ Second, the required accuracy of the information and with this, the
+ confidence that is aimed at, should be known in advance. For
+ instance, for usage-based accounting the required confidence for the
+ estimation of packet counters can depend on the monetary value that
+ corresponds to the transfer of one packet. That means that a higher
+ confidence could be required for expensive packet flows (e.g.,
+ premium IP service) than for cheaper flows (e.g., best effort). The
+ accuracy requirements for validating a previously agreed quality can
+ also vary extremely with the customer demands. These requirements
+ are usually determined by the service level agreement (SLA).
+
+ The Sampling method and the parameters in use must be clearly
+ communicated to all applications that use the measurement data. Only
+ with this knowledge a correct interpretation of the measurement
+ results can be ensured.
+
+ Sampling methods can be characterized by the Sampling algorithm, the
+ trigger type used for starting a Sampling interval, and the length of
+ the Sampling interval. These parameters are described here in
+ detail. The Sampling algorithm describes the basic process for
+ selection of samples. In accordance to [AmCa89] and [ClPB93], we
+ define the following basic Sampling processes.
+
+5.1. Systematic Sampling
+
+ Systematic Sampling describes the process of selecting the start
+ points and the duration of the selection intervals according to a
+ deterministic function. This can be for instance the periodic
+ selection of every k-th element of a trace but also the selection of
+ all packets that arrive at predefined points in time. Even if the
+ selection process does not follow a periodic function (e.g., if the
+ time between the Sampling intervals varies over time), we consider
+ this as systematic Sampling as long as the selection is
+ deterministic.
+
+ The use of systematic Sampling always involves the risk of biasing
+ the results. If the systematics in the Sampling process resemble
+ systematics in the observed stochastic process (occurrence of the
+ characteristic of interest in the network), there is a high
+ probability that the estimation will be biased. Systematics in the
+ observed process might not be known in advance.
+
+
+
+
+
+Zseby, et al. Standards Track [Page 13]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ Here only equally spaced schemes are considered, where triggers for
+ Sampling are periodic, either in time or in packet count. All
+ packets occurring in a selection interval (either in time or packet
+ count) beyond the trigger are selected.
+
+ Systematic count-based
+ In systematic count-based Sampling, the start and stop triggers for
+ the Sampling interval are defined in accordance to the spatial packet
+ position (packet count).
+
+ Systematic time-based
+ In systematic time-based Sampling, time-based start and stop triggers
+ are used to define the Sampling intervals. All packets are selected
+ that arrive at the Observation Point within the time intervals
+ defined by the start and stop triggers (i.e., arrival time of the
+ packet is larger than the start time and smaller than the stop time).
+
+ Both schemes are content-independent selection schemes. Content-
+ dependent deterministic Selectors are categorized as filters.
+
+5.2. Random Sampling
+
+ Random Sampling selects the starting points of the Sampling intervals
+ in accordance to a random process. The selection of elements is an
+ independent experiment. With this, unbiased estimations can be
+ achieved. In contrast to systematic Sampling, random Sampling
+ requires the generation of random numbers. One can differentiate two
+ methods of random Sampling: n-out-of-N Sampling and probabilistic
+ Sampling.
+
+5.2.1. n-out-of-N Sampling
+
+ In n-out-of-N Sampling, n elements are selected out of the parent
+ Population that consists of N elements. One example would be to
+ generate n different random numbers in the range [1,N] and select all
+ packets that have a packet position equal to one of the random
+ numbers. For this kind of Sampling, the Sample Size n is fixed.
+
+5.2.2. Probabilistic Sampling
+
+ In probabilistic Sampling, the decision whether or not an element is
+ selected is made in accordance to a predefined selection probability.
+ An example would be to flip a coin for each packet and select all
+ packets for which the coin showed the head. For this kind of
+ Sampling, the Sample Size can vary for different trials. The
+ selection probability does not necessarily have to be the same for
+ each packet. Therefore, we distinguish between uniform probabilistic
+ Sampling (with the same selection probability for all packets) and
+
+
+
+Zseby, et al. Standards Track [Page 14]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ non-uniform probabilistic Sampling (where the selection probability
+ can vary for different packets).
+
+5.2.2.1. Uniform Probabilistic Sampling
+
+ For Uniform Probabilistic Sampling, packets are selected
+ independently with a uniform probability p. This Sampling can be
+ count-driven, and is sometimes referred to as geometric random
+ Sampling, since the difference in count between successive selected
+ packets is an independent random variable with a geometric
+ distribution of mean 1/p. A time-driven analog, exponential random
+ Sampling, has the time between triggers exponentially distributed.
+
+ Both geometric and exponential random Sampling are examples of what
+ is known as additive random Sampling, defined as Sampling where the
+ intervals or counts between successive samples are independent
+ identically distributed random variables.
+
+5.2.2.2. Non-Uniform Probabilistic Sampling
+
+ This is a variant of Probabilistic Sampling in which the Sampling
+ probabilities can depend on the selection process input. This can be
+ used to weight Sampling probabilities in order, e.g., to boost the
+ chance of Sampling packets that are rare but are deemed important.
+ Unbiased estimators for quantitative statistics are recovered by
+ re-normalization of sample values; see [HT52].
+
+5.2.2.3. Non-Uniform Flow State Dependent Sampling
+
+ Another type of Sampling that can be classified as probabilistic
+ Non-Uniform is closely related to the flow concept as defined in
+ [RFC3917], and it is only used jointly with a flow monitoring
+ function (IPFIX Metering Process). Packets are selected, dependent
+ on a Selection State. The point, here, is that the Selection State
+ is determined also by the state of the flow the packet belongs to
+ and/or by the state of the other flows currently being monitored by
+ the associated flow monitoring function. An example for such an
+ algorithm is the "sample and hold" method described in [EsVa01]:
+
+ - If a packet accounts for a Flow Record that already exists in the
+ IPFIX flow recording process, it is selected (i.e., the Flow Record
+ is updated).
+
+ - If a packet doesn't account for any existing Flow Record, it is
+ selected with probability p. If it has been selected, a new Flow
+ Record has to be created.
+
+
+
+
+
+Zseby, et al. Standards Track [Page 15]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ A further algorithm that fits into the category of non-uniform flow
+ state dependent Sampling is described in [Moli03].
+
+ This type of Sampling is content dependent because the identification
+ of the flow the packet belongs to requires analyzing part of the
+ Packet Content. If the packet is selected, then it is passed as an
+ input to the IPFIX monitoring function (this is called "Local Export"
+ in [RFC5474]). Selecting the packet depending on the state of a flow
+ cache is useful when memory resources of the flow monitoring function
+ are scarce (i.e., there is no room to keep all the flows that have
+ been scheduled for monitoring).
+
+5.2.2.4. Configuration of Non-Uniform Probabilistic and Flow State
+ Sampling
+
+ Many different specific methods can be grouped under the terms
+ non-uniform probabilistic and flow state Sampling. Dependent on the
+ Sampling goal and the implemented scheme, a different number and type
+ of input parameters are required to configure such a scheme.
+
+ Some concrete proposals for such methods exist from the research
+ community (e.g., [EsVa01], [DuLT01], [Moli03]). Some of these
+ proposals are still in an early stage and need further investigations
+ to prove their usefulness and applicability. It is not our aim to
+ indicate preference among these methods. Instead, we only describe
+ here the basic methods and leave the specification of explicit
+ schemes and their parameters up to vendors (e.g., as an extension of
+ the information model).
+
+6. Filtering
+
+ Filtering is the deterministic selection of packets based on the
+ Packet Content, the treatment of the packet at the Observation Point,
+ or deterministic functions of these occurring in the Selection State.
+ The packet is selected if these quantities fall into a specified
+ range. The role of Filtering, as the word itself suggest, is to
+ separate all the packets having a certain property from those not
+ having it. A distinguishing characteristic from Sampling is that the
+ selection decision does not depend on the packet position in time or
+ in space, or on a random process.
+
+ We identify and describe in the following two Filtering techniques.
+
+6.1. Property Match Filtering
+
+ With this Filtering method, a packet is selected if specific fields
+ within the packet and/or properties of the router state equal a
+ predefined value. Possible filter fields are all IPFIX flow
+
+
+
+Zseby, et al. Standards Track [Page 16]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ attributes specified in [RFC5102]. Further fields can be defined by
+ proposing new information elements or defining vendor-specific
+ extensions.
+
+ A packet is selected if Field=Value. Masks and ranges are only
+ supported to the extent to which [RFC5102] allows them, e.g., by
+ providing explicit fields like the netmasks for source and
+ destination addresses.
+
+ AND operations are possible by concatenating filters, thus producing
+ a composite selection operation. In this case, the ordering in which
+ the Filtering happens is implicitly defined (outer filters come after
+ inner filters). However, as long as the concatenation is on filters
+ only, the result of the cascaded filter is independent from the
+ order, but the order may be important for implementation purposes, as
+ the first filter will have to work at a higher rate. In any case, an
+ implementation is not constrained to respect the filter ordering, as
+ long as the result is the same, and it may even implement the
+ composite Filtering in one single step.
+
+ OR operations are not supported with this basic model. More
+ sophisticated filters (e.g., supporting bitmasks, ranges, or OR
+ operations) can be realized as vendor-specific schemes.
+
+ All IPFIX flow attributes defined in [RFC5102] can be used for
+ Property Match Filtering. Further information elements can be easily
+ defined. Property match operations should be available for different
+ protocol portions of the packet header:
+
+ (i) IP header (excluding options in IPv4, stacked headers in
+ IPv6)
+
+ (ii) transport protocol header (e.g., TCP, UDP)
+
+ (iii) encapsulation headers (e.g., the MPLS label stack, if
+ present)
+
+ When the PSAMP Device offers Property Match Filtering, and, in its
+ usual capacity other than in performing PSAMP functions, identifies
+ or processes information from IP, transport protocol or encapsulation
+ protocols, then the information should be made available for
+ Filtering. For example, when a PSAMP Device routes based on
+ destination IP address, that field should be made available for
+ Filtering. Conversely, a PSAMP Device that does not route is not
+ expected to be able to locate an IP address within a packet, or make
+ it available for Filtering, although it may do so.
+
+
+
+
+
+Zseby, et al. Standards Track [Page 17]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ Since packet encryption conceals the real values of encrypted fields,
+ Property Match Filtering must be configurable to ignore encrypted
+ packets, when detected.
+
+ The Selection Process may support Filtering based on the properties
+ of the router state:
+
+ (i) Ingress interface at which a packet arrives equals a
+ specified value
+
+ (ii) Egress interface to which a packet is routed to equals a
+ specified value
+
+ (iii) Packet violated Access Control List (ACL) on the router
+
+ (iv) Failed Reverse Path Forwarding (RPF)
+
+ (v) Failed Resource Reservation Protocol (RSVP)
+
+ (vi) No route found for the packet
+
+ (vii) Origin Border Gateway Protocol (BGP) Autonomous System (AS)
+ [RFC4271] equals a specified value or lies within a given
+ range
+
+ (viii) Destination BGP AS equals a specified value or lies within
+ a given range
+
+ Packets that match the failed Reverse Path Forwarding (RPF) condition
+ are packets for which ingress Filtering failed as defined in
+ [RFC3704].
+
+ Packets that match the failed Resource Reservation Protocol (RSVP)
+ condition are packets that do not fulfill the RSVP specification as
+ defined in [RFC2205].
+
+ Router architectural considerations may preclude some information
+ concerning the packet treatment being available at line rate for
+ selection of packets. For example, the Selection Process may not be
+ implemented in the fast path that is able to access router state at
+ line rate. However, when Filtering follows Sampling (or some other
+ selection operation) in a Composite Selector, the rate of the Packet
+ Stream output from the sampler and input to the filter may be
+ sufficiently slow that the filter could select based on router state.
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 18]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+6.2. Hash-Based Filtering
+
+ A Hash Function h maps the Packet Content c, or some portion of it,
+ onto a Hash Range R. The packet is selected if h(c) is an element of
+ S, which is a subset of R called the Hash Selection Range. Thus,
+ Hash-Based selection is a particular case of Filtering. The object
+ is selected if c is in inv(h(S)). But for desirable Hash Functions,
+ the inverse image inv(h(S)) will be extremely complex, and hence h
+ would not be expressible as, say, a Property Match Filter or a simple
+ combination of these.
+
+ Hash-based Selection is mainly used to realize a coordinated packet
+ selection. That means that the same packets are selected at
+ different Observation Points. This is useful for instance to observe
+ the path (trajectory) that a packet took through the network or to
+ apply packet selection to passive one-way measurements.
+
+ A prerequisite for the method to work and to ensure interoperability
+ is that the same Hash Function with the same parameters (e.g., input
+ vector) is used at the Observation Points.
+
+ A consistent packet selection is also possible with Property Match
+ Filtering. Nevertheless, Hash-based Selection can be used to
+ approximate a random selection. The desired statistical properties
+ are discussed in Section 6.2.2.
+
+ In the following subsections, we give some application examples for
+ coordinated packet selection.
+
+6.2.1. Application Examples for Coordinated Packet Selection
+
+6.2.1.1. Trajectory Sampling
+
+ Trajectory Sampling is the consistent selection of a subset of
+ packets at either all of a set of Observation Points or none of them.
+ Trajectory Sampling is realized by Hash-based Selection if all
+ Observation Points in the set use a common Hash Function, Hash
+ Domain, and Selection Range. The Hash Domain comprises all or part
+ of the Packet Content that is invariant along the packet path.
+ Fields such as Time-to-Live, which is decremented per hop, and header
+ CRC [RFC1624], which is recalculated per hop, are thus excluded from
+ the Hash Domain. The Hash Domain needs to be wider than just a flow
+ key, if packets are to be selected quasi-randomly within flows.
+
+ The trajectory (or path) followed by a packet is reconstructed from
+ PSAMP reports on it that reach a Collector. Reports on a given
+ packet originating from different Observation Points are associated
+ by matching a label from the reports. The label may comprise that
+
+
+
+Zseby, et al. Standards Track [Page 19]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ portion of the invariant Packet Content that is reported, or possibly
+ some digest of the invariant Packet Content that is inserted into the
+ packet report at the Observation Point. Such a digest may be
+ constructed by applying a second Hash Function to the invariant
+ Packet Content. The reconstruction of trajectories and methods for
+ dealing with possible ambiguities due to label collisions (identical
+ labels reported for different packets) and potential loss of reports
+ in transmission are dealt with in [DuGr00], [DuGG02], and [DuGr04].
+
+ Applications of trajectory Sampling include (i) estimation of the
+ network path matrix, i.e., the traffic intensities according to
+ network path, broken down by flow key; (ii) detection of routing
+ loops, as indicated by self-intersecting trajectories; (iii) passive
+ performance measurement: prematurely terminating trajectories
+ indicate packet loss, packet one-way delay can be determined if
+ reports include (synchronized) timestamps of packet arrival at the
+ Observation Point; and (iv) network attack tracing, of the actual
+ paths taken by attack packets with spoofed source addresses.
+
+6.2.1.2. Passive One-Way Measurements
+
+ Coordinated packet selection can be applied for instance to one-way
+ delay measurements in order to reduce the required resources. In
+ one-way delay measurements, packets are collected at different
+ Observation Points in the network. A packet digest is generated for
+ each packet that helps to identify the packet. The packet digest and
+ the arrival time of the packet at the Observation Point are reported
+ to a process that calculates the delay. The delay is calculated by
+ subtracting the arrival time of the same packet at the Observation
+ Points (e.g., [ZsZC01]). With high data rates, capturing all packets
+ can require a lot of resources for storage, transfer, and processing.
+ To reduce resource consumption, packet selection methods can be
+ applied. But for such selection techniques, it has to be ensured
+ that the same packets are collected at different Observation Points.
+ Hash-based Selection provides this feature.
+
+6.2.1.3. Generation of Pseudorandom Numbers
+
+ Although pseudorandom number generators with well-understood
+ properties have been developed, they may not be the method of choice
+ in settings where computational resources are scarce. A convenient
+ alternative is to use Hash Functions of Packet Content as a source of
+ randomness. The hash (suitably re-normalized) is a pseudorandom
+ variate in the interval [0,1]. Other schemes may use packet fields
+ in iterators for pseudorandom numbers. However, the statistical
+ properties of an ideal packet selection law (such as independent
+
+
+
+
+
+Zseby, et al. Standards Track [Page 20]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ Sampling for different packets, or independence on Packet Content)
+ may not be exactly rendered by an implementation, but only
+ approximately so.
+
+ Use of Packet Content to generate pseudorandom variates shares with
+ non-uniform probabilistic Sampling (see Section 5.2.2.2 above) the
+ property that selection decisions depend on Packet Content. However,
+ there is a fundamental difference between the two. In the former
+ case, the content determines pseudorandom variates. In the latter
+ case, the content only determines the selection probabilities:
+ selection could then proceed, e.g., by use of random variates
+ obtained by an independent pseudorandom number generator.
+
+6.2.2. Desired Properties of Hash Functions
+
+ Here we formulate desired properties for Hash Functions. For this,
+ we have to distinguish whether a Hash Function is used for packet
+ selection or just as a packet digest. The main focus of this
+ document is on packet selection. Nevertheless, we also provide some
+ requirements for the use of Hash Functions as packet digest.
+
+ First of all, we need to define suitable input fields from the
+ packet. In accordance to [DuGr00], input field should be:
+
+ - invariant on the path
+ - variable among packets
+
+ Only if the input fields are the same at different Observation Points
+ is it possible to recognize the packet. The input fields should be
+ variable among packets in order to distribute the hash results over
+ the selection range.
+
+6.2.2.1. Requirements for Packet Selection
+
+ In accordance to considerations in [MoND05] and [Henk08], we define
+ the following desired properties of Hash Functions used for packet
+ selection:
+
+ (i) Speed: The Hash Function has to be applied to each packet
+ that traverses the Observation Point. Therefore, it has to
+ be fast in order to cope with the high packet rates. In
+ the ideal case, the hash operation should not influence the
+ performance on the PSAMP Device.
+
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 21]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ (ii) Uniformity: The Hash Function h should have good mixing
+ properties, in the sense that small changes in the input
+ (e.g., the flipping of a single bit) cause large changes in
+ the output (many bits change). Then any local clump of
+ values of c is spread widely over R by h, and so the
+ distribution of h(c) is fairly uniform even if the
+ distribution of c is not. Then the Attained Selection
+ Fraction gets close to the Configured Selection Fraction
+ (#S/#R), which can be tuned by choice of S.
+
+ (iii) Unbiasedness: The selection decision should be as
+ independent of packet attributes as possible. The set of
+ selected packets should not be biased towards a specific
+ type of packets.
+
+ (iv) Representativeness of sample: The sample should be as
+ representative as possible for the observed traffic.
+
+ (v) Non-linearity: The function should not be linear. This
+ increases the mixing properties (uniformity criterion). In
+ addition to this, it decreases the predictability of the
+ output and therefore the vulnerabilities against attacks.
+
+ (vi) Robustness against vulnerabilities: The Hash Function
+ should be robust against attacks. Potential
+ vulnerabilities are described in Section 6.2.3.
+
+6.2.2.2. Requirements for Packet Digesting
+
+ For digesting Packet Content for inclusion in a reported label, the
+ most important property is a low collision frequency. A secondary
+ requirement is the ability to accept variable-length input, in order
+ to allow inclusion of maximal amount of packet as input. Execution
+ speed is of secondary importance, since the digest need only be
+ formed from selected packets.
+
+6.2.3. Security Considerations for Hash Functions
+
+ A concern for Hash-based Selection is whether some large set of
+ related packets could be disproportionately sampled, i.e., that the
+ Attained Selection Fraction is significantly different from the
+ Configured Selection Fraction. This can happen either
+
+ (i) through unanticipated behavior in the Hash Function, or
+
+ (ii) because the packets had been deliberately crafted to have
+ this property.
+
+
+
+
+Zseby, et al. Standards Track [Page 22]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ The first point underlines the importance of using a Hash Function
+ with good mixing properties. For this, the statistical properties of
+ candidate Hash Functions need to be evaluated. Since the hash output
+ depends on the traffic mix, the evaluation should be done preferably
+ on up-to-date packet traces from the network in which the Hash-based
+ Selection will be deployed.
+
+ However, Hash Functions that perform well on typical traffic may not
+ be sufficiently strong to withstand attacks specifically targeted
+ against them. Such potential attacks have been described in
+ [GoRe07].
+
+ In the following subsections, we point out different potential attack
+ scenarios. We encourage the use of standardized Hash Functions.
+ Therefore, we assume that the Hash Function itself is public and
+ hence known to an attacker.
+
+ Nevertheless, we also assume the possibility of using a private input
+ parameter for the Hash Function that is kept secret. Such an input
+ parameter can for instance be attached to the hash input before the
+ hash operation is applied. With this, at least parts of the hash
+ operation remain secret.
+
+ For the attack scenarios, we assume that an attacker uses its
+ knowledge of the Hash Function to craft packets that are then
+ dispatched, either as the attack itself or to elicit further
+ information that can be used to refine the attack.
+
+ Two scenarios are considered. In the first scenario, the attacker
+ has no knowledge about whether or not the crafted packets are
+ selected. In the second scenario, the attacker uses some knowledge
+ of Sampling outcomes. The means by which this might be acquired is
+ discussed below. Some additional attacks that involve tampering with
+ Export Packets in transit, as opposed to attacking the PSAMP Device,
+ are discussed in [GoRe07].
+
+6.2.3.1. Vulnerabilities of Hash-Based Selection without Knowledge of
+ Selection Outcomes
+
+ (i) The Hash Function does not use a private parameter.
+
+ If no private input parameter is used, potential attackers can easily
+ calculate which packets result in which hash values. If the
+ selection range is public, an attacker can craft packets whose
+ selection properties are known in advance. If the selection range is
+ private, an attacker cannot determine whether a crafted packet is
+ selected. However, by computing the hash on different trial crafted
+ packets, and selecting those yielding a given hash value, the
+
+
+
+Zseby, et al. Standards Track [Page 23]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ attacker can construct an arbitrarily large set of distinct packets
+ with a common selection properties, i.e., packets that will be either
+ all selected or all not selected. This can be done whatever the
+ strength of the Hash Function.
+
+ (ii) The Hash Function is not cryptographically strong.
+
+ If the Hash Function is not cryptographically strong, it may be
+ possible to construct sequences of distinct packets with the common
+ selection property even if a private parameter is used.
+
+ An example is the standard CRC-32 Hash Function used with a private
+ modulus (but without a private string post-pended to the input). It
+ has weak mixing properties for low-order bits. Consequently, simply
+ by incrementing the hash input, one obtains distinct packets whose
+ hashes mostly fall in a narrow range, and hence are likely commonly
+ selected; see [GoRe07].
+
+ Suitable parameterization of the Hash Function can make such attacks
+ more difficult. For example, post-pending a private string to the
+ input before hashing with CRC-32 will give stronger mixing properties
+ over all bits of the input. However, with a Hash Function, such as
+ CRC-32, that is not cryptographically strong, the possibility of
+ discovering a method to construct packet sets with the common
+ selected property cannot be ruled out, even when a private modulus or
+ post-pended string is used.
+
+6.2.3.2. Vulnerabilities of Hash-Based Selection Using Knowledge of
+ Selection Outcomes
+
+ Knowledge of the selection outcomes of crafted packets can be used by
+ an attacker to more easily construct sets of packets that are
+ disproportionately sampled and/or are commonly selected. For this,
+ the attacker does not need any a priori knowledge about the Hash
+ Function or selection range.
+
+ There are several ways an attacker might acquire this knowledge about
+ the selection outcome:
+
+ (i) Billing Reports: If samples are used for billing purposes,
+ then the selection outcomes of packets may be able to be
+ inferred by correlating a crafted Packet Stream with the
+ billing reports that it generates. However, the rate at
+ which knowledge of selection outcomes can be acquired
+ depends on the temporal and spatial granularity of the
+ billing reports; being slower the more aggregated the
+ reports are.
+
+
+
+
+Zseby, et al. Standards Track [Page 24]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ (ii) Feedback from an Intrusion Detection System: e.g., a
+ botmaster adversary learns if his packets were detected by
+ the intrusion detection system by seeing if one of his bots
+ is blocked by the network.
+
+ (iii) Observation of the Report Stream: Export Packets sent
+ across a public network may be eavesdropped on by an
+ adversary. Encryption of the Export Packets provides only
+ a partial defense, since it may be possible to infer the
+ selection outcomes of packets by correlating a crafted
+ Packet Stream with the occurrence (not the content) of
+ packets in the export stream that it generates. The rate
+ at which such knowledge could be acquired is limited by the
+ temporal resolution at which reports can be associated with
+ packets, e.g., due to processing and propagation
+ variability, and difficulty in distinguishing report on
+ attack packets from those of background traffic, if
+ present. The association between packets and their reports
+ on which this depends could be removed by padding Export
+ Packets to a constant length and sending them at a constant
+ rate.
+
+ We now turn to attacks that can exploit knowledge of selection
+ outcomes. First, with a non-cryptographic Hash Function, knowledge
+ of selection outcomes for a trial stream may be used to further craft
+ a packet set with the common selection property. This has been
+ demonstrated for the modular hash f(x) = a x + b mod k, for private
+ parameters a, b, and k. With Sampling rate p, knowledge of the
+ Sampling outcomes of roughly 2/p is sufficient for the attack to
+ succeed, independent of the values of a, b, and k. With knowledge of
+ the selection outcomes of a larger number of packets, the parameters
+ a, b, and k can be determined; see [GoRe07].
+
+ A cryptographic Hash Function employing a private parameter and
+ operating in one of the pseudorandom function modes specified above
+ is not vulnerable to these attacks, even if the selection range is
+ known.
+
+6.2.3.3. Vulnerabilities to Replay Attacks
+
+ Since Hash-based Selection is deterministic, any packet or set of
+ packets with known selection properties can be replayed into a
+ network and experience the same selection outcomes provide the Hash
+ Function and its parameters are not changed. Repetition of a single
+ packet may be noticeable to other measurement methods if employed
+ (e.g., collection of flow statistics), whereas a set of distinct
+ packets that appears statistically similar to regular traffic may be
+ less noticeable.
+
+
+
+Zseby, et al. Standards Track [Page 25]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ Replay attacks may be mitigated by repeated changing of Hash Function
+ parameters. This also prevents attacks that exploit knowledge of
+ Sampling outcomes, at least if the parameters are changed at least as
+ fast as the knowledge can be acquired by an attacker. In order to
+ preserve the ability to perform trajectory Sampling, parameter change
+ would have to be simultaneous (or approximately so) across all
+ Observation Points.
+
+6.2.4. Choice of Hash Function
+
+ The specific choice of Hash Function represents a trade-off between
+ complexity and ease of implementation. Ideally, a cryptographically
+ strong Hash Function employing a private parameter and operating in
+ pseudorandom function mode as specified above would be used, yielding
+ a good emulation of a random packet selection at a target Sampling
+ rate, and giving maximal robustness against the attacks described in
+ the previous section. Unfortunately, there is currently no single
+ Hash Function that fulfills all the requirements.
+
+ As detailed in Section 6.2.3, only cryptographic Hash Functions
+ employing a private parameter operating in pseudorandom function mode
+ are sufficiently strong to withstand the range of conceivable
+ attacks. For example, fixed- or variable-length inputs could be
+ hashed using a block cipher (like Advanced Encryption Standard (AES))
+ in cipher-block-chaining mode. Fixed-length inputs could also be
+ hashed using an iterated cryptographic Hash Function (like MD5 or
+ SHA1), with a private initial vector. For variable-length inputs, an
+ iterated cryptographic Hash Function (like MD5 or SHA1) should employ
+ private string post-pended to the data in addition to a private
+ initial vector. For more details, see the "append-cascade"
+ construction of [BeCK96]. We encourage the use of such
+ cryptographically strong Hash Functions wherever possible.
+
+ However, a problem with using such functions is the low performance.
+ As shown for instance in [Henk08], the computation times for MD5 and
+ SHA are about 7-10 times higher compared to non-cryptographic
+ functions. The difference increases for small hash input lengths.
+
+ Therefore, it is not assumed that all PSAMP Devices will be capable
+ of applying a cryptographically strong Hash Function to every packet
+ at line rate. For this reason, the Hash Functions listed in this
+ section will be of a weaker variety. Future protocol extensions that
+ employ stronger Hash Functions are highly welcome.
+
+ Comparisons of Hash Functions for packet selection and packet
+ digesting with regard to various criteria can be found in [MoND05]
+ and [Henk08].
+
+
+
+
+Zseby, et al. Standards Track [Page 26]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+6.2.4.1. Hash Functions for Packet Selection
+
+ If Hash-based packet Selection is applied, the BOB function MUST be
+ used for packet selection operations in order to be compliant with
+ PSAMP. The specification of BOB is given in the appendix. Both the
+ parameter (the init value) and the selection range should be kept
+ private. The initial vector of the Hash Function MUST be
+ configurable out of band to prevent security breaches like exposure
+ of the initial vector content.
+
+ Other functions, such as CRC-32 and IPSX, MAY be used. The IPSX
+ function is described in the appendix, and the CRC-32 function is
+ described in [RFC1141]. If CRC-32 is used, the input should first be
+ post-pended with a private string that acts as a parameter, and the
+ modulus of the CRC should also be kept private.
+
+ IPSX is simple to implement and was correspondingly about an order of
+ magnitude faster to execute per packet than BOB or CRC-32 [MoND05].
+
+ All three Hash Functions evaluated showed relatively poor uniformity
+ with 16-byte input that was drawn from only invariant fields in the
+ IP and TCP/UDP headers (i.e., header fields that do not change from
+ hop to hop). IPSX is inherently limited to 16 bytes.
+
+ BOB and CRC-32 exhibit noticeably better uniformity when 4 or more
+ bytes from the payload are also included in the input [MoND05]. Also
+ with other criteria BOB performed quite well [Henk08].
+
+ Although the characteristics have been checked for different traffic
+ traces, results cannot be generalized to arbitrary traffic. Since
+ Hash-based Selection is a deterministic function on the Packet
+ Content, it can always be biased towards packets with specific
+ attributes. Furthermore, it should be noted that all Hash Functions
+ were evaluated only for IPv4.
+
+ None of these Hash Functions is recommended for cryptographic
+ purposes. Please also note that the use of a private parameter only
+ slightly reduces the vulnerabilities against attacks. As shown in
+ Section 6.2.3, functions that are not cryptographically strong (e.g.,
+ BOB and CRC) cannot prevent attackers from crafting packets that are
+ disproportionally selected even if a private parameter is used and
+ the selection range is kept secret.
+
+ As described in Section 6.2.2, the input bytes for the Hash Function
+ need to be invariant along the path the packet is traveling. Only
+ with this it is ensured that the same packets are selected at
+
+
+
+
+
+Zseby, et al. Standards Track [Page 27]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ different Observation Points. Furthermore, they should have a high
+ variability between different packets to generate a high variation in
+ the Hash Range. An evaluation of the variability of different packet
+ header fields can be found in [DuGr00], [HeSZ08], and [Henk08].
+
+ If a Hash-based Selection with the BOB function is used with IPv4
+ traffic, the following input bytes MUST be used.
+
+ - IP identification field
+
+ - Flags field
+
+ - Fragment offset
+
+ - Source IP address
+
+ - Destination IP address
+
+ - A configurable number of bytes from the IP payload, starting at
+ a configurable offset
+
+ Due to the lack of suitable IPv6 packet traces, all candidate Hash
+ Functions in [DuGr00], [MoND05], and [Henk08] were evaluated only for
+ IPv4. Due to the IPv6 header fields and address structure, it is
+ expected that there is less randomness in IPv6 packet headers than in
+ IPv4 headers. Nevertheless, the randomness of IPv6 traffic has not
+ yet been evaluated sufficiently to get any evidence. In addition to
+ this, IPv6 traffic profiles may change significantly in the future
+ when IPv6 is used by a broader community.
+
+ If a Hash-based Selection with the BOB function is used with IPv6
+ traffic, the following input bytes MUST be used.
+
+ - Payload length (2 bytes)
+
+ - Byte number 10,11,14,15,16 of the IPv6 source address
+
+ - Byte number 10,11,14,15,16 of the IPv6 destination address
+
+ - A configurable number of bytes from the IP payload, starting at
+ a configurable offset. It is recommended to use at least 4
+ bytes from the IP payload.
+
+ The payload itself is not changing during the path. Even if some
+ routers process some extension headers, they are not going to strip
+ them from the packet. Therefore, the payload length is invariant
+ along the path. Furthermore, it usually differs for different
+ packets. The IPv6 address has 16 bytes. The first part is the
+
+
+
+Zseby, et al. Standards Track [Page 28]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ network part and contains low variation. The second part is the host
+ part and contains higher variation. Therefore, the second part of
+ the address is used. Nevertheless, the uniformity has not been
+ checked for IPv6 traffic.
+
+6.2.4.2. Hash Functions Suitable for Packet Digesting
+
+ For this purpose also the BOB function SHOULD be used. Other
+ functions (such as CRC-32) MAY be used. Among the functions capable
+ of operating with variable-length input, BOB and CRC-32 have the
+ fastest execution, BOB being slightly faster. IPSX is not
+ recommended for digesting because it has a significantly higher
+ collision rate and takes only a fixed-length input.
+
+7. Parameters for the Description of Selection Techniques
+
+ This section gives an overview of different alternative selection
+ schemes and their required parameters. In order to be compliant with
+ PSAMP, at least one of proposed schemes MUST be implemented.
+
+ The decision whether or not to select a packet is based on a function
+ that is performed when the packet arrives at the selection process.
+ Packet selection schemes differ in the input parameters for the
+ selection process and the functions they require to do the packet
+ selection. The following table gives an overview.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 29]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ Scheme | Input parameters | Functions
+ ---------------+------------------------+-------------------
+ systematic | packet position | packet counter
+ count-based | Sampling pattern |
+ ---------------+------------------------+-------------------
+ systematic | arrival time | clock or timer
+ time-based | Sampling pattern |
+ ---------------+------------------------+-------------------
+ random | packet position | packet counter,
+ n-out-of-N | Sampling pattern | random numbers
+ | (random number list) |
+ ---------------+------------------------+-------------------
+ uniform | Sampling | random function
+ probabilistic | probability |
+ ---------------+------------------------+-------------------
+ non-uniform |e.g., packet position, | selection function,
+ probabilistic | Packet Content(parts) | probability calc.
+ ---------------+------------------------+-------------------
+ non-uniform |e.g., flow state, | selection function,
+ flow-state | Packet Content(parts) | probability calc.
+ ---------------+------------------------+-------------------
+ property | Packet Content(parts) | filter function or
+ match | or router state | state discovery
+ ---------------+------------------------+-------------------
+ hash-based | Packet Content(parts) | Hash Function
+ ---------------+------------------------+-------------------
+
+7.1. Description of Sampling Techniques
+
+ In this section, we define what elements are needed to describe the
+ most common Sampling techniques. Here the selection function is
+ predefined and given by the Selector ID.
+
+ Sampler Description:
+ SELECTOR_ID
+ SELECTOR_TYPE
+ SELECTOR_PARAMETERS
+
+ Where:
+
+ SELECTOR_ID:
+ Unique ID for the packet sampler.
+
+
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 30]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ SELECTOR_TYPE:
+ For Sampling processes, the SELECTOR TYPE defines what Sampling
+ algorithm is used.
+ Values: Systematic count-based | Systematic time-based | Random
+ |n-out-of-N | uniform probabilistic | Non-uniform probabilistic |
+ Non-uniform flow state
+
+ SELECTOR_PARAMETERS:
+ For Sampling processes, the SELECTOR PARAMETERS define the input
+ parameters for the process. Interval length in systematic Sampling
+ means that all packets that arrive in this interval are selected.
+ The spacing parameter defines the spacing in time or number of
+ packets between the end of one Sampling interval and the start of the
+ next succeeding interval.
+
+ Case n-out-of-N:
+ - Population Size N, Sample size n
+
+ Case systematic time-based:
+ - Interval length (in usec), Spacing (in usec)
+
+ Case systematic count-based:
+ - Interval length (in packets), Spacing (in packets)
+
+ Case uniform probabilistic (with equal probability per packet):
+ - Sampling probability p
+
+ Case non-uniform probabilistic:
+ - Calculation function for Sampling probability p (see also
+ Section 5.2.2.4)
+
+ Case flow state:
+ - Information reported for flow state Sampling is not defined in
+ this document (see also Section 5.2.2.4)
+
+7.2. Description of Filtering Techniques
+
+ In this section, we define what elements are needed to describe the
+ most common Filtering techniques. The structure closely parallels
+ the one presented for the Sampling techniques.
+
+ Filter Description:
+ SELECTOR_ID
+ SELECTOR_TYPE
+ SELECTOR_PARAMETERS
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 31]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ Where:
+
+ SELECTOR_ID:
+ Unique ID for the packet filter. The ID can be calculated under
+ consideration of the SELECTION SEQUENCE and a local ID.
+
+ SELECTOR_TYPE:
+ For Filtering processes, the SELECTOR TYPE defines what Filtering
+ type is used.
+ Values: Matching | Hashing | Router_state
+
+ SELECTOR_PARAMETERS:
+ For Filtering processes, the SELECTOR PARAMETERS define formally the
+ common property of the packet being filtered. For the filters of
+ type matching and hashing, the definitions have a lot of points in
+ common.
+
+ Values:
+
+ Case matching:
+ - Information Element (from [RFC5102])
+ - Value (type in accordance to [RFC5102])
+
+ In case of multiple match criteria, multiple "case matching" has to
+ be bound by a logical AND.
+
+ Case hashing:
+ - Hash Domain (input bits from packet)
+ - <Header type = IPv4>
+ - <Input bit specification, header part>
+ - <Header type = IPv6>
+ - <Input bit specification, header part>
+ - <payload byte number N>
+ - <Input bit specification, payload part>
+ - Hash Function
+ - Hash Function name
+ - Length of input key (eliminate 0x bytes)
+ - Output value (length M and bitmask)
+ - Hash Selection Range, as a list of non-overlapping
+ intervals [start value, end value] where value is in
+ [0,2^M-1]
+ - Additional parameters are dependent on specific Hash
+ Function (e.g., hash input bits (seed))
+
+ Notes to input bits for case hashing:
+
+ - Input bits can be from header part only, from the payload part
+ only, or from both.
+
+
+
+Zseby, et al. Standards Track [Page 32]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ - The bit specification, for the header part, can be specified for
+ IPv4 or IPv6 only, or both.
+
+ - In case of IPv4, the bit specification is a sequence of 20
+ hexadecimal numbers [00,FF] specifying a 20-byte bitmask to be
+ applied to the header.
+
+ - In case of IPv6, it is a sequence of 40 hexadecimal numbers [00,FF]
+ specifying a 40-byte bitmask to be applied to the header.
+
+ - The bit specification, for the payload part, is a sequence of
+ hexadecimal numbers [00,FF] specifying the bitmask to be applied to
+ the first N bytes of the payload, as specified by the previous
+ field. In case the hexadecimal number sequence is longer than N,
+ only the first N numbers are considered.
+
+ - In case the payload is shorter than N, the Hash Function cannot be
+ applied. Other options, like padding with zeros, may be considered
+ in the future.
+
+ - A Hash Function cannot be defined on the options field of the IPv4
+ header, neither on stacked headers of IPv6.
+
+ - The Hash Selection Range defines a range of hash values (out of all
+ possible results of the hash operation). If the hash result for a
+ specific packet falls in this range, the packet is selected. If
+ the value is outside the range, the packet is not selected. For
+ example, if the selection interval specification is [1:3], [6:9]
+ all packets are selected for which the hash result is 1,2,3,6,7,8,
+ or 9. In all other cases, the packet is not selected.
+
+ Case router state:
+
+ - Ingress interface at which the packet arrives equals a specified
+ value
+
+ - Egress interface to which the packet is routed equals a specified
+ value
+
+ - Packet violated Access Control List (ACL) on the router
+
+ - Reverse Path Forwarding (RPF) failed for the packet
+
+ - Resource Reservation is insufficient for the packet
+
+ - No route is found for the packet
+
+ - Origin AS equals a specified value or lies within a given range
+
+
+
+Zseby, et al. Standards Track [Page 33]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ - Destination AS equals a specified value or lies within a given
+ range
+
+ Note to case router state:
+
+ - All router state entries can be linked by AND operators
+
+8. Composite Techniques
+
+ Composite schemes are realized by combining the Selector IDs into a
+ Selection Sequence. The Selection Sequence contains all Selector IDs
+ that are applied to the Packet Stream subsequently. Some examples of
+ composite schemes are reported below.
+
+8.1. Cascaded Filtering->Sampling or Sampling->Filtering
+
+ If a filter precedes a Sampling process, the role of Filtering is to
+ create a set of "parent populations" from a single stream that can
+ then be fed independently to different Sampling functions, with
+ different parameters tuned for the Population itself (e.g., if
+ streams of different intensity result from Filtering, it may be good
+ to have different Sampling rates). If Filtering follows a Sampling
+ process, the same Selection Fraction and type are applied to the
+ whole stream, independently of the relative size of the streams
+ resulting from the Filtering function. Moreover, also packets not
+ destined to be selected in the Filtering operation will "load" the
+ Sampling function. So, in principle, Filtering before Sampling
+ allows a more accurate tuning of the Sampling procedure, but if
+ filters are too complex to work at full line rate (e.g., because they
+ have to access router state information), Sampling before Filtering
+ may be a need.
+
+8.2. Stratified Sampling
+
+ Stratified Sampling is one example for using a composite technique.
+ The basic idea behind stratified Sampling is to increase the
+ estimation accuracy by using a priori information about correlations
+ of the investigated characteristic with some other characteristic
+ that is easier to obtain. The a priori information is used to
+ perform an intelligent grouping of the elements of the parent
+ Population. In this manner, a higher estimation accuracy can be
+ achieved with the same sample size or the sample size can be reduced
+ without reducing the estimation accuracy.
+
+ Stratified Sampling divides the Sampling process into multiple steps.
+ First, the elements of the parent Population are grouped into subsets
+ in accordance to a given characteristic. This grouping can be done
+ in multiple steps. Then samples are taken from each subset.
+
+
+
+Zseby, et al. Standards Track [Page 34]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ The stronger the correlation between the characteristic used to
+ divide the parent Population (stratification variable) and the
+ characteristic of interest (for which an estimate is sought after),
+ the easier is the consecutive Sampling process and the higher is the
+ stratification gain. For instance, if the dividing characteristic
+ were equal to the investigated characteristic, each element of the
+ subgroup would be a perfect representative of that characteristic.
+ In this case, it would be sufficient to take one arbitrary element
+ out of each subgroup to get the actual distribution of the
+ characteristic in the parent Population. Therefore, stratified
+ Sampling can reduce the costs for the Sampling process (i.e., the
+ number of samples needed to achieve a given level of confidence).
+
+ For stratified Sampling, one has to specify classification rules for
+ grouping the elements into subgroups and the Sampling scheme that is
+ used within the subgroups. The classification rules can be expressed
+ by multiple filters. For the Sampling scheme within the subgroups,
+ the parameters have to be specified as described above. The use of
+ stratified Sampling methods for measurement purposes is described for
+ instance in [ClPB93] and [Zseb03].
+
+9. Security Considerations
+
+ Security considerations concerning the choice of a Hash Function for
+ Hash-based Selection have been discussed in Section 6.2.3. That
+ section discussed a number of potential attacks to craft Packet
+ Streams that are disproportionately detected and/or discover the Hash
+ Function parameters, the vulnerabilities of different Hash Functions
+ to these attacks, and practices to minimize these vulnerabilities.
+
+ In addition to this, a user can gain knowledge about the start and
+ stop triggers in time-based systematic Sampling, e.g., by sending
+ test packets. This knowledge might allow users to modify their send
+ schedule in a way that their packets are disproportionately selected
+ or not selected [GoRe07].
+
+ For random Sampling, a cryptographically strong random number
+ generator should be used in order to prevent that an advisory can
+ predict the selection decision [GoRe07].
+
+ Further security threats can occur when Sampling parameters are
+ configured or communicated to other entities. The configuration and
+ reporting of Sampling parameters are out of scope of this document.
+ Therefore, the security threats that originate from this kind of
+ communication cannot be assessed with the information given in this
+ document.
+
+
+
+
+
+Zseby, et al. Standards Track [Page 35]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ Some of these threats can probably be addressed by keeping
+ configuration information confidential and by authenticating entities
+ that configure Sampling. Nevertheless, a full analysis and
+ assessment of threats for configuration and reporting has to be done
+ if configuration or reporting methods are proposed.
+
+10. Contributors
+
+ Sharon Goldberg contributed to the security considerations for Hash-
+ based Selection.
+
+ Sharon Goldberg
+ Department of Electrical Engineering
+ Princeton University
+ F210-K EQuad
+ Princeton, NJ 08544,
+ USA
+ EMail: goldbe@princeton.edu
+
+11. Acknowledgments
+
+ We would like to thank the PSAMP group, especially Benoit Claise and
+ Stewart Bryant, for fruitful discussions and for proofreading the
+ document. We thank Sharon Goldberg for her input on security issues
+ concerning Hash-based Selection.
+
+12. References
+
+12.1. Normative References
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+12.2. Informative References
+
+ [AmCa89] Paul D. Amer, Lillian N. Cassel, "Management of Sampled
+ Real-Time Network Measurements", 14th Conference on Local
+ Computer Networks, October 1989, Minneapolis, pages 62-68,
+ IEEE, 1989.
+
+ [BeCK96] M. Bellare, R. Canetti and H. Krawczyk, "Pseudorandom
+ Functions Revisited: The Cascade Construction and its
+ Concrete Security", Symposium on Foundations of Computer
+ Science, 1996.
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 36]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ [ClPB93] K.C. Claffy, George C. Polyzos, Hans-Werner Braun,
+ "Application of Sampling Methodologies to Network Traffic
+ Characterization", Proceedings of ACM SIGCOMM'93, San
+ Francisco, CA, USA, September 13 - 17, 1993.
+
+ [DuGG02] N.G. Duffield, A. Gerber, M. Grossglauser, "Trajectory
+ Engine: A Backend for Trajectory Sampling", IEEE Network
+ Operations and Management Symposium 2002, Florence, Italy,
+ April 15-19, 2002.
+
+ [DuGr00] N.G. Duffield, M. Grossglauser, "Trajectory Sampling for
+ Direct Traffic Observation", Proceedings of ACM SIGCOMM
+ 2000, Stockholm, Sweden, August 28 - September 1, 2000.
+
+ [DuGr04] N.G. Duffield and M. Grossglauser "Trajectory Sampling
+ with Unreliable Reporting", Proc IEEE Infocom 2004, Hong
+ Kong, March 2004.
+
+ [DuLT01] N.G. Duffield, C. Lund, and M. Thorup, "Charging from
+ Sampled Network Usage", ACM Internet Measurement Workshop
+ IMW 2001, San Francisco, USA, November 1-2, 2001.
+
+ [EsVa01] C. Estan and G. Varghese, "New Directions in Traffic
+ Measurement and Accounting", ACM SIGCOMM Internet
+ Measurement Workshop 2001, San Francisco (CA) Nov. 2001.
+
+ [GoRe07] S. Goldberg, J. Rexford, "Security Vulnerabilities and
+ Solutions for Packet Sampling", IEEE Sarnoff Symposium,
+ Princeton, NJ, May 2007.
+
+ [HT52] D.G. Horvitz and D.J. Thompson, "A Generalization of
+ Sampling without replacement from a Finite Universe" J.
+ Amer. Statist. Assoc. Vol. 47, pp. 663-685, 1952.
+
+ [Henk08] Christian Henke, Evaluation of Hash Functions for
+ Multipoint Sampling in IP Networks, Diploma Thesis, TU
+ Berlin, April 2008.
+
+ [HeSZ08] Christian Henke, Carsten Schmoll, Tanja Zseby, Evaluation
+ of Header Field Entropy for Hash-Based Packet Selection,
+ Proceedings of Passive and Active Measurement Conference
+ PAM 2008, Cleveland, Ohio, USA, April 2008.
+
+ [Jenk97] B. Jenkins, "Algorithm Alley", Dr. Dobb's Journal,
+ September 1997.
+ http://burtleburtle.net/bob/hash/doobs.html.
+
+
+
+
+
+Zseby, et al. Standards Track [Page 37]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ [JePP92] Jonathan Jedwab, Peter Phaal, Bob Pinna, "Traffic
+ Estimation for the Largest Sources on a Network, Using
+ Packet Sampling with Limited Storage", HP technical
+ report, Managemenr, Mathematics and Security Department,
+ HP Laboratories, Bristol, March 1992,
+ http://www.hpl.hp.com/techreports/92/HPL-92-35.html.
+
+ [Moli03] M. Molina, "A scalable and efficient methodology for flow
+ monitoring in the Internet", International Teletraffic
+ Congress (ITC-18), Berlin, Sep. 2003.
+
+ [MoND05] M. Molina, S. Niccolini, N.G. Duffield, "A Comparative
+ Experimental Study of Hash Functions Applied to Packet
+ Sampling", International Teletraffic Congress (ITC-19),
+ Beijing, August 2005.
+
+ [RFC1141] Mallory, T. and A. Kullberg, "Incremental updating of the
+ Internet checksum", RFC 1141, January 1990.
+
+ [RFC1624] Rijsinghani, A., Ed., "Computation of the Internet
+ Checksum via Incremental Update", RFC 1624, May 1994.
+
+ [RFC2205] Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S.
+ Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
+ Functional Specification", RFC 2205, September 1997.
+
+ [RFC3704] Baker, F. and P. Savola, "Ingress Filtering for Multihomed
+ Networks", BCP 84, RFC 3704, March 2004.
+
+ [RFC3917] Quittek, J., Zseby, T., Claise, B., and S. Zander,
+ "Requirements for IP Flow Information Export (IPFIX)", RFC
+ 3917, October 2004.
+
+ [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
+ Border Gateway Protocol 4 (BGP-4)", RFC 4271, January
+ 2006.
+
+ [RFC5101] Claise, B., Ed., "Specification of the IP Flow Information
+ Export (IPFIX) Protocol for the Exchange of IP Traffic
+ Flow Information", RFC 5101, January 2008.
+
+ [RFC5102] Quittek, J., Bryant, S., Claise, B., Aitken, P., and J.
+ Meyer, "Information Model for IP Flow Information Export",
+ RFC 5102, January 2008.
+
+ [RFC5474] Duffield, N., Ed., "A Framework for Packet Selection and
+ Reporting", RFC 5474, March 2009.
+
+
+
+
+Zseby, et al. Standards Track [Page 38]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ [RFC5476] Claise, B., Ed., "Packet Sampling (PSAMP) Protocol
+ Specifications", RFC 5476, March 2009.
+
+ [RFC5477] Dietz, T., Claise, B., Aitken, P., Dressler, F., and G.
+ Carle, "Information Model for Packet Sampling Exports",
+ RFC 5477, March 2009.
+
+ [Zseb03] T. Zseby, "Stratification Strategies for Sampling-based
+ Non-intrusive Measurement of One-way Delay", Proceedings
+ of Passive and Active Measurement Workshop (PAM 2003), La
+ Jolla, CA, USA, pp. 171-179, April 2003.
+
+ [ZsZC01] Tanja Zseby, Sebastian Zander, Georg Carle. Evaluation of
+ Building Blocks for Passive One-way-delay Measurements.
+ Proceedings of Passive and Active Measurement Workshop
+ (PAM 2001), Amsterdam, The Netherlands, April 23-24, 2001.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 39]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+Appendix A. Hash Functions
+
+A.1. IP Shift-XOR (IPSX) Hash Function
+
+ The IPSX Hash Function is tailored for acting on IP version 4
+ packets. It exploits the structure of IP packets and in particular
+ the variability expected to be exhibited within different fields of
+ the IP packet in order to furnish a hash value with little apparent
+ correlation with individual packet fields. Fields from the IPv4 and
+ TCP/UDP headers are used as input. The IPSX Hash Function uses a
+ small number of simple instructions.
+
+ Input parameters: None
+
+ Built-in parameters: None
+
+ Output: The output of the IPSX is a 16-bit number
+
+ Functioning:
+
+ The functioning can be divided into two parts: input selection, whose
+ forms are composite input from various portions of the IP packet,
+ followed by computation of the hash on the composite.
+
+ Input Selection:
+
+ The raw input is drawn from the first 20 bytes of the IP packet
+ header and the first 8 bytes of the IP payload. If IP options are
+ not used, the IP header has 20 bytes, and hence the two portions
+ adjoin and comprise the first 28 bytes of the IP packet. We now use
+ the raw input as four 32-bit subportions of these 28 bytes. We
+ specify the input by bit offsets from the start of IP header or
+ payload.
+
+ f1 = bits 32 to 63 of the IP header, comprising the IP identification
+ field, flags, and fragment offset.
+
+ f2 = bits 96 to 127 of the IP header, the source IP address.
+
+ f3 = bits 128 to 159 of the IP header, the destination IP address.
+
+ f4 = bits 32 to 63 of the IP payload. For a TCP packet, f4 comprises
+ the TCP sequence number followed by the message length. For a
+ UDP packet, f4 comprises the UDP checksum.
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 40]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ Hash Computation:
+
+ The hash is computed from f1, f2, f3, and f4 by a combination of XOR
+ (^), right shift (>>), and left shift (<<) operations. The
+ intermediate quantities h1, v1, and v2 are 32-bit numbers.
+
+ 1. v1 = f1 ^ f2;
+ 2. v2 = f3 ^ f4;
+ 3. h1 = v1 << 8;
+ 4. h1 ^= v1 >> 4;
+ 5. h1 ^= v1 >> 12;
+ 6. h1 ^= v1 >> 16;
+ 7. h1 ^= v2 << 6;
+ 8. h1 ^= v2 << 10;
+ 9. h1 ^= v2 << 14;
+ 10. h1 ^= v2 >> 7;
+
+ The output of the hash is the least significant 16 bits of h1.
+
+A.2. BOB Hash Function
+
+ The BOB Hash Function is a Hash Function designed for having each bit
+ of the input affecting every bit of the return value and using both
+ 1-bit and 2-bit deltas to achieve the so-called avalanche effect
+ [Jenk97]. This function was originally built for hash table lookup
+ with fast software implementation.
+
+ Input parameters:
+
+ The input parameters of such a function are:
+
+ - the length of the input string (key) to be hashed, in bytes.
+ The elementary input blocks of BOB hash are the single bytes;
+ therefore, no padding is needed.
+
+ - an init value (an arbitrary 32-bit number).
+
+ Built-in parameters:
+
+ The BOB hash uses the following built-in parameter:
+
+ - the golden ratio (an arbitrary 32-bit number used in the Hash
+ Function computation: its purpose is to avoid mapping all zeros
+ to all zeros).
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 41]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ Note: The mix sub-function (see mix (a,b,c) macro in the reference
+ code below) has a number of parameters governing the shifts in the
+ registers. The one presented is not the only possible choice.
+
+ It is an open point whether these may be considered additional
+ built-in parameters to specify at function configuration.
+
+ Output:
+
+ The output of the BOB function is a 32-bit number. It should be
+ specified:
+
+ - A 32-bit mask to apply to the output
+
+ - The Selection Range as a list of non-overlapping intervals
+ [start value, end value] where value is in [0,2^32]
+
+ Functioning:
+
+ The hash value is obtained computing first an initialization of an
+ internal state (composed of three 32-bit numbers, called a, b, c in
+ the reference code below), then, for each input byte of the key the
+ internal state is combined by addition and mixed using the mix sub-
+ function. Finally, the internal state mixed one last time and the
+ third number of the state (c) is chosen as the return value.
+
+ typedef unsigned long int ub4; /* unsigned 4-byte quantities
+ */
+ typedef unsigned char ub1; /* unsigned 1-byte quantities
+ */
+
+ #define hashsize(n) ((ub4)1<<(n))
+ #define hashmask(n) (hashsize(n)-1)
+
+ /* ------------------------------------------------------
+ mix -- mix three 32-bit values reversibly.
+
+ For every delta with one or two bits set, and the deltas of
+ all three high bits or all three low bits, whether the original
+ value of a,b,c is almost all zero or is uniformly distributed,
+ * If mix() is run forward or backward, at least 32 bits in
+ a,b,c have at least 1/4 probability of changing.
+ * If mix() is run forward, every bit of c will change between
+ 1/3 and 2/3 of the time (well, 22/100 and 78/100 for some 2-
+ bit deltas) mix() was built out of 36 single-cycle latency
+ instructions in a structure that could support 2x parallelism,
+ like so:
+
+
+
+
+Zseby, et al. Standards Track [Page 42]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ a -= b;
+ a -= c; x = (c>>13);
+ b -= c; a ^= x;
+ b -= a; x = (a<<8);
+ c -= a; b ^= x;
+ c -= b; x = (b>>13);
+ ...
+ Unfortunately, superscalar Pentiums and Sparcs can't take
+ advantage of that parallelism. They've also turned some of
+ those single-cycle latency instructions into multi-cycle latency
+ instructions
+
+ ------------------------------------------------------------*/
+
+ #define mix(a,b,c) \
+ { \
+ a -= b; a -= c; a ^= (c>>13); \
+ b -= c; b -= a; b ^= (a<<8); \
+ c -= a; c -= b; c ^= (b>>13); \
+ a -= b; a -= c; a ^= (c>>12); \
+ b -= c; b -= a; b ^= (a<<16); \
+ c -= a; c -= b; c ^= (b>>5); \
+ a -= b; a -= c; a ^= (c>>3); \
+ b -= c; b -= a; b ^= (a<<10); \
+ c -= a; c -= b; c ^= (b>>15); \
+ }
+
+ /* -----------------------------------------------------------
+ hash() -- hash a variable-length key into a 32-bit value
+ k : the key (the unaligned variable-length array of bytes)
+ len : the length of the key, counting by bytes
+ initval : can be any 4-byte value
+ Returns a 32-bit value. Every bit of the key affects every bit
+ of the return value. Every 1-bit and 2-bit delta achieves
+ avalanche. About 6*len+35 instructions.
+
+ The best hash table sizes are powers of 2. There is no need to do
+ mod a prime (mod is so slow!). If you need less than 32 bits, use a
+ bitmask. For example, if you need only 10 bits, do h = (h &
+ hashmask(10)), in which case, the hash table should have hashsize(10)
+ elements.
+
+ If you are hashing n strings (ub1 **)k, do it like this: for (i=0,
+ h=0; i<n; ++i) h = hash( k[i], len[i], h);
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 43]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ By Bob Jenkins, 1996. bob_jenkins@burtleburtle.net. You may use
+ this code any way you wish, private, educational, or commercial.
+ It's free. See http://burtleburtle.net/bob/hash/evahash.html.
+ Use for hash table lookup, or anything where one collision in 2^^32
+ is acceptable. Do NOT use for cryptographic purposes.
+ ----------------------------------------------------------- */
+
+ ub4 bob_hash(k, length, initval)
+ register ub1 *k; /* the key */
+ register ub4 length; /* the length of the key */
+ register ub4 initval; /* an arbitrary value */
+ {
+ register ub4 a,b,c,len;
+
+ /* Set up the internal state */
+ len = length;
+ a = b = 0x9e3779b9; /*the golden ratio; an arbitrary value
+ */
+ c = initval; /* another arbitrary value */
+
+ /*------------------------------------ handle most of the key */
+
+ while (len >= 12)
+ {
+ a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16)
+ +((ub4)k[3]<<24));
+ b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16)
+ +((ub4)k[7]<<24));
+ c += (k[8] +((ub4)k[9]<<8)
+ +((ub4)k[10]<<16)+((ub4)k[11]<<24));
+ mix(a,b,c);
+ k += 12; len -= 12;
+ }
+
+ /*---------------------------- handle the last 11 bytes */
+ c += length;
+ switch(len) /* all the case statements fall through*/
+ {
+ case 11: c+=((ub4)k[10]<<24);
+ case 10: c+=((ub4)k[9]<<16);
+ case 9 : c+=((ub4)k[8]<<8);
+ /* the first byte of c is reserved for the length */
+ case 8 : b+=((ub4)k[7]<<24);
+ case 7 : b+=((ub4)k[6]<<16);
+ case 6 : b+=((ub4)k[5]<<8);
+ case 5 : b+=k[4];
+ case 4 : a+=((ub4)k[3]<<24);
+ case 3 : a+=((ub4)k[2]<<16);
+
+
+
+Zseby, et al. Standards Track [Page 44]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+ case 2 : a+=((ub4)k[1]<<8);
+ case 1 : a+=k[0];
+ /* case 0: nothing left to add */
+ }
+ mix(a,b,c);
+ /*-------------------------------- report the result */
+ return c;
+ }
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 45]
+
+RFC 5475 Techniques for IP Packet Selection March 2009
+
+
+Authors' Addresses
+
+ Tanja Zseby
+ Fraunhofer Institute for Open Communication Systems
+ Kaiserin-Augusta-Allee 31
+ 10589 Berlin
+ Germany
+ Phone: +49-30-34 63 7153
+ EMail: tanja.zseby@fokus.fraunhofer.de
+
+ Maurizio Molina
+ DANTE
+ City House
+ 126-130 Hills Road
+ Cambridge CB21PQ
+ United Kingdom
+ Phone: +44 1223 371 300
+ EMail: maurizio.molina@dante.org.uk
+
+ Nick Duffield
+ AT&T Labs - Research
+ Room B-139
+ 180 Park Ave
+ Florham Park, NJ 07932
+ USA
+ Phone: +1 973-360-8726
+ EMail: duffield@research.att.com
+
+ Saverio Niccolini
+ Network Laboratories, NEC Europe Ltd.
+ Kurfuerstenanlage 36
+ 69115 Heidelberg
+ Germany
+ Phone: +49-6221-9051118
+ EMail: saverio.niccolini@netlab.nec.de
+
+ Frederic Raspall
+ EPSC-UPC
+ Dept. of Telematics
+ Av. del Canal Olimpic, s/n
+ Edifici C4
+ E-08860 Castelldefels, Barcelona
+ Spain
+ EMail: fredi@entel.upc.es
+
+
+
+
+
+
+
+Zseby, et al. Standards Track [Page 46]
+