diff options
Diffstat (limited to 'doc/rfc/rfc5475.txt')
-rw-r--r-- | doc/rfc/rfc5475.txt | 2579 |
1 files changed, 2579 insertions, 0 deletions
diff --git a/doc/rfc/rfc5475.txt b/doc/rfc/rfc5475.txt new file mode 100644 index 0000000..1168e89 --- /dev/null +++ b/doc/rfc/rfc5475.txt @@ -0,0 +1,2579 @@ + + + + + + +Network Working Group T. Zseby +Request for Comments: 5475 Fraunhofer FOKUS +Category: Standards Track M. Molina + DANTE + N. Duffield + AT&T Labs - Research + S. Niccolini + NEC Europe Ltd. + F. Raspall + EPSC-UPC + March 2009 + + + Sampling and Filtering Techniques for IP Packet Selection + +Status of This Memo + + This document specifies an Internet standards track protocol for the + Internet community, and requests discussion and suggestions for + improvements. Please refer to the current edition of the "Internet + Official Protocol Standards" (STD 1) for the standardization state + and status of this protocol. Distribution of this memo is unlimited. + +Copyright Notice + + Copyright (c) 2009 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents in effect on the date of + publication of this document (http://trustee.ietf.org/license-info). + Please review these documents carefully, as they describe your rights + and restrictions with respect to this document. + + This document may contain material from IETF Documents or IETF + Contributions published or made publicly available before November + 10, 2008. The person(s) controlling the copyright in some of this + material may not have granted the IETF Trust the right to allow + modifications of such material outside the IETF Standards Process. + Without obtaining an adequate license from the person(s) controlling + the copyright in such materials, this document may not be modified + outside the IETF Standards Process, and derivative works of it may + not be created outside the IETF Standards Process, except to format + it for publication as an RFC or to translate it into languages other + than English. + + + + + + +Zseby, et al. Standards Track [Page 1] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + +Abstract + + This document describes Sampling and Filtering techniques for IP + packet selection. It provides a categorization of schemes and + defines what parameters are needed to describe the most common + selection schemes. Furthermore, it shows how techniques can be + combined to build more elaborate packet Selectors. The document + provides the basis for the definition of information models for + configuring selection techniques in Metering Processes and for + reporting the technique in use to a Collector. + +Table of Contents + + 1. Introduction ....................................................3 + 1.1. Conventions Used in This Document ..........................4 + 2. PSAMP Documents Overview ........................................4 + 3. Terminology .....................................................4 + 3.1. Observation Points, Packet Streams, and Packet Content .....4 + 3.2. Selection Process ..........................................5 + 3.3. Reporting ..................................................7 + 3.4. Metering Process ...........................................7 + 3.5. Exporting Process ..........................................8 + 3.6. PSAMP Device ...............................................8 + 3.7. Collector ..................................................8 + 3.8. Selection Methods ..........................................8 + 4. Categorization of Packet Selection Techniques ..................11 + 5. Sampling .......................................................12 + 5.1. Systematic Sampling .......................................13 + 5.2. Random Sampling ...........................................14 + 5.2.1. n-out-of-N Sampling ................................14 + 5.2.2. Probabilistic Sampling .............................14 + 6. Filtering ......................................................16 + 6.1. Property Match Filtering ..................................16 + 6.2. Hash-Based Filtering ......................................19 + 6.2.1. Application Examples for Coordinated Packet + Selection ..........................................19 + 6.2.2. Desired Properties of Hash Functions ...............21 + 6.2.3. Security Considerations for Hash Functions .........22 + 6.2.4. Choice of Hash Function ............................26 + 7. Parameters for the Description of Selection Techniques .........29 + 7.1. Description of Sampling Techniques ........................30 + 7.2. Description of Filtering Techniques .......................31 + 8. Composite Techniques ...........................................34 + 8.1. Cascaded Filtering->Sampling or Sampling->Filtering .......34 + 8.2. Stratified Sampling .......................................34 + 9. Security Considerations ........................................35 + 10. Contributors ..................................................36 + 11. Acknowledgments ...............................................36 + + + +Zseby, et al. Standards Track [Page 2] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + 12. References ....................................................36 + 12.1. Normative References .....................................36 + 12.2. Informative References ...................................36 + Appendix A. Hash Functions ........................................40 + A.1 IP Shift-XOR (IPSX) Hash Function..............................40 + A.2 BOB Hash Function..............................................41 + +1. Introduction + + There are two main drivers for the evolution in measurement + infrastructures and their underlying technology. First, network data + rates are increasing, with a concomitant growth in measurement data. + Second, the growth is compounded by the demand of measurement-based + applications for increasingly fine-grained traffic measurements. + Devices that perform the measurements, require increasingly + sophisticated and resource-intensive measurement capabilities, + including the capture of packet headers or even parts of the payload, + and classification for flow analysis. All these factors can lead to + an overwhelming amount of measurement data, resulting in high demands + on resources for measurement, storage, transfer, and post processing. + + The sustained capture of network traffic at line rate can be + performed by specialized measurement hardware. However, the cost of + the hardware and the measurement infrastructure required to + accommodate the measurements preclude this as a ubiquitous approach. + Instead, some form of data reduction at the point of measurement is + necessary. + + This can be achieved by an intelligent packet selection through + Sampling or Filtering. Another way to reduce the amount of data is + to use aggregation techniques (not addressed in this document). The + motivation for Sampling is to select a representative subset of + packets that allow accurate estimates of properties of the unsampled + traffic to be formed. The motivation for Filtering is to remove all + packets that are not of interest. Aggregation combines data and + allows compact pre-defined views of the traffic. Examples of + applications that benefit from packet selection are given in + [RFC5474]. Aggregation techniques are out of scope of this document. + + + + + + + + + + + + + +Zseby, et al. Standards Track [Page 3] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + +1.1. Conventions Used in This Document + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in RFC 2119 [RFC2119]. + +2. PSAMP Documents Overview + + This document is one out of a series of documents from the PSAMP + group. + + [RFC5474]: "A Framework for Packet Selection and Reporting" describes + the PSAMP framework for network elements to select subsets of packets + by statistical and other methods, and to export a stream of reports + on the selected packets to a Collector. + + RFC 5475 (this document): "Sampling and Filtering Techniques for IP + Packet Selection" describes the set of packet selection techniques + supported by PSAMP. + + [RFC5476]: "Packet Sampling (PSAMP) Protocol Specifications" + specifies the export of packet information from a PSAMP Exporting + Process to a PSAMP Collecting Process. + + [RFC5477]: "Information Model for Packet Sampling Exports" defines an + information and data model for PSAMP. + +3. Terminology + + The PSAMP terminology defined here is fully consistent with all terms + listed in [RFC5474] but includes additional terms required for the + description of packet selection methods. An architecture overview + and possible configurations of PSAMP elements can be found in + [RFC5474]. PSAMP terminology also aims at consistency with terms + used in [RFC3917]. The relationship between PSAMP and IPFIX terms is + described in [RFC5474]. + + In the PSAMP documents, all defined PSAMP terms are written + capitalized. This document uses the same convention. + +3.1. Observation Points, Packet Streams, and Packet Content + + * Observation Point + + An Observation Point [RFC5101] is a location in the network where + packets can be observed. Examples include: + + (i) A line to which a probe is attached; + + + +Zseby, et al. Standards Track [Page 4] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + (ii) a shared medium, such as an Ethernet-based LAN; + + (iii) a single port of a router, or set of interfaces (physical + or logical) of a router; + + (iv) an embedded measurement subsystem within an interface. + + Note that one Observation Point may be a superset of several other + Observation Points. For example, one Observation Point can be an + entire line card. This would be the superset of the individual + Observation Points at the line card's interfaces. + + * Observed Packet Stream + + The Observed Packet Stream is the set of all packets observed at + the Observation Point. + + * Packet Stream + + A Packet Stream denotes a set of packets from the Observed Packet + Stream that flows past some specified point within the Metering + Process. An example of a Packet Stream is the output of the + selection process. Note that packets selected from a stream, + e.g., by Sampling, do not necessarily possess a property by which + they can be distinguished from packets that have not been + selected. For this reason, the term "stream" is favored over + "flow", which is defined as a set of packets with common + properties [RFC3917]. + + * Packet Content + + The Packet Content denotes the union of the packet header (which + includes link layer, network layer, and other encapsulation + headers) and the packet payload. At some Observation Points, the + link header information may not be available. + +3.2. Selection Process + + * Selection Process + + A Selection Process takes the Observed Packet Stream as its input + and selects a subset of that stream as its output. + + + + + + + + + +Zseby, et al. Standards Track [Page 5] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + * Selection State + + A Selection Process may maintain state information for use by the + Selection Process. At a given time, the Selection State may + depend on packets observed at and before that time, and other + variables. Examples include: + + (i) sequence numbers of packets at the input of Selectors; + + (ii) a timestamp of observation of the packet at the Observation + Point; + + (iii) iterators for pseudorandom number generators; + + (iv) hash values calculated during selection; + + (v) indicators of whether the packet was selected by a given + Selector. + + Selection Processes may change portions of the Selection State as + a result of processing a packet. Selection State for a packet is + to reflect the state after processing the packet. + + * Selector + + A Selector defines what kind of action a Selection Process + performs on a single packet of its input. If selected, the packet + becomes an element of the output Packet Stream. + + The Selector can make use of the following information in + determining whether a packet is selected: + + (i) the Packet Content; + + (ii) information derived from the packet's treatment at the + Observation Point; + + (iii) any Selection State that may be maintained by the Selection + Process. + + * Composite Selector + + A Composite Selector is an ordered composition of Selectors, in + which the output Packet Stream issuing from one Selector forms the + input Packet Stream to the succeeding Selector. + + + + + + +Zseby, et al. Standards Track [Page 6] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + * Primitive Selector + + A Selector is primitive if it is not a Composite Selector. + + * Selection Sequence + + From all the packets observed at an Observation Point, only a few + packets are selected by one or more Selectors. The Selection + Sequence is a unique value per Observation Domain describing the + Observation Point and the Selector IDs through which the packets + are selected. + +3.3. Reporting + + * Packet Reports + + Packet Reports comprise a configurable subset of a packet's input + to the Selection Process, including the Packet's Content, + information relating to its treatment (for example, the output + interface), and its associated Selection State (for example, a + hash of the Packet's Content). + + * Report Interpretation + + Report Interpretation comprises subsidiary information, relating + to one or more packets, that is used for interpretation of their + Packet Reports. Examples include configuration parameters of the + Selection Process. + + * Report Stream + + The Report Stream is the output of a Metering Process, comprising + two distinguished types of information: Packet Reports and Report + Interpretation. + +3.4. Metering Process + + A Metering Process selects packets from the Observed Packet Stream + using a Selection Process, and produces as output a Report Stream + concerning the selected packets. + + The PSAMP Metering Process can be viewed as analogous to the IPFIX + Metering Process [RFC5101], which produces Flow Records as its + output, with the difference that the PSAMP Metering Process always + contains a Selection Process. The relationship between PSAMP and + IPFIX is further described in [RFC5477] and [RFC5474]. + + + + + +Zseby, et al. Standards Track [Page 7] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + +3.5. Exporting Process + + * Exporting Process + + An Exporting Process sends, in the form of Export Packets, the + output of one or more Metering Processes to one or more + Collectors. + + * Export Packet + + An Export Packet is a combination of Report Interpretations and/or + one or more Packet Reports that are bundled by the Exporting + Process into an Export Packet for exporting to a Collector. + +3.6. PSAMP Device + + * PSAMP Device + + A PSAMP Device is a device hosting at least an Observation Point, + a Metering Process (which includes a Selection Process), and an + Exporting Process. Typically, corresponding Observation Point(s), + Metering Process(es), and Exporting Process(es) are colocated at + this device, for example, at a router. + +3.7. Collector + + * Collector + + A Collector receives a Report Stream exported by one or more + Exporting Processes. In some cases, the host of the Metering + and/or Exporting Processes may also serve as the Collector. + +3.8. Selection Methods + + * Filtering + + A filter is a Selector that selects a packet deterministically + based on the Packet Content, or its treatment, or functions of + these occurring in the Selection State. Two examples are: + + (i) Property Match Filtering: A packet is selected if a + specific field in the packet equals a predefined value. + + (ii) Hash-based Selection: A Hash Function is applied to the + Packet Content, and the packet is selected if the result + falls in a specified range. + + + + + +Zseby, et al. Standards Track [Page 8] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + * Sampling + + A Selector that is not a filter is called a Sampling operation. + This reflects the intuitive notion that if the selection of a + packet cannot be determined from its content alone, there must be + some type of Sampling taking place. Sampling operations can be + divided into two subtypes: + + (i) Content-independent Sampling, which does not use Packet + Content in reaching Sampling decisions. Examples include + systematic Sampling, and uniform pseudorandom Sampling + driven by a pseudorandom number whose generation is + independent of Packet Content. Note that in content- + independent Sampling, it is not necessary to access the + Packet Content in order to make the selection decision. + + (ii) Content-dependent Sampling, in which the Packet Content is + used in reaching selection decisions. An application is + pseudorandom selection according to a probability that + depends on the contents of a packet field, e.g., Sampling + packets with a probability dependent on their TCP/UDP port + numbers. Note that this is not a Filter. + + * Hash Domain + + A Hash Domain is a subset of the Packet Content and the packet + treatment, viewed as an N-bit string for some positive integer N. + + * Hash Range + + A Hash Range is a set of M-bit strings for some positive integer M + that defines the range of values that the result of the hash + operation can take. + + * Hash Function + + A Hash Function defines a deterministic mapping from the Hash + Domain into the Hash Range. + + * Hash Selection Range + + A Hash Selection Range is a subset of the Hash Range. The packet + is selected if the action of the Hash Function on the Hash Domain + for the packet yields a result in the Hash Selection Range. + + + + + + + +Zseby, et al. Standards Track [Page 9] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + * Hash-based Selection + + A Hash-based Selection is Filtering specified by a Hash Domain, a + Hash Function, a Hash Range, and a Hash Selection Range. + + * Approximative Selection + + Selectors in any of the above categories may be approximated by + operations in the same or another category for the purposes of + implementation. For example, uniform pseudorandom Sampling may be + approximated by Hash-based Selection, using a suitable Hash + Function and Hash Domain. In this case, the closeness of the + approximation depends on the choice of Hash Function and Hash + Domain. + + * Population + + A Population is a Packet Stream or a subset of a Packet Stream. A + Population can be considered as a base set from which packets are + selected. An example is all packets in the Observed Packet Stream + that are observed within some specified time interval. + + * Population Size + + The Population Size is the number of all packets in the + Population. + + * Sample Size + + The Sample Size is a number of packets selected from the + Population by a Selector. + + * Configured Selection Fraction + + The Configured Selection Fraction is the expected ratio of the + Sample Size to the Population Size, as based on the configured + selection parameters. + + * Attained Selection Fraction + + The Attained Selection Fraction is the ratio of the actual Sample + Size to the Population Size. For some Sampling methods, the + Attained Selection Fraction can differ from the Configured + Selection Fraction due to, for example, the inherent statistical + variability in Sampling decisions of probabilistic Sampling and + Hash-based Selection. Nevertheless, for large Population Sizes + and properly configured Selectors, the Attained Selection Fraction + usually approaches the Configured Selection Fraction. + + + +Zseby, et al. Standards Track [Page 10] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + +4. Categorization of Packet Selection Techniques + + Packet selection techniques generate a subset of packets from an + Observed Packet Stream at an Observation Point. We distinguish + between Sampling and Filtering. + + Sampling is targeted at the selection of a representative subset of + packets. The subset is used to infer knowledge about the whole set + of observed packets without processing them all. The selection can + depend on packet position, and/or on Packet Content, and/or on + (pseudo) random decisions. + + Filtering selects a subset with common properties. This is used if + only a subset of packets is of interest. The properties can be + directly derived from the Packet Content, or depend on the treatment + given by the router to the packet. Filtering is a deterministic + operation. It depends on Packet Content or router treatment. It + never depends on packet position or on (pseudo) random decisions. + + Note that a common technique to select packets is to compute a Hash + Function on some bits of the packet header and/or content and to + select it if the hash value falls in the Hash Selection Range. Since + hashing is a deterministic operation on the Packet Content, it is a + Filtering technique according to our categorization. Nevertheless, + Hash Functions are sometimes used to emulate random Sampling. + Depending on the chosen input bits, the Hash Function, and the Hash + Selection Range, this technique can be used to emulate the random + selection of packets with a given probability p. It is also a + powerful technique to consistently select the same packet subset at + multiple Observation Points [DuGr00]. + + The following table gives an overview of the schemes described in + this document and their categorization. X means that the + characteristic applies to the selection scheme. (X) denotes schemes + for which content-dependent and content-independent variants exist. + For instance, Property Match Filtering is typically based on Packet + Content and therefore is content dependent. But as explained in + Section 6.1, it may also depend on router state and then would be + independent of the content. It easily can be seen that only schemes + with both properties, content dependence and deterministic selection, + are considered as Filters. + + + + + + + + + + +Zseby, et al. Standards Track [Page 11] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + Selection Scheme | Deterministic | Content -| Category + | Selection | Dependent| + ------------------------+---------------+----------+---------- + Systematic | X | _ | Sampling + Count-based | | | + ------------------------+---------------+----------+---------- + Systematic | X | - | Sampling + Time-based | | | + ------------------------+---------------+----------+---------- + Random | - | - | Sampling + n-out-of-N | | | + ------------------------+---------------+----------+---------- + Random | - | - | Sampling + uniform probabilistic | | | + ------------------------+---------------+----------+---------- + Random | - | (X) | Sampling + non-uniform probabil. | | | + ------------------------+---------------+----------+---------- + Random | - | (X) | Sampling + non-uniform Flow-State | | | + ------------------------+---------------+----------+---------- + Property Match | X | (X) | Filtering + Filtering | | | + ------------------------+---------------+----------+---------- + Hash function | X | X | Filtering + ------------------------+---------------+----------+---------- + + The categorization just introduced is mainly useful for the + definition of an information model describing Primitive Selectors. + More complex selection techniques can be described through the + composition of cascaded Sampling and Filtering operations. For + example, a packet selection that weights the selection probability on + the basis of the packet length can be described as a cascade of a + Filtering and a Sampling scheme. However, this descriptive approach + is not intended to be rigid: if a common and consolidated selection + practice turns out to be too complex to be described as a composition + of the mentioned building blocks, an ad hoc description can be + specified instead and added as a new scheme to the information model. + +5. Sampling + + The deployment of Sampling techniques aims at the provisioning of + information about a specific characteristic of the parent Population + at a lower cost than a full census would demand. In order to plan a + suitable Sampling strategy, it is therefore crucial to determine the + needed type of information and the desired degree of accuracy in + advance. + + + + +Zseby, et al. Standards Track [Page 12] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + First of all, it is important to know the type of metric that should + be estimated. The metric of interest can range from simple packet + counts [JePP92] up to the estimation of whole distributions of flow + characteristics (e.g., packet sizes) [ClPB93]. + + Second, the required accuracy of the information and with this, the + confidence that is aimed at, should be known in advance. For + instance, for usage-based accounting the required confidence for the + estimation of packet counters can depend on the monetary value that + corresponds to the transfer of one packet. That means that a higher + confidence could be required for expensive packet flows (e.g., + premium IP service) than for cheaper flows (e.g., best effort). The + accuracy requirements for validating a previously agreed quality can + also vary extremely with the customer demands. These requirements + are usually determined by the service level agreement (SLA). + + The Sampling method and the parameters in use must be clearly + communicated to all applications that use the measurement data. Only + with this knowledge a correct interpretation of the measurement + results can be ensured. + + Sampling methods can be characterized by the Sampling algorithm, the + trigger type used for starting a Sampling interval, and the length of + the Sampling interval. These parameters are described here in + detail. The Sampling algorithm describes the basic process for + selection of samples. In accordance to [AmCa89] and [ClPB93], we + define the following basic Sampling processes. + +5.1. Systematic Sampling + + Systematic Sampling describes the process of selecting the start + points and the duration of the selection intervals according to a + deterministic function. This can be for instance the periodic + selection of every k-th element of a trace but also the selection of + all packets that arrive at predefined points in time. Even if the + selection process does not follow a periodic function (e.g., if the + time between the Sampling intervals varies over time), we consider + this as systematic Sampling as long as the selection is + deterministic. + + The use of systematic Sampling always involves the risk of biasing + the results. If the systematics in the Sampling process resemble + systematics in the observed stochastic process (occurrence of the + characteristic of interest in the network), there is a high + probability that the estimation will be biased. Systematics in the + observed process might not be known in advance. + + + + + +Zseby, et al. Standards Track [Page 13] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + Here only equally spaced schemes are considered, where triggers for + Sampling are periodic, either in time or in packet count. All + packets occurring in a selection interval (either in time or packet + count) beyond the trigger are selected. + + Systematic count-based + In systematic count-based Sampling, the start and stop triggers for + the Sampling interval are defined in accordance to the spatial packet + position (packet count). + + Systematic time-based + In systematic time-based Sampling, time-based start and stop triggers + are used to define the Sampling intervals. All packets are selected + that arrive at the Observation Point within the time intervals + defined by the start and stop triggers (i.e., arrival time of the + packet is larger than the start time and smaller than the stop time). + + Both schemes are content-independent selection schemes. Content- + dependent deterministic Selectors are categorized as filters. + +5.2. Random Sampling + + Random Sampling selects the starting points of the Sampling intervals + in accordance to a random process. The selection of elements is an + independent experiment. With this, unbiased estimations can be + achieved. In contrast to systematic Sampling, random Sampling + requires the generation of random numbers. One can differentiate two + methods of random Sampling: n-out-of-N Sampling and probabilistic + Sampling. + +5.2.1. n-out-of-N Sampling + + In n-out-of-N Sampling, n elements are selected out of the parent + Population that consists of N elements. One example would be to + generate n different random numbers in the range [1,N] and select all + packets that have a packet position equal to one of the random + numbers. For this kind of Sampling, the Sample Size n is fixed. + +5.2.2. Probabilistic Sampling + + In probabilistic Sampling, the decision whether or not an element is + selected is made in accordance to a predefined selection probability. + An example would be to flip a coin for each packet and select all + packets for which the coin showed the head. For this kind of + Sampling, the Sample Size can vary for different trials. The + selection probability does not necessarily have to be the same for + each packet. Therefore, we distinguish between uniform probabilistic + Sampling (with the same selection probability for all packets) and + + + +Zseby, et al. Standards Track [Page 14] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + non-uniform probabilistic Sampling (where the selection probability + can vary for different packets). + +5.2.2.1. Uniform Probabilistic Sampling + + For Uniform Probabilistic Sampling, packets are selected + independently with a uniform probability p. This Sampling can be + count-driven, and is sometimes referred to as geometric random + Sampling, since the difference in count between successive selected + packets is an independent random variable with a geometric + distribution of mean 1/p. A time-driven analog, exponential random + Sampling, has the time between triggers exponentially distributed. + + Both geometric and exponential random Sampling are examples of what + is known as additive random Sampling, defined as Sampling where the + intervals or counts between successive samples are independent + identically distributed random variables. + +5.2.2.2. Non-Uniform Probabilistic Sampling + + This is a variant of Probabilistic Sampling in which the Sampling + probabilities can depend on the selection process input. This can be + used to weight Sampling probabilities in order, e.g., to boost the + chance of Sampling packets that are rare but are deemed important. + Unbiased estimators for quantitative statistics are recovered by + re-normalization of sample values; see [HT52]. + +5.2.2.3. Non-Uniform Flow State Dependent Sampling + + Another type of Sampling that can be classified as probabilistic + Non-Uniform is closely related to the flow concept as defined in + [RFC3917], and it is only used jointly with a flow monitoring + function (IPFIX Metering Process). Packets are selected, dependent + on a Selection State. The point, here, is that the Selection State + is determined also by the state of the flow the packet belongs to + and/or by the state of the other flows currently being monitored by + the associated flow monitoring function. An example for such an + algorithm is the "sample and hold" method described in [EsVa01]: + + - If a packet accounts for a Flow Record that already exists in the + IPFIX flow recording process, it is selected (i.e., the Flow Record + is updated). + + - If a packet doesn't account for any existing Flow Record, it is + selected with probability p. If it has been selected, a new Flow + Record has to be created. + + + + + +Zseby, et al. Standards Track [Page 15] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + A further algorithm that fits into the category of non-uniform flow + state dependent Sampling is described in [Moli03]. + + This type of Sampling is content dependent because the identification + of the flow the packet belongs to requires analyzing part of the + Packet Content. If the packet is selected, then it is passed as an + input to the IPFIX monitoring function (this is called "Local Export" + in [RFC5474]). Selecting the packet depending on the state of a flow + cache is useful when memory resources of the flow monitoring function + are scarce (i.e., there is no room to keep all the flows that have + been scheduled for monitoring). + +5.2.2.4. Configuration of Non-Uniform Probabilistic and Flow State + Sampling + + Many different specific methods can be grouped under the terms + non-uniform probabilistic and flow state Sampling. Dependent on the + Sampling goal and the implemented scheme, a different number and type + of input parameters are required to configure such a scheme. + + Some concrete proposals for such methods exist from the research + community (e.g., [EsVa01], [DuLT01], [Moli03]). Some of these + proposals are still in an early stage and need further investigations + to prove their usefulness and applicability. It is not our aim to + indicate preference among these methods. Instead, we only describe + here the basic methods and leave the specification of explicit + schemes and their parameters up to vendors (e.g., as an extension of + the information model). + +6. Filtering + + Filtering is the deterministic selection of packets based on the + Packet Content, the treatment of the packet at the Observation Point, + or deterministic functions of these occurring in the Selection State. + The packet is selected if these quantities fall into a specified + range. The role of Filtering, as the word itself suggest, is to + separate all the packets having a certain property from those not + having it. A distinguishing characteristic from Sampling is that the + selection decision does not depend on the packet position in time or + in space, or on a random process. + + We identify and describe in the following two Filtering techniques. + +6.1. Property Match Filtering + + With this Filtering method, a packet is selected if specific fields + within the packet and/or properties of the router state equal a + predefined value. Possible filter fields are all IPFIX flow + + + +Zseby, et al. Standards Track [Page 16] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + attributes specified in [RFC5102]. Further fields can be defined by + proposing new information elements or defining vendor-specific + extensions. + + A packet is selected if Field=Value. Masks and ranges are only + supported to the extent to which [RFC5102] allows them, e.g., by + providing explicit fields like the netmasks for source and + destination addresses. + + AND operations are possible by concatenating filters, thus producing + a composite selection operation. In this case, the ordering in which + the Filtering happens is implicitly defined (outer filters come after + inner filters). However, as long as the concatenation is on filters + only, the result of the cascaded filter is independent from the + order, but the order may be important for implementation purposes, as + the first filter will have to work at a higher rate. In any case, an + implementation is not constrained to respect the filter ordering, as + long as the result is the same, and it may even implement the + composite Filtering in one single step. + + OR operations are not supported with this basic model. More + sophisticated filters (e.g., supporting bitmasks, ranges, or OR + operations) can be realized as vendor-specific schemes. + + All IPFIX flow attributes defined in [RFC5102] can be used for + Property Match Filtering. Further information elements can be easily + defined. Property match operations should be available for different + protocol portions of the packet header: + + (i) IP header (excluding options in IPv4, stacked headers in + IPv6) + + (ii) transport protocol header (e.g., TCP, UDP) + + (iii) encapsulation headers (e.g., the MPLS label stack, if + present) + + When the PSAMP Device offers Property Match Filtering, and, in its + usual capacity other than in performing PSAMP functions, identifies + or processes information from IP, transport protocol or encapsulation + protocols, then the information should be made available for + Filtering. For example, when a PSAMP Device routes based on + destination IP address, that field should be made available for + Filtering. Conversely, a PSAMP Device that does not route is not + expected to be able to locate an IP address within a packet, or make + it available for Filtering, although it may do so. + + + + + +Zseby, et al. Standards Track [Page 17] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + Since packet encryption conceals the real values of encrypted fields, + Property Match Filtering must be configurable to ignore encrypted + packets, when detected. + + The Selection Process may support Filtering based on the properties + of the router state: + + (i) Ingress interface at which a packet arrives equals a + specified value + + (ii) Egress interface to which a packet is routed to equals a + specified value + + (iii) Packet violated Access Control List (ACL) on the router + + (iv) Failed Reverse Path Forwarding (RPF) + + (v) Failed Resource Reservation Protocol (RSVP) + + (vi) No route found for the packet + + (vii) Origin Border Gateway Protocol (BGP) Autonomous System (AS) + [RFC4271] equals a specified value or lies within a given + range + + (viii) Destination BGP AS equals a specified value or lies within + a given range + + Packets that match the failed Reverse Path Forwarding (RPF) condition + are packets for which ingress Filtering failed as defined in + [RFC3704]. + + Packets that match the failed Resource Reservation Protocol (RSVP) + condition are packets that do not fulfill the RSVP specification as + defined in [RFC2205]. + + Router architectural considerations may preclude some information + concerning the packet treatment being available at line rate for + selection of packets. For example, the Selection Process may not be + implemented in the fast path that is able to access router state at + line rate. However, when Filtering follows Sampling (or some other + selection operation) in a Composite Selector, the rate of the Packet + Stream output from the sampler and input to the filter may be + sufficiently slow that the filter could select based on router state. + + + + + + + +Zseby, et al. Standards Track [Page 18] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + +6.2. Hash-Based Filtering + + A Hash Function h maps the Packet Content c, or some portion of it, + onto a Hash Range R. The packet is selected if h(c) is an element of + S, which is a subset of R called the Hash Selection Range. Thus, + Hash-Based selection is a particular case of Filtering. The object + is selected if c is in inv(h(S)). But for desirable Hash Functions, + the inverse image inv(h(S)) will be extremely complex, and hence h + would not be expressible as, say, a Property Match Filter or a simple + combination of these. + + Hash-based Selection is mainly used to realize a coordinated packet + selection. That means that the same packets are selected at + different Observation Points. This is useful for instance to observe + the path (trajectory) that a packet took through the network or to + apply packet selection to passive one-way measurements. + + A prerequisite for the method to work and to ensure interoperability + is that the same Hash Function with the same parameters (e.g., input + vector) is used at the Observation Points. + + A consistent packet selection is also possible with Property Match + Filtering. Nevertheless, Hash-based Selection can be used to + approximate a random selection. The desired statistical properties + are discussed in Section 6.2.2. + + In the following subsections, we give some application examples for + coordinated packet selection. + +6.2.1. Application Examples for Coordinated Packet Selection + +6.2.1.1. Trajectory Sampling + + Trajectory Sampling is the consistent selection of a subset of + packets at either all of a set of Observation Points or none of them. + Trajectory Sampling is realized by Hash-based Selection if all + Observation Points in the set use a common Hash Function, Hash + Domain, and Selection Range. The Hash Domain comprises all or part + of the Packet Content that is invariant along the packet path. + Fields such as Time-to-Live, which is decremented per hop, and header + CRC [RFC1624], which is recalculated per hop, are thus excluded from + the Hash Domain. The Hash Domain needs to be wider than just a flow + key, if packets are to be selected quasi-randomly within flows. + + The trajectory (or path) followed by a packet is reconstructed from + PSAMP reports on it that reach a Collector. Reports on a given + packet originating from different Observation Points are associated + by matching a label from the reports. The label may comprise that + + + +Zseby, et al. Standards Track [Page 19] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + portion of the invariant Packet Content that is reported, or possibly + some digest of the invariant Packet Content that is inserted into the + packet report at the Observation Point. Such a digest may be + constructed by applying a second Hash Function to the invariant + Packet Content. The reconstruction of trajectories and methods for + dealing with possible ambiguities due to label collisions (identical + labels reported for different packets) and potential loss of reports + in transmission are dealt with in [DuGr00], [DuGG02], and [DuGr04]. + + Applications of trajectory Sampling include (i) estimation of the + network path matrix, i.e., the traffic intensities according to + network path, broken down by flow key; (ii) detection of routing + loops, as indicated by self-intersecting trajectories; (iii) passive + performance measurement: prematurely terminating trajectories + indicate packet loss, packet one-way delay can be determined if + reports include (synchronized) timestamps of packet arrival at the + Observation Point; and (iv) network attack tracing, of the actual + paths taken by attack packets with spoofed source addresses. + +6.2.1.2. Passive One-Way Measurements + + Coordinated packet selection can be applied for instance to one-way + delay measurements in order to reduce the required resources. In + one-way delay measurements, packets are collected at different + Observation Points in the network. A packet digest is generated for + each packet that helps to identify the packet. The packet digest and + the arrival time of the packet at the Observation Point are reported + to a process that calculates the delay. The delay is calculated by + subtracting the arrival time of the same packet at the Observation + Points (e.g., [ZsZC01]). With high data rates, capturing all packets + can require a lot of resources for storage, transfer, and processing. + To reduce resource consumption, packet selection methods can be + applied. But for such selection techniques, it has to be ensured + that the same packets are collected at different Observation Points. + Hash-based Selection provides this feature. + +6.2.1.3. Generation of Pseudorandom Numbers + + Although pseudorandom number generators with well-understood + properties have been developed, they may not be the method of choice + in settings where computational resources are scarce. A convenient + alternative is to use Hash Functions of Packet Content as a source of + randomness. The hash (suitably re-normalized) is a pseudorandom + variate in the interval [0,1]. Other schemes may use packet fields + in iterators for pseudorandom numbers. However, the statistical + properties of an ideal packet selection law (such as independent + + + + + +Zseby, et al. Standards Track [Page 20] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + Sampling for different packets, or independence on Packet Content) + may not be exactly rendered by an implementation, but only + approximately so. + + Use of Packet Content to generate pseudorandom variates shares with + non-uniform probabilistic Sampling (see Section 5.2.2.2 above) the + property that selection decisions depend on Packet Content. However, + there is a fundamental difference between the two. In the former + case, the content determines pseudorandom variates. In the latter + case, the content only determines the selection probabilities: + selection could then proceed, e.g., by use of random variates + obtained by an independent pseudorandom number generator. + +6.2.2. Desired Properties of Hash Functions + + Here we formulate desired properties for Hash Functions. For this, + we have to distinguish whether a Hash Function is used for packet + selection or just as a packet digest. The main focus of this + document is on packet selection. Nevertheless, we also provide some + requirements for the use of Hash Functions as packet digest. + + First of all, we need to define suitable input fields from the + packet. In accordance to [DuGr00], input field should be: + + - invariant on the path + - variable among packets + + Only if the input fields are the same at different Observation Points + is it possible to recognize the packet. The input fields should be + variable among packets in order to distribute the hash results over + the selection range. + +6.2.2.1. Requirements for Packet Selection + + In accordance to considerations in [MoND05] and [Henk08], we define + the following desired properties of Hash Functions used for packet + selection: + + (i) Speed: The Hash Function has to be applied to each packet + that traverses the Observation Point. Therefore, it has to + be fast in order to cope with the high packet rates. In + the ideal case, the hash operation should not influence the + performance on the PSAMP Device. + + + + + + + + +Zseby, et al. Standards Track [Page 21] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + (ii) Uniformity: The Hash Function h should have good mixing + properties, in the sense that small changes in the input + (e.g., the flipping of a single bit) cause large changes in + the output (many bits change). Then any local clump of + values of c is spread widely over R by h, and so the + distribution of h(c) is fairly uniform even if the + distribution of c is not. Then the Attained Selection + Fraction gets close to the Configured Selection Fraction + (#S/#R), which can be tuned by choice of S. + + (iii) Unbiasedness: The selection decision should be as + independent of packet attributes as possible. The set of + selected packets should not be biased towards a specific + type of packets. + + (iv) Representativeness of sample: The sample should be as + representative as possible for the observed traffic. + + (v) Non-linearity: The function should not be linear. This + increases the mixing properties (uniformity criterion). In + addition to this, it decreases the predictability of the + output and therefore the vulnerabilities against attacks. + + (vi) Robustness against vulnerabilities: The Hash Function + should be robust against attacks. Potential + vulnerabilities are described in Section 6.2.3. + +6.2.2.2. Requirements for Packet Digesting + + For digesting Packet Content for inclusion in a reported label, the + most important property is a low collision frequency. A secondary + requirement is the ability to accept variable-length input, in order + to allow inclusion of maximal amount of packet as input. Execution + speed is of secondary importance, since the digest need only be + formed from selected packets. + +6.2.3. Security Considerations for Hash Functions + + A concern for Hash-based Selection is whether some large set of + related packets could be disproportionately sampled, i.e., that the + Attained Selection Fraction is significantly different from the + Configured Selection Fraction. This can happen either + + (i) through unanticipated behavior in the Hash Function, or + + (ii) because the packets had been deliberately crafted to have + this property. + + + + +Zseby, et al. Standards Track [Page 22] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + The first point underlines the importance of using a Hash Function + with good mixing properties. For this, the statistical properties of + candidate Hash Functions need to be evaluated. Since the hash output + depends on the traffic mix, the evaluation should be done preferably + on up-to-date packet traces from the network in which the Hash-based + Selection will be deployed. + + However, Hash Functions that perform well on typical traffic may not + be sufficiently strong to withstand attacks specifically targeted + against them. Such potential attacks have been described in + [GoRe07]. + + In the following subsections, we point out different potential attack + scenarios. We encourage the use of standardized Hash Functions. + Therefore, we assume that the Hash Function itself is public and + hence known to an attacker. + + Nevertheless, we also assume the possibility of using a private input + parameter for the Hash Function that is kept secret. Such an input + parameter can for instance be attached to the hash input before the + hash operation is applied. With this, at least parts of the hash + operation remain secret. + + For the attack scenarios, we assume that an attacker uses its + knowledge of the Hash Function to craft packets that are then + dispatched, either as the attack itself or to elicit further + information that can be used to refine the attack. + + Two scenarios are considered. In the first scenario, the attacker + has no knowledge about whether or not the crafted packets are + selected. In the second scenario, the attacker uses some knowledge + of Sampling outcomes. The means by which this might be acquired is + discussed below. Some additional attacks that involve tampering with + Export Packets in transit, as opposed to attacking the PSAMP Device, + are discussed in [GoRe07]. + +6.2.3.1. Vulnerabilities of Hash-Based Selection without Knowledge of + Selection Outcomes + + (i) The Hash Function does not use a private parameter. + + If no private input parameter is used, potential attackers can easily + calculate which packets result in which hash values. If the + selection range is public, an attacker can craft packets whose + selection properties are known in advance. If the selection range is + private, an attacker cannot determine whether a crafted packet is + selected. However, by computing the hash on different trial crafted + packets, and selecting those yielding a given hash value, the + + + +Zseby, et al. Standards Track [Page 23] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + attacker can construct an arbitrarily large set of distinct packets + with a common selection properties, i.e., packets that will be either + all selected or all not selected. This can be done whatever the + strength of the Hash Function. + + (ii) The Hash Function is not cryptographically strong. + + If the Hash Function is not cryptographically strong, it may be + possible to construct sequences of distinct packets with the common + selection property even if a private parameter is used. + + An example is the standard CRC-32 Hash Function used with a private + modulus (but without a private string post-pended to the input). It + has weak mixing properties for low-order bits. Consequently, simply + by incrementing the hash input, one obtains distinct packets whose + hashes mostly fall in a narrow range, and hence are likely commonly + selected; see [GoRe07]. + + Suitable parameterization of the Hash Function can make such attacks + more difficult. For example, post-pending a private string to the + input before hashing with CRC-32 will give stronger mixing properties + over all bits of the input. However, with a Hash Function, such as + CRC-32, that is not cryptographically strong, the possibility of + discovering a method to construct packet sets with the common + selected property cannot be ruled out, even when a private modulus or + post-pended string is used. + +6.2.3.2. Vulnerabilities of Hash-Based Selection Using Knowledge of + Selection Outcomes + + Knowledge of the selection outcomes of crafted packets can be used by + an attacker to more easily construct sets of packets that are + disproportionately sampled and/or are commonly selected. For this, + the attacker does not need any a priori knowledge about the Hash + Function or selection range. + + There are several ways an attacker might acquire this knowledge about + the selection outcome: + + (i) Billing Reports: If samples are used for billing purposes, + then the selection outcomes of packets may be able to be + inferred by correlating a crafted Packet Stream with the + billing reports that it generates. However, the rate at + which knowledge of selection outcomes can be acquired + depends on the temporal and spatial granularity of the + billing reports; being slower the more aggregated the + reports are. + + + + +Zseby, et al. Standards Track [Page 24] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + (ii) Feedback from an Intrusion Detection System: e.g., a + botmaster adversary learns if his packets were detected by + the intrusion detection system by seeing if one of his bots + is blocked by the network. + + (iii) Observation of the Report Stream: Export Packets sent + across a public network may be eavesdropped on by an + adversary. Encryption of the Export Packets provides only + a partial defense, since it may be possible to infer the + selection outcomes of packets by correlating a crafted + Packet Stream with the occurrence (not the content) of + packets in the export stream that it generates. The rate + at which such knowledge could be acquired is limited by the + temporal resolution at which reports can be associated with + packets, e.g., due to processing and propagation + variability, and difficulty in distinguishing report on + attack packets from those of background traffic, if + present. The association between packets and their reports + on which this depends could be removed by padding Export + Packets to a constant length and sending them at a constant + rate. + + We now turn to attacks that can exploit knowledge of selection + outcomes. First, with a non-cryptographic Hash Function, knowledge + of selection outcomes for a trial stream may be used to further craft + a packet set with the common selection property. This has been + demonstrated for the modular hash f(x) = a x + b mod k, for private + parameters a, b, and k. With Sampling rate p, knowledge of the + Sampling outcomes of roughly 2/p is sufficient for the attack to + succeed, independent of the values of a, b, and k. With knowledge of + the selection outcomes of a larger number of packets, the parameters + a, b, and k can be determined; see [GoRe07]. + + A cryptographic Hash Function employing a private parameter and + operating in one of the pseudorandom function modes specified above + is not vulnerable to these attacks, even if the selection range is + known. + +6.2.3.3. Vulnerabilities to Replay Attacks + + Since Hash-based Selection is deterministic, any packet or set of + packets with known selection properties can be replayed into a + network and experience the same selection outcomes provide the Hash + Function and its parameters are not changed. Repetition of a single + packet may be noticeable to other measurement methods if employed + (e.g., collection of flow statistics), whereas a set of distinct + packets that appears statistically similar to regular traffic may be + less noticeable. + + + +Zseby, et al. Standards Track [Page 25] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + Replay attacks may be mitigated by repeated changing of Hash Function + parameters. This also prevents attacks that exploit knowledge of + Sampling outcomes, at least if the parameters are changed at least as + fast as the knowledge can be acquired by an attacker. In order to + preserve the ability to perform trajectory Sampling, parameter change + would have to be simultaneous (or approximately so) across all + Observation Points. + +6.2.4. Choice of Hash Function + + The specific choice of Hash Function represents a trade-off between + complexity and ease of implementation. Ideally, a cryptographically + strong Hash Function employing a private parameter and operating in + pseudorandom function mode as specified above would be used, yielding + a good emulation of a random packet selection at a target Sampling + rate, and giving maximal robustness against the attacks described in + the previous section. Unfortunately, there is currently no single + Hash Function that fulfills all the requirements. + + As detailed in Section 6.2.3, only cryptographic Hash Functions + employing a private parameter operating in pseudorandom function mode + are sufficiently strong to withstand the range of conceivable + attacks. For example, fixed- or variable-length inputs could be + hashed using a block cipher (like Advanced Encryption Standard (AES)) + in cipher-block-chaining mode. Fixed-length inputs could also be + hashed using an iterated cryptographic Hash Function (like MD5 or + SHA1), with a private initial vector. For variable-length inputs, an + iterated cryptographic Hash Function (like MD5 or SHA1) should employ + private string post-pended to the data in addition to a private + initial vector. For more details, see the "append-cascade" + construction of [BeCK96]. We encourage the use of such + cryptographically strong Hash Functions wherever possible. + + However, a problem with using such functions is the low performance. + As shown for instance in [Henk08], the computation times for MD5 and + SHA are about 7-10 times higher compared to non-cryptographic + functions. The difference increases for small hash input lengths. + + Therefore, it is not assumed that all PSAMP Devices will be capable + of applying a cryptographically strong Hash Function to every packet + at line rate. For this reason, the Hash Functions listed in this + section will be of a weaker variety. Future protocol extensions that + employ stronger Hash Functions are highly welcome. + + Comparisons of Hash Functions for packet selection and packet + digesting with regard to various criteria can be found in [MoND05] + and [Henk08]. + + + + +Zseby, et al. Standards Track [Page 26] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + +6.2.4.1. Hash Functions for Packet Selection + + If Hash-based packet Selection is applied, the BOB function MUST be + used for packet selection operations in order to be compliant with + PSAMP. The specification of BOB is given in the appendix. Both the + parameter (the init value) and the selection range should be kept + private. The initial vector of the Hash Function MUST be + configurable out of band to prevent security breaches like exposure + of the initial vector content. + + Other functions, such as CRC-32 and IPSX, MAY be used. The IPSX + function is described in the appendix, and the CRC-32 function is + described in [RFC1141]. If CRC-32 is used, the input should first be + post-pended with a private string that acts as a parameter, and the + modulus of the CRC should also be kept private. + + IPSX is simple to implement and was correspondingly about an order of + magnitude faster to execute per packet than BOB or CRC-32 [MoND05]. + + All three Hash Functions evaluated showed relatively poor uniformity + with 16-byte input that was drawn from only invariant fields in the + IP and TCP/UDP headers (i.e., header fields that do not change from + hop to hop). IPSX is inherently limited to 16 bytes. + + BOB and CRC-32 exhibit noticeably better uniformity when 4 or more + bytes from the payload are also included in the input [MoND05]. Also + with other criteria BOB performed quite well [Henk08]. + + Although the characteristics have been checked for different traffic + traces, results cannot be generalized to arbitrary traffic. Since + Hash-based Selection is a deterministic function on the Packet + Content, it can always be biased towards packets with specific + attributes. Furthermore, it should be noted that all Hash Functions + were evaluated only for IPv4. + + None of these Hash Functions is recommended for cryptographic + purposes. Please also note that the use of a private parameter only + slightly reduces the vulnerabilities against attacks. As shown in + Section 6.2.3, functions that are not cryptographically strong (e.g., + BOB and CRC) cannot prevent attackers from crafting packets that are + disproportionally selected even if a private parameter is used and + the selection range is kept secret. + + As described in Section 6.2.2, the input bytes for the Hash Function + need to be invariant along the path the packet is traveling. Only + with this it is ensured that the same packets are selected at + + + + + +Zseby, et al. Standards Track [Page 27] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + different Observation Points. Furthermore, they should have a high + variability between different packets to generate a high variation in + the Hash Range. An evaluation of the variability of different packet + header fields can be found in [DuGr00], [HeSZ08], and [Henk08]. + + If a Hash-based Selection with the BOB function is used with IPv4 + traffic, the following input bytes MUST be used. + + - IP identification field + + - Flags field + + - Fragment offset + + - Source IP address + + - Destination IP address + + - A configurable number of bytes from the IP payload, starting at + a configurable offset + + Due to the lack of suitable IPv6 packet traces, all candidate Hash + Functions in [DuGr00], [MoND05], and [Henk08] were evaluated only for + IPv4. Due to the IPv6 header fields and address structure, it is + expected that there is less randomness in IPv6 packet headers than in + IPv4 headers. Nevertheless, the randomness of IPv6 traffic has not + yet been evaluated sufficiently to get any evidence. In addition to + this, IPv6 traffic profiles may change significantly in the future + when IPv6 is used by a broader community. + + If a Hash-based Selection with the BOB function is used with IPv6 + traffic, the following input bytes MUST be used. + + - Payload length (2 bytes) + + - Byte number 10,11,14,15,16 of the IPv6 source address + + - Byte number 10,11,14,15,16 of the IPv6 destination address + + - A configurable number of bytes from the IP payload, starting at + a configurable offset. It is recommended to use at least 4 + bytes from the IP payload. + + The payload itself is not changing during the path. Even if some + routers process some extension headers, they are not going to strip + them from the packet. Therefore, the payload length is invariant + along the path. Furthermore, it usually differs for different + packets. The IPv6 address has 16 bytes. The first part is the + + + +Zseby, et al. Standards Track [Page 28] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + network part and contains low variation. The second part is the host + part and contains higher variation. Therefore, the second part of + the address is used. Nevertheless, the uniformity has not been + checked for IPv6 traffic. + +6.2.4.2. Hash Functions Suitable for Packet Digesting + + For this purpose also the BOB function SHOULD be used. Other + functions (such as CRC-32) MAY be used. Among the functions capable + of operating with variable-length input, BOB and CRC-32 have the + fastest execution, BOB being slightly faster. IPSX is not + recommended for digesting because it has a significantly higher + collision rate and takes only a fixed-length input. + +7. Parameters for the Description of Selection Techniques + + This section gives an overview of different alternative selection + schemes and their required parameters. In order to be compliant with + PSAMP, at least one of proposed schemes MUST be implemented. + + The decision whether or not to select a packet is based on a function + that is performed when the packet arrives at the selection process. + Packet selection schemes differ in the input parameters for the + selection process and the functions they require to do the packet + selection. The following table gives an overview. + + + + + + + + + + + + + + + + + + + + + + + + + + +Zseby, et al. Standards Track [Page 29] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + Scheme | Input parameters | Functions + ---------------+------------------------+------------------- + systematic | packet position | packet counter + count-based | Sampling pattern | + ---------------+------------------------+------------------- + systematic | arrival time | clock or timer + time-based | Sampling pattern | + ---------------+------------------------+------------------- + random | packet position | packet counter, + n-out-of-N | Sampling pattern | random numbers + | (random number list) | + ---------------+------------------------+------------------- + uniform | Sampling | random function + probabilistic | probability | + ---------------+------------------------+------------------- + non-uniform |e.g., packet position, | selection function, + probabilistic | Packet Content(parts) | probability calc. + ---------------+------------------------+------------------- + non-uniform |e.g., flow state, | selection function, + flow-state | Packet Content(parts) | probability calc. + ---------------+------------------------+------------------- + property | Packet Content(parts) | filter function or + match | or router state | state discovery + ---------------+------------------------+------------------- + hash-based | Packet Content(parts) | Hash Function + ---------------+------------------------+------------------- + +7.1. Description of Sampling Techniques + + In this section, we define what elements are needed to describe the + most common Sampling techniques. Here the selection function is + predefined and given by the Selector ID. + + Sampler Description: + SELECTOR_ID + SELECTOR_TYPE + SELECTOR_PARAMETERS + + Where: + + SELECTOR_ID: + Unique ID for the packet sampler. + + + + + + + + + +Zseby, et al. Standards Track [Page 30] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + SELECTOR_TYPE: + For Sampling processes, the SELECTOR TYPE defines what Sampling + algorithm is used. + Values: Systematic count-based | Systematic time-based | Random + |n-out-of-N | uniform probabilistic | Non-uniform probabilistic | + Non-uniform flow state + + SELECTOR_PARAMETERS: + For Sampling processes, the SELECTOR PARAMETERS define the input + parameters for the process. Interval length in systematic Sampling + means that all packets that arrive in this interval are selected. + The spacing parameter defines the spacing in time or number of + packets between the end of one Sampling interval and the start of the + next succeeding interval. + + Case n-out-of-N: + - Population Size N, Sample size n + + Case systematic time-based: + - Interval length (in usec), Spacing (in usec) + + Case systematic count-based: + - Interval length (in packets), Spacing (in packets) + + Case uniform probabilistic (with equal probability per packet): + - Sampling probability p + + Case non-uniform probabilistic: + - Calculation function for Sampling probability p (see also + Section 5.2.2.4) + + Case flow state: + - Information reported for flow state Sampling is not defined in + this document (see also Section 5.2.2.4) + +7.2. Description of Filtering Techniques + + In this section, we define what elements are needed to describe the + most common Filtering techniques. The structure closely parallels + the one presented for the Sampling techniques. + + Filter Description: + SELECTOR_ID + SELECTOR_TYPE + SELECTOR_PARAMETERS + + + + + + +Zseby, et al. Standards Track [Page 31] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + Where: + + SELECTOR_ID: + Unique ID for the packet filter. The ID can be calculated under + consideration of the SELECTION SEQUENCE and a local ID. + + SELECTOR_TYPE: + For Filtering processes, the SELECTOR TYPE defines what Filtering + type is used. + Values: Matching | Hashing | Router_state + + SELECTOR_PARAMETERS: + For Filtering processes, the SELECTOR PARAMETERS define formally the + common property of the packet being filtered. For the filters of + type matching and hashing, the definitions have a lot of points in + common. + + Values: + + Case matching: + - Information Element (from [RFC5102]) + - Value (type in accordance to [RFC5102]) + + In case of multiple match criteria, multiple "case matching" has to + be bound by a logical AND. + + Case hashing: + - Hash Domain (input bits from packet) + - <Header type = IPv4> + - <Input bit specification, header part> + - <Header type = IPv6> + - <Input bit specification, header part> + - <payload byte number N> + - <Input bit specification, payload part> + - Hash Function + - Hash Function name + - Length of input key (eliminate 0x bytes) + - Output value (length M and bitmask) + - Hash Selection Range, as a list of non-overlapping + intervals [start value, end value] where value is in + [0,2^M-1] + - Additional parameters are dependent on specific Hash + Function (e.g., hash input bits (seed)) + + Notes to input bits for case hashing: + + - Input bits can be from header part only, from the payload part + only, or from both. + + + +Zseby, et al. Standards Track [Page 32] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + - The bit specification, for the header part, can be specified for + IPv4 or IPv6 only, or both. + + - In case of IPv4, the bit specification is a sequence of 20 + hexadecimal numbers [00,FF] specifying a 20-byte bitmask to be + applied to the header. + + - In case of IPv6, it is a sequence of 40 hexadecimal numbers [00,FF] + specifying a 40-byte bitmask to be applied to the header. + + - The bit specification, for the payload part, is a sequence of + hexadecimal numbers [00,FF] specifying the bitmask to be applied to + the first N bytes of the payload, as specified by the previous + field. In case the hexadecimal number sequence is longer than N, + only the first N numbers are considered. + + - In case the payload is shorter than N, the Hash Function cannot be + applied. Other options, like padding with zeros, may be considered + in the future. + + - A Hash Function cannot be defined on the options field of the IPv4 + header, neither on stacked headers of IPv6. + + - The Hash Selection Range defines a range of hash values (out of all + possible results of the hash operation). If the hash result for a + specific packet falls in this range, the packet is selected. If + the value is outside the range, the packet is not selected. For + example, if the selection interval specification is [1:3], [6:9] + all packets are selected for which the hash result is 1,2,3,6,7,8, + or 9. In all other cases, the packet is not selected. + + Case router state: + + - Ingress interface at which the packet arrives equals a specified + value + + - Egress interface to which the packet is routed equals a specified + value + + - Packet violated Access Control List (ACL) on the router + + - Reverse Path Forwarding (RPF) failed for the packet + + - Resource Reservation is insufficient for the packet + + - No route is found for the packet + + - Origin AS equals a specified value or lies within a given range + + + +Zseby, et al. Standards Track [Page 33] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + - Destination AS equals a specified value or lies within a given + range + + Note to case router state: + + - All router state entries can be linked by AND operators + +8. Composite Techniques + + Composite schemes are realized by combining the Selector IDs into a + Selection Sequence. The Selection Sequence contains all Selector IDs + that are applied to the Packet Stream subsequently. Some examples of + composite schemes are reported below. + +8.1. Cascaded Filtering->Sampling or Sampling->Filtering + + If a filter precedes a Sampling process, the role of Filtering is to + create a set of "parent populations" from a single stream that can + then be fed independently to different Sampling functions, with + different parameters tuned for the Population itself (e.g., if + streams of different intensity result from Filtering, it may be good + to have different Sampling rates). If Filtering follows a Sampling + process, the same Selection Fraction and type are applied to the + whole stream, independently of the relative size of the streams + resulting from the Filtering function. Moreover, also packets not + destined to be selected in the Filtering operation will "load" the + Sampling function. So, in principle, Filtering before Sampling + allows a more accurate tuning of the Sampling procedure, but if + filters are too complex to work at full line rate (e.g., because they + have to access router state information), Sampling before Filtering + may be a need. + +8.2. Stratified Sampling + + Stratified Sampling is one example for using a composite technique. + The basic idea behind stratified Sampling is to increase the + estimation accuracy by using a priori information about correlations + of the investigated characteristic with some other characteristic + that is easier to obtain. The a priori information is used to + perform an intelligent grouping of the elements of the parent + Population. In this manner, a higher estimation accuracy can be + achieved with the same sample size or the sample size can be reduced + without reducing the estimation accuracy. + + Stratified Sampling divides the Sampling process into multiple steps. + First, the elements of the parent Population are grouped into subsets + in accordance to a given characteristic. This grouping can be done + in multiple steps. Then samples are taken from each subset. + + + +Zseby, et al. Standards Track [Page 34] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + The stronger the correlation between the characteristic used to + divide the parent Population (stratification variable) and the + characteristic of interest (for which an estimate is sought after), + the easier is the consecutive Sampling process and the higher is the + stratification gain. For instance, if the dividing characteristic + were equal to the investigated characteristic, each element of the + subgroup would be a perfect representative of that characteristic. + In this case, it would be sufficient to take one arbitrary element + out of each subgroup to get the actual distribution of the + characteristic in the parent Population. Therefore, stratified + Sampling can reduce the costs for the Sampling process (i.e., the + number of samples needed to achieve a given level of confidence). + + For stratified Sampling, one has to specify classification rules for + grouping the elements into subgroups and the Sampling scheme that is + used within the subgroups. The classification rules can be expressed + by multiple filters. For the Sampling scheme within the subgroups, + the parameters have to be specified as described above. The use of + stratified Sampling methods for measurement purposes is described for + instance in [ClPB93] and [Zseb03]. + +9. Security Considerations + + Security considerations concerning the choice of a Hash Function for + Hash-based Selection have been discussed in Section 6.2.3. That + section discussed a number of potential attacks to craft Packet + Streams that are disproportionately detected and/or discover the Hash + Function parameters, the vulnerabilities of different Hash Functions + to these attacks, and practices to minimize these vulnerabilities. + + In addition to this, a user can gain knowledge about the start and + stop triggers in time-based systematic Sampling, e.g., by sending + test packets. This knowledge might allow users to modify their send + schedule in a way that their packets are disproportionately selected + or not selected [GoRe07]. + + For random Sampling, a cryptographically strong random number + generator should be used in order to prevent that an advisory can + predict the selection decision [GoRe07]. + + Further security threats can occur when Sampling parameters are + configured or communicated to other entities. The configuration and + reporting of Sampling parameters are out of scope of this document. + Therefore, the security threats that originate from this kind of + communication cannot be assessed with the information given in this + document. + + + + + +Zseby, et al. Standards Track [Page 35] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + Some of these threats can probably be addressed by keeping + configuration information confidential and by authenticating entities + that configure Sampling. Nevertheless, a full analysis and + assessment of threats for configuration and reporting has to be done + if configuration or reporting methods are proposed. + +10. Contributors + + Sharon Goldberg contributed to the security considerations for Hash- + based Selection. + + Sharon Goldberg + Department of Electrical Engineering + Princeton University + F210-K EQuad + Princeton, NJ 08544, + USA + EMail: goldbe@princeton.edu + +11. Acknowledgments + + We would like to thank the PSAMP group, especially Benoit Claise and + Stewart Bryant, for fruitful discussions and for proofreading the + document. We thank Sharon Goldberg for her input on security issues + concerning Hash-based Selection. + +12. References + +12.1. Normative References + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + +12.2. Informative References + + [AmCa89] Paul D. Amer, Lillian N. Cassel, "Management of Sampled + Real-Time Network Measurements", 14th Conference on Local + Computer Networks, October 1989, Minneapolis, pages 62-68, + IEEE, 1989. + + [BeCK96] M. Bellare, R. Canetti and H. Krawczyk, "Pseudorandom + Functions Revisited: The Cascade Construction and its + Concrete Security", Symposium on Foundations of Computer + Science, 1996. + + + + + + + +Zseby, et al. Standards Track [Page 36] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + [ClPB93] K.C. Claffy, George C. Polyzos, Hans-Werner Braun, + "Application of Sampling Methodologies to Network Traffic + Characterization", Proceedings of ACM SIGCOMM'93, San + Francisco, CA, USA, September 13 - 17, 1993. + + [DuGG02] N.G. Duffield, A. Gerber, M. Grossglauser, "Trajectory + Engine: A Backend for Trajectory Sampling", IEEE Network + Operations and Management Symposium 2002, Florence, Italy, + April 15-19, 2002. + + [DuGr00] N.G. Duffield, M. Grossglauser, "Trajectory Sampling for + Direct Traffic Observation", Proceedings of ACM SIGCOMM + 2000, Stockholm, Sweden, August 28 - September 1, 2000. + + [DuGr04] N.G. Duffield and M. Grossglauser "Trajectory Sampling + with Unreliable Reporting", Proc IEEE Infocom 2004, Hong + Kong, March 2004. + + [DuLT01] N.G. Duffield, C. Lund, and M. Thorup, "Charging from + Sampled Network Usage", ACM Internet Measurement Workshop + IMW 2001, San Francisco, USA, November 1-2, 2001. + + [EsVa01] C. Estan and G. Varghese, "New Directions in Traffic + Measurement and Accounting", ACM SIGCOMM Internet + Measurement Workshop 2001, San Francisco (CA) Nov. 2001. + + [GoRe07] S. Goldberg, J. Rexford, "Security Vulnerabilities and + Solutions for Packet Sampling", IEEE Sarnoff Symposium, + Princeton, NJ, May 2007. + + [HT52] D.G. Horvitz and D.J. Thompson, "A Generalization of + Sampling without replacement from a Finite Universe" J. + Amer. Statist. Assoc. Vol. 47, pp. 663-685, 1952. + + [Henk08] Christian Henke, Evaluation of Hash Functions for + Multipoint Sampling in IP Networks, Diploma Thesis, TU + Berlin, April 2008. + + [HeSZ08] Christian Henke, Carsten Schmoll, Tanja Zseby, Evaluation + of Header Field Entropy for Hash-Based Packet Selection, + Proceedings of Passive and Active Measurement Conference + PAM 2008, Cleveland, Ohio, USA, April 2008. + + [Jenk97] B. Jenkins, "Algorithm Alley", Dr. Dobb's Journal, + September 1997. + http://burtleburtle.net/bob/hash/doobs.html. + + + + + +Zseby, et al. Standards Track [Page 37] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + [JePP92] Jonathan Jedwab, Peter Phaal, Bob Pinna, "Traffic + Estimation for the Largest Sources on a Network, Using + Packet Sampling with Limited Storage", HP technical + report, Managemenr, Mathematics and Security Department, + HP Laboratories, Bristol, March 1992, + http://www.hpl.hp.com/techreports/92/HPL-92-35.html. + + [Moli03] M. Molina, "A scalable and efficient methodology for flow + monitoring in the Internet", International Teletraffic + Congress (ITC-18), Berlin, Sep. 2003. + + [MoND05] M. Molina, S. Niccolini, N.G. Duffield, "A Comparative + Experimental Study of Hash Functions Applied to Packet + Sampling", International Teletraffic Congress (ITC-19), + Beijing, August 2005. + + [RFC1141] Mallory, T. and A. Kullberg, "Incremental updating of the + Internet checksum", RFC 1141, January 1990. + + [RFC1624] Rijsinghani, A., Ed., "Computation of the Internet + Checksum via Incremental Update", RFC 1624, May 1994. + + [RFC2205] Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S. + Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 + Functional Specification", RFC 2205, September 1997. + + [RFC3704] Baker, F. and P. Savola, "Ingress Filtering for Multihomed + Networks", BCP 84, RFC 3704, March 2004. + + [RFC3917] Quittek, J., Zseby, T., Claise, B., and S. Zander, + "Requirements for IP Flow Information Export (IPFIX)", RFC + 3917, October 2004. + + [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A + Border Gateway Protocol 4 (BGP-4)", RFC 4271, January + 2006. + + [RFC5101] Claise, B., Ed., "Specification of the IP Flow Information + Export (IPFIX) Protocol for the Exchange of IP Traffic + Flow Information", RFC 5101, January 2008. + + [RFC5102] Quittek, J., Bryant, S., Claise, B., Aitken, P., and J. + Meyer, "Information Model for IP Flow Information Export", + RFC 5102, January 2008. + + [RFC5474] Duffield, N., Ed., "A Framework for Packet Selection and + Reporting", RFC 5474, March 2009. + + + + +Zseby, et al. Standards Track [Page 38] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + [RFC5476] Claise, B., Ed., "Packet Sampling (PSAMP) Protocol + Specifications", RFC 5476, March 2009. + + [RFC5477] Dietz, T., Claise, B., Aitken, P., Dressler, F., and G. + Carle, "Information Model for Packet Sampling Exports", + RFC 5477, March 2009. + + [Zseb03] T. Zseby, "Stratification Strategies for Sampling-based + Non-intrusive Measurement of One-way Delay", Proceedings + of Passive and Active Measurement Workshop (PAM 2003), La + Jolla, CA, USA, pp. 171-179, April 2003. + + [ZsZC01] Tanja Zseby, Sebastian Zander, Georg Carle. Evaluation of + Building Blocks for Passive One-way-delay Measurements. + Proceedings of Passive and Active Measurement Workshop + (PAM 2001), Amsterdam, The Netherlands, April 23-24, 2001. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Zseby, et al. Standards Track [Page 39] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + +Appendix A. Hash Functions + +A.1. IP Shift-XOR (IPSX) Hash Function + + The IPSX Hash Function is tailored for acting on IP version 4 + packets. It exploits the structure of IP packets and in particular + the variability expected to be exhibited within different fields of + the IP packet in order to furnish a hash value with little apparent + correlation with individual packet fields. Fields from the IPv4 and + TCP/UDP headers are used as input. The IPSX Hash Function uses a + small number of simple instructions. + + Input parameters: None + + Built-in parameters: None + + Output: The output of the IPSX is a 16-bit number + + Functioning: + + The functioning can be divided into two parts: input selection, whose + forms are composite input from various portions of the IP packet, + followed by computation of the hash on the composite. + + Input Selection: + + The raw input is drawn from the first 20 bytes of the IP packet + header and the first 8 bytes of the IP payload. If IP options are + not used, the IP header has 20 bytes, and hence the two portions + adjoin and comprise the first 28 bytes of the IP packet. We now use + the raw input as four 32-bit subportions of these 28 bytes. We + specify the input by bit offsets from the start of IP header or + payload. + + f1 = bits 32 to 63 of the IP header, comprising the IP identification + field, flags, and fragment offset. + + f2 = bits 96 to 127 of the IP header, the source IP address. + + f3 = bits 128 to 159 of the IP header, the destination IP address. + + f4 = bits 32 to 63 of the IP payload. For a TCP packet, f4 comprises + the TCP sequence number followed by the message length. For a + UDP packet, f4 comprises the UDP checksum. + + + + + + + +Zseby, et al. Standards Track [Page 40] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + Hash Computation: + + The hash is computed from f1, f2, f3, and f4 by a combination of XOR + (^), right shift (>>), and left shift (<<) operations. The + intermediate quantities h1, v1, and v2 are 32-bit numbers. + + 1. v1 = f1 ^ f2; + 2. v2 = f3 ^ f4; + 3. h1 = v1 << 8; + 4. h1 ^= v1 >> 4; + 5. h1 ^= v1 >> 12; + 6. h1 ^= v1 >> 16; + 7. h1 ^= v2 << 6; + 8. h1 ^= v2 << 10; + 9. h1 ^= v2 << 14; + 10. h1 ^= v2 >> 7; + + The output of the hash is the least significant 16 bits of h1. + +A.2. BOB Hash Function + + The BOB Hash Function is a Hash Function designed for having each bit + of the input affecting every bit of the return value and using both + 1-bit and 2-bit deltas to achieve the so-called avalanche effect + [Jenk97]. This function was originally built for hash table lookup + with fast software implementation. + + Input parameters: + + The input parameters of such a function are: + + - the length of the input string (key) to be hashed, in bytes. + The elementary input blocks of BOB hash are the single bytes; + therefore, no padding is needed. + + - an init value (an arbitrary 32-bit number). + + Built-in parameters: + + The BOB hash uses the following built-in parameter: + + - the golden ratio (an arbitrary 32-bit number used in the Hash + Function computation: its purpose is to avoid mapping all zeros + to all zeros). + + + + + + + +Zseby, et al. Standards Track [Page 41] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + Note: The mix sub-function (see mix (a,b,c) macro in the reference + code below) has a number of parameters governing the shifts in the + registers. The one presented is not the only possible choice. + + It is an open point whether these may be considered additional + built-in parameters to specify at function configuration. + + Output: + + The output of the BOB function is a 32-bit number. It should be + specified: + + - A 32-bit mask to apply to the output + + - The Selection Range as a list of non-overlapping intervals + [start value, end value] where value is in [0,2^32] + + Functioning: + + The hash value is obtained computing first an initialization of an + internal state (composed of three 32-bit numbers, called a, b, c in + the reference code below), then, for each input byte of the key the + internal state is combined by addition and mixed using the mix sub- + function. Finally, the internal state mixed one last time and the + third number of the state (c) is chosen as the return value. + + typedef unsigned long int ub4; /* unsigned 4-byte quantities + */ + typedef unsigned char ub1; /* unsigned 1-byte quantities + */ + + #define hashsize(n) ((ub4)1<<(n)) + #define hashmask(n) (hashsize(n)-1) + + /* ------------------------------------------------------ + mix -- mix three 32-bit values reversibly. + + For every delta with one or two bits set, and the deltas of + all three high bits or all three low bits, whether the original + value of a,b,c is almost all zero or is uniformly distributed, + * If mix() is run forward or backward, at least 32 bits in + a,b,c have at least 1/4 probability of changing. + * If mix() is run forward, every bit of c will change between + 1/3 and 2/3 of the time (well, 22/100 and 78/100 for some 2- + bit deltas) mix() was built out of 36 single-cycle latency + instructions in a structure that could support 2x parallelism, + like so: + + + + +Zseby, et al. Standards Track [Page 42] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + a -= b; + a -= c; x = (c>>13); + b -= c; a ^= x; + b -= a; x = (a<<8); + c -= a; b ^= x; + c -= b; x = (b>>13); + ... + Unfortunately, superscalar Pentiums and Sparcs can't take + advantage of that parallelism. They've also turned some of + those single-cycle latency instructions into multi-cycle latency + instructions + + ------------------------------------------------------------*/ + + #define mix(a,b,c) \ + { \ + a -= b; a -= c; a ^= (c>>13); \ + b -= c; b -= a; b ^= (a<<8); \ + c -= a; c -= b; c ^= (b>>13); \ + a -= b; a -= c; a ^= (c>>12); \ + b -= c; b -= a; b ^= (a<<16); \ + c -= a; c -= b; c ^= (b>>5); \ + a -= b; a -= c; a ^= (c>>3); \ + b -= c; b -= a; b ^= (a<<10); \ + c -= a; c -= b; c ^= (b>>15); \ + } + + /* ----------------------------------------------------------- + hash() -- hash a variable-length key into a 32-bit value + k : the key (the unaligned variable-length array of bytes) + len : the length of the key, counting by bytes + initval : can be any 4-byte value + Returns a 32-bit value. Every bit of the key affects every bit + of the return value. Every 1-bit and 2-bit delta achieves + avalanche. About 6*len+35 instructions. + + The best hash table sizes are powers of 2. There is no need to do + mod a prime (mod is so slow!). If you need less than 32 bits, use a + bitmask. For example, if you need only 10 bits, do h = (h & + hashmask(10)), in which case, the hash table should have hashsize(10) + elements. + + If you are hashing n strings (ub1 **)k, do it like this: for (i=0, + h=0; i<n; ++i) h = hash( k[i], len[i], h); + + + + + + + +Zseby, et al. Standards Track [Page 43] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + By Bob Jenkins, 1996. bob_jenkins@burtleburtle.net. You may use + this code any way you wish, private, educational, or commercial. + It's free. See http://burtleburtle.net/bob/hash/evahash.html. + Use for hash table lookup, or anything where one collision in 2^^32 + is acceptable. Do NOT use for cryptographic purposes. + ----------------------------------------------------------- */ + + ub4 bob_hash(k, length, initval) + register ub1 *k; /* the key */ + register ub4 length; /* the length of the key */ + register ub4 initval; /* an arbitrary value */ + { + register ub4 a,b,c,len; + + /* Set up the internal state */ + len = length; + a = b = 0x9e3779b9; /*the golden ratio; an arbitrary value + */ + c = initval; /* another arbitrary value */ + + /*------------------------------------ handle most of the key */ + + while (len >= 12) + { + a += (k[0] +((ub4)k[1]<<8) +((ub4)k[2]<<16) + +((ub4)k[3]<<24)); + b += (k[4] +((ub4)k[5]<<8) +((ub4)k[6]<<16) + +((ub4)k[7]<<24)); + c += (k[8] +((ub4)k[9]<<8) + +((ub4)k[10]<<16)+((ub4)k[11]<<24)); + mix(a,b,c); + k += 12; len -= 12; + } + + /*---------------------------- handle the last 11 bytes */ + c += length; + switch(len) /* all the case statements fall through*/ + { + case 11: c+=((ub4)k[10]<<24); + case 10: c+=((ub4)k[9]<<16); + case 9 : c+=((ub4)k[8]<<8); + /* the first byte of c is reserved for the length */ + case 8 : b+=((ub4)k[7]<<24); + case 7 : b+=((ub4)k[6]<<16); + case 6 : b+=((ub4)k[5]<<8); + case 5 : b+=k[4]; + case 4 : a+=((ub4)k[3]<<24); + case 3 : a+=((ub4)k[2]<<16); + + + +Zseby, et al. Standards Track [Page 44] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + + case 2 : a+=((ub4)k[1]<<8); + case 1 : a+=k[0]; + /* case 0: nothing left to add */ + } + mix(a,b,c); + /*-------------------------------- report the result */ + return c; + } + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Zseby, et al. Standards Track [Page 45] + +RFC 5475 Techniques for IP Packet Selection March 2009 + + +Authors' Addresses + + Tanja Zseby + Fraunhofer Institute for Open Communication Systems + Kaiserin-Augusta-Allee 31 + 10589 Berlin + Germany + Phone: +49-30-34 63 7153 + EMail: tanja.zseby@fokus.fraunhofer.de + + Maurizio Molina + DANTE + City House + 126-130 Hills Road + Cambridge CB21PQ + United Kingdom + Phone: +44 1223 371 300 + EMail: maurizio.molina@dante.org.uk + + Nick Duffield + AT&T Labs - Research + Room B-139 + 180 Park Ave + Florham Park, NJ 07932 + USA + Phone: +1 973-360-8726 + EMail: duffield@research.att.com + + Saverio Niccolini + Network Laboratories, NEC Europe Ltd. + Kurfuerstenanlage 36 + 69115 Heidelberg + Germany + Phone: +49-6221-9051118 + EMail: saverio.niccolini@netlab.nec.de + + Frederic Raspall + EPSC-UPC + Dept. of Telematics + Av. del Canal Olimpic, s/n + Edifici C4 + E-08860 Castelldefels, Barcelona + Spain + EMail: fredi@entel.upc.es + + + + + + + +Zseby, et al. Standards Track [Page 46] + |