diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc7363.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc7363.txt')
-rw-r--r-- | doc/rfc/rfc7363.txt | 1235 |
1 files changed, 1235 insertions, 0 deletions
diff --git a/doc/rfc/rfc7363.txt b/doc/rfc/rfc7363.txt new file mode 100644 index 0000000..41c7b54 --- /dev/null +++ b/doc/rfc/rfc7363.txt @@ -0,0 +1,1235 @@ + + + + + + +Internet Engineering Task Force (IETF) J. Maenpaa +Request for Comments: 7363 G. Camarillo +Category: Standards Track Ericsson +ISSN: 2070-1721 September 2014 + + + Self-Tuning Distributed Hash Table (DHT) + for REsource LOcation And Discovery (RELOAD) + +Abstract + + REsource LOcation And Discovery (RELOAD) is a peer-to-peer (P2P) + signaling protocol that provides an overlay network service. Peers + in a RELOAD overlay network collectively run an overlay algorithm to + organize the overlay and to store and retrieve data. This document + describes how the default topology plugin of RELOAD can be extended + to support self-tuning, that is, to adapt to changing operating + conditions such as churn and network size. + +Status of This Memo + + This is an Internet Standards Track document. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Further information on + Internet Standards is available in Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc7363. + +Copyright Notice + + Copyright (c) 2014 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + + + + +Maenpaa & Camarillo Standards Track [Page 1] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + +Table of Contents + + 1. Introduction ....................................................2 + 2. Terminology .....................................................3 + 3. Introduction to Stabilization in DHTs ...........................5 + 3.1. Reactive versus Periodic Stabilization .....................5 + 3.2. Configuring Periodic Stabilization .........................6 + 3.3. Adaptive Stabilization .....................................7 + 4. Introduction to Chord ...........................................7 + 5. Extending Chord-Reload to Support Self-Tuning ...................9 + 5.1. Update Requests ............................................9 + 5.2. Neighbor Stabilization ....................................10 + 5.3. Finger Stabilization ......................................11 + 5.4. Adjusting Finger Table Size ...............................11 + 5.5. Detecting Partitioning ....................................11 + 5.6. Leaving the Overlay .......................................11 + 6. Self-Tuning Chord Parameters ...................................12 + 6.1. Estimating Overlay Size ...................................12 + 6.2. Determining Routing Table Size ............................13 + 6.3. Estimating Failure Rate ...................................13 + 6.3.1. Detecting Failures .................................14 + 6.4. Estimating Join Rate ......................................14 + 6.5. Estimate Sharing ..........................................15 + 6.6. Calculating the Stabilization Interval ....................17 + 7. Overlay Configuration Document Extension .......................17 + 8. Security Considerations ........................................18 + 9. IANA Considerations ............................................18 + 9.1. Message Extensions ........................................18 + 9.2. New Overlay Algorithm Type ................................19 + 9.3. A New IETF XML Registry ...................................19 + 10. Acknowledgments ...............................................19 + 11. References ....................................................19 + 11.1. Normative References .....................................19 + 11.2. Informative References ...................................20 + +1. Introduction + + REsource LOcation And Discovery (RELOAD) [RFC6940] is a peer-to-peer + signaling protocol that can be used to maintain an overlay network + and to store data in and retrieve data from the overlay. For + interoperability reasons, RELOAD specifies one overlay algorithm, + called "chord-reload", that is mandatory to implement. This document + extends the chord-reload algorithm by introducing self-tuning + behavior. + + DHT-based overlay networks are self-organizing, scalable, and + reliable. However, these features come at a cost: peers in the + overlay network need to consume network bandwidth to maintain routing + + + +Maenpaa & Camarillo Standards Track [Page 2] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + state. Most DHTs use a periodic stabilization routine to counter the + undesirable effects of churn on routing. To configure the parameters + of a DHT, some characteristics such as churn rate and network size + need to be known in advance. These characteristics are then used to + configure the DHT in a static fashion by using fixed values for + parameters such as the size of the successor set, size of the routing + table, and rate of maintenance messages. The problem with this + approach is that it is not possible to achieve a low failure rate and + a low communication overhead by using fixed parameters. Instead, a + better approach is to allow the system to take into account the + evolution of network conditions and adapt to them. + + This document extends the mandatory-to-implement chord-reload + algorithm by making it self-tuning. The use of the self-tuning + feature is optional. However, when used, it needs to be supported by + all peers in the RELOAD overlay network. The fact that a RELOAD + overlay uses the self-tuning feature is indicated in the RELOAD + overlay configuration document using the CHORD-SELF-TUNING algorithm + name specified in Section 9.2 in the topology-plugin element. Two + main advantages of self-tuning are that users no longer need to tune + every DHT parameter correctly for a given operating environment and + that the system adapts to changing operating conditions. + + The remainder of this document is structured as follows: Section 2 + provides definitions of terms used in this document. Section 3 + discusses alternative approaches to stabilization operations in DHTs, + including reactive stabilization, periodic stabilization, and + adaptive stabilization. Section 4 gives an introduction to the Chord + DHT algorithm. Section 5 describes how this document extends the + stabilization routine of the chord-reload algorithm. Section 6 + describes how the stabilization rate and routing table size are + calculated in an adaptive fashion. + +2. Terminology + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and + "OPTIONAL" in this document are to be interpreted as described in + [RFC2119]. + + This document uses terminology and definitions from the RELOAD base + specification [RFC6940]. + + numBitsInNodeId: Specifies the number of bits in a RELOAD Node-ID. + + DHT: Distributed Hash Tables are a class of decentralized + distributed systems that provide a lookup service similar to a + regular hash table. Given a key, any peer participating in the + + + +Maenpaa & Camarillo Standards Track [Page 3] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + system can retrieve the value associated with that key. The + responsibility for maintaining the mapping from keys to values is + distributed among the peers. + + Chord Ring: The Chord DHT uses ring topology and orders identifiers + on an identifier circle of size 2^numBitsInNodeId. This + identifier circle is called the Chord ring. On the Chord ring, + the responsibility for a key k is assigned to the node whose + identifier equals to or immediately follows k. + + Finger Table: A data structure with up to (but typically less than) + numBitsInNodeId entries maintained by each peer in a Chord-based + overlay. The ith entry in the finger table of peer n contains the + identity of the first peer that succeeds n by at least + 2^(numBitsInNodeId-i) on the Chord ring. This peer is called the + ith finger of peer n. As an example, the first entry in the + finger table of peer n contains a peer halfway around the Chord + ring from peer n. The purpose of the finger table is to + accelerate lookups. + + n.id: In this document, this abbreviation is used to refer to the + Node-ID of peer n. + + O(g(n)): Informally, saying that some equation f(n) = O(g(n)) means + that f(n) is less than some constant multiple of g(n). For the + formal definition, please refer to [Weiss1998]. + + Omega(g(n)): Informally, saying that some equation f(n) = + Omega(g(n)) means that f(n) is more than some constant multiple of + g(n). For the formal definition, please refer to [Weiss1998]. + + Percentile: The Pth (0<=P<=100) percentile of N values arranged in + ascending order is obtained by first calculating the (ordinal) + rank n=(P/100)*N, rounding the result to the nearest integer and + then taking the value corresponding to that rank. + + Predecessor List: A data structure containing the first r + predecessors of a peer on the Chord ring. + + Successor List: A data structure containing the first r successors + of a peer on the Chord ring. + + Neighborhood Set: A term used to refer to the set of peers included + in the successor and predecessor lists of a given peer. + + + + + + + +Maenpaa & Camarillo Standards Track [Page 4] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + Routing Table: Contents of a given peer's routing table include the + set of peers that the peer can use to route overlay messages. The + routing table is made up of the finger table, successor list, and + predecessor list. + +3. Introduction to Stabilization in DHTs + + DHTs use stabilization routines to counter the undesirable effects of + churn on routing. The purpose of stabilization is to keep the + routing information of each peer in the overlay consistent with the + constantly changing overlay topology. There are two alternative + approaches to stabilization: periodic and reactive [Rhea2004]. + Periodic stabilization can either use a fixed stabilization rate or + calculate the stabilization rate in an adaptive fashion. + +3.1. Reactive versus Periodic Stabilization + + In reactive stabilization, a peer reacts to the loss of a peer in its + neighborhood set or to the appearance of a new peer that should be + added to its neighborhood set by sending a copy of its neighbor table + to all peers in the neighborhood set. Periodic recovery, in + contrast, takes place independently of changes in the neighborhood + set. In periodic recovery, a peer periodically shares its + neighborhood set with each or a subset of the members of that set. + + The chord-reload algorithm [RFC6940] supports both reactive and + periodic stabilization. It has been shown in [Rhea2004] that + reactive stabilization works well for small neighborhood sets (i.e., + small overlays) and moderate churn. However, in large-scale (e.g., + 1000 peers or more [Rhea2004]) or high-churn overlays, reactive + stabilization runs the risk of creating a positive feedback cycle, + which can eventually result in congestion collapse. In [Rhea2004], + it is shown that a 1000-peer overlay under churn uses significantly + less bandwidth and has lower latencies when periodic stabilization is + used than when reactive stabilization is used. Although in the + experiments carried out in [Rhea2004], reactive stabilization + performed well when there was no churn, its bandwidth use was + observed to jump dramatically under churn. At higher churn rates and + larger scale overlays, periodic stabilization uses less bandwidth and + the resulting lower contention for the network leads to lower + latencies. For this reason, most DHTs, such as CAN [CAN], Chord + [Chord], Pastry [Pastry], and Bamboo [Rhea2004], use periodic + stabilization [Ghinita2006]. As an example, the first version of + Bamboo used reactive stabilization, which caused Bamboo to suffer + from degradation in performance under churn. To fix this problem, + Bamboo was modified to use periodic stabilization. + + + + + +Maenpaa & Camarillo Standards Track [Page 5] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + In Chord, periodic stabilization is typically done both for + successors and fingers. An alternative strategy is analyzed in + [Krishnamurthy2008]. In this strategy, called the "correction-on- + change maintenance strategy", a peer periodically stabilizes its + successors but does not do so for its fingers. Instead, finger + pointers are stabilized in a reactive fashion. The results obtained + in [Krishnamurthy2008] imply that although the correction-on-change + strategy works well when churn is low, periodic stabilization + outperforms the correction-on-change strategy when churn is high. + +3.2. Configuring Periodic Stabilization + + When periodic stabilization is used, one faces the problem of + selecting an appropriate execution rate for the stabilization + procedure. If the execution rate of periodic stabilization is high, + changes in the system can be quickly detected, but at the + disadvantage of increased communication overhead. Alternatively, if + the stabilization rate is low and the churn rate is high, routing + tables become inaccurate and DHT performance deteriorates. Thus, the + problem is setting the parameters so that the overlay achieves the + desired reliability and performance even in challenging conditions, + such as under heavy churn. This naturally results in high cost + during periods when the churn level is lower than expected, or + alternatively, poor performance or even network partitioning in worse + than expected conditions. + + In addition to selecting an appropriate stabilization interval, + regardless of whether or not periodic stabilization is used, an + appropriate size needs to be selected for the neighborhood set and + for the finger table. + + The current approach is to configure overlays statically. This works + in situations where perfect information about the future is + available. In situations where the operating conditions of the + network are known in advance and remain static throughout the + lifetime of the system, it is possible to choose fixed optimal values + for parameters such as stabilization rate, neighborhood set size and + routing table size. However, if the operating conditions (e.g., the + size of the overlay and its churn rate) do not remain static but + evolve with time, it is not possible to achieve both a low lookup + failure rate and a low communication overhead by using fixed + parameters [Ghinita2006]. + + As an example, to configure the Chord DHT algorithm, one needs to + select values for the following parameters: size of successor list, + stabilization interval, and size of the finger table. To select an + appropriate value for the stabilization interval, one needs to know + the expected churn rate and overlay size. According to + + + +Maenpaa & Camarillo Standards Track [Page 6] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + [Liben-Nowell2002], a Chord network in a ring-like state remains in a + ring-like state as long as peers send Omega(square(log(N))) messages + before N new peers join or N/2 peers fail. Thus, in a 500-peer + overlay churning at a rate such that one peer joins and one peer + leaves the network every 30 seconds, an appropriate stabilization + interval would be on the order of 93 s. According to [Chord], the + size of the successor list and finger table should be on the order of + log(N). Already a successor list of a modest size (e.g., log2(N) or + 2*log2(N), which is the successor list size used in [Chord]) makes it + very unlikely that a peer will lose all of its successors, which + would cause the Chord ring to become disconnected. Thus, in a + 500-peer network each peer should maintain on the order of nine + successors and fingers. However, if the churn rate doubles and the + network size remains unchanged, the stabilization rate should double + as well. That is, the appropriate maintenance interval would now be + on the order of 46 s. On the other hand, if the churn rate becomes, + e.g., six-fold and the size of the network grows to 2000 peers, on + the order of 11 fingers and successors should be maintained and the + stabilization interval should be on the order of 42 s. If one + continued using the old values, this could result in inaccurate + routing tables, network partitioning, and deteriorating performance. + +3.3. Adaptive Stabilization + + A self-tuning DHT takes into consideration the continuous evolution + of network conditions and adapts to them. In a self-tuning DHT, each + peer collects statistical data about the network and dynamically + adjusts its stabilization rate, neighborhood set size, and finger + table size based on the analysis of the data [Ghinita2006]. + Reference [Mahajan2003] shows that by using self-tuning, it is + possible to achieve high reliability and performance even in adverse + conditions with low maintenance cost. Adaptive stabilization has + been shown to outperform periodic stabilization in terms of both + lookup failures and communication overhead [Ghinita2006]. + +4. Introduction to Chord + + Chord [Chord] is a structured P2P algorithm that uses consistent + hashing to build a DHT out of several independent peers. Consistent + hashing assigns each peer and resource a fixed-length identifier. + Peers use SHA-1 as the base hash function to generate the + identifiers. As specified in RELOAD base [RFC6940], the length of + the identifiers is numBitsInNodeId=128 bits. The identifiers are + ordered on an identifier circle of size 2^numBitsInNodeId. On the + identifier circle, key k is assigned to the first peer whose + identifier equals or follows the identifier of k in the identifier + space. The identifier circle is called the Chord ring. + + + + +Maenpaa & Camarillo Standards Track [Page 7] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + Different DHTs differ significantly in performance when bandwidth is + limited. It has been shown that when compared to other DHTs, the + advantages of Chord include that it uses bandwidth efficiently and + can achieve low lookup latencies at little cost [Li2004]. + + A simple lookup mechanism could be implemented on a Chord ring by + requiring each peer to only know how to contact its current successor + on the identifier circle. Queries for a given identifier could then + be passed around the circle via the successor pointers until they + encounter the first peer whose identifier is equal to or larger than + the desired identifier. Such a lookup scheme uses a number of + messages that grows linearly with the number of peers. To reduce the + cost of lookups, Chord maintains also additional routing information; + each peer n maintains a data structure with up to numBitsInNodeId + entries, called the finger table. The first entry in the finger + table of peer n contains the peer halfway around the ring from peer + n. The second entry contains the peer that is 1/4th of the way + around, the third entry the peer that is 1/8th of the way around, + etc. In other words, the ith entry in the finger table at peer n + contains the identity of the first peer s that succeeds n by at least + 2^(numBitsInNodeId-i) on the Chord ring. This peer is called the ith + finger of peer n. The interval between two consecutive fingers is + called a finger interval. The ith finger interval of peer n covers + the range [n.id + 2^(numBitsInNodeId-i), n.id + 2^(numBitsInNodeId- + i+1)) on the Chord ring. In an N-peer network, each peer maintains + information about O(log(N)) other peers in its finger table. As an + example, if N=100000, it is sufficient to maintain 17 fingers. + + Chord needs all peers' successor pointers to be up to date in order + to ensure that lookups produce correct results as the set of + participating peers changes. To achieve this, peers run a + stabilization protocol periodically in the background. The + stabilization protocol of the original Chord algorithm uses two + operations: successor stabilization and finger stabilization. + However, the Chord algorithm of RELOAD base defines two additional + stabilization components, as will be discussed below. + + To increase robustness in the event of peer failures, each Chord peer + maintains a successor list of size r, containing the peer's first r + successors. The benefit of successor lists is that if each peer + fails independently with probability p, the probability that all r + successors fail simultaneously is only p^r. + + The original Chord algorithm maintains only a single predecessor + pointer. However, multiple predecessor pointers (i.e., a predecessor + list) can be maintained to speed up recovery from predecessor + failures. The routing table of a peer consists of the successor + list, finger table, and predecessor list. + + + +Maenpaa & Camarillo Standards Track [Page 8] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + +5. Extending Chord-Reload to Support Self-Tuning + + This section describes how the mandatory-to-implement chord-reload + algorithm defined in RELOAD base [RFC6940] can be extended to support + self-tuning. + + The chord-reload algorithm supports both reactive and periodic + recovery strategies. When the self-tuning mechanisms defined in this + document are used, the periodic recovery strategy is used. Further, + chord-reload specifies that at least three predecessors and three + successors need to be maintained. When the self-tuning mechanisms + are used, the appropriate sizes of the successor list and predecessor + list are determined in an adaptive fashion based on the estimated + network size, as will be described in Section 6. + + As specified in RELOAD base [RFC6940], each peer maintains a + stabilization timer. When the stabilization timer fires, the peer + restarts the timer and carries out the overlay stabilization routine. + Overlay stabilization has four components in chord-reload: + + 1. Update the neighbor table. We refer to this as "neighbor + stabilization". + + 2. Refreshing the finger table. We refer to this as "finger + stabilization". + + 3. Adjusting finger table size. + + 4. Detecting partitioning. We refer to this as "strong + stabilization". + + As specified in RELOAD base [RFC6940], a peer sends periodic messages + as part of the neighbor stabilization, finger stabilization, and + strong stabilization routines. In neighbor stabilization, a peer + periodically sends an Update request to every peer in its connection + table. The default time is every ten minutes. In finger + stabilization, a peer periodically searches for new peers to include + in its finger table. This time defaults to one hour. This document + specifies how the neighbor stabilization and finger stabilization + intervals can be determined in an adaptive fashion based on the + operating conditions of the overlay. The subsections below describe + how this document extends the four components of stabilization. + +5.1. Update Requests + + As described in RELOAD base [RFC6940], the neighbor and finger + stabilization procedures are implemented using Update requests. + RELOAD base defines three types of Update requests: 'peer_ready', + + + +Maenpaa & Camarillo Standards Track [Page 9] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + 'neighbors', and 'full'. Regardless of the type, all Update requests + include an 'uptime' field. The self-tuning extensions require + information on the uptimes of peers in the routing table. The sender + of an Update request includes its current uptime (in seconds) in the + 'uptime' field. Regardless of the type, all Update requests MUST + include an 'uptime' field. + + When self-tuning is used, each peer decides independently the + appropriate size for the successor list, predecessor list, and finger + table. Thus, the 'predecessors', 'successors', and 'fingers' fields + included in RELOAD Update requests are of variable length. As + specified in RELOAD [RFC6940], variable-length fields are on the wire + preceded by length bytes. In the case of the successor list, + predecessor list, and finger table, there are two length bytes + (allowing lengths up to 2^16-1). The number of NodeId structures + included in each field can be calculated based on the length bytes + since the size of a single NodeId structure is 16 bytes. If a peer + receives more entries than fit into its successor list, predecessor + list, or finger table, the peer MUST ignore the extra entries. A + peer may also receive less entries than it currently has in its own + data structure. In that case, it uses the received entries to update + only a subset of the entries in its data structure. As an example, a + peer that has a successor list of size 8 may receive a successor list + of size 4 from its immediate successor. In that case, the received + successor list can only be used to update the first few successors on + the peer's successor list. The rest of the successors will remain + intact. + +5.2. Neighbor Stabilization + + In the neighbor stabilization operation of chord-reload, a peer + periodically sends an Update request to every peer in its connection + table. In a small, low-churn overlay, the amount of traffic this + process generates is typically acceptable. However, in a large-scale + overlay churning at a moderate or high churn rate, the traffic load + may no longer be acceptable since the size of the connection table is + large and the stabilization interval relatively short. The self- + tuning mechanisms described in this document are especially designed + for overlays of the latter type. Therefore, when the self-tuning + mechanisms are used, each peer only sends a periodic Update request + to its first predecessor and first successor on the Chord ring; it + MUST NOT send Update requests to others. + + The neighbor stabilization routine is executed when the stabilization + timer fires. To begin the neighbor stabilization routine, a peer + sends an Update request to its first successor and its first + predecessor. The type of the Update request MUST be 'neighbors'. + The Update request includes the successor and predecessor lists of + + + +Maenpaa & Camarillo Standards Track [Page 10] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + the sender. If a peer receiving such an Update request learns from + the predecessor and successor lists included in the request that new + peers can be included in its neighborhood set, it sends Attach + requests to the new peers. + + After a new peer has been added to the predecessor or successor list, + an Update request of type 'peer_ready' is sent to the new peer. This + allows the new peer to insert the sender into its neighborhood set. + +5.3. Finger Stabilization + + Chord-reload specifies two alternative methods for searching for new + peers to the finger table. Both of the alternatives can be used with + the self-tuning extensions defined in this document. + + Immediately after a new peer has been added to the finger table, a + Probe request is sent to the new peer to fetch its uptime. The + 'requested_info' field of the Probe request MUST be set to contain + the ProbeInformationType 'uptime' defined in RELOAD base [RFC6940]. + +5.4. Adjusting Finger Table Size + + The chord-reload algorithm defines how a peer can make sure that the + finger table is appropriately sized to allow for efficient routing. + Since the self-tuning mechanisms specified in this document produce a + network size estimate, this estimate can be directly used to + calculate the optimal size for the finger table. This mechanism is + used instead of the one specified by chord-reload. A peer uses the + network size estimate to determine whether it needs to adjust the + size of its finger table each time when the stabilization timer + fires. The way this is done is explained in Section 6.2. + +5.5. Detecting Partitioning + + This document does not require any changes to the mechanism chord- + reload uses to detect network partitioning. + +5.6. Leaving the Overlay + + As specified in RELOAD base [RFC6940], a leaving peer SHOULD send a + Leave request to all members of its neighbor table prior to leaving + the overlay. The 'overlay_specific_data' field MUST contain the + ChordLeaveData structure. The Leave requests that are sent to + successors contain the predecessor list of the leaving peer. The + Leave requests that are sent to the predecessors contain the + successor list of the leaving peer. If a given successor can + identify better predecessors (that is, predecessors that are closer + to it on the Chord ring than its existing predecessors) than are + + + +Maenpaa & Camarillo Standards Track [Page 11] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + already included in its predecessor lists by investigating the + predecessor list it receives from the leaving peer, it sends Attach + requests to them. Similarly, if a given predecessor identifies + better successors by investigating the successor list it receives + from the leaving peer, it sends Attach requests to them. + +6. Self-Tuning Chord Parameters + + This section specifies how to determine an appropriate stabilization + rate and routing table size in an adaptive fashion. The proposed + mechanism is based on [Mahajan2003], [Liben-Nowell2002], and + [Ghinita2006]. To calculate an appropriate stabilization rate, the + values of three parameters must be estimated: overlay size N, failure + rate U, and join rate L. To calculate an appropriate routing table + size, the estimated network size N can be used. Peers in the overlay + MUST recalculate the values of the parameters to self-tune the chord- + reload algorithm at the end of each stabilization period before + restarting the stabilization timer. + +6.1. Estimating Overlay Size + + Techniques for estimating the size of an overlay network have been + proposed, for instance, in [Mahajan2003], [Horowitz2003], + [Kostoulas2005], [Binzenhofer2006], and [Ghinita2006]. In Chord, the + density of peer identifiers in the neighborhood set can be used to + produce an estimate of the size of the overlay, N [Mahajan2003]. + Since peer identifiers are picked randomly with uniform probability + from the numBitsInNodeId-bit identifier space, the average distance + between peer identifiers in the successor set is + (2^numBitsInNodeId)/N. + + To estimate the overlay network size, a peer computes the average + inter-peer distance d between the successive peers starting from the + most distant predecessor and ending to the most distant successor in + the successor list. The estimated network size is calculated as: + + 2^numBitsInNodeId + N = ------------------- + d + + This estimate has been found to be accurate within 15% of the real + network size [Ghinita2006]. Of course, the size of the neighborhood + set affects the accuracy of the estimate. + + + + + + + + +Maenpaa & Camarillo Standards Track [Page 12] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + During the join process, a joining peer fills its routing table by + sending a series of Ping and Attach requests, as specified in RELOAD + base [RFC6940]. Thus, a joining peer immediately has enough + information at its disposal to calculate an estimate of the network + size. + +6.2. Determining Routing Table Size + + As specified in RELOAD base [RFC6940], the finger table must contain + at least 16 entries. When the self-tuning mechanisms are used, the + size of the finger table MUST be set to max(ceiling(log2(N)), 16) + using the estimated network size N. + + The size of the successor list MUST be set to a maximum of + ceiling(log2(N)). An implementation can place a lower limit on the + size of the successor list. As an example, the implementation might + require the size of the successor list to be always at least three. + + The size of the predecessor list MUST be set to ceiling(log2(N)). + +6.3. Estimating Failure Rate + + A typical approach is to assume that peers join the overlay according + to a Poisson process with rate L and leave according to a Poisson + process with rate parameter U [Mahajan2003]. The value of U can be + estimated using peer failures in the finger table and neighborhood + set [Mahajan2003]. If peers fail with rate U, a peer with M unique + peer identifiers in its routing table should observe K failures in + time K/(M*U). Every peer in the overlay maintains a history of the + last K failures. The current time is inserted into the history when + the peer joins the overlay. The estimate of U is calculated as: + + k + U = --------, + M * Tk + + where M is the number of unique peer identifiers in the routing + table, Tk is the time between the first and the last failure in the + history, and k is the number of failures in the history. If k is + smaller than K, the estimate is computed as if there was a failure at + the current time. It has been shown that an estimate calculated in a + similar manner is accurate within 17% of the real value of U + [Ghinita2006]. + + The size of the failure history K affects the accuracy of the + estimate of U. One can increase the accuracy by increasing K. + However, this has the side effect of decreasing responsiveness to + changes in the failure rate. On the other hand, a small history size + + + +Maenpaa & Camarillo Standards Track [Page 13] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + may cause a peer to overreact each time a new failure occurs. In + [Ghinita2006], K is set to 25% of the routing table size. Use of + this value is RECOMMENDED. + +6.3.1. Detecting Failures + + A new failure is inserted to the failure history in the following + cases: + + 1. A Leave request is received from a neighbor. + + 2. A peer fails to reply to a Ping request sent in the situation + explained below. If no packets have been received on a + connection during the past 2*Tr seconds (where Tr is the + inactivity timer defined by Interactive Connectivity + Establishment (ICE) [RFC5245]), a RELOAD Ping request MUST be + sent to the remote peer. RELOAD mandates the use of Session + Traversal Utilities for NAT (STUN) [RFC5389] for keepalives. + STUN keepalives take the form of STUN Binding Indication + transactions. As specified in ICE [RFC5245], a peer sends a STUN + Binding Indication if there has been no packet sent on a + connection for Tr seconds. Tr is configurable and has a default + of 15 seconds. Although STUN Binding Indications do not generate + a response, the fact that a peer has failed can be learned from + the lack of packets (Binding Indications or application protocol + packets) received from the peer. If the remote peer fails to + reply to the Ping request, the sender should consider the remote + peer to have failed. + + As an alternative to relying on STUN keepalives to detect peer + failure, a peer could send additional, frequent RELOAD messages to + every peer in its connection table. These messages could be Update + requests, in which case they would serve two purposes: detecting peer + failure and stabilization. However, as the cost of this approach can + be very high in terms of bandwidth consumption and traffic load, + especially in large-scale overlays experiencing churn, its use is NOT + RECOMMENDED. + +6.4. Estimating Join Rate + + Reference [Ghinita2006] proposes that a peer can estimate the join + rate based on the uptime of the peers in its routing table. An + increase in peer join rate will be reflected by a decrease in the + average age of peers in the routing table. Thus, each peer + maintained an array of the ages of the peers in its routing table + sorted in increasing order. Using this information, an estimate of + the global peer join rate L is calculated as: + + + + +Maenpaa & Camarillo Standards Track [Page 14] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + N + L = ----------------------, + Ages[floor(rsize/2)] + + where Ages is an array containing the ages of the peers in the + routing table sorted in increasing order and rsize is the size of the + routing table. It has been shown that the estimate obtained by using + this method is accurate within 22% of the real join rate + [Ghinita2006]. Of course, the size of the routing table affects the + accuracy. + + In order for this mechanism to work, peers need to exchange + information about the time they have been present in the overlay. + Peers receive the uptimes of their successors and predecessors during + the stabilization operations since all Update requests carry uptime + values. A joining peer learns the uptime of the admitting peer since + it receives an Update from the admitting peer during the join + procedure. Peers learn the uptimes of new fingers since they can + fetch the uptime using a Probe request after having attached to the + new finger. + +6.5. Estimate Sharing + + To improve the accuracy of network size, join rate, and leave rate + estimates, peers share their estimates. When the stabilization timer + fires, a peer selects number-of-peers-to-probe random peers from its + finger table and send each of them a Probe request. The targets of + Probe requests are selected from the finger table rather than from + the neighbor table since neighbors are likely to make similar errors + when calculating their estimates. The number-of-peers-to-probe is a + new element in the overlay configuration document. It is defined in + Section 7. Both the Probe request and the answer returned by the + target peer MUST contain a new message extension whose + MessageExtensionType is 'self_tuning_data'. This extension type is + defined in Section 9.1. The 'extension_contents' field of the + MessageExtension structure MUST contain a SelfTuningData structure: + + struct { + uint32 network_size; + uint32 join_rate; + uint32 leave_rate; + } SelfTuningData; + + + + + + + + + +Maenpaa & Camarillo Standards Track [Page 15] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + The contents of the SelfTuningData structure are as follows: + + network_size + + The latest network size estimate calculated by the sender. + + join_rate + + The latest join rate estimate calculated by the sender. + + leave_rate + + The latest leave rate estimate calculated by the sender. + + The join and leave rates are expressed as joins or failures per 24 + hours. As an example, if the global join rate estimate a peer has + calculated is 0.123 peers/s, it would include in the 'join_rate' + field the ceiling of the value 10627.2 (24*60*60*0.123 = 10627.2), + that is, the value 10628. + + The 'type' field of the MessageExtension structure MUST be set to + contain the value 'self_tuning_data'. The 'critical' field of the + structure MUST be set to False. + + A peer stores all estimates it receives in Probe requests and answers + during a stabilization interval. When the stabilization timer fires, + the peer calculates the estimates to be used during the next + stabilization interval by taking the 75th percentile (i.e., third + quartile) of a data set containing its own estimate and the received + estimates. + + The default value for number-of-peers-to-probe is 4. This default + value is recommended to allow a peer to receive a sufficiently large + set of estimates from other peers. With a value of 4, a peer + receives four estimates in Probe answers. On the average, each peer + also receives four Probe requests each carrying an estimate. Thus, + on the average, each peer has nine estimates (including its own) that + it can use at the end of the stabilization interval. A value smaller + than 4 is NOT RECOMMENDED to keep the number of received estimates + high enough. As an example, if the value were 2, there would be + peers in the overlay that would only receive two estimates during a + stabilization interval. Such peers would only have three estimates + available at the end of the interval, which may not be reliable + enough since even a single exceptionally high or low estimate can + have a large impact. + + + + + + +Maenpaa & Camarillo Standards Track [Page 16] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + +6.6. Calculating the Stabilization Interval + + According to [Liben-Nowell2002], a Chord network in a ring-like state + remains in a ring-like state as long as peers send + Omega(square(log(N))) messages before N new peers join or N/2 peers + fail. We can use the estimate of peer failure rate, U, to calculate + the time Tf in which N/2 peers fail: + + 1 + Tf = ------ + 2*U + + Based on this estimate, a stabilization interval Tstab-1 is + calculated as: + + Tf + Tstab-1 = ----------------- + square(log2(N)) + + On the other hand, the estimated join rate L can be used to calculate + the time in which N new peers join the overlay. Based on the + estimate of L, a stabilization interval Tstab-2 is calculated as: + + N + Tstab-2 = --------------------- + L * square(log2(N)) + + Finally, the actual stabilization interval Tstab that is used can be + obtained by taking the minimum of Tstab-1 and Tstab-2. + + The results obtained in [Maenpaa2009] indicate that making the + stabilization interval too small has the effect of making the overlay + less stable (e.g., in terms of detected loops and path failures). + Thus, a lower limit should be used for the stabilization period. + Based on the results in [Maenpaa2009], a lower limit of 15 s is + RECOMMENDED, since using a stabilization period smaller than this + will with a high probability cause too much traffic in the overlay. + +7. Overlay Configuration Document Extension + + This document extends the RELOAD overlay configuration document by + adding one new element, "number-of-peers-to-probe", inside each + "configuration" element. + + self-tuning:number-of-peers-to-probe: The number of fingers to which + Probe requests are sent to obtain their network size, join rate, + and leave rate estimates. The default value is 4. + + + + +Maenpaa & Camarillo Standards Track [Page 17] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + The RELAX NG grammar for this element is: + + namespace self-tuning = "urn:ietf:params:xml:ns:p2p:self-tuning" + + parameter &= element self-tuning:number-of-peers-to-probe { + xsd:unsignedInt }? + + This namespace is added into the <mandatory-extension> element in the + overlay configuration file. + +8. Security Considerations + + In the same way as malicious or compromised peers implementing the + RELOAD base protocol [RFC6940] can advertise false network metrics or + distribute false routing table information for instance in RELOAD + Update messages, malicious peers implementing this specification may + share false join rate, leave rate, and network size estimates. For + such attacks, the same security concerns apply as in the RELOAD base + specification. In addition, as long as the amount of malicious peers + in the overlay remains modest, the statistical mechanisms applied in + Section 6.5 (i.e., the use of 75th percentiles) to process the shared + estimates a peer obtains help ensure that estimates that are clearly + different from (i.e., larger or smaller than) other received + estimates will not significantly influence the process of adapting + the stabilization interval and routing table size. However, it + should be noted that if an attacker is able to impersonate a high + number of other peers in the overlay in strategic locations, it may + be able to send a high enough number of false estimates to a victim + and therefore influence the victim's choice of a stabilization + interval. + +9. IANA Considerations + +9.1. Message Extensions + + This document introduces one additional extension to the "RELOAD + Extensions Registry": + + +------------------+-------+---------------+ + | Extension Name | Code | Specification | + +------------------+-------+---------------+ + | self_tuning_data | 0x3 | RFC 7363 | + +------------------+-------+---------------+ + + + The contents of the extension are defined in Section 6.5. + + + + + +Maenpaa & Camarillo Standards Track [Page 18] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + +9.2. New Overlay Algorithm Type + + This document introduces one additional overlay algorithm type to the + "RELOAD Overlay Algorithm Types" registry: + + +-------------------+-----------+ + | Algorithm Name | Reference | + +-------------------+-----------+ + | CHORD-SELF-TUNING | RFC 7363 | + +-------------------+-----------+ + + +9.3. A New IETF XML Registry + + This document registers one new URI for the self-tuning namespace in + the "ns" subregistry of the IETF XML registry defined in [RFC3688]. + + URI: urn:ietf:params:xml:ns:p2p:self-tuning + + Registrant Contact: The IESG + + XML: N/A, the requested URI is an XML namespace + +10. Acknowledgments + + The authors would like to thank Jani Hautakorpi for his contributions + to the document. The authors would also like to thank Carlos + Bernardos, Martin Durst, Alissa Cooper, Tobias Gondrom, and Barry + Leiba for their comments on the document. + +11. References + +11.1. Normative References + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + + [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment + (ICE): A Protocol for Network Address Translator (NAT) + Traversal for Offer/Answer Protocols", RFC 5245, April + 2010. + + [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, + "Session Traversal Utilities for NAT (STUN)", RFC 5389, + October 2008. + + + + + + +Maenpaa & Camarillo Standards Track [Page 19] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + [RFC6940] Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and + H. Schulzrinne, "REsource LOcation And Discovery (RELOAD) + Base Protocol", RFC 6940, January 2014. + +11.2. Informative References + + [Binzenhofer2006] + Binzenhofer, A., Kunzmann, G., and R. Henjes, "A Scalable + Algorithm to Monitor Chord-Based P2P Systems at Runtime", + In Proceedings of the 20th IEEE International Parallel and + Distributed Processing Symposium (IPDPS), pp. 1-8, April + 2006. + + [CAN] Ratnasamy, S., Francis, P., Handley, M., Karp, R., and S. + Schenker, "A Scalable Content-Addressable Network", In + Proceedings of the 2001 Conference on Applications, + Technologies, Architectures and Protocols for Computer + Communications, pp. 161-172, August 2001. + + [Chord] Stoica, I., Morris, R., Liben-Nowell, D., Karger, D., + Kaashoek, M., Dabek, F., and H. Balakrishnan, "Chord: A + Scalable Peer-to-peer Lookup Service for Internet + Applications", IEEE/ACM Transactions on Networking, Volume + 11, Issue 1, pp. 17-32, February 2003. + + [Ghinita2006] + Ghinita, G. and Y. Teo, "An Adaptive Stabilization + Framework for Distributed Hash Tables", In Proceedings of + the 20th IEEE International Parallel and Distributed + Processing Symposium (IPDPS), pp. 29-38, April 2006. + + [Horowitz2003] + Horowitz, K. and D. Malkhi, "Estimating Network Size from + Local Information", Information Processing Letters, Volume + 88, Issue 5, pp. 237-243, December 2003. + + [Kostoulas2005] + Kostoulas, D., Psaltoulis, D., Gupta, I., Birman, K., and + A. Demers, "Decentralized Schemes for Size Estimation in + Large and Dynamic Groups", In Proceedings of the 4th IEEE + International Symposium on Network Computing and + Applications, pp. 41-48, July 2005. + + + + + + + + + +Maenpaa & Camarillo Standards Track [Page 20] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + [Krishnamurthy2008] + Krishnamurthy, S., El-Ansary, S., Aurell, E., and S. + Haridi, "Comparing Maintenance Strategies for Overlays", + In Proceedings of the 16th Euromicro Conference on + Parallel, Distributed and Network-Based Processing, pp. + 473-482, February 2008. + + [Li2004] Li, J., Strinbling, J., Gil, T., Morris, R., and M. + Kaashoek, "Comparing the Performance of Distributed Hash + Tables Under Churn", Peer-to-Peer Systems III, Volume 3279 + of Lecture Notes in Computer Science, Springer, pp. 87-99, + February 2005. + + [Liben-Nowell2002] + Liben-Nowell, D., Balakrishnan, H., and D. Karger, + "Observations on the Dynamic Evolution of Peer-to-Peer + Networks", In Proceedings of the 1st International + Workshop on Peer-to-Peer Systems (IPTPS), pp. 22-33, March + 2002. + + [Maenpaa2009] + Maenpaa, J. and G. Camarillo, "A Study on Maintenance + Operations in a Chord-Based Peer-to-Peer Session + Initiation Protocol Overlay Network", In Proceedings of + the 23rd IEEE International Parallel and Distributed + Processing Symposium (IPDPS), pp. 1-9, May 2009. + + [Mahajan2003] + Mahajan, R., Castro, M., and A. Rowstron, "Controlling the + Cost of Reliability in Peer-to-Peer Overlays", In + Proceedings of the 2nd International Workshop on Peer-to- + Peer Systems (IPTPS), pp. 21-32, February 2003. + + [Pastry] Rowstron, A. and P. Druschel, "Pastry: Scalable, + Decentralized Object Location and Routing for Large-Scale + Peer-to-Peer Systems", In Proceedings of the IFIP/ACM + International Conference on Distributed Systems Platforms, + pp. 329-350, November 2001. + + [RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688, + January 2004. + + [Rhea2004] + Rhea, S., Geels, D., Roscoe, T., and J. Kubiatowicz, + "Handling Churn in a DHT", In Proceedings of the USENIX + Annual Technical Conference, pp. 127-140, June 2004. + + + + + +Maenpaa & Camarillo Standards Track [Page 21] + +RFC 7363 Self-Tuning DHT for RELOAD September 2014 + + + [Weiss1998] + Weiss, M., "Data Structures and Algorithm Analysis in + C++", Addison-Wesley Longman Publishing Co., Inc., 2nd + Edition, ISBN 0201361221, 1998. + +Authors' Addresses + + Jouni Maenpaa + Ericsson + Hirsalantie 11 + Jorvas 02420 + Finland + + EMail: Jouni.Maenpaa@ericsson.com + + + Gonzalo Camarillo + Ericsson + Hirsalantie 11 + Jorvas 02420 + Finland + + EMail: Gonzalo.Camarillo@ericsson.com + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Maenpaa & Camarillo Standards Track [Page 22] + |