diff options
| author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 | 
|---|---|---|
| committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 | 
| commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
| tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc4981.txt | |
| parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) | |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc4981.txt')
| -rw-r--r-- | doc/rfc/rfc4981.txt | 5099 | 
1 files changed, 5099 insertions, 0 deletions
| diff --git a/doc/rfc/rfc4981.txt b/doc/rfc/rfc4981.txt new file mode 100644 index 0000000..6e832a9 --- /dev/null +++ b/doc/rfc/rfc4981.txt @@ -0,0 +1,5099 @@ + + + + + + +Network Working Group                                          J. Risson +Request for Comments: 4981                                      T. Moors +Category: Informational                    University of New South Wales +                                                          September 2007 + + +       Survey of Research towards Robust Peer-to-Peer Networks: +                            Search Methods + +Status of This Memo + +   This memo provides information for the Internet community.  It does +   not specify an Internet standard of any kind.  Distribution of this +   memo is unlimited. + +IESG Note + +   This RFC is not a candidate for any level of Internet Standard.  The +   IETF disclaims any knowledge of the fitness of this RFC for any +   purpose and notes that the decision to publish is not based on IETF +   review apart from IESG review for conflict with IETF work.  The RFC +   Editor has chosen to publish this document at its discretion.  See +   RFC 3932 for more information. + +Abstract + +   The pace of research on peer-to-peer (P2P) networking in the last +   five years warrants a critical survey.  P2P has the makings of a +   disruptive technology -- it can aggregate enormous storage and +   processing resources while minimizing entry and scaling costs. + +   Failures are common amongst massive numbers of distributed peers, +   though the impact of individual failures may be less than in +   conventional architectures.  Thus, the key to realizing P2P's +   potential in applications other than casual file sharing is +   robustness. + +   P2P search methods are first couched within an overall P2P taxonomy. +   P2P indexes for simple key lookup are assessed, including those based +   on Plaxton trees, rings, tori, butterflies, de Bruijn graphs, and +   skip graphs.  Similarly, P2P indexes for keyword lookup, information +   retrieval and data management are explored.  Finally, early efforts +   to optimize range, multi-attribute, join, and aggregation queries +   over P2P indexes are reviewed.  Insofar as they are available in the +   primary literature, robustness mechanisms and metrics are highlighted +   throughout.  However, the low-level mechanisms that most affect +   robustness are not well isolated in the literature.  Recommendations +   are given for future research. + + + +Risson & Moors               Informational                      [Page 1] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +Table of Contents + +   1. Introduction ....................................................3 +      1.1. Related Disciplines ........................................6 +      1.2. Structured and Unstructured Routing ........................7 +      1.3. Indexes and Queries ........................................9 +   2. Index Types ....................................................10 +      2.1. Local Index (Gnutella) ....................................10 +      2.2. Central Index (Napster) ...................................12 +      2.3. Distributed Index (Freenet) ...............................13 +   3. Semantic Free Index ............................................15 +      3.1. Origins ...................................................15 +           3.1.1. Plaxton, Rajaraman, and Richa (PRR) ................15 +           3.1.2. Consistent Hashing .................................16 +           3.1.3. Scalable Distributed Data Structures (LH*) .........16 +      3.2. Dependability .............................................17 +           3.2.1. Static Dependability ...............................17 +           3.2.2. Dynamic Dependability ..............................18 +           3.2.3. Ephemeral or Stable Nodes -- O(log n) or +                  O(1) Hops ..........................................19 +           3.2.4. Simulation and Proof ...............................20 +      3.3. Latency ...................................................21 +           3.3.1. Hop Count and the O(1)-Hop DHTs ....................21 +           3.3.2. Proximity and the O(log n)-Hop DHTs ................22 +      3.4. Multicasting ..............................................23 +           3.4.1. Multicasting vs. Broadcasting ......................23 +           3.4.2. Motivation for DHT-based Multicasting ..............23 +           3.4.3. Design Issues ......................................24 +      3.5. Routing Geometries ........................................25 +           3.5.1. Plaxton Trees (Pastry, Tapestry) ...................25 +           3.5.2. Rings (Chord, DKS) .................................27 +           3.5.3. Tori (CAN) .........................................28 +           3.5.4. Butterflies (Viceroy) ..............................29 +           3.5.5. de Bruijn (D2B, Koorde, Distance Halving, ODRI) ....30 +           3.5.6. Skip Graphs ........................................32 +   4. Semantic Index .................................................33 +      4.1. Keyword Lookup ............................................34 +           4.1.1. Gnutella Enhancements ..............................36 +           4.1.2. Partition-by-Document, Partition-by-Keyword ........38 +           4.1.3. Partial Search, Exhaustive Search ..................39 +      4.2. Information Retrieval .....................................39 +           4.2.1. Vector Model (PlanetP, FASD, eSearch) ..............41 +           4.2.2. Latent Semantic Indexing (pSearch) .................43 +           4.2.3. Small Worlds .......................................43 +   5. Queries ........................................................44 +      5.1. Range Queries .............................................45 +      5.2. Multi-Attribute Queries ...................................48 +      5.3. Join Queries ..............................................50 + + + +Risson & Moors               Informational                      [Page 2] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +      5.4. Aggregation Queries .......................................50 +   6. Security Considerations ........................................52 +   7. Conclusions ....................................................52 +   8. Acknowledgments ................................................53 +   9. References .....................................................54 +      9.1. Informative References ....................................54 + +1.  Introduction + +   Peer-to-peer (P2P) networks are those that exhibit three +   characteristics: self-organization, symmetric communication, and +   distributed control [1].  A self-organizing P2P network +   "automatically adapts to the arrival, departure and failure of nodes" +   [2].  Communication is symmetric in that peers act as both clients +   and servers.  It has no centralized directory or control point. +   USENET servers and BGP peers have these traits [3] but the emphasis +   here is on the flurry of research since 2000.  Leading examples +   include Gnutella [4], Freenet [5], Pastry [2], Tapestry [6], Chord +   [7], the Content Addressable Network (CAN) [8], pSearch [9], and +   Edutella [10].  Some have suggested that peers are inherently +   unreliable [11].  Others have assumed well-connected, stable peers +   [12]. + +   This critical survey of P2P academic literature is warranted, given +   the intensity of recent research.  At the time of writing, one +   research database lists over 5,800 P2P publications [13].  One vendor +   surveyed P2P products and deployments [14].  There is also a tutorial +   survey of leading P2P systems [15].  DePaoli and Mariani recently +   reviewed the dependability of some early P2P systems at a high level +   [16].  The need for a critical survey was flagged in the peer-to-peer +   research group of the Internet Research Task Force (IRTF) [17]. + +   P2P is potentially a disruptive technology with numerous +   applications, but this potential will not be realized unless it is +   demonstrated to be robust.  A massively distributed search technique +   may yield numerous practical benefits for applications [18].  A P2P +   system has potential to be more dependable than architectures relying +   on a small number of centralized servers.  It has potential to evolve +   better from small configurations -- the capital outlays for high +   performance servers can be reduced and spread over time if a P2P +   assembly of general purpose nodes is used.  A similar argument +   motivated the deployment of distributed databases -- one thousand, +   off-the-shelf PC processors are more powerful and much less expensive +   than a large mainframe computer [19].  Storage and processing can be +   aggregated to achieve massive scale.  Wasteful partitioning between +   servers or clusters can be avoided.  As Gedik and Liu put it, if P2P +   is to find its way into applications other than casual file sharing, +   then reliability needs to be addressed [20]. + + + +Risson & Moors               Informational                      [Page 3] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   The taxonomy of Figure 1 divides the entire body of P2P research +   literature along four lines: search, storage, security, and +   applications.  This survey concentrates on search aspects.  A P2P +   search network consists of an underlying index (Sections 2 to 4) and +   queries that propagate over that index (Section 5). + +   Search [18, 21-29] +      Semantic-Free Indexes [2, 6, 7, 30-52] +         Plaxton Trees +         Rings +         Tori +         Butterflies +         de Bruijn Graphs +         Skip Graphs +      Semantic Indexes [4, 53-71] +         Keyword Lookup +         Peer Information Retrieval +         Peer Data Management +      Queries [20, 22, 23, 25, 32, 38, 41, 56, 72-100] +         Range Queries +         Multi-Attribute Queries +         Join Queries +         Aggregation Queries +         Continuous Queries +         Recursive Queries +         Adaptive Queries + +   Storage +      Consistency & Replication [101-112] +         Eventual consistency +         Trade-offs +      Distribution [39, 42, 90, 92, 113-131] +         Epidemics, Bloom Filters +      Fault Tolerance [40, 105, 132-139] +         Erasure Coding +         Byzantine Agreement +      Locality [24, 43, 47, 140-160] +      Load Balancing [37, 86, 100, 107, 151, 161-171] + + + + + + + + + + + + + +Risson & Moors               Informational                      [Page 4] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   Security +      Character [172-182] +         Identity +         Reputation and Trust +         Incentives +      Goals [25, 27, 71, 183-197] +         Availability +         Authenticity +         Anonymity +         Access Control +         Fair Trading + +   Applications [1, 198-200] +      Memory [32, 90, 142, 201-222] +         File Systems +         Web +         Content Delivery Networks +         Directories +      Service Discovery +      Publish / Subscribe ... +   Intelligence [223-228] +      GRID +      Security... +   Communication [12, 92, 119, 229-247] +      Multicasting +      Streaming Media +      Mobility +      Sensors... + +            Figure 1: Classification of P2P Research Literature + +   This survey is concerned with two questions.  The first, "How do P2P +   search networks work?"  This foundation is important given the pace +   and breadth of P2P research in the last five years.  In Section 2, we +   classify indexes as local, centralized and distributed.  Since +   distributed indexes are becoming dominant, they are given closer +   attention in Sections 3 and 4.  Section 3 compares distributed P2P +   indexes for simple key lookup; in particular, their origins (Section +   3.1), dependability (Section 3.2), latency (Section 3.3), and their +   support for multicast (Section 3.4).  It classifies those indexes +   according to their routing geometry (Section 3.5) -- Plaxton trees, +   rings, tori, butterflies, de Bruijn graphs and skip graphs.  Section +   4 reviews distributed P2P indexes supporting keyword lookup (Section +   4.1) and information retrieval (Section 4.2).  Section 5 probes the +   embryonic research on P2P queries; in particular, range queries +   (Section 5.1), multi-attribute queries (Section 5.2), join queries +   (Section 5.3), and aggregation queries (Section 5.4). + + + + +Risson & Moors               Informational                      [Page 5] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   The second question, "How robust are P2P search networks?"  Insofar +   as it is available in the research literature, we tease out the +   robustness mechanisms and metrics throughout Sections 2 to 5. +   Unfortunately, robustness is often more sensitive to low-level design +   choices than it is to the broad P2P index structure, yet these +   underlying design choices are seldom isolated in the primary +   literature [248].  Furthermore, there has been little consensus on +   P2P robustness metrics (Section 3.2).  Section 8 gives +   recommendations to address these important gaps. + +1.1.  Related Disciplines + +   Peer-to-peer research draws upon numerous distributed systems +   disciplines.  Networking researchers will recognize familiar issues +   of naming, routing, and congestion control.  P2P designs need to +   address routing and security issues across network region boundaries +   [152].  Networking research has traditionally been host-centric.  The +   Web's Universal Resource Identifiers are naturally tied to specific +   hosts, making object mobility a challenge [216]. + +   P2P work is data-centric [249].  P2P systems for dynamic object +   location and routing have borrowed heavily from the distributed +   systems corpus.  Some have used replication, erasure codes, and +   Byzantine agreement [111].  Others have used epidemics for durable +   peer group communication [39]. + +   Similarly, P2P research is set to benefit from database research +   [250].  Database researchers will recognize the need to reapply +   Codd's principle of physical data independence, that is, to decouple +   data indexes from the applications that use the data [23].  It was +   the invention of appropriate indexing mechanisms and query +   optimizations that enabled data independence.  Database indexes like +   B+ trees have an analog in P2P's distributed hash tables (DHTs). +   Wide-area, P2P query optimization is a ripe, but challenging, area +   for innovation. + +   More flexible distribution of objects comes with increased security +   risks.  There are opportunities for security researchers to deliver +   new methods for availability, file authenticity, anonymity, and +   access control [25].  Proactive and reactive mechanisms are needed to +   deal with large numbers of autonomous, distributed peers.  To build +   robust systems from cooperating but self-interested peers, issues of +   identity, reputation, trust, and incentives need to be tackled. +   Although it is beyond the scope of this paper, robustness against +   malicious attacks also ought to be addressed [195]. + +   Possibly the largest portion of P2P research has majored on basic +   routing structures [18], where research on algorithms comes to the + + + +Risson & Moors               Informational                      [Page 6] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   fore.  Should the overlay be "structured" or "unstructured"?  Are the +   two approaches competing or complementary?  Comparisons of the +   "structured" approaches (hypercubes, rings, toroids, butterflies, de +   Bruijn, and skip graphs) have weighed the amount of routing state per +   peer and the number of links per peer against overlay hop counts. +   While "unstructured" overlays initially used blind flooding and +   random walks, overheads usually trigger some structure, for example, +   super-peers and clusters. + +   P2P applications rely on cooperation between these disciplines. +   Applications have included file sharing, directories, content +   delivery networks, email, distributed computation, publish-subscribe +   middleware, multicasting, and distributed authentication.  Which +   applications will be suited to which structures?  Are there adaptable +   mechanisms that can decouple applications from the underlying data +   structures?  What are the criteria for selection of applications +   amenable to a P2P design [1]? + +   Robustness is emphasized throughout the survey.  We are particularly +   interested in two aspects.  The first, dependability, was a leading +   design goal for the original Internet [251].  It deserves the same +   status in P2P.  The measures of dependability are well established: +   reliability, a measure of the mean-time-to-failure (MTTF); +   availability, a measure of both the MTTF and the mean-time-to-repair +   (MTTR); maintainability; and safety [252].  The second aspect is the +   ability to accommodate variation in outcome, which one could call +   adaptability.  Its measures have yet to be defined.  In the context +   of the Internet, it was only recently acknowledged as a first-class +   requirement [253].  In P2P, it means planning for the tussles over +   resources and identity.  It means handling different kinds of queries +   and accommodating changeable application requirements with minimal +   intervention.  It means "organic scaling" [22], whereby the system +   grows gracefully, without a priori data center costs or architectural +   breakpoints. + +   In the following section, we discuss one notable omission from the +   taxonomy of P2P networking in Figure 1 -- routing. + +1.2.  Structured and Unstructured Routing + +   P2P routing algorithms have been classified as "structured" or +   "unstructured".  Peers in unstructured overlay networks join by +   connecting to any existing peers [254].  In structured overlays, the +   identifier of the joining peer determines the set of peers that it +   connects to [254].  Early instantiations of Gnutella were +   unstructured -- keyword queries were flooded widely [255].  Napster +   [256] had decentralized content and a centralized index, so it only +   partially satisfies the distributed control criteria for P2P systems. + + + +Risson & Moors               Informational                      [Page 7] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   Early structured algorithms included Plaxton, Rajaraman and Richa +   (PRR) [30], Pastry [2], Tapestry [31], Chord [7], and the Content +   Addressable Network [8].  Mishchke and Stiller recently classified +   P2P systems by the presence or absence of structure in routing tables +   and network topology [257]. + +   Some have cast unstructured and structured algorithms as competing +   alternatives.  Unstructured approaches have been called "first +   generation", implicitly inferior to the "second generation" +   structured algorithms [2, 31].  When generic key lookups are +   required, these structured, key-based routing schemes can guarantee +   location of a target within a bounded number of hops [23].  The +   broadcasting unstructured approaches, however, may have large routing +   costs, or fail to find available content [22].  Despite the apparent +   advantages of structured P2P, several research groups are still +   pursuing unstructured P2P. + +   There have been two main criticisms of structured systems [61].  The +   first relates to peer transience, which in turn, affects robustness. +   Chawathe, et al. opined that highly transient peers are not well +   supported by DHTs [61].  P2P systems often exhibit "churn", with +   peers continually arriving and departing.  One objection to concerns +   about highly transient peers is that many applications use peers in +   well-connected parts of the network.  The Tapestry authors analyzed +   the impact of churn in a network of 1000 nodes [31].  Others opined +   that it is possible to maintain a robust DHT at relatively low cost +   [258].  Very few papers have quantitatively compared the resilience +   of structured systems.  Loguinov, Kumar, et al. claimed that there +   were only two such works [24, 36]. + +   The second criticism of structured systems is that they do not +   support keyword searches and complex queries as well as unstructured +   systems.  Given the current file-sharing deployments, keyword +   searches seem more important than exact-match key searches in the +   short term.  Paraphrased, "most queries are for hay, not needles" +   [61]. + +   More recently, some have justifiably seen unstructured and structured +   proposals as complementary, and have devised hybrid models [259]. +   Their starting point was the observation that unstructured flooding +   or random walks are inefficient for data that is not highly +   replicated across the P2P network.  Structured graphs can find keys +   efficiently, irrespective of replication.  Castro, et al. proposed +   Structella, a hybrid of Gnutella built on top of Pastry [259]. +   Another design used structured search for rare items and unstructured +   search for massively replicated items [54]. + + + + + +Risson & Moors               Informational                      [Page 8] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   However, the "structured versus unstructured routing" taxonomy is +   becoming less useful, for two reasons, Firstly, most "unstructured" +   proposals have evolved and incorporated structure.  Consider the +   classic "unstructured" system, Gnutella [4].  For scalability, its +   peers are either ultrapeers or leaf nodes.  This hierarchy is +   augmented with a query routing protocol whereby ultrapeers receive a +   hashed summary of the resource names available at leaf nodes. +   Between ultrapeers, simple query broadcast is still used, though +   methods to reduce the query load here have been considered [260]. +   Secondly, there are emerging schema-based P2P designs [59], with +   super-node hierarchies and structure within documents.  These are +   quite distinct from the structured DHT proposals. + +1.3.  Indexes and Queries + +   Given that most, if not all, P2P designs today assume some structure, +   a more instructive taxonomy would describe the structure.  In this +   survey, we use a database taxonomy in lieu of the networking +   taxonomy, as suggested by Hellerstein, Cooper, and Garcia-Molina [23, +   261].  The structure is determined by the type of index (Sections 2 , +   3, and 4).  Queries feature in lieu of routing (Section 5).  The DHT +   algorithms implement a "semantic-free index" [216].  They are +   oblivious of whether keys represent document titles, meta-data, or +   text.  Gnutella-like and schema-based proposals have a "semantic +   index". + +   Index engineering is at the heart of P2P search methods.  It captures +   a broad range of P2P issues, as demonstrated by the Search/Index +   Links model [261].  As Manber put it, "the most important of the +   tools for information retrieval is the index -- a collection of terms +   with pointers to places where information about documents can be +   found" [262].  Sen and Wang noted that a "P2P network" usually +   consists of connections between hosts for application-layer +   signaling, rather than for the data transfer itself [263]. +   Similarly, we concentrate on the "signaled" indexes and queries. + +   Our focus here is the dependability and adaptability of the search +   network.  Static dependability is a measure of how well queries route +   around failures in a network that is normally fault-free.  Dynamic +   dependability gives an indication of query success when nodes and +   data are continually joining and leaving the P2P system.  An +   adaptable index accommodates change in the data and query +   distribution.  It enables data independence, in that it facilitates +   changes to the data layout without requiring changes to the +   applications that use the data [23].  An adaptable P2P system can +   support rich queries for a wide range of applications.  Some +   applications benefit from simple, semantic-free key lookups [264]. +   Others require more complex, Structured Query Language (SQL)-like + + + +Risson & Moors               Informational                      [Page 9] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   queries to find documents with multiple keywords, or to aggregate or +   join query results from distributed relations [22]. + +2.  Index Types + +   A P2P index can be local, centralized, or distributed.  With a local +   index, a peer only keeps the references to its own data, and does not +   receive references for data at other nodes.  The very early Gnutella +   design epitomized the local index (Section 2.1).  In a centralized +   index, a single server keeps references to data on many peers.  The +   classic example is Napster (Section 2.2).  With distributed indexes, +   pointers towards the target reside at several nodes.  One very early +   example is Freenet (Section 2.3).  Distributed indexes are used in +   most P2P designs nowadays -- they dominate this survey. + +   P2P indexes can also be classified as non-forwarding and forwarding. +   When queries are guided by a non-forwarding index, they jump to the +   node containing the target data in a single hop.  There have been +   semantic and semantic-free one-hop schemes [138, 265, 266].  Where +   scalability to a massive number of peers is required, these schemes +   have been extended to two hops [267, 268].  More common are the +   forwarding P2Ps, where the number of hops varies with the total +   number of peers, often logarithmically.  The related trade-offs +   between routing state, lookup latency, update bandwidth, and peer +   churn are critical to total system dependability. + +2.1.  Local Index (Gnutella) + +   P2Ps with a purely local data index are becoming rare.  In such +   designs, peers flood queries widely and only index their own content. +   They enable rich queries - the search is not limited to a simple key +   lookup.  However, they also generate a large volume of query traffic +   with no guarantee that a match will be found, even if it does exist +   on the network.  For example, to find potential peers on the early +   instantiations of Gnutella, 'ping' messages were broadcast over the +   P2P network and the 'pong' responses were used to build the node +   index.  Then, small 'query' messages, each with a list of keywords, +   are broadcast to peers that respond with matching filenames [4]. + +   There have been numerous attempts to improve the scalability of +   local-index P2P networks.  Gnutella uses fixed time-to-live (TTL) +   rings, where the query's TTL is set less than 7-10 hops [4].  Small +   TTLs reduce the network traffic and the load on peers, but also +   reduce the chances of a successful query hit.  One paper reported, +   perhaps a little too bluntly, that the fixed "TTL-based mechanism +   does not work" [67].  To address this TTL selection problem, they +   proposed an expanding ring, known elsewhere as iterative deepening +   [29].  It uses successively larger TTL counters until there is a + + + +Risson & Moors               Informational                     [Page 10] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   match.  The flooding, ring, and expanding ring methods all increase +   network load with duplicated query messages.  A random walk, whereby +   an unduplicated query wanders about the network, does indeed reduce +   the network load but massively increases the search latency.  One +   solution is to replicate the query k times at each peer.  Called +   random k-walkers, this technique can be coupled with TTL limits, or +   periodic checks with the query originator, to cap the query load +   [67].  Adamic, Lukose, et al. suggested that the random walk searches +   be directed to nodes with a higher degree, that is, with larger +   numbers of inter-peer connections [269].  They assumed that higher- +   degree peers are also capable of higher query throughputs.  However, +   without some balancing design rule, such peers would be swamped with +   the entire P2P signaling traffic.  In addition to the above +   approaches, there is the 'directed breadth-first' algorithm [29].  It +   forwards queries within a subset of peers selected according to +   heuristics on previous performance, like the number of successful +   query results.  Another algorithm, called probabilistic flooding, has +   been modeled using percolation theory [270]. + +   Several measurement studies have investigated locally indexed P2Ps. +   Jovanovic noted Gnutella's power law behaviour [70].  Sen and Wang +   compared the performance of Gnutella, Fasttrack [271], and Direct +   Connect [263, 272, 273].  At the time, only Gnutella used local data +   indexes.  All three schemes now use distributed data indexes, with +   hierarchy in the form of Ultrapeers (Gnutella), Super-Nodes +   FastTrack), and Hubs (Direct Connect).  It was found that a very +   small percentage of peers have a very high degree and that the total +   system dependability is at the mercy of such peers.  While peer up- +   time and bandwidth were heavy-tailed, they did not fit well with the +   Zipf distribution.  Fortunately for Internet Service Providers, +   measures aggregated by IP prefix and Autonomous System (AS) were more +   stable than for individual IP addresses.  A study of University of +   Washington traffic found that Gnutella and Kazaa together contributed +   43% of the university's total TCP traffic [274].  They also reported +   a heavy-tailed distribution, with 600 external peers (out of 281,026) +   delivering 26% of Kazaa bytes to internal peers.  Furthermore, +   objects retrieved from the P2P network were typically three orders of +   magnitude larger than Web objects -- 300 objects contributed to +   almost half the total outbound Kazaa bandwidth.  Others reported +   Gnutella's topology mismatch, whereby only 2-5% of P2P connections +   link peers in the same Autonomous System (AS), despite over 40% of +   peers being in the top 10 ASs [65].  Together these studies +   underscore the significance of multimedia sharing applications.  They +   motivate interesting caching and locality solutions to the topology +   mismatch problem. + +   These same studies bear out one main dependability lesson: total +   system dependability may be sensitive to the dependability of high- + + + +Risson & Moors               Informational                     [Page 11] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   degree peers.  The designers of Scamp translated this observation to +   the design heuristic, "have the degree of each node be of nearly +   equal size" [153].  They analyzed a system of N peers, with mean +   degree c.log(n), where link failures occur independently with +   probability e.  If d>0 is fixed and c>(1+d)/(-log(e)), then the +   probability of graph disconnection goes to zero as N->infinity. +   Otherwise, if c<(1-d)/(-log(e)), then the probability of +   disconnection goes to one as N->infinity.  They presented a +   localizer, which finds approximate minima to a global function of +   peer degree and arbitrary link costs using only local information. +   The Scamp overlay construction algorithms could support any of the +   flooding and walking routing schemes above, or other epidemic and +   multicasting schemes for that matter.  Resilience to high churn rates +   was identified for future study. + +2.2.  Central Index (Napster) + +   Centralized schemes like Napster [256] are significant because they +   were the first to demonstrate the P2P scalability that comes from +   separating the data index from the data itself.  Ultimately, 36 +   million Napster users lost their service not because of technical +   failure, but because the single administration was vulnerable to the +   legal challenges of record companies [275]. + +   There has since been little research on P2P systems with central data +   indexes.  Such systems have also been called 'hybrid' since the index +   is centralized but the data is distributed.  Yang and Garcia-Molina +   devised a four-way classification of hybrid systems [276]: unchained +   servers, where users whose index is on one server do not see other +   servers' indexes; chained servers, where the server that receives a +   query forwards it to a list of servers if it does not own the index +   itself; full replication, where all centralized servers keep a +   complete index of all available metadata; and hashing, where keywords +   are hashed to the server where the associated inverted list is kept. +   The unchained architecture was used by Napster, but it has the +   disadvantage that users do not see all indexed data in the system. +   Strictly speaking, the other three options illustrate the distributed +   data index, not the central index.  The chained architecture was +   recommended as the optimum for the music-swapping application at the +   time.  The methods by which clients update the central index were +   classified as batch or incremental, with the optimum determined by +   the query-to-login ratio.  Measurements were derived from a clone of +   Napster called OpenNap[277].  Another study of live Napster data +   reported wide variation in the availability of peers, a general +   unwillingness to share files (20-40% of peers share few or no files), +   and a common understatement of available bandwidth so as to +   discourage other peers from sharing one's link [202]. + + + + +Risson & Moors               Informational                     [Page 12] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   Influenced by Napster's early demise, the P2P research community may +   have prematurely turned its back on centralized architectures. +   Chawathe, Ratnasamy, et al. opined that Google and Yahoo demonstrate +   the viability of a centralized index.  They argued that "the real +   barriers to Napster-like designs are not technical but legal and +   financial" [61].  Even this view may be a little too harsh on the +   centralized architectures -- it implies that they always have an up- +   front capital hurdle that is steeper than for distributed +   architectures.  The closer one looks at scalable 'centralized' +   architectures, the less the distinction with 'distributed' +   architectures seems to matter.  For example, it is clear that +   Google's designers consider Google a distributed, not centralized, +   file system [278].  Google demonstrates the scale and performance +   possible on commodity hardware, but still has a centralized master +   that is critical to the operation of each Google cluster.  Time may +   prove that the value of emerging P2P networks, regardless of the +   centralized-versus-distributed classification, is that they smooth +   the capital outlays and remove the single points of failure across +   the spectra of scale and geographic distribution. + +2.3.  Distributed Index (Freenet) + +   An important early P2P proposal for a distributed index was Freenet +   [5, 71, 279].  While its primary emphasis was the anonymity of peers, +   it did introduce a novel indexing scheme.  Files are identified by +   low-level "content-hash" keys and by "secure signed-subspace" keys, +   which ensure that only a file owner can write to a file while anyone +   can read from it.  To find a file, the requesting peer first checks +   its local table for the node with keys closest to the target.  When +   that node receives the query, it too checks for either a match or +   another node with keys close to the target.  Eventually, the query +   either finds the target or exceeds time-to-live (TTL) limits.  The +   query response traverses the successful query path in reverse, +   depositing a new routing table entry (the requested key and the data +   holder) at each peer.  The insert message similarly steps towards the +   target node, updating routing table entries as it goes, and finally +   stores the file there.  Whereas early versions of Gnutella used +   breadth-first flooding, Freenet uses a more economic depth-first +   search [280]. + +   An initial assessment has been done of Freenet's robustness.  It was +   shown that in a network of 1000 nodes, the median query path length +   stayed under 20 hops for a failure of 30% of nodes.  While the +   Freenet designers considered this as evidence that the system is +   "surprisingly robust against quite large failures" [71], the same +   datapoint may well be outside meaningful operating bounds.  How many +   applications are useful when the first quartile of queries have path +   lengths of several hundred hops in a network of only 1000 nodes, per + + + +Risson & Moors               Informational                     [Page 13] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   Figure 4 of [71]?  To date, there has been no analysis of Freenet's +   dynamic robustness.  For example, how does it perform when nodes are +   continually arriving and departing? + +   There have been both criticisms and extensions of the early Freenet +   work.  Gnutella proponents acknowledged the merit in Freenet's +   avoidance of query broadcasting [281].  However, they are critical on +   two counts: the exact file name is needed to construct a query; and +   exactly one match is returned for each query.  P2P designs using +   DHTs, per Section 3, share similar characteristics -- a precise query +   yields a precise response.  The similarity is not surprising since +   Freenet also uses a hash function to generate keys.  However, the +   query routing used in the DHTs has firmer theoretical foundations. +   Another difference with DHTs is that Freenet will take time, when a +   new node joins the network, to build an index that facilitates +   efficient query routing.  By the inventor's own admission, this is +   damaging for a user's first impressions [282].  It was proposed to +   download a copy of routing tables from seed nodes at startup, even +   though the new node might be far from the seed node.  Freenet's slow +   startup motivated Mache, Gilbert, et al. to amend the overlay after +   failed requests and to place additional index entries on successful +   requests -- they claim almost an order of magnitude reduction in +   average query path length [280].  Clarke also highlighted the lack of +   locality or bandwidth information available for efficient query +   routing decisions [282].  He proposed that each node gather response +   times, connection times, and proportion of successful requests for +   each entry in the query routing table.  When searching for a key that +   is not in its own routing table, it was proposed to estimate response +   times from the routing metrics for the nearest known keys and +   consequently choose the node that can retrieve the data fastest.  The +   response time heuristic assumed that nodes close in the key space +   have similar response times.  This assumption stemmed from early +   deployment observations that Freenet peers seemed to specialize in +   parts of the keyspace -- it has not been justified analytically. +   Kronfol drew attention to Freenet's inability to do keyword searches +   [283].  He suggested that peers cache lists of weighted keywords in +   order to route queries to documents, using Term Frequency Inverse +   Document Frequency (TFIDF) measures and inverted indexes (Section +   4.2.1).  With these methods, a peer can route queries for simple +   keyword lists or more complicated conjunctions and disjunctions of +   keywords.  Robustness analysis and simulation of Kronfol's proposal +   remain open. + +   The vast majority of P2P proposals in following sections rely on a +   distributed index. + + + + + + +Risson & Moors               Informational                     [Page 14] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +3.  Semantic Free Index + +   Many of today's distributed network indexes are semantic.  The +   semantic index is human-readable.  For example, it might associate +   information with other keywords, a document, a database key, or even +   an administrative domain.  It makes it easy to associate objects with +   particular network providers, companies, or organizations, as +   evidenced in the Domain Name System (DNS).  However, it can also +   trigger legal tussles and frustrate content replication and migration +   [216]. + +   Distributed Hash Tables (DHTs) have been proposed to provide +   semantic-free, data-centric references.  DHTs enable one to find an +   object's persistent key in a very large, changing set of hosts.  They +   are typically designed for [23]: + +   a) low degree.  If each node keeps routing information for only a +      small number of other nodes, the impact of high node arrival and +      departure rates is contained; + +   b) low hop count.  The hops and delay introduced by the extra +      indirection are minimized; + +   c) greedy routing.  Nodes independently calculate a short path to the +      target.  At each hop, the query moves closer to the target; and + +   d) robustness.  A path to the target can be found even when links or +      nodes fail. + +3.1.  Origins + +   To understand the origins of recent DHTs, one needs to look to three +   contributions from the 1990s.  The first two -- Plaxton, Rajaraman, +   and Richa (PRR) [30] and Consistent Hashing [49] -- were published +   within one month of each other.  The third, the Scalable Distributed +   Data Structure (SDDS) [52], was curiously ignored in significant +   structured P2P designs despite having some similar goals [2, 6, 7]. +   It has been briefly referenced in other P2P papers [46, 284-287]. + +3.1.1.  Plaxton, Rajaraman, and Richa (PRR) + +   PRR is the most recent of the three.  It influenced the designs of +   Pastry [2], Tapestry [6], and Chord [7].  The value of PRR is that it +   can locate objects using fixed-length routing tables [6].  Objects +   and nodes are assigned a semantic-free address, for example a 160-bit +   key.  Every node is effectively the root of a spanning tree.  A +   message routes toward an object by matching longer address suffixes, +   until it encounters either the object's root node or another node + + + +Risson & Moors               Informational                     [Page 15] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   with a 'nearby' copy.  It can route around link and node failure by +   matching nodes with a related suffix.  The scheme has several +   disadvantages [6]: global knowledge is needed to construct the +   overlay; an object's root node is a single point of failure; nodes +   cannot be inserted and deleted; and there is no mechanism for queries +   to avoid congestion hot spots. + +3.1.2.  Consistent Hashing + +   Consistent Hashing [288] strongly influenced the designs of Chord [7] +   and Koorde [37].  Karger, et al. introduced Consistent Hashing in the +   context of the Web-caching problem [49].  Web servers could +   conceivably use standard hashing to place objects across a network of +   caches.  Clients could use the approach to find the objects.  For +   normal hashing, most object references would be moved when caches are +   added or deleted.  On the other hand, Consistent Hashing is "smooth" +   -- when caches are added or deleted, the minimum number of object +   references move so as to maintain load balancing.  Consistent Hashing +   also ensures that the total number of caches responsible for a +   particular object is limited.  Whereas Litwin's Linear Hashing (LH*) +   scheme requires 'buckets' to be added one at a time in sequence [50], +   Consistent Hashing allows them to be added in any order [49].  There +   is an open Consistent Hashing problem pertaining to the fraction of +   items moved when a node is inserted [165].  Extended Consistent +   Hashing was recently proposed to randomize queries over the spread of +   caches to significantly reduce the load variance [289]. +   Interestingly, Karger [49] referred to an older DHT algorithm by +   Devine that used "a novel autonomous location discovery algorithm +   that learns the buckets' locations instead of using a centralized +   directory" [51]. + +3.1.3.  Scalable Distributed Data Structures (LH*) + +   In turn, Devine's primary point of reference was Litwin's work on +   SDDSs and the associated LH* algorithm [52].  An SDDS satisfies three +   design requirements: files grow to new servers only when existing +   servers are well loaded; there is no centralized directory; and the +   basic operations like insert, search, and split never require atomic +   updates to multiple clients.  Honicky and Miller suggested the first +   requirement could be considered a limitation since expansion to new +   servers is not under administrative control [286].  Litwin recently +   noted numerous similarities and differences between LH* and Chord +   [290].  He found that both implement key search.  Although LH* refers +   to clients and servers, nodes can operate as peers in both.  Chord +   'splits' nodes when a new node is inserted, while LH* schedules +   'splits' to avoid overload.  Chord requests travel O(log n) hops, +   while LH* client requests need, at most, two hops to find the target. +   Chord stores a small number of 'fingers' at each node.  LH* servers + + + +Risson & Moors               Informational                     [Page 16] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   store N/2 to N addresses while LH* clients store 1 to N addresses. +   This trade-off between hop count and the size of the index affects +   system robustness, and bears striking similarity to recent one- and +   two-hop P2P schemes in Section 2.  The arrival and departure of LH* +   clients does not disrupt LH* server metadata at all.  Given the size +   of the index, the arrival and departure of LH* servers are likely to +   cause more churn than that of Chord nodes.  Unlike Chord, LH* has a +   single point of failure, the split coordinator.  It can be +   replicated.  Alternatively, it can be removed in later LH* variants, +   though details have not been progressed for lack of practical need +   [290]. + +3.2.  Dependability + +   We make four overall observations about their dependability. +   Dependability metrics fall into two categories: static dependability, +   a measure of performance before recovery mechanisms take over; and +   dynamic dependability, for the most likely case in massive networks +   where there is continual failure and recovery ("churn"). + +3.2.1.  Static Dependability + +   Observation A: Static dependability comparisons show that no O(log n) +   DHT geometry is significantly more dependable than the other O(log n) +   geometries. + +   Gummadi, et al. compared the tree, hypercube, butterfly, ring, XOR, +   and hybrid geometries.  In such geometries, nodes generally know +   about O(log n) neighbors and route to a destination in O(log n) hops, +   where N is the number of nodes in the overlay.  Gummadi, et al. asked +   "Why not the ring?"  They concluded that only the ring and XOR +   geometries permit flexible choice of both neighbors and alternative +   routes [24].  Loguinov, et al. added the de Bruijn graph to their +   comparison [36].  They concluded that the classical analyses, for +   example the probability that a particular node becomes disconnected, +   yield no major differences between the resilience of Chord, CAN, and +   de Bruijn graphs.  Using bisection width (the minimum edge count +   between two equal partitions) and path overlap (the likelihood that +   backup paths will encounter the same failed nodes or links as the +   primary path), they argued for the superior resilience of the de +   Bruijn graph.  In short, ring, XOR, and de Bruijn graphs all permit +   flexible choice of alternative paths, but only in de Bruijn are the +   alternate paths independent of each other [36]. + + + + + + + + +Risson & Moors               Informational                     [Page 17] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +3.2.2.  Dynamic Dependability + +   Observation B: Dynamic dependability comparisons show that DHT +   dependability is sensitive to the underlying topology maintenance +   algorithms. + +   Li, et al. give the best comparison to date of several leading DHTs +   during churn [291].  They relate the disparate configuration +   parameters of Tapestry, Chord, Kademlia, Kelips, and OneHop to +   fundamental design choices.  For each of these DHTs, they plotted the +   optimal performance in terms of lookup latency (milliseconds) and +   fraction of failed lookups.  The results led to several important +   insights about the underlying algorithms, for example: increasing +   routing table size is more cost-effective than increasing the rate of +   periodic stabilization; learning about new nodes during the lookup +   process sometimes eliminates the need for stabilization; and parallel +   lookups reduce latency due to timeouts more effectively than faster +   stabilization.  Similarly, Zhuang, et al. compared keep-alive +   algorithms for DHT failure detection [292].  Such algorithmic +   comparisons can significantly improve the dependability of DHT +   designs. + +   In Figure 2, we propose a taxonomy for the topology maintenance +   algorithms that influence dependability.  The algorithms can be +   classified by how nodes join and leave, how they first detect +   failures, how they share information about topology updates, and how +   they react when they receive information about topology updates. + + + + + + + + + + + + + + + + + + + + + + + + +Risson & Moors               Informational                     [Page 18] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   Normal Updates +      Joins (passive; active) [293] +      Leaves (passive; active) [293] + +   Fault Detection [292] +      Maintenance +         Proactive (periodic or keep-alive probes) +         Reactive (correction-on-use, correction-on-failure) [294] +      Report +         Negative (all dead nodes, nodes recently failed) +         Positive (all live nodes; nodes recently recovered) [292] + +   Topology Sharing: yes/ no [292] +         Multicast Tree (explicit, implicit) [267, 295] +         Gossip (timeouts; number of contacts) [39] + +   Corrective Action +      Routing +         Rerouting actions +            (reroute once; route in parallel [291]; reject) +         Routing timeouts +            (TCP-style, virtual coordinates) [296] +      Topology +         Update action (evict/ replace/ tag node) +         Update timeliness (immediate, periodic[296], delayed [297]) + +        Figure 2: Topology Maintenance in Distributed Hash Tables + +3.2.3.  Ephemeral or Stable Nodes -- O(log n) or O(1) Hops + +   Observation C: Most DHTs use O(log n) geometries to suit ephemeral +   nodes.  The O(1) hop DHTs suit stable nodes and deserve more research +   attention. + +   Most of the DHTs in Section 3.5 assume that nodes are ephemeral, with +   expected lifetimes of one to two hours.  Therefore, they mostly use +   an O(log n) geometry.  The common assumption is that maintenance of +   full routing tables in the O(1) hop DHTs will consume excessive +   bandwidth when nodes are continually joining and leaving.  The +   corollary is that, when they run on stable infrastructure servers +   [298], most of the DHTs in Section 3.5 are less than optimal -- +   lookups take many more hops than necessary, wasting latency and +   bandwidth budgets.  The O(1) hop DHTs suit stable deployments and +   high lookup rates.  For a churning 1024-node network, Li, et al. +   concluded that OneHop is superior to Chord, Tapestry, Kademlia, and +   Kelips in terms of latency and lookup success rate [291].  For a +   3000-node network, they concluded that "OneHop is only preferable to +   Chord when the deployment scenario allows a communication cost + + + +Risson & Moors               Informational                     [Page 19] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   greater than 20 bytes per node per second" [291].  This apparent +   limitation needs to be put in context.  They assumed that each node +   issues only one lookup every 10 minutes and has a lifetime of only 60 +   minutes.  It seems reasonable to expect that in some deployments, +   nodes will have a lifetime of weeks or more, a maintenance bandwidth +   of tens of kilobits per second, and a load of hundreds of lookups per +   second.  O(1) hop DHTs are superior in such situations.  OneHop can +   scale at least to many tens of thousands of nodes [267].  The recent +   O(1) hop designs [267, 295] are vastly outnumbered by the O(log n) +   DHTs in Section 3.5.  Research on the algorithms of Figure 2 will +   also yield improvements in the dependability of the O(1) hop DHTs. + +3.2.4.  Simulation and Proof + +   Observation D: Although not yet a mature science, the study of DHT +   dependability is helped by recent simulation and formal development +   tools. + +   While there are recent reference architectures [294, 298], much of +   the DHT literature in Section 3.5 does not lend itself to repeatable, +   comparative studies.  The best comparative work to date [291] relies +   on the Peer-to-Peer Simulator (P2PSIM) [299].  At the time of +   writing, it supports more DHT geometries than any other simulator. +   As the study of DHTs matures, we can expect to see the simulation +   emphasis shift from geometric comparison to a comparison of the +   algorithms of Figure 2. + +   P2P correctness proofs generally rely on less-than-complete formal +   specifications of system invariants and events [7, 45, 300].  Li and +   Plaxton expressed concern that "when many joins and leaves happen +   concurrently, it is not clear whether the neighbor tables will remain +   in a 'good' state" [47].  While acknowledging that guaranteeing +   consistency in a failure-prone network is impossible, Lynch, Malkhi, +   et al. sketched amendments to the Chord algorithm to guarantee +   atomicity [301].  More recently, Gilbert, Lynch, et al. gave a new +   algorithm for atomic read/write memory in a churning distributed +   network, suggesting it to be a good match for P2P [302].  Lynch and +   Stoica show in an enhancement to Chord that lookups are provably +   correct when there is a limited rate of joins and failures [303]. +   Fault Tolerant Active Rings is a protocol for active joins and leaves +   that was formally specified and proven using B-method tools [304].  A +   good starting point for a formal DHT development would be the +   numerous informal API specifications [22, 305, 306].  Such work could +   be informed by other efforts to formally specify routing invariants +   [307, 308]. + + + + + + +Risson & Moors               Informational                     [Page 20] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +3.3.  Latency + +   The key metrics for DHT latency are: + +   1) Shortest-Path Distance and Diameter.  In graph theory, the +      shortest-path distance is the minimum number of edges in any path +      between two vertices of the graph.  Diameter is the largest of all +      shortest-path distances in a graph [309].  Networking synonyms for +      distance on a DHT are "hop count" and "lookup length". + +   2) Latency and Latency Stretch.  Two types of latency are relevant +      here -- network-layer latency and overlay latency.  Network-layer +      latency has been referred to as "proximity" or "locality" [24]. +      Stretch is the cost of an overlay path between two nodes, divided +      by the cost of the direct network path between those nodes [310]. +      Latency stretch is also known as the "relative delay penalty" +      [311]. + +3.3.1.  Hop Count and the O(1)-Hop DHTs + +   Hop count gives an approximate indication of path latency.  O(1)-hop +   DHTs have path latencies lower than the O(log n)-hop DHTs [291]. +   This significant advantage is often overlooked on account of concern +   about the messaging costs to maintain large routing tables (Section +   3.2.3).  Such concern is justified when the mean node lifetime is +   only a few hours and the mean lookup interval per node is more than a +   few seconds (the classic profile of a P2P file-sharing node). +   However, for a large, practical operating range (node lifetimes of +   days or more, lookup rates of over tens of lookups per second per +   node, up to ~100,000 nodes), the total messaging cost in O(1) hop +   DHTs is lower than in O(log n) DHTs [312].  Lookups and routing table +   maintenance contribute to the total messaging cost.  If a deployment +   fits this operating range, then O(1)-hop DHTs will give lower path +   latencies and lower total messaging costs.  An additional merit of +   the O(1)-hop DHTs is that they yield lower lookup failure rates than +   their O(log N)-hop counterparts [291]. + +   Low hop count can be achieved in two ways: each node has a large O(N) +   index of nodes; or the object references can be replicated on many +   nodes.  Beehive [313], Kelips [39], LAND [310], and Tulip [314] are +   examples of the latter category.  Beehive achieves O(1) hops on +   average and O(log n) hops in the worst case, by proactive replication +   of popular objects.  Kelips replicates the 'file index'.  It incurs +   O(sqrt(N)) storage costs for both the node index and the file index. +   LAND uses O(log n) reference pointers for each stored object and an +   O(log n) index to achieve a worst-case 1+e stretch, where 0<e.  The +   Kelips-like Tulip [314] requires 2 hops per lookup.  Each node + + + + +Risson & Moors               Informational                     [Page 21] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   maintains 2sqrt(N)log(N) links to other nodes and objects are +   replicated on O(sqrt(N)) nodes. + +   The DHTs with a large O(N) node index can be divided into two groups: +   those for which the index is always O(N); and those for which the +   index opportunistically ranges from O(log n) to O(N).  Linear Hashing +   (LH*) servers [52], OneHop [267], and 1h-Calot [295] fall into the +   former category.  EpiChord [315] and Accordion [316] are examples of +   the latter. + +3.3.2.  Proximity and the O(log n)-Hop DHTs + +   If one chooses not to use single-hop DHTs, hop count is a weak +   indicator of end-to-end path latency.  Some hops may incur large +   delays because of intercontinental or satellite links.  Consequently, +   numerous DHT designs minimize path latency by considering the +   proximity of nodes.  Gummadi, et al. classified the proximity methods +   as follows [24]: + +   1) Proximity Neighbor Selection (PNS).  The nodes in the routing +      table are chosen based on the latency of the direct hop to those +      nodes.  The latency may be explicitly measured [317], or it may be +      estimated using one of several synthetic coordinate systems [150, +      154, 318].  As a lower bound on PNS performance, Dabek, et al. +      showed that lookups on O(log n) DHTs take at least 1.5 times the +      average roundtrip time of the underlying network [154]. + +   2) Proximity Route Selection (PRS).  At lookup time, the choice of +      the next-hop node relies on the latency of the direct hop to that +      node.  PRS is less effective than PNS, though it may complement it +      [24].  Some of the routing geometries in Section 3.5 do not +      support PNS and/or PRS [24]. + +   3) Proximity Identifier Selection (PIS).  Node identifiers indicate +      geographic position.  PIS frustrates load balancing, increases the +      risk of correlated failures, and is not often used [24]. + +   The proximity study by Gummadi, et al. assumed recursive routing, +   though they suggested that PNS would also be superior to PRS with +   iterative routing [24].  Dabek, et al. found that recursive lookups +   take 0.6 times as long as iterative lookups [150]. + +   Beyond the explicit use of proximity information, redundancy can help +   to avoid slow paths and servers.  One may increase the number of +   replicas [150], use parallel lookups [291, 316], use alternate routes +   on failure [150], or use multiple gateway nodes to enter the DHT +   [317]. + + + + +Risson & Moors               Informational                     [Page 22] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +3.4.  Multicasting + +3.4.1.  Multicasting vs. Broadcasting + +   "Multicasting" here means sending a message to a subset of an +   overlay's nodes.  Nodes explicitly join and leave this subset, called +   a "multicast group".  "Broadcasting" here is a special case of +   multicasting in which a message is sent to all nodes in the overlay. +   Broadcasting relies on overlay membership messages -- it does not +   need extra group membership messaging.  Castro, et al. said +   multicasting on structured overlays is either "flooding" (one overlay +   per group) or "tree-based" (one tree per group) [319].  These are +   synonyms for broadcasting and multicasting respectively. + +   The first DHT-based designs for multicasting were CAN multicast +   [320], Scribe [241], Bayeux [242], and i3 [231].  They were based on +   CAN [8], Pastry [2], Tapestry [31], and Chord [7] respectively.  El- +   Ansary, et al. devised the first DHT-based broadcasting scheme [321]. +   It was based on Chord. + +   Multicast trees can be constructed using reverse-path forwarding or +   forward-path forwarding.  Scribe uses reverse-path forwarding [241]. +   Bayeux uses forward-path forwarding [242].  Borg, a multicast design +   based on Pastry, uses a combination of forward-path and reverse-path +   forwarding to minimize latency [237]. + +3.4.2.  Motivation for DHT-based Multicasting + +   Multicasting complements DHT search capability.  DHTs naturally +   support exact match queries.  With multicasting, they can support +   more complex queries.  Multicasting also enables the dissemination +   and collection of global information. + +   Consider, for example, aggregation queries like minimum, maximum, +   count, sum, and average (Section 5.4).  A node at the root of a +   dissemination tree might multicast such a query [322].  The leaf +   nodes return local results towards the root node.  Successive parents +   aggregate the result so that eventually the root node can compute the +   global result.  Such queries may help to monitor the capacity and +   health of the overlay itself. + +   Why bother with structured overlays for multicasting?  In Section +   2.1, we saw that Gnutella can multicast complex queries without them +   [4].  Castro, et al. posed the question, "Should we build Gnutella on +   a structured overlay?" [259].  While acknowledging that their study +   was preliminary, they did conclude that "we see no reason to build +   Gnutella on top of an unstructured overlay" [259].  The supposedly +   high maintenance costs of structured overlays were outweighed by + + + +Risson & Moors               Informational                     [Page 23] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   query cost savings.  The structured overlay ensured that nodes were +   only visited once during a complex query.  It also helped to +   accurately limit the total number of nodes visited.  Pai, et al. +   acknowledged that multicast trees based on structured overlays +   contribute to simple routing rules, low delay and low delay variation +   [323].  However, they opted for unstructured, gossip-based +   multicasting for reliability reasons: data loss near the tree root +   affects all subtended nodes; interior node failures must be repaired +   quickly; interior nodes are obliged to disseminate more than their +   fair share of traffic, giving leaf nodes a "free ride".  The most +   promising research direction is to improve on the Bimodal +   Multicasting approach [324].  It combines the bandwidth efficiency +   and low latency of structured, best-effort multicasting trees with +   the reliability of unstructured gossip protocols. + +3.4.3.  Design Issues + +   None of the early structured overlay multicast designs addressed all +   of the following issues [325]: + +   1) Heterogeneous Node Capacity.  Nodes differ in their processing, +      memory, and network capacity.  Multicast throughput is largely +      determined by the node with smallest throughput [325].  To limit +      the multicasting load on a node, one might cap its out-degree.  If +      the same node receives further join requests, it refers them to +      its children ("pushdown") [240].  Bharambe, et al. explored +      several pushdown strategies but found them inadequate to deal with +      heterogeneity [326].  They concluded that the heterogeneity issue +      remains open, and should be addressed before deploying DHTs for +      high-bandwidth multicasting applications.  Independently, Zhang et +      al. partially tackled heterogeneity by allowing nodes in their +      CAM-Chord and CAM-Koorde designs to vary out-degree according to +      the node's capacity [325].  However, they made no mention of the +      "pushdown" issue -- they did not describe topology maintenance +      when the out-degree limit is reached. + +   2) Reliability (Dynamic Membership).  If a multicast tree is to be +      resilient, it must survive dynamic membership.  There are several +      ways to deal with dynamic membership: ensure that the root node of +      the multicasting tree does not handle all requests to join or +      leave the multicast group [242]; use multiple interior-node- +      disjoint trees to avoid single points of failure in tree +      structures [322]; and split the root node into several replicas +      and partition members across them [241].  For example, Bayeux +      requires the root node to track all group membership changes +      whereas Scribe does not [241].  CAN-multicast uses a single, +      well-known host to bootstrap the join operations [320].  The +      earliest DHT-based broadcasting work by El-Ansary, et al. did not + + + +Risson & Moors               Informational                     [Page 24] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +      address the issue of dynamic membership [321].  Ghodsi, et al. +      addressed it in a subsequent paper, though, giving two broadcast +      algorithms that accommodate routing table inconsistencies [327]. +      One algorithm achieves a more optimal multicasting network at the +      expense of greater correction overhead.  Splitstream, based on +      Scribe and Pastry, redundantly striped content across multiple +      interior-node-disjoint multicast trees -- if one interior node +      fails, then only one stripe is lost [240]. + +   3) Large Any-Source Multicast Groups.  Any group member should be +      allowed to send multicast messages.  The group should scale to a +      very large number of hosts.  CAN-based multicast was the first +      application-level multicast scheme to scale to groups of several +      thousands of nodes without restricting the service model to a +      single source [320].  Bayeux scales to large groups but has a +      single root node for each multicast group.  It supports the any- +      source model only by having the root node operate as a reflector +      for multiple senders [242]. + +3.5.  Routing Geometries + +   In Sections 3.5.1 to 3.5.6, we introduce the main geometries for +   simple key lookup and survey their robustness mechanisms. + +3.5.1.  Plaxton Trees (Pastry, Tapestry) + +   Work began in March 2000 on a structured, fault-tolerant, wide-area +   Dynamic Object Location and Routing (DOLR) system called Tapestry [6, +   155].  While DHTs fix replica locations, a DOLR API enables +   applications to control object placement [31].  Tapestry's basic +   location and routing scheme follows Plaxton, Rajaraman, and Richa +   (PRR) [30], but it remedies PRR's robustness shortcomings described +   in Section 3.1.  Whereas each object has one root node in PRR, +   Tapestry uses several to avoid a single point of failure.  Unlike +   PRR, it allows nodes to be inserted and deleted.  Whereas PRR +   required a total ordering of nodes, Tapestry uses 'surrogate routing' +   to incrementally choose root nodes.  The PRR algorithm does not +   address congestion, but Tapestry can put object copies close to nodes +   generating high query loads.  PRR nodes only know of the nearest +   replica, whereas Tapestry nodes enable selection from a set of +   replicas (for example, to retrieve the most up to date).  To detect +   routing faults, Tapestry uses TCP timeouts and UDP heartbeats for +   detection, sequential secondary neighbours for rerouting, and a +   'second chance' window so that recovery can occur without the +   overhead of a full node insertion.  Tapestry's dependability has been +   measured on a testbed of about 100 machines and on simulations of + + + + + +Risson & Moors               Informational                     [Page 25] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   about 1000 nodes.  Successful routing rates and maintenance +   bandwidths were measured during instantaneous failures and ongoing +   churn [31]. + +   Pastry, like Tapestry, uses Plaxton-like prefix routing [2].  As in +   Tapestry, Pastry nodes maintain O(log n) neighbours and route to a +   target in O(log n) hops.  Pastry differs from Tapestry only in the +   method by which it handles network locality and replication [2]. +   Each Pastry node maintains a 'leaf set' and a 'routing table'.  The +   leaf set contains l/2 node IDs on either side of the local node ID in +   the node ID space.  The routing table, in row r, column c, points to +   the node ID with the same r-digit prefix as the local node, but with +   an r+1 digit of c.  A Pastry node periodically probes leaf set and +   routing table nodes, with periodicity of Tls and Trt and a timeout +   Tout.  Mahajan, Castry, et al. analyzed the reliability versus +   maintenance cost trade-offs in terms of the parameters l, Tls, Trt, +   and Tout [328].  They concluded that earlier concerns about excessive +   maintenance cost in a churning P2P network were unfounded, but +   suggested follow-up work for a wider range of reliability targets, +   maintenance costs, and probe periods.  Rhea Geels, et al. concluded +   that existing DHTs fail at high churn rates [329].  Building on a +   Pastry implementation from Rice University, they found that most +   lookups fail to complete when there is excessive churn.  They +   conjectured that short-lived nodes often leave the network with +   lookups that have not yet timed out, but no evidence was provided to +   confirm the theory.  They identified three design issues that affect +   DHT performance under churn: reactive versus periodic recovery of +   peers; lookup timeouts; and choice of nearby neighbours.  Since +   reactive recovery was found to add traffic to already congested +   links, the authors used periodic recovery in their design.  For +   lookup timeouts, they advocated an exponentially weighted moving +   average of each neighbour's response time, over alternative fixed +   timeout or 'virtual coordinate' schemes.  For selection of nearby +   neighbours, they found that 'global sampling' was more effective than +   simply sampling a 'neighbour's neighbours' or 'inverse neighbours'. +   Castro, Costa, et al. have refuted the suggestion that DHTs cannot +   cope with high churn rates [330].  By implementing methods for +   continuous detection and repair, their MSPastry implementation +   achieved shorter routing paths and a maintenance overhead of less +   than half a message per second per node. + +   There have been more recent proposals based on these early Plaxton- +   like schemes.  Kademlia uses a bit-wise exclusive or (XOR) metric for +   the 'distance' between 160-bit node identifiers [45].  Each node +   keeps a list of contact nodes for each section of the node space that +   is between 2^i and 2^(i+1) from itself (0.i<160).  Longer-lived nodes +   are deliberately given preference on this list -- it has been found +   in Gnutella that the longer a node has been active, the more likely + + + +Risson & Moors               Informational                     [Page 26] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   it is to remain active.  Like Kademlia, Willow uses the XOR metric +   [32].  It implements a Tree Maintenance Protocol to 'zipper' together +   broken segments of a tree.  Where other schemes use DHT routing to +   inefficiently add new peers, Willow can merge disjoint or broken +   trees in O(log n) parallel operations. + +3.5.2.  Rings (Chord, DKS) + +   Chord is the prototypical DHT ring, so we first sketch its operation. +   Chord maps nodes and keys to an identifier ring [7, 34].  Chord +   supports one main operation: find a node with the given key.  It uses +   Consistent Hashing (Section 3.1) to minimize disruption of keys when +   nodes join and leave the network.  However, Chord peers need only +   track O(log n) other peers, not all peers as in the original +   consistent hashing proposal [49].  It enables concurrent node +   insertions and deletions, improving on PRR.  Compared to Pastry, it +   has a simpler join protocol.  Each Chord peer tracks its predecessor, +   a list of successors, and a finger table.  Using the finger table, +   each hop is at least half the remaining distance around the ring to +   the target node, giving an average lookup hop count of (1/2)log +   n(base 2).  Each Chord node runs a periodic stabilization routine +   that updates predecessor and successor pointers to cater to newly +   added nodes.  All successors of a given node need to fail for the +   ring to fail.  Although a node departure could be treated the same as +   a failure, a departing Chord node first notifies the predecessor and +   successors, so as to improve performance. + +   In their definitive paper, Chord's inventors critiqued its +   dependability under churn [34].  They provided proofs on the +   behaviour of the Chord network when nodes in a stable network fail, +   stressing that such proofs are inadequate in the general case of a +   perpetually churning network.  An earlier paper had posed the +   question, "For lookups to be successful during churn, how regularly +   do the Chord stabilization routines need to run?" [331].  Stoica, +   Morris, et al. modeled a range of node join/departure rates and +   stabilization periods for a Chord network of 1000 nodes.  They +   measured the number of timeouts (caused by a finger pointing to a +   departed node) and lookup failures (caused by nodes that temporarily +   point to the wrong successor during churn).  They also modeled the +   'lookup stretch', the ratio of the Chord lookup time to optimal +   lookup time on the underlying network.  They demonstrated the latency +   advantage of recursive lookups over iterative lookups, but there +   remains room for delay reduction.  For further work, the authors +   proposed to improve resilience to network partitions, using a small +   set of known nodes or 'remembered' random nodes.  To reduce the +   number of messages per lookup, they suggested an increase in the size +   of each step around the ring, accomplished via a larger number of +   fingers at each node.  Much of the paper assumed independent, equally + + + +Risson & Moors               Informational                     [Page 27] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   likely node failures.  Analysis of correlated node failures, caused +   by massive site or backbone failures, will be more important in some +   deployments.  The paper did not attempt to recommend a fixed optimal +   stabilization rate.  Liben-Nowell, Balakrishnan, et al. had suggested +   that optimum stabilization rate might evolve according to +   measurements of peers' behaviour [331] -- such a mechanism has yet to +   be devised. + +   Alima, El-Ansary, et al. considered the communication costs of +   Chord's stabilization routines, referred to as 'active correction', +   to be excessive [332].  Two other robustness issues also motivated +   their Distributed K-ary Search (DKS) design, which is similar to +   Chord.  Firstly, the total system should evolve for an optimum +   balance between the number of peers, the lookup hop count, and the +   size of the routing table.  Secondly, lookups should be reliable -- +   P2P algorithms should be able to guarantee a successful lookup for +   key/value pairs that have been inserted into the system.  A similar +   lookup-correctness issue was raised elsewhere by one of Chord's +   authors; "Is it possible to augment the data structure to work even +   when nodes (and their associated finger lists) just disappear?" [333] +   Alima, El-Ansary, et al. asserted that P2Ps using active correction, +   like Chord, Pastry, and Tapestry, are unable to give such a +   guarantee.  They propose an alternate 'correction-on-use' scheme, +   whereby expired routing entries are corrected by information +   piggybacking lookups and insertions.  A prerequisite is that lookup +   and insertion rates are significantly higher than node arrival, +   departure, and failure rates.  Correct lookups are guaranteed in the +   presence of simultaneous node arrivals or up to f concurrent node +   departures, where f is configurable. + +3.5.3.  Tori (CAN) + +   Ratnasamy, Francis, et al. developed the Content-Addressable Network +   (CAN), another early DHT widely referenced alongside Tapestry, +   Pastry, and Chord [8, 334].  It is arranged as a virtual +   d-dimensional Cartesian coordinate space on a d-torus.  Each node is +   responsible for a zone in this coordinate space.  The designers used +   a heuristic thought to be important for large, churning P2P networks: +   keep the number of neighbours independent of system size. +   Consequently, its design differs significantly from Pastry, Tapestry, +   and Chord.  Whereas they have O(log n) neighbours per node and O(log +   n) hops per lookup, CAN has O(d) neighbours and O(dn^(1/d)) hop +   count.  When CAN's system-wide parameter d is set to log(n), CAN +   converges to their profile.  If the number of nodes grows, a major +   rearrangement of the CAN network may be required [151].  The CAN +   designers considered building on PRR, but opted for the simple, low- +   state-per-node CAN algorithm instead.  They had reasoned that a PRR- +   based design would not perform well under churn, given node + + + +Risson & Moors               Informational                     [Page 28] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   departures and arrivals would affect a logarithmic number of nodes +   [8]. + +   There have been preliminary assessments of CAN's resilience.  When a +   node leaves the CAN in an orderly fashion, it passes its own Virtual +   ID (VID), its neighbours' VIDs and IP addresses, and its key/value +   pairs to a takeover node.  If a node leaves abruptly, its neighbours +   send recovery messages towards the designated takeover node.  CAN +   ensures the recovery messages reach the takeover node, even if nodes +   die simultaneously, by maintaining a VID chain with Chord's +   stabilization algorithm.  Some initial 'proof of concept' resilience +   simulations were run using the Network Simulator (NS) [335] for up to +   a few hundred nodes.  Average hop counts and lookup failure +   probabilities were plotted against the total number of nodes for +   various node failure rates [8].  The CAN team documented several open +   research questions pertaining to state/hop count trade-offs, +   resilience, load, locality, and heterogeneous peers [44, 334]. + +3.5.4.  Butterflies (Viceroy) + +   Viceroy approximates a butterfly network [46].  It generally has +   constant degree like CAN.  Like Chord, Tapestry, and Pastry, it has +   logarithmic diameter.  It improves on these systems, inasmuch as its +   diameter is better than CAN and its degree is better than Chord, +   Tapestry, and Pastry.  As with most DHTs, it utilizes Consistent +   Hashing.  When a peer joins the Viceroy network, it takes a random +   but permanent 'identity' and selects its 'level' within the network. +   Each peer maintains general ring pointers ('predecessor' and +   'successor'), level ring pointers ('nextonlevel' and 'prevonlevel'), +   and butterfly pointers ('left', 'right', and 'up').  When a peer +   departs, it normally passes its key pairs to a successor, and +   notifies other peers to find a replacement peer. + +   The Viceroy paper scoped out the issue of robustness.  It explicitly +   assumed that peers do not fail [46].  It assumed that join and leave +   operations do not overlap, so as to avoid the complication of +   concurrency mechanisms like locking.  Kaashoek and Karger were +   somewhat critical of Viceroy's complexity [37].  They also pointed to +   its fault-tolerance blind spot.  Li and Plaxton suggested that such +   constant-degree algorithms deserve further consideration [47].  They +   offered several pros and cons.  The limited degree may increase the +   risk of a network partition, or inhibit use of local neighbours (for +   the simple reason that there are less of them).  On the other hand, +   it may be easier to reason about the correctness of fixed-degree +   networks.  One of the Viceroy authors has since proposed constant- +   degree peers in a two-tier, locality-aware DHT [310] -- the lower +   degree maintained by each lower-tier peer purportedly improves +   network adaptability.  Another Viceroy author has since explored an + + + +Risson & Moors               Informational                     [Page 29] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   alternative bounded-degree graph for P2P, namely the de Bruijn graph +   [336]. + +3.5.5.  de Bruijn (D2B, Koorde, Distance Halving, ODRI) + +   De Bruijn graphs have had numerous refinements since their inception +   [337, 338].  Schlumberger was the first to use them for networking +   [339].  Two research teams independently devised the 'generalized' de +   Bruijn graph that accommodates a flexible number of nodes in the +   system [340, 341].  Rowley and Bose studied fault-tolerant rings +   overlaid on the de Bruijn graph [342].  Lee, Liu, et al. devised a +   two-level de Bruijn hierarchy, whereby clusters of local nodes are +   interconnected by a second-tier ring [343]. + +   Many of the algorithms discussed previously are 'greedy' in that each +   time a query is forwarded, it moves closer to the destination. +   Unfortunately, greedy algorithms are generally suboptimal -- for a +   given degree, the routing distance is longer than necessary [344]. +   Unlike these earlier P2P designs, de Bruijn graphs of degree k +   achieve an asymptotically optimal diameter log n, where n is the +   number of nodes in the system and k can be varied to improve +   resilience.  If there are O(log n) neighbours per node, the de Bruijn +   hop count is O(log n/log log n).  To illustrate de Bruijn's practical +   advantage, consider a network with one million nodes of degree 20: +   Chord has a diameter of 20, while de Bruijn has a diameter of 5 [36]. +   In 2003, there were a quick succession of de Bruijn proposals -- D2B +   [345], Koorde [37], Distance Halving [132, 336], and the Optimal +   Diameter Routing Infrastructure (ODRI) [36]. + +   Fraigniaud and Gauron began the D2B design by laying out an informal +   problem statement: keys should be evenly distributed; lookup latency +   should be small; traffic load should be evenly distributed; updates +   of routing tables and redistribution of keys should be fast when +   nodes join or leave the network.  They defined a node's "congestion" +   to be the probability that a lookup will traverse it.  Apart from its +   optimal de Bruijn diameter, they highlighted D2B's merits: a constant +   expected update time when nodes join and leave (O(log n) with high +   probability (w.h.p.)); the expected node congestion is O((log n)/n) +   (O(((log n)^2)/n) w.h.p.) [345].  D2B's resilience was discussed only +   in passing. + +   Koorde extends Chord to attain the optimal de Bruijn degree/diameter +   trade-off above [37].  Unlike D2B, Koorde does not constrain the +   selection of node identifiers.  Also unlike D2B, it caters to +   concurrent joins, by extension of Chord's functionality.  Kaashoek +   and Karger investigated Koorde's resilience to a rather harsh failure +   scenario: "in order for a network to stay connected when all nodes +   fail with probability of 1/2, some nodes must have degree + + + +Risson & Moors               Informational                     [Page 30] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   omega(log n)" [37].  They sketched a mechanism to increase Koorde's +   degree for this more stringent fault tolerance, losing de Bruijn's +   constant degree advantage.  Similarly, to achieve a constant-factor +   load balance, Koorde would have to sacrifice its degree optimality. +   They suggested that the ability to trade the degree, and hence the +   maintenance overhead, against the expected hop count may be important +   for churning systems.  They also identified an open problem: find a +   load-balanced, degree optimal DHT.  Datta, Girdzijauskas, et al. +   showed that for arbitrary key distributions, de Bruijn graphs fail to +   meet the dual goals of load balancing and search efficiency [346]. +   They posed the question, "(Is there) a constant routing table sized +   DHT which meets the conflicting goals of storage load balancing and +   search efficiency for an arbitrary and changing key distribution?" + +   Distance Halving was also inspired by de Bruijn [336] and shares its +   optimal diameter.  Naor and Wieder argued for a two-step +   "continuous-discrete" approach for its design.  The correctness of +   its algorithms is proven in a continuous setting.  The algorithms are +   then mapped to a discrete space.  The source x and target y are +   points on the continuous interval [0,1).  Data items are hashed to +   this same interval.  <str> is a string that determines how messages +   leave any point on the ring: if bit t of the string is 0, the left +   leg is taken; if it is 1, the right leg is taken.  <str> increases by +   one bit each hop, giving a sequence by which to step around the ring. +   A lookup has two phases.  In the first, the lookup message containing +   the source, target, and the random string hops toward the midpoint of +   the source and target.  On each hop, the distance between <str>(x) +   and <str>(y) is halved, by virtue of the specific 'left' and 'right' +   functions.  In the second phase, the message steps 'backward' from +   the midpoint to the target, removing the last bit in <str> at each +   hop. 'Join' and 'leave' algorithms were outlined but there was no +   consideration of recovery times or message load on churn.  Using the +   Distance Halving properties, the authors devised a caching scheme to +   relieve congestion in a large P2P network.  They have also modified +   the algorithm to be more robust in the presence of random faults +   [132]. + +   Solid comparisons of DHT resilience are scarce, but Loguinov, Kumar, +   et al. give just that in their ODRI paper [36].  They compare Chord, +   CAN, and de Bruijn in terms of routing performance, graph expansion +   and clustering.  At the outset, they give the optimal diameter (the +   maximum hop count between any two nodes in the graph) and average hop +   count for graphs of fixed degree.  De Bruijn graphs converge to both +   optima, and outperform Chord and CAN on both counts.  These optima +   impact both delay and aggregate lookup load.  They present two +   clustering measures (edge expansion and node expansion), which are +   interesting for resilience.  Unfortunately, after decades of de +   Bruijn research, they have no exact solution.  De Bruijn was shown to + + + +Risson & Moors               Informational                     [Page 31] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   be superior in terms of path overlap - "de Bruijn automatically +   selects backup paths that do not overlap with the best shortest path +   or with each other" [36]. + +3.5.6.  Skip Graphs + +   Skip Graphs have been pursued by two research camps [38, 41].  They +   augment the earlier Skip Lists [347, 348].  Unlike earlier balanced +   trees, the Skip List is probabilistic -- its insert and delete +   operations do not require tree rearrangements and so are faster by a +   constant factor.  The Skip List consists of layers of ordered linked +   lists.  All nodes participate in the bottom layer 0 list.  Some of +   these nodes participate in the layer 1 list with some fixed +   probability.  A subset of layer 1 nodes participate in the layer 2 +   list, and so on.  A lookup can proceed quickly through the list by +   traversing the sparse upper layers until it is close to, or at, the +   target.  Unfortunately, nodes in the upper layers of a Skip List are +   potential hot spots and single points of failure.  Unlike Skip Lists, +   Skip Graphs provide multiple lists at each level for redundancy, and +   every node participates in one of the lists at each level. + +   Each node in a Skip Graph has theta(log n) neighbours on average, +   like some of the preceding DHTs.  The Skip Graph's primary edge over +   the DHTs is its support for prefix and proximity search.  DHTs hash +   objects to a random point in the graph.  Consequently, they give no +   guarantees over where the data is stored.  Nor do they guarantee that +   the path to the data will stay within the one administration as far +   as possible [38].  Skip graphs, on the other hand, provide for +   location-sensitive name searches.  For example, to find the document +   docname on the node user.company.com, the Skip Graph might step +   through its ordered lists for the prefix com.company.user [38]. +   Alternatively, to find an object with a numeric identifier, an +   algorithm might search the lowest layer of the Skip Graph for the +   first digit, the next layer for the next digit, in the same vein +   until all digits are resolved.  Being ordered, Skip Graphs also +   facilitate range searches.  In each of these examples, the Skip Graph +   can be arranged such that the path to the target, as far as possible, +   stays within an administrative boundary.  If one administration is +   detached from the rest of the Skip Graph, routing can continue within +   each of the partitions.  Mechanisms have been devised to merge +   disconnected segments [157], though at this stage, segments are re- +   merged one at a time.  A parallel merge algorithm has been flagged +   for future work. + +   The advantages of Skip Graphs come at a cost.  To be able to provide +   range queries and data placement flexibility, Skip Graph nodes +   require many more pointers than their DHT counterparts.  An increased +   number of pointers implies increased maintenance traffic.  Another + + + +Risson & Moors               Informational                     [Page 32] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   shortcoming of at least one of the early proposals was that no +   algorithm was given to assign keys to machines.  Consequently, there +   are no guarantees on system-wide load balancing or on the distance +   between adjacent keys [100].  Aspnes, Kirsch, et al. have recently +   devised a scheme to reduce the inter-machine pointer count from +   O(mlogm), where m is the number of data elements, to O(nlog n), where +   n is the number of nodes [100].  They proposed a two-layer scheme -- +   one layer for the Skip Graph itself and the second 'bucket layer'. +   Each machine is responsible for a number of buckets and each bucket +   elects a representative key.  Nodes locally adjust their load.  They +   accept additional keys if they are below their threshold or disperse +   keys to nearby nodes if they are above threshold.  There appear to be +   numerous open issues: simulations have been done but analysis is +   outstanding; mechanisms are required to handle the arrival and +   departure of nodes; there were only brief hints as to how to handle +   nodes with different capacities. + +4.  Semantic Index + +   Semantic indexes capture object relationships.  While the semantic- +   free methods (DHTs) have firmer theoretic foundations and guarantee +   that a key can be found if it exists, they do not capture the +   relationships between the document name and its content or metadata +   on their own.  Semantic P2P designs do.  However, since their design +   is often driven by heuristics, they may not guarantee that scarce +   items will be found. + +   So what might the semantically indexed P2Ps add to an already crowded +   field of distributed information architectures?  At one extreme, +   there are the distributed relational database management systems +   (RDBMSs), with their strong consistency guarantees [284].  They +   provide strong data independence, the flexibility of SQL queries, and +   strong transactional semantics -- Atomicity, Consistency, Isolation +   and Durability (ACID) [349].  They guarantee that the query response +   is complete -- all matching results are returned.  The price is +   performance.  They scale to perhaps 1000 nodes, as evidenced in +   Mariposa [350, 351], or require query caching front ends to constrain +   the load [284].  Database research has "arguably been cornered into +   traditional, high-end, transactional applications" [72].  Then there +   are distributed file systems, like the Network File System (NFS) or +   the Serverless Network File Systems (xFS), with little data +   independence, low-level file retrieval interfaces, and varied +   consistency [284].  Today's eclectic mix of Content Distribution +   Networks (CDNs) generally deload primary servers by redirecting Web +   requests to a nearby replica.  Some intercept the HTTP requests at +   the DNS level and then use consistent hashing to find a replica [23]. +   Since this same consistent hashing was a forerunner to the DHT + + + + +Risson & Moors               Informational                     [Page 33] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   approaches above, CDNs are generally constrained to the same simple +   key lookups. + +   The opportunity for semantically indexed P2Ps, then, is to provide: + +   a) graduated data independence, consistency, and query flexibility, +      and + +   b) probabilistically complete query responses, across + +   c) very large numbers of low-cost, geographically distributed, +      dynamic nodes. + +4.1.  Keyword Lookup + +   P2P keyword lookup is best understood by considering the structure of +   the underlying index and the algorithms by which queries are routed +   over that index.  Figure 3 summarizes the following paragraphs by +   classifying the keyword query algorithms, index structures, and +   metrics.  The research has largely focused on scalability, not +   dependability.  There have been very few studies that quantify the +   impact of network churn.  One exception is the work by Chawathe, et +   al. on the Gia system [61].  Gia's combination of algorithms from +   Figure 3 (receiver-based flow control, biased random walk, and one- +   hop replication) gave 2-4 orders of magnitude improvement in query +   success rates in churning networks. + + + + + + + + + + + + + + + + + + + + + + + + + +Risson & Moors               Informational                     [Page 34] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   QUERY +   Query routing +     Flooding: Peers only index local files so queries must propagate +       widely [4] +     Policy-based: Choice of the next hop node: random; most/least +       recently used; most files shared; most results [265, 352] +     Random walks: Parallel [67] or biased random walks [61, 66] +   Query forwarding +     Iterative: Nodes perform iterative unicast searches of ultrapeers, +       until the desired number of results is achieved.  See Gnutella +       UDP Extension for Scalable Searches (GUESS) [265, 353] +     Recursive +   Query flow control +     Receiver-controlled: Receivers grant query tokens to senders, so +       as to avoid overload [61] +     Reactive: sender throttles queries when it notices receivers are +       discarding packets [61, 66] +     Dynamic Time To Live: In the Dynamic Query Protocol, the sender +       adjusts the time-to-live on each iteration based on the number +       of results received, the number of connections left, and the +       number of nodes already theoretically reached by the search [354] + +   INDEX +   Distribution +     Compression: Leaf nodes periodically send ultrapeers compressed +       query routing tables, as in the Query Routing Protocol [260] +     One hop replication: Nodes maintain an index of content on their +       nearest neighbors [61, 352] +   Partitioning +     By document [210] +     By keyword: Use an inverted list to find a matching document, +       either locally or at another peer [21].  Partition by keyword +       sets [355] +     By document and keyword: Also called Multi-Level Partitioning [21] + +   METRIC +   Query load: Queries per second per node/link [65, 265] +   Degree: The number of links per node [66, 352].  Early P2P networks +     approximated power-law networks, where the number of nodes with L +     links is proportional to L^(-k), where k is a constant [65] +   Query delay: Reported in terms of time and hop count [61, 66] +   Query success rate: The "Collapse Point" is the per-node query rate +     at which the query success rate drops below 90% [61].  See +     also [61, 265, 352]. + +                  Figure 3: Keyword Lookup in P2P Systems + + + + + +Risson & Moors               Informational                     [Page 35] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +4.1.1.  Gnutella Enhancements + +   Perhaps the most widely referenced P2P system for simple keyword +   match is Gnutella [4].  Gnutella queries contain a string of +   keywords.  Gnutella peers answer when they have files whose names +   contain all the keywords.  As discussed in Section 2.1, early +   versions of Gnutella did not forward the document index.  Queries +   were flooded and peers searched their own local indexes for filename +   matches.  An early review highlighted numerous areas for improvement +   [65].  It was estimated that the query traffic alone from 50,000 +   early-generation Gnutella nodes would amount to 1.7% of the total +   U.S. Internet backbone traffic at December 2000 levels.  It was +   speculated that high-degree Gnutella nodes would impede +   dependability.  An unnecessarily high percentage of Gnutella traffic +   crossed Autonomous System (AS) boundaries -- a locality mechanism may +   have found suitable nearby peers. + +   Fortunately, there have since been numerous enhancements within the +   Gnutella Developer Forum.  At the time of writing, it has been +   reported that Gnutella has almost 350,000 unique hosts, of which +   nearly 90,000 accept incoming connections [356].  One of the main +   improvements is that an index of filename keywords, called the Query +   Routing Table (QRT), can now be forwarded from 'leaf peers' to its +   'ultrapeers' [260].  Ultrapeers can then ensure that the leaves only +   receive queries for which they have a match, dramatically reducing +   the query traffic at the leaves.  Ultrapeers can have connections to +   many leaf nodes (~10-100) and a small number of other ultrapeers +   (<10) [260].  Originally, a leaf node's QRT was not forwarded by the +   parent ultrapeer to other ultrapeers.  More recently, there has been +   a proposal to distribute aggregated QRTs amongst ultrapeers [357]. +   To further limit traffic, QRTs are compressed by hashing, according +   to the Query Routing Protocol (QRP) specification [281].  This same +   specification claims QRP may reduce Gnutella traffic by orders of +   magnitude, but cautions that simulation is required before mass +   deployment.  A known shortcoming of QRP was that the extent of query +   propagation was independent of the popularity of the search terms. +   The Dynamic Query Protocol addressed this [358].  It required leaf +   nodes to send single queries to high-degree ultrapeers that adjust +   the queries' time-to-live (TTL) bounds according to the number of +   received query results.  An earlier proposal, called the Gnutella UDP +   Extension for Scalable Searches (GUESS) [353], similarly aimed to +   reduce the number of queries for widely distributed files.  GUESS +   reuses the non-forwarding idea (Section 2).  A GUESS peer repeatedly +   queries single ultrapeers with a TTL of 1, with a small timeout on +   each query to limit load.  It chooses the number of iterations and +   selects ultrapeers so as to satisfy its search needs.  For +   adaptability, a small number of experimental Gnutella nodes have + + + + +Risson & Moors               Informational                     [Page 36] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   implemented eXtensible Markup Language (XML) schemas for richer +   queries [359, 360].  None of the above Gnutella proposals explicitly +   assess robustness. + +   The broader research community has recently been leveraging aspects +   of the Gnutella design.  Lv, Ratnasamy, et al. exposed one assumption +   implicit in some of the early DHT work -- that designs "such as +   Gnutella are inherently not scalable, and therefore should be +   abandoned" [66].  They argued that by making better use of the more +   powerful peers, Gnutella's scalability issues could be alleviated. +   Instead of its flooding mechanism, they used random walks.  Their +   preliminary design to bias random walks towards high capacity nodes +   did not go as far as the ultrapeer proposals in that the indexes did +   not move to the high-capacity nodes.  Chawathe, Ratnasamy, et al. +   chose to extend the Gnutella design with their Gia system, in +   response to the perceived shortcomings of DHTs in Section 1.2 [61]. +   Compared to the early Gnutella designs, they incorporated several +   novel features.  They devise a topology adaptation algorithm so that +   most peers are attached to high-degree peers.  They use a random walk +   search algorithm, in lieu of flooding, and bias the query load +   towards higher-degree peers.  For 'one-hop replication', they require +   all nodes to keep pointers to content on adjacent peers.  To +   implement a receiver-controlled token-based flow control, a peer must +   have a token from its neighbouring peer before it sends a query to +   it.  Chawathe, Ratnasamy, et al. show by simulations that the +   combination of these features provides a scalability improvement of +   three to five orders of magnitude over Gnutella "while retaining +   significant robustness".  The main robustness metrics they used were +   the 'collapse point' query rate (the per-node query rate at which the +   successful query rate falls below 90%) and the average hop count +   immediately prior to collapse.  Their comparison with Gnutella did +   not take into account the Gnutella enhancements above -- this was +   left as future work.  Castro, Costa, and Rowstron argued that if +   Gnutella were built on top of a structured overlay, then both the +   query and overlay maintenance traffic could be reduced [259].  Yang, +   Vinograd, et al. explore various policies for peer selection in the +   GUESS protocol, since the issue is left open in the original proposal +   [265].  For example, the peer initiating the query could choose peers +   that have been "most recently used" or that have the "most files +   shared".  Various policy pitfalls are identified.  For example, good +   peers could be overloaded, victims of their own success. +   Alternatively, malicious peers could encourage the querying peer to +   try inactive peers.  They conclude that a "most results" policy gives +   the best balance of robustness and efficiency.  Like Castro, Costa, +   and Rowstron, they concentrated on the static network scenario. +   Cholvi, Felber, et al. very briefly describe how similar "least +   recently used" and "most often used" heuristics can be used by a peer +   to select peer 'acquaintances' [352].  They were motivated by the + + + +Risson & Moors               Informational                     [Page 37] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   congestion associated with Gnutella's TTL-limited flooding. +   Recognizing that the busiest peers can quickly become overloaded +   central hubs for the entire network, they limit the number of +   acquaintances for any given peer to 25.  They sketch a mechanism to +   decrement a query's TTL multiple times when it traverses "interested +   peers".  In summary, these Gnutella-related investigations are +   characterized by a bias for high-degree peers and very short directed +   query paths, a disdain for flooding, and concern about excessive load +   on the 'better' peers.  Generally, the robustness analysis for +   dynamic networks (content updates and node arrivals/departures) +   remains open. + +4.1.2.  Partition-by-Document, Partition-by-Keyword + +   One aspect of P2P keyword search systems has received particular +   attention: should the index be partitioned by document or by keyword? +   The issue affects scalability.  To be partitioned by document, each +   node has a local index of documents for which it is responsible. +   Gnutella is a prime example.  Queries are generally flooded in +   systems partitioned by document.  On the other hand, a peer may +   assume responsibility for a set of keywords.  The peer uses an +   inverted list to find a matching document, either locally or at +   another peer.  If the query contains several keywords, inverted lists +   may need to be retrieved from several different peers to find the +   intersection [21].  The initial assessment by Li, Loo, et al. was +   that the partition-by-document approach was superior [210].  For one +   scenario of a full-text Web search, they estimated the communications +   costs to be about six times higher than the feasible budget. +   However, wanting to exploit prior work on inverted list intersection, +   they studied the partition-by-keyword strategy.  They proposed +   several optimizations that put the communication costs for a +   partition-by-keyword system within an order of magnitude of +   feasibility.  There had been a couple of prior papers that suggested +   partitioned-by-keyword designs incorporate DHTs to map keywords to +   peers [355, 361].  In Gnawali's Keyword-set Search System (KSS), the +   index is partitioned by sets of keywords [355].  Terpstra, Behnel, et +   al. point out that by keeping keyword pairs or triples, the number of +   lists per document in KSS is squared or tripled [362].  Shi, +   Guangwen, et al. interpreted the approximations of Li, Loo, et al. to +   mean that neither approach is feasible on its own [21].  Their +   Multi-Level Partitioning (MLP) scheme incorporates both partitioning +   approaches.  They arrange nodes into a group hierarchy, with all +   nodes in the single 'level 0' group, and with the same nodes sub- +   divided into k logical subgroups on 'level 1'.  The subgroups are +   again divided, level by level, until level l.  The inverted index is +   partitioned by document between groups and by keyword within groups. +   MLP avoids the query flooding normally associated with systems +   partitioned by document, since a small number of nodes in each group + + + +Risson & Moors               Informational                     [Page 38] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   process the query.  It reduces the bandwidth overheads associated +   with inverted list intersection in systems partitioned solely by +   keyword, since groups can calculate the intersection independently +   over the documents for which they are responsible.  MLP was overlaid +   on SkipNet, per Section 3.5.6 [38].  Some initial analyses of +   communications costs and query latencies were provided. + +4.1.3.  Partial Search, Exhaustive Search + +   Much of the research above addresses partial keyword search. +   Daswani, et al. highlighted the open problem of efficient, +   comprehensive keyword search [25].  How can exhaustive searches be +   achieved without flooding queries to every peer in the network? +   Terpstra, Behnel et al. couched the keyword search problem in +   rendezvous terms: dynamic keyword queries need to 'meet' with static +   document lists [362].  Their Bitzipper scheme is partitioned by +   document.  They improved on full flooding by putting document +   metadata on 2sqrt(n) nodes and forwarding queries through only +   6sqrt(n) nodes.  They reported that Bitzipper nodes need only 1/166th +   of the bandwidth of full-flooding Gnutella nodes for an exhaustive +   search.  An initial comparison of query load was given.  There was +   little consideration of either static or dynamic resilience; that is, +   of nodes failing, of documents continually changing, or of nodes +   continually joining and leaving the network. + +4.2.  Information Retrieval + +   The field of Information Retrieval (IR) has matured considerably +   since its inception in the 1950s [363].  A taxonomy for IR models has +   been formalized [262].  It consists of four elements: a +   representation of documents in a collection; a representation of user +   queries; a framework describing relationships between document +   representations and queries; and a ranking function that quantifies +   an ordering amongst documents for a particular query.  Three main +   issues motivate current IR research -- information relevance, query +   response time, and user interaction with IR systems.  The dominant IR +   trends for searching large text collections are also threefold [262]. +   The size of collections is increasing dramatically.  More complicated +   search mechanisms are being found to exploit document structure, to +   accommodate heterogeneous document collections, and to deal with +   document errors.  Compression is in favour -- it may be quicker to +   search compact text or retrieve it from external devices.  In a +   distributed IR system, query processing has four parts.  Firstly, +   particular collections are targeted for the search.  Secondly, +   queries are sent to the targeted collections.  Queries are then +   evaluated at the individual collections.  Finally, results from the +   collections are collated. + + + + +Risson & Moors               Informational                     [Page 39] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   So how do P2P networks differ from distributed IR systems?  Bawa, +   Manku, et al. presented four differences [62].  They suggested that a +   P2P network is typically larger, with tens or hundreds of thousands +   of nodes.  It is usually more dynamic, with node lifetimes measured +   in hours.  They suggested that a P2P network is usually homogeneous, +   with a common resource description language.  It lacks the +   centralized "mediators" found in many IR systems that assume +   responsibility for selecting collections, for rewriting queries, and +   for merging ranked results.  These distinctions are generally aligned +   with the peer characteristics in Section 1.  One might add that P2P +   nodes display more symmetry -- peers are often both information +   consumers and producers.  Daswani, Garcia-Molina, et al. pointed out +   that, while there are IR techniques for ranked keyword search at +   moderate scale, research is required so that ranking mechanisms are +   efficient at the larger scale targeted by P2P designs [25].  Joseph +   and Hoshiai surveyed several P2P systems using metadata techniques +   from the IR toolkit [60].  They described an assortment of IR +   techniques and P2P systems, including various metadata formats, +   retrieval models, bloom filters, DHTs, and trust issues. + +   In the ensuing paragraphs, we survey P2P work that has incorporated +   information retrieval models, particularly the Vector Model and the +   Latent Semantic Indexing Model.  We omit the P2P work based on +   Bayesian models.  Some have pointed to such work [60], but made no +   explicit mention of the model [364].  One early paper on P2P +   content-based image retrieval also leveraged the Bayesian model +   [365].  For the former two models, we briefly describe the design, +   then try to highlight robustness aspects.  On robustness, we are +   again stymied for lack of prior work.  Indeed, a search across all +   proceedings of the Annual ACM Conference on Research and Development +   in Information Retrieval for the words "reliable", "available", +   "dependable", or "adaptable" did not return any results at the time +   of writing.  In contrast, a standard text on distributed database +   management systems [366] contains a whole chapter on reliability.  IR +   research concentrates on performance measures.  Common performance +   measures include recall, the fraction of the relevant documents that +   has been retrieved and precision, the fraction of the retrieved +   documents that is relevant [262].  Ideally, an IR system would have +   high recall and high precision.  Unfortunately techniques favouring +   one often disadvantage the other [363]. + + + + + + + + + + + +Risson & Moors               Informational                     [Page 40] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +4.2.1.  Vector Model (PlanetP, FASD, eSearch) + +   The vector model [367] represents both documents and queries as term +   vectors, where a term could be a word or a phrase.  If a document or +   query has a term, the weight of the corresponding dimension of the +   vector is non-zero.  The similarity of the document and query vectors +   gives an indication of how well a document matches a particular +   query. + +   The weighting calculation is critical across the retrieval models. +   Amongst the numerous proposals for the probabilistic and vector +   models, there are some commonly recurring weighting factors [363]. +   One is term frequency.  The more a term is repeated in a document, +   the more important the term is.  Another is inverse document +   frequency.  Terms common to many documents give less information +   about the content of a document.  Then there is document length. +   Larger documents can bias term frequencies, so weightings are +   sometimes normalized against document length.  The expression "TFIDF +   weighting" refers to the collection of weighting calculations that +   incorporate term frequency and inverse document frequency, not just +   to one.  Two weighting calculations have been particularly dominant +   -- Okapi [368] and pivoted normalization [369].  A distributed +   version of Google's Pagerank algorithm has also been devised for a +   P2P environment [370].  It allows incremental, ongoing Pagerank +   calculations while documents are inserted and deleted. + +   A couple of early P2P systems leveraged the vector model.  Building +   on the vector model, PlanetP divided the ranking problem into two +   steps [215].  In the first, peers are ranked for the probability that +   they have matching documents.  In the second, higher-priority peers +   are contacted and the matching documents are ranked.  An Inverse Peer +   Frequency, analogous to the Inverse Document Frequency, is used to +   rank relevant peers.  To further constrain the query traffic, PlanetP +   contacts only the first group of m peers to retrieve a relevant set +   of documents.  In this way, it repeatedly contacts groups of m peers +   until the top k document rankings are stable.  While the PlanetP +   designers first quantified recall and precision, they also considered +   reliability.  Each PlanetP peer has a global index with a list of all +   other peers, their IP addresses, and their Bloom filters.  This large +   volume of shared information needs to be maintained.  Klampanos and +   Jose saw this as PlanetP's primary shortcoming [371].  Each Bloom +   filter summarized the set of terms in the local index of each peer. +   The time to propagate changes, be they new documents or peer +   arrivals/departures, was studied by simulation for up to 1000 peers. +   The reported propagation times were in the hundreds of seconds. +   Design workarounds were required for PlanetP to be viable across +   slower dial-up modem connections.  For future work, the authors were + + + + +Risson & Moors               Informational                     [Page 41] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   considering some sort of hierarchy to scale to larger numbers of +   peers. + +   A second early system using the vector model is the Fault-tolerant, +   Adaptive, Scalable Distributed (FASD) search engine [283], which +   extended the Freenet design (Section 2.3) for richer queries.  The +   original Freenet design could find a document based on a globally +   unique identifier.  Kronfol's design added the ability to search, for +   example, for documents about "apples AND oranges NOT bananas".  It +   uses a TFIDF weighting scheme to build a document's term vector. +   Each peer calculates the similarity of the query vector and local +   documents and forwards the query to the best downstream peer.  Once +   the best downstream peer returns a result, the second-best peer is +   tried, and so on.  Simulations with 1000 nodes gave an indication of +   the query path lengths in various situations -- when routing queries +   in a network with constant rates of node and document insertion, when +   bootstrapping the network in a "worst-case" ring topology, or when +   failing randomly and specifically selected peers.  Kronfol claimed +   excellent average-case performance -- less than 20 hops to retrieve +   the same top n results as a centralized search engine.  There were, +   however, numerous cases where the worst-case path length was several +   hundred hops in a network of only 1000 nodes. + +   In parallel, there have been some P2P designs based on the vector +   model from the University of Rochester -- pSearch [9, 372] and +   eSearch [373].  The early pSearch paper suggested a couple of +   retrieval models, one of which was the Vector Space Model, to search +   only the nodes likely to have matching documents.  To obtain +   approximate global statistics for the TFIDF calculation, a spanning +   tree was constructed across a subset of the peers.  For the m top +   terms, the term-to-document index was inserted into a Content- +   Addressable Network [334].  A variant that mapped terms to document +   clusters was also suggested. eSearch is a hybrid of the partition- +   by-document and partition-by-term approaches (Section 4.1.2) eSearch +   nodes are primarily partitioned by term.  Each is responsible for the +   inverted lists for some top terms.  For each document in the inverted +   list, the node stores the complete term list.  To reduce the size of +   the index, the complete term lists for a document are only kept on +   nodes that are responsible for top terms in the document.  eSearch +   uses the Okapi term weighting to select top terms.  It relies on the +   Chord DHT [34] to associate terms with nodes storing the inverted +   lists.  It also uses automatic query expansion.  This takes the +   significant terms from the top document matches and automatically +   adds them to the user's query to find additional relevant documents. +   The eSearch performance was quantified in terms of search precision, +   the number of retrieved documents, and various load-balancing +   metrics.  Compared to the more common proposals for partitioning by + + + + +Risson & Moors               Informational                     [Page 42] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   keywords, eSearch consumed 6.8 times the storage space to achieve +   faster search times. + +4.2.2.  Latent Semantic Indexing (pSearch) + +   Another retrieval model used in P2P proposals is Latent Semantic +   Indexing (LSI) [374].  Its key idea is to map both the document and +   query vectors to a concept space with lower dimensions.  The starting +   point is a t*N weighting matrix, where t is the total number of +   indexed terms, N is the total number of documents, and the matrix +   elements could be TFIDF rankings.  Using singular value +   decomposition, this matrix is reduced to a smaller number of +   dimensions, while retaining the more significant term-to-document +   mappings.  Baeza-Yates and Ribeiro-Neto suggested that LSI's value is +   a novel theoretic framework, but that its practical performance +   advantage for real document collections had yet to be proven [262]. +   pSearch incorporated LSI [9].  By placing the indices for +   semantically similar documents close in the network, Tang, Xu, et al. +   touted significant bandwidth savings relative to the early full- +   flooding variant of Gnutella [372].  They plotted the number of nodes +   visited by a query.  They also explored the trade-off with accuracy, +   the percentage match between the documents returned by the +   distributed pSearch algorithm and those from a centralized LSI +   baseline.  In a more recent update to the pSearch work, Tang, +   Dwarkadas, et al. summarized LSI's shortcomings [375].  Firstly, for +   large document collections, its retrieval quality is inherently +   inferior to Okapi.  Secondly, singular value decomposition consumes +   excessive memory and computation time.  Consequently, the authors +   used Okapi for searching while retaining LSI for indexing.  With +   Okapi, they selected the next node to be searched and selected +   documents on searched nodes.  With LSI, they ensured that similar +   documents are clustered near each other, thereby optimizing the +   network search costs.  When retrieving a small number of top +   documents, the precision of LSI+Okapi approached that of Okapi. +   However, if retrieving a large number of documents, the LSI+Okapi +   precision is inferior.  The authors want to improve this in future +   work. + +4.2.3.  Small Worlds + +   The "small world" concept originally described how people are +   interconnected by short chains of acquaintances [376].  Kleinberg was +   struck by the algorithmic lesson of the small world, namely "that +   individuals using local information are collectively very effective +   at constructing short paths between two points in a social network" +   [377].  Small world networks have a small diameter and a large +   clustering coefficient (a large number of connections amongst +   relevant nodes) [378]. + + + +Risson & Moors               Informational                     [Page 43] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   The small world idea has had a limited impact on peer-to-peer +   algorithms.  It has influenced only a few unstructured [62, 378-380] +   and structured [344, 381] algorithms.  The most promising work on +   "small worlds" in P2P networks are those concerned with the +   information retrieval metrics, precision and recall [62, 378, 380]. + +5.  Queries + +   Database research suggests directions for P2P research.  Hellerstein +   observed that, while work on fast P2P indexes is well underway, P2P +   query optimization remains a promising topic for future research +   [23].  Kossman reviewed the state of the art of distributed query +   processing, highlighting areas for future research: simulation and +   query optimization for networks of tens of thousands of servers and +   millions of clients; non-relational data types (e.g., XML, text, and +   images); and partial query responses since on the Internet, "failure +   is the rule rather than the exception" [19].  A primary motivation +   for the P2P system, PIER, was to scale from the largest database +   systems of a few hundred nodes to an Internet environment in which +   there are over 160 million nodes [22].  Litwin and Sahri have also +   considered ways to combine distributed hashing, more specifically the +   Scalable Distributed Data Structures, with SQL databases, claiming to +   be first to implement scalable distributed database partitioning +   [382].  Motivated by the lack of transparent distribution in current +   distributed databases, they measure query execution times for +   Microsoft SQL servers aggregated by means of an SDDS layer.  One of +   their starting assumptions was that it is too challenging to change +   the SQL query optimizer. + +   Database research also suggests the approach to P2P research. +   Researchers of database query optimization were divided between those +   looking for optimal solutions in special cases and those using +   heuristics to answer all queries [383].  Gribble, et al. cast query +   optimization in terms of the data placement problem, which is to +   "distribute data and work so the full query workload is answered with +   lowest cost under the existing bandwidth and resource constraints" +   [250].  They pointed out that even the static version of this problem +   is NP-complete in P2P networks.  Consequently, research on massive, +   dynamic P2P networks will likely progress using both strategies of +   early database research - heuristics and special-case optimizations. + +   If P2P networks are going to be adaptable, if they are to support a +   wide range of applications, then they need to accommodate many query +   types [72].  Up to this point, we have reviewed queries for keys +   (Section 3) and keywords (Sections 4.1. and 4.2).  Unfortunately, a +   major shortcoming of the DHTs in Section 3.5 is that they primarily +   support exact-match, single-key queries.  Skip Graphs support range +   and prefix queries, but not aggregation queries.  Here we probe below + + + +Risson & Moors               Informational                     [Page 44] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   the language syntax to identify the open research issues associated +   with more expressive P2P queries [25].  Triantafillou and Pitoura +   observed the disparate P2P designs for different types of queries and +   so outlined a unifying framework [76].  To classify queries, they +   considered the number of relations (single or multiple), the number +   of attributes (single or multiple), and the type of query operator. +   They described numerous operators:  equality, range, join, and +   "special functions".  The latter referred to aggregation (like sum, +   count, average, minimum, and maximum), grouping and ordering.  The +   following sections approximately fit their taxonomy -- range queries, +   multi-attribute queries, join queries and aggregation queries.  There +   has been some initial P2P work on other query types -- continuous +   queries [20, 22, 73], recursive queries [22, 74], and adaptive +   queries [23, 75].  For these, we defer to the primary references. + +5.1.  Range Queries + +   The support of efficient range predicates in P2P networks was +   identified as an important open research issue by Huebsch, et al. +   [22].  Range partitioning has been important in parallel databases to +   improve performance, so that a transaction commonly needs data from +   only one disk or node [22].  One type of range search, longest prefix +   match, is important because of its prevalence in routing schemes for +   voice and data networks alike.  In other applications, users may pose +   broad, inexact queries, even though they require only a small number +   of responses.  Consequently, techniques to locate similar ranges are +   also important [77].  Various proposals for range searches over P2P +   networks are summarized in Figure 4.  Since the Scalable Distributed +   Data Structure (SDDS) has been an important influence on contemporary +   Distributed Hash Tables (DHTs) [49-51], we also include ongoing work +   on SDDS range searches. + +   PEER-TO-PEER (P2P) +   Locality Sensitive Hashing (Chord) [77] +   Prefix Hash Trees (unspecified DHT) [78, 79] +   Space Filling Curves (CAN) [80] +   Space Filling Curves (Chord) [81] +   Quadtrees (Chord) [82] +   Skip Graphs [38, 41, 83, 100] +   Mercury [84] +   P-Grid [85, 86] + +   SCALABLE DISTRIBUTED DATA STRUCTURES (SDDS) +   RP*   [87, 88] + +       Figure 4: Solutions for Range Queries on P2P and SDDS Indexes + + + + + +Risson & Moors               Informational                     [Page 45] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   The papers on P2P range search can be divided into those that rely on +   an underlying DHT (the first five entries in Figure 4) and those that +   do not (the subsequent three entries).  Bharambe, Agrawal, et al. +   argued that DHTs are inherently ill-suited to range queries [84]. +   The very feature that makes for their good load balancing properties, +   randomized hash functions, works against range queries.  One possible +   solution would be to hash ranges, but this can require a priori +   partitioning.  If the partitions are too large, partitions risk +   overload.  If they are too small, there may be too many hops. + +   Despite these potential shortcomings, there have been several range +   query proposals based on DHTs.  If hashing ranges to nodes, it is +   entirely possible that overlapping ranges map to different nodes. +   Gupta, Agrawal, et al. rely on locality sensitive hashing to ensure +   that, with high probability, similar ranges are mapped to the same +   node [77].  They propose one particular family of locality sensitive +   hash functions, called min-wise independent permutations.  The number +   of partitions per node and the path length were plotted against the +   total numbers of peers in the system.  For a network with 1000 nodes, +   the hop count distribution was very similar to that of the exact- +   matching Chord scheme.  Was it load-balanced?  For the same network +   with 50,000 partitions, there were over two orders of magnitude +   variation in the number of partitions at each node (first and +   ninety-ninth percentiles).  The Prefix Hash Tree is a trie in which +   prefixes are hashed onto any DHT.  The preliminary analysis suggests +   efficient doubly logarithmic lookup, balanced load, and fault +   resilience [78, 79].  Andrzejak and Xu were perhaps the first to +   propose a mapping from ranges to DHTs [80].  They use one particular +   Space Filling Curve, the Hilbert curve, over a Content Addressable +   Network (CAN) construction (Section 3.5.3).  They maintain two +   properties: nearby ranges map to nearby CAN zones; if a range is +   split into two sub-ranges, then the zones of the sub-ranges partition +   the zone of the primary range.  They plot path length and load proxy +   measures (the total number of messages and nodes visited) for three +   algorithms to propagate range queries: brute force, controlled +   flooding, and directed controlled flooding.  Schmidt and Parashar +   also advocated Space Filling Curves to achieve range queries over a +   DHT [81].  However, they point out that, while Andrzejak and Xu use +   an inverse Space Filling Curve to map a one-dimensional space to d- +   dimensional zones, they map a d-dimensional space back to a one- +   dimensional index.  Such a construction gives the ability to search +   across multiple attributes (Section 5.2).  Tanin, Harwood, et al. +   suggested quadtrees over Chord [82], and gave preliminary simulation +   results for query response times. + +   Because DHTs are naturally constrained to exact-match, single-key +   queries, researchers have considered other P2P indexes for range +   searches.  Several were based on Skip Graphs [38, 41], which, unlike + + + +Risson & Moors               Informational                     [Page 46] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   the DHTs, do not necessitate randomizing hash functions and are +   therefore capable of range searches.  Unfortunately, they are not +   load balanced [83].  For example, in SkipNet [48], hashing was added +   to balance the load -- the Skip Graph could support range searches or +   load balancing, but not both.  One solution for load-balancing relies +   on an increased number of 'virtual' servers [168] but, in their +   search for a system that can both search for ranges and balance +   loads, Bharambe, Agrawal, et al. rejected the idea [84].  The virtual +   servers work assumed load imbalance stems from hashing; that is, by +   skewed data insertions and deletions.  In some situations, the +   imbalance is triggered by a skewed query load.  In such +   circumstances, additional virtual servers can increase the number of +   routing hops and increase the number of pointers that a Skip Graph +   needs to maintain.  Ganesan, Bawa, et al. devised an alternate method +   to balance load [83].  They proposed two Skip Graphs, one to index +   the data itself and the other to track load at each node in the +   system.  Each node is able to determine the load on its neighbours +   and the most (least) loaded nodes in the system.  They devise two +   algorithms: NBRADJUST balances load on neighbouring nodes; using +   REORDER, empty nodes can take over some of the tuples on heavily +   loaded nodes.  Their simulations focus on skewed storage load, rather +   than on skewed query loads, but they surmise that the same approach +   could be used for the latter. + +   Other proposals for range queries avoid both the DHT and the Skip +   Graph.  Bharambe, Agrawal, et al. distinguish their Mercury design by +   its support for multi-attribute range queries and its explicit load +   balancing [84].  In Mercury, nodes are grouped into routing hubs, +   each of which is responsible for various query attributes.  While it +   does not use hashing, Mercury is loosely similar to the DHT +   approaches: nodes within hubs are arranged into rings, like Chord +   [34]; for efficient routing within hubs, k long-distance links are +   used, like Symphony [381].  Range lookups require O(((log n)^2)/k) +   hops.  Random sampling is used to estimate the average load on nodes +   and to find the parts of the overlay that are lightly loaded. +   Whereas Symphony assumed that nodes are responsible for ranges of +   approximately equal size, Mercury's random sampling can determine the +   location of the start of the range, even for non-uniform ranges [84]. +   P-Grid [42] does provide for range queries, by virtue of the key +   ordering in its tree structures.  Ganesan, Bawa, et al. critiqued its +   capabilities [83]: P-Grid assumes fixed-capacity nodes; there was no +   formal characterization of imbalance ratios or balancing costs; every +   P-Grid periodically contacts other nodes for load information. + +   The work on Scalable Distributed Data Structures (SDDSs) has +   progressed in parallel with P2P work and has addressed range queries. +   Like the DHTs above, the early SDDS Linear Hashing (LH*) schemes were +   not order-preserving [52].  To facilitate range queries, Litwin, + + + +Risson & Moors               Informational                     [Page 47] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   Niemat, et al. devised a Range Parititioning variant, RP* [87]. +   There are options to dispense with the index, to add indexes to +   clients, and to add them to servers.  In the variant without an +   index, every query is issued via multicasting.  The other variants +   also use some multicasting.  The initial RP* paper suggested +   scalability to thousands of sites, but a more recent RP* simulation +   was capped at 140 servers [88].  In that work, Tsangou, Ndiaye, et +   al. investigated TCP and UDP mechanisms by which servers could return +   range query results to clients.  The primary metrics were search and +   response times.  Amongst the commercial parallel database management +   systems, they reported that the largest seems only to scale to 32 +   servers (SQL Server 2000).  For future work, they planned to explore +   aggregation of query results, rather than establishing a connection +   between the client and every single server with a response. + +   All in all, it seems there are numerous open research questions on +   P2P range queries.  How realistic is the maintenance of global load +   statistics considering the scale and dynamism of P2P networks? +   Simulations at larger scales are required.  Proposals should take +   into account both the storage load (insert and delete messages) and +   the query load (lookup messages).  Simplifying assumptions need to be +   attacked.  For example, how well do the above solutions work in +   networks with heterogeneous nodes, where the maximum message loads +   and index sizes are node-dependent? + +5.2.  Multi-Attribute Queries + +   There has been some work on multi-attribute P2P queries.  As late as +   September 2003, it was suggested that there has not been an efficient +   solution [76]. + +   Again, an early significant work on multi-attribute queries over +   aggregated commodity nodes germinated amongst SDDSs.  k-RP* [89] uses +   the multi-dimensional binary search tree (or k-d tree, where k +   indicates the number of dimensions of the search index) [384].  It +   builds on the RP* work from the previous section and inherits their +   capabilities for range search and partial match.  Like the other +   SDDSs, k-RP* indexes can fit into RAM for very fast lookup.  For +   future work, Litwin and Neimat suggested a) a formal analysis of the +   range search termination algorithm and the k-d paging algorithm, b) a +   comparison with other multi-attribute data structures (quad-trees and +   R-trees) and c) exploration of query processing, concurrency control, +   and transaction management for k-RP* files [89].  On the latter +   point, others have considered transactions to be inconsequential to +   the core problem of supporting more complex queries in P2P networks +   [72]. + + + + + +Risson & Moors               Informational                     [Page 48] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   In architecting their secure wide-area Service Discovery Service +   (SDS), Hodes, Czerwinski, et al. considered three possible designs +   for multi-criteria search -- Centralization, Mapping and Flooding +   [90].  These correlate to the index classifications of Section 2 -- +   Central, Distributed, and Local.  They discounted the centralized, +   Napster-like index for its risk of a single point of failure.  They +   considered the hash-based mappings of Section 3, but concluded that +   it would not be possible to adequately partition data.  A document +   satisfying many criteria would be wastefully stored in many +   partitions.  They rejected full flooding for its lack of scalability. +   Instead, they devised a query filtering technique, reminiscent of +   Gnutella's query routing protocol (Section 4.1).  Nodes push +   proactive summaries of their data rather than waiting for a query. +   Summaries are aggregated and stored throughout a server hierarchy, to +   guide subsequent queries.  Some initial prototype measurements were +   provided for total load on the system, but not for load distribution. +   They put several issues forward for future work.  The indexing needs +   to be flexible to change according to query and storage workloads.  A +   mesh topology might improve on their hierarchic topology since query +   misses would not propagate to root servers.  The choice is analogous +   to BGP meshes and DNS trees. + +   More recently, Cai, Frank, et al. devised the Multi-Attribute +   Addressable Network (MAAN) [91].  They built on Chord to provide both +   multi-attribute and range queries, claiming to be the first to +   service both query types in a structured P2P system.  Each MAAN node +   has O(log n) neighbours, where N is the number of nodes.  MAAN +   multi-attribute range queries require O(log n+N*Smin) hops, where +   Smin is the minimum range selectivity across all attributes. +   Selectivity is the ratio of the query range to the entire identifier +   range.  The paper assumed that a locality preserving hash function +   would ensure balanced load.  Per Section 5.1, the arguments by +   Bharambe, Agrawal, et al. have highlighted the shortcomings of this +   assumption [84].  MAAN required that the schema must be fixed and +   known in advance -- adaptable schemas were recommended for subsequent +   attention.  The authors also acknowledged that there is a selectivity +   breakpoint at which full flooding becomes more efficient than their +   scheme.  This begs for a query resolution algorithm that adapts to +   the profile of queries.  Cai and Frank followed up with RDFPeers +   [55].  They differentiate their work from other RDF proposals by a) +   guaranteeing to find query results if they exist and b) removing the +   requirement of prior definition of a fixed schema.  They hashed +   <subject, predicate, object> triples onto the MAAN and reported +   routing hop metrics for their implementation.  Load imbalance across +   nodes was reduced to less than one order of magnitude, but the +   specific measure was the number of triples stored per node - skewed +   query loads were not considered.  They plan to improve load balancing +   with the virtual servers of Section 5.1 [168]. + + + +Risson & Moors               Informational                     [Page 49] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +5.3.  Join Queries + +   Two research teams have done some initial work on P2P join +   operations.  Harren, Hellerstein, et al. initially described a +   three-layer architecture -- storage, DHT and query processing.  They +   implemented the join operation by modifying an existing Content +   Addressable Network (CAN) simulator, reporting "significant hot-spots +   in all dimensions: storage, processing, and routing" [72].  They +   progressed their design more recently in the context of PIER, a +   distributed query engine based on CAN [22, 385].  They implemented +   two equi-join algorithms.  In their design, a key is constructed from +   the "namespace" and the "resource ID".  There is a namespace for each +   relation and the resource ID is the primary key for base tuples in +   that relation.  Queries are multicast to all nodes in the two +   namespaces (relations) to be joined.  Their first algorithm is a DHT +   version of the symmetric hash join.  Each node in the two namespaces +   finds the relevant tuples and hashes them to a new query namespace. +   The resource ID in the new namespace is the concatenation of join +   attributes.  In the second algorithm, called "fetch matches", one of +   the relations is already hashed on the join attributes.  Each node in +   the second namespace finds tuples matching the query and retrieves +   the corresponding tuples from the first relation.  They leveraged two +   other techniques, namely the symmetric semi-join rewrite and the +   Bloom filter rewrite, to reduce the high bandwidth overheads of the +   symmetric hash join.  For an overlay of 10,000 nodes, they simulated +   the delay to retrieve tuples and the aggregate network bandwidth for +   these four schemes.  The initial prototype was on a cluster of 64 +   PCs, but it has more recently been expanded to PlanetLab. + +   Triantafillou and Pitoura considered multicasting to large numbers of +   peers to be inefficient [76].  They therefore allocated a limited +   number of special peers, called range guards.  The domain of the join +   attributes was divided, one partition per range guard.  Join queries +   were sent only to range guards, where the query was executed. +   Efficient selection of range guards and a quantitive evaluation of +   their proposal were left for future work. + +5.4.  Aggregation Queries + +   Aggregation queries invariable rely on tree-structures to combine +   results from a large number of nodes.  Examples of aggregation +   queries are Count, Sum, Maximum, Minimum, Average, Median, and Top-K +   [92, 386, 387].  Figure 5 summarizes the tree and query +   characteristics that affect dependability. + + + + + + + +Risson & Moors               Informational                     [Page 50] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   Tree type: Doesn't use DHT [92], use internal DHT trees [95], use +      independent trees on top of DHTs +   Tree repair: Periodic [93], exceptional [32] +   Tree count: One per key, one per overlay [56] +   Tree flexibility: Static [92], dynamic + +   Query interface: install, update, probe [98] +   Query distribution: multicast [98], gossip [92] +   Query applications: leader election, voting, resource location, +      object placement and error recovery [98, 388] +   Query semantics +      Consistency: Best-effort, eventual [92], snapshot / interval / +         single-site validity [99] +      Timeliness [388] +      Lifetime: Continuous [97, 99], single-shot +      No. attributes: Single, multiple +   Query types: Count, sum, maximum, minimum, average, median, top k +      [92, 386, 387] + +          Figure 5: Aggregation Trees and Queries in P2P Networks + +   Key: Astrolabe [92]; Cone [93]; Distributed Approximative System +   Information Service (DASIS) [95]; Scalable Distributed Information + +   Management System (SDIMS) [98]; Self-Organized Metadata Overlay +   (SOMO) [56]; Wildfire [99]; Willow [32]; Newscast [97] + +   The fundamental design choices for aggregation trees relate to how +   the overlay uses DHTs, how it repairs itself when there are failures, +   how many aggregation trees there are, and whether the tree is static +   or dynamic (Figure 5).  Astrolabe is one of the most influential P2P +   designs included in Figure 5, yet it makes no use of DHTs [92]. +   Other designs make use of the internal trees of Plaxton-like DHTs. +   Others build independent tree structures on top of DHTs.  Most of the +   designs repair the aggregation tree with periodic mechanisms similar +   to those used in the DHTs themselves.  Willow is an exception [32]. +   It uses a Tree Maintenance Protocol to "zip" disjoint aggregation +   trees together when there are major failures.  Yalagandula and Dahlin +   found reconfigurations at the aggregation layer to be costly, +   suggesting more research on techniques to reduce the cost and +   frequency of such reconfigurations [98].  Many of the designs use +   multiple aggregation trees, each rooted at the DHT node responsible +   for the aggregation attribute.  On the other hand, the Self-Organized +   Metadata Overlay [56] uses a single tree and is vulnerable to a +   single point of failure at its root. + + + + + + +Risson & Moors               Informational                     [Page 51] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   At the time of writing, researchers have just begun exploring the +   performance of queries in the presence of churn.  Most designs are +   for best-effort queries.  Bawa, et al. devised a better consistency +   model, called Single-Site Validity [99] to qualify the accuracy of +   results when there is churn.  Its price was a five-fold increase in +   the message load, when compared to an efficient but best-effort +   Spanning Tree.  Gossip mechanisms are resilient to churn, but they +   delay aggregation results and incur high message cost for aggregation +   attributes with small read-to-write ratios. + +6.  Security Considerations + +   An initial list of references to research on P2P security is given in +   Figure 1, Section 1.  This document addresses P2P search.  P2P +   storage, security, and applications are recommended for further +   investigation in Section 8. + +7.  Conclusions + +   Research on peer-to-peer networks can be divided into four categories +   -- search, storage, security and applications.  This critical survey +   has focused on search methods.  While P2P networks have been +   classified by the existence of an index (structured or unstructured) +   or the location of the index (local, centralized, and distributed), +   this survey has shown that most have evolved to have some structure, +   whether it is indexes at superpeers or indexes defined by DHT +   algorithms.  As for location, the distributed index is most common. +   The survey has characterized indexes as semantic and semantic-free. +   It has also critiqued P2P work on major query types.  While much of +   it addresses work from 2000 or later, we have traced important +   building blocks from the 1990s. + +   The initial motivation in this survey was to answer the question, +   "How robust are P2P search networks?"  The question is key to the +   deployment of P2P technology.  Balakrishnan, Kaashoek, et al. argued +   that the P2P architecture is appealing: the startup and growth +   barriers are low; they can aggregate enormous storage and processing +   resources; "the decentralized and distributed nature of P2P systems +   gives them the potential to be robust to faults or intentional +   attacks" [18].  If P2P is to be a disruptive technology in +   applications other than casual file sharing, then robustness needs to +   be practically verified [20]. + +   The best comparative research on P2P dependability has been done in +   the context of Distributed Hash Tables (DHTs) [291].  The entire body +   of DHT research can be distilled to four main observations about +   dependability (Section 3.2).  Firstly, static dependability +   comparisons show that no O(log n) DHT geometry is significantly more + + + +Risson & Moors               Informational                     [Page 52] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   dependable than the other O(log n) geometries.  Secondly, dynamic +   dependability comparisons show that DHT dependability is sensitive to +   the underlying topology maintenance algorithms (Figure 2).  Thirdly, +   most DHTs use O(log n) geometries to suit ephemeral nodes, whereas +   the O(1) hop DHTs suit stable nodes - they deserve more research +   attention.  Fourthly, although not yet a mature science, the study of +   DHT dependability is helped by recent simulation tools that support +   multiple DHTs [299]. + +   We make the following four suggestions for future P2P research: + +   1) Complete the companion P2P surveys for storage, security, and +      applications.  A rough outline has been suggested in Figure 1, +      along with references.  The need for such surveys was highlighted +      within the peer-to-peer research group of the Internet Research +      Task Force (IRTF) [17]. + +   2) P2P indexes are maturing.  P2P queries are embryonic.  Work on +      more expressive queries over P2P indexes started to gain momentum +      in 2003, but remains fraught with efficiency and load issues. + +   3) Isolate the low-level mechanisms affecting robustness.  There is +      limited value in comparing robustness of DHT geometries (like +      rings versus de Bruijn graphs), when robustness is highly +      sensitive to underlying topology maintenance algorithms (Figure +      2). + +   4) Build consensus on robustness metrics and their acceptable ranges. +      This paper has teased out numerous measures that impinge on +      robustness, for example, the median query path length for a +      failure of x% of nodes, bisection width, path overlap, the number +      of alternatives available for the next hop, lookup latency, +      average live bandwidth (bytes/node/sec), successful routing rates, +      the number of timeouts (caused by a finger pointing to a departed +      node), lookup failure rates (caused by nodes that temporarily +      point to the wrong successor during churn), and clustering +      measures (edge expansion and node expansion).  Application-level +      robustness metrics need to drive a consistent assessment of the +      underlying search mechanics. + +8.  Acknowledgments + +   This document was adapted from a paper in Elsevier's Computer +   Networks: + +      J. Risson & T. Moors, Survey of Research towards Robust Peer-to- +      Peer Networks: Search Methods, Computer Networks 51(7)2007. + + + + +Risson & Moors               Informational                     [Page 53] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   We thank Bill Yeager, Ali Ghodsi, and several anonymous reviewers for +   thorough comments that significantly improved the quality of earlier +   versions of this document. + +9.  References + +9.1.  Informative References + +   [1]   M. Roussopoulos, M. Baker, D. Rosenthal, T. Guili, P. Maniatis, +         and J. Mogul, 2 P2P of Not 2 P2P?, The 3rd Int'l Workshop on +         Peer-to-Peer Systems, February 26-27 2004. + +   [2]   A. Rowstron and P. Druschel, Pastry:  Scalable, distributed +         object location and routing for large-scale peer-to-peer +         systems, IFIP/ACM Middleware 2001, Nov 2001. + +   [3]   B. Yeager and B. Bhattacharjee, Peer-to-Peer Research Group +         Charter, http://www.irtf.org/charters/p2prg.html (2003) + +   [4]   T. Klingberg and R. Manfredi, Gnutella 0.6, (2002) + +   [5]   I. Clarke, A Distributed Decentralised Information Storage and +         Retrieval System, Undergraduate Thesis, 1999. + +   [6]   B. Zhao, J. Kubiatowicz, and A. Joseph, Tapestry:  an +         infrastructure for fault-tolerant wide-area location and +         routing, Report No. UCB/CSD-01-1141 2001. + +   [7]   I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. Kaashoek, +         F. Dabek, and H. Balakrishnan, Chord:  A scalable peer-to-peer +         lookup service for internet applications, Proc.  ACM SIGCOMM +         2001 2001, pp. 149-160. + +   [8]   S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, +         A scalable content-addressable network, Proc. of the conf. on +         Applications, technologies, architectures and protocols for +         computer communications, August 27-31 2001, pp. 161-172. + +   [9]   C. Tang, Z. Xu, and M. Mahalingam, pSearch: information +         retrieval in structured overlays, First Workshop on Hot Topics +         in Networks. Also Computer Communication Review, Volume 33, +         Number 1, January 2003, Oct 28-29 2002. + +   [10]  W. Nejdl, S. Decker, and W. Siberski, Edutella Project, RDF- +         based Metadata Infrastructure for P2P Applications, +         http://edutella.jxta.org/ (2003) + + + + + +Risson & Moors               Informational                     [Page 54] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [11]  K. Aberer and M. Hauswirth, Peer-to-peer information systems: +         concepts and models, state-of-the-art, and future systems, ACM +         SIGSOFT Software Engineering Notes, Proc. 8th European software +         engineering conference held jointly with 9th ACM SIGSOFT +         international symposium on foundations of software engineering +         26 (5) (2001) + +   [12]  L. Zhou and R. van Renesse, P6P: a peer-to-peer approach to +         internet infrastructure, The 3rd Int'l Workshop on Peer-to-Peer +         Systems, February 26-27 2004. + +   [13]  Citeseer, Citeseer Scientific Literature Digital Library, +         http://citeseer.ist.psu.edu/ (2004) + +   [14]  D. Milojicic, V. Kalogeraki, R. Lukose, K. Nagaraja, J. Pruyne, +         B. Richard, S. Rollins, and Z. Xu, Peer-to-Peer Computing, HP +         Technical Report, HPL-2002-57 2002. + +   [15]  K. Aberer and M. Hauswirth, An overview on peer-to-peer +         information systems, Workshop on Distributed Data and +         Structures WDAS-2002 2002. + +   [16]  F. DePaoli and L. Mariani, Dependability in Peer-to-Peer +         Systems, IEEE Internet Computing 8 (4) (2004) 54-61. + +   [17]  B. Yeager, Proposed research tracks, Email to the Internet +         Research Task Force IRTF P2P Research Group, Nov 10 2003. + +   [18]  H. Balakrishnan, M. F. Kaashoek, D. Karger, R. Morris, and I. +         Stoica, Looking up data in P2P systems, Communications of the +         ACM 46 (2) (2003) 43-48. + +   [19]  D. Kossmann, The state of the art in distributed query +         processing, ACM Computing Surveys 32 (4) (2000) 422-469. + +   [20]  B. Gedik and L. Liu, Reliable peer-to-peer information +         monitoring through replication, Proc. 22nd Int'l Symp. on +         Reliable Distributed Systems, 6-8 Oct 2003, pp. 56-65. + +   [21]  S.-M. Shi, Y. Guangwen, D. Wang, J. Yu, S. Qu, and M. Chen, +         Making peer-to-peer keyword searching feasible using multi- +         level partitioning, The 3rd Int'l Workshop on Peer-to-Peer +         Systems, February 26-27 2004. + +   [22]  R. Huebsch, J. M. Hellerstein, N. Lanham, B. T. Loo, S. +         Shenker, and I. Stoica, Querying the Internet with PIER, Proc. +         29th Int'l Conf. on Very Large Databases VLDB'03, September +         2003. + + + +Risson & Moors               Informational                     [Page 55] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [23]  J. M. Hellerstein, Toward network data independence, ACM SIGMOD +         Record 32 (3) (2003) 34-40. + +   [24]  K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, +         and I. Stoica, The impact of DHT routing geometry on resilience +         and proximity, Proc. 2003 conference on Applications, +         Technologies, Architectures and Protocols for Computer +         Communications 2003, pp. 381-394. + +   [25]  N. Daswani, H. Garcia-Molina, and B. Yang, Open Problems in +         Data- sharing Peer-to-peer Systems, The 9th Int'l Conf. on +         Database Theory (ICDT 2003), Siena, Italy, 8-10 January (2003) + +   [26]  B. Cooper and H. Garcia-Molina, Studying search networks with +         SIL, Second Int'l Workshop on Peer-to-Peer Systems IPTPS 03, +         20- 21 February 2003. + +   [27]  M. Bawa, Q. Sun, P. Vinograd, B. Yang, B. Cooper, A. Crespo, N. +         Daswani, P. Ganesan, H. Garcia-Molina, S. Kamvar, S. Marti, and +         M. Schlossed, Peer-to-peer research at Stanford, ACM SIGMOD +         Record 32 (3) (2003) 23-28. + +   [28]  B. Yang and H. Garcia-Molina, Improving search in peer-to-peer +         networks, Proc. 22nd IEEE Int'l Conf. on Distributed Computing +         Systems, July 2002. + +   [29]  B. Yang and H. Garcia-Molina, Efficient search in peer-to-peer +         networks, Proc. 22nd Int'l Conf. on Distributed Computing +         Systems, July 2-5 2002. + +   [30]  C. Plaxton, R. Rajaraman, and A. Richa, Accessing nearby copies +         of replicated objects in a distributed environment, ACM Symp. +         on Parallel Algorithms and Architectures (1997) + +   [31]  B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, and J. +         Kubiatowicz, Tapestry: A Resilient Global-Scale overlay for +         Service Deployment, IEEE Journal on Selected Areas in +         Communications 22 (1) (2004) 41-53. + +   [32]  R. van Renesse and A. Bozdog, Willow: DHT, aggregation and +         publish/subscribe in one protocol, The 3rd Int'l Workshop on +         Peer-to-Peer Systems, February 26-27 2004. + +   [33]  P. Ganesan, G. Krishna, and H. Garcia-Molina, Canon in G Major: +         Designing DHTs with Hierarchical Structure, Proc. Int'l Conf. +         on Distributed Computing Systems ICDCS 2004 2004. + + + + + +Risson & Moors               Informational                     [Page 56] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [34]  I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. Kaashoek, +         F. Dabek, and H. Balakrishnan, Chord:  a scalable peer-to-peer +         lookup protocol for Internet applications, IEEE/ACM Trans. on +         Networking 11 (1) (2003) 17-32. + +   [35]  S. Rhea, T. Roscoe, and J. Kubiatowicz, Structured Peer-to-Peer +         Overlays Need Application-Driven Benchmarks, Proc. 2nd Int'l +         Workshop on Peer-to-Peer Systems IPTPS'03, February 20-21 2003. + +   [36]  D. Loguinov, A. Kumar, and S. Ganesh, Graph-theoretic analysis +         of structured peer-to-peer systems:  routing distances and +         fault resilience, Proc. 2003 conference on Applications, +         Technologies, Architectures and Protocols for Computer +         Communications, August 25-29 2003, pp. 395-406. + +   [37]  F. Kaashoek and D. Karger, Koorde:  A simple degree-optimal +         hash table, Second Int'l Workshop on Peer-to-Peer Systems +         IPTPS'03, 20-21 February 2003. + +   [38]  N. Harvey, M. B. Jones, S. Saroiu, M. Theimer, and A. Wolman, +         SkipNet: A Scalable Overlay Network with Practical Locality +         Properties, Proc. Fourth USENIX Symp. on Internet Technologies +         and Systems USITS'03, March 2003. + +   [39]  I. Gupta, K. Birman, P. Linga, A. Demers, and R. Van Renesse, +         Kelips:  Building an efficient and stable P2P DHT through +         increased memory and background overhead, Second Int'l Workshop +         on Peer-to-Peer Systems IPTPS 03, Feb 20-21 2003. + +   [40]  J. Cates, Robust and Efficient Data Management for a +         Distributed Hash Table, Master's Thesis, May 2003. + +   [41]  J. Aspnes and G. Shah, Skip graphs, Proc. 14th annual ACM-SIAM +         symposium on discrete algorithms (2003) 384-393. + +   [42]  K. Aberer, P. Cudre-Mauroux, A. Datta, Z. Despotovic, M. +         Hauswirth, M. Punceva, and R. Schmidt, P-Grid:  a self- +         organizing structured P2P system, ACM SIGMOD Record 32 (3) +         (2003) 29-33. + +   [43]  B. Zhao, Y. Duan, L. Huang, A. Joseph, and J. Kubiatowicz, +         Brocade: landmark routing on overlay networks, First Int'l +         Workshop on Peer-to-Peer Systems IPTPS'02, March 2002. + +   [44]  S. Ratnasamy, S. Shenker, and I. Stoica, Routing algorithms for +         DHTs:  some open questions, Proc. First Int'l Workshop on Peer +         to Peer Systems, IPTPS 2002, March 2002. + + + + +Risson & Moors               Informational                     [Page 57] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [45]  P. Maymounkov and D. Mazieres, Kademlia:  A peer-to-peer +         information system based on the XOR metric, Proc. First Int'l +         Workshop on Peer to Peer Systems, IPTPS 2002, March 7-8 2002. + +   [46]  D. Malkhi, M. Naor, and D. Ratajczak, Viceroy:  a scalable and +         dynamic emulation of the butterfly, Proc. 21st annual symposium +         on principles of distributed computing PODC, July 21-24 2002, +         pp. 183-192. + +   [47]  X. Li and C. Plaxton, On name resolution in peer to peer +         networks, Proc. ACM SIGACT Annual Workshop on Principles of +         Mobile Computing POMC'02 2002, pp. 82-89. + +   [48]  N. Harvey, J. Dunagan, M. B. Jones, S. Saroiu, M. Theimer, and +         A. Wolman, SkipNet:  A Scalable overlay Network with Practical +         Locality Properties, Microsoft Research Technical Report MSR- +         TR- 2002-92 (2002) + +   [49]  D. Karger, E. Lehman, T. Leighton, R. Panigraphy, M. Levin, and +         D. Lewin, Consistent hashing and random trees:  distributed +         caching protocols for relieving hot spots on the World Wide +         Web, ACM Symp. on Theory of Computing (1997) + +   [50]  W. Litwin, M. Neimat, and D. Schneider, LH* - a scalable, +         distributed data structure, ACM Trans. on Database Systems +         (TODS) 21 (4) (1996) 480-525. + +   [51]  R. Devine, Design and Implementation of DDH: A Distributed +         Dynamic Hashing Algorithm, Proc.  4th Int'l Conf. on +         Foundations of Data Organizations and Algorithms 1993. + +   [52]  W. Litwin, M.-A. Niemat, and D. Schneider, LH* - Linear Hashing +         for Distributed Files, Proc.  ACM Int'l Conf. on Mngt. of Data +         SIGMOD, May 1993, pp. 327-336. + +   [53]  C. Tempich, S. Staab, and A. Wranik, Remindin': semantic query +         routing in peer-to-peer networks, Proc. 13th conference on +         World Wide Web, New York, NY, USA, May 17-20 (2004) 640-649. + +   [54]  B. T. Loo, R. Huebsch, I. Stoica, and J. M. Hellerstein, The +         case for a hybrid P2P search infrastructure, The 3rd Int'l +         Workshop on Peer-to-Peer Systems, February 26-27 2004. + +   [55]  M. Cai and M. Frank, RDFPeers: a scalable distributed RDF +         repository based on a structured peer-to-peer network, Proc. +         13th conference on World Wide Web, May 17-20 2004, pp. 650-657. + + + + + +Risson & Moors               Informational                     [Page 58] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [56]  Z. Zhang, S.-M. Shi, and J. Zhu, SOMO: Self-organized metadata +         overlay for resource management in P2P DHTs, Second Int'l +         Workshop on Peer-to-Peer Systems IPTPS'03, Feb 20-21 2003. + +   [57]  B. Yang and H. Garcia-Molina, Designing a super-peer network, +         Proc. 19th Int'l Conf. on Data Engineering ICDE, March 2003. + +   [58]  I. Tatarinov, P. Mork, Z. Ives, J. Madhavan, A. Halevy, D. +         Suciu, N. Dalvi, X. Dong, Y. Kadiyska, and G. Miklau, The +         Piazza peer data management project, ACM SIGMOD Record 32 (3) +         (2003) 47-52. + +   [59]  W. Nejdl, W. Siberski, and M. Sintek, Design Issues and +         Challenges for RDF- and schema-based peer-to-peer systems, ACM +         SIGMOD Record 32 (3) (2003) 41-46. + +   [60]  S. Joseph and T. Hoshiai, Decentralized Meta-Data Strategies: +         Effective Peer-to-Peer Search, IEICE Trans. Commun. E86-B (6 +         June) (2003) 1740-1753. + +   [61]  Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S. +         Shenker, Making gnutella-like P2P systems scalable, Proc. 2003 +         conference on Applications, Technologies, Architectures and +         Protocols for Computer Communications, August 25-29 2003, pp. +         407-418. + +   [62]  M. Bawa, G. S. Manku, and P. Raghavan, SETS: search enhanced by +         topic segmentation, Proc. 26th annual international ACM SIGIR +         conference on Research and Development in Information Retrieval +         2003, pp. 306-313. + +   [63]  H. Sunaga, M. Takemoto, and T. Iwata, Advanced peer to peer +         network platform for various services - SIONet Semantic +         Information Oriented Network, Proc. Second Int'l Conf. on Peer +         to Peer Computing, Sept 5-7 2002, pp. 169-170. + +   [64]  M. Schlosser, M. Sintek, S. Decker, and W. Nejdl, HyperCuP - +         Hypercubes, Ontologies and P2P Networks, Springer Lecture Notes +         on Computer Science, Agents and Peer-to-Peer Systems Vol. 2530 +         (2002) + +   [65]  M. Ripeanu, A. Iamnitchi, and P. Foster, Mapping the Gnutella +         network, IEEE Internet Computing 6 (1) (2002) 50-57. + +   [66]  Q. Lv, S. Ratnasamy, and S. Shenker, Can Heterogeneity Make +         Gnutella Scalable?, Proc. 1st Int'l Workshop on Peer-to-Peer +         Systems IPTPS2002, March 7-8 2002. + + + + +Risson & Moors               Informational                     [Page 59] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [67]  Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, Search and +         replication in unstructured peer to peer networks, Proc. 16th +         international conference on supercomputing, June 22-26 2002, +         pp. 84-95. + +   [68]  V. Kalogaraki, D. Gunopulos, and D. Zeinalipour-Yasti, XML +         schemas:  integration and translation:  A local search +         mechanism for peer to peer networks, Proc. 11th ACM +         international conference on Information and Knowledge +         management 2002, pp. 300- 307. + +   [69]  O. Babaoglu, H. Meling, and Montresor, Anthill:  a framework +         for the development of agent-based peer-to-peer systems, Proc. +         IEEE Int'l Conf. on Distributed Computer systems 2002, pp. 15- +         22. + +   [70]  M. Jovanovic, Modeling large-scale peer-to-peer networks and a +         case study of Gnutella, Master's Thesis 2001. + +   [71]  I. Clarke, O. Sandberg, B. Wiley, and T. Hong, Freenet:  A +         Distributed Anonymous Information Storage and Retrieval System. +         Springer, New York, USA, 2001. + +   [72]  J. Harren, J. Hellerstein, R. Huebsch, B. Loo, S. Shenker, and +         I. Stoica, Complex queries in DHT-based peer-to-peer networks, +         Proc. First Int'l Workshop on Peer to Peer Systems IPTPS 2002, +         March 2002. + +   [73]  B. Gedik and L. Liu, PeerCQ: A Decentralized and Self- +         Configuring Peer-to-Peer Information Monitoring System, Proc. +         23rd Int'l Conf. on Distributed Computing Systems ICDCS2003, +         May 19-22 2003. + +   [74]  B. T. Loo, R. Huebsch, J. M. Hellerstein, T. Roscoe, and I. +         Stoica, Analyzing P2P Overlays with Recursive Queries, +         Technical Report, CSD-04-1301, January 14 2004. + +   [75]  R. Avnur and J. Hellerstein, Eddies: continuously adaptive +         query processing, Proc. 2000 ACM SIGMOD international +         conference on Management of Data 2000, pp. 261-272. + +   [76]  P. Triantafillou and T. Pitoura, Towards a unifying framework +         for complex query processing over structured peer-to-peer data +         networks, Proc. First Int'l Workshop on Databases, Information +         Systems and Peer-to-Peer Computing DBISP2P, Sept 7-8 2003, pp. +         169-183. + + + + + +Risson & Moors               Informational                     [Page 60] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [77]  A. Gupta, D. Agrawal, and A. E. Abbadi, Approximate range +         selection queries in peer-to-peer systems, Proc. First Biennial +         Conf. on Innovative Data Systems Research CIDR 2003 2003. + +   [78]  S. Ratnasamy, P. Francis, and M. Handley, Range queries in +         DHTs, Technical Report IRB-TR-03-009, July 2003. + +   [79]  S. Ramabhadran, S. Ratnasamy, J. Hellerstein, and S. Shenker, +         Brief announcement: prefix hash tree, Proc. 23rd Annual ACM +         SIGACT-SIGOPS Symp. on Principles of Distributed Computing, +         PODC 2004, July 25-28 2004, pp. 368-368. + +   [80]  A. Andrzejak and Z. Xu, Scalable, efficient range queries for +         grid information services, Proc. Second IEEE Int'l Conf. on +         Peer to Peer Computing, September 2002. + +   [81]  C. Schmidt and M. Parashar, Enabling flexible queries with +         guarantees in P2P systems, IEEE Internet Computing 8 (3) (2004) +         19-26. + +   [82]  E. Tanin, A. Harwood, and H. Samet, Indexing distributed +         complex data for complex queries, Proc. National Conf. on +         Digital Government Research 2004, pp. 81-90. + +   [83]  P. Ganesan, M. Bawa, and H. Garcia-Molina, Online Balancing of +         Range-Partitioned Data with Applications to Peer-to-Peer +         Systems, Proc. 30th Int'l Conf. on Very Large Data Bases VLDB +         2004, 29 August - 3 September 2004. + +   [84]  A. Bharambe, M. Agrawal, and S. Seshan, Mercury: Supporting +         Scalable Multi-Attribute Range Queries, SIGCOMM'04, Aug 30-Sept +         3 2004. + +   [85]  K. Aberer, Scalable Data Access in P2P Systems Using Unbalanced +         Search Trees, Workshop on Distributed Data and Structures WDAS- +         2002 2002. + +   [86]  K. Aberer, A. Datta, and M. Hauswirth, The Quest for Balancing +         Peer Load in Structured Peer-to-Peer Systems, Technical Report +         IC/2003/32 2003. + +   [87]  W. Litwin, M.-A. Neimat, and D. Schneider, RP*: a family of +         order-preserving scalable distributed data structures, Proc. +         20th Int'l Conf. on Very Large Data Bases VLDB'94, September +         12-15 1994. + + + + + + +Risson & Moors               Informational                     [Page 61] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [88]  M. Tsangou, S. Ndiaye, M. Seck, and W. Litwin, Range queries to +         scalable distributed data structure RP*, Proc. Fifth Workshop +         on Distributed Data and Structures, WDAS 2003, June 2003. + +   [89]  W. Litwin and M.-A. Neimat, k-RP*s: a scalable distributed data +         structure for high-performance multi-attributed access, Proc. +         Fourth Int'l Conf. on Parallel and Distributed Information +         Systems (1996) 120-131. + +   [90]  T. Hodes, S. Czerwinski, B. Zhao, A. Joseph, and R. Katz, An +         architecture for secure wide-area service discovery, Wireless +         Networks 8 (2/3) (2002) 213-230. + +   [91]  M. Cai, M. Frank, J. Chen, and P. Szekely, MAAN: A Multi- +         Attribute Addressable Network for Grid Information Services, +         Proc. Int'l Workshop on Grid Computing, November 2003. + +   [92]  R. van Renesse, K. P. Birman, and W. Vogels, Astrolabe:  A +         robust and scalable technology for distribute system +         monitoring, management and data mining, ACM Trans. on Computer +         Systems 21 (2) (2003) 164-206. + +   [93]  R. Bhagwan, G. Varghese, and G. Voelker, Cone: Augmenting DHTs +         to support distributed resource discovery, Technical Report, +         CS2003- 0755, July 2003. + +   [94]  K. Albrecht, R. Arnold, and R. Wattenhofer, Join and Leave in +         Peer-to-Peer Systems: The DASIS Approach, Technical Report 427, +         Department of Computer Science, November 2003. + +   [95]  K. Albrecht, R. Arnold, and R. Wattenhofer, Aggregating +         information in peer-to-peer systems for improved join and +         leave, Proc. Fourth IEEE Int'l Conf. on Peer-to-Peer Computing, +         25-27 August 2004. + +   [96]  A. Montresor, M. Jelasity, and O. Babaoglu, Robust aggregation +         protocol for large-scale overlay networks, Technical Report +         UBLCS-2003-16, December 2003. + +   [97]  M. Jelasity, W. Kowalczyk, and M. van Steen, An Approach to +         Aggregation in Large and Fully Distributed Peer-to-Peer Overlay +         Networks, Proc. 12th Euromicro Conf. on Parallel, Distributted +         and Network based Processing PDP 2004, February 2004. + +   [98]  P. Yalagandula and M. Dahlin, A scalable distributed +         information management system, SIGCOMM'04, Aug 30-Sept 3 2004. + + + + + +Risson & Moors               Informational                     [Page 62] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [99]  M. Bawa, A. Gionis, H. Garcia-Molina, and R. Motwani, The price +         of validity in dynamic networks, Proc. 2004 ACM SIGMOD Int'l +         Conf. on the management of data 2004, pp. 515-526. + +   [100] J. Aspnes, J. Kirsch, and A. Krishnamurthy, Load Balancing and +         Locality in Range-Queriable Data Structures, Proc. 23rd Annual +         ACM SIGACT-SIGOPS Symp. on Principles of Distributed Computing +         PODC 2004, July 25-28 2004. + +   [101] G. On, J. Schmitt, and R. Steinmetz, The effectiveness of +         realistic replication strategies on quality of availability for +         peer-to-peer systems, Proc. Third Int'l IEEE Conf. on Peer-to- +         Peer Computing, Sept 1-3 2003, pp. 57-64. + +   [102] D. Geels and J. Kubiatowicz, Replica management should be a +         game, Proc. SIGOPS European Workshop, September 2003. + +   [103] E. Cohen and S. Shenker, Replication strategies in unstructured +         peer to peer networks, Proc. 2002 conference on applications, +         technologies, architectures and protocols for computer +         communications 2002, pp. 177-190. + +   [104] E. Cohen and S. Shenker, P2P and multicast:  replication +         strategies in unstructured peer to peer networks, Proc. 2002 +         conference on applications, technologies, architectures and +         protocols for computer communications 2002, pp. 177-190. + +   [105] H. Weatherspoon and J. Kubiatowicz, Erasure coding vs +         replication:  a quantative comparison, Proc. First Int'l +         Workshop on Peer to Peer Systems IPTPS'02, March 2002. + +   [106] D. Lomet, Replicated indexes for distributed data, Proc. Fourth +         Int'l Conf. on Parallel and Distributed Information Systems, +         December 18-20 1996, pp. 108-119. + +   [107] V. Gopalakrishnan, B. Silaghi, B. Bhattacharjee, and P. +         Keleher, Adaptive Replication in Peer-to-Peer Systems, Proc. +         24th Int'l Conf. on Distributed Computing Systems ICDCS 2004, +         March 23-26 2004. + +   [108] S.-D. Lin, Q. Lian, M. Chen, and Z. Zhang, A practical +         distributed mutual exclusion protocol in dynamic peer-to-peer +         systems, The 3rd Int'l Workshop on Peer-to-Peer Systems, +         February 26-27 2004. + + + + + + + +Risson & Moors               Informational                     [Page 63] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [109] A. Adya, R. Wattenhofer, W. Bolosky, M. Castro, G. Cermak, R. +         Chaiken, J. Douceur, J. Howell, J. Lorch, and M. Thiemer, +         Farsite: federated, available and reliable storage for an +         incompletely trusted environment, ACM SIGOPS Operating Systems +         Review, Special issue on Decentralized storage systems (2002) +         1- 14. + +   [110] A. Rowstron and P. Druschel, Storage management and caching in +         PAST, a large-scale, persistent peer-to-peer storage utility, +         Proceedings ACM SOSP'01, October 2001, pp. 188-201. + +   [111] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. +         Weatherspoon, and J. Kubiatowicz, Maintenance-Free Global Data +         Storage, IEEE Internet Computing 5 (5) (2001) 40-49. + +   [112] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. +         Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. +         Wells, and B. Zhao, Oceanstore:  An Architecture for global- +         scale persistent storage, Proc. Ninth Int'l Conf. on +         Architecture Support for Programming Languages and Operating +         Systems ASPLOS 2000, November 2000, pp. 190-201. + +   [113] K. Birman, The Surprising Power of Epidemic Communication, +         Springer-Verlag Heidelberg Lecture Notes in Computer Science +         Volume 2584/2003 (2003) 97-102. + +   [114] P. Costa, M. Migliavacca, G. P. Picco, and G. Cugola, +         Introducing reliability in content-based publish-subscribe +         through epidemic algorithms, Proc. 2nd international workshop +         on Distributed event-based systems 2003, pp. 1-8. + +   [115] P. Costa, M. Migliavacca, G. P. Picco, and G. Cugola, Epidemic +         Algorithms for Reliable Content-Based Publish-Subscribe:  An +         Evaluation, The 24th Int'l Conf. on Distributed Computing +         Systems (ICDCS-2004), Mar 23-26, Tokyo University of +         Technology, Hachioji, Tokyo, Japan (2004) + +   [116] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. +         Shenker, H. Sturgis, D. Swinehart, and D. Terry, Epidemic +         algorithms for replicated data management, Proc. Sixth ACM +         Symp. on Principles of Distributed Computing 1987, pp. 1-12. + +   [117] P. Eugster, R. Guerraoiu, A. Kermarrec, and L. Massoulie, +         Epidemic information dissemination in distributed systems, IEEE +         Computer 37 (5) (2004) 60-67. + + + + + + +Risson & Moors               Informational                     [Page 64] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [118] W. Vogels, R. v. Renesse, and K. Birman, The power of +         epidemics: robust communication for large-scale distributed +         systems, ACM SIGCOMM  Computer Communication Review 33 (1) +         (2003) 131-135. + +   [119] S. Voulgaris and M. van Steen, An epidemic protocol for +         managing routing tables in very large peer to peer networks, +         Proc. 14th IFIP/IEEE Workshop on Distributed Systems: +         Operations and Management, October 2003. + +   [120] I. Gupta, On the design of distributed protocols from +         differential equations, Proc. 23rd Annual ACM SIGACT-SIGOPS +         Symp. on Principles of Distributed Computing PODC 2004, July +         25-28 2004, pp. 216-225. + +   [121] I. Gupta, K. Birman, and R. van Renesse, Fighting fire with +         fire: using randomized gossip to combat stochastic scalability +         limits, Cornell University Dept of Computer Science Technical +         Report, March 2001. + +   [122] K. Birman and I. Gupta, Building Scalable Solutions to +         Distributed Computing Problems using Probabilistic Components, +         Submitted to the Int'l Conf. on Dependable Systems and Networks +         DSN-2004, Dependable Computing and Computing Symp. DCCS, June +         28- July 1 2004. + +   [123] A. Ganesh, A.-M. Kermarrec, and L. Massoulie, Peer-to-peer +         membership management for gossip-based protocols, IEEE Trans. +         on Computers 52 (2) (2003) 139-149. + +   [124] N. Bailey, Epidemic Theory of Infectious Diseases and its +         Applications, Second Edition ed. Hafner Press, 1975. + +   [125] P. Eugster, R. Guerraoiu, S. Handurukande, P. Kouznetsov, and +         A.- M. Kermarrec, Lightweight probabilistic broadcast, ACM +         Trans. on Computer Systems 21 (4) (2003) 341-374. + +   [126] H. Weatherspoon and J. Kubiatowicz, Efficient heartbeats and +         repair of softstate in decentralized object location and +         routing systems, Proc. SIGOPS European Workshop, September +         2002. + +   [127] G. Koloniari and E. Pitoura, Content-based Routing of Path +         Queries in Peer-to-Peer Systems, Proc. 9th Int'l Conf. on +         Extending DataBase Technology EDBT, March 14-18 2004. + + + + + + +Risson & Moors               Informational                     [Page 65] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [128] A. Mohan and V. Kalogaraki, Speculative routing and update +         propagation: a kundali centric approach, IEEE Int'l Conf. on +         Communications ICC'03, May 2002. + +   [129] G. Koloniari, Y. Petrakis, and E. Pitoura, Content-Based +         Overlay Networks for XML Peers Based on Multi-Level Bloom +         Filters, Proc. First Int'l Workshop on Databases, Information +         Systems and Peer-to-Peer Computing DBISP2P, Sept 7-8 2003, pp. +         232-247. + +   [130] G. Koloniari and E. Pitoura, Bloom-Based Filters for +         Hierarchical Data, Proc. 5th Workshop on Distributed Data and +         Structures (WDAS) (2003) + +   [131] B. Bloom, Space/time trade-offs in hash coding with allowable +         errors, Communications of the ACM 13 (7) (1970) 422-426. + +   [132] M. Naor and U. Wieder, A Simple Fault Tolerant Distributed Hash +         Table, Second Int'l Workshop on Peer-to-Peer Systems (IPTPS +         03), Berkeley, CA, USA, 20-21 February (2003) + +   [133] P. Maymounkov and D. Mazieres, Rateless codes and big +         downloads, Second Int'l Workshop on Peer-to-Peer Systems, +         IPTPS'03, February 20-21 2003. + +   [134] M. Krohn, M. Freedman, and D. Mazieres, On-the-fly verification +         of rateless erasure codes for efficient content distribution, +         Proc. IEEE Symp. on Security and Privacy, May 2004. + +   [135] J. Byers, J. Considine, M. Mitzenmacher, and S. Rost, Informed +         content delivery across adaptive overlay networks, Proc. 2002 +         conference on applications, technologies, architectures and +         protocols for computer communications 2002, pp. 47-60. + +   [136] J. Plank, S. Atchley, Y. Ding, and M. Beck, Algorithms for High +         Performance, Wide-Area Distributed File Downloads, Parallel +         Processing Letters 13 (2) (2003) 207-223. + +   [137] M. Castro, P. Rodrigues, and B. Liskov, BASE:  Using +         abstraction to improve fault tolerance, ACM Trans. on Computer +         Systems 21 (3) (2003) 236-269. + +   [138] R. Rodrigues, B. Liskov, and L. Shrira, The design of a robust +         peer-to-peer system, 10th ACM SIGOPS European Workshop, Sep +         2002. + + + + + + +Risson & Moors               Informational                     [Page 66] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [139] H. Weatherspoon, T. Moscovitz, and J. Kubiatowicz, +         Introspective failure analysis: avoiding correlated failures in +         peer-to-peer systems, Proc.  Int'l Workshop on Reliable Peer- +         to-Peer Distributed Systems, Oct 2002. + +   [140] F. Dabek, R. Cox, F. Kaashoek, and R. Morris, Vivaldi: A +         Decentralized Network Coordinate System, SIGCOMM'04, Aug 30- +         Sept 3 2004. + +   [141] E.-K. Lua, J. Crowcroft, and M. Pias, Highways: proximity +         clustering for massively scaleable peer-to-peer network +         routing, Proc. Fourth IEEE Int'l Conf. on Peer-to-Peer +         Computing, August 25-27 2004. + +   [142] F. Fessant, S. Handurukande, A.-M. Kermarrec, and L. Massoulie, +         Clustering in Peer-to-Peer File Sharing Workloads, The 3rd +         Int'l Workshop on Peer-to-Peer Systems, February 26-27 2004. + +   [143] T. S. E. Ng and H. Zhang, Predicting internet network distance +         with coordinates-based approaches, IEEE Infocom 2002, The 21st +         Annual Joint Conf. of the IEEE Computer and Communication +         Societies, June 23-27 2002. + +   [144] K. Hildrum, R. Krauthgamer, and J. Kubiatowicz, Object Location +         in Realistic Networks, Proc. Sixteenth ACM Symp. on Parallel +         Algorithms and Architectures (SPAA 2004), June 2004, pp. 25-35. + +   [145] P. Keleher, S. Bhattacharjee, and B. Silaghi, Are Virtualized +         Overlay Networks Too Much of a Good Thing?, First Int'l +         Workshop on Peer-to-Peer Systems IPTPS, March 2002. + +   [146] A. Mislove and P. Druschel, Providing administrative control +         and autonomy in structured peer-to-peer overlays, The 3rd Int'l +         Workshop on Peer-to-Peer Systems, June 9-12 2004. + +   [147] D. Karger and M. Ruhl, Diminished Chord: A Protocol for +         Heterogeneous SubGroup Formation in Peer-to-Peer Networks, The +         3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27 +         2004. + +   [148] B. Awerbuch and C. Scheideler, Consistent, order-preserving +         data management in distributed storage systems, Proc. Sixteenth +         ACM Symp. on Parallel Algorithms and Architectures SPAA 2004, +         June 27-30 2004, pp. 44-53. + +   [149] M. Freedman and D. Mazieres, Sloppy Hashing and Self-Organizing +         Clusters, Proc. 2nd Int'l Workshop on Peer-to-Peer Systems +         IPTPS + + + +Risson & Moors               Informational                     [Page 67] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [150] F. Dabek, J. Li, E. Sit, J. Robertson, F. Kaashoek, and R. +         Morris, Designing a DHT for low latency and high throughput, +         Proc. First Symp. on Networked Systems Design and +         Implementation (NSDI'04), San Francisco, California, March +         29-31 (2004) 85-98. + +   [151] M. Ruhl, Efficient algorithms for new computational models, +         Doctoral Dissertation, September 2003. + +   [152] K. Sollins, Designing for scale and differentiation, Proc. ACM +         SIGCOMM workshop on Future Directions in network architecture, +         August 25-27 2003. + +   [153] L. Massoulie, A. Kermarrec, and A. Ganesh, Network awareness +         and failure resilience in self-organizing overlay networks, +         Proc. 22nd Int'l Symp. on Reliable Distributed Systems, +         SRDS'03, Oct 6-8 2003, pp. 47-55. + +   [154] R. Cox, F. Dabek, F. Kaashoek, J. Li, and R. Morris, +         Practical,distributed network coordinates, ACM SIGCOMM Computer +         Communication Review 34 (1) (2004) 113-118. + +   [155] K. Hildrum, J. Kubiatowicz, S. Rao, and B. Zhao, Distributed +         object location in a dynamic network, Proc. 14th annual ACM +         symposium on parallel algorithms and architectures 2002, pp. +         41- 52. + +   [156] X. Zhang, Q. Zhang, G. Song, and W. Zhu, A Construction of +         Locality-Aware Overlay Network: mOverlay and its Performance, +         IEEE Journal on Selected Areas in Communications 22 (1) (2004) +         18-28. + +   [157] N. Harvey, M. B. Jones, M. Theimer, and A. Wolman, Efficient +         recovery from organization disconnects in Skipnet, Second Int'l +         Workshop on Peer-to-Peer Systems IPTPS'03, Feb 20-21 2003. + +   [158] M. Pias, J. Crowcroft, S. Wilbur, T. Harris, and S. Bhatti, +         Lighthouses for scalable distributed location, Second Int'l +         Workshop on Peer-to-Peer Systems IPTPS'03, February 20-21 2003. + +   [159] K. Gummadi, S. Saroui, S. Gribble, and D. King, Estimating +         latency between arbitrary internet end hosts, Proc.  SIGCOMM +         IMW 2002, November 2002. + +   [160] Y. Liu, X. Liu, L. Xiao, L. Ni, and X. Zhang, Location-aware +         topology matching in P2P systems, Proc.  IEEE Infocomm, Mar +         7-11 2004. + + + + +Risson & Moors               Informational                     [Page 68] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [161] G. S. Manku, Balanced binary trees for ID management and load +         balance in distributed hash tables, Proc. 23rd Annual ACM +         SIGACT-SIGOPS Symp. on Principles of Distributed Computing, +         PODC 2004, July 25-28 2004, pp. 197-205. + +   [162] J. Gao and P. Steenkiste, Design and Evaluation of a +         Distributed Scalable Content Delivery System, IEEE Journal on +         Selected Areas in Communications 22 (1) (2004) 54-66. + +   [163] X. Wang, Y. Zhang, X. Li, and D. Loguinov, On zone-balancing of +         peer-to-peer networks: analysis of random node join, Proc. +         joint international conference on measurement and modeling of +         computer systems, June 2004. + +   [164] D. Karger and M. Ruhl, Simple efficient load balancing +         algorithms for peer-to-peer systems, Proc. Sixteenth ACM Symp. +         on Parallel Algorithms and Architectures SPAA 2004, June 27-30 +         2004. + +   [165] D. Karger and M. Ruhl, Simple efficient load balancing +         algorithms for peer-to-peer systems, The 3rd Int'l Workshop on +         Peer-to-Peer Systems, February 26-27 2004. + +   [166] M. Adler, E. Halperin, R. Karp, and V. Vazirani, A stochastic +         process on the hypercube with applications to peer-to-peer +         networks, Proc. 35th ACM symposium on Theory of Computing 2003, +         pp. 575-584. + +   [167] C. Baquero and N. Lopes, Towards peer to peer content indexing, +         ACM SIGOPS Operating Systems Review 37 (4) (2003) 90-96. + +   [168] A. Rao, K. Lakshminarayanan, S. Surana, R. Karp, and I. Stoica, +         Load balancing in structured P2P systems, Proc. 2nd Int'l +         Workshop on Peer-to-Peer Systems, IPTPS'03, February 20-21 +         2003. + +   [169] J. Byers, J. Considine, and M. Mitzenmacher, Simple Load +         Balancing for Distributed Hash Tables, Second Int'l Workshop on +         Peer-to-Peer Systems IPTPS 03, 20-21 February 2003. + +   [170] P. Castro, J. Lee, and A. Misra, CLASH: A Protocol for +         Internet- Scale Utility-Oriented Distributed Computing, Proc. +         24th Int'l Conf. on Distributed Computing Systems ICDCS 2004, +         March 23-26 2004. + +   [171] A. Stavrou, D. Rubenstein, and S. Sahu, A Lightwight, Robust +         P2P System to Handle Flash Crowds, IEEE Journal on Selected +         Areas in Communications 22 (1) (2004) 6-17. + + + +Risson & Moors               Informational                     [Page 69] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [172] A. Selcuk, E. Uzun, and M. R. Pariente, A reputation-based +         trust management system for P2P networks, Fourth Int'l Workshop +         on Global and Peer-to-Peer Computing, April 20-21 2004. + +   [173] T. Papaioannou and G. Stamoulis, Effective use of reputation in +         peer-to-peer environments, Fourth Int'l Workshop on Global and +         Peer-to-Peer Computing, April 20-21 2004. + +   [174] M. Blaze, J. Feigenbaum, and J. Lacy, Trust and Reputation in +         P2P networks, +         http://www.neurogrid.net/twiki/bin/view/Main/ReputationAndTrust +         (2003) + +   [175] E. Damiani, D. C. di Vimercati, S. Paraboschi, P. Samarati, and +         F. Violante, A reputation-based approach for choosing reliable +         resources in peer to peer networks, Proc. 9th conference on +         computer and communications security 2002, pp. 207-216. + +   [176] S. Marti, P. Ganesan, and H. Garcia-Molina, DHT routing using +         social links, The 3rd Int'l Workshop on Peer-to-Peer Systems, +         February 26-27 2004. + +   [177] G. Caronni and M. Waldvogel, Establishing trust in distributed +         storage providers, Proc. Third Int'l IEEE Conf. on Peer-to-Peer +         Computing, 1-3 Sept 2003, pp. 128-133. + +   [178] B. Sieka, A. Kshemkalyani, and M. Singhal, On the security of +         polling protocols in peer-to-peer systems, Proc. Fourth IEEE +         Int'l Conf. on Peer-to-Peer Computing, 25-27 August 2004. + +   [179] M. Feldman, K. Lai, I. Stoica, and J. Chuang, Robust Incentive +         Techniques for Peer-to-Peer Networks, ACM E-Commerce Conf. +         EC'04, May 2004. + +   [180] K. Anagnostakis and M. Greenwald, Exchange-based Incentive +         Mechanism for Peer-to-Peer File Sharing, Proc. 24th Int'l Conf. +         on Distributed Computing Systems ICDCS 2004, March 23-26 2004. + +   [181] J. Schneidman and D. Parkes, Rationality and self-Interest in +         peer to peer networks, Second Int'l Workshop on Peer-to-Peer +         Systems IPTPS'03, February 20-21 2003. + +   [182] C. Buragohain, D. Agrawal, and S. Subhash, A game theoretic +         framework for incentives in P2P systems, Proc. Third Int'l IEEE +         Conf. on Peer-to-Peer Computing, 1-3 Sept 2003, pp. 48-56. + + + + + + +Risson & Moors               Informational                     [Page 70] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [183] W. Josephson, E. Sirer, and F. Schneider, Peer-to-Peer +         Authentication with a Distributed Single Sign-On Service, The +         3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27 +         2004. + +   [184] A. Fiat and J. Saia, Censorship resistant peer to peer content +         addressable networks, Proc. 13th annual ACM-SIAM symposium on +         discrete algorithms 2002, pp. 94-103. + +   [185] N. Daswani and H. Garcia-Molina, Query-flood DoS attacks in +         gnutella, Proc. 9th ACM Conf. on Computer and Communications +         Security 2002, pp. 181-192. + +   [186] A. Singh and L. Liu, TrustMe: anonymous management of trust +         relationships in decentralized P2P systems, Proc. Third Int'l +         IEEE Conf. on Peer-to-Peer Computing, Sept 1-3 2003. + +   [187] A. Serjantov, Anonymizing censorship resistant systems, Proc. +         Second Int'l Conf. on Peer to Peer Computing, March 2002. + +   [188] S. Hazel and B. Wiley, Achord: A Variant of the Chord Lookup +         Service for Use in Censorship Resistant Peer-to-Peer Publishing +         Systems, Proc. Second Int'l Conf. on Peer to Peer Computing, +         March 2002. + +   [189] M. Freedman and R. Morris, Tarzan: a peer-to-peer anonymizing +         network layer, Proc. 9th ACM Conf. on Computer and +         Communications Security (2002) 193-206. + +   [190] M. Feldman, C. Papadimitriou, J. Chuang, and I. Stoica, Free- +         Riding and Whitewashing in Peer-to-Peer Systems, 3rd Annual +         Workshop on Economics and Information Security WEIS04, May +         2004. + +   [191] L. Ramaswamy and L. Liu, FreeRiding: a new challenge for peer- +         to-peer file sharing systems, Proc. 2003 Hawaii Int'l Conf. on +         System Sciences, P2P Track, HICSS2003, January 6-9 2003. + +   [192] T.-W. Ngan, D. Wallach, and P. Druschel, Enforcing fair sharing +         of peer-to-peer resources, Second Int'l Workshop on Peer-to- +         Peer Systems, IPTPS'03, 20-21 February 2003. + +   [193] L. Cox and B. D. Noble, Samsara: honor among thieves in peer- +         to-peer storage, Proc. nineteenth ACM symposium on Operating +         System Principles 2003, pp. 120-132. + + + + + + +Risson & Moors               Informational                     [Page 71] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [194] M. Surridge and C. Upstill, Grid security: lessons for peer-to- +         peer systems, Proc. Third Int'l IEEE Conf. on Peer-to-Peer +         Computing, Sept 1-3 2003, pp. 2-6. + +   [195] E. Sit and R. Morris, Security considerations for peer-to-peer +         distributed hash tables, First Int'l Workshop on Peer-to-Peer +         Systems, March 2002. + +   [196] C. O'Donnel and V. Vaikuntanathan, Information leak in the +         Chord lookup protocol, Proc. Fourth IEEE Int'l Conf. on Peer- +         to-Peer Computing, 25-27 August 2004. + +   [197] K. Berket, A. Essiari, and A. Muratas, PKI-Based Security for +         Peer-to-Peer Information Sharing, Proc. Fourth IEEE Int'l Conf. +         on Peer-to-Peer Computing, 25-27 August 2004. + +   [198] B. Karp, S. Ratnasamy, S. Rhea, and S. Shenker, Spurring +         adoption of DHTs with OpenHash, a public DHT service, The 3rd +         Int'l Workshop on Peer-to-Peer Systems, February 26-27 2004. + +   [199] J. Considine, M. Walfish, and D. G. Andersen, A pragmatic +         approach to DHT adoption, Technical Report,, December 2003. + +   [200] G. Li, Peer to Peer Networks in Action, IEEE Internet Computing +         6 (1) (2002) 37-39. + +   [201] A. Mislove, A. Post, C. Reis, P. Willmann, P. Druschel, D. +         Wallach, X. Bonnaire, P. Sens, J.-M. Busca, and L. Arantes- +         Bezerra, POST:  A Secure, Resilient, Cooperative Messaging +         System, 9th Workshop on Hot Topics in Operating Systems, HotOS, +         May 2003. + +   [202] S. Saroiu, P. Gummadi, and S. Gribble, A measurement study of +         peer-to-peer file sharing systems, Proc.  Multimedia Computing +         and Networking 2002 MMCN'02, January 2002. + +   [203] A. Muthitacharoen, R. Morris, T. Gil, and B. Chen, Ivy: a +         read/write peer-to-peer file system, ACM SIGOPS Operating +         Systems Review, Special issue on Decentralized storage systems, +         December 2002, pp. 31-44. + +   [204] A. Muthitacharoen, R. Morris, T. Gil, and B. Chen, A read/write +         peer-to-peer file system, Proc. 5th Symp. on Operating System +         Design and Implementation (OSDI 2002), Boston, MA, December +         (2002) + + + + + + +Risson & Moors               Informational                     [Page 72] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [205] F. Annexstein, K. Berman, M. Jovanovic, and K. Ponnavaikko, +         Indexing techniques for file sharing in scalable peer to peer +         networks, 11th IEEE Int'l Conf. on Computer Communications and +         Networks (2002) 10-15. + +   [206] G. Kan and Y. Faybishenko, Introduction to Gnougat, First Int'l +         Conf. on Peer-to-Peer Computing 2001 2001, pp. 4-12. + +   [207] R. Gold and D. Tidhar, Towards a content-based aggregation +         network, Proc. First Int'l Conf. on Peer to Peer Compuuting +         2001, pp. 62-68. + +   [208] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica, +         Wide-area cooperative storage with CFS, Proc. 18th ACM +         symposium on Operating System Principles 2001, pp. 202-215. + +   [209] M. Freedman, E. Freudenthal, and D. Mazieres, Democratizing +         content publication with coral, Proc. First Symp. on Networked +         Systems Design and Implementation NSDI'04, March 29-31 2004, +         pp. 239-252. + +   [210] J. Li, B. T. Loo, J. Hellerstein, F. Kaashoek, D. Karger, and +         R. Morris, On the Feasibility of Peer-to-Peer Web Indexing and +         Search, Second Int'l Workshop on Peer-to-Peer Systems IPTPS 03, +         20-21 February 2003. + +   [211] S. Iyer, A. Rowstron, and P. Druschel, Squirrel: a +         decentralized peer-to-peer web cache, Proc. 21st annual +         symposium on principles of distributed computing 2002, pp. +         213-222. + +   [212] M. Bawa, R. Bayardo, S. Rajagopalan, and E. Shekita, Make it +         fresh, make it quick: searching a network of personal +         webservers, Proc. 12th international conference on World Wide +         Web 2003, pp. 577-586. + +   [213] B. T. Loo, S. Krishnamurthy, and O. Cooper, Distributed web +         crawling over DHTs, Technical Report, CSD-04-1305, February 9 +         2004. + +   [214] M. Junginger and Y. Lee, A self-organizing publish/subscribe +         middleware for dynamic peer-to-peer networks, IEEE Network 18 +         (1) (2004) 38-43. + + + + + + + + +Risson & Moors               Informational                     [Page 73] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [215] F. Cuenca-Acuna, C. Peery, R. Martin, and T. Nguyen, PlanetP: +         Using Gossiping to Build Content Addressable Peer-to-Peer +         Information Sharing Communities, Proc. 12th international +         symposium on High Performance Distributed Computing (HPDC), +         June 2002. + +   [216] M. Walfish, H. Balakrishnan, and S. Shenker, Untangling the web +         from DNS, Proc. First Symp. on Networked Systems Design and +         Implementation NSDI'04, March 29-31 2004, pp. 225-238. + +   [217] B. Awerbuch and C. Scheideler, Robust distributed name service, +         The 3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27 +         2004. + +   [218] A. Iamnitchi, Resource Discovery in Large Resource-Sharing +         Environments, Doctoral Dissertation 2003. + +   [219] R. Cox, A. Muthitacharoen, and R. Morris, Serving DNS using a +         Peer-to-Peer Lookup Service, First Int'l Workshop on Peer-to- +         Peer Systems (IPTPS), March 2002. + +   [220] A. Chander, S. Dawson, P. Lincoln, and D. Stringer-Calvert, +         NEVRLATE:  scalable resource discovery, Second IEEE/ACM Int'l +         Symp. on Cluster Computing and the Grid CCGRID2002 2002, pp. +         56-65. + +   [221] M. Balazinska, H. Balakrishnan, and D. Karger, INS/Twine:  A +         scalable Peer-to-Peer architecture for Intentional Resource +         Discovery, Proc. First Int'l Conf. on Pervasive Computing +         (IEEE) (2002) + +   [222] J. Kangasharju, K. Ross, and D. Turner, Secure and resilient +         peer-to-peer E-mail: design and implementation, Proc. Third +         Int'l IEEE Conf. on Peer-to-Peer Computing, 1-3 Sept 2003. + +   [223] V. Lo, D. Zappala, D. Zhou, Y. Liu, and S. Zhao, Cluster +         computing on the fly: P2P scheduling of idle cycles in the +         internet, The 3rd Int'l Workshop on Peer-to-Peer Systems, +         February 26-27 2004. + +   [224] A. Iamnitchi, I. Foster, and D. Nurmi, A peer-to-peer approach +         to resource discovery in grid environments, IEEE High +         Performance Distributed Computing 2002. + +   [225] I. Foster and A. Iamnitchi, On Death, Taxes and the Convergence +         of Peer-to-Peer and Grid Computing, Second Int'l Workshop on +         Peer-to-Peer Systems IPTPS 03, 20-21 February 2003. + + + + +Risson & Moors               Informational                     [Page 74] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [226] W. Hoschek, Peer-to-Peer Grid Databases for Web Service +         Discovery, Concurrency - Practice and Experience (2002) 1-7. + +   [227] K. Aberer, A. Datta, and M. Hauswirth, A decentralized public +         key infrastructure for customer-to-customer e-commerce, Int'l +         Journal of Business Process Integration and Management (2004) + +   [228] S. Ajmani, D. Clarke, C.-H. Moh, and S. Richman, ConChord: +         Cooperative SDSI Certificate Storage and Name Resolution, First +         Int'l Workshop on Peer-to-Peer Systems IPTPS, March 2002. + +   [229] E. Sit, F. Dabek, and J. Robertson, UsenetDHT: a low overhead +         Usenet server, The 3rd Int'l Workshop on Peer-to-Peer Systems, +         February 26-27 2004. + +   [230] H.-Y. Hsieh and R. Sivakumar, On transport layer support for +         peer-to-peer networks, The 3rd Int'l Workshop on Peer-to-Peer +         Systems, February 26-27 2004. + +   [231] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, and S. Surana, +         Internet indirection infrastructure, Proc. 2002 conference on +         applications, technologies, architectures and protocols for +         computer communications, August 19-23 2002, pp. 73-86. + +   [232] E. Halepovic and R. Deters, Building a P2P forum system with +         JXTA, Proc. Second IEEE Int'l Conf. on Peer to Peer Computing +         P2P'02, September 5-7 2002. + +   [233] M. Wawrzoniak, L. Peterson, and T. Roscoe, Sophia: an +         Information Plane for networked systems, ACM SIGCOMM Computer +         Communication Review 34 (1) (2004) 15-20. + +   [234] D. Tran, K. Hua, and T. Do, A Peer-to-Peer Architecture for +         Media Streaming, IEEE Journal on Selected Areas in +         Communications 22 (1) (2004) 121-133. + +   [235] V. Padmanabhan, H. Wang, and P. Chou, Supporting heterogeneity +         and congestion control in peer-to-peer multicast streaming, The +         3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27 +         2004. + +   [236] A. Nicolosi and D. Mazieres, Secure acknowledgment of multicast +         messages in open peer-to-peer networks, The 3rd Int'l Workshop +         on Peer-to-Peer Systems, February 26-27 2004. + + + + + + + +Risson & Moors               Informational                     [Page 75] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [237] R. Zhang and C. Hu, Borg: a hybrid protocol for scalable +         application-level multicast in peer-to-peer networks, Proc. +         13th international workshop on network and operating systems +         for digital audio and video 2003, pp. 172-179. + +   [238] M. Sasabe, N. Wakamiya, M. Murata, and H. Miyahara, Scalable +         and continuous media streaming on peer-to-peer networks, Proc. +         Third Int'l IEEE Conf. on Peer-to-Peer Computing, Sept 1-3 +         2003, pp. 92-99. + +   [239] M. Hefeeda, A. Habib, B. Botev, D. Xu, and B. Bhargava, +         PROMISE: peer-to-peer media streaming using CollectCast, Proc. +         eleventh ACM international conference on multimedia 2003, pp. +         45-54. + +   [240] M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron, +         and A. Singh, SplitStream:  high-bandwidth multicast in +         cooperative environments, Proc. 19th ACM symposium on operating +         systems principles 2003, pp. 298-313. + +   [241] M. Castro, P. Druschel, A.-M. Kermarrec, and A. Rowstron, +         SCRIBE: a large-scale and decentralized application-level +         multicast infrastructure, IEEE Journal on Selected Areas in +         Communications 20 (8) (2002) + +   [242] S. Zhuang, B. Zhao, A. Joseph, R. Katz, and J. Kubiatowicz, +         Bayeux: an architecture for scalable and fault-tolerant wide- +         area data dissemination, Proc. 11th ACM international workshop +         on network and operating systems support for digital audio and +         video, Jan 2001. + +   [243] R. Lienhart, M. Holliman, Y.-K. Chen, I. Kozintsev, and M. +         Yeung, Improving media services on P2P networks, IEEE Internet +         Computing 6 (1) (2002) 58-67. + +   [244] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. +         Yin, and F. Yu, Data Centric Storage in Sensornets with GHT, a +         geographic hash table, Mobile Networks and Applications 8 (4) +         (2003) 427-442. + +   [245] M. Demirbas and H. Ferhatosmanoglu, Peer-to-peer spatial +         queries in sensor networks, Proc. Third Int'l IEEE Conf. on +         Peer-to-Peer Computing, 1-3 Sept 2003, pp. 32-39. + +   [246] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, +         and S. Shenker, GHT:  a geographic hash table for data-centric +         storage, Proc. First ACM Int'l Workshop on Wireless Sensor +         Networks and Applications (Mobicom) 2002, pp. 78-87. + + + +Risson & Moors               Informational                     [Page 76] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [247] J. Hellerstein and W. Wang, Optimization of In-Network Data +         Reduction, Proc. First Workshop on Data Management for Sensor +         Networks DMSN 2004, August 30th 2004. + +   [248] J. Li, J. Stribling, T. Gil, R. Morris, and F. Kaashoek, +         Comparing the performance of distributed hash tables under +         churn, The 3rd Int'l Workshop on Peer-to-Peer Systems, February +         26-27 2004. + +   [249] S. Shenker, The data-centric revolution in networking, Keynote +         Speech, 29th Int'l Conf. on Very Large Data Bases, September +         9-12 2003. + +   [250] S. Gribble, A. Halevy, Z. Ives, M. Rodrig, and D. Suciu, What +         can databases do for P2P?, Proc.  Fourth Int'l Workshop on +         Databases and the Web, WebDB2001, May 24-25 2001. + +   [251] D. Clark, The design philosophy of the DARPA internet +         protocols, ACM SIGCOMM Computer Communication Review, Symp. +         proceedings on communications architectures and protocols 18 +         (4) (1988) + +   [252] J.-C. Laprie, Dependable Computing and Fault Tolerance: +         Concepts and Terminology, Twenty-Fifth Int'l Symp. on Fault- +         Tolerant Computing, Highlights from Twenty-Five Years 1995, pp. +         2-13. + +   [253] D. Clark, J. Wroclawski, K. Sollins, and R. Braden, Tussle in +         cyberspace:  defining tomorrow's internet, Conf. on +         Applications, Technologies, Architectures and Protocols for +         Computer Communications 2002, pp. 347-356. + +   [254] L. O. Alima, A. Ghodsi, and S. Haridi, "A framework for +         structured peer-to-peer overlay networks," in Global computing, +         vol. 3267, Lecture Notes in Computer Science: Springer Berlin / +         Heidelberg, 2005, pp. 223-249. + +   [255] Clip2, The Gnutella Protocol Specification, +         http://www.clip2.com (2000) + +   [256] Napster, http://www.napster.com (1999) + +   [257] J. Mishchke and B. Stiller, A methodology for the design of +         distributed search in P2P middleware, IEEE Network 18 (1) +         (2004) 30-37. + + + + + + +Risson & Moors               Informational                     [Page 77] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [258] J. Li and K. Sollins, Implementing aggregation and broadcast +         over distributed hash tables.  Full report, +         http://krs.lcs.mit.edu/regions/docs.html (November) (2003) + +   [259] M. Castro, M. Costa, and A. Rowstron, Should we build Gnutella +         on a structured overlay?, ACM SIGCOMM Computer Communication +         Review 34 (1) (2004) 131-136. + +   [260] A. Singla and C. Rohrs, Ultrapeers: Another Step Towards +         Gnutella Scalability, +         http://groups.yahoo.com/group/the_gdf/files/Proposals/ +         Working%20Proposals/Ultrapeer/ Version 1.0, 26 November (2002) + +   [261] B. Cooper and H. Garcia-Molina, Ad hoc, Self-Supervising Peer- +         to-Peer Search Networks, Technical Report, +         http://www.cc.gatech.edu/~cooperb/odin/ 2003. + +   [262] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information +         Retrieval.  Addison Wesley, Essex, England, 1999. + +   [263] S. Sen and J. Wang, Analyzing peer-to-peer traffic across large +         networks, IEEE/ACM Trans. on Networking 12 (2) (2004) 219-232. + +   [264] H. Balakrishnan, S. Shenker, and M. Walfish, Semantic-Free +         Referencing in Linked Distributed Systems, Second Int'l +         Workshop on Peer-to-Peer Systems IPTPS 03, 20-21 February 2003. + +   [265] B. Yang, P. Vinograd, and H. Garcia-Molina, Evaluating GUESS +         and non-forwarding peer-to-peer search, The 24th Int'l Conf. on +         Distributed Computing Systems ICDCS'04, Mar 23-26 2004. + +   [266] A. Gupta, B. Liskov, and R. Rodrigues, One Hop Lookups for +         Peer-to-Peer Overlays, 9th Workshop on Hot Topics in Operating +         Systems (HotOS), 18-21 May 2003. + +   [267] A. Gupta, B. Liskov, and R. Rodrigues, Efficient routing for +         peer-to-peer overlays, First symp. on Networked Systems Design +         and Implementation (NSDI), Mar 29-31 2004, pp. 113-126. + +   [268] A. Mizrak, Y. Cheng, V. Kumar, and S. Savage, Structured +         superpeers: leveraging heterogeneity to provide constant-time +         lookup, IEEE Workshop on Internet Applications, June 23-24 +         2003. + +   [269] L. Adamic, R. Lukose, A. Puniyani, and B. Huberman, Search in +         power-law networks, Physical review E, The American Physical +         Society 64 (046135) (2001) + + + + +Risson & Moors               Informational                     [Page 78] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [270] F. Banaei-Kashani and C. Shahabi, Criticality-based analysis +         and design of unstructured peer-to-peer networks as "complex +         systems", Proc. 3rd IEEE/ACM Int'l Symp. on Cluster Computing +         and the Grid 2003, pp. 351-358. + +   [271] KaZaa, KaZaa Media Desktop, www.kazaa.com (2001) + +   [272] S. Sen and J. Wang, Analyzing peer-to-peer traffic across large +         networks, Proc. second ACM SIGCOMM workshop on Internet +         measurement, November 06-08 2002, pp. 137-150. + +   [273] DirectConnect, http:www.neo-modus.com (2001) + +   [274] S. Saroiu, K. Gummadi, R. Dunn, S. Gribble, and H. Levy, An +         analysis of Internet content delivery systems, ACM SIGOPS +         Operating Systems Review 36 (2002) 315-327. + +   [275] A. Loo, The Future or Peer-to-Peer Computing, Communications of +         the ACM 46 (9) (2003) 56-61. + +   [276] B. Yang and H. Garcia-Molina, Comparing Hybrid Peer-to-Peer +         Systems (extended), 27th Int'l Conf. on Very Large Data Bases, +         September 11-14 2001. + +   [277] D. Scholl, OpenNap Home Page, http://opennap.sourceforge.net/ +         (2001) + +   [278] S. Ghemawat, H. Gobioff, and S.-T. Leung, The Google file +         system, Proc. 19th ACM symposium on operating systems +         principles 2003, pp. 29-43. + +   [279] I. Clarke, S. Miller, T. Hong, O. Sandberg, and B. Wiley, +         Protecting Free Expression Online with Freenet, IEEE Internet +         Computing 6 (1) (2002) + +   [280] J. Mache, M. Gilbert, J. Guchereau, J. Lesh, F. Ramli, and M. +         Wilkinson, Request algorithms in Freenet-style peer-to-peer +         systems, Proc. Second IEEE Int'l Conf. on Peer to Peer +         Computing P2P'02, September 5-7 2002. + +   [281] C. Rohrs, Query Routing for the Gnutella Networks, +         http://www.limewire.com/developer/query_routing/ +         keyword%20routing.htm Version 1.0 (2002) + + + + + + + + +Risson & Moors               Informational                     [Page 79] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [282] I. Clarke, Freenet's Next Generation Routing Protocol, +         http://freenetproject.org/index.php?page=ngrouting, 20th July +         2003. + +   [283] A. Z. Kronfol, FASD: A fault-tolerant, adaptive scalable +         distributed search engine, Master's Thesis +         http://www.cs.princeton.edu/~akronfol/fasd/ 2002. + +   [284] S. Gribble, E. Brewer, J. M. Hellerstein, and D. Culler, +         Scalable, Distributed Data Structures for Internet Service +         Construction, Proc. 4th Symp. on Operating Systems Design and +         Implementation OSDI 2000, October 2000. + +   [285] K. Aberer, Efficient Search in Unbalanced, Randomized Peer-to- +         Peer Search Trees, EPFL Technical Report IC/2002/79 (2002) + +   [286] R. Honicky and E. Miller, A fast algorithm for online placement +         and reorganization of replicated data, Proc. 17th Int'l +         Parallel and Distributed Processing Symp., April 2003. + +   [287] G. S. Manku, Routing networks for distributed hash tables, +         Proc. 22nd annual ACM Symp. on Principles of Distributed +         Computing, PODC 2003, July 13-16 2003, pp. 133-142. + +   [288] D. Lewin, Consistent hashing and random trees: algorithms for +         caching in distributed networks, Master's Thesis, Department of +         Electrical Engineering and Computer Science, Massachusetts +         Institute of Technology (1998) + +   [289] S. Lei and A. Grama, Extended consistent hashing: a framework +         for distributed servers, Proc. 24th Int'l Conf. on Distributed +         Computing Systems ICDCS 2004, March 23-26 2004. + +   [290] W. Litwin, Re: Chord & LH*, Email to Ion Stoica, March 23 +         2004a. + +   [291] J. Li, J. Stribling, R. Morris, F. Kaashoek, and T. Gil, A +         performance vs. cost framework for evaluating DHT design +         tradeoffs under churn, Proc. IEEE Infocom, Mar 13-17 2005. + +   [292] S. Zhuang, D. Geels, I. Stoica, and R. Katz, On failure +         detection algorithms in overlay networks, Proc. IEEE Infocomm, +         Mar 13-17 2005. + +   [293] X. Li, J. Misra, and C. G. Plaxton, Active and Concurrent +         Topology Maintenance, The 18th Annual Conf. on Distributed +         Computing (DISC 2004), Trippenhuis, Amsterdam, the Netherlands, +         October 4 - October 7 (2004) + + + +Risson & Moors               Informational                     [Page 80] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [294] K. Aberer, L. O. Alima, A. Ghodsi, S. Girdzijauskas, M. +         Hauswirth, and S. Haridi, The essence of P2P: a reference +         architecture for overlay networks, Proc. of the 5th +         international conference on peer-to-peer computing, Aug 31-Sep +         2 2005. + +   [295] C. Tang, M. Buco, R. Chang, S. Dwarkadas, L. Luan, E. So, and +         C. Ward, Low traffic overlay networks with large routing +         tables, Proc. of ACM Sigmetrics Int'l Conf. on Measurement and +         Modeling of Comp. Sys., Jun 6-10 2005, pp. 14-25. + +   [296] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, Handling +         churn in a DHT, Proc. of the USENIX Annual Technical +         Conference, June 2004. + +   [297] C. Blake and R. Rodrigues, High Availability, Scalable Storage, +         Dynamic Peer Networks:  Pick Two, 9th Workshop on Hot Topics in +         Operating Systems (HotOS), Lihue, Hawaii, 18-21 May (2003) + +   [298] S. Rhea, B. Godfrey, B. Karp, J. Kubiatowicz, S. Ratnasamy, S. +         Shenker, I. Stoica, and H. Yu, OpenDHT: a public DHT service +         and its uses, Proc. of the conf. on Applications, technologies, +         architectures and protocols for computer communications, Aug +         22-26 2005, pp. 73-84. + +   [299] T. Gil, F. Kaashoek, J. Li, R. Morris, and J. Stribling, +         p2psim, a simulator for peer-to-peer protocols, +         http://www.pdos.lcs.mit.edu/p2psim/ (2003) + +   [300] K. Hildrum, J. D. Kubiatowicz, S. Rao, and B. Y. Zhao, +         Distributed object location in a dynamic network, Theory of +         Computing Systems (2004) + +   [301] N. Lynch, D. Malkhi, and D. Ratajczak, Atomic data access in +         distributed hash tables, Proc. Int'l Peer-to-Peer Symp., March +         7-8 2002. + +   [302] S. Gilbert, N. Lynch, and A. Shvartsman, RAMBO II: Rapidly +         Reconfigurable Atomic Memory for Dynamic Networks, Technical +         Report, MIT-CSAIL-TR-890 2004. + +   [303] N. Lynch and I. Stoica, MultiChord: A resilient namespace +         management algorithm, Technical Memo MIT-LCS-TR-936 2004. + +   [304] J. Risson, K. Robinson, and T. Moors, Fault tolerant active +         rings for structured peer-to-peer overlays, Proc. of the 30th +         Annual IEEE Conf. on Local Computer Networks, Nov 15-17 2005, +         pp. 18-25. + + + +Risson & Moors               Informational                     [Page 81] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [305] B. Awerbuch and C. Scheideler, Peer-to-peer systems for prefix +         search, Proc. 22nd annual ACM Symp. on Principles of +         Distributed Computing 2003, pp. 123-132. + +   [306] F. Dabek, B. Zhao, P. Druschel, J. Kubiatowicz, and I. Stoica, +         Towards a common API for structured P2P overlays, Proc. Second +         Int'l Workshop on Peer to Peer Systems IPTPS 2003, February +         2003. + +   [307] N. Feamster and H. Balakrishnan, Towards a logic for wide-area +         Internet routing, Proc. ACM SIGCOMM workshop on Future +         Directions in Network Architecture, August 25-27 2003, pp. +         289-300. + +   [308] B. Ahlgren, M. Brunner, L. Eggert, R. Hancock, and S. Schmid, +         Invariants: a new design methodology for network architectures, +         Proc. ACM SIGCOMM workshop on Future Direction in Network +         Architecture, August 30 2004, pp. 65-70. + +   [309] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction +         to Algorithms, 2nd Edition. MIT Press, McGraw-Hill, Cambridge, +         London, England, 2003. + +   [310] I. Abraham, D. Malkhi, and O. Dubzinski, LAND:Stretch +         (1+epsilon) Locality Aware Networks for DHTs, Proc. ACM-SIAM +         Symp. on Discrete Algorithms SODA-04 2004. + +   [311] S. Jain, R. Mahajan, and D. Wetherall, A study of the +         performance potential of DHT-based overlays, Proc. of the 4th +         Usenix symposium on internet technologies and systems (USITS), +         Mar 2003. + +   [312] J. Risson, A. Harwood, and T. Moors, Stable high-capacity one- +         hop distributed hash tables, Proc. of the IEEE Symposium on +         Computers and Communications (ISCC'06), Jun 26-29 2006. + +   [313] V. Ramasubramanian and E. Sirer, Beehive: O(1) Lookup +         Performance for Power-Law Query Distributions in Peer-to-Peer +         Overlays, Proc. First Symp. on Networked Systems Design and +         Implementation (NSDI'04), San Francisco, California, March +         29-31 (2004) 99-112. + +   [314] I. Abraham, A. Badola, D. Bickson, D. Malkhi, S. Maloo, and S. +         Ron, Practical locality-awareness for large scale information +         sharing, Proc. 4th International Workshop on Peer-to-Peer +         Systems, Feb 24-25 2005. + + + + + +Risson & Moors               Informational                     [Page 82] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [315] B. Leong, B. Liskov, and E. Demaine, Epichord: parallelizing +         the Chord lookup algorithm with reactive routing state +         management, Proc. of the 12th International Conference on +         Networks, Nov 2004. + +   [316] J. Li, J. Stribling, R. Morris, and F. Kaashoek, Bandwidth- +         efficient management of DHT routing tables, Proc. 2nd Symposium +         on Networked Systems Design and Implementation, May 2-4 2005. + +   [317] S. Rhea, B.-G. Chun, J. Kubiatowicz, and S. Shenker, Fixing the +         embarrassing slowness of OpenDHT on PlanetLab, Proc. of the +         Second USENIX Workshop on Real, Large Distributed Systems, Dec +         13 2005. + +   [318] M. Costa, M. Castro, A. Rowstron, and P. Key, PIC: Practical +         Internet coordinates for distance estimation, Proc. of the 24th +         international conference on distributed computing systems, Mar +         2004. + +   [319] M. Castro, M. B. Jones, A.-M. Kermarrec, A. Rowstron, M. +         Theimer, H. Wang, and A. Wolman, An evaluation of scalable +         application- level multicast built using peer-to-peer overlays, +         Proc. of the 22nd Annual Joint Conf. of the IEEE Comp. and +         Comm. Soc. (INFOCOM), 30 Mar - 3 Apr 2003, pp. 1510-1520. + +   [320] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker, +         Application-level multicast using content-addressable networks, +         Proc. of the Third International Workshop on Networked Group +         Communication, Nov 7-9 2001. + +   [321] S. El-Ansary, L. Alima, P. Brand, and S. Haridi, Efficient +         broadcast in structured P2P networks, Second Int'l Workshop on +         Peer-to-Peer Systems (IPTPS 03), Berkeley, CA, USA, 20-21 +         February (2003) + +   [322] J. Li, K. Sollins, and D.-Y. Lim, Implementing aggregation and +         broadcast over Distributed Hash Tables, ACM Computer +         Communication Reviews 35 (1) (2005) 81-92. + +   [323] V. Pai, K. Tamilmani, V. Sambamurthy, K. Kumar, and A. Mohr, +         Chainsaw: eliminating trees from overlay multicast, Proc. 4th +         Int'l Workshop on Peer-to-Peer Systems, February 24-25 2005. + +   [324] K. Birman, M. Hayden, O. Ozkasap, Z. Xiao, and M. Budiu, +         Bimodal Multicast, ACM Trans. on Computer Systems 17 (2) (1999) +         41-88. + + + + + +Risson & Moors               Informational                     [Page 83] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [325] Z. Zhang, S. Chen, Y. Ling, and R. Chow, Resilient capacity- +         aware multicasting based on overlay networks, Proc. of the 25th +         IEEE Int'l Conf. on Distributed Computing Systems, 6-10 June +         2005, pp. 565-574. + +   [326] A. Bharambe, S. Rao, V. Padmanabhan, S. Seshan, and H. Zhang, +         The impact of heterogeneous bandwidth constraints on DHT-based +         multicast protocols, Proc. 4th Int'l Workshop on Peer-to-Peer +         Systems, February 24-25 2005. + +   [327] A. Ghodsi, L. O. Alima, S. El-Ansary, P. Brand, and S. Haridi, +         Self-correcting broadcast in distributed hash tables, Proc. of +         the 15th IASTED International Conf. on Parallel and Distributed +         Computing and Systems, Nov 2003. + +   [328] R. Mahajan, M. Castro, and A. Rowstron, Controlling the cost of +         reliability in peer-to-peer overlays, Second Int'l Workshop on +         Peer-to-Peer Systems IPTPS'03, February 20-21 2003. + +   [329] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, Handling +         churn in a DHT, Report No. UCB/CSD-03-1299, University of +         California, also Proc. USENIX Annual Technical Conference, June +         2003. + +   [330] M. Castro, M. Costa, and A. Rowstron, Performance and +         dependability of structured peer-to-peer overlays, Microsoft +         Research Technical Report MSR-TR-2003-94, December. Also 2004 +         Int'l Conf. on Dependable Systems and Networks, June 28-July 1 +         2003. + +   [331] D. Liben-Nowell, H. Balakrishnan, and D. Karger, Analysis of +         the evolution of peer-to-peer systems, Annual ACM Symp. on +         Principles of Distributed Computing 2002, pp. 233-242. + +   [332] L. Alima, S. El-Ansary, P. Brand, and S. Haridi, DKS(N,k,f): a +         family of low communication, scalable and fault-tolerant +         infrastructures for P2P applications, Proc. 3rd IEEE/ACM Int'l +         Symp. on Cluster Computing and the Grid (2003) 344-350. + +   [333] D. Karger and M. Ruhl, Finding nearest neighbours in growth- +         restricted metrics, Proc. 34th annual ACM symposium on Theory +         of computing 2002, pp. 741-750. + +   [334] S. Ratnasamy, A Scalable Content-Addressable Network, Doctoral +         Dissertation 2002. + +   [335] S. McCanne and S. Floyd, The LBNL/UCB Network Simulator. + + + + +Risson & Moors               Informational                     [Page 84] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [336] M. Naor and U. Wieder, Novel architectures for P2P +         applications: the continuous-discrete approach, Proc. fifteenth +         annual ACM Symp. on Parallel Algorithms and Architectures, SPAA +         2003, June 7-9 2003, pp. 50-59. + +   [337] N. D. de Bruijn, A combinatorial problem, Koninklijke +         Netherlands: Academe Van Wetenschappen 49 (1946) 758-764. + +   [338] J.-W. Mao, "The Coloring and Routing Problems on de Bruijn +         Interconnection Networks," in Doctoral Dissertation, National +         Sun Yat-sen University, 2003. + +   [339] M. L. Schlumberger, De Bruijn communication networks, Doctoral +         Dissertation 1974. + +   [340] M. Imase and M. Itoh, Design to minimize diameter on building- +         block network, IEEE Trans. on Computers C-30 (6) (1981) 439- +         442. + +   [341] S. M. Reddy, D. K. Pradhan, and J. G. Kuhl, Direct graphs with +         minimal and maximal connectivity, Technical Report, School of +         Engineering, Oakland University (1980) + +   [342] R. A. Rowley and B. Bose, Fault-tolerant ring embedding in de +         Bruijn networks, IEEE Trans. on Computers 42 (12) (1993) 1480- +         1486. + +   [343] K. Y. Lee, G. Liu, and H. F. Jordan, Hierarchical networks for +         optical communications, Journal of Parallel and Distributed +         Computing 60 (2000) 1-16. + +   [344] M. Naor and U. Wieder, Know thy neighbor's neighbor:  better +         routing for skip-graphs and small worlds, The 3rd Int'l +         Workshop on Peer-to-Peer Systems, February 26-27 2004. + +   [345] P. Fraigniaud and P. Gauron, The content-addressable networks +         D2B, Technical Report 1349, Laboratoire de Recherche en +         Informatique, January 2003. + +   [346] A. Datta, S. Girdzijauskas, and K. Aberer, On de Bruijn routing +         in distributed hash tables: there and back again, Proc. Fourth +         IEEE Int'l Conf. on Peer-to-Peer Computing, , 25-27 August +         2004. + +   [347] W. Pugh, Skip lists: a probabilistic alternative to balanced +         trees, Proc. Workshop on Algorithms and Data Structures, August +         17-19 1989, pp. 437-449. + + + + +Risson & Moors               Informational                     [Page 85] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [348] W. Pugh, Skip lists: a probabilistic alternative to balanced +         trees, Communications of the ACM 33 (6) (1990) 668-676. + +   [349] J. Gray, The transaction concept: Virtues and limitations, +         Proc. VLDB, September 1981. + +   [350] B. T. Loo, J. M. Hellerstein, R. Huebsch, S. Shenker, and I. +         Stoica, Enhancing P2P file-sharing with internet-scale query +         processor, Proc. 30th Int'l Conf. on Very Large Data Bases VLDB +         2004, 29 August-3 September 2004. + +   [351] M. Stonebraker, P. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. +         Sidell, C. Staelin, and A. Yu, Mariposa: a wide-area +         distributed database system, THE VLDB Journal - The Int'l +         Journal of Very Large Data Bases (5) (1996) 48-63. + +   [352] V. Cholvi, P. Felber, and E. Biersack, Efficient Search in +         Unstructured Peer-to-Peer Networks, Proc. Symp. on Parallel +         Algorithms and Architectures, July 2004. + +   [353] S. Daswani and A. Fisk, Gnutella UDP Extension for Scalable +         Searches (GUESS) v0.1, +         http://www.limewire.org/fisheye/viewrep/~raw,r=1.2/limecvs/ +         core/guess_01.html (2002) + +   [354] A. Fisk, Gnutella Dynamic Query Protocol v0.1, Gnutella +         Developer Forum (2003) + +   [355] O. Gnawali, A Keyword Set Search System for Peer-to-Peer +         Networks, Master's Thesis 2002. + +   [356] Limewire, Limewire Host Count, +         http://www.limewire.com/english/content/netsize.shtml (2004) + +   [357] A. Fisk, Gnutella Ultrapeer Query Routing, +         http://groups.yahoo.com/group/the_gdf/files/Proposals/ +         Working%20Proposals/search/Ultrapeer%20QRP/ v0.1 (2003) + +   [358] A. Fisk, Gnutella Dynamic Query Protocol, +         http://groups.yahoo.com/group/the_gdf/files/Proposals/ +         Working%20Proposals/search/Dynamic%20Querying/ v0.1 (2003) + +   [359] S. Thadani, Meta Data searches on the Gnutella Network +         (addendum), http://www.limewire.com/developer/MetaProposal2.htm +         (2001) + + + + + + +Risson & Moors               Informational                     [Page 86] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [360] S. Thadani, Meta Information Searches on the Gnutella Networks, +         http://www.limewire.com/developer/metainfo_searches.html (2001) + +   [361] P. Reynolds and A. Vahdat, Efficient peer-to-peer keyword +         searching, ACM/IFP/USENIX Int'l Middleware Conference, +         Middleware 2003, June 16-20 2003. + +   [362] W. Terpstra, S. Behnel, L. Fiege, J. Kangasharju, and A. +         Buchmann, Bit Zipper Rendezvous, optimal data placement for +         general P2P queries, Proc. First Int'l Workshop on Peer-to-Peer +         Computing and Databases, March 14 2004. + +   [363] A. Singhal, Modern Information Retrieval: A Brief Overview, +         IEEE Data Engineering Bulletin 24 (4) (2001) 35-43. + +   [364] E. Cohen, A. Fiat, and H. Kaplan, Associative Search in Peer to +         Peer Networks: Harnessing Latent Semantics, IEEE Infocom 2003, +         The 22nd Annual Joint Conf. of the IEEE Computer and +         Communications Societies, March 30-April 3 2003. + +   [365] W. Muller and A. Henrich, Fast retrieval of high-dimensional +         feature vectors in P2P networks using compact peer data +         summaries, Proc. 5th ACM SIGMM international workshop on +         Multimedia Information Retrieval, November 7 2003, pp. 79-86. + +   [366] M. T. Ozsu and P. Valduriez, Principles of Distributed Database +         Systems, 2nd edition ed. Prentice Hall, 1999. + +   [367] G. Salton, A. Wong, and C. S. Yang, A vector space model for +         automatic indexing, Communications of the ACM 18 (11) (1975) +         613- 620. + +   [368] S. E. Robertson, S. Walker, and M. Beaulieu, Okapi at TREC-7: +         automatic ad hoc, filtering, VLC and filtering tracks, Proc. +         Seventh Text REtrieval Conference, TREC-7, NIST Special +         Publication 500-242, July 1999, pp. 253-264. + +   [369] A. Singhal, J. Choi, D. Hindle, D. Lewis, and F. Pereira, AT&T +         at TREC-7, Proc. Seventh Text REtrieval Conf. TREC-7, July +         1999, pp. 253-264. + +   [370] K. Sankaralingam, S. Sethumadhavan, and J. Browne, Distributed +         Pagerank for P2P Systems, Proc. 12th international symposium on +         High Performance Distributed Computing HPDC, June 22-24 2003. + +   [371] I. Klampanos and J. Jose, An architecture for information +         retrieval over semi-collaborated peer-to-peer networks, Proc. +         2004 ACM symposium on applied computing 2004, pp. 1078-1083. + + + +Risson & Moors               Informational                     [Page 87] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [372] C. Tang, Z. Xu, and S. Dwarkadas, Peer-to-peer information +         retrieval using self-organizing semantic overlay networks, +         Proc. 2003 conference on Applications, Technologies, +         Architectures and Protocols for Computer Communications, August +         25-29 2003, pp. 175-186. + +   [373] C. Tang and S. Dwarkadas, Hybrid global-local indexing for +         efficient peer-to-peer information retrieval, Proc. First Symp. +         on Networked Systems Design and Implementation NSDI'04, March +         29-31 2004, pp. 211-224. + +   [374] G. W. Furnas, S. Deerwester, S. T. Dumais, T. K. Landauer, R. +         A. Harshman, L. A. Streeter, and K. E. Lochbaum, Information +         retrieval using a singular value decomposition model of latent +         semantic structure, Proc. 11th Annual Int'l ACM SIGIR Conf. on +         Research and Development in Information Retrieval 1988, pp. +         465-480. + +   [375] C. Tang, S. Dwarkadas, and Z. Xu, On scaling latent semantic +         indexing for large peer-to-peer systems, The 27th Annual Int'l +         ACM SIGIR Conf. SIGIR'04, ACM Special Interest Group on +         Information Retrieval, July 2004. + +   [376] S. Milgram, The small world problem, Psychology Today 1 (61) +         (1967) + +   [377] J. Kleinberg, The small-world phenonemon: An algorithmic +         perspective, Proc. 32nd ACM Symp. on Theory of Computing (2000) + +   [378] Y. Petrakis and E. Pitoura, "On constructing small worlds in +         unstructured peer-to-peer systems," in Current trends in +         database technology (Proc. First Int'l Workshop on Peer-to-Peer +         Computing and Databases, Heraklion, Crete, Greece, March 14), +         vol. 3268, Lecture Notes in Computer Science: Springer, 2004, +         pp. 415-424. + +   [379] A. Iamnitchi, M. Ripeanu, and I. Foster, Locating Data in +         (Small World?) P2P Scientific Collaborations, First Int'l +         Workshop on Peer-to-Peer Systems (IPTPS), Cambridge, MA, March +         (2002) + + + + + + + + + + + +Risson & Moors               Informational                     [Page 88] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +   [380] Y. Ren, C. Sha, W. Qian, A. Zhou, B. Ooi, and K. Tan, Explore +         the "small world phenomena" in pure P2P information sharing +         systems, Proc. 3rd IEEE/ACM Int'l Symp. on Cluster Computing +         and the Grid (2003) 232-239. + +   [381] G. S. Manku, M. Bawa, and P. Raghavan, Symphony:  Distributed +         Hashing in a Small World, Proc. 4th USENIX Symp. on Internet +         Technologies and Systems, March 26-28 2003. + +   [382] W. Litwin and S. Sahri, Implementing SD-SQL Server: a Scalable +         Distributed Database System, CERIA Research Rerpot 2004-04-02, +         April 2004. + +   [383] M. Jarke and J. Koch, Query Optimization in Database Systems, +         ACM Computing Surveys 16 (2) (1984) 111-152. + +   [384] J. L. Bentley, Multidimensional binary search trees used for +         associative searching, Communications of the ACM 18 (9) (1975) +         509-517. + +   [385] B. Chun, I. Stoica, J. Hellerstein, R. Huebsch, S. Jeffery, B. +         T. Loo, S. Mardanbeigi, T. Roscoe, S. Rhea, and S. Schenker, +         Querying at Internet Scale, Proc. 2004 ACM SIGMOD international +         conference on management of data, demonstration session 2004, +         pp. 935-936. + +   [386] P. Cao and Z. Wang, Efficient top-K query calculation in +         distributed networks, Proc. 23rd Annual ACM SIGACT-SIGOPS Symp. +         on Principles of Distributed Computing PODC 2004, July 25-28 +         2004, pp. 206-215. + +   [387] D. Psaltoulis, I. Kostoulas, I. Gupta, K. Birman, and A. +         Demers, Practical algorithms for size estimation in large and +         dynamic groups, Proc. Twenty-Third Annual ACM SIGACT-SIGOPS +         Symp. on Principles of Distributed Computing, PODC 2004, July +         25-28 2004. + +   [388] R. van Renesse, The importance of aggregation, Springer-Verlag +         Lecture Notes in Computer Science  "Future Directions in +         Distributed Computing".  A. Schiper, A. A. Shvartsman, H. +         Weatherspoon, and B. Y. Zhao, editors. Springer-Verlag, +         Heidelberg volume 2584 (2003) + + + + + + + + + +Risson & Moors               Informational                     [Page 89] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +Author's Addresses + +   John Risson +   School of Elec Eng and Telecommunications +   University of New South Wales +   Sydney NSW 2052 Australia + +   EMail: jr@tuffit.com + + +   Tim Moors +   School of Elec Eng and Telecommunications +   University of New South Wales +   Sydney NSW 2052 Australia + +   EMail: t.moors@unsw.edu.au + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Risson & Moors               Informational                     [Page 90] + +RFC 4981            Survey of Research on P2P Search      September 2007 + + +Full Copyright Statement + +   Copyright (C) The IETF Trust (2007). + +   This document is subject to the rights, licenses and restrictions +   contained in BCP 78, and except as set forth therein, the authors +   retain all their rights. + +   This document and the information contained herein are provided on an +   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS +   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND +   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS +   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF +   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED +   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Intellectual Property + +   The IETF takes no position regarding the validity or scope of any +   Intellectual Property Rights or other rights that might be claimed to +   pertain to the implementation or use of the technology described in +   this document or the extent to which any license under such rights +   might or might not be available; nor does it represent that it has +   made any independent effort to identify any such rights.  Information +   on the procedures with respect to rights in RFC documents can be +   found in BCP 78 and BCP 79. + +   Copies of IPR disclosures made to the IETF Secretariat and any +   assurances of licenses to be made available, or the result of an +   attempt made to obtain a general license or permission for the use of +   such proprietary rights by implementers or users of this +   specification can be obtained from the IETF on-line IPR repository at +   http://www.ietf.org/ipr. + +   The IETF invites any interested party to bring to its attention any +   copyrights, patents or patent applications, or other proprietary +   rights that may cover technology that may be required to implement +   this standard.  Please address the information to the IETF at +   ietf-ipr@ietf.org. + + + + + + + + + + + + +Risson & Moors               Informational                     [Page 91] + |