summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc4981.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc4981.txt')
-rw-r--r--doc/rfc/rfc4981.txt5099
1 files changed, 5099 insertions, 0 deletions
diff --git a/doc/rfc/rfc4981.txt b/doc/rfc/rfc4981.txt
new file mode 100644
index 0000000..6e832a9
--- /dev/null
+++ b/doc/rfc/rfc4981.txt
@@ -0,0 +1,5099 @@
+
+
+
+
+
+
+Network Working Group J. Risson
+Request for Comments: 4981 T. Moors
+Category: Informational University of New South Wales
+ September 2007
+
+
+ Survey of Research towards Robust Peer-to-Peer Networks:
+ Search Methods
+
+Status of This Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+IESG Note
+
+ This RFC is not a candidate for any level of Internet Standard. The
+ IETF disclaims any knowledge of the fitness of this RFC for any
+ purpose and notes that the decision to publish is not based on IETF
+ review apart from IESG review for conflict with IETF work. The RFC
+ Editor has chosen to publish this document at its discretion. See
+ RFC 3932 for more information.
+
+Abstract
+
+ The pace of research on peer-to-peer (P2P) networking in the last
+ five years warrants a critical survey. P2P has the makings of a
+ disruptive technology -- it can aggregate enormous storage and
+ processing resources while minimizing entry and scaling costs.
+
+ Failures are common amongst massive numbers of distributed peers,
+ though the impact of individual failures may be less than in
+ conventional architectures. Thus, the key to realizing P2P's
+ potential in applications other than casual file sharing is
+ robustness.
+
+ P2P search methods are first couched within an overall P2P taxonomy.
+ P2P indexes for simple key lookup are assessed, including those based
+ on Plaxton trees, rings, tori, butterflies, de Bruijn graphs, and
+ skip graphs. Similarly, P2P indexes for keyword lookup, information
+ retrieval and data management are explored. Finally, early efforts
+ to optimize range, multi-attribute, join, and aggregation queries
+ over P2P indexes are reviewed. Insofar as they are available in the
+ primary literature, robustness mechanisms and metrics are highlighted
+ throughout. However, the low-level mechanisms that most affect
+ robustness are not well isolated in the literature. Recommendations
+ are given for future research.
+
+
+
+Risson & Moors Informational [Page 1]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+Table of Contents
+
+ 1. Introduction ....................................................3
+ 1.1. Related Disciplines ........................................6
+ 1.2. Structured and Unstructured Routing ........................7
+ 1.3. Indexes and Queries ........................................9
+ 2. Index Types ....................................................10
+ 2.1. Local Index (Gnutella) ....................................10
+ 2.2. Central Index (Napster) ...................................12
+ 2.3. Distributed Index (Freenet) ...............................13
+ 3. Semantic Free Index ............................................15
+ 3.1. Origins ...................................................15
+ 3.1.1. Plaxton, Rajaraman, and Richa (PRR) ................15
+ 3.1.2. Consistent Hashing .................................16
+ 3.1.3. Scalable Distributed Data Structures (LH*) .........16
+ 3.2. Dependability .............................................17
+ 3.2.1. Static Dependability ...............................17
+ 3.2.2. Dynamic Dependability ..............................18
+ 3.2.3. Ephemeral or Stable Nodes -- O(log n) or
+ O(1) Hops ..........................................19
+ 3.2.4. Simulation and Proof ...............................20
+ 3.3. Latency ...................................................21
+ 3.3.1. Hop Count and the O(1)-Hop DHTs ....................21
+ 3.3.2. Proximity and the O(log n)-Hop DHTs ................22
+ 3.4. Multicasting ..............................................23
+ 3.4.1. Multicasting vs. Broadcasting ......................23
+ 3.4.2. Motivation for DHT-based Multicasting ..............23
+ 3.4.3. Design Issues ......................................24
+ 3.5. Routing Geometries ........................................25
+ 3.5.1. Plaxton Trees (Pastry, Tapestry) ...................25
+ 3.5.2. Rings (Chord, DKS) .................................27
+ 3.5.3. Tori (CAN) .........................................28
+ 3.5.4. Butterflies (Viceroy) ..............................29
+ 3.5.5. de Bruijn (D2B, Koorde, Distance Halving, ODRI) ....30
+ 3.5.6. Skip Graphs ........................................32
+ 4. Semantic Index .................................................33
+ 4.1. Keyword Lookup ............................................34
+ 4.1.1. Gnutella Enhancements ..............................36
+ 4.1.2. Partition-by-Document, Partition-by-Keyword ........38
+ 4.1.3. Partial Search, Exhaustive Search ..................39
+ 4.2. Information Retrieval .....................................39
+ 4.2.1. Vector Model (PlanetP, FASD, eSearch) ..............41
+ 4.2.2. Latent Semantic Indexing (pSearch) .................43
+ 4.2.3. Small Worlds .......................................43
+ 5. Queries ........................................................44
+ 5.1. Range Queries .............................................45
+ 5.2. Multi-Attribute Queries ...................................48
+ 5.3. Join Queries ..............................................50
+
+
+
+Risson & Moors Informational [Page 2]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ 5.4. Aggregation Queries .......................................50
+ 6. Security Considerations ........................................52
+ 7. Conclusions ....................................................52
+ 8. Acknowledgments ................................................53
+ 9. References .....................................................54
+ 9.1. Informative References ....................................54
+
+1. Introduction
+
+ Peer-to-peer (P2P) networks are those that exhibit three
+ characteristics: self-organization, symmetric communication, and
+ distributed control [1]. A self-organizing P2P network
+ "automatically adapts to the arrival, departure and failure of nodes"
+ [2]. Communication is symmetric in that peers act as both clients
+ and servers. It has no centralized directory or control point.
+ USENET servers and BGP peers have these traits [3] but the emphasis
+ here is on the flurry of research since 2000. Leading examples
+ include Gnutella [4], Freenet [5], Pastry [2], Tapestry [6], Chord
+ [7], the Content Addressable Network (CAN) [8], pSearch [9], and
+ Edutella [10]. Some have suggested that peers are inherently
+ unreliable [11]. Others have assumed well-connected, stable peers
+ [12].
+
+ This critical survey of P2P academic literature is warranted, given
+ the intensity of recent research. At the time of writing, one
+ research database lists over 5,800 P2P publications [13]. One vendor
+ surveyed P2P products and deployments [14]. There is also a tutorial
+ survey of leading P2P systems [15]. DePaoli and Mariani recently
+ reviewed the dependability of some early P2P systems at a high level
+ [16]. The need for a critical survey was flagged in the peer-to-peer
+ research group of the Internet Research Task Force (IRTF) [17].
+
+ P2P is potentially a disruptive technology with numerous
+ applications, but this potential will not be realized unless it is
+ demonstrated to be robust. A massively distributed search technique
+ may yield numerous practical benefits for applications [18]. A P2P
+ system has potential to be more dependable than architectures relying
+ on a small number of centralized servers. It has potential to evolve
+ better from small configurations -- the capital outlays for high
+ performance servers can be reduced and spread over time if a P2P
+ assembly of general purpose nodes is used. A similar argument
+ motivated the deployment of distributed databases -- one thousand,
+ off-the-shelf PC processors are more powerful and much less expensive
+ than a large mainframe computer [19]. Storage and processing can be
+ aggregated to achieve massive scale. Wasteful partitioning between
+ servers or clusters can be avoided. As Gedik and Liu put it, if P2P
+ is to find its way into applications other than casual file sharing,
+ then reliability needs to be addressed [20].
+
+
+
+Risson & Moors Informational [Page 3]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ The taxonomy of Figure 1 divides the entire body of P2P research
+ literature along four lines: search, storage, security, and
+ applications. This survey concentrates on search aspects. A P2P
+ search network consists of an underlying index (Sections 2 to 4) and
+ queries that propagate over that index (Section 5).
+
+ Search [18, 21-29]
+ Semantic-Free Indexes [2, 6, 7, 30-52]
+ Plaxton Trees
+ Rings
+ Tori
+ Butterflies
+ de Bruijn Graphs
+ Skip Graphs
+ Semantic Indexes [4, 53-71]
+ Keyword Lookup
+ Peer Information Retrieval
+ Peer Data Management
+ Queries [20, 22, 23, 25, 32, 38, 41, 56, 72-100]
+ Range Queries
+ Multi-Attribute Queries
+ Join Queries
+ Aggregation Queries
+ Continuous Queries
+ Recursive Queries
+ Adaptive Queries
+
+ Storage
+ Consistency & Replication [101-112]
+ Eventual consistency
+ Trade-offs
+ Distribution [39, 42, 90, 92, 113-131]
+ Epidemics, Bloom Filters
+ Fault Tolerance [40, 105, 132-139]
+ Erasure Coding
+ Byzantine Agreement
+ Locality [24, 43, 47, 140-160]
+ Load Balancing [37, 86, 100, 107, 151, 161-171]
+
+
+
+
+
+
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 4]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ Security
+ Character [172-182]
+ Identity
+ Reputation and Trust
+ Incentives
+ Goals [25, 27, 71, 183-197]
+ Availability
+ Authenticity
+ Anonymity
+ Access Control
+ Fair Trading
+
+ Applications [1, 198-200]
+ Memory [32, 90, 142, 201-222]
+ File Systems
+ Web
+ Content Delivery Networks
+ Directories
+ Service Discovery
+ Publish / Subscribe ...
+ Intelligence [223-228]
+ GRID
+ Security...
+ Communication [12, 92, 119, 229-247]
+ Multicasting
+ Streaming Media
+ Mobility
+ Sensors...
+
+ Figure 1: Classification of P2P Research Literature
+
+ This survey is concerned with two questions. The first, "How do P2P
+ search networks work?" This foundation is important given the pace
+ and breadth of P2P research in the last five years. In Section 2, we
+ classify indexes as local, centralized and distributed. Since
+ distributed indexes are becoming dominant, they are given closer
+ attention in Sections 3 and 4. Section 3 compares distributed P2P
+ indexes for simple key lookup; in particular, their origins (Section
+ 3.1), dependability (Section 3.2), latency (Section 3.3), and their
+ support for multicast (Section 3.4). It classifies those indexes
+ according to their routing geometry (Section 3.5) -- Plaxton trees,
+ rings, tori, butterflies, de Bruijn graphs and skip graphs. Section
+ 4 reviews distributed P2P indexes supporting keyword lookup (Section
+ 4.1) and information retrieval (Section 4.2). Section 5 probes the
+ embryonic research on P2P queries; in particular, range queries
+ (Section 5.1), multi-attribute queries (Section 5.2), join queries
+ (Section 5.3), and aggregation queries (Section 5.4).
+
+
+
+
+Risson & Moors Informational [Page 5]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ The second question, "How robust are P2P search networks?" Insofar
+ as it is available in the research literature, we tease out the
+ robustness mechanisms and metrics throughout Sections 2 to 5.
+ Unfortunately, robustness is often more sensitive to low-level design
+ choices than it is to the broad P2P index structure, yet these
+ underlying design choices are seldom isolated in the primary
+ literature [248]. Furthermore, there has been little consensus on
+ P2P robustness metrics (Section 3.2). Section 8 gives
+ recommendations to address these important gaps.
+
+1.1. Related Disciplines
+
+ Peer-to-peer research draws upon numerous distributed systems
+ disciplines. Networking researchers will recognize familiar issues
+ of naming, routing, and congestion control. P2P designs need to
+ address routing and security issues across network region boundaries
+ [152]. Networking research has traditionally been host-centric. The
+ Web's Universal Resource Identifiers are naturally tied to specific
+ hosts, making object mobility a challenge [216].
+
+ P2P work is data-centric [249]. P2P systems for dynamic object
+ location and routing have borrowed heavily from the distributed
+ systems corpus. Some have used replication, erasure codes, and
+ Byzantine agreement [111]. Others have used epidemics for durable
+ peer group communication [39].
+
+ Similarly, P2P research is set to benefit from database research
+ [250]. Database researchers will recognize the need to reapply
+ Codd's principle of physical data independence, that is, to decouple
+ data indexes from the applications that use the data [23]. It was
+ the invention of appropriate indexing mechanisms and query
+ optimizations that enabled data independence. Database indexes like
+ B+ trees have an analog in P2P's distributed hash tables (DHTs).
+ Wide-area, P2P query optimization is a ripe, but challenging, area
+ for innovation.
+
+ More flexible distribution of objects comes with increased security
+ risks. There are opportunities for security researchers to deliver
+ new methods for availability, file authenticity, anonymity, and
+ access control [25]. Proactive and reactive mechanisms are needed to
+ deal with large numbers of autonomous, distributed peers. To build
+ robust systems from cooperating but self-interested peers, issues of
+ identity, reputation, trust, and incentives need to be tackled.
+ Although it is beyond the scope of this paper, robustness against
+ malicious attacks also ought to be addressed [195].
+
+ Possibly the largest portion of P2P research has majored on basic
+ routing structures [18], where research on algorithms comes to the
+
+
+
+Risson & Moors Informational [Page 6]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ fore. Should the overlay be "structured" or "unstructured"? Are the
+ two approaches competing or complementary? Comparisons of the
+ "structured" approaches (hypercubes, rings, toroids, butterflies, de
+ Bruijn, and skip graphs) have weighed the amount of routing state per
+ peer and the number of links per peer against overlay hop counts.
+ While "unstructured" overlays initially used blind flooding and
+ random walks, overheads usually trigger some structure, for example,
+ super-peers and clusters.
+
+ P2P applications rely on cooperation between these disciplines.
+ Applications have included file sharing, directories, content
+ delivery networks, email, distributed computation, publish-subscribe
+ middleware, multicasting, and distributed authentication. Which
+ applications will be suited to which structures? Are there adaptable
+ mechanisms that can decouple applications from the underlying data
+ structures? What are the criteria for selection of applications
+ amenable to a P2P design [1]?
+
+ Robustness is emphasized throughout the survey. We are particularly
+ interested in two aspects. The first, dependability, was a leading
+ design goal for the original Internet [251]. It deserves the same
+ status in P2P. The measures of dependability are well established:
+ reliability, a measure of the mean-time-to-failure (MTTF);
+ availability, a measure of both the MTTF and the mean-time-to-repair
+ (MTTR); maintainability; and safety [252]. The second aspect is the
+ ability to accommodate variation in outcome, which one could call
+ adaptability. Its measures have yet to be defined. In the context
+ of the Internet, it was only recently acknowledged as a first-class
+ requirement [253]. In P2P, it means planning for the tussles over
+ resources and identity. It means handling different kinds of queries
+ and accommodating changeable application requirements with minimal
+ intervention. It means "organic scaling" [22], whereby the system
+ grows gracefully, without a priori data center costs or architectural
+ breakpoints.
+
+ In the following section, we discuss one notable omission from the
+ taxonomy of P2P networking in Figure 1 -- routing.
+
+1.2. Structured and Unstructured Routing
+
+ P2P routing algorithms have been classified as "structured" or
+ "unstructured". Peers in unstructured overlay networks join by
+ connecting to any existing peers [254]. In structured overlays, the
+ identifier of the joining peer determines the set of peers that it
+ connects to [254]. Early instantiations of Gnutella were
+ unstructured -- keyword queries were flooded widely [255]. Napster
+ [256] had decentralized content and a centralized index, so it only
+ partially satisfies the distributed control criteria for P2P systems.
+
+
+
+Risson & Moors Informational [Page 7]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ Early structured algorithms included Plaxton, Rajaraman and Richa
+ (PRR) [30], Pastry [2], Tapestry [31], Chord [7], and the Content
+ Addressable Network [8]. Mishchke and Stiller recently classified
+ P2P systems by the presence or absence of structure in routing tables
+ and network topology [257].
+
+ Some have cast unstructured and structured algorithms as competing
+ alternatives. Unstructured approaches have been called "first
+ generation", implicitly inferior to the "second generation"
+ structured algorithms [2, 31]. When generic key lookups are
+ required, these structured, key-based routing schemes can guarantee
+ location of a target within a bounded number of hops [23]. The
+ broadcasting unstructured approaches, however, may have large routing
+ costs, or fail to find available content [22]. Despite the apparent
+ advantages of structured P2P, several research groups are still
+ pursuing unstructured P2P.
+
+ There have been two main criticisms of structured systems [61]. The
+ first relates to peer transience, which in turn, affects robustness.
+ Chawathe, et al. opined that highly transient peers are not well
+ supported by DHTs [61]. P2P systems often exhibit "churn", with
+ peers continually arriving and departing. One objection to concerns
+ about highly transient peers is that many applications use peers in
+ well-connected parts of the network. The Tapestry authors analyzed
+ the impact of churn in a network of 1000 nodes [31]. Others opined
+ that it is possible to maintain a robust DHT at relatively low cost
+ [258]. Very few papers have quantitatively compared the resilience
+ of structured systems. Loguinov, Kumar, et al. claimed that there
+ were only two such works [24, 36].
+
+ The second criticism of structured systems is that they do not
+ support keyword searches and complex queries as well as unstructured
+ systems. Given the current file-sharing deployments, keyword
+ searches seem more important than exact-match key searches in the
+ short term. Paraphrased, "most queries are for hay, not needles"
+ [61].
+
+ More recently, some have justifiably seen unstructured and structured
+ proposals as complementary, and have devised hybrid models [259].
+ Their starting point was the observation that unstructured flooding
+ or random walks are inefficient for data that is not highly
+ replicated across the P2P network. Structured graphs can find keys
+ efficiently, irrespective of replication. Castro, et al. proposed
+ Structella, a hybrid of Gnutella built on top of Pastry [259].
+ Another design used structured search for rare items and unstructured
+ search for massively replicated items [54].
+
+
+
+
+
+Risson & Moors Informational [Page 8]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ However, the "structured versus unstructured routing" taxonomy is
+ becoming less useful, for two reasons, Firstly, most "unstructured"
+ proposals have evolved and incorporated structure. Consider the
+ classic "unstructured" system, Gnutella [4]. For scalability, its
+ peers are either ultrapeers or leaf nodes. This hierarchy is
+ augmented with a query routing protocol whereby ultrapeers receive a
+ hashed summary of the resource names available at leaf nodes.
+ Between ultrapeers, simple query broadcast is still used, though
+ methods to reduce the query load here have been considered [260].
+ Secondly, there are emerging schema-based P2P designs [59], with
+ super-node hierarchies and structure within documents. These are
+ quite distinct from the structured DHT proposals.
+
+1.3. Indexes and Queries
+
+ Given that most, if not all, P2P designs today assume some structure,
+ a more instructive taxonomy would describe the structure. In this
+ survey, we use a database taxonomy in lieu of the networking
+ taxonomy, as suggested by Hellerstein, Cooper, and Garcia-Molina [23,
+ 261]. The structure is determined by the type of index (Sections 2 ,
+ 3, and 4). Queries feature in lieu of routing (Section 5). The DHT
+ algorithms implement a "semantic-free index" [216]. They are
+ oblivious of whether keys represent document titles, meta-data, or
+ text. Gnutella-like and schema-based proposals have a "semantic
+ index".
+
+ Index engineering is at the heart of P2P search methods. It captures
+ a broad range of P2P issues, as demonstrated by the Search/Index
+ Links model [261]. As Manber put it, "the most important of the
+ tools for information retrieval is the index -- a collection of terms
+ with pointers to places where information about documents can be
+ found" [262]. Sen and Wang noted that a "P2P network" usually
+ consists of connections between hosts for application-layer
+ signaling, rather than for the data transfer itself [263].
+ Similarly, we concentrate on the "signaled" indexes and queries.
+
+ Our focus here is the dependability and adaptability of the search
+ network. Static dependability is a measure of how well queries route
+ around failures in a network that is normally fault-free. Dynamic
+ dependability gives an indication of query success when nodes and
+ data are continually joining and leaving the P2P system. An
+ adaptable index accommodates change in the data and query
+ distribution. It enables data independence, in that it facilitates
+ changes to the data layout without requiring changes to the
+ applications that use the data [23]. An adaptable P2P system can
+ support rich queries for a wide range of applications. Some
+ applications benefit from simple, semantic-free key lookups [264].
+ Others require more complex, Structured Query Language (SQL)-like
+
+
+
+Risson & Moors Informational [Page 9]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ queries to find documents with multiple keywords, or to aggregate or
+ join query results from distributed relations [22].
+
+2. Index Types
+
+ A P2P index can be local, centralized, or distributed. With a local
+ index, a peer only keeps the references to its own data, and does not
+ receive references for data at other nodes. The very early Gnutella
+ design epitomized the local index (Section 2.1). In a centralized
+ index, a single server keeps references to data on many peers. The
+ classic example is Napster (Section 2.2). With distributed indexes,
+ pointers towards the target reside at several nodes. One very early
+ example is Freenet (Section 2.3). Distributed indexes are used in
+ most P2P designs nowadays -- they dominate this survey.
+
+ P2P indexes can also be classified as non-forwarding and forwarding.
+ When queries are guided by a non-forwarding index, they jump to the
+ node containing the target data in a single hop. There have been
+ semantic and semantic-free one-hop schemes [138, 265, 266]. Where
+ scalability to a massive number of peers is required, these schemes
+ have been extended to two hops [267, 268]. More common are the
+ forwarding P2Ps, where the number of hops varies with the total
+ number of peers, often logarithmically. The related trade-offs
+ between routing state, lookup latency, update bandwidth, and peer
+ churn are critical to total system dependability.
+
+2.1. Local Index (Gnutella)
+
+ P2Ps with a purely local data index are becoming rare. In such
+ designs, peers flood queries widely and only index their own content.
+ They enable rich queries - the search is not limited to a simple key
+ lookup. However, they also generate a large volume of query traffic
+ with no guarantee that a match will be found, even if it does exist
+ on the network. For example, to find potential peers on the early
+ instantiations of Gnutella, 'ping' messages were broadcast over the
+ P2P network and the 'pong' responses were used to build the node
+ index. Then, small 'query' messages, each with a list of keywords,
+ are broadcast to peers that respond with matching filenames [4].
+
+ There have been numerous attempts to improve the scalability of
+ local-index P2P networks. Gnutella uses fixed time-to-live (TTL)
+ rings, where the query's TTL is set less than 7-10 hops [4]. Small
+ TTLs reduce the network traffic and the load on peers, but also
+ reduce the chances of a successful query hit. One paper reported,
+ perhaps a little too bluntly, that the fixed "TTL-based mechanism
+ does not work" [67]. To address this TTL selection problem, they
+ proposed an expanding ring, known elsewhere as iterative deepening
+ [29]. It uses successively larger TTL counters until there is a
+
+
+
+Risson & Moors Informational [Page 10]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ match. The flooding, ring, and expanding ring methods all increase
+ network load with duplicated query messages. A random walk, whereby
+ an unduplicated query wanders about the network, does indeed reduce
+ the network load but massively increases the search latency. One
+ solution is to replicate the query k times at each peer. Called
+ random k-walkers, this technique can be coupled with TTL limits, or
+ periodic checks with the query originator, to cap the query load
+ [67]. Adamic, Lukose, et al. suggested that the random walk searches
+ be directed to nodes with a higher degree, that is, with larger
+ numbers of inter-peer connections [269]. They assumed that higher-
+ degree peers are also capable of higher query throughputs. However,
+ without some balancing design rule, such peers would be swamped with
+ the entire P2P signaling traffic. In addition to the above
+ approaches, there is the 'directed breadth-first' algorithm [29]. It
+ forwards queries within a subset of peers selected according to
+ heuristics on previous performance, like the number of successful
+ query results. Another algorithm, called probabilistic flooding, has
+ been modeled using percolation theory [270].
+
+ Several measurement studies have investigated locally indexed P2Ps.
+ Jovanovic noted Gnutella's power law behaviour [70]. Sen and Wang
+ compared the performance of Gnutella, Fasttrack [271], and Direct
+ Connect [263, 272, 273]. At the time, only Gnutella used local data
+ indexes. All three schemes now use distributed data indexes, with
+ hierarchy in the form of Ultrapeers (Gnutella), Super-Nodes
+ FastTrack), and Hubs (Direct Connect). It was found that a very
+ small percentage of peers have a very high degree and that the total
+ system dependability is at the mercy of such peers. While peer up-
+ time and bandwidth were heavy-tailed, they did not fit well with the
+ Zipf distribution. Fortunately for Internet Service Providers,
+ measures aggregated by IP prefix and Autonomous System (AS) were more
+ stable than for individual IP addresses. A study of University of
+ Washington traffic found that Gnutella and Kazaa together contributed
+ 43% of the university's total TCP traffic [274]. They also reported
+ a heavy-tailed distribution, with 600 external peers (out of 281,026)
+ delivering 26% of Kazaa bytes to internal peers. Furthermore,
+ objects retrieved from the P2P network were typically three orders of
+ magnitude larger than Web objects -- 300 objects contributed to
+ almost half the total outbound Kazaa bandwidth. Others reported
+ Gnutella's topology mismatch, whereby only 2-5% of P2P connections
+ link peers in the same Autonomous System (AS), despite over 40% of
+ peers being in the top 10 ASs [65]. Together these studies
+ underscore the significance of multimedia sharing applications. They
+ motivate interesting caching and locality solutions to the topology
+ mismatch problem.
+
+ These same studies bear out one main dependability lesson: total
+ system dependability may be sensitive to the dependability of high-
+
+
+
+Risson & Moors Informational [Page 11]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ degree peers. The designers of Scamp translated this observation to
+ the design heuristic, "have the degree of each node be of nearly
+ equal size" [153]. They analyzed a system of N peers, with mean
+ degree c.log(n), where link failures occur independently with
+ probability e. If d>0 is fixed and c>(1+d)/(-log(e)), then the
+ probability of graph disconnection goes to zero as N->infinity.
+ Otherwise, if c<(1-d)/(-log(e)), then the probability of
+ disconnection goes to one as N->infinity. They presented a
+ localizer, which finds approximate minima to a global function of
+ peer degree and arbitrary link costs using only local information.
+ The Scamp overlay construction algorithms could support any of the
+ flooding and walking routing schemes above, or other epidemic and
+ multicasting schemes for that matter. Resilience to high churn rates
+ was identified for future study.
+
+2.2. Central Index (Napster)
+
+ Centralized schemes like Napster [256] are significant because they
+ were the first to demonstrate the P2P scalability that comes from
+ separating the data index from the data itself. Ultimately, 36
+ million Napster users lost their service not because of technical
+ failure, but because the single administration was vulnerable to the
+ legal challenges of record companies [275].
+
+ There has since been little research on P2P systems with central data
+ indexes. Such systems have also been called 'hybrid' since the index
+ is centralized but the data is distributed. Yang and Garcia-Molina
+ devised a four-way classification of hybrid systems [276]: unchained
+ servers, where users whose index is on one server do not see other
+ servers' indexes; chained servers, where the server that receives a
+ query forwards it to a list of servers if it does not own the index
+ itself; full replication, where all centralized servers keep a
+ complete index of all available metadata; and hashing, where keywords
+ are hashed to the server where the associated inverted list is kept.
+ The unchained architecture was used by Napster, but it has the
+ disadvantage that users do not see all indexed data in the system.
+ Strictly speaking, the other three options illustrate the distributed
+ data index, not the central index. The chained architecture was
+ recommended as the optimum for the music-swapping application at the
+ time. The methods by which clients update the central index were
+ classified as batch or incremental, with the optimum determined by
+ the query-to-login ratio. Measurements were derived from a clone of
+ Napster called OpenNap[277]. Another study of live Napster data
+ reported wide variation in the availability of peers, a general
+ unwillingness to share files (20-40% of peers share few or no files),
+ and a common understatement of available bandwidth so as to
+ discourage other peers from sharing one's link [202].
+
+
+
+
+Risson & Moors Informational [Page 12]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ Influenced by Napster's early demise, the P2P research community may
+ have prematurely turned its back on centralized architectures.
+ Chawathe, Ratnasamy, et al. opined that Google and Yahoo demonstrate
+ the viability of a centralized index. They argued that "the real
+ barriers to Napster-like designs are not technical but legal and
+ financial" [61]. Even this view may be a little too harsh on the
+ centralized architectures -- it implies that they always have an up-
+ front capital hurdle that is steeper than for distributed
+ architectures. The closer one looks at scalable 'centralized'
+ architectures, the less the distinction with 'distributed'
+ architectures seems to matter. For example, it is clear that
+ Google's designers consider Google a distributed, not centralized,
+ file system [278]. Google demonstrates the scale and performance
+ possible on commodity hardware, but still has a centralized master
+ that is critical to the operation of each Google cluster. Time may
+ prove that the value of emerging P2P networks, regardless of the
+ centralized-versus-distributed classification, is that they smooth
+ the capital outlays and remove the single points of failure across
+ the spectra of scale and geographic distribution.
+
+2.3. Distributed Index (Freenet)
+
+ An important early P2P proposal for a distributed index was Freenet
+ [5, 71, 279]. While its primary emphasis was the anonymity of peers,
+ it did introduce a novel indexing scheme. Files are identified by
+ low-level "content-hash" keys and by "secure signed-subspace" keys,
+ which ensure that only a file owner can write to a file while anyone
+ can read from it. To find a file, the requesting peer first checks
+ its local table for the node with keys closest to the target. When
+ that node receives the query, it too checks for either a match or
+ another node with keys close to the target. Eventually, the query
+ either finds the target or exceeds time-to-live (TTL) limits. The
+ query response traverses the successful query path in reverse,
+ depositing a new routing table entry (the requested key and the data
+ holder) at each peer. The insert message similarly steps towards the
+ target node, updating routing table entries as it goes, and finally
+ stores the file there. Whereas early versions of Gnutella used
+ breadth-first flooding, Freenet uses a more economic depth-first
+ search [280].
+
+ An initial assessment has been done of Freenet's robustness. It was
+ shown that in a network of 1000 nodes, the median query path length
+ stayed under 20 hops for a failure of 30% of nodes. While the
+ Freenet designers considered this as evidence that the system is
+ "surprisingly robust against quite large failures" [71], the same
+ datapoint may well be outside meaningful operating bounds. How many
+ applications are useful when the first quartile of queries have path
+ lengths of several hundred hops in a network of only 1000 nodes, per
+
+
+
+Risson & Moors Informational [Page 13]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ Figure 4 of [71]? To date, there has been no analysis of Freenet's
+ dynamic robustness. For example, how does it perform when nodes are
+ continually arriving and departing?
+
+ There have been both criticisms and extensions of the early Freenet
+ work. Gnutella proponents acknowledged the merit in Freenet's
+ avoidance of query broadcasting [281]. However, they are critical on
+ two counts: the exact file name is needed to construct a query; and
+ exactly one match is returned for each query. P2P designs using
+ DHTs, per Section 3, share similar characteristics -- a precise query
+ yields a precise response. The similarity is not surprising since
+ Freenet also uses a hash function to generate keys. However, the
+ query routing used in the DHTs has firmer theoretical foundations.
+ Another difference with DHTs is that Freenet will take time, when a
+ new node joins the network, to build an index that facilitates
+ efficient query routing. By the inventor's own admission, this is
+ damaging for a user's first impressions [282]. It was proposed to
+ download a copy of routing tables from seed nodes at startup, even
+ though the new node might be far from the seed node. Freenet's slow
+ startup motivated Mache, Gilbert, et al. to amend the overlay after
+ failed requests and to place additional index entries on successful
+ requests -- they claim almost an order of magnitude reduction in
+ average query path length [280]. Clarke also highlighted the lack of
+ locality or bandwidth information available for efficient query
+ routing decisions [282]. He proposed that each node gather response
+ times, connection times, and proportion of successful requests for
+ each entry in the query routing table. When searching for a key that
+ is not in its own routing table, it was proposed to estimate response
+ times from the routing metrics for the nearest known keys and
+ consequently choose the node that can retrieve the data fastest. The
+ response time heuristic assumed that nodes close in the key space
+ have similar response times. This assumption stemmed from early
+ deployment observations that Freenet peers seemed to specialize in
+ parts of the keyspace -- it has not been justified analytically.
+ Kronfol drew attention to Freenet's inability to do keyword searches
+ [283]. He suggested that peers cache lists of weighted keywords in
+ order to route queries to documents, using Term Frequency Inverse
+ Document Frequency (TFIDF) measures and inverted indexes (Section
+ 4.2.1). With these methods, a peer can route queries for simple
+ keyword lists or more complicated conjunctions and disjunctions of
+ keywords. Robustness analysis and simulation of Kronfol's proposal
+ remain open.
+
+ The vast majority of P2P proposals in following sections rely on a
+ distributed index.
+
+
+
+
+
+
+Risson & Moors Informational [Page 14]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+3. Semantic Free Index
+
+ Many of today's distributed network indexes are semantic. The
+ semantic index is human-readable. For example, it might associate
+ information with other keywords, a document, a database key, or even
+ an administrative domain. It makes it easy to associate objects with
+ particular network providers, companies, or organizations, as
+ evidenced in the Domain Name System (DNS). However, it can also
+ trigger legal tussles and frustrate content replication and migration
+ [216].
+
+ Distributed Hash Tables (DHTs) have been proposed to provide
+ semantic-free, data-centric references. DHTs enable one to find an
+ object's persistent key in a very large, changing set of hosts. They
+ are typically designed for [23]:
+
+ a) low degree. If each node keeps routing information for only a
+ small number of other nodes, the impact of high node arrival and
+ departure rates is contained;
+
+ b) low hop count. The hops and delay introduced by the extra
+ indirection are minimized;
+
+ c) greedy routing. Nodes independently calculate a short path to the
+ target. At each hop, the query moves closer to the target; and
+
+ d) robustness. A path to the target can be found even when links or
+ nodes fail.
+
+3.1. Origins
+
+ To understand the origins of recent DHTs, one needs to look to three
+ contributions from the 1990s. The first two -- Plaxton, Rajaraman,
+ and Richa (PRR) [30] and Consistent Hashing [49] -- were published
+ within one month of each other. The third, the Scalable Distributed
+ Data Structure (SDDS) [52], was curiously ignored in significant
+ structured P2P designs despite having some similar goals [2, 6, 7].
+ It has been briefly referenced in other P2P papers [46, 284-287].
+
+3.1.1. Plaxton, Rajaraman, and Richa (PRR)
+
+ PRR is the most recent of the three. It influenced the designs of
+ Pastry [2], Tapestry [6], and Chord [7]. The value of PRR is that it
+ can locate objects using fixed-length routing tables [6]. Objects
+ and nodes are assigned a semantic-free address, for example a 160-bit
+ key. Every node is effectively the root of a spanning tree. A
+ message routes toward an object by matching longer address suffixes,
+ until it encounters either the object's root node or another node
+
+
+
+Risson & Moors Informational [Page 15]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ with a 'nearby' copy. It can route around link and node failure by
+ matching nodes with a related suffix. The scheme has several
+ disadvantages [6]: global knowledge is needed to construct the
+ overlay; an object's root node is a single point of failure; nodes
+ cannot be inserted and deleted; and there is no mechanism for queries
+ to avoid congestion hot spots.
+
+3.1.2. Consistent Hashing
+
+ Consistent Hashing [288] strongly influenced the designs of Chord [7]
+ and Koorde [37]. Karger, et al. introduced Consistent Hashing in the
+ context of the Web-caching problem [49]. Web servers could
+ conceivably use standard hashing to place objects across a network of
+ caches. Clients could use the approach to find the objects. For
+ normal hashing, most object references would be moved when caches are
+ added or deleted. On the other hand, Consistent Hashing is "smooth"
+ -- when caches are added or deleted, the minimum number of object
+ references move so as to maintain load balancing. Consistent Hashing
+ also ensures that the total number of caches responsible for a
+ particular object is limited. Whereas Litwin's Linear Hashing (LH*)
+ scheme requires 'buckets' to be added one at a time in sequence [50],
+ Consistent Hashing allows them to be added in any order [49]. There
+ is an open Consistent Hashing problem pertaining to the fraction of
+ items moved when a node is inserted [165]. Extended Consistent
+ Hashing was recently proposed to randomize queries over the spread of
+ caches to significantly reduce the load variance [289].
+ Interestingly, Karger [49] referred to an older DHT algorithm by
+ Devine that used "a novel autonomous location discovery algorithm
+ that learns the buckets' locations instead of using a centralized
+ directory" [51].
+
+3.1.3. Scalable Distributed Data Structures (LH*)
+
+ In turn, Devine's primary point of reference was Litwin's work on
+ SDDSs and the associated LH* algorithm [52]. An SDDS satisfies three
+ design requirements: files grow to new servers only when existing
+ servers are well loaded; there is no centralized directory; and the
+ basic operations like insert, search, and split never require atomic
+ updates to multiple clients. Honicky and Miller suggested the first
+ requirement could be considered a limitation since expansion to new
+ servers is not under administrative control [286]. Litwin recently
+ noted numerous similarities and differences between LH* and Chord
+ [290]. He found that both implement key search. Although LH* refers
+ to clients and servers, nodes can operate as peers in both. Chord
+ 'splits' nodes when a new node is inserted, while LH* schedules
+ 'splits' to avoid overload. Chord requests travel O(log n) hops,
+ while LH* client requests need, at most, two hops to find the target.
+ Chord stores a small number of 'fingers' at each node. LH* servers
+
+
+
+Risson & Moors Informational [Page 16]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ store N/2 to N addresses while LH* clients store 1 to N addresses.
+ This trade-off between hop count and the size of the index affects
+ system robustness, and bears striking similarity to recent one- and
+ two-hop P2P schemes in Section 2. The arrival and departure of LH*
+ clients does not disrupt LH* server metadata at all. Given the size
+ of the index, the arrival and departure of LH* servers are likely to
+ cause more churn than that of Chord nodes. Unlike Chord, LH* has a
+ single point of failure, the split coordinator. It can be
+ replicated. Alternatively, it can be removed in later LH* variants,
+ though details have not been progressed for lack of practical need
+ [290].
+
+3.2. Dependability
+
+ We make four overall observations about their dependability.
+ Dependability metrics fall into two categories: static dependability,
+ a measure of performance before recovery mechanisms take over; and
+ dynamic dependability, for the most likely case in massive networks
+ where there is continual failure and recovery ("churn").
+
+3.2.1. Static Dependability
+
+ Observation A: Static dependability comparisons show that no O(log n)
+ DHT geometry is significantly more dependable than the other O(log n)
+ geometries.
+
+ Gummadi, et al. compared the tree, hypercube, butterfly, ring, XOR,
+ and hybrid geometries. In such geometries, nodes generally know
+ about O(log n) neighbors and route to a destination in O(log n) hops,
+ where N is the number of nodes in the overlay. Gummadi, et al. asked
+ "Why not the ring?" They concluded that only the ring and XOR
+ geometries permit flexible choice of both neighbors and alternative
+ routes [24]. Loguinov, et al. added the de Bruijn graph to their
+ comparison [36]. They concluded that the classical analyses, for
+ example the probability that a particular node becomes disconnected,
+ yield no major differences between the resilience of Chord, CAN, and
+ de Bruijn graphs. Using bisection width (the minimum edge count
+ between two equal partitions) and path overlap (the likelihood that
+ backup paths will encounter the same failed nodes or links as the
+ primary path), they argued for the superior resilience of the de
+ Bruijn graph. In short, ring, XOR, and de Bruijn graphs all permit
+ flexible choice of alternative paths, but only in de Bruijn are the
+ alternate paths independent of each other [36].
+
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 17]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+3.2.2. Dynamic Dependability
+
+ Observation B: Dynamic dependability comparisons show that DHT
+ dependability is sensitive to the underlying topology maintenance
+ algorithms.
+
+ Li, et al. give the best comparison to date of several leading DHTs
+ during churn [291]. They relate the disparate configuration
+ parameters of Tapestry, Chord, Kademlia, Kelips, and OneHop to
+ fundamental design choices. For each of these DHTs, they plotted the
+ optimal performance in terms of lookup latency (milliseconds) and
+ fraction of failed lookups. The results led to several important
+ insights about the underlying algorithms, for example: increasing
+ routing table size is more cost-effective than increasing the rate of
+ periodic stabilization; learning about new nodes during the lookup
+ process sometimes eliminates the need for stabilization; and parallel
+ lookups reduce latency due to timeouts more effectively than faster
+ stabilization. Similarly, Zhuang, et al. compared keep-alive
+ algorithms for DHT failure detection [292]. Such algorithmic
+ comparisons can significantly improve the dependability of DHT
+ designs.
+
+ In Figure 2, we propose a taxonomy for the topology maintenance
+ algorithms that influence dependability. The algorithms can be
+ classified by how nodes join and leave, how they first detect
+ failures, how they share information about topology updates, and how
+ they react when they receive information about topology updates.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 18]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ Normal Updates
+ Joins (passive; active) [293]
+ Leaves (passive; active) [293]
+
+ Fault Detection [292]
+ Maintenance
+ Proactive (periodic or keep-alive probes)
+ Reactive (correction-on-use, correction-on-failure) [294]
+ Report
+ Negative (all dead nodes, nodes recently failed)
+ Positive (all live nodes; nodes recently recovered) [292]
+
+ Topology Sharing: yes/ no [292]
+ Multicast Tree (explicit, implicit) [267, 295]
+ Gossip (timeouts; number of contacts) [39]
+
+ Corrective Action
+ Routing
+ Rerouting actions
+ (reroute once; route in parallel [291]; reject)
+ Routing timeouts
+ (TCP-style, virtual coordinates) [296]
+ Topology
+ Update action (evict/ replace/ tag node)
+ Update timeliness (immediate, periodic[296], delayed [297])
+
+ Figure 2: Topology Maintenance in Distributed Hash Tables
+
+3.2.3. Ephemeral or Stable Nodes -- O(log n) or O(1) Hops
+
+ Observation C: Most DHTs use O(log n) geometries to suit ephemeral
+ nodes. The O(1) hop DHTs suit stable nodes and deserve more research
+ attention.
+
+ Most of the DHTs in Section 3.5 assume that nodes are ephemeral, with
+ expected lifetimes of one to two hours. Therefore, they mostly use
+ an O(log n) geometry. The common assumption is that maintenance of
+ full routing tables in the O(1) hop DHTs will consume excessive
+ bandwidth when nodes are continually joining and leaving. The
+ corollary is that, when they run on stable infrastructure servers
+ [298], most of the DHTs in Section 3.5 are less than optimal --
+ lookups take many more hops than necessary, wasting latency and
+ bandwidth budgets. The O(1) hop DHTs suit stable deployments and
+ high lookup rates. For a churning 1024-node network, Li, et al.
+ concluded that OneHop is superior to Chord, Tapestry, Kademlia, and
+ Kelips in terms of latency and lookup success rate [291]. For a
+ 3000-node network, they concluded that "OneHop is only preferable to
+ Chord when the deployment scenario allows a communication cost
+
+
+
+Risson & Moors Informational [Page 19]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ greater than 20 bytes per node per second" [291]. This apparent
+ limitation needs to be put in context. They assumed that each node
+ issues only one lookup every 10 minutes and has a lifetime of only 60
+ minutes. It seems reasonable to expect that in some deployments,
+ nodes will have a lifetime of weeks or more, a maintenance bandwidth
+ of tens of kilobits per second, and a load of hundreds of lookups per
+ second. O(1) hop DHTs are superior in such situations. OneHop can
+ scale at least to many tens of thousands of nodes [267]. The recent
+ O(1) hop designs [267, 295] are vastly outnumbered by the O(log n)
+ DHTs in Section 3.5. Research on the algorithms of Figure 2 will
+ also yield improvements in the dependability of the O(1) hop DHTs.
+
+3.2.4. Simulation and Proof
+
+ Observation D: Although not yet a mature science, the study of DHT
+ dependability is helped by recent simulation and formal development
+ tools.
+
+ While there are recent reference architectures [294, 298], much of
+ the DHT literature in Section 3.5 does not lend itself to repeatable,
+ comparative studies. The best comparative work to date [291] relies
+ on the Peer-to-Peer Simulator (P2PSIM) [299]. At the time of
+ writing, it supports more DHT geometries than any other simulator.
+ As the study of DHTs matures, we can expect to see the simulation
+ emphasis shift from geometric comparison to a comparison of the
+ algorithms of Figure 2.
+
+ P2P correctness proofs generally rely on less-than-complete formal
+ specifications of system invariants and events [7, 45, 300]. Li and
+ Plaxton expressed concern that "when many joins and leaves happen
+ concurrently, it is not clear whether the neighbor tables will remain
+ in a 'good' state" [47]. While acknowledging that guaranteeing
+ consistency in a failure-prone network is impossible, Lynch, Malkhi,
+ et al. sketched amendments to the Chord algorithm to guarantee
+ atomicity [301]. More recently, Gilbert, Lynch, et al. gave a new
+ algorithm for atomic read/write memory in a churning distributed
+ network, suggesting it to be a good match for P2P [302]. Lynch and
+ Stoica show in an enhancement to Chord that lookups are provably
+ correct when there is a limited rate of joins and failures [303].
+ Fault Tolerant Active Rings is a protocol for active joins and leaves
+ that was formally specified and proven using B-method tools [304]. A
+ good starting point for a formal DHT development would be the
+ numerous informal API specifications [22, 305, 306]. Such work could
+ be informed by other efforts to formally specify routing invariants
+ [307, 308].
+
+
+
+
+
+
+Risson & Moors Informational [Page 20]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+3.3. Latency
+
+ The key metrics for DHT latency are:
+
+ 1) Shortest-Path Distance and Diameter. In graph theory, the
+ shortest-path distance is the minimum number of edges in any path
+ between two vertices of the graph. Diameter is the largest of all
+ shortest-path distances in a graph [309]. Networking synonyms for
+ distance on a DHT are "hop count" and "lookup length".
+
+ 2) Latency and Latency Stretch. Two types of latency are relevant
+ here -- network-layer latency and overlay latency. Network-layer
+ latency has been referred to as "proximity" or "locality" [24].
+ Stretch is the cost of an overlay path between two nodes, divided
+ by the cost of the direct network path between those nodes [310].
+ Latency stretch is also known as the "relative delay penalty"
+ [311].
+
+3.3.1. Hop Count and the O(1)-Hop DHTs
+
+ Hop count gives an approximate indication of path latency. O(1)-hop
+ DHTs have path latencies lower than the O(log n)-hop DHTs [291].
+ This significant advantage is often overlooked on account of concern
+ about the messaging costs to maintain large routing tables (Section
+ 3.2.3). Such concern is justified when the mean node lifetime is
+ only a few hours and the mean lookup interval per node is more than a
+ few seconds (the classic profile of a P2P file-sharing node).
+ However, for a large, practical operating range (node lifetimes of
+ days or more, lookup rates of over tens of lookups per second per
+ node, up to ~100,000 nodes), the total messaging cost in O(1) hop
+ DHTs is lower than in O(log n) DHTs [312]. Lookups and routing table
+ maintenance contribute to the total messaging cost. If a deployment
+ fits this operating range, then O(1)-hop DHTs will give lower path
+ latencies and lower total messaging costs. An additional merit of
+ the O(1)-hop DHTs is that they yield lower lookup failure rates than
+ their O(log N)-hop counterparts [291].
+
+ Low hop count can be achieved in two ways: each node has a large O(N)
+ index of nodes; or the object references can be replicated on many
+ nodes. Beehive [313], Kelips [39], LAND [310], and Tulip [314] are
+ examples of the latter category. Beehive achieves O(1) hops on
+ average and O(log n) hops in the worst case, by proactive replication
+ of popular objects. Kelips replicates the 'file index'. It incurs
+ O(sqrt(N)) storage costs for both the node index and the file index.
+ LAND uses O(log n) reference pointers for each stored object and an
+ O(log n) index to achieve a worst-case 1+e stretch, where 0<e. The
+ Kelips-like Tulip [314] requires 2 hops per lookup. Each node
+
+
+
+
+Risson & Moors Informational [Page 21]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ maintains 2sqrt(N)log(N) links to other nodes and objects are
+ replicated on O(sqrt(N)) nodes.
+
+ The DHTs with a large O(N) node index can be divided into two groups:
+ those for which the index is always O(N); and those for which the
+ index opportunistically ranges from O(log n) to O(N). Linear Hashing
+ (LH*) servers [52], OneHop [267], and 1h-Calot [295] fall into the
+ former category. EpiChord [315] and Accordion [316] are examples of
+ the latter.
+
+3.3.2. Proximity and the O(log n)-Hop DHTs
+
+ If one chooses not to use single-hop DHTs, hop count is a weak
+ indicator of end-to-end path latency. Some hops may incur large
+ delays because of intercontinental or satellite links. Consequently,
+ numerous DHT designs minimize path latency by considering the
+ proximity of nodes. Gummadi, et al. classified the proximity methods
+ as follows [24]:
+
+ 1) Proximity Neighbor Selection (PNS). The nodes in the routing
+ table are chosen based on the latency of the direct hop to those
+ nodes. The latency may be explicitly measured [317], or it may be
+ estimated using one of several synthetic coordinate systems [150,
+ 154, 318]. As a lower bound on PNS performance, Dabek, et al.
+ showed that lookups on O(log n) DHTs take at least 1.5 times the
+ average roundtrip time of the underlying network [154].
+
+ 2) Proximity Route Selection (PRS). At lookup time, the choice of
+ the next-hop node relies on the latency of the direct hop to that
+ node. PRS is less effective than PNS, though it may complement it
+ [24]. Some of the routing geometries in Section 3.5 do not
+ support PNS and/or PRS [24].
+
+ 3) Proximity Identifier Selection (PIS). Node identifiers indicate
+ geographic position. PIS frustrates load balancing, increases the
+ risk of correlated failures, and is not often used [24].
+
+ The proximity study by Gummadi, et al. assumed recursive routing,
+ though they suggested that PNS would also be superior to PRS with
+ iterative routing [24]. Dabek, et al. found that recursive lookups
+ take 0.6 times as long as iterative lookups [150].
+
+ Beyond the explicit use of proximity information, redundancy can help
+ to avoid slow paths and servers. One may increase the number of
+ replicas [150], use parallel lookups [291, 316], use alternate routes
+ on failure [150], or use multiple gateway nodes to enter the DHT
+ [317].
+
+
+
+
+Risson & Moors Informational [Page 22]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+3.4. Multicasting
+
+3.4.1. Multicasting vs. Broadcasting
+
+ "Multicasting" here means sending a message to a subset of an
+ overlay's nodes. Nodes explicitly join and leave this subset, called
+ a "multicast group". "Broadcasting" here is a special case of
+ multicasting in which a message is sent to all nodes in the overlay.
+ Broadcasting relies on overlay membership messages -- it does not
+ need extra group membership messaging. Castro, et al. said
+ multicasting on structured overlays is either "flooding" (one overlay
+ per group) or "tree-based" (one tree per group) [319]. These are
+ synonyms for broadcasting and multicasting respectively.
+
+ The first DHT-based designs for multicasting were CAN multicast
+ [320], Scribe [241], Bayeux [242], and i3 [231]. They were based on
+ CAN [8], Pastry [2], Tapestry [31], and Chord [7] respectively. El-
+ Ansary, et al. devised the first DHT-based broadcasting scheme [321].
+ It was based on Chord.
+
+ Multicast trees can be constructed using reverse-path forwarding or
+ forward-path forwarding. Scribe uses reverse-path forwarding [241].
+ Bayeux uses forward-path forwarding [242]. Borg, a multicast design
+ based on Pastry, uses a combination of forward-path and reverse-path
+ forwarding to minimize latency [237].
+
+3.4.2. Motivation for DHT-based Multicasting
+
+ Multicasting complements DHT search capability. DHTs naturally
+ support exact match queries. With multicasting, they can support
+ more complex queries. Multicasting also enables the dissemination
+ and collection of global information.
+
+ Consider, for example, aggregation queries like minimum, maximum,
+ count, sum, and average (Section 5.4). A node at the root of a
+ dissemination tree might multicast such a query [322]. The leaf
+ nodes return local results towards the root node. Successive parents
+ aggregate the result so that eventually the root node can compute the
+ global result. Such queries may help to monitor the capacity and
+ health of the overlay itself.
+
+ Why bother with structured overlays for multicasting? In Section
+ 2.1, we saw that Gnutella can multicast complex queries without them
+ [4]. Castro, et al. posed the question, "Should we build Gnutella on
+ a structured overlay?" [259]. While acknowledging that their study
+ was preliminary, they did conclude that "we see no reason to build
+ Gnutella on top of an unstructured overlay" [259]. The supposedly
+ high maintenance costs of structured overlays were outweighed by
+
+
+
+Risson & Moors Informational [Page 23]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ query cost savings. The structured overlay ensured that nodes were
+ only visited once during a complex query. It also helped to
+ accurately limit the total number of nodes visited. Pai, et al.
+ acknowledged that multicast trees based on structured overlays
+ contribute to simple routing rules, low delay and low delay variation
+ [323]. However, they opted for unstructured, gossip-based
+ multicasting for reliability reasons: data loss near the tree root
+ affects all subtended nodes; interior node failures must be repaired
+ quickly; interior nodes are obliged to disseminate more than their
+ fair share of traffic, giving leaf nodes a "free ride". The most
+ promising research direction is to improve on the Bimodal
+ Multicasting approach [324]. It combines the bandwidth efficiency
+ and low latency of structured, best-effort multicasting trees with
+ the reliability of unstructured gossip protocols.
+
+3.4.3. Design Issues
+
+ None of the early structured overlay multicast designs addressed all
+ of the following issues [325]:
+
+ 1) Heterogeneous Node Capacity. Nodes differ in their processing,
+ memory, and network capacity. Multicast throughput is largely
+ determined by the node with smallest throughput [325]. To limit
+ the multicasting load on a node, one might cap its out-degree. If
+ the same node receives further join requests, it refers them to
+ its children ("pushdown") [240]. Bharambe, et al. explored
+ several pushdown strategies but found them inadequate to deal with
+ heterogeneity [326]. They concluded that the heterogeneity issue
+ remains open, and should be addressed before deploying DHTs for
+ high-bandwidth multicasting applications. Independently, Zhang et
+ al. partially tackled heterogeneity by allowing nodes in their
+ CAM-Chord and CAM-Koorde designs to vary out-degree according to
+ the node's capacity [325]. However, they made no mention of the
+ "pushdown" issue -- they did not describe topology maintenance
+ when the out-degree limit is reached.
+
+ 2) Reliability (Dynamic Membership). If a multicast tree is to be
+ resilient, it must survive dynamic membership. There are several
+ ways to deal with dynamic membership: ensure that the root node of
+ the multicasting tree does not handle all requests to join or
+ leave the multicast group [242]; use multiple interior-node-
+ disjoint trees to avoid single points of failure in tree
+ structures [322]; and split the root node into several replicas
+ and partition members across them [241]. For example, Bayeux
+ requires the root node to track all group membership changes
+ whereas Scribe does not [241]. CAN-multicast uses a single,
+ well-known host to bootstrap the join operations [320]. The
+ earliest DHT-based broadcasting work by El-Ansary, et al. did not
+
+
+
+Risson & Moors Informational [Page 24]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ address the issue of dynamic membership [321]. Ghodsi, et al.
+ addressed it in a subsequent paper, though, giving two broadcast
+ algorithms that accommodate routing table inconsistencies [327].
+ One algorithm achieves a more optimal multicasting network at the
+ expense of greater correction overhead. Splitstream, based on
+ Scribe and Pastry, redundantly striped content across multiple
+ interior-node-disjoint multicast trees -- if one interior node
+ fails, then only one stripe is lost [240].
+
+ 3) Large Any-Source Multicast Groups. Any group member should be
+ allowed to send multicast messages. The group should scale to a
+ very large number of hosts. CAN-based multicast was the first
+ application-level multicast scheme to scale to groups of several
+ thousands of nodes without restricting the service model to a
+ single source [320]. Bayeux scales to large groups but has a
+ single root node for each multicast group. It supports the any-
+ source model only by having the root node operate as a reflector
+ for multiple senders [242].
+
+3.5. Routing Geometries
+
+ In Sections 3.5.1 to 3.5.6, we introduce the main geometries for
+ simple key lookup and survey their robustness mechanisms.
+
+3.5.1. Plaxton Trees (Pastry, Tapestry)
+
+ Work began in March 2000 on a structured, fault-tolerant, wide-area
+ Dynamic Object Location and Routing (DOLR) system called Tapestry [6,
+ 155]. While DHTs fix replica locations, a DOLR API enables
+ applications to control object placement [31]. Tapestry's basic
+ location and routing scheme follows Plaxton, Rajaraman, and Richa
+ (PRR) [30], but it remedies PRR's robustness shortcomings described
+ in Section 3.1. Whereas each object has one root node in PRR,
+ Tapestry uses several to avoid a single point of failure. Unlike
+ PRR, it allows nodes to be inserted and deleted. Whereas PRR
+ required a total ordering of nodes, Tapestry uses 'surrogate routing'
+ to incrementally choose root nodes. The PRR algorithm does not
+ address congestion, but Tapestry can put object copies close to nodes
+ generating high query loads. PRR nodes only know of the nearest
+ replica, whereas Tapestry nodes enable selection from a set of
+ replicas (for example, to retrieve the most up to date). To detect
+ routing faults, Tapestry uses TCP timeouts and UDP heartbeats for
+ detection, sequential secondary neighbours for rerouting, and a
+ 'second chance' window so that recovery can occur without the
+ overhead of a full node insertion. Tapestry's dependability has been
+ measured on a testbed of about 100 machines and on simulations of
+
+
+
+
+
+Risson & Moors Informational [Page 25]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ about 1000 nodes. Successful routing rates and maintenance
+ bandwidths were measured during instantaneous failures and ongoing
+ churn [31].
+
+ Pastry, like Tapestry, uses Plaxton-like prefix routing [2]. As in
+ Tapestry, Pastry nodes maintain O(log n) neighbours and route to a
+ target in O(log n) hops. Pastry differs from Tapestry only in the
+ method by which it handles network locality and replication [2].
+ Each Pastry node maintains a 'leaf set' and a 'routing table'. The
+ leaf set contains l/2 node IDs on either side of the local node ID in
+ the node ID space. The routing table, in row r, column c, points to
+ the node ID with the same r-digit prefix as the local node, but with
+ an r+1 digit of c. A Pastry node periodically probes leaf set and
+ routing table nodes, with periodicity of Tls and Trt and a timeout
+ Tout. Mahajan, Castry, et al. analyzed the reliability versus
+ maintenance cost trade-offs in terms of the parameters l, Tls, Trt,
+ and Tout [328]. They concluded that earlier concerns about excessive
+ maintenance cost in a churning P2P network were unfounded, but
+ suggested follow-up work for a wider range of reliability targets,
+ maintenance costs, and probe periods. Rhea Geels, et al. concluded
+ that existing DHTs fail at high churn rates [329]. Building on a
+ Pastry implementation from Rice University, they found that most
+ lookups fail to complete when there is excessive churn. They
+ conjectured that short-lived nodes often leave the network with
+ lookups that have not yet timed out, but no evidence was provided to
+ confirm the theory. They identified three design issues that affect
+ DHT performance under churn: reactive versus periodic recovery of
+ peers; lookup timeouts; and choice of nearby neighbours. Since
+ reactive recovery was found to add traffic to already congested
+ links, the authors used periodic recovery in their design. For
+ lookup timeouts, they advocated an exponentially weighted moving
+ average of each neighbour's response time, over alternative fixed
+ timeout or 'virtual coordinate' schemes. For selection of nearby
+ neighbours, they found that 'global sampling' was more effective than
+ simply sampling a 'neighbour's neighbours' or 'inverse neighbours'.
+ Castro, Costa, et al. have refuted the suggestion that DHTs cannot
+ cope with high churn rates [330]. By implementing methods for
+ continuous detection and repair, their MSPastry implementation
+ achieved shorter routing paths and a maintenance overhead of less
+ than half a message per second per node.
+
+ There have been more recent proposals based on these early Plaxton-
+ like schemes. Kademlia uses a bit-wise exclusive or (XOR) metric for
+ the 'distance' between 160-bit node identifiers [45]. Each node
+ keeps a list of contact nodes for each section of the node space that
+ is between 2^i and 2^(i+1) from itself (0.i<160). Longer-lived nodes
+ are deliberately given preference on this list -- it has been found
+ in Gnutella that the longer a node has been active, the more likely
+
+
+
+Risson & Moors Informational [Page 26]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ it is to remain active. Like Kademlia, Willow uses the XOR metric
+ [32]. It implements a Tree Maintenance Protocol to 'zipper' together
+ broken segments of a tree. Where other schemes use DHT routing to
+ inefficiently add new peers, Willow can merge disjoint or broken
+ trees in O(log n) parallel operations.
+
+3.5.2. Rings (Chord, DKS)
+
+ Chord is the prototypical DHT ring, so we first sketch its operation.
+ Chord maps nodes and keys to an identifier ring [7, 34]. Chord
+ supports one main operation: find a node with the given key. It uses
+ Consistent Hashing (Section 3.1) to minimize disruption of keys when
+ nodes join and leave the network. However, Chord peers need only
+ track O(log n) other peers, not all peers as in the original
+ consistent hashing proposal [49]. It enables concurrent node
+ insertions and deletions, improving on PRR. Compared to Pastry, it
+ has a simpler join protocol. Each Chord peer tracks its predecessor,
+ a list of successors, and a finger table. Using the finger table,
+ each hop is at least half the remaining distance around the ring to
+ the target node, giving an average lookup hop count of (1/2)log
+ n(base 2). Each Chord node runs a periodic stabilization routine
+ that updates predecessor and successor pointers to cater to newly
+ added nodes. All successors of a given node need to fail for the
+ ring to fail. Although a node departure could be treated the same as
+ a failure, a departing Chord node first notifies the predecessor and
+ successors, so as to improve performance.
+
+ In their definitive paper, Chord's inventors critiqued its
+ dependability under churn [34]. They provided proofs on the
+ behaviour of the Chord network when nodes in a stable network fail,
+ stressing that such proofs are inadequate in the general case of a
+ perpetually churning network. An earlier paper had posed the
+ question, "For lookups to be successful during churn, how regularly
+ do the Chord stabilization routines need to run?" [331]. Stoica,
+ Morris, et al. modeled a range of node join/departure rates and
+ stabilization periods for a Chord network of 1000 nodes. They
+ measured the number of timeouts (caused by a finger pointing to a
+ departed node) and lookup failures (caused by nodes that temporarily
+ point to the wrong successor during churn). They also modeled the
+ 'lookup stretch', the ratio of the Chord lookup time to optimal
+ lookup time on the underlying network. They demonstrated the latency
+ advantage of recursive lookups over iterative lookups, but there
+ remains room for delay reduction. For further work, the authors
+ proposed to improve resilience to network partitions, using a small
+ set of known nodes or 'remembered' random nodes. To reduce the
+ number of messages per lookup, they suggested an increase in the size
+ of each step around the ring, accomplished via a larger number of
+ fingers at each node. Much of the paper assumed independent, equally
+
+
+
+Risson & Moors Informational [Page 27]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ likely node failures. Analysis of correlated node failures, caused
+ by massive site or backbone failures, will be more important in some
+ deployments. The paper did not attempt to recommend a fixed optimal
+ stabilization rate. Liben-Nowell, Balakrishnan, et al. had suggested
+ that optimum stabilization rate might evolve according to
+ measurements of peers' behaviour [331] -- such a mechanism has yet to
+ be devised.
+
+ Alima, El-Ansary, et al. considered the communication costs of
+ Chord's stabilization routines, referred to as 'active correction',
+ to be excessive [332]. Two other robustness issues also motivated
+ their Distributed K-ary Search (DKS) design, which is similar to
+ Chord. Firstly, the total system should evolve for an optimum
+ balance between the number of peers, the lookup hop count, and the
+ size of the routing table. Secondly, lookups should be reliable --
+ P2P algorithms should be able to guarantee a successful lookup for
+ key/value pairs that have been inserted into the system. A similar
+ lookup-correctness issue was raised elsewhere by one of Chord's
+ authors; "Is it possible to augment the data structure to work even
+ when nodes (and their associated finger lists) just disappear?" [333]
+ Alima, El-Ansary, et al. asserted that P2Ps using active correction,
+ like Chord, Pastry, and Tapestry, are unable to give such a
+ guarantee. They propose an alternate 'correction-on-use' scheme,
+ whereby expired routing entries are corrected by information
+ piggybacking lookups and insertions. A prerequisite is that lookup
+ and insertion rates are significantly higher than node arrival,
+ departure, and failure rates. Correct lookups are guaranteed in the
+ presence of simultaneous node arrivals or up to f concurrent node
+ departures, where f is configurable.
+
+3.5.3. Tori (CAN)
+
+ Ratnasamy, Francis, et al. developed the Content-Addressable Network
+ (CAN), another early DHT widely referenced alongside Tapestry,
+ Pastry, and Chord [8, 334]. It is arranged as a virtual
+ d-dimensional Cartesian coordinate space on a d-torus. Each node is
+ responsible for a zone in this coordinate space. The designers used
+ a heuristic thought to be important for large, churning P2P networks:
+ keep the number of neighbours independent of system size.
+ Consequently, its design differs significantly from Pastry, Tapestry,
+ and Chord. Whereas they have O(log n) neighbours per node and O(log
+ n) hops per lookup, CAN has O(d) neighbours and O(dn^(1/d)) hop
+ count. When CAN's system-wide parameter d is set to log(n), CAN
+ converges to their profile. If the number of nodes grows, a major
+ rearrangement of the CAN network may be required [151]. The CAN
+ designers considered building on PRR, but opted for the simple, low-
+ state-per-node CAN algorithm instead. They had reasoned that a PRR-
+ based design would not perform well under churn, given node
+
+
+
+Risson & Moors Informational [Page 28]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ departures and arrivals would affect a logarithmic number of nodes
+ [8].
+
+ There have been preliminary assessments of CAN's resilience. When a
+ node leaves the CAN in an orderly fashion, it passes its own Virtual
+ ID (VID), its neighbours' VIDs and IP addresses, and its key/value
+ pairs to a takeover node. If a node leaves abruptly, its neighbours
+ send recovery messages towards the designated takeover node. CAN
+ ensures the recovery messages reach the takeover node, even if nodes
+ die simultaneously, by maintaining a VID chain with Chord's
+ stabilization algorithm. Some initial 'proof of concept' resilience
+ simulations were run using the Network Simulator (NS) [335] for up to
+ a few hundred nodes. Average hop counts and lookup failure
+ probabilities were plotted against the total number of nodes for
+ various node failure rates [8]. The CAN team documented several open
+ research questions pertaining to state/hop count trade-offs,
+ resilience, load, locality, and heterogeneous peers [44, 334].
+
+3.5.4. Butterflies (Viceroy)
+
+ Viceroy approximates a butterfly network [46]. It generally has
+ constant degree like CAN. Like Chord, Tapestry, and Pastry, it has
+ logarithmic diameter. It improves on these systems, inasmuch as its
+ diameter is better than CAN and its degree is better than Chord,
+ Tapestry, and Pastry. As with most DHTs, it utilizes Consistent
+ Hashing. When a peer joins the Viceroy network, it takes a random
+ but permanent 'identity' and selects its 'level' within the network.
+ Each peer maintains general ring pointers ('predecessor' and
+ 'successor'), level ring pointers ('nextonlevel' and 'prevonlevel'),
+ and butterfly pointers ('left', 'right', and 'up'). When a peer
+ departs, it normally passes its key pairs to a successor, and
+ notifies other peers to find a replacement peer.
+
+ The Viceroy paper scoped out the issue of robustness. It explicitly
+ assumed that peers do not fail [46]. It assumed that join and leave
+ operations do not overlap, so as to avoid the complication of
+ concurrency mechanisms like locking. Kaashoek and Karger were
+ somewhat critical of Viceroy's complexity [37]. They also pointed to
+ its fault-tolerance blind spot. Li and Plaxton suggested that such
+ constant-degree algorithms deserve further consideration [47]. They
+ offered several pros and cons. The limited degree may increase the
+ risk of a network partition, or inhibit use of local neighbours (for
+ the simple reason that there are less of them). On the other hand,
+ it may be easier to reason about the correctness of fixed-degree
+ networks. One of the Viceroy authors has since proposed constant-
+ degree peers in a two-tier, locality-aware DHT [310] -- the lower
+ degree maintained by each lower-tier peer purportedly improves
+ network adaptability. Another Viceroy author has since explored an
+
+
+
+Risson & Moors Informational [Page 29]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ alternative bounded-degree graph for P2P, namely the de Bruijn graph
+ [336].
+
+3.5.5. de Bruijn (D2B, Koorde, Distance Halving, ODRI)
+
+ De Bruijn graphs have had numerous refinements since their inception
+ [337, 338]. Schlumberger was the first to use them for networking
+ [339]. Two research teams independently devised the 'generalized' de
+ Bruijn graph that accommodates a flexible number of nodes in the
+ system [340, 341]. Rowley and Bose studied fault-tolerant rings
+ overlaid on the de Bruijn graph [342]. Lee, Liu, et al. devised a
+ two-level de Bruijn hierarchy, whereby clusters of local nodes are
+ interconnected by a second-tier ring [343].
+
+ Many of the algorithms discussed previously are 'greedy' in that each
+ time a query is forwarded, it moves closer to the destination.
+ Unfortunately, greedy algorithms are generally suboptimal -- for a
+ given degree, the routing distance is longer than necessary [344].
+ Unlike these earlier P2P designs, de Bruijn graphs of degree k
+ achieve an asymptotically optimal diameter log n, where n is the
+ number of nodes in the system and k can be varied to improve
+ resilience. If there are O(log n) neighbours per node, the de Bruijn
+ hop count is O(log n/log log n). To illustrate de Bruijn's practical
+ advantage, consider a network with one million nodes of degree 20:
+ Chord has a diameter of 20, while de Bruijn has a diameter of 5 [36].
+ In 2003, there were a quick succession of de Bruijn proposals -- D2B
+ [345], Koorde [37], Distance Halving [132, 336], and the Optimal
+ Diameter Routing Infrastructure (ODRI) [36].
+
+ Fraigniaud and Gauron began the D2B design by laying out an informal
+ problem statement: keys should be evenly distributed; lookup latency
+ should be small; traffic load should be evenly distributed; updates
+ of routing tables and redistribution of keys should be fast when
+ nodes join or leave the network. They defined a node's "congestion"
+ to be the probability that a lookup will traverse it. Apart from its
+ optimal de Bruijn diameter, they highlighted D2B's merits: a constant
+ expected update time when nodes join and leave (O(log n) with high
+ probability (w.h.p.)); the expected node congestion is O((log n)/n)
+ (O(((log n)^2)/n) w.h.p.) [345]. D2B's resilience was discussed only
+ in passing.
+
+ Koorde extends Chord to attain the optimal de Bruijn degree/diameter
+ trade-off above [37]. Unlike D2B, Koorde does not constrain the
+ selection of node identifiers. Also unlike D2B, it caters to
+ concurrent joins, by extension of Chord's functionality. Kaashoek
+ and Karger investigated Koorde's resilience to a rather harsh failure
+ scenario: "in order for a network to stay connected when all nodes
+ fail with probability of 1/2, some nodes must have degree
+
+
+
+Risson & Moors Informational [Page 30]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ omega(log n)" [37]. They sketched a mechanism to increase Koorde's
+ degree for this more stringent fault tolerance, losing de Bruijn's
+ constant degree advantage. Similarly, to achieve a constant-factor
+ load balance, Koorde would have to sacrifice its degree optimality.
+ They suggested that the ability to trade the degree, and hence the
+ maintenance overhead, against the expected hop count may be important
+ for churning systems. They also identified an open problem: find a
+ load-balanced, degree optimal DHT. Datta, Girdzijauskas, et al.
+ showed that for arbitrary key distributions, de Bruijn graphs fail to
+ meet the dual goals of load balancing and search efficiency [346].
+ They posed the question, "(Is there) a constant routing table sized
+ DHT which meets the conflicting goals of storage load balancing and
+ search efficiency for an arbitrary and changing key distribution?"
+
+ Distance Halving was also inspired by de Bruijn [336] and shares its
+ optimal diameter. Naor and Wieder argued for a two-step
+ "continuous-discrete" approach for its design. The correctness of
+ its algorithms is proven in a continuous setting. The algorithms are
+ then mapped to a discrete space. The source x and target y are
+ points on the continuous interval [0,1). Data items are hashed to
+ this same interval. <str> is a string that determines how messages
+ leave any point on the ring: if bit t of the string is 0, the left
+ leg is taken; if it is 1, the right leg is taken. <str> increases by
+ one bit each hop, giving a sequence by which to step around the ring.
+ A lookup has two phases. In the first, the lookup message containing
+ the source, target, and the random string hops toward the midpoint of
+ the source and target. On each hop, the distance between <str>(x)
+ and <str>(y) is halved, by virtue of the specific 'left' and 'right'
+ functions. In the second phase, the message steps 'backward' from
+ the midpoint to the target, removing the last bit in <str> at each
+ hop. 'Join' and 'leave' algorithms were outlined but there was no
+ consideration of recovery times or message load on churn. Using the
+ Distance Halving properties, the authors devised a caching scheme to
+ relieve congestion in a large P2P network. They have also modified
+ the algorithm to be more robust in the presence of random faults
+ [132].
+
+ Solid comparisons of DHT resilience are scarce, but Loguinov, Kumar,
+ et al. give just that in their ODRI paper [36]. They compare Chord,
+ CAN, and de Bruijn in terms of routing performance, graph expansion
+ and clustering. At the outset, they give the optimal diameter (the
+ maximum hop count between any two nodes in the graph) and average hop
+ count for graphs of fixed degree. De Bruijn graphs converge to both
+ optima, and outperform Chord and CAN on both counts. These optima
+ impact both delay and aggregate lookup load. They present two
+ clustering measures (edge expansion and node expansion), which are
+ interesting for resilience. Unfortunately, after decades of de
+ Bruijn research, they have no exact solution. De Bruijn was shown to
+
+
+
+Risson & Moors Informational [Page 31]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ be superior in terms of path overlap - "de Bruijn automatically
+ selects backup paths that do not overlap with the best shortest path
+ or with each other" [36].
+
+3.5.6. Skip Graphs
+
+ Skip Graphs have been pursued by two research camps [38, 41]. They
+ augment the earlier Skip Lists [347, 348]. Unlike earlier balanced
+ trees, the Skip List is probabilistic -- its insert and delete
+ operations do not require tree rearrangements and so are faster by a
+ constant factor. The Skip List consists of layers of ordered linked
+ lists. All nodes participate in the bottom layer 0 list. Some of
+ these nodes participate in the layer 1 list with some fixed
+ probability. A subset of layer 1 nodes participate in the layer 2
+ list, and so on. A lookup can proceed quickly through the list by
+ traversing the sparse upper layers until it is close to, or at, the
+ target. Unfortunately, nodes in the upper layers of a Skip List are
+ potential hot spots and single points of failure. Unlike Skip Lists,
+ Skip Graphs provide multiple lists at each level for redundancy, and
+ every node participates in one of the lists at each level.
+
+ Each node in a Skip Graph has theta(log n) neighbours on average,
+ like some of the preceding DHTs. The Skip Graph's primary edge over
+ the DHTs is its support for prefix and proximity search. DHTs hash
+ objects to a random point in the graph. Consequently, they give no
+ guarantees over where the data is stored. Nor do they guarantee that
+ the path to the data will stay within the one administration as far
+ as possible [38]. Skip graphs, on the other hand, provide for
+ location-sensitive name searches. For example, to find the document
+ docname on the node user.company.com, the Skip Graph might step
+ through its ordered lists for the prefix com.company.user [38].
+ Alternatively, to find an object with a numeric identifier, an
+ algorithm might search the lowest layer of the Skip Graph for the
+ first digit, the next layer for the next digit, in the same vein
+ until all digits are resolved. Being ordered, Skip Graphs also
+ facilitate range searches. In each of these examples, the Skip Graph
+ can be arranged such that the path to the target, as far as possible,
+ stays within an administrative boundary. If one administration is
+ detached from the rest of the Skip Graph, routing can continue within
+ each of the partitions. Mechanisms have been devised to merge
+ disconnected segments [157], though at this stage, segments are re-
+ merged one at a time. A parallel merge algorithm has been flagged
+ for future work.
+
+ The advantages of Skip Graphs come at a cost. To be able to provide
+ range queries and data placement flexibility, Skip Graph nodes
+ require many more pointers than their DHT counterparts. An increased
+ number of pointers implies increased maintenance traffic. Another
+
+
+
+Risson & Moors Informational [Page 32]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ shortcoming of at least one of the early proposals was that no
+ algorithm was given to assign keys to machines. Consequently, there
+ are no guarantees on system-wide load balancing or on the distance
+ between adjacent keys [100]. Aspnes, Kirsch, et al. have recently
+ devised a scheme to reduce the inter-machine pointer count from
+ O(mlogm), where m is the number of data elements, to O(nlog n), where
+ n is the number of nodes [100]. They proposed a two-layer scheme --
+ one layer for the Skip Graph itself and the second 'bucket layer'.
+ Each machine is responsible for a number of buckets and each bucket
+ elects a representative key. Nodes locally adjust their load. They
+ accept additional keys if they are below their threshold or disperse
+ keys to nearby nodes if they are above threshold. There appear to be
+ numerous open issues: simulations have been done but analysis is
+ outstanding; mechanisms are required to handle the arrival and
+ departure of nodes; there were only brief hints as to how to handle
+ nodes with different capacities.
+
+4. Semantic Index
+
+ Semantic indexes capture object relationships. While the semantic-
+ free methods (DHTs) have firmer theoretic foundations and guarantee
+ that a key can be found if it exists, they do not capture the
+ relationships between the document name and its content or metadata
+ on their own. Semantic P2P designs do. However, since their design
+ is often driven by heuristics, they may not guarantee that scarce
+ items will be found.
+
+ So what might the semantically indexed P2Ps add to an already crowded
+ field of distributed information architectures? At one extreme,
+ there are the distributed relational database management systems
+ (RDBMSs), with their strong consistency guarantees [284]. They
+ provide strong data independence, the flexibility of SQL queries, and
+ strong transactional semantics -- Atomicity, Consistency, Isolation
+ and Durability (ACID) [349]. They guarantee that the query response
+ is complete -- all matching results are returned. The price is
+ performance. They scale to perhaps 1000 nodes, as evidenced in
+ Mariposa [350, 351], or require query caching front ends to constrain
+ the load [284]. Database research has "arguably been cornered into
+ traditional, high-end, transactional applications" [72]. Then there
+ are distributed file systems, like the Network File System (NFS) or
+ the Serverless Network File Systems (xFS), with little data
+ independence, low-level file retrieval interfaces, and varied
+ consistency [284]. Today's eclectic mix of Content Distribution
+ Networks (CDNs) generally deload primary servers by redirecting Web
+ requests to a nearby replica. Some intercept the HTTP requests at
+ the DNS level and then use consistent hashing to find a replica [23].
+ Since this same consistent hashing was a forerunner to the DHT
+
+
+
+
+Risson & Moors Informational [Page 33]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ approaches above, CDNs are generally constrained to the same simple
+ key lookups.
+
+ The opportunity for semantically indexed P2Ps, then, is to provide:
+
+ a) graduated data independence, consistency, and query flexibility,
+ and
+
+ b) probabilistically complete query responses, across
+
+ c) very large numbers of low-cost, geographically distributed,
+ dynamic nodes.
+
+4.1. Keyword Lookup
+
+ P2P keyword lookup is best understood by considering the structure of
+ the underlying index and the algorithms by which queries are routed
+ over that index. Figure 3 summarizes the following paragraphs by
+ classifying the keyword query algorithms, index structures, and
+ metrics. The research has largely focused on scalability, not
+ dependability. There have been very few studies that quantify the
+ impact of network churn. One exception is the work by Chawathe, et
+ al. on the Gia system [61]. Gia's combination of algorithms from
+ Figure 3 (receiver-based flow control, biased random walk, and one-
+ hop replication) gave 2-4 orders of magnitude improvement in query
+ success rates in churning networks.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 34]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ QUERY
+ Query routing
+ Flooding: Peers only index local files so queries must propagate
+ widely [4]
+ Policy-based: Choice of the next hop node: random; most/least
+ recently used; most files shared; most results [265, 352]
+ Random walks: Parallel [67] or biased random walks [61, 66]
+ Query forwarding
+ Iterative: Nodes perform iterative unicast searches of ultrapeers,
+ until the desired number of results is achieved. See Gnutella
+ UDP Extension for Scalable Searches (GUESS) [265, 353]
+ Recursive
+ Query flow control
+ Receiver-controlled: Receivers grant query tokens to senders, so
+ as to avoid overload [61]
+ Reactive: sender throttles queries when it notices receivers are
+ discarding packets [61, 66]
+ Dynamic Time To Live: In the Dynamic Query Protocol, the sender
+ adjusts the time-to-live on each iteration based on the number
+ of results received, the number of connections left, and the
+ number of nodes already theoretically reached by the search [354]
+
+ INDEX
+ Distribution
+ Compression: Leaf nodes periodically send ultrapeers compressed
+ query routing tables, as in the Query Routing Protocol [260]
+ One hop replication: Nodes maintain an index of content on their
+ nearest neighbors [61, 352]
+ Partitioning
+ By document [210]
+ By keyword: Use an inverted list to find a matching document,
+ either locally or at another peer [21]. Partition by keyword
+ sets [355]
+ By document and keyword: Also called Multi-Level Partitioning [21]
+
+ METRIC
+ Query load: Queries per second per node/link [65, 265]
+ Degree: The number of links per node [66, 352]. Early P2P networks
+ approximated power-law networks, where the number of nodes with L
+ links is proportional to L^(-k), where k is a constant [65]
+ Query delay: Reported in terms of time and hop count [61, 66]
+ Query success rate: The "Collapse Point" is the per-node query rate
+ at which the query success rate drops below 90% [61]. See
+ also [61, 265, 352].
+
+ Figure 3: Keyword Lookup in P2P Systems
+
+
+
+
+
+Risson & Moors Informational [Page 35]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+4.1.1. Gnutella Enhancements
+
+ Perhaps the most widely referenced P2P system for simple keyword
+ match is Gnutella [4]. Gnutella queries contain a string of
+ keywords. Gnutella peers answer when they have files whose names
+ contain all the keywords. As discussed in Section 2.1, early
+ versions of Gnutella did not forward the document index. Queries
+ were flooded and peers searched their own local indexes for filename
+ matches. An early review highlighted numerous areas for improvement
+ [65]. It was estimated that the query traffic alone from 50,000
+ early-generation Gnutella nodes would amount to 1.7% of the total
+ U.S. Internet backbone traffic at December 2000 levels. It was
+ speculated that high-degree Gnutella nodes would impede
+ dependability. An unnecessarily high percentage of Gnutella traffic
+ crossed Autonomous System (AS) boundaries -- a locality mechanism may
+ have found suitable nearby peers.
+
+ Fortunately, there have since been numerous enhancements within the
+ Gnutella Developer Forum. At the time of writing, it has been
+ reported that Gnutella has almost 350,000 unique hosts, of which
+ nearly 90,000 accept incoming connections [356]. One of the main
+ improvements is that an index of filename keywords, called the Query
+ Routing Table (QRT), can now be forwarded from 'leaf peers' to its
+ 'ultrapeers' [260]. Ultrapeers can then ensure that the leaves only
+ receive queries for which they have a match, dramatically reducing
+ the query traffic at the leaves. Ultrapeers can have connections to
+ many leaf nodes (~10-100) and a small number of other ultrapeers
+ (<10) [260]. Originally, a leaf node's QRT was not forwarded by the
+ parent ultrapeer to other ultrapeers. More recently, there has been
+ a proposal to distribute aggregated QRTs amongst ultrapeers [357].
+ To further limit traffic, QRTs are compressed by hashing, according
+ to the Query Routing Protocol (QRP) specification [281]. This same
+ specification claims QRP may reduce Gnutella traffic by orders of
+ magnitude, but cautions that simulation is required before mass
+ deployment. A known shortcoming of QRP was that the extent of query
+ propagation was independent of the popularity of the search terms.
+ The Dynamic Query Protocol addressed this [358]. It required leaf
+ nodes to send single queries to high-degree ultrapeers that adjust
+ the queries' time-to-live (TTL) bounds according to the number of
+ received query results. An earlier proposal, called the Gnutella UDP
+ Extension for Scalable Searches (GUESS) [353], similarly aimed to
+ reduce the number of queries for widely distributed files. GUESS
+ reuses the non-forwarding idea (Section 2). A GUESS peer repeatedly
+ queries single ultrapeers with a TTL of 1, with a small timeout on
+ each query to limit load. It chooses the number of iterations and
+ selects ultrapeers so as to satisfy its search needs. For
+ adaptability, a small number of experimental Gnutella nodes have
+
+
+
+
+Risson & Moors Informational [Page 36]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ implemented eXtensible Markup Language (XML) schemas for richer
+ queries [359, 360]. None of the above Gnutella proposals explicitly
+ assess robustness.
+
+ The broader research community has recently been leveraging aspects
+ of the Gnutella design. Lv, Ratnasamy, et al. exposed one assumption
+ implicit in some of the early DHT work -- that designs "such as
+ Gnutella are inherently not scalable, and therefore should be
+ abandoned" [66]. They argued that by making better use of the more
+ powerful peers, Gnutella's scalability issues could be alleviated.
+ Instead of its flooding mechanism, they used random walks. Their
+ preliminary design to bias random walks towards high capacity nodes
+ did not go as far as the ultrapeer proposals in that the indexes did
+ not move to the high-capacity nodes. Chawathe, Ratnasamy, et al.
+ chose to extend the Gnutella design with their Gia system, in
+ response to the perceived shortcomings of DHTs in Section 1.2 [61].
+ Compared to the early Gnutella designs, they incorporated several
+ novel features. They devise a topology adaptation algorithm so that
+ most peers are attached to high-degree peers. They use a random walk
+ search algorithm, in lieu of flooding, and bias the query load
+ towards higher-degree peers. For 'one-hop replication', they require
+ all nodes to keep pointers to content on adjacent peers. To
+ implement a receiver-controlled token-based flow control, a peer must
+ have a token from its neighbouring peer before it sends a query to
+ it. Chawathe, Ratnasamy, et al. show by simulations that the
+ combination of these features provides a scalability improvement of
+ three to five orders of magnitude over Gnutella "while retaining
+ significant robustness". The main robustness metrics they used were
+ the 'collapse point' query rate (the per-node query rate at which the
+ successful query rate falls below 90%) and the average hop count
+ immediately prior to collapse. Their comparison with Gnutella did
+ not take into account the Gnutella enhancements above -- this was
+ left as future work. Castro, Costa, and Rowstron argued that if
+ Gnutella were built on top of a structured overlay, then both the
+ query and overlay maintenance traffic could be reduced [259]. Yang,
+ Vinograd, et al. explore various policies for peer selection in the
+ GUESS protocol, since the issue is left open in the original proposal
+ [265]. For example, the peer initiating the query could choose peers
+ that have been "most recently used" or that have the "most files
+ shared". Various policy pitfalls are identified. For example, good
+ peers could be overloaded, victims of their own success.
+ Alternatively, malicious peers could encourage the querying peer to
+ try inactive peers. They conclude that a "most results" policy gives
+ the best balance of robustness and efficiency. Like Castro, Costa,
+ and Rowstron, they concentrated on the static network scenario.
+ Cholvi, Felber, et al. very briefly describe how similar "least
+ recently used" and "most often used" heuristics can be used by a peer
+ to select peer 'acquaintances' [352]. They were motivated by the
+
+
+
+Risson & Moors Informational [Page 37]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ congestion associated with Gnutella's TTL-limited flooding.
+ Recognizing that the busiest peers can quickly become overloaded
+ central hubs for the entire network, they limit the number of
+ acquaintances for any given peer to 25. They sketch a mechanism to
+ decrement a query's TTL multiple times when it traverses "interested
+ peers". In summary, these Gnutella-related investigations are
+ characterized by a bias for high-degree peers and very short directed
+ query paths, a disdain for flooding, and concern about excessive load
+ on the 'better' peers. Generally, the robustness analysis for
+ dynamic networks (content updates and node arrivals/departures)
+ remains open.
+
+4.1.2. Partition-by-Document, Partition-by-Keyword
+
+ One aspect of P2P keyword search systems has received particular
+ attention: should the index be partitioned by document or by keyword?
+ The issue affects scalability. To be partitioned by document, each
+ node has a local index of documents for which it is responsible.
+ Gnutella is a prime example. Queries are generally flooded in
+ systems partitioned by document. On the other hand, a peer may
+ assume responsibility for a set of keywords. The peer uses an
+ inverted list to find a matching document, either locally or at
+ another peer. If the query contains several keywords, inverted lists
+ may need to be retrieved from several different peers to find the
+ intersection [21]. The initial assessment by Li, Loo, et al. was
+ that the partition-by-document approach was superior [210]. For one
+ scenario of a full-text Web search, they estimated the communications
+ costs to be about six times higher than the feasible budget.
+ However, wanting to exploit prior work on inverted list intersection,
+ they studied the partition-by-keyword strategy. They proposed
+ several optimizations that put the communication costs for a
+ partition-by-keyword system within an order of magnitude of
+ feasibility. There had been a couple of prior papers that suggested
+ partitioned-by-keyword designs incorporate DHTs to map keywords to
+ peers [355, 361]. In Gnawali's Keyword-set Search System (KSS), the
+ index is partitioned by sets of keywords [355]. Terpstra, Behnel, et
+ al. point out that by keeping keyword pairs or triples, the number of
+ lists per document in KSS is squared or tripled [362]. Shi,
+ Guangwen, et al. interpreted the approximations of Li, Loo, et al. to
+ mean that neither approach is feasible on its own [21]. Their
+ Multi-Level Partitioning (MLP) scheme incorporates both partitioning
+ approaches. They arrange nodes into a group hierarchy, with all
+ nodes in the single 'level 0' group, and with the same nodes sub-
+ divided into k logical subgroups on 'level 1'. The subgroups are
+ again divided, level by level, until level l. The inverted index is
+ partitioned by document between groups and by keyword within groups.
+ MLP avoids the query flooding normally associated with systems
+ partitioned by document, since a small number of nodes in each group
+
+
+
+Risson & Moors Informational [Page 38]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ process the query. It reduces the bandwidth overheads associated
+ with inverted list intersection in systems partitioned solely by
+ keyword, since groups can calculate the intersection independently
+ over the documents for which they are responsible. MLP was overlaid
+ on SkipNet, per Section 3.5.6 [38]. Some initial analyses of
+ communications costs and query latencies were provided.
+
+4.1.3. Partial Search, Exhaustive Search
+
+ Much of the research above addresses partial keyword search.
+ Daswani, et al. highlighted the open problem of efficient,
+ comprehensive keyword search [25]. How can exhaustive searches be
+ achieved without flooding queries to every peer in the network?
+ Terpstra, Behnel et al. couched the keyword search problem in
+ rendezvous terms: dynamic keyword queries need to 'meet' with static
+ document lists [362]. Their Bitzipper scheme is partitioned by
+ document. They improved on full flooding by putting document
+ metadata on 2sqrt(n) nodes and forwarding queries through only
+ 6sqrt(n) nodes. They reported that Bitzipper nodes need only 1/166th
+ of the bandwidth of full-flooding Gnutella nodes for an exhaustive
+ search. An initial comparison of query load was given. There was
+ little consideration of either static or dynamic resilience; that is,
+ of nodes failing, of documents continually changing, or of nodes
+ continually joining and leaving the network.
+
+4.2. Information Retrieval
+
+ The field of Information Retrieval (IR) has matured considerably
+ since its inception in the 1950s [363]. A taxonomy for IR models has
+ been formalized [262]. It consists of four elements: a
+ representation of documents in a collection; a representation of user
+ queries; a framework describing relationships between document
+ representations and queries; and a ranking function that quantifies
+ an ordering amongst documents for a particular query. Three main
+ issues motivate current IR research -- information relevance, query
+ response time, and user interaction with IR systems. The dominant IR
+ trends for searching large text collections are also threefold [262].
+ The size of collections is increasing dramatically. More complicated
+ search mechanisms are being found to exploit document structure, to
+ accommodate heterogeneous document collections, and to deal with
+ document errors. Compression is in favour -- it may be quicker to
+ search compact text or retrieve it from external devices. In a
+ distributed IR system, query processing has four parts. Firstly,
+ particular collections are targeted for the search. Secondly,
+ queries are sent to the targeted collections. Queries are then
+ evaluated at the individual collections. Finally, results from the
+ collections are collated.
+
+
+
+
+Risson & Moors Informational [Page 39]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ So how do P2P networks differ from distributed IR systems? Bawa,
+ Manku, et al. presented four differences [62]. They suggested that a
+ P2P network is typically larger, with tens or hundreds of thousands
+ of nodes. It is usually more dynamic, with node lifetimes measured
+ in hours. They suggested that a P2P network is usually homogeneous,
+ with a common resource description language. It lacks the
+ centralized "mediators" found in many IR systems that assume
+ responsibility for selecting collections, for rewriting queries, and
+ for merging ranked results. These distinctions are generally aligned
+ with the peer characteristics in Section 1. One might add that P2P
+ nodes display more symmetry -- peers are often both information
+ consumers and producers. Daswani, Garcia-Molina, et al. pointed out
+ that, while there are IR techniques for ranked keyword search at
+ moderate scale, research is required so that ranking mechanisms are
+ efficient at the larger scale targeted by P2P designs [25]. Joseph
+ and Hoshiai surveyed several P2P systems using metadata techniques
+ from the IR toolkit [60]. They described an assortment of IR
+ techniques and P2P systems, including various metadata formats,
+ retrieval models, bloom filters, DHTs, and trust issues.
+
+ In the ensuing paragraphs, we survey P2P work that has incorporated
+ information retrieval models, particularly the Vector Model and the
+ Latent Semantic Indexing Model. We omit the P2P work based on
+ Bayesian models. Some have pointed to such work [60], but made no
+ explicit mention of the model [364]. One early paper on P2P
+ content-based image retrieval also leveraged the Bayesian model
+ [365]. For the former two models, we briefly describe the design,
+ then try to highlight robustness aspects. On robustness, we are
+ again stymied for lack of prior work. Indeed, a search across all
+ proceedings of the Annual ACM Conference on Research and Development
+ in Information Retrieval for the words "reliable", "available",
+ "dependable", or "adaptable" did not return any results at the time
+ of writing. In contrast, a standard text on distributed database
+ management systems [366] contains a whole chapter on reliability. IR
+ research concentrates on performance measures. Common performance
+ measures include recall, the fraction of the relevant documents that
+ has been retrieved and precision, the fraction of the retrieved
+ documents that is relevant [262]. Ideally, an IR system would have
+ high recall and high precision. Unfortunately techniques favouring
+ one often disadvantage the other [363].
+
+
+
+
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 40]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+4.2.1. Vector Model (PlanetP, FASD, eSearch)
+
+ The vector model [367] represents both documents and queries as term
+ vectors, where a term could be a word or a phrase. If a document or
+ query has a term, the weight of the corresponding dimension of the
+ vector is non-zero. The similarity of the document and query vectors
+ gives an indication of how well a document matches a particular
+ query.
+
+ The weighting calculation is critical across the retrieval models.
+ Amongst the numerous proposals for the probabilistic and vector
+ models, there are some commonly recurring weighting factors [363].
+ One is term frequency. The more a term is repeated in a document,
+ the more important the term is. Another is inverse document
+ frequency. Terms common to many documents give less information
+ about the content of a document. Then there is document length.
+ Larger documents can bias term frequencies, so weightings are
+ sometimes normalized against document length. The expression "TFIDF
+ weighting" refers to the collection of weighting calculations that
+ incorporate term frequency and inverse document frequency, not just
+ to one. Two weighting calculations have been particularly dominant
+ -- Okapi [368] and pivoted normalization [369]. A distributed
+ version of Google's Pagerank algorithm has also been devised for a
+ P2P environment [370]. It allows incremental, ongoing Pagerank
+ calculations while documents are inserted and deleted.
+
+ A couple of early P2P systems leveraged the vector model. Building
+ on the vector model, PlanetP divided the ranking problem into two
+ steps [215]. In the first, peers are ranked for the probability that
+ they have matching documents. In the second, higher-priority peers
+ are contacted and the matching documents are ranked. An Inverse Peer
+ Frequency, analogous to the Inverse Document Frequency, is used to
+ rank relevant peers. To further constrain the query traffic, PlanetP
+ contacts only the first group of m peers to retrieve a relevant set
+ of documents. In this way, it repeatedly contacts groups of m peers
+ until the top k document rankings are stable. While the PlanetP
+ designers first quantified recall and precision, they also considered
+ reliability. Each PlanetP peer has a global index with a list of all
+ other peers, their IP addresses, and their Bloom filters. This large
+ volume of shared information needs to be maintained. Klampanos and
+ Jose saw this as PlanetP's primary shortcoming [371]. Each Bloom
+ filter summarized the set of terms in the local index of each peer.
+ The time to propagate changes, be they new documents or peer
+ arrivals/departures, was studied by simulation for up to 1000 peers.
+ The reported propagation times were in the hundreds of seconds.
+ Design workarounds were required for PlanetP to be viable across
+ slower dial-up modem connections. For future work, the authors were
+
+
+
+
+Risson & Moors Informational [Page 41]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ considering some sort of hierarchy to scale to larger numbers of
+ peers.
+
+ A second early system using the vector model is the Fault-tolerant,
+ Adaptive, Scalable Distributed (FASD) search engine [283], which
+ extended the Freenet design (Section 2.3) for richer queries. The
+ original Freenet design could find a document based on a globally
+ unique identifier. Kronfol's design added the ability to search, for
+ example, for documents about "apples AND oranges NOT bananas". It
+ uses a TFIDF weighting scheme to build a document's term vector.
+ Each peer calculates the similarity of the query vector and local
+ documents and forwards the query to the best downstream peer. Once
+ the best downstream peer returns a result, the second-best peer is
+ tried, and so on. Simulations with 1000 nodes gave an indication of
+ the query path lengths in various situations -- when routing queries
+ in a network with constant rates of node and document insertion, when
+ bootstrapping the network in a "worst-case" ring topology, or when
+ failing randomly and specifically selected peers. Kronfol claimed
+ excellent average-case performance -- less than 20 hops to retrieve
+ the same top n results as a centralized search engine. There were,
+ however, numerous cases where the worst-case path length was several
+ hundred hops in a network of only 1000 nodes.
+
+ In parallel, there have been some P2P designs based on the vector
+ model from the University of Rochester -- pSearch [9, 372] and
+ eSearch [373]. The early pSearch paper suggested a couple of
+ retrieval models, one of which was the Vector Space Model, to search
+ only the nodes likely to have matching documents. To obtain
+ approximate global statistics for the TFIDF calculation, a spanning
+ tree was constructed across a subset of the peers. For the m top
+ terms, the term-to-document index was inserted into a Content-
+ Addressable Network [334]. A variant that mapped terms to document
+ clusters was also suggested. eSearch is a hybrid of the partition-
+ by-document and partition-by-term approaches (Section 4.1.2) eSearch
+ nodes are primarily partitioned by term. Each is responsible for the
+ inverted lists for some top terms. For each document in the inverted
+ list, the node stores the complete term list. To reduce the size of
+ the index, the complete term lists for a document are only kept on
+ nodes that are responsible for top terms in the document. eSearch
+ uses the Okapi term weighting to select top terms. It relies on the
+ Chord DHT [34] to associate terms with nodes storing the inverted
+ lists. It also uses automatic query expansion. This takes the
+ significant terms from the top document matches and automatically
+ adds them to the user's query to find additional relevant documents.
+ The eSearch performance was quantified in terms of search precision,
+ the number of retrieved documents, and various load-balancing
+ metrics. Compared to the more common proposals for partitioning by
+
+
+
+
+Risson & Moors Informational [Page 42]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ keywords, eSearch consumed 6.8 times the storage space to achieve
+ faster search times.
+
+4.2.2. Latent Semantic Indexing (pSearch)
+
+ Another retrieval model used in P2P proposals is Latent Semantic
+ Indexing (LSI) [374]. Its key idea is to map both the document and
+ query vectors to a concept space with lower dimensions. The starting
+ point is a t*N weighting matrix, where t is the total number of
+ indexed terms, N is the total number of documents, and the matrix
+ elements could be TFIDF rankings. Using singular value
+ decomposition, this matrix is reduced to a smaller number of
+ dimensions, while retaining the more significant term-to-document
+ mappings. Baeza-Yates and Ribeiro-Neto suggested that LSI's value is
+ a novel theoretic framework, but that its practical performance
+ advantage for real document collections had yet to be proven [262].
+ pSearch incorporated LSI [9]. By placing the indices for
+ semantically similar documents close in the network, Tang, Xu, et al.
+ touted significant bandwidth savings relative to the early full-
+ flooding variant of Gnutella [372]. They plotted the number of nodes
+ visited by a query. They also explored the trade-off with accuracy,
+ the percentage match between the documents returned by the
+ distributed pSearch algorithm and those from a centralized LSI
+ baseline. In a more recent update to the pSearch work, Tang,
+ Dwarkadas, et al. summarized LSI's shortcomings [375]. Firstly, for
+ large document collections, its retrieval quality is inherently
+ inferior to Okapi. Secondly, singular value decomposition consumes
+ excessive memory and computation time. Consequently, the authors
+ used Okapi for searching while retaining LSI for indexing. With
+ Okapi, they selected the next node to be searched and selected
+ documents on searched nodes. With LSI, they ensured that similar
+ documents are clustered near each other, thereby optimizing the
+ network search costs. When retrieving a small number of top
+ documents, the precision of LSI+Okapi approached that of Okapi.
+ However, if retrieving a large number of documents, the LSI+Okapi
+ precision is inferior. The authors want to improve this in future
+ work.
+
+4.2.3. Small Worlds
+
+ The "small world" concept originally described how people are
+ interconnected by short chains of acquaintances [376]. Kleinberg was
+ struck by the algorithmic lesson of the small world, namely "that
+ individuals using local information are collectively very effective
+ at constructing short paths between two points in a social network"
+ [377]. Small world networks have a small diameter and a large
+ clustering coefficient (a large number of connections amongst
+ relevant nodes) [378].
+
+
+
+Risson & Moors Informational [Page 43]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ The small world idea has had a limited impact on peer-to-peer
+ algorithms. It has influenced only a few unstructured [62, 378-380]
+ and structured [344, 381] algorithms. The most promising work on
+ "small worlds" in P2P networks are those concerned with the
+ information retrieval metrics, precision and recall [62, 378, 380].
+
+5. Queries
+
+ Database research suggests directions for P2P research. Hellerstein
+ observed that, while work on fast P2P indexes is well underway, P2P
+ query optimization remains a promising topic for future research
+ [23]. Kossman reviewed the state of the art of distributed query
+ processing, highlighting areas for future research: simulation and
+ query optimization for networks of tens of thousands of servers and
+ millions of clients; non-relational data types (e.g., XML, text, and
+ images); and partial query responses since on the Internet, "failure
+ is the rule rather than the exception" [19]. A primary motivation
+ for the P2P system, PIER, was to scale from the largest database
+ systems of a few hundred nodes to an Internet environment in which
+ there are over 160 million nodes [22]. Litwin and Sahri have also
+ considered ways to combine distributed hashing, more specifically the
+ Scalable Distributed Data Structures, with SQL databases, claiming to
+ be first to implement scalable distributed database partitioning
+ [382]. Motivated by the lack of transparent distribution in current
+ distributed databases, they measure query execution times for
+ Microsoft SQL servers aggregated by means of an SDDS layer. One of
+ their starting assumptions was that it is too challenging to change
+ the SQL query optimizer.
+
+ Database research also suggests the approach to P2P research.
+ Researchers of database query optimization were divided between those
+ looking for optimal solutions in special cases and those using
+ heuristics to answer all queries [383]. Gribble, et al. cast query
+ optimization in terms of the data placement problem, which is to
+ "distribute data and work so the full query workload is answered with
+ lowest cost under the existing bandwidth and resource constraints"
+ [250]. They pointed out that even the static version of this problem
+ is NP-complete in P2P networks. Consequently, research on massive,
+ dynamic P2P networks will likely progress using both strategies of
+ early database research - heuristics and special-case optimizations.
+
+ If P2P networks are going to be adaptable, if they are to support a
+ wide range of applications, then they need to accommodate many query
+ types [72]. Up to this point, we have reviewed queries for keys
+ (Section 3) and keywords (Sections 4.1. and 4.2). Unfortunately, a
+ major shortcoming of the DHTs in Section 3.5 is that they primarily
+ support exact-match, single-key queries. Skip Graphs support range
+ and prefix queries, but not aggregation queries. Here we probe below
+
+
+
+Risson & Moors Informational [Page 44]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ the language syntax to identify the open research issues associated
+ with more expressive P2P queries [25]. Triantafillou and Pitoura
+ observed the disparate P2P designs for different types of queries and
+ so outlined a unifying framework [76]. To classify queries, they
+ considered the number of relations (single or multiple), the number
+ of attributes (single or multiple), and the type of query operator.
+ They described numerous operators: equality, range, join, and
+ "special functions". The latter referred to aggregation (like sum,
+ count, average, minimum, and maximum), grouping and ordering. The
+ following sections approximately fit their taxonomy -- range queries,
+ multi-attribute queries, join queries and aggregation queries. There
+ has been some initial P2P work on other query types -- continuous
+ queries [20, 22, 73], recursive queries [22, 74], and adaptive
+ queries [23, 75]. For these, we defer to the primary references.
+
+5.1. Range Queries
+
+ The support of efficient range predicates in P2P networks was
+ identified as an important open research issue by Huebsch, et al.
+ [22]. Range partitioning has been important in parallel databases to
+ improve performance, so that a transaction commonly needs data from
+ only one disk or node [22]. One type of range search, longest prefix
+ match, is important because of its prevalence in routing schemes for
+ voice and data networks alike. In other applications, users may pose
+ broad, inexact queries, even though they require only a small number
+ of responses. Consequently, techniques to locate similar ranges are
+ also important [77]. Various proposals for range searches over P2P
+ networks are summarized in Figure 4. Since the Scalable Distributed
+ Data Structure (SDDS) has been an important influence on contemporary
+ Distributed Hash Tables (DHTs) [49-51], we also include ongoing work
+ on SDDS range searches.
+
+ PEER-TO-PEER (P2P)
+ Locality Sensitive Hashing (Chord) [77]
+ Prefix Hash Trees (unspecified DHT) [78, 79]
+ Space Filling Curves (CAN) [80]
+ Space Filling Curves (Chord) [81]
+ Quadtrees (Chord) [82]
+ Skip Graphs [38, 41, 83, 100]
+ Mercury [84]
+ P-Grid [85, 86]
+
+ SCALABLE DISTRIBUTED DATA STRUCTURES (SDDS)
+ RP* [87, 88]
+
+ Figure 4: Solutions for Range Queries on P2P and SDDS Indexes
+
+
+
+
+
+Risson & Moors Informational [Page 45]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ The papers on P2P range search can be divided into those that rely on
+ an underlying DHT (the first five entries in Figure 4) and those that
+ do not (the subsequent three entries). Bharambe, Agrawal, et al.
+ argued that DHTs are inherently ill-suited to range queries [84].
+ The very feature that makes for their good load balancing properties,
+ randomized hash functions, works against range queries. One possible
+ solution would be to hash ranges, but this can require a priori
+ partitioning. If the partitions are too large, partitions risk
+ overload. If they are too small, there may be too many hops.
+
+ Despite these potential shortcomings, there have been several range
+ query proposals based on DHTs. If hashing ranges to nodes, it is
+ entirely possible that overlapping ranges map to different nodes.
+ Gupta, Agrawal, et al. rely on locality sensitive hashing to ensure
+ that, with high probability, similar ranges are mapped to the same
+ node [77]. They propose one particular family of locality sensitive
+ hash functions, called min-wise independent permutations. The number
+ of partitions per node and the path length were plotted against the
+ total numbers of peers in the system. For a network with 1000 nodes,
+ the hop count distribution was very similar to that of the exact-
+ matching Chord scheme. Was it load-balanced? For the same network
+ with 50,000 partitions, there were over two orders of magnitude
+ variation in the number of partitions at each node (first and
+ ninety-ninth percentiles). The Prefix Hash Tree is a trie in which
+ prefixes are hashed onto any DHT. The preliminary analysis suggests
+ efficient doubly logarithmic lookup, balanced load, and fault
+ resilience [78, 79]. Andrzejak and Xu were perhaps the first to
+ propose a mapping from ranges to DHTs [80]. They use one particular
+ Space Filling Curve, the Hilbert curve, over a Content Addressable
+ Network (CAN) construction (Section 3.5.3). They maintain two
+ properties: nearby ranges map to nearby CAN zones; if a range is
+ split into two sub-ranges, then the zones of the sub-ranges partition
+ the zone of the primary range. They plot path length and load proxy
+ measures (the total number of messages and nodes visited) for three
+ algorithms to propagate range queries: brute force, controlled
+ flooding, and directed controlled flooding. Schmidt and Parashar
+ also advocated Space Filling Curves to achieve range queries over a
+ DHT [81]. However, they point out that, while Andrzejak and Xu use
+ an inverse Space Filling Curve to map a one-dimensional space to d-
+ dimensional zones, they map a d-dimensional space back to a one-
+ dimensional index. Such a construction gives the ability to search
+ across multiple attributes (Section 5.2). Tanin, Harwood, et al.
+ suggested quadtrees over Chord [82], and gave preliminary simulation
+ results for query response times.
+
+ Because DHTs are naturally constrained to exact-match, single-key
+ queries, researchers have considered other P2P indexes for range
+ searches. Several were based on Skip Graphs [38, 41], which, unlike
+
+
+
+Risson & Moors Informational [Page 46]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ the DHTs, do not necessitate randomizing hash functions and are
+ therefore capable of range searches. Unfortunately, they are not
+ load balanced [83]. For example, in SkipNet [48], hashing was added
+ to balance the load -- the Skip Graph could support range searches or
+ load balancing, but not both. One solution for load-balancing relies
+ on an increased number of 'virtual' servers [168] but, in their
+ search for a system that can both search for ranges and balance
+ loads, Bharambe, Agrawal, et al. rejected the idea [84]. The virtual
+ servers work assumed load imbalance stems from hashing; that is, by
+ skewed data insertions and deletions. In some situations, the
+ imbalance is triggered by a skewed query load. In such
+ circumstances, additional virtual servers can increase the number of
+ routing hops and increase the number of pointers that a Skip Graph
+ needs to maintain. Ganesan, Bawa, et al. devised an alternate method
+ to balance load [83]. They proposed two Skip Graphs, one to index
+ the data itself and the other to track load at each node in the
+ system. Each node is able to determine the load on its neighbours
+ and the most (least) loaded nodes in the system. They devise two
+ algorithms: NBRADJUST balances load on neighbouring nodes; using
+ REORDER, empty nodes can take over some of the tuples on heavily
+ loaded nodes. Their simulations focus on skewed storage load, rather
+ than on skewed query loads, but they surmise that the same approach
+ could be used for the latter.
+
+ Other proposals for range queries avoid both the DHT and the Skip
+ Graph. Bharambe, Agrawal, et al. distinguish their Mercury design by
+ its support for multi-attribute range queries and its explicit load
+ balancing [84]. In Mercury, nodes are grouped into routing hubs,
+ each of which is responsible for various query attributes. While it
+ does not use hashing, Mercury is loosely similar to the DHT
+ approaches: nodes within hubs are arranged into rings, like Chord
+ [34]; for efficient routing within hubs, k long-distance links are
+ used, like Symphony [381]. Range lookups require O(((log n)^2)/k)
+ hops. Random sampling is used to estimate the average load on nodes
+ and to find the parts of the overlay that are lightly loaded.
+ Whereas Symphony assumed that nodes are responsible for ranges of
+ approximately equal size, Mercury's random sampling can determine the
+ location of the start of the range, even for non-uniform ranges [84].
+ P-Grid [42] does provide for range queries, by virtue of the key
+ ordering in its tree structures. Ganesan, Bawa, et al. critiqued its
+ capabilities [83]: P-Grid assumes fixed-capacity nodes; there was no
+ formal characterization of imbalance ratios or balancing costs; every
+ P-Grid periodically contacts other nodes for load information.
+
+ The work on Scalable Distributed Data Structures (SDDSs) has
+ progressed in parallel with P2P work and has addressed range queries.
+ Like the DHTs above, the early SDDS Linear Hashing (LH*) schemes were
+ not order-preserving [52]. To facilitate range queries, Litwin,
+
+
+
+Risson & Moors Informational [Page 47]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ Niemat, et al. devised a Range Parititioning variant, RP* [87].
+ There are options to dispense with the index, to add indexes to
+ clients, and to add them to servers. In the variant without an
+ index, every query is issued via multicasting. The other variants
+ also use some multicasting. The initial RP* paper suggested
+ scalability to thousands of sites, but a more recent RP* simulation
+ was capped at 140 servers [88]. In that work, Tsangou, Ndiaye, et
+ al. investigated TCP and UDP mechanisms by which servers could return
+ range query results to clients. The primary metrics were search and
+ response times. Amongst the commercial parallel database management
+ systems, they reported that the largest seems only to scale to 32
+ servers (SQL Server 2000). For future work, they planned to explore
+ aggregation of query results, rather than establishing a connection
+ between the client and every single server with a response.
+
+ All in all, it seems there are numerous open research questions on
+ P2P range queries. How realistic is the maintenance of global load
+ statistics considering the scale and dynamism of P2P networks?
+ Simulations at larger scales are required. Proposals should take
+ into account both the storage load (insert and delete messages) and
+ the query load (lookup messages). Simplifying assumptions need to be
+ attacked. For example, how well do the above solutions work in
+ networks with heterogeneous nodes, where the maximum message loads
+ and index sizes are node-dependent?
+
+5.2. Multi-Attribute Queries
+
+ There has been some work on multi-attribute P2P queries. As late as
+ September 2003, it was suggested that there has not been an efficient
+ solution [76].
+
+ Again, an early significant work on multi-attribute queries over
+ aggregated commodity nodes germinated amongst SDDSs. k-RP* [89] uses
+ the multi-dimensional binary search tree (or k-d tree, where k
+ indicates the number of dimensions of the search index) [384]. It
+ builds on the RP* work from the previous section and inherits their
+ capabilities for range search and partial match. Like the other
+ SDDSs, k-RP* indexes can fit into RAM for very fast lookup. For
+ future work, Litwin and Neimat suggested a) a formal analysis of the
+ range search termination algorithm and the k-d paging algorithm, b) a
+ comparison with other multi-attribute data structures (quad-trees and
+ R-trees) and c) exploration of query processing, concurrency control,
+ and transaction management for k-RP* files [89]. On the latter
+ point, others have considered transactions to be inconsequential to
+ the core problem of supporting more complex queries in P2P networks
+ [72].
+
+
+
+
+
+Risson & Moors Informational [Page 48]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ In architecting their secure wide-area Service Discovery Service
+ (SDS), Hodes, Czerwinski, et al. considered three possible designs
+ for multi-criteria search -- Centralization, Mapping and Flooding
+ [90]. These correlate to the index classifications of Section 2 --
+ Central, Distributed, and Local. They discounted the centralized,
+ Napster-like index for its risk of a single point of failure. They
+ considered the hash-based mappings of Section 3, but concluded that
+ it would not be possible to adequately partition data. A document
+ satisfying many criteria would be wastefully stored in many
+ partitions. They rejected full flooding for its lack of scalability.
+ Instead, they devised a query filtering technique, reminiscent of
+ Gnutella's query routing protocol (Section 4.1). Nodes push
+ proactive summaries of their data rather than waiting for a query.
+ Summaries are aggregated and stored throughout a server hierarchy, to
+ guide subsequent queries. Some initial prototype measurements were
+ provided for total load on the system, but not for load distribution.
+ They put several issues forward for future work. The indexing needs
+ to be flexible to change according to query and storage workloads. A
+ mesh topology might improve on their hierarchic topology since query
+ misses would not propagate to root servers. The choice is analogous
+ to BGP meshes and DNS trees.
+
+ More recently, Cai, Frank, et al. devised the Multi-Attribute
+ Addressable Network (MAAN) [91]. They built on Chord to provide both
+ multi-attribute and range queries, claiming to be the first to
+ service both query types in a structured P2P system. Each MAAN node
+ has O(log n) neighbours, where N is the number of nodes. MAAN
+ multi-attribute range queries require O(log n+N*Smin) hops, where
+ Smin is the minimum range selectivity across all attributes.
+ Selectivity is the ratio of the query range to the entire identifier
+ range. The paper assumed that a locality preserving hash function
+ would ensure balanced load. Per Section 5.1, the arguments by
+ Bharambe, Agrawal, et al. have highlighted the shortcomings of this
+ assumption [84]. MAAN required that the schema must be fixed and
+ known in advance -- adaptable schemas were recommended for subsequent
+ attention. The authors also acknowledged that there is a selectivity
+ breakpoint at which full flooding becomes more efficient than their
+ scheme. This begs for a query resolution algorithm that adapts to
+ the profile of queries. Cai and Frank followed up with RDFPeers
+ [55]. They differentiate their work from other RDF proposals by a)
+ guaranteeing to find query results if they exist and b) removing the
+ requirement of prior definition of a fixed schema. They hashed
+ <subject, predicate, object> triples onto the MAAN and reported
+ routing hop metrics for their implementation. Load imbalance across
+ nodes was reduced to less than one order of magnitude, but the
+ specific measure was the number of triples stored per node - skewed
+ query loads were not considered. They plan to improve load balancing
+ with the virtual servers of Section 5.1 [168].
+
+
+
+Risson & Moors Informational [Page 49]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+5.3. Join Queries
+
+ Two research teams have done some initial work on P2P join
+ operations. Harren, Hellerstein, et al. initially described a
+ three-layer architecture -- storage, DHT and query processing. They
+ implemented the join operation by modifying an existing Content
+ Addressable Network (CAN) simulator, reporting "significant hot-spots
+ in all dimensions: storage, processing, and routing" [72]. They
+ progressed their design more recently in the context of PIER, a
+ distributed query engine based on CAN [22, 385]. They implemented
+ two equi-join algorithms. In their design, a key is constructed from
+ the "namespace" and the "resource ID". There is a namespace for each
+ relation and the resource ID is the primary key for base tuples in
+ that relation. Queries are multicast to all nodes in the two
+ namespaces (relations) to be joined. Their first algorithm is a DHT
+ version of the symmetric hash join. Each node in the two namespaces
+ finds the relevant tuples and hashes them to a new query namespace.
+ The resource ID in the new namespace is the concatenation of join
+ attributes. In the second algorithm, called "fetch matches", one of
+ the relations is already hashed on the join attributes. Each node in
+ the second namespace finds tuples matching the query and retrieves
+ the corresponding tuples from the first relation. They leveraged two
+ other techniques, namely the symmetric semi-join rewrite and the
+ Bloom filter rewrite, to reduce the high bandwidth overheads of the
+ symmetric hash join. For an overlay of 10,000 nodes, they simulated
+ the delay to retrieve tuples and the aggregate network bandwidth for
+ these four schemes. The initial prototype was on a cluster of 64
+ PCs, but it has more recently been expanded to PlanetLab.
+
+ Triantafillou and Pitoura considered multicasting to large numbers of
+ peers to be inefficient [76]. They therefore allocated a limited
+ number of special peers, called range guards. The domain of the join
+ attributes was divided, one partition per range guard. Join queries
+ were sent only to range guards, where the query was executed.
+ Efficient selection of range guards and a quantitive evaluation of
+ their proposal were left for future work.
+
+5.4. Aggregation Queries
+
+ Aggregation queries invariable rely on tree-structures to combine
+ results from a large number of nodes. Examples of aggregation
+ queries are Count, Sum, Maximum, Minimum, Average, Median, and Top-K
+ [92, 386, 387]. Figure 5 summarizes the tree and query
+ characteristics that affect dependability.
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 50]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ Tree type: Doesn't use DHT [92], use internal DHT trees [95], use
+ independent trees on top of DHTs
+ Tree repair: Periodic [93], exceptional [32]
+ Tree count: One per key, one per overlay [56]
+ Tree flexibility: Static [92], dynamic
+
+ Query interface: install, update, probe [98]
+ Query distribution: multicast [98], gossip [92]
+ Query applications: leader election, voting, resource location,
+ object placement and error recovery [98, 388]
+ Query semantics
+ Consistency: Best-effort, eventual [92], snapshot / interval /
+ single-site validity [99]
+ Timeliness [388]
+ Lifetime: Continuous [97, 99], single-shot
+ No. attributes: Single, multiple
+ Query types: Count, sum, maximum, minimum, average, median, top k
+ [92, 386, 387]
+
+ Figure 5: Aggregation Trees and Queries in P2P Networks
+
+ Key: Astrolabe [92]; Cone [93]; Distributed Approximative System
+ Information Service (DASIS) [95]; Scalable Distributed Information
+
+ Management System (SDIMS) [98]; Self-Organized Metadata Overlay
+ (SOMO) [56]; Wildfire [99]; Willow [32]; Newscast [97]
+
+ The fundamental design choices for aggregation trees relate to how
+ the overlay uses DHTs, how it repairs itself when there are failures,
+ how many aggregation trees there are, and whether the tree is static
+ or dynamic (Figure 5). Astrolabe is one of the most influential P2P
+ designs included in Figure 5, yet it makes no use of DHTs [92].
+ Other designs make use of the internal trees of Plaxton-like DHTs.
+ Others build independent tree structures on top of DHTs. Most of the
+ designs repair the aggregation tree with periodic mechanisms similar
+ to those used in the DHTs themselves. Willow is an exception [32].
+ It uses a Tree Maintenance Protocol to "zip" disjoint aggregation
+ trees together when there are major failures. Yalagandula and Dahlin
+ found reconfigurations at the aggregation layer to be costly,
+ suggesting more research on techniques to reduce the cost and
+ frequency of such reconfigurations [98]. Many of the designs use
+ multiple aggregation trees, each rooted at the DHT node responsible
+ for the aggregation attribute. On the other hand, the Self-Organized
+ Metadata Overlay [56] uses a single tree and is vulnerable to a
+ single point of failure at its root.
+
+
+
+
+
+
+Risson & Moors Informational [Page 51]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ At the time of writing, researchers have just begun exploring the
+ performance of queries in the presence of churn. Most designs are
+ for best-effort queries. Bawa, et al. devised a better consistency
+ model, called Single-Site Validity [99] to qualify the accuracy of
+ results when there is churn. Its price was a five-fold increase in
+ the message load, when compared to an efficient but best-effort
+ Spanning Tree. Gossip mechanisms are resilient to churn, but they
+ delay aggregation results and incur high message cost for aggregation
+ attributes with small read-to-write ratios.
+
+6. Security Considerations
+
+ An initial list of references to research on P2P security is given in
+ Figure 1, Section 1. This document addresses P2P search. P2P
+ storage, security, and applications are recommended for further
+ investigation in Section 8.
+
+7. Conclusions
+
+ Research on peer-to-peer networks can be divided into four categories
+ -- search, storage, security and applications. This critical survey
+ has focused on search methods. While P2P networks have been
+ classified by the existence of an index (structured or unstructured)
+ or the location of the index (local, centralized, and distributed),
+ this survey has shown that most have evolved to have some structure,
+ whether it is indexes at superpeers or indexes defined by DHT
+ algorithms. As for location, the distributed index is most common.
+ The survey has characterized indexes as semantic and semantic-free.
+ It has also critiqued P2P work on major query types. While much of
+ it addresses work from 2000 or later, we have traced important
+ building blocks from the 1990s.
+
+ The initial motivation in this survey was to answer the question,
+ "How robust are P2P search networks?" The question is key to the
+ deployment of P2P technology. Balakrishnan, Kaashoek, et al. argued
+ that the P2P architecture is appealing: the startup and growth
+ barriers are low; they can aggregate enormous storage and processing
+ resources; "the decentralized and distributed nature of P2P systems
+ gives them the potential to be robust to faults or intentional
+ attacks" [18]. If P2P is to be a disruptive technology in
+ applications other than casual file sharing, then robustness needs to
+ be practically verified [20].
+
+ The best comparative research on P2P dependability has been done in
+ the context of Distributed Hash Tables (DHTs) [291]. The entire body
+ of DHT research can be distilled to four main observations about
+ dependability (Section 3.2). Firstly, static dependability
+ comparisons show that no O(log n) DHT geometry is significantly more
+
+
+
+Risson & Moors Informational [Page 52]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ dependable than the other O(log n) geometries. Secondly, dynamic
+ dependability comparisons show that DHT dependability is sensitive to
+ the underlying topology maintenance algorithms (Figure 2). Thirdly,
+ most DHTs use O(log n) geometries to suit ephemeral nodes, whereas
+ the O(1) hop DHTs suit stable nodes - they deserve more research
+ attention. Fourthly, although not yet a mature science, the study of
+ DHT dependability is helped by recent simulation tools that support
+ multiple DHTs [299].
+
+ We make the following four suggestions for future P2P research:
+
+ 1) Complete the companion P2P surveys for storage, security, and
+ applications. A rough outline has been suggested in Figure 1,
+ along with references. The need for such surveys was highlighted
+ within the peer-to-peer research group of the Internet Research
+ Task Force (IRTF) [17].
+
+ 2) P2P indexes are maturing. P2P queries are embryonic. Work on
+ more expressive queries over P2P indexes started to gain momentum
+ in 2003, but remains fraught with efficiency and load issues.
+
+ 3) Isolate the low-level mechanisms affecting robustness. There is
+ limited value in comparing robustness of DHT geometries (like
+ rings versus de Bruijn graphs), when robustness is highly
+ sensitive to underlying topology maintenance algorithms (Figure
+ 2).
+
+ 4) Build consensus on robustness metrics and their acceptable ranges.
+ This paper has teased out numerous measures that impinge on
+ robustness, for example, the median query path length for a
+ failure of x% of nodes, bisection width, path overlap, the number
+ of alternatives available for the next hop, lookup latency,
+ average live bandwidth (bytes/node/sec), successful routing rates,
+ the number of timeouts (caused by a finger pointing to a departed
+ node), lookup failure rates (caused by nodes that temporarily
+ point to the wrong successor during churn), and clustering
+ measures (edge expansion and node expansion). Application-level
+ robustness metrics need to drive a consistent assessment of the
+ underlying search mechanics.
+
+8. Acknowledgments
+
+ This document was adapted from a paper in Elsevier's Computer
+ Networks:
+
+ J. Risson & T. Moors, Survey of Research towards Robust Peer-to-
+ Peer Networks: Search Methods, Computer Networks 51(7)2007.
+
+
+
+
+Risson & Moors Informational [Page 53]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ We thank Bill Yeager, Ali Ghodsi, and several anonymous reviewers for
+ thorough comments that significantly improved the quality of earlier
+ versions of this document.
+
+9. References
+
+9.1. Informative References
+
+ [1] M. Roussopoulos, M. Baker, D. Rosenthal, T. Guili, P. Maniatis,
+ and J. Mogul, 2 P2P of Not 2 P2P?, The 3rd Int'l Workshop on
+ Peer-to-Peer Systems, February 26-27 2004.
+
+ [2] A. Rowstron and P. Druschel, Pastry: Scalable, distributed
+ object location and routing for large-scale peer-to-peer
+ systems, IFIP/ACM Middleware 2001, Nov 2001.
+
+ [3] B. Yeager and B. Bhattacharjee, Peer-to-Peer Research Group
+ Charter, http://www.irtf.org/charters/p2prg.html (2003)
+
+ [4] T. Klingberg and R. Manfredi, Gnutella 0.6, (2002)
+
+ [5] I. Clarke, A Distributed Decentralised Information Storage and
+ Retrieval System, Undergraduate Thesis, 1999.
+
+ [6] B. Zhao, J. Kubiatowicz, and A. Joseph, Tapestry: an
+ infrastructure for fault-tolerant wide-area location and
+ routing, Report No. UCB/CSD-01-1141 2001.
+
+ [7] I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. Kaashoek,
+ F. Dabek, and H. Balakrishnan, Chord: A scalable peer-to-peer
+ lookup service for internet applications, Proc. ACM SIGCOMM
+ 2001 2001, pp. 149-160.
+
+ [8] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker,
+ A scalable content-addressable network, Proc. of the conf. on
+ Applications, technologies, architectures and protocols for
+ computer communications, August 27-31 2001, pp. 161-172.
+
+ [9] C. Tang, Z. Xu, and M. Mahalingam, pSearch: information
+ retrieval in structured overlays, First Workshop on Hot Topics
+ in Networks. Also Computer Communication Review, Volume 33,
+ Number 1, January 2003, Oct 28-29 2002.
+
+ [10] W. Nejdl, S. Decker, and W. Siberski, Edutella Project, RDF-
+ based Metadata Infrastructure for P2P Applications,
+ http://edutella.jxta.org/ (2003)
+
+
+
+
+
+Risson & Moors Informational [Page 54]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [11] K. Aberer and M. Hauswirth, Peer-to-peer information systems:
+ concepts and models, state-of-the-art, and future systems, ACM
+ SIGSOFT Software Engineering Notes, Proc. 8th European software
+ engineering conference held jointly with 9th ACM SIGSOFT
+ international symposium on foundations of software engineering
+ 26 (5) (2001)
+
+ [12] L. Zhou and R. van Renesse, P6P: a peer-to-peer approach to
+ internet infrastructure, The 3rd Int'l Workshop on Peer-to-Peer
+ Systems, February 26-27 2004.
+
+ [13] Citeseer, Citeseer Scientific Literature Digital Library,
+ http://citeseer.ist.psu.edu/ (2004)
+
+ [14] D. Milojicic, V. Kalogeraki, R. Lukose, K. Nagaraja, J. Pruyne,
+ B. Richard, S. Rollins, and Z. Xu, Peer-to-Peer Computing, HP
+ Technical Report, HPL-2002-57 2002.
+
+ [15] K. Aberer and M. Hauswirth, An overview on peer-to-peer
+ information systems, Workshop on Distributed Data and
+ Structures WDAS-2002 2002.
+
+ [16] F. DePaoli and L. Mariani, Dependability in Peer-to-Peer
+ Systems, IEEE Internet Computing 8 (4) (2004) 54-61.
+
+ [17] B. Yeager, Proposed research tracks, Email to the Internet
+ Research Task Force IRTF P2P Research Group, Nov 10 2003.
+
+ [18] H. Balakrishnan, M. F. Kaashoek, D. Karger, R. Morris, and I.
+ Stoica, Looking up data in P2P systems, Communications of the
+ ACM 46 (2) (2003) 43-48.
+
+ [19] D. Kossmann, The state of the art in distributed query
+ processing, ACM Computing Surveys 32 (4) (2000) 422-469.
+
+ [20] B. Gedik and L. Liu, Reliable peer-to-peer information
+ monitoring through replication, Proc. 22nd Int'l Symp. on
+ Reliable Distributed Systems, 6-8 Oct 2003, pp. 56-65.
+
+ [21] S.-M. Shi, Y. Guangwen, D. Wang, J. Yu, S. Qu, and M. Chen,
+ Making peer-to-peer keyword searching feasible using multi-
+ level partitioning, The 3rd Int'l Workshop on Peer-to-Peer
+ Systems, February 26-27 2004.
+
+ [22] R. Huebsch, J. M. Hellerstein, N. Lanham, B. T. Loo, S.
+ Shenker, and I. Stoica, Querying the Internet with PIER, Proc.
+ 29th Int'l Conf. on Very Large Databases VLDB'03, September
+ 2003.
+
+
+
+Risson & Moors Informational [Page 55]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [23] J. M. Hellerstein, Toward network data independence, ACM SIGMOD
+ Record 32 (3) (2003) 34-40.
+
+ [24] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker,
+ and I. Stoica, The impact of DHT routing geometry on resilience
+ and proximity, Proc. 2003 conference on Applications,
+ Technologies, Architectures and Protocols for Computer
+ Communications 2003, pp. 381-394.
+
+ [25] N. Daswani, H. Garcia-Molina, and B. Yang, Open Problems in
+ Data- sharing Peer-to-peer Systems, The 9th Int'l Conf. on
+ Database Theory (ICDT 2003), Siena, Italy, 8-10 January (2003)
+
+ [26] B. Cooper and H. Garcia-Molina, Studying search networks with
+ SIL, Second Int'l Workshop on Peer-to-Peer Systems IPTPS 03,
+ 20- 21 February 2003.
+
+ [27] M. Bawa, Q. Sun, P. Vinograd, B. Yang, B. Cooper, A. Crespo, N.
+ Daswani, P. Ganesan, H. Garcia-Molina, S. Kamvar, S. Marti, and
+ M. Schlossed, Peer-to-peer research at Stanford, ACM SIGMOD
+ Record 32 (3) (2003) 23-28.
+
+ [28] B. Yang and H. Garcia-Molina, Improving search in peer-to-peer
+ networks, Proc. 22nd IEEE Int'l Conf. on Distributed Computing
+ Systems, July 2002.
+
+ [29] B. Yang and H. Garcia-Molina, Efficient search in peer-to-peer
+ networks, Proc. 22nd Int'l Conf. on Distributed Computing
+ Systems, July 2-5 2002.
+
+ [30] C. Plaxton, R. Rajaraman, and A. Richa, Accessing nearby copies
+ of replicated objects in a distributed environment, ACM Symp.
+ on Parallel Algorithms and Architectures (1997)
+
+ [31] B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, and J.
+ Kubiatowicz, Tapestry: A Resilient Global-Scale overlay for
+ Service Deployment, IEEE Journal on Selected Areas in
+ Communications 22 (1) (2004) 41-53.
+
+ [32] R. van Renesse and A. Bozdog, Willow: DHT, aggregation and
+ publish/subscribe in one protocol, The 3rd Int'l Workshop on
+ Peer-to-Peer Systems, February 26-27 2004.
+
+ [33] P. Ganesan, G. Krishna, and H. Garcia-Molina, Canon in G Major:
+ Designing DHTs with Hierarchical Structure, Proc. Int'l Conf.
+ on Distributed Computing Systems ICDCS 2004 2004.
+
+
+
+
+
+Risson & Moors Informational [Page 56]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [34] I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. Kaashoek,
+ F. Dabek, and H. Balakrishnan, Chord: a scalable peer-to-peer
+ lookup protocol for Internet applications, IEEE/ACM Trans. on
+ Networking 11 (1) (2003) 17-32.
+
+ [35] S. Rhea, T. Roscoe, and J. Kubiatowicz, Structured Peer-to-Peer
+ Overlays Need Application-Driven Benchmarks, Proc. 2nd Int'l
+ Workshop on Peer-to-Peer Systems IPTPS'03, February 20-21 2003.
+
+ [36] D. Loguinov, A. Kumar, and S. Ganesh, Graph-theoretic analysis
+ of structured peer-to-peer systems: routing distances and
+ fault resilience, Proc. 2003 conference on Applications,
+ Technologies, Architectures and Protocols for Computer
+ Communications, August 25-29 2003, pp. 395-406.
+
+ [37] F. Kaashoek and D. Karger, Koorde: A simple degree-optimal
+ hash table, Second Int'l Workshop on Peer-to-Peer Systems
+ IPTPS'03, 20-21 February 2003.
+
+ [38] N. Harvey, M. B. Jones, S. Saroiu, M. Theimer, and A. Wolman,
+ SkipNet: A Scalable Overlay Network with Practical Locality
+ Properties, Proc. Fourth USENIX Symp. on Internet Technologies
+ and Systems USITS'03, March 2003.
+
+ [39] I. Gupta, K. Birman, P. Linga, A. Demers, and R. Van Renesse,
+ Kelips: Building an efficient and stable P2P DHT through
+ increased memory and background overhead, Second Int'l Workshop
+ on Peer-to-Peer Systems IPTPS 03, Feb 20-21 2003.
+
+ [40] J. Cates, Robust and Efficient Data Management for a
+ Distributed Hash Table, Master's Thesis, May 2003.
+
+ [41] J. Aspnes and G. Shah, Skip graphs, Proc. 14th annual ACM-SIAM
+ symposium on discrete algorithms (2003) 384-393.
+
+ [42] K. Aberer, P. Cudre-Mauroux, A. Datta, Z. Despotovic, M.
+ Hauswirth, M. Punceva, and R. Schmidt, P-Grid: a self-
+ organizing structured P2P system, ACM SIGMOD Record 32 (3)
+ (2003) 29-33.
+
+ [43] B. Zhao, Y. Duan, L. Huang, A. Joseph, and J. Kubiatowicz,
+ Brocade: landmark routing on overlay networks, First Int'l
+ Workshop on Peer-to-Peer Systems IPTPS'02, March 2002.
+
+ [44] S. Ratnasamy, S. Shenker, and I. Stoica, Routing algorithms for
+ DHTs: some open questions, Proc. First Int'l Workshop on Peer
+ to Peer Systems, IPTPS 2002, March 2002.
+
+
+
+
+Risson & Moors Informational [Page 57]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [45] P. Maymounkov and D. Mazieres, Kademlia: A peer-to-peer
+ information system based on the XOR metric, Proc. First Int'l
+ Workshop on Peer to Peer Systems, IPTPS 2002, March 7-8 2002.
+
+ [46] D. Malkhi, M. Naor, and D. Ratajczak, Viceroy: a scalable and
+ dynamic emulation of the butterfly, Proc. 21st annual symposium
+ on principles of distributed computing PODC, July 21-24 2002,
+ pp. 183-192.
+
+ [47] X. Li and C. Plaxton, On name resolution in peer to peer
+ networks, Proc. ACM SIGACT Annual Workshop on Principles of
+ Mobile Computing POMC'02 2002, pp. 82-89.
+
+ [48] N. Harvey, J. Dunagan, M. B. Jones, S. Saroiu, M. Theimer, and
+ A. Wolman, SkipNet: A Scalable overlay Network with Practical
+ Locality Properties, Microsoft Research Technical Report MSR-
+ TR- 2002-92 (2002)
+
+ [49] D. Karger, E. Lehman, T. Leighton, R. Panigraphy, M. Levin, and
+ D. Lewin, Consistent hashing and random trees: distributed
+ caching protocols for relieving hot spots on the World Wide
+ Web, ACM Symp. on Theory of Computing (1997)
+
+ [50] W. Litwin, M. Neimat, and D. Schneider, LH* - a scalable,
+ distributed data structure, ACM Trans. on Database Systems
+ (TODS) 21 (4) (1996) 480-525.
+
+ [51] R. Devine, Design and Implementation of DDH: A Distributed
+ Dynamic Hashing Algorithm, Proc. 4th Int'l Conf. on
+ Foundations of Data Organizations and Algorithms 1993.
+
+ [52] W. Litwin, M.-A. Niemat, and D. Schneider, LH* - Linear Hashing
+ for Distributed Files, Proc. ACM Int'l Conf. on Mngt. of Data
+ SIGMOD, May 1993, pp. 327-336.
+
+ [53] C. Tempich, S. Staab, and A. Wranik, Remindin': semantic query
+ routing in peer-to-peer networks, Proc. 13th conference on
+ World Wide Web, New York, NY, USA, May 17-20 (2004) 640-649.
+
+ [54] B. T. Loo, R. Huebsch, I. Stoica, and J. M. Hellerstein, The
+ case for a hybrid P2P search infrastructure, The 3rd Int'l
+ Workshop on Peer-to-Peer Systems, February 26-27 2004.
+
+ [55] M. Cai and M. Frank, RDFPeers: a scalable distributed RDF
+ repository based on a structured peer-to-peer network, Proc.
+ 13th conference on World Wide Web, May 17-20 2004, pp. 650-657.
+
+
+
+
+
+Risson & Moors Informational [Page 58]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [56] Z. Zhang, S.-M. Shi, and J. Zhu, SOMO: Self-organized metadata
+ overlay for resource management in P2P DHTs, Second Int'l
+ Workshop on Peer-to-Peer Systems IPTPS'03, Feb 20-21 2003.
+
+ [57] B. Yang and H. Garcia-Molina, Designing a super-peer network,
+ Proc. 19th Int'l Conf. on Data Engineering ICDE, March 2003.
+
+ [58] I. Tatarinov, P. Mork, Z. Ives, J. Madhavan, A. Halevy, D.
+ Suciu, N. Dalvi, X. Dong, Y. Kadiyska, and G. Miklau, The
+ Piazza peer data management project, ACM SIGMOD Record 32 (3)
+ (2003) 47-52.
+
+ [59] W. Nejdl, W. Siberski, and M. Sintek, Design Issues and
+ Challenges for RDF- and schema-based peer-to-peer systems, ACM
+ SIGMOD Record 32 (3) (2003) 41-46.
+
+ [60] S. Joseph and T. Hoshiai, Decentralized Meta-Data Strategies:
+ Effective Peer-to-Peer Search, IEICE Trans. Commun. E86-B (6
+ June) (2003) 1740-1753.
+
+ [61] Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S.
+ Shenker, Making gnutella-like P2P systems scalable, Proc. 2003
+ conference on Applications, Technologies, Architectures and
+ Protocols for Computer Communications, August 25-29 2003, pp.
+ 407-418.
+
+ [62] M. Bawa, G. S. Manku, and P. Raghavan, SETS: search enhanced by
+ topic segmentation, Proc. 26th annual international ACM SIGIR
+ conference on Research and Development in Information Retrieval
+ 2003, pp. 306-313.
+
+ [63] H. Sunaga, M. Takemoto, and T. Iwata, Advanced peer to peer
+ network platform for various services - SIONet Semantic
+ Information Oriented Network, Proc. Second Int'l Conf. on Peer
+ to Peer Computing, Sept 5-7 2002, pp. 169-170.
+
+ [64] M. Schlosser, M. Sintek, S. Decker, and W. Nejdl, HyperCuP -
+ Hypercubes, Ontologies and P2P Networks, Springer Lecture Notes
+ on Computer Science, Agents and Peer-to-Peer Systems Vol. 2530
+ (2002)
+
+ [65] M. Ripeanu, A. Iamnitchi, and P. Foster, Mapping the Gnutella
+ network, IEEE Internet Computing 6 (1) (2002) 50-57.
+
+ [66] Q. Lv, S. Ratnasamy, and S. Shenker, Can Heterogeneity Make
+ Gnutella Scalable?, Proc. 1st Int'l Workshop on Peer-to-Peer
+ Systems IPTPS2002, March 7-8 2002.
+
+
+
+
+Risson & Moors Informational [Page 59]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [67] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, Search and
+ replication in unstructured peer to peer networks, Proc. 16th
+ international conference on supercomputing, June 22-26 2002,
+ pp. 84-95.
+
+ [68] V. Kalogaraki, D. Gunopulos, and D. Zeinalipour-Yasti, XML
+ schemas: integration and translation: A local search
+ mechanism for peer to peer networks, Proc. 11th ACM
+ international conference on Information and Knowledge
+ management 2002, pp. 300- 307.
+
+ [69] O. Babaoglu, H. Meling, and Montresor, Anthill: a framework
+ for the development of agent-based peer-to-peer systems, Proc.
+ IEEE Int'l Conf. on Distributed Computer systems 2002, pp. 15-
+ 22.
+
+ [70] M. Jovanovic, Modeling large-scale peer-to-peer networks and a
+ case study of Gnutella, Master's Thesis 2001.
+
+ [71] I. Clarke, O. Sandberg, B. Wiley, and T. Hong, Freenet: A
+ Distributed Anonymous Information Storage and Retrieval System.
+ Springer, New York, USA, 2001.
+
+ [72] J. Harren, J. Hellerstein, R. Huebsch, B. Loo, S. Shenker, and
+ I. Stoica, Complex queries in DHT-based peer-to-peer networks,
+ Proc. First Int'l Workshop on Peer to Peer Systems IPTPS 2002,
+ March 2002.
+
+ [73] B. Gedik and L. Liu, PeerCQ: A Decentralized and Self-
+ Configuring Peer-to-Peer Information Monitoring System, Proc.
+ 23rd Int'l Conf. on Distributed Computing Systems ICDCS2003,
+ May 19-22 2003.
+
+ [74] B. T. Loo, R. Huebsch, J. M. Hellerstein, T. Roscoe, and I.
+ Stoica, Analyzing P2P Overlays with Recursive Queries,
+ Technical Report, CSD-04-1301, January 14 2004.
+
+ [75] R. Avnur and J. Hellerstein, Eddies: continuously adaptive
+ query processing, Proc. 2000 ACM SIGMOD international
+ conference on Management of Data 2000, pp. 261-272.
+
+ [76] P. Triantafillou and T. Pitoura, Towards a unifying framework
+ for complex query processing over structured peer-to-peer data
+ networks, Proc. First Int'l Workshop on Databases, Information
+ Systems and Peer-to-Peer Computing DBISP2P, Sept 7-8 2003, pp.
+ 169-183.
+
+
+
+
+
+Risson & Moors Informational [Page 60]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [77] A. Gupta, D. Agrawal, and A. E. Abbadi, Approximate range
+ selection queries in peer-to-peer systems, Proc. First Biennial
+ Conf. on Innovative Data Systems Research CIDR 2003 2003.
+
+ [78] S. Ratnasamy, P. Francis, and M. Handley, Range queries in
+ DHTs, Technical Report IRB-TR-03-009, July 2003.
+
+ [79] S. Ramabhadran, S. Ratnasamy, J. Hellerstein, and S. Shenker,
+ Brief announcement: prefix hash tree, Proc. 23rd Annual ACM
+ SIGACT-SIGOPS Symp. on Principles of Distributed Computing,
+ PODC 2004, July 25-28 2004, pp. 368-368.
+
+ [80] A. Andrzejak and Z. Xu, Scalable, efficient range queries for
+ grid information services, Proc. Second IEEE Int'l Conf. on
+ Peer to Peer Computing, September 2002.
+
+ [81] C. Schmidt and M. Parashar, Enabling flexible queries with
+ guarantees in P2P systems, IEEE Internet Computing 8 (3) (2004)
+ 19-26.
+
+ [82] E. Tanin, A. Harwood, and H. Samet, Indexing distributed
+ complex data for complex queries, Proc. National Conf. on
+ Digital Government Research 2004, pp. 81-90.
+
+ [83] P. Ganesan, M. Bawa, and H. Garcia-Molina, Online Balancing of
+ Range-Partitioned Data with Applications to Peer-to-Peer
+ Systems, Proc. 30th Int'l Conf. on Very Large Data Bases VLDB
+ 2004, 29 August - 3 September 2004.
+
+ [84] A. Bharambe, M. Agrawal, and S. Seshan, Mercury: Supporting
+ Scalable Multi-Attribute Range Queries, SIGCOMM'04, Aug 30-Sept
+ 3 2004.
+
+ [85] K. Aberer, Scalable Data Access in P2P Systems Using Unbalanced
+ Search Trees, Workshop on Distributed Data and Structures WDAS-
+ 2002 2002.
+
+ [86] K. Aberer, A. Datta, and M. Hauswirth, The Quest for Balancing
+ Peer Load in Structured Peer-to-Peer Systems, Technical Report
+ IC/2003/32 2003.
+
+ [87] W. Litwin, M.-A. Neimat, and D. Schneider, RP*: a family of
+ order-preserving scalable distributed data structures, Proc.
+ 20th Int'l Conf. on Very Large Data Bases VLDB'94, September
+ 12-15 1994.
+
+
+
+
+
+
+Risson & Moors Informational [Page 61]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [88] M. Tsangou, S. Ndiaye, M. Seck, and W. Litwin, Range queries to
+ scalable distributed data structure RP*, Proc. Fifth Workshop
+ on Distributed Data and Structures, WDAS 2003, June 2003.
+
+ [89] W. Litwin and M.-A. Neimat, k-RP*s: a scalable distributed data
+ structure for high-performance multi-attributed access, Proc.
+ Fourth Int'l Conf. on Parallel and Distributed Information
+ Systems (1996) 120-131.
+
+ [90] T. Hodes, S. Czerwinski, B. Zhao, A. Joseph, and R. Katz, An
+ architecture for secure wide-area service discovery, Wireless
+ Networks 8 (2/3) (2002) 213-230.
+
+ [91] M. Cai, M. Frank, J. Chen, and P. Szekely, MAAN: A Multi-
+ Attribute Addressable Network for Grid Information Services,
+ Proc. Int'l Workshop on Grid Computing, November 2003.
+
+ [92] R. van Renesse, K. P. Birman, and W. Vogels, Astrolabe: A
+ robust and scalable technology for distribute system
+ monitoring, management and data mining, ACM Trans. on Computer
+ Systems 21 (2) (2003) 164-206.
+
+ [93] R. Bhagwan, G. Varghese, and G. Voelker, Cone: Augmenting DHTs
+ to support distributed resource discovery, Technical Report,
+ CS2003- 0755, July 2003.
+
+ [94] K. Albrecht, R. Arnold, and R. Wattenhofer, Join and Leave in
+ Peer-to-Peer Systems: The DASIS Approach, Technical Report 427,
+ Department of Computer Science, November 2003.
+
+ [95] K. Albrecht, R. Arnold, and R. Wattenhofer, Aggregating
+ information in peer-to-peer systems for improved join and
+ leave, Proc. Fourth IEEE Int'l Conf. on Peer-to-Peer Computing,
+ 25-27 August 2004.
+
+ [96] A. Montresor, M. Jelasity, and O. Babaoglu, Robust aggregation
+ protocol for large-scale overlay networks, Technical Report
+ UBLCS-2003-16, December 2003.
+
+ [97] M. Jelasity, W. Kowalczyk, and M. van Steen, An Approach to
+ Aggregation in Large and Fully Distributed Peer-to-Peer Overlay
+ Networks, Proc. 12th Euromicro Conf. on Parallel, Distributted
+ and Network based Processing PDP 2004, February 2004.
+
+ [98] P. Yalagandula and M. Dahlin, A scalable distributed
+ information management system, SIGCOMM'04, Aug 30-Sept 3 2004.
+
+
+
+
+
+Risson & Moors Informational [Page 62]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [99] M. Bawa, A. Gionis, H. Garcia-Molina, and R. Motwani, The price
+ of validity in dynamic networks, Proc. 2004 ACM SIGMOD Int'l
+ Conf. on the management of data 2004, pp. 515-526.
+
+ [100] J. Aspnes, J. Kirsch, and A. Krishnamurthy, Load Balancing and
+ Locality in Range-Queriable Data Structures, Proc. 23rd Annual
+ ACM SIGACT-SIGOPS Symp. on Principles of Distributed Computing
+ PODC 2004, July 25-28 2004.
+
+ [101] G. On, J. Schmitt, and R. Steinmetz, The effectiveness of
+ realistic replication strategies on quality of availability for
+ peer-to-peer systems, Proc. Third Int'l IEEE Conf. on Peer-to-
+ Peer Computing, Sept 1-3 2003, pp. 57-64.
+
+ [102] D. Geels and J. Kubiatowicz, Replica management should be a
+ game, Proc. SIGOPS European Workshop, September 2003.
+
+ [103] E. Cohen and S. Shenker, Replication strategies in unstructured
+ peer to peer networks, Proc. 2002 conference on applications,
+ technologies, architectures and protocols for computer
+ communications 2002, pp. 177-190.
+
+ [104] E. Cohen and S. Shenker, P2P and multicast: replication
+ strategies in unstructured peer to peer networks, Proc. 2002
+ conference on applications, technologies, architectures and
+ protocols for computer communications 2002, pp. 177-190.
+
+ [105] H. Weatherspoon and J. Kubiatowicz, Erasure coding vs
+ replication: a quantative comparison, Proc. First Int'l
+ Workshop on Peer to Peer Systems IPTPS'02, March 2002.
+
+ [106] D. Lomet, Replicated indexes for distributed data, Proc. Fourth
+ Int'l Conf. on Parallel and Distributed Information Systems,
+ December 18-20 1996, pp. 108-119.
+
+ [107] V. Gopalakrishnan, B. Silaghi, B. Bhattacharjee, and P.
+ Keleher, Adaptive Replication in Peer-to-Peer Systems, Proc.
+ 24th Int'l Conf. on Distributed Computing Systems ICDCS 2004,
+ March 23-26 2004.
+
+ [108] S.-D. Lin, Q. Lian, M. Chen, and Z. Zhang, A practical
+ distributed mutual exclusion protocol in dynamic peer-to-peer
+ systems, The 3rd Int'l Workshop on Peer-to-Peer Systems,
+ February 26-27 2004.
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 63]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [109] A. Adya, R. Wattenhofer, W. Bolosky, M. Castro, G. Cermak, R.
+ Chaiken, J. Douceur, J. Howell, J. Lorch, and M. Thiemer,
+ Farsite: federated, available and reliable storage for an
+ incompletely trusted environment, ACM SIGOPS Operating Systems
+ Review, Special issue on Decentralized storage systems (2002)
+ 1- 14.
+
+ [110] A. Rowstron and P. Druschel, Storage management and caching in
+ PAST, a large-scale, persistent peer-to-peer storage utility,
+ Proceedings ACM SOSP'01, October 2001, pp. 188-201.
+
+ [111] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H.
+ Weatherspoon, and J. Kubiatowicz, Maintenance-Free Global Data
+ Storage, IEEE Internet Computing 5 (5) (2001) 40-49.
+
+ [112] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D.
+ Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C.
+ Wells, and B. Zhao, Oceanstore: An Architecture for global-
+ scale persistent storage, Proc. Ninth Int'l Conf. on
+ Architecture Support for Programming Languages and Operating
+ Systems ASPLOS 2000, November 2000, pp. 190-201.
+
+ [113] K. Birman, The Surprising Power of Epidemic Communication,
+ Springer-Verlag Heidelberg Lecture Notes in Computer Science
+ Volume 2584/2003 (2003) 97-102.
+
+ [114] P. Costa, M. Migliavacca, G. P. Picco, and G. Cugola,
+ Introducing reliability in content-based publish-subscribe
+ through epidemic algorithms, Proc. 2nd international workshop
+ on Distributed event-based systems 2003, pp. 1-8.
+
+ [115] P. Costa, M. Migliavacca, G. P. Picco, and G. Cugola, Epidemic
+ Algorithms for Reliable Content-Based Publish-Subscribe: An
+ Evaluation, The 24th Int'l Conf. on Distributed Computing
+ Systems (ICDCS-2004), Mar 23-26, Tokyo University of
+ Technology, Hachioji, Tokyo, Japan (2004)
+
+ [116] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S.
+ Shenker, H. Sturgis, D. Swinehart, and D. Terry, Epidemic
+ algorithms for replicated data management, Proc. Sixth ACM
+ Symp. on Principles of Distributed Computing 1987, pp. 1-12.
+
+ [117] P. Eugster, R. Guerraoiu, A. Kermarrec, and L. Massoulie,
+ Epidemic information dissemination in distributed systems, IEEE
+ Computer 37 (5) (2004) 60-67.
+
+
+
+
+
+
+Risson & Moors Informational [Page 64]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [118] W. Vogels, R. v. Renesse, and K. Birman, The power of
+ epidemics: robust communication for large-scale distributed
+ systems, ACM SIGCOMM Computer Communication Review 33 (1)
+ (2003) 131-135.
+
+ [119] S. Voulgaris and M. van Steen, An epidemic protocol for
+ managing routing tables in very large peer to peer networks,
+ Proc. 14th IFIP/IEEE Workshop on Distributed Systems:
+ Operations and Management, October 2003.
+
+ [120] I. Gupta, On the design of distributed protocols from
+ differential equations, Proc. 23rd Annual ACM SIGACT-SIGOPS
+ Symp. on Principles of Distributed Computing PODC 2004, July
+ 25-28 2004, pp. 216-225.
+
+ [121] I. Gupta, K. Birman, and R. van Renesse, Fighting fire with
+ fire: using randomized gossip to combat stochastic scalability
+ limits, Cornell University Dept of Computer Science Technical
+ Report, March 2001.
+
+ [122] K. Birman and I. Gupta, Building Scalable Solutions to
+ Distributed Computing Problems using Probabilistic Components,
+ Submitted to the Int'l Conf. on Dependable Systems and Networks
+ DSN-2004, Dependable Computing and Computing Symp. DCCS, June
+ 28- July 1 2004.
+
+ [123] A. Ganesh, A.-M. Kermarrec, and L. Massoulie, Peer-to-peer
+ membership management for gossip-based protocols, IEEE Trans.
+ on Computers 52 (2) (2003) 139-149.
+
+ [124] N. Bailey, Epidemic Theory of Infectious Diseases and its
+ Applications, Second Edition ed. Hafner Press, 1975.
+
+ [125] P. Eugster, R. Guerraoiu, S. Handurukande, P. Kouznetsov, and
+ A.- M. Kermarrec, Lightweight probabilistic broadcast, ACM
+ Trans. on Computer Systems 21 (4) (2003) 341-374.
+
+ [126] H. Weatherspoon and J. Kubiatowicz, Efficient heartbeats and
+ repair of softstate in decentralized object location and
+ routing systems, Proc. SIGOPS European Workshop, September
+ 2002.
+
+ [127] G. Koloniari and E. Pitoura, Content-based Routing of Path
+ Queries in Peer-to-Peer Systems, Proc. 9th Int'l Conf. on
+ Extending DataBase Technology EDBT, March 14-18 2004.
+
+
+
+
+
+
+Risson & Moors Informational [Page 65]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [128] A. Mohan and V. Kalogaraki, Speculative routing and update
+ propagation: a kundali centric approach, IEEE Int'l Conf. on
+ Communications ICC'03, May 2002.
+
+ [129] G. Koloniari, Y. Petrakis, and E. Pitoura, Content-Based
+ Overlay Networks for XML Peers Based on Multi-Level Bloom
+ Filters, Proc. First Int'l Workshop on Databases, Information
+ Systems and Peer-to-Peer Computing DBISP2P, Sept 7-8 2003, pp.
+ 232-247.
+
+ [130] G. Koloniari and E. Pitoura, Bloom-Based Filters for
+ Hierarchical Data, Proc. 5th Workshop on Distributed Data and
+ Structures (WDAS) (2003)
+
+ [131] B. Bloom, Space/time trade-offs in hash coding with allowable
+ errors, Communications of the ACM 13 (7) (1970) 422-426.
+
+ [132] M. Naor and U. Wieder, A Simple Fault Tolerant Distributed Hash
+ Table, Second Int'l Workshop on Peer-to-Peer Systems (IPTPS
+ 03), Berkeley, CA, USA, 20-21 February (2003)
+
+ [133] P. Maymounkov and D. Mazieres, Rateless codes and big
+ downloads, Second Int'l Workshop on Peer-to-Peer Systems,
+ IPTPS'03, February 20-21 2003.
+
+ [134] M. Krohn, M. Freedman, and D. Mazieres, On-the-fly verification
+ of rateless erasure codes for efficient content distribution,
+ Proc. IEEE Symp. on Security and Privacy, May 2004.
+
+ [135] J. Byers, J. Considine, M. Mitzenmacher, and S. Rost, Informed
+ content delivery across adaptive overlay networks, Proc. 2002
+ conference on applications, technologies, architectures and
+ protocols for computer communications 2002, pp. 47-60.
+
+ [136] J. Plank, S. Atchley, Y. Ding, and M. Beck, Algorithms for High
+ Performance, Wide-Area Distributed File Downloads, Parallel
+ Processing Letters 13 (2) (2003) 207-223.
+
+ [137] M. Castro, P. Rodrigues, and B. Liskov, BASE: Using
+ abstraction to improve fault tolerance, ACM Trans. on Computer
+ Systems 21 (3) (2003) 236-269.
+
+ [138] R. Rodrigues, B. Liskov, and L. Shrira, The design of a robust
+ peer-to-peer system, 10th ACM SIGOPS European Workshop, Sep
+ 2002.
+
+
+
+
+
+
+Risson & Moors Informational [Page 66]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [139] H. Weatherspoon, T. Moscovitz, and J. Kubiatowicz,
+ Introspective failure analysis: avoiding correlated failures in
+ peer-to-peer systems, Proc. Int'l Workshop on Reliable Peer-
+ to-Peer Distributed Systems, Oct 2002.
+
+ [140] F. Dabek, R. Cox, F. Kaashoek, and R. Morris, Vivaldi: A
+ Decentralized Network Coordinate System, SIGCOMM'04, Aug 30-
+ Sept 3 2004.
+
+ [141] E.-K. Lua, J. Crowcroft, and M. Pias, Highways: proximity
+ clustering for massively scaleable peer-to-peer network
+ routing, Proc. Fourth IEEE Int'l Conf. on Peer-to-Peer
+ Computing, August 25-27 2004.
+
+ [142] F. Fessant, S. Handurukande, A.-M. Kermarrec, and L. Massoulie,
+ Clustering in Peer-to-Peer File Sharing Workloads, The 3rd
+ Int'l Workshop on Peer-to-Peer Systems, February 26-27 2004.
+
+ [143] T. S. E. Ng and H. Zhang, Predicting internet network distance
+ with coordinates-based approaches, IEEE Infocom 2002, The 21st
+ Annual Joint Conf. of the IEEE Computer and Communication
+ Societies, June 23-27 2002.
+
+ [144] K. Hildrum, R. Krauthgamer, and J. Kubiatowicz, Object Location
+ in Realistic Networks, Proc. Sixteenth ACM Symp. on Parallel
+ Algorithms and Architectures (SPAA 2004), June 2004, pp. 25-35.
+
+ [145] P. Keleher, S. Bhattacharjee, and B. Silaghi, Are Virtualized
+ Overlay Networks Too Much of a Good Thing?, First Int'l
+ Workshop on Peer-to-Peer Systems IPTPS, March 2002.
+
+ [146] A. Mislove and P. Druschel, Providing administrative control
+ and autonomy in structured peer-to-peer overlays, The 3rd Int'l
+ Workshop on Peer-to-Peer Systems, June 9-12 2004.
+
+ [147] D. Karger and M. Ruhl, Diminished Chord: A Protocol for
+ Heterogeneous SubGroup Formation in Peer-to-Peer Networks, The
+ 3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27
+ 2004.
+
+ [148] B. Awerbuch and C. Scheideler, Consistent, order-preserving
+ data management in distributed storage systems, Proc. Sixteenth
+ ACM Symp. on Parallel Algorithms and Architectures SPAA 2004,
+ June 27-30 2004, pp. 44-53.
+
+ [149] M. Freedman and D. Mazieres, Sloppy Hashing and Self-Organizing
+ Clusters, Proc. 2nd Int'l Workshop on Peer-to-Peer Systems
+ IPTPS
+
+
+
+Risson & Moors Informational [Page 67]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [150] F. Dabek, J. Li, E. Sit, J. Robertson, F. Kaashoek, and R.
+ Morris, Designing a DHT for low latency and high throughput,
+ Proc. First Symp. on Networked Systems Design and
+ Implementation (NSDI'04), San Francisco, California, March
+ 29-31 (2004) 85-98.
+
+ [151] M. Ruhl, Efficient algorithms for new computational models,
+ Doctoral Dissertation, September 2003.
+
+ [152] K. Sollins, Designing for scale and differentiation, Proc. ACM
+ SIGCOMM workshop on Future Directions in network architecture,
+ August 25-27 2003.
+
+ [153] L. Massoulie, A. Kermarrec, and A. Ganesh, Network awareness
+ and failure resilience in self-organizing overlay networks,
+ Proc. 22nd Int'l Symp. on Reliable Distributed Systems,
+ SRDS'03, Oct 6-8 2003, pp. 47-55.
+
+ [154] R. Cox, F. Dabek, F. Kaashoek, J. Li, and R. Morris,
+ Practical,distributed network coordinates, ACM SIGCOMM Computer
+ Communication Review 34 (1) (2004) 113-118.
+
+ [155] K. Hildrum, J. Kubiatowicz, S. Rao, and B. Zhao, Distributed
+ object location in a dynamic network, Proc. 14th annual ACM
+ symposium on parallel algorithms and architectures 2002, pp.
+ 41- 52.
+
+ [156] X. Zhang, Q. Zhang, G. Song, and W. Zhu, A Construction of
+ Locality-Aware Overlay Network: mOverlay and its Performance,
+ IEEE Journal on Selected Areas in Communications 22 (1) (2004)
+ 18-28.
+
+ [157] N. Harvey, M. B. Jones, M. Theimer, and A. Wolman, Efficient
+ recovery from organization disconnects in Skipnet, Second Int'l
+ Workshop on Peer-to-Peer Systems IPTPS'03, Feb 20-21 2003.
+
+ [158] M. Pias, J. Crowcroft, S. Wilbur, T. Harris, and S. Bhatti,
+ Lighthouses for scalable distributed location, Second Int'l
+ Workshop on Peer-to-Peer Systems IPTPS'03, February 20-21 2003.
+
+ [159] K. Gummadi, S. Saroui, S. Gribble, and D. King, Estimating
+ latency between arbitrary internet end hosts, Proc. SIGCOMM
+ IMW 2002, November 2002.
+
+ [160] Y. Liu, X. Liu, L. Xiao, L. Ni, and X. Zhang, Location-aware
+ topology matching in P2P systems, Proc. IEEE Infocomm, Mar
+ 7-11 2004.
+
+
+
+
+Risson & Moors Informational [Page 68]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [161] G. S. Manku, Balanced binary trees for ID management and load
+ balance in distributed hash tables, Proc. 23rd Annual ACM
+ SIGACT-SIGOPS Symp. on Principles of Distributed Computing,
+ PODC 2004, July 25-28 2004, pp. 197-205.
+
+ [162] J. Gao and P. Steenkiste, Design and Evaluation of a
+ Distributed Scalable Content Delivery System, IEEE Journal on
+ Selected Areas in Communications 22 (1) (2004) 54-66.
+
+ [163] X. Wang, Y. Zhang, X. Li, and D. Loguinov, On zone-balancing of
+ peer-to-peer networks: analysis of random node join, Proc.
+ joint international conference on measurement and modeling of
+ computer systems, June 2004.
+
+ [164] D. Karger and M. Ruhl, Simple efficient load balancing
+ algorithms for peer-to-peer systems, Proc. Sixteenth ACM Symp.
+ on Parallel Algorithms and Architectures SPAA 2004, June 27-30
+ 2004.
+
+ [165] D. Karger and M. Ruhl, Simple efficient load balancing
+ algorithms for peer-to-peer systems, The 3rd Int'l Workshop on
+ Peer-to-Peer Systems, February 26-27 2004.
+
+ [166] M. Adler, E. Halperin, R. Karp, and V. Vazirani, A stochastic
+ process on the hypercube with applications to peer-to-peer
+ networks, Proc. 35th ACM symposium on Theory of Computing 2003,
+ pp. 575-584.
+
+ [167] C. Baquero and N. Lopes, Towards peer to peer content indexing,
+ ACM SIGOPS Operating Systems Review 37 (4) (2003) 90-96.
+
+ [168] A. Rao, K. Lakshminarayanan, S. Surana, R. Karp, and I. Stoica,
+ Load balancing in structured P2P systems, Proc. 2nd Int'l
+ Workshop on Peer-to-Peer Systems, IPTPS'03, February 20-21
+ 2003.
+
+ [169] J. Byers, J. Considine, and M. Mitzenmacher, Simple Load
+ Balancing for Distributed Hash Tables, Second Int'l Workshop on
+ Peer-to-Peer Systems IPTPS 03, 20-21 February 2003.
+
+ [170] P. Castro, J. Lee, and A. Misra, CLASH: A Protocol for
+ Internet- Scale Utility-Oriented Distributed Computing, Proc.
+ 24th Int'l Conf. on Distributed Computing Systems ICDCS 2004,
+ March 23-26 2004.
+
+ [171] A. Stavrou, D. Rubenstein, and S. Sahu, A Lightwight, Robust
+ P2P System to Handle Flash Crowds, IEEE Journal on Selected
+ Areas in Communications 22 (1) (2004) 6-17.
+
+
+
+Risson & Moors Informational [Page 69]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [172] A. Selcuk, E. Uzun, and M. R. Pariente, A reputation-based
+ trust management system for P2P networks, Fourth Int'l Workshop
+ on Global and Peer-to-Peer Computing, April 20-21 2004.
+
+ [173] T. Papaioannou and G. Stamoulis, Effective use of reputation in
+ peer-to-peer environments, Fourth Int'l Workshop on Global and
+ Peer-to-Peer Computing, April 20-21 2004.
+
+ [174] M. Blaze, J. Feigenbaum, and J. Lacy, Trust and Reputation in
+ P2P networks,
+ http://www.neurogrid.net/twiki/bin/view/Main/ReputationAndTrust
+ (2003)
+
+ [175] E. Damiani, D. C. di Vimercati, S. Paraboschi, P. Samarati, and
+ F. Violante, A reputation-based approach for choosing reliable
+ resources in peer to peer networks, Proc. 9th conference on
+ computer and communications security 2002, pp. 207-216.
+
+ [176] S. Marti, P. Ganesan, and H. Garcia-Molina, DHT routing using
+ social links, The 3rd Int'l Workshop on Peer-to-Peer Systems,
+ February 26-27 2004.
+
+ [177] G. Caronni and M. Waldvogel, Establishing trust in distributed
+ storage providers, Proc. Third Int'l IEEE Conf. on Peer-to-Peer
+ Computing, 1-3 Sept 2003, pp. 128-133.
+
+ [178] B. Sieka, A. Kshemkalyani, and M. Singhal, On the security of
+ polling protocols in peer-to-peer systems, Proc. Fourth IEEE
+ Int'l Conf. on Peer-to-Peer Computing, 25-27 August 2004.
+
+ [179] M. Feldman, K. Lai, I. Stoica, and J. Chuang, Robust Incentive
+ Techniques for Peer-to-Peer Networks, ACM E-Commerce Conf.
+ EC'04, May 2004.
+
+ [180] K. Anagnostakis and M. Greenwald, Exchange-based Incentive
+ Mechanism for Peer-to-Peer File Sharing, Proc. 24th Int'l Conf.
+ on Distributed Computing Systems ICDCS 2004, March 23-26 2004.
+
+ [181] J. Schneidman and D. Parkes, Rationality and self-Interest in
+ peer to peer networks, Second Int'l Workshop on Peer-to-Peer
+ Systems IPTPS'03, February 20-21 2003.
+
+ [182] C. Buragohain, D. Agrawal, and S. Subhash, A game theoretic
+ framework for incentives in P2P systems, Proc. Third Int'l IEEE
+ Conf. on Peer-to-Peer Computing, 1-3 Sept 2003, pp. 48-56.
+
+
+
+
+
+
+Risson & Moors Informational [Page 70]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [183] W. Josephson, E. Sirer, and F. Schneider, Peer-to-Peer
+ Authentication with a Distributed Single Sign-On Service, The
+ 3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27
+ 2004.
+
+ [184] A. Fiat and J. Saia, Censorship resistant peer to peer content
+ addressable networks, Proc. 13th annual ACM-SIAM symposium on
+ discrete algorithms 2002, pp. 94-103.
+
+ [185] N. Daswani and H. Garcia-Molina, Query-flood DoS attacks in
+ gnutella, Proc. 9th ACM Conf. on Computer and Communications
+ Security 2002, pp. 181-192.
+
+ [186] A. Singh and L. Liu, TrustMe: anonymous management of trust
+ relationships in decentralized P2P systems, Proc. Third Int'l
+ IEEE Conf. on Peer-to-Peer Computing, Sept 1-3 2003.
+
+ [187] A. Serjantov, Anonymizing censorship resistant systems, Proc.
+ Second Int'l Conf. on Peer to Peer Computing, March 2002.
+
+ [188] S. Hazel and B. Wiley, Achord: A Variant of the Chord Lookup
+ Service for Use in Censorship Resistant Peer-to-Peer Publishing
+ Systems, Proc. Second Int'l Conf. on Peer to Peer Computing,
+ March 2002.
+
+ [189] M. Freedman and R. Morris, Tarzan: a peer-to-peer anonymizing
+ network layer, Proc. 9th ACM Conf. on Computer and
+ Communications Security (2002) 193-206.
+
+ [190] M. Feldman, C. Papadimitriou, J. Chuang, and I. Stoica, Free-
+ Riding and Whitewashing in Peer-to-Peer Systems, 3rd Annual
+ Workshop on Economics and Information Security WEIS04, May
+ 2004.
+
+ [191] L. Ramaswamy and L. Liu, FreeRiding: a new challenge for peer-
+ to-peer file sharing systems, Proc. 2003 Hawaii Int'l Conf. on
+ System Sciences, P2P Track, HICSS2003, January 6-9 2003.
+
+ [192] T.-W. Ngan, D. Wallach, and P. Druschel, Enforcing fair sharing
+ of peer-to-peer resources, Second Int'l Workshop on Peer-to-
+ Peer Systems, IPTPS'03, 20-21 February 2003.
+
+ [193] L. Cox and B. D. Noble, Samsara: honor among thieves in peer-
+ to-peer storage, Proc. nineteenth ACM symposium on Operating
+ System Principles 2003, pp. 120-132.
+
+
+
+
+
+
+Risson & Moors Informational [Page 71]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [194] M. Surridge and C. Upstill, Grid security: lessons for peer-to-
+ peer systems, Proc. Third Int'l IEEE Conf. on Peer-to-Peer
+ Computing, Sept 1-3 2003, pp. 2-6.
+
+ [195] E. Sit and R. Morris, Security considerations for peer-to-peer
+ distributed hash tables, First Int'l Workshop on Peer-to-Peer
+ Systems, March 2002.
+
+ [196] C. O'Donnel and V. Vaikuntanathan, Information leak in the
+ Chord lookup protocol, Proc. Fourth IEEE Int'l Conf. on Peer-
+ to-Peer Computing, 25-27 August 2004.
+
+ [197] K. Berket, A. Essiari, and A. Muratas, PKI-Based Security for
+ Peer-to-Peer Information Sharing, Proc. Fourth IEEE Int'l Conf.
+ on Peer-to-Peer Computing, 25-27 August 2004.
+
+ [198] B. Karp, S. Ratnasamy, S. Rhea, and S. Shenker, Spurring
+ adoption of DHTs with OpenHash, a public DHT service, The 3rd
+ Int'l Workshop on Peer-to-Peer Systems, February 26-27 2004.
+
+ [199] J. Considine, M. Walfish, and D. G. Andersen, A pragmatic
+ approach to DHT adoption, Technical Report,, December 2003.
+
+ [200] G. Li, Peer to Peer Networks in Action, IEEE Internet Computing
+ 6 (1) (2002) 37-39.
+
+ [201] A. Mislove, A. Post, C. Reis, P. Willmann, P. Druschel, D.
+ Wallach, X. Bonnaire, P. Sens, J.-M. Busca, and L. Arantes-
+ Bezerra, POST: A Secure, Resilient, Cooperative Messaging
+ System, 9th Workshop on Hot Topics in Operating Systems, HotOS,
+ May 2003.
+
+ [202] S. Saroiu, P. Gummadi, and S. Gribble, A measurement study of
+ peer-to-peer file sharing systems, Proc. Multimedia Computing
+ and Networking 2002 MMCN'02, January 2002.
+
+ [203] A. Muthitacharoen, R. Morris, T. Gil, and B. Chen, Ivy: a
+ read/write peer-to-peer file system, ACM SIGOPS Operating
+ Systems Review, Special issue on Decentralized storage systems,
+ December 2002, pp. 31-44.
+
+ [204] A. Muthitacharoen, R. Morris, T. Gil, and B. Chen, A read/write
+ peer-to-peer file system, Proc. 5th Symp. on Operating System
+ Design and Implementation (OSDI 2002), Boston, MA, December
+ (2002)
+
+
+
+
+
+
+Risson & Moors Informational [Page 72]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [205] F. Annexstein, K. Berman, M. Jovanovic, and K. Ponnavaikko,
+ Indexing techniques for file sharing in scalable peer to peer
+ networks, 11th IEEE Int'l Conf. on Computer Communications and
+ Networks (2002) 10-15.
+
+ [206] G. Kan and Y. Faybishenko, Introduction to Gnougat, First Int'l
+ Conf. on Peer-to-Peer Computing 2001 2001, pp. 4-12.
+
+ [207] R. Gold and D. Tidhar, Towards a content-based aggregation
+ network, Proc. First Int'l Conf. on Peer to Peer Compuuting
+ 2001, pp. 62-68.
+
+ [208] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica,
+ Wide-area cooperative storage with CFS, Proc. 18th ACM
+ symposium on Operating System Principles 2001, pp. 202-215.
+
+ [209] M. Freedman, E. Freudenthal, and D. Mazieres, Democratizing
+ content publication with coral, Proc. First Symp. on Networked
+ Systems Design and Implementation NSDI'04, March 29-31 2004,
+ pp. 239-252.
+
+ [210] J. Li, B. T. Loo, J. Hellerstein, F. Kaashoek, D. Karger, and
+ R. Morris, On the Feasibility of Peer-to-Peer Web Indexing and
+ Search, Second Int'l Workshop on Peer-to-Peer Systems IPTPS 03,
+ 20-21 February 2003.
+
+ [211] S. Iyer, A. Rowstron, and P. Druschel, Squirrel: a
+ decentralized peer-to-peer web cache, Proc. 21st annual
+ symposium on principles of distributed computing 2002, pp.
+ 213-222.
+
+ [212] M. Bawa, R. Bayardo, S. Rajagopalan, and E. Shekita, Make it
+ fresh, make it quick: searching a network of personal
+ webservers, Proc. 12th international conference on World Wide
+ Web 2003, pp. 577-586.
+
+ [213] B. T. Loo, S. Krishnamurthy, and O. Cooper, Distributed web
+ crawling over DHTs, Technical Report, CSD-04-1305, February 9
+ 2004.
+
+ [214] M. Junginger and Y. Lee, A self-organizing publish/subscribe
+ middleware for dynamic peer-to-peer networks, IEEE Network 18
+ (1) (2004) 38-43.
+
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 73]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [215] F. Cuenca-Acuna, C. Peery, R. Martin, and T. Nguyen, PlanetP:
+ Using Gossiping to Build Content Addressable Peer-to-Peer
+ Information Sharing Communities, Proc. 12th international
+ symposium on High Performance Distributed Computing (HPDC),
+ June 2002.
+
+ [216] M. Walfish, H. Balakrishnan, and S. Shenker, Untangling the web
+ from DNS, Proc. First Symp. on Networked Systems Design and
+ Implementation NSDI'04, March 29-31 2004, pp. 225-238.
+
+ [217] B. Awerbuch and C. Scheideler, Robust distributed name service,
+ The 3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27
+ 2004.
+
+ [218] A. Iamnitchi, Resource Discovery in Large Resource-Sharing
+ Environments, Doctoral Dissertation 2003.
+
+ [219] R. Cox, A. Muthitacharoen, and R. Morris, Serving DNS using a
+ Peer-to-Peer Lookup Service, First Int'l Workshop on Peer-to-
+ Peer Systems (IPTPS), March 2002.
+
+ [220] A. Chander, S. Dawson, P. Lincoln, and D. Stringer-Calvert,
+ NEVRLATE: scalable resource discovery, Second IEEE/ACM Int'l
+ Symp. on Cluster Computing and the Grid CCGRID2002 2002, pp.
+ 56-65.
+
+ [221] M. Balazinska, H. Balakrishnan, and D. Karger, INS/Twine: A
+ scalable Peer-to-Peer architecture for Intentional Resource
+ Discovery, Proc. First Int'l Conf. on Pervasive Computing
+ (IEEE) (2002)
+
+ [222] J. Kangasharju, K. Ross, and D. Turner, Secure and resilient
+ peer-to-peer E-mail: design and implementation, Proc. Third
+ Int'l IEEE Conf. on Peer-to-Peer Computing, 1-3 Sept 2003.
+
+ [223] V. Lo, D. Zappala, D. Zhou, Y. Liu, and S. Zhao, Cluster
+ computing on the fly: P2P scheduling of idle cycles in the
+ internet, The 3rd Int'l Workshop on Peer-to-Peer Systems,
+ February 26-27 2004.
+
+ [224] A. Iamnitchi, I. Foster, and D. Nurmi, A peer-to-peer approach
+ to resource discovery in grid environments, IEEE High
+ Performance Distributed Computing 2002.
+
+ [225] I. Foster and A. Iamnitchi, On Death, Taxes and the Convergence
+ of Peer-to-Peer and Grid Computing, Second Int'l Workshop on
+ Peer-to-Peer Systems IPTPS 03, 20-21 February 2003.
+
+
+
+
+Risson & Moors Informational [Page 74]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [226] W. Hoschek, Peer-to-Peer Grid Databases for Web Service
+ Discovery, Concurrency - Practice and Experience (2002) 1-7.
+
+ [227] K. Aberer, A. Datta, and M. Hauswirth, A decentralized public
+ key infrastructure for customer-to-customer e-commerce, Int'l
+ Journal of Business Process Integration and Management (2004)
+
+ [228] S. Ajmani, D. Clarke, C.-H. Moh, and S. Richman, ConChord:
+ Cooperative SDSI Certificate Storage and Name Resolution, First
+ Int'l Workshop on Peer-to-Peer Systems IPTPS, March 2002.
+
+ [229] E. Sit, F. Dabek, and J. Robertson, UsenetDHT: a low overhead
+ Usenet server, The 3rd Int'l Workshop on Peer-to-Peer Systems,
+ February 26-27 2004.
+
+ [230] H.-Y. Hsieh and R. Sivakumar, On transport layer support for
+ peer-to-peer networks, The 3rd Int'l Workshop on Peer-to-Peer
+ Systems, February 26-27 2004.
+
+ [231] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, and S. Surana,
+ Internet indirection infrastructure, Proc. 2002 conference on
+ applications, technologies, architectures and protocols for
+ computer communications, August 19-23 2002, pp. 73-86.
+
+ [232] E. Halepovic and R. Deters, Building a P2P forum system with
+ JXTA, Proc. Second IEEE Int'l Conf. on Peer to Peer Computing
+ P2P'02, September 5-7 2002.
+
+ [233] M. Wawrzoniak, L. Peterson, and T. Roscoe, Sophia: an
+ Information Plane for networked systems, ACM SIGCOMM Computer
+ Communication Review 34 (1) (2004) 15-20.
+
+ [234] D. Tran, K. Hua, and T. Do, A Peer-to-Peer Architecture for
+ Media Streaming, IEEE Journal on Selected Areas in
+ Communications 22 (1) (2004) 121-133.
+
+ [235] V. Padmanabhan, H. Wang, and P. Chou, Supporting heterogeneity
+ and congestion control in peer-to-peer multicast streaming, The
+ 3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27
+ 2004.
+
+ [236] A. Nicolosi and D. Mazieres, Secure acknowledgment of multicast
+ messages in open peer-to-peer networks, The 3rd Int'l Workshop
+ on Peer-to-Peer Systems, February 26-27 2004.
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 75]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [237] R. Zhang and C. Hu, Borg: a hybrid protocol for scalable
+ application-level multicast in peer-to-peer networks, Proc.
+ 13th international workshop on network and operating systems
+ for digital audio and video 2003, pp. 172-179.
+
+ [238] M. Sasabe, N. Wakamiya, M. Murata, and H. Miyahara, Scalable
+ and continuous media streaming on peer-to-peer networks, Proc.
+ Third Int'l IEEE Conf. on Peer-to-Peer Computing, Sept 1-3
+ 2003, pp. 92-99.
+
+ [239] M. Hefeeda, A. Habib, B. Botev, D. Xu, and B. Bhargava,
+ PROMISE: peer-to-peer media streaming using CollectCast, Proc.
+ eleventh ACM international conference on multimedia 2003, pp.
+ 45-54.
+
+ [240] M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron,
+ and A. Singh, SplitStream: high-bandwidth multicast in
+ cooperative environments, Proc. 19th ACM symposium on operating
+ systems principles 2003, pp. 298-313.
+
+ [241] M. Castro, P. Druschel, A.-M. Kermarrec, and A. Rowstron,
+ SCRIBE: a large-scale and decentralized application-level
+ multicast infrastructure, IEEE Journal on Selected Areas in
+ Communications 20 (8) (2002)
+
+ [242] S. Zhuang, B. Zhao, A. Joseph, R. Katz, and J. Kubiatowicz,
+ Bayeux: an architecture for scalable and fault-tolerant wide-
+ area data dissemination, Proc. 11th ACM international workshop
+ on network and operating systems support for digital audio and
+ video, Jan 2001.
+
+ [243] R. Lienhart, M. Holliman, Y.-K. Chen, I. Kozintsev, and M.
+ Yeung, Improving media services on P2P networks, IEEE Internet
+ Computing 6 (1) (2002) 58-67.
+
+ [244] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L.
+ Yin, and F. Yu, Data Centric Storage in Sensornets with GHT, a
+ geographic hash table, Mobile Networks and Applications 8 (4)
+ (2003) 427-442.
+
+ [245] M. Demirbas and H. Ferhatosmanoglu, Peer-to-peer spatial
+ queries in sensor networks, Proc. Third Int'l IEEE Conf. on
+ Peer-to-Peer Computing, 1-3 Sept 2003, pp. 32-39.
+
+ [246] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan,
+ and S. Shenker, GHT: a geographic hash table for data-centric
+ storage, Proc. First ACM Int'l Workshop on Wireless Sensor
+ Networks and Applications (Mobicom) 2002, pp. 78-87.
+
+
+
+Risson & Moors Informational [Page 76]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [247] J. Hellerstein and W. Wang, Optimization of In-Network Data
+ Reduction, Proc. First Workshop on Data Management for Sensor
+ Networks DMSN 2004, August 30th 2004.
+
+ [248] J. Li, J. Stribling, T. Gil, R. Morris, and F. Kaashoek,
+ Comparing the performance of distributed hash tables under
+ churn, The 3rd Int'l Workshop on Peer-to-Peer Systems, February
+ 26-27 2004.
+
+ [249] S. Shenker, The data-centric revolution in networking, Keynote
+ Speech, 29th Int'l Conf. on Very Large Data Bases, September
+ 9-12 2003.
+
+ [250] S. Gribble, A. Halevy, Z. Ives, M. Rodrig, and D. Suciu, What
+ can databases do for P2P?, Proc. Fourth Int'l Workshop on
+ Databases and the Web, WebDB2001, May 24-25 2001.
+
+ [251] D. Clark, The design philosophy of the DARPA internet
+ protocols, ACM SIGCOMM Computer Communication Review, Symp.
+ proceedings on communications architectures and protocols 18
+ (4) (1988)
+
+ [252] J.-C. Laprie, Dependable Computing and Fault Tolerance:
+ Concepts and Terminology, Twenty-Fifth Int'l Symp. on Fault-
+ Tolerant Computing, Highlights from Twenty-Five Years 1995, pp.
+ 2-13.
+
+ [253] D. Clark, J. Wroclawski, K. Sollins, and R. Braden, Tussle in
+ cyberspace: defining tomorrow's internet, Conf. on
+ Applications, Technologies, Architectures and Protocols for
+ Computer Communications 2002, pp. 347-356.
+
+ [254] L. O. Alima, A. Ghodsi, and S. Haridi, "A framework for
+ structured peer-to-peer overlay networks," in Global computing,
+ vol. 3267, Lecture Notes in Computer Science: Springer Berlin /
+ Heidelberg, 2005, pp. 223-249.
+
+ [255] Clip2, The Gnutella Protocol Specification,
+ http://www.clip2.com (2000)
+
+ [256] Napster, http://www.napster.com (1999)
+
+ [257] J. Mishchke and B. Stiller, A methodology for the design of
+ distributed search in P2P middleware, IEEE Network 18 (1)
+ (2004) 30-37.
+
+
+
+
+
+
+Risson & Moors Informational [Page 77]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [258] J. Li and K. Sollins, Implementing aggregation and broadcast
+ over distributed hash tables. Full report,
+ http://krs.lcs.mit.edu/regions/docs.html (November) (2003)
+
+ [259] M. Castro, M. Costa, and A. Rowstron, Should we build Gnutella
+ on a structured overlay?, ACM SIGCOMM Computer Communication
+ Review 34 (1) (2004) 131-136.
+
+ [260] A. Singla and C. Rohrs, Ultrapeers: Another Step Towards
+ Gnutella Scalability,
+ http://groups.yahoo.com/group/the_gdf/files/Proposals/
+ Working%20Proposals/Ultrapeer/ Version 1.0, 26 November (2002)
+
+ [261] B. Cooper and H. Garcia-Molina, Ad hoc, Self-Supervising Peer-
+ to-Peer Search Networks, Technical Report,
+ http://www.cc.gatech.edu/~cooperb/odin/ 2003.
+
+ [262] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information
+ Retrieval. Addison Wesley, Essex, England, 1999.
+
+ [263] S. Sen and J. Wang, Analyzing peer-to-peer traffic across large
+ networks, IEEE/ACM Trans. on Networking 12 (2) (2004) 219-232.
+
+ [264] H. Balakrishnan, S. Shenker, and M. Walfish, Semantic-Free
+ Referencing in Linked Distributed Systems, Second Int'l
+ Workshop on Peer-to-Peer Systems IPTPS 03, 20-21 February 2003.
+
+ [265] B. Yang, P. Vinograd, and H. Garcia-Molina, Evaluating GUESS
+ and non-forwarding peer-to-peer search, The 24th Int'l Conf. on
+ Distributed Computing Systems ICDCS'04, Mar 23-26 2004.
+
+ [266] A. Gupta, B. Liskov, and R. Rodrigues, One Hop Lookups for
+ Peer-to-Peer Overlays, 9th Workshop on Hot Topics in Operating
+ Systems (HotOS), 18-21 May 2003.
+
+ [267] A. Gupta, B. Liskov, and R. Rodrigues, Efficient routing for
+ peer-to-peer overlays, First symp. on Networked Systems Design
+ and Implementation (NSDI), Mar 29-31 2004, pp. 113-126.
+
+ [268] A. Mizrak, Y. Cheng, V. Kumar, and S. Savage, Structured
+ superpeers: leveraging heterogeneity to provide constant-time
+ lookup, IEEE Workshop on Internet Applications, June 23-24
+ 2003.
+
+ [269] L. Adamic, R. Lukose, A. Puniyani, and B. Huberman, Search in
+ power-law networks, Physical review E, The American Physical
+ Society 64 (046135) (2001)
+
+
+
+
+Risson & Moors Informational [Page 78]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [270] F. Banaei-Kashani and C. Shahabi, Criticality-based analysis
+ and design of unstructured peer-to-peer networks as "complex
+ systems", Proc. 3rd IEEE/ACM Int'l Symp. on Cluster Computing
+ and the Grid 2003, pp. 351-358.
+
+ [271] KaZaa, KaZaa Media Desktop, www.kazaa.com (2001)
+
+ [272] S. Sen and J. Wang, Analyzing peer-to-peer traffic across large
+ networks, Proc. second ACM SIGCOMM workshop on Internet
+ measurement, November 06-08 2002, pp. 137-150.
+
+ [273] DirectConnect, http:www.neo-modus.com (2001)
+
+ [274] S. Saroiu, K. Gummadi, R. Dunn, S. Gribble, and H. Levy, An
+ analysis of Internet content delivery systems, ACM SIGOPS
+ Operating Systems Review 36 (2002) 315-327.
+
+ [275] A. Loo, The Future or Peer-to-Peer Computing, Communications of
+ the ACM 46 (9) (2003) 56-61.
+
+ [276] B. Yang and H. Garcia-Molina, Comparing Hybrid Peer-to-Peer
+ Systems (extended), 27th Int'l Conf. on Very Large Data Bases,
+ September 11-14 2001.
+
+ [277] D. Scholl, OpenNap Home Page, http://opennap.sourceforge.net/
+ (2001)
+
+ [278] S. Ghemawat, H. Gobioff, and S.-T. Leung, The Google file
+ system, Proc. 19th ACM symposium on operating systems
+ principles 2003, pp. 29-43.
+
+ [279] I. Clarke, S. Miller, T. Hong, O. Sandberg, and B. Wiley,
+ Protecting Free Expression Online with Freenet, IEEE Internet
+ Computing 6 (1) (2002)
+
+ [280] J. Mache, M. Gilbert, J. Guchereau, J. Lesh, F. Ramli, and M.
+ Wilkinson, Request algorithms in Freenet-style peer-to-peer
+ systems, Proc. Second IEEE Int'l Conf. on Peer to Peer
+ Computing P2P'02, September 5-7 2002.
+
+ [281] C. Rohrs, Query Routing for the Gnutella Networks,
+ http://www.limewire.com/developer/query_routing/
+ keyword%20routing.htm Version 1.0 (2002)
+
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 79]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [282] I. Clarke, Freenet's Next Generation Routing Protocol,
+ http://freenetproject.org/index.php?page=ngrouting, 20th July
+ 2003.
+
+ [283] A. Z. Kronfol, FASD: A fault-tolerant, adaptive scalable
+ distributed search engine, Master's Thesis
+ http://www.cs.princeton.edu/~akronfol/fasd/ 2002.
+
+ [284] S. Gribble, E. Brewer, J. M. Hellerstein, and D. Culler,
+ Scalable, Distributed Data Structures for Internet Service
+ Construction, Proc. 4th Symp. on Operating Systems Design and
+ Implementation OSDI 2000, October 2000.
+
+ [285] K. Aberer, Efficient Search in Unbalanced, Randomized Peer-to-
+ Peer Search Trees, EPFL Technical Report IC/2002/79 (2002)
+
+ [286] R. Honicky and E. Miller, A fast algorithm for online placement
+ and reorganization of replicated data, Proc. 17th Int'l
+ Parallel and Distributed Processing Symp., April 2003.
+
+ [287] G. S. Manku, Routing networks for distributed hash tables,
+ Proc. 22nd annual ACM Symp. on Principles of Distributed
+ Computing, PODC 2003, July 13-16 2003, pp. 133-142.
+
+ [288] D. Lewin, Consistent hashing and random trees: algorithms for
+ caching in distributed networks, Master's Thesis, Department of
+ Electrical Engineering and Computer Science, Massachusetts
+ Institute of Technology (1998)
+
+ [289] S. Lei and A. Grama, Extended consistent hashing: a framework
+ for distributed servers, Proc. 24th Int'l Conf. on Distributed
+ Computing Systems ICDCS 2004, March 23-26 2004.
+
+ [290] W. Litwin, Re: Chord & LH*, Email to Ion Stoica, March 23
+ 2004a.
+
+ [291] J. Li, J. Stribling, R. Morris, F. Kaashoek, and T. Gil, A
+ performance vs. cost framework for evaluating DHT design
+ tradeoffs under churn, Proc. IEEE Infocom, Mar 13-17 2005.
+
+ [292] S. Zhuang, D. Geels, I. Stoica, and R. Katz, On failure
+ detection algorithms in overlay networks, Proc. IEEE Infocomm,
+ Mar 13-17 2005.
+
+ [293] X. Li, J. Misra, and C. G. Plaxton, Active and Concurrent
+ Topology Maintenance, The 18th Annual Conf. on Distributed
+ Computing (DISC 2004), Trippenhuis, Amsterdam, the Netherlands,
+ October 4 - October 7 (2004)
+
+
+
+Risson & Moors Informational [Page 80]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [294] K. Aberer, L. O. Alima, A. Ghodsi, S. Girdzijauskas, M.
+ Hauswirth, and S. Haridi, The essence of P2P: a reference
+ architecture for overlay networks, Proc. of the 5th
+ international conference on peer-to-peer computing, Aug 31-Sep
+ 2 2005.
+
+ [295] C. Tang, M. Buco, R. Chang, S. Dwarkadas, L. Luan, E. So, and
+ C. Ward, Low traffic overlay networks with large routing
+ tables, Proc. of ACM Sigmetrics Int'l Conf. on Measurement and
+ Modeling of Comp. Sys., Jun 6-10 2005, pp. 14-25.
+
+ [296] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, Handling
+ churn in a DHT, Proc. of the USENIX Annual Technical
+ Conference, June 2004.
+
+ [297] C. Blake and R. Rodrigues, High Availability, Scalable Storage,
+ Dynamic Peer Networks: Pick Two, 9th Workshop on Hot Topics in
+ Operating Systems (HotOS), Lihue, Hawaii, 18-21 May (2003)
+
+ [298] S. Rhea, B. Godfrey, B. Karp, J. Kubiatowicz, S. Ratnasamy, S.
+ Shenker, I. Stoica, and H. Yu, OpenDHT: a public DHT service
+ and its uses, Proc. of the conf. on Applications, technologies,
+ architectures and protocols for computer communications, Aug
+ 22-26 2005, pp. 73-84.
+
+ [299] T. Gil, F. Kaashoek, J. Li, R. Morris, and J. Stribling,
+ p2psim, a simulator for peer-to-peer protocols,
+ http://www.pdos.lcs.mit.edu/p2psim/ (2003)
+
+ [300] K. Hildrum, J. D. Kubiatowicz, S. Rao, and B. Y. Zhao,
+ Distributed object location in a dynamic network, Theory of
+ Computing Systems (2004)
+
+ [301] N. Lynch, D. Malkhi, and D. Ratajczak, Atomic data access in
+ distributed hash tables, Proc. Int'l Peer-to-Peer Symp., March
+ 7-8 2002.
+
+ [302] S. Gilbert, N. Lynch, and A. Shvartsman, RAMBO II: Rapidly
+ Reconfigurable Atomic Memory for Dynamic Networks, Technical
+ Report, MIT-CSAIL-TR-890 2004.
+
+ [303] N. Lynch and I. Stoica, MultiChord: A resilient namespace
+ management algorithm, Technical Memo MIT-LCS-TR-936 2004.
+
+ [304] J. Risson, K. Robinson, and T. Moors, Fault tolerant active
+ rings for structured peer-to-peer overlays, Proc. of the 30th
+ Annual IEEE Conf. on Local Computer Networks, Nov 15-17 2005,
+ pp. 18-25.
+
+
+
+Risson & Moors Informational [Page 81]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [305] B. Awerbuch and C. Scheideler, Peer-to-peer systems for prefix
+ search, Proc. 22nd annual ACM Symp. on Principles of
+ Distributed Computing 2003, pp. 123-132.
+
+ [306] F. Dabek, B. Zhao, P. Druschel, J. Kubiatowicz, and I. Stoica,
+ Towards a common API for structured P2P overlays, Proc. Second
+ Int'l Workshop on Peer to Peer Systems IPTPS 2003, February
+ 2003.
+
+ [307] N. Feamster and H. Balakrishnan, Towards a logic for wide-area
+ Internet routing, Proc. ACM SIGCOMM workshop on Future
+ Directions in Network Architecture, August 25-27 2003, pp.
+ 289-300.
+
+ [308] B. Ahlgren, M. Brunner, L. Eggert, R. Hancock, and S. Schmid,
+ Invariants: a new design methodology for network architectures,
+ Proc. ACM SIGCOMM workshop on Future Direction in Network
+ Architecture, August 30 2004, pp. 65-70.
+
+ [309] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction
+ to Algorithms, 2nd Edition. MIT Press, McGraw-Hill, Cambridge,
+ London, England, 2003.
+
+ [310] I. Abraham, D. Malkhi, and O. Dubzinski, LAND:Stretch
+ (1+epsilon) Locality Aware Networks for DHTs, Proc. ACM-SIAM
+ Symp. on Discrete Algorithms SODA-04 2004.
+
+ [311] S. Jain, R. Mahajan, and D. Wetherall, A study of the
+ performance potential of DHT-based overlays, Proc. of the 4th
+ Usenix symposium on internet technologies and systems (USITS),
+ Mar 2003.
+
+ [312] J. Risson, A. Harwood, and T. Moors, Stable high-capacity one-
+ hop distributed hash tables, Proc. of the IEEE Symposium on
+ Computers and Communications (ISCC'06), Jun 26-29 2006.
+
+ [313] V. Ramasubramanian and E. Sirer, Beehive: O(1) Lookup
+ Performance for Power-Law Query Distributions in Peer-to-Peer
+ Overlays, Proc. First Symp. on Networked Systems Design and
+ Implementation (NSDI'04), San Francisco, California, March
+ 29-31 (2004) 99-112.
+
+ [314] I. Abraham, A. Badola, D. Bickson, D. Malkhi, S. Maloo, and S.
+ Ron, Practical locality-awareness for large scale information
+ sharing, Proc. 4th International Workshop on Peer-to-Peer
+ Systems, Feb 24-25 2005.
+
+
+
+
+
+Risson & Moors Informational [Page 82]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [315] B. Leong, B. Liskov, and E. Demaine, Epichord: parallelizing
+ the Chord lookup algorithm with reactive routing state
+ management, Proc. of the 12th International Conference on
+ Networks, Nov 2004.
+
+ [316] J. Li, J. Stribling, R. Morris, and F. Kaashoek, Bandwidth-
+ efficient management of DHT routing tables, Proc. 2nd Symposium
+ on Networked Systems Design and Implementation, May 2-4 2005.
+
+ [317] S. Rhea, B.-G. Chun, J. Kubiatowicz, and S. Shenker, Fixing the
+ embarrassing slowness of OpenDHT on PlanetLab, Proc. of the
+ Second USENIX Workshop on Real, Large Distributed Systems, Dec
+ 13 2005.
+
+ [318] M. Costa, M. Castro, A. Rowstron, and P. Key, PIC: Practical
+ Internet coordinates for distance estimation, Proc. of the 24th
+ international conference on distributed computing systems, Mar
+ 2004.
+
+ [319] M. Castro, M. B. Jones, A.-M. Kermarrec, A. Rowstron, M.
+ Theimer, H. Wang, and A. Wolman, An evaluation of scalable
+ application- level multicast built using peer-to-peer overlays,
+ Proc. of the 22nd Annual Joint Conf. of the IEEE Comp. and
+ Comm. Soc. (INFOCOM), 30 Mar - 3 Apr 2003, pp. 1510-1520.
+
+ [320] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker,
+ Application-level multicast using content-addressable networks,
+ Proc. of the Third International Workshop on Networked Group
+ Communication, Nov 7-9 2001.
+
+ [321] S. El-Ansary, L. Alima, P. Brand, and S. Haridi, Efficient
+ broadcast in structured P2P networks, Second Int'l Workshop on
+ Peer-to-Peer Systems (IPTPS 03), Berkeley, CA, USA, 20-21
+ February (2003)
+
+ [322] J. Li, K. Sollins, and D.-Y. Lim, Implementing aggregation and
+ broadcast over Distributed Hash Tables, ACM Computer
+ Communication Reviews 35 (1) (2005) 81-92.
+
+ [323] V. Pai, K. Tamilmani, V. Sambamurthy, K. Kumar, and A. Mohr,
+ Chainsaw: eliminating trees from overlay multicast, Proc. 4th
+ Int'l Workshop on Peer-to-Peer Systems, February 24-25 2005.
+
+ [324] K. Birman, M. Hayden, O. Ozkasap, Z. Xiao, and M. Budiu,
+ Bimodal Multicast, ACM Trans. on Computer Systems 17 (2) (1999)
+ 41-88.
+
+
+
+
+
+Risson & Moors Informational [Page 83]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [325] Z. Zhang, S. Chen, Y. Ling, and R. Chow, Resilient capacity-
+ aware multicasting based on overlay networks, Proc. of the 25th
+ IEEE Int'l Conf. on Distributed Computing Systems, 6-10 June
+ 2005, pp. 565-574.
+
+ [326] A. Bharambe, S. Rao, V. Padmanabhan, S. Seshan, and H. Zhang,
+ The impact of heterogeneous bandwidth constraints on DHT-based
+ multicast protocols, Proc. 4th Int'l Workshop on Peer-to-Peer
+ Systems, February 24-25 2005.
+
+ [327] A. Ghodsi, L. O. Alima, S. El-Ansary, P. Brand, and S. Haridi,
+ Self-correcting broadcast in distributed hash tables, Proc. of
+ the 15th IASTED International Conf. on Parallel and Distributed
+ Computing and Systems, Nov 2003.
+
+ [328] R. Mahajan, M. Castro, and A. Rowstron, Controlling the cost of
+ reliability in peer-to-peer overlays, Second Int'l Workshop on
+ Peer-to-Peer Systems IPTPS'03, February 20-21 2003.
+
+ [329] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, Handling
+ churn in a DHT, Report No. UCB/CSD-03-1299, University of
+ California, also Proc. USENIX Annual Technical Conference, June
+ 2003.
+
+ [330] M. Castro, M. Costa, and A. Rowstron, Performance and
+ dependability of structured peer-to-peer overlays, Microsoft
+ Research Technical Report MSR-TR-2003-94, December. Also 2004
+ Int'l Conf. on Dependable Systems and Networks, June 28-July 1
+ 2003.
+
+ [331] D. Liben-Nowell, H. Balakrishnan, and D. Karger, Analysis of
+ the evolution of peer-to-peer systems, Annual ACM Symp. on
+ Principles of Distributed Computing 2002, pp. 233-242.
+
+ [332] L. Alima, S. El-Ansary, P. Brand, and S. Haridi, DKS(N,k,f): a
+ family of low communication, scalable and fault-tolerant
+ infrastructures for P2P applications, Proc. 3rd IEEE/ACM Int'l
+ Symp. on Cluster Computing and the Grid (2003) 344-350.
+
+ [333] D. Karger and M. Ruhl, Finding nearest neighbours in growth-
+ restricted metrics, Proc. 34th annual ACM symposium on Theory
+ of computing 2002, pp. 741-750.
+
+ [334] S. Ratnasamy, A Scalable Content-Addressable Network, Doctoral
+ Dissertation 2002.
+
+ [335] S. McCanne and S. Floyd, The LBNL/UCB Network Simulator.
+
+
+
+
+Risson & Moors Informational [Page 84]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [336] M. Naor and U. Wieder, Novel architectures for P2P
+ applications: the continuous-discrete approach, Proc. fifteenth
+ annual ACM Symp. on Parallel Algorithms and Architectures, SPAA
+ 2003, June 7-9 2003, pp. 50-59.
+
+ [337] N. D. de Bruijn, A combinatorial problem, Koninklijke
+ Netherlands: Academe Van Wetenschappen 49 (1946) 758-764.
+
+ [338] J.-W. Mao, "The Coloring and Routing Problems on de Bruijn
+ Interconnection Networks," in Doctoral Dissertation, National
+ Sun Yat-sen University, 2003.
+
+ [339] M. L. Schlumberger, De Bruijn communication networks, Doctoral
+ Dissertation 1974.
+
+ [340] M. Imase and M. Itoh, Design to minimize diameter on building-
+ block network, IEEE Trans. on Computers C-30 (6) (1981) 439-
+ 442.
+
+ [341] S. M. Reddy, D. K. Pradhan, and J. G. Kuhl, Direct graphs with
+ minimal and maximal connectivity, Technical Report, School of
+ Engineering, Oakland University (1980)
+
+ [342] R. A. Rowley and B. Bose, Fault-tolerant ring embedding in de
+ Bruijn networks, IEEE Trans. on Computers 42 (12) (1993) 1480-
+ 1486.
+
+ [343] K. Y. Lee, G. Liu, and H. F. Jordan, Hierarchical networks for
+ optical communications, Journal of Parallel and Distributed
+ Computing 60 (2000) 1-16.
+
+ [344] M. Naor and U. Wieder, Know thy neighbor's neighbor: better
+ routing for skip-graphs and small worlds, The 3rd Int'l
+ Workshop on Peer-to-Peer Systems, February 26-27 2004.
+
+ [345] P. Fraigniaud and P. Gauron, The content-addressable networks
+ D2B, Technical Report 1349, Laboratoire de Recherche en
+ Informatique, January 2003.
+
+ [346] A. Datta, S. Girdzijauskas, and K. Aberer, On de Bruijn routing
+ in distributed hash tables: there and back again, Proc. Fourth
+ IEEE Int'l Conf. on Peer-to-Peer Computing, , 25-27 August
+ 2004.
+
+ [347] W. Pugh, Skip lists: a probabilistic alternative to balanced
+ trees, Proc. Workshop on Algorithms and Data Structures, August
+ 17-19 1989, pp. 437-449.
+
+
+
+
+Risson & Moors Informational [Page 85]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [348] W. Pugh, Skip lists: a probabilistic alternative to balanced
+ trees, Communications of the ACM 33 (6) (1990) 668-676.
+
+ [349] J. Gray, The transaction concept: Virtues and limitations,
+ Proc. VLDB, September 1981.
+
+ [350] B. T. Loo, J. M. Hellerstein, R. Huebsch, S. Shenker, and I.
+ Stoica, Enhancing P2P file-sharing with internet-scale query
+ processor, Proc. 30th Int'l Conf. on Very Large Data Bases VLDB
+ 2004, 29 August-3 September 2004.
+
+ [351] M. Stonebraker, P. Aoki, W. Litwin, A. Pfeffer, A. Sah, J.
+ Sidell, C. Staelin, and A. Yu, Mariposa: a wide-area
+ distributed database system, THE VLDB Journal - The Int'l
+ Journal of Very Large Data Bases (5) (1996) 48-63.
+
+ [352] V. Cholvi, P. Felber, and E. Biersack, Efficient Search in
+ Unstructured Peer-to-Peer Networks, Proc. Symp. on Parallel
+ Algorithms and Architectures, July 2004.
+
+ [353] S. Daswani and A. Fisk, Gnutella UDP Extension for Scalable
+ Searches (GUESS) v0.1,
+ http://www.limewire.org/fisheye/viewrep/~raw,r=1.2/limecvs/
+ core/guess_01.html (2002)
+
+ [354] A. Fisk, Gnutella Dynamic Query Protocol v0.1, Gnutella
+ Developer Forum (2003)
+
+ [355] O. Gnawali, A Keyword Set Search System for Peer-to-Peer
+ Networks, Master's Thesis 2002.
+
+ [356] Limewire, Limewire Host Count,
+ http://www.limewire.com/english/content/netsize.shtml (2004)
+
+ [357] A. Fisk, Gnutella Ultrapeer Query Routing,
+ http://groups.yahoo.com/group/the_gdf/files/Proposals/
+ Working%20Proposals/search/Ultrapeer%20QRP/ v0.1 (2003)
+
+ [358] A. Fisk, Gnutella Dynamic Query Protocol,
+ http://groups.yahoo.com/group/the_gdf/files/Proposals/
+ Working%20Proposals/search/Dynamic%20Querying/ v0.1 (2003)
+
+ [359] S. Thadani, Meta Data searches on the Gnutella Network
+ (addendum), http://www.limewire.com/developer/MetaProposal2.htm
+ (2001)
+
+
+
+
+
+
+Risson & Moors Informational [Page 86]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [360] S. Thadani, Meta Information Searches on the Gnutella Networks,
+ http://www.limewire.com/developer/metainfo_searches.html (2001)
+
+ [361] P. Reynolds and A. Vahdat, Efficient peer-to-peer keyword
+ searching, ACM/IFP/USENIX Int'l Middleware Conference,
+ Middleware 2003, June 16-20 2003.
+
+ [362] W. Terpstra, S. Behnel, L. Fiege, J. Kangasharju, and A.
+ Buchmann, Bit Zipper Rendezvous, optimal data placement for
+ general P2P queries, Proc. First Int'l Workshop on Peer-to-Peer
+ Computing and Databases, March 14 2004.
+
+ [363] A. Singhal, Modern Information Retrieval: A Brief Overview,
+ IEEE Data Engineering Bulletin 24 (4) (2001) 35-43.
+
+ [364] E. Cohen, A. Fiat, and H. Kaplan, Associative Search in Peer to
+ Peer Networks: Harnessing Latent Semantics, IEEE Infocom 2003,
+ The 22nd Annual Joint Conf. of the IEEE Computer and
+ Communications Societies, March 30-April 3 2003.
+
+ [365] W. Muller and A. Henrich, Fast retrieval of high-dimensional
+ feature vectors in P2P networks using compact peer data
+ summaries, Proc. 5th ACM SIGMM international workshop on
+ Multimedia Information Retrieval, November 7 2003, pp. 79-86.
+
+ [366] M. T. Ozsu and P. Valduriez, Principles of Distributed Database
+ Systems, 2nd edition ed. Prentice Hall, 1999.
+
+ [367] G. Salton, A. Wong, and C. S. Yang, A vector space model for
+ automatic indexing, Communications of the ACM 18 (11) (1975)
+ 613- 620.
+
+ [368] S. E. Robertson, S. Walker, and M. Beaulieu, Okapi at TREC-7:
+ automatic ad hoc, filtering, VLC and filtering tracks, Proc.
+ Seventh Text REtrieval Conference, TREC-7, NIST Special
+ Publication 500-242, July 1999, pp. 253-264.
+
+ [369] A. Singhal, J. Choi, D. Hindle, D. Lewis, and F. Pereira, AT&T
+ at TREC-7, Proc. Seventh Text REtrieval Conf. TREC-7, July
+ 1999, pp. 253-264.
+
+ [370] K. Sankaralingam, S. Sethumadhavan, and J. Browne, Distributed
+ Pagerank for P2P Systems, Proc. 12th international symposium on
+ High Performance Distributed Computing HPDC, June 22-24 2003.
+
+ [371] I. Klampanos and J. Jose, An architecture for information
+ retrieval over semi-collaborated peer-to-peer networks, Proc.
+ 2004 ACM symposium on applied computing 2004, pp. 1078-1083.
+
+
+
+Risson & Moors Informational [Page 87]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [372] C. Tang, Z. Xu, and S. Dwarkadas, Peer-to-peer information
+ retrieval using self-organizing semantic overlay networks,
+ Proc. 2003 conference on Applications, Technologies,
+ Architectures and Protocols for Computer Communications, August
+ 25-29 2003, pp. 175-186.
+
+ [373] C. Tang and S. Dwarkadas, Hybrid global-local indexing for
+ efficient peer-to-peer information retrieval, Proc. First Symp.
+ on Networked Systems Design and Implementation NSDI'04, March
+ 29-31 2004, pp. 211-224.
+
+ [374] G. W. Furnas, S. Deerwester, S. T. Dumais, T. K. Landauer, R.
+ A. Harshman, L. A. Streeter, and K. E. Lochbaum, Information
+ retrieval using a singular value decomposition model of latent
+ semantic structure, Proc. 11th Annual Int'l ACM SIGIR Conf. on
+ Research and Development in Information Retrieval 1988, pp.
+ 465-480.
+
+ [375] C. Tang, S. Dwarkadas, and Z. Xu, On scaling latent semantic
+ indexing for large peer-to-peer systems, The 27th Annual Int'l
+ ACM SIGIR Conf. SIGIR'04, ACM Special Interest Group on
+ Information Retrieval, July 2004.
+
+ [376] S. Milgram, The small world problem, Psychology Today 1 (61)
+ (1967)
+
+ [377] J. Kleinberg, The small-world phenonemon: An algorithmic
+ perspective, Proc. 32nd ACM Symp. on Theory of Computing (2000)
+
+ [378] Y. Petrakis and E. Pitoura, "On constructing small worlds in
+ unstructured peer-to-peer systems," in Current trends in
+ database technology (Proc. First Int'l Workshop on Peer-to-Peer
+ Computing and Databases, Heraklion, Crete, Greece, March 14),
+ vol. 3268, Lecture Notes in Computer Science: Springer, 2004,
+ pp. 415-424.
+
+ [379] A. Iamnitchi, M. Ripeanu, and I. Foster, Locating Data in
+ (Small World?) P2P Scientific Collaborations, First Int'l
+ Workshop on Peer-to-Peer Systems (IPTPS), Cambridge, MA, March
+ (2002)
+
+
+
+
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 88]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+ [380] Y. Ren, C. Sha, W. Qian, A. Zhou, B. Ooi, and K. Tan, Explore
+ the "small world phenomena" in pure P2P information sharing
+ systems, Proc. 3rd IEEE/ACM Int'l Symp. on Cluster Computing
+ and the Grid (2003) 232-239.
+
+ [381] G. S. Manku, M. Bawa, and P. Raghavan, Symphony: Distributed
+ Hashing in a Small World, Proc. 4th USENIX Symp. on Internet
+ Technologies and Systems, March 26-28 2003.
+
+ [382] W. Litwin and S. Sahri, Implementing SD-SQL Server: a Scalable
+ Distributed Database System, CERIA Research Rerpot 2004-04-02,
+ April 2004.
+
+ [383] M. Jarke and J. Koch, Query Optimization in Database Systems,
+ ACM Computing Surveys 16 (2) (1984) 111-152.
+
+ [384] J. L. Bentley, Multidimensional binary search trees used for
+ associative searching, Communications of the ACM 18 (9) (1975)
+ 509-517.
+
+ [385] B. Chun, I. Stoica, J. Hellerstein, R. Huebsch, S. Jeffery, B.
+ T. Loo, S. Mardanbeigi, T. Roscoe, S. Rhea, and S. Schenker,
+ Querying at Internet Scale, Proc. 2004 ACM SIGMOD international
+ conference on management of data, demonstration session 2004,
+ pp. 935-936.
+
+ [386] P. Cao and Z. Wang, Efficient top-K query calculation in
+ distributed networks, Proc. 23rd Annual ACM SIGACT-SIGOPS Symp.
+ on Principles of Distributed Computing PODC 2004, July 25-28
+ 2004, pp. 206-215.
+
+ [387] D. Psaltoulis, I. Kostoulas, I. Gupta, K. Birman, and A.
+ Demers, Practical algorithms for size estimation in large and
+ dynamic groups, Proc. Twenty-Third Annual ACM SIGACT-SIGOPS
+ Symp. on Principles of Distributed Computing, PODC 2004, July
+ 25-28 2004.
+
+ [388] R. van Renesse, The importance of aggregation, Springer-Verlag
+ Lecture Notes in Computer Science "Future Directions in
+ Distributed Computing". A. Schiper, A. A. Shvartsman, H.
+ Weatherspoon, and B. Y. Zhao, editors. Springer-Verlag,
+ Heidelberg volume 2584 (2003)
+
+
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 89]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+Author's Addresses
+
+ John Risson
+ School of Elec Eng and Telecommunications
+ University of New South Wales
+ Sydney NSW 2052 Australia
+
+ EMail: jr@tuffit.com
+
+
+ Tim Moors
+ School of Elec Eng and Telecommunications
+ University of New South Wales
+ Sydney NSW 2052 Australia
+
+ EMail: t.moors@unsw.edu.au
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 90]
+
+RFC 4981 Survey of Research on P2P Search September 2007
+
+
+Full Copyright Statement
+
+ Copyright (C) The IETF Trust (2007).
+
+ This document is subject to the rights, licenses and restrictions
+ contained in BCP 78, and except as set forth therein, the authors
+ retain all their rights.
+
+ This document and the information contained herein are provided on an
+ "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+ OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
+ THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
+ OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
+ THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+ WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+ The IETF takes no position regarding the validity or scope of any
+ Intellectual Property Rights or other rights that might be claimed to
+ pertain to the implementation or use of the technology described in
+ this document or the extent to which any license under such rights
+ might or might not be available; nor does it represent that it has
+ made any independent effort to identify any such rights. Information
+ on the procedures with respect to rights in RFC documents can be
+ found in BCP 78 and BCP 79.
+
+ Copies of IPR disclosures made to the IETF Secretariat and any
+ assurances of licenses to be made available, or the result of an
+ attempt made to obtain a general license or permission for the use of
+ such proprietary rights by implementers or users of this
+ specification can be obtained from the IETF on-line IPR repository at
+ http://www.ietf.org/ipr.
+
+ The IETF invites any interested party to bring to its attention any
+ copyrights, patents or patent applications, or other proprietary
+ rights that may cover technology that may be required to implement
+ this standard. Please address the information to the IETF at
+ ietf-ipr@ietf.org.
+
+
+
+
+
+
+
+
+
+
+
+
+Risson & Moors Informational [Page 91]
+