diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc4981.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc4981.txt')
-rw-r--r-- | doc/rfc/rfc4981.txt | 5099 |
1 files changed, 5099 insertions, 0 deletions
diff --git a/doc/rfc/rfc4981.txt b/doc/rfc/rfc4981.txt new file mode 100644 index 0000000..6e832a9 --- /dev/null +++ b/doc/rfc/rfc4981.txt @@ -0,0 +1,5099 @@ + + + + + + +Network Working Group J. Risson +Request for Comments: 4981 T. Moors +Category: Informational University of New South Wales + September 2007 + + + Survey of Research towards Robust Peer-to-Peer Networks: + Search Methods + +Status of This Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +IESG Note + + This RFC is not a candidate for any level of Internet Standard. The + IETF disclaims any knowledge of the fitness of this RFC for any + purpose and notes that the decision to publish is not based on IETF + review apart from IESG review for conflict with IETF work. The RFC + Editor has chosen to publish this document at its discretion. See + RFC 3932 for more information. + +Abstract + + The pace of research on peer-to-peer (P2P) networking in the last + five years warrants a critical survey. P2P has the makings of a + disruptive technology -- it can aggregate enormous storage and + processing resources while minimizing entry and scaling costs. + + Failures are common amongst massive numbers of distributed peers, + though the impact of individual failures may be less than in + conventional architectures. Thus, the key to realizing P2P's + potential in applications other than casual file sharing is + robustness. + + P2P search methods are first couched within an overall P2P taxonomy. + P2P indexes for simple key lookup are assessed, including those based + on Plaxton trees, rings, tori, butterflies, de Bruijn graphs, and + skip graphs. Similarly, P2P indexes for keyword lookup, information + retrieval and data management are explored. Finally, early efforts + to optimize range, multi-attribute, join, and aggregation queries + over P2P indexes are reviewed. Insofar as they are available in the + primary literature, robustness mechanisms and metrics are highlighted + throughout. However, the low-level mechanisms that most affect + robustness are not well isolated in the literature. Recommendations + are given for future research. + + + +Risson & Moors Informational [Page 1] + +RFC 4981 Survey of Research on P2P Search September 2007 + + +Table of Contents + + 1. Introduction ....................................................3 + 1.1. Related Disciplines ........................................6 + 1.2. Structured and Unstructured Routing ........................7 + 1.3. Indexes and Queries ........................................9 + 2. Index Types ....................................................10 + 2.1. Local Index (Gnutella) ....................................10 + 2.2. Central Index (Napster) ...................................12 + 2.3. Distributed Index (Freenet) ...............................13 + 3. Semantic Free Index ............................................15 + 3.1. Origins ...................................................15 + 3.1.1. Plaxton, Rajaraman, and Richa (PRR) ................15 + 3.1.2. Consistent Hashing .................................16 + 3.1.3. Scalable Distributed Data Structures (LH*) .........16 + 3.2. Dependability .............................................17 + 3.2.1. Static Dependability ...............................17 + 3.2.2. Dynamic Dependability ..............................18 + 3.2.3. Ephemeral or Stable Nodes -- O(log n) or + O(1) Hops ..........................................19 + 3.2.4. Simulation and Proof ...............................20 + 3.3. Latency ...................................................21 + 3.3.1. Hop Count and the O(1)-Hop DHTs ....................21 + 3.3.2. Proximity and the O(log n)-Hop DHTs ................22 + 3.4. Multicasting ..............................................23 + 3.4.1. Multicasting vs. Broadcasting ......................23 + 3.4.2. Motivation for DHT-based Multicasting ..............23 + 3.4.3. Design Issues ......................................24 + 3.5. Routing Geometries ........................................25 + 3.5.1. Plaxton Trees (Pastry, Tapestry) ...................25 + 3.5.2. Rings (Chord, DKS) .................................27 + 3.5.3. Tori (CAN) .........................................28 + 3.5.4. Butterflies (Viceroy) ..............................29 + 3.5.5. de Bruijn (D2B, Koorde, Distance Halving, ODRI) ....30 + 3.5.6. Skip Graphs ........................................32 + 4. Semantic Index .................................................33 + 4.1. Keyword Lookup ............................................34 + 4.1.1. Gnutella Enhancements ..............................36 + 4.1.2. Partition-by-Document, Partition-by-Keyword ........38 + 4.1.3. Partial Search, Exhaustive Search ..................39 + 4.2. Information Retrieval .....................................39 + 4.2.1. Vector Model (PlanetP, FASD, eSearch) ..............41 + 4.2.2. Latent Semantic Indexing (pSearch) .................43 + 4.2.3. Small Worlds .......................................43 + 5. Queries ........................................................44 + 5.1. Range Queries .............................................45 + 5.2. Multi-Attribute Queries ...................................48 + 5.3. Join Queries ..............................................50 + + + +Risson & Moors Informational [Page 2] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + 5.4. Aggregation Queries .......................................50 + 6. Security Considerations ........................................52 + 7. Conclusions ....................................................52 + 8. Acknowledgments ................................................53 + 9. References .....................................................54 + 9.1. Informative References ....................................54 + +1. Introduction + + Peer-to-peer (P2P) networks are those that exhibit three + characteristics: self-organization, symmetric communication, and + distributed control [1]. A self-organizing P2P network + "automatically adapts to the arrival, departure and failure of nodes" + [2]. Communication is symmetric in that peers act as both clients + and servers. It has no centralized directory or control point. + USENET servers and BGP peers have these traits [3] but the emphasis + here is on the flurry of research since 2000. Leading examples + include Gnutella [4], Freenet [5], Pastry [2], Tapestry [6], Chord + [7], the Content Addressable Network (CAN) [8], pSearch [9], and + Edutella [10]. Some have suggested that peers are inherently + unreliable [11]. Others have assumed well-connected, stable peers + [12]. + + This critical survey of P2P academic literature is warranted, given + the intensity of recent research. At the time of writing, one + research database lists over 5,800 P2P publications [13]. One vendor + surveyed P2P products and deployments [14]. There is also a tutorial + survey of leading P2P systems [15]. DePaoli and Mariani recently + reviewed the dependability of some early P2P systems at a high level + [16]. The need for a critical survey was flagged in the peer-to-peer + research group of the Internet Research Task Force (IRTF) [17]. + + P2P is potentially a disruptive technology with numerous + applications, but this potential will not be realized unless it is + demonstrated to be robust. A massively distributed search technique + may yield numerous practical benefits for applications [18]. A P2P + system has potential to be more dependable than architectures relying + on a small number of centralized servers. It has potential to evolve + better from small configurations -- the capital outlays for high + performance servers can be reduced and spread over time if a P2P + assembly of general purpose nodes is used. A similar argument + motivated the deployment of distributed databases -- one thousand, + off-the-shelf PC processors are more powerful and much less expensive + than a large mainframe computer [19]. Storage and processing can be + aggregated to achieve massive scale. Wasteful partitioning between + servers or clusters can be avoided. As Gedik and Liu put it, if P2P + is to find its way into applications other than casual file sharing, + then reliability needs to be addressed [20]. + + + +Risson & Moors Informational [Page 3] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + The taxonomy of Figure 1 divides the entire body of P2P research + literature along four lines: search, storage, security, and + applications. This survey concentrates on search aspects. A P2P + search network consists of an underlying index (Sections 2 to 4) and + queries that propagate over that index (Section 5). + + Search [18, 21-29] + Semantic-Free Indexes [2, 6, 7, 30-52] + Plaxton Trees + Rings + Tori + Butterflies + de Bruijn Graphs + Skip Graphs + Semantic Indexes [4, 53-71] + Keyword Lookup + Peer Information Retrieval + Peer Data Management + Queries [20, 22, 23, 25, 32, 38, 41, 56, 72-100] + Range Queries + Multi-Attribute Queries + Join Queries + Aggregation Queries + Continuous Queries + Recursive Queries + Adaptive Queries + + Storage + Consistency & Replication [101-112] + Eventual consistency + Trade-offs + Distribution [39, 42, 90, 92, 113-131] + Epidemics, Bloom Filters + Fault Tolerance [40, 105, 132-139] + Erasure Coding + Byzantine Agreement + Locality [24, 43, 47, 140-160] + Load Balancing [37, 86, 100, 107, 151, 161-171] + + + + + + + + + + + + + +Risson & Moors Informational [Page 4] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + Security + Character [172-182] + Identity + Reputation and Trust + Incentives + Goals [25, 27, 71, 183-197] + Availability + Authenticity + Anonymity + Access Control + Fair Trading + + Applications [1, 198-200] + Memory [32, 90, 142, 201-222] + File Systems + Web + Content Delivery Networks + Directories + Service Discovery + Publish / Subscribe ... + Intelligence [223-228] + GRID + Security... + Communication [12, 92, 119, 229-247] + Multicasting + Streaming Media + Mobility + Sensors... + + Figure 1: Classification of P2P Research Literature + + This survey is concerned with two questions. The first, "How do P2P + search networks work?" This foundation is important given the pace + and breadth of P2P research in the last five years. In Section 2, we + classify indexes as local, centralized and distributed. Since + distributed indexes are becoming dominant, they are given closer + attention in Sections 3 and 4. Section 3 compares distributed P2P + indexes for simple key lookup; in particular, their origins (Section + 3.1), dependability (Section 3.2), latency (Section 3.3), and their + support for multicast (Section 3.4). It classifies those indexes + according to their routing geometry (Section 3.5) -- Plaxton trees, + rings, tori, butterflies, de Bruijn graphs and skip graphs. Section + 4 reviews distributed P2P indexes supporting keyword lookup (Section + 4.1) and information retrieval (Section 4.2). Section 5 probes the + embryonic research on P2P queries; in particular, range queries + (Section 5.1), multi-attribute queries (Section 5.2), join queries + (Section 5.3), and aggregation queries (Section 5.4). + + + + +Risson & Moors Informational [Page 5] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + The second question, "How robust are P2P search networks?" Insofar + as it is available in the research literature, we tease out the + robustness mechanisms and metrics throughout Sections 2 to 5. + Unfortunately, robustness is often more sensitive to low-level design + choices than it is to the broad P2P index structure, yet these + underlying design choices are seldom isolated in the primary + literature [248]. Furthermore, there has been little consensus on + P2P robustness metrics (Section 3.2). Section 8 gives + recommendations to address these important gaps. + +1.1. Related Disciplines + + Peer-to-peer research draws upon numerous distributed systems + disciplines. Networking researchers will recognize familiar issues + of naming, routing, and congestion control. P2P designs need to + address routing and security issues across network region boundaries + [152]. Networking research has traditionally been host-centric. The + Web's Universal Resource Identifiers are naturally tied to specific + hosts, making object mobility a challenge [216]. + + P2P work is data-centric [249]. P2P systems for dynamic object + location and routing have borrowed heavily from the distributed + systems corpus. Some have used replication, erasure codes, and + Byzantine agreement [111]. Others have used epidemics for durable + peer group communication [39]. + + Similarly, P2P research is set to benefit from database research + [250]. Database researchers will recognize the need to reapply + Codd's principle of physical data independence, that is, to decouple + data indexes from the applications that use the data [23]. It was + the invention of appropriate indexing mechanisms and query + optimizations that enabled data independence. Database indexes like + B+ trees have an analog in P2P's distributed hash tables (DHTs). + Wide-area, P2P query optimization is a ripe, but challenging, area + for innovation. + + More flexible distribution of objects comes with increased security + risks. There are opportunities for security researchers to deliver + new methods for availability, file authenticity, anonymity, and + access control [25]. Proactive and reactive mechanisms are needed to + deal with large numbers of autonomous, distributed peers. To build + robust systems from cooperating but self-interested peers, issues of + identity, reputation, trust, and incentives need to be tackled. + Although it is beyond the scope of this paper, robustness against + malicious attacks also ought to be addressed [195]. + + Possibly the largest portion of P2P research has majored on basic + routing structures [18], where research on algorithms comes to the + + + +Risson & Moors Informational [Page 6] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + fore. Should the overlay be "structured" or "unstructured"? Are the + two approaches competing or complementary? Comparisons of the + "structured" approaches (hypercubes, rings, toroids, butterflies, de + Bruijn, and skip graphs) have weighed the amount of routing state per + peer and the number of links per peer against overlay hop counts. + While "unstructured" overlays initially used blind flooding and + random walks, overheads usually trigger some structure, for example, + super-peers and clusters. + + P2P applications rely on cooperation between these disciplines. + Applications have included file sharing, directories, content + delivery networks, email, distributed computation, publish-subscribe + middleware, multicasting, and distributed authentication. Which + applications will be suited to which structures? Are there adaptable + mechanisms that can decouple applications from the underlying data + structures? What are the criteria for selection of applications + amenable to a P2P design [1]? + + Robustness is emphasized throughout the survey. We are particularly + interested in two aspects. The first, dependability, was a leading + design goal for the original Internet [251]. It deserves the same + status in P2P. The measures of dependability are well established: + reliability, a measure of the mean-time-to-failure (MTTF); + availability, a measure of both the MTTF and the mean-time-to-repair + (MTTR); maintainability; and safety [252]. The second aspect is the + ability to accommodate variation in outcome, which one could call + adaptability. Its measures have yet to be defined. In the context + of the Internet, it was only recently acknowledged as a first-class + requirement [253]. In P2P, it means planning for the tussles over + resources and identity. It means handling different kinds of queries + and accommodating changeable application requirements with minimal + intervention. It means "organic scaling" [22], whereby the system + grows gracefully, without a priori data center costs or architectural + breakpoints. + + In the following section, we discuss one notable omission from the + taxonomy of P2P networking in Figure 1 -- routing. + +1.2. Structured and Unstructured Routing + + P2P routing algorithms have been classified as "structured" or + "unstructured". Peers in unstructured overlay networks join by + connecting to any existing peers [254]. In structured overlays, the + identifier of the joining peer determines the set of peers that it + connects to [254]. Early instantiations of Gnutella were + unstructured -- keyword queries were flooded widely [255]. Napster + [256] had decentralized content and a centralized index, so it only + partially satisfies the distributed control criteria for P2P systems. + + + +Risson & Moors Informational [Page 7] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + Early structured algorithms included Plaxton, Rajaraman and Richa + (PRR) [30], Pastry [2], Tapestry [31], Chord [7], and the Content + Addressable Network [8]. Mishchke and Stiller recently classified + P2P systems by the presence or absence of structure in routing tables + and network topology [257]. + + Some have cast unstructured and structured algorithms as competing + alternatives. Unstructured approaches have been called "first + generation", implicitly inferior to the "second generation" + structured algorithms [2, 31]. When generic key lookups are + required, these structured, key-based routing schemes can guarantee + location of a target within a bounded number of hops [23]. The + broadcasting unstructured approaches, however, may have large routing + costs, or fail to find available content [22]. Despite the apparent + advantages of structured P2P, several research groups are still + pursuing unstructured P2P. + + There have been two main criticisms of structured systems [61]. The + first relates to peer transience, which in turn, affects robustness. + Chawathe, et al. opined that highly transient peers are not well + supported by DHTs [61]. P2P systems often exhibit "churn", with + peers continually arriving and departing. One objection to concerns + about highly transient peers is that many applications use peers in + well-connected parts of the network. The Tapestry authors analyzed + the impact of churn in a network of 1000 nodes [31]. Others opined + that it is possible to maintain a robust DHT at relatively low cost + [258]. Very few papers have quantitatively compared the resilience + of structured systems. Loguinov, Kumar, et al. claimed that there + were only two such works [24, 36]. + + The second criticism of structured systems is that they do not + support keyword searches and complex queries as well as unstructured + systems. Given the current file-sharing deployments, keyword + searches seem more important than exact-match key searches in the + short term. Paraphrased, "most queries are for hay, not needles" + [61]. + + More recently, some have justifiably seen unstructured and structured + proposals as complementary, and have devised hybrid models [259]. + Their starting point was the observation that unstructured flooding + or random walks are inefficient for data that is not highly + replicated across the P2P network. Structured graphs can find keys + efficiently, irrespective of replication. Castro, et al. proposed + Structella, a hybrid of Gnutella built on top of Pastry [259]. + Another design used structured search for rare items and unstructured + search for massively replicated items [54]. + + + + + +Risson & Moors Informational [Page 8] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + However, the "structured versus unstructured routing" taxonomy is + becoming less useful, for two reasons, Firstly, most "unstructured" + proposals have evolved and incorporated structure. Consider the + classic "unstructured" system, Gnutella [4]. For scalability, its + peers are either ultrapeers or leaf nodes. This hierarchy is + augmented with a query routing protocol whereby ultrapeers receive a + hashed summary of the resource names available at leaf nodes. + Between ultrapeers, simple query broadcast is still used, though + methods to reduce the query load here have been considered [260]. + Secondly, there are emerging schema-based P2P designs [59], with + super-node hierarchies and structure within documents. These are + quite distinct from the structured DHT proposals. + +1.3. Indexes and Queries + + Given that most, if not all, P2P designs today assume some structure, + a more instructive taxonomy would describe the structure. In this + survey, we use a database taxonomy in lieu of the networking + taxonomy, as suggested by Hellerstein, Cooper, and Garcia-Molina [23, + 261]. The structure is determined by the type of index (Sections 2 , + 3, and 4). Queries feature in lieu of routing (Section 5). The DHT + algorithms implement a "semantic-free index" [216]. They are + oblivious of whether keys represent document titles, meta-data, or + text. Gnutella-like and schema-based proposals have a "semantic + index". + + Index engineering is at the heart of P2P search methods. It captures + a broad range of P2P issues, as demonstrated by the Search/Index + Links model [261]. As Manber put it, "the most important of the + tools for information retrieval is the index -- a collection of terms + with pointers to places where information about documents can be + found" [262]. Sen and Wang noted that a "P2P network" usually + consists of connections between hosts for application-layer + signaling, rather than for the data transfer itself [263]. + Similarly, we concentrate on the "signaled" indexes and queries. + + Our focus here is the dependability and adaptability of the search + network. Static dependability is a measure of how well queries route + around failures in a network that is normally fault-free. Dynamic + dependability gives an indication of query success when nodes and + data are continually joining and leaving the P2P system. An + adaptable index accommodates change in the data and query + distribution. It enables data independence, in that it facilitates + changes to the data layout without requiring changes to the + applications that use the data [23]. An adaptable P2P system can + support rich queries for a wide range of applications. Some + applications benefit from simple, semantic-free key lookups [264]. + Others require more complex, Structured Query Language (SQL)-like + + + +Risson & Moors Informational [Page 9] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + queries to find documents with multiple keywords, or to aggregate or + join query results from distributed relations [22]. + +2. Index Types + + A P2P index can be local, centralized, or distributed. With a local + index, a peer only keeps the references to its own data, and does not + receive references for data at other nodes. The very early Gnutella + design epitomized the local index (Section 2.1). In a centralized + index, a single server keeps references to data on many peers. The + classic example is Napster (Section 2.2). With distributed indexes, + pointers towards the target reside at several nodes. One very early + example is Freenet (Section 2.3). Distributed indexes are used in + most P2P designs nowadays -- they dominate this survey. + + P2P indexes can also be classified as non-forwarding and forwarding. + When queries are guided by a non-forwarding index, they jump to the + node containing the target data in a single hop. There have been + semantic and semantic-free one-hop schemes [138, 265, 266]. Where + scalability to a massive number of peers is required, these schemes + have been extended to two hops [267, 268]. More common are the + forwarding P2Ps, where the number of hops varies with the total + number of peers, often logarithmically. The related trade-offs + between routing state, lookup latency, update bandwidth, and peer + churn are critical to total system dependability. + +2.1. Local Index (Gnutella) + + P2Ps with a purely local data index are becoming rare. In such + designs, peers flood queries widely and only index their own content. + They enable rich queries - the search is not limited to a simple key + lookup. However, they also generate a large volume of query traffic + with no guarantee that a match will be found, even if it does exist + on the network. For example, to find potential peers on the early + instantiations of Gnutella, 'ping' messages were broadcast over the + P2P network and the 'pong' responses were used to build the node + index. Then, small 'query' messages, each with a list of keywords, + are broadcast to peers that respond with matching filenames [4]. + + There have been numerous attempts to improve the scalability of + local-index P2P networks. Gnutella uses fixed time-to-live (TTL) + rings, where the query's TTL is set less than 7-10 hops [4]. Small + TTLs reduce the network traffic and the load on peers, but also + reduce the chances of a successful query hit. One paper reported, + perhaps a little too bluntly, that the fixed "TTL-based mechanism + does not work" [67]. To address this TTL selection problem, they + proposed an expanding ring, known elsewhere as iterative deepening + [29]. It uses successively larger TTL counters until there is a + + + +Risson & Moors Informational [Page 10] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + match. The flooding, ring, and expanding ring methods all increase + network load with duplicated query messages. A random walk, whereby + an unduplicated query wanders about the network, does indeed reduce + the network load but massively increases the search latency. One + solution is to replicate the query k times at each peer. Called + random k-walkers, this technique can be coupled with TTL limits, or + periodic checks with the query originator, to cap the query load + [67]. Adamic, Lukose, et al. suggested that the random walk searches + be directed to nodes with a higher degree, that is, with larger + numbers of inter-peer connections [269]. They assumed that higher- + degree peers are also capable of higher query throughputs. However, + without some balancing design rule, such peers would be swamped with + the entire P2P signaling traffic. In addition to the above + approaches, there is the 'directed breadth-first' algorithm [29]. It + forwards queries within a subset of peers selected according to + heuristics on previous performance, like the number of successful + query results. Another algorithm, called probabilistic flooding, has + been modeled using percolation theory [270]. + + Several measurement studies have investigated locally indexed P2Ps. + Jovanovic noted Gnutella's power law behaviour [70]. Sen and Wang + compared the performance of Gnutella, Fasttrack [271], and Direct + Connect [263, 272, 273]. At the time, only Gnutella used local data + indexes. All three schemes now use distributed data indexes, with + hierarchy in the form of Ultrapeers (Gnutella), Super-Nodes + FastTrack), and Hubs (Direct Connect). It was found that a very + small percentage of peers have a very high degree and that the total + system dependability is at the mercy of such peers. While peer up- + time and bandwidth were heavy-tailed, they did not fit well with the + Zipf distribution. Fortunately for Internet Service Providers, + measures aggregated by IP prefix and Autonomous System (AS) were more + stable than for individual IP addresses. A study of University of + Washington traffic found that Gnutella and Kazaa together contributed + 43% of the university's total TCP traffic [274]. They also reported + a heavy-tailed distribution, with 600 external peers (out of 281,026) + delivering 26% of Kazaa bytes to internal peers. Furthermore, + objects retrieved from the P2P network were typically three orders of + magnitude larger than Web objects -- 300 objects contributed to + almost half the total outbound Kazaa bandwidth. Others reported + Gnutella's topology mismatch, whereby only 2-5% of P2P connections + link peers in the same Autonomous System (AS), despite over 40% of + peers being in the top 10 ASs [65]. Together these studies + underscore the significance of multimedia sharing applications. They + motivate interesting caching and locality solutions to the topology + mismatch problem. + + These same studies bear out one main dependability lesson: total + system dependability may be sensitive to the dependability of high- + + + +Risson & Moors Informational [Page 11] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + degree peers. The designers of Scamp translated this observation to + the design heuristic, "have the degree of each node be of nearly + equal size" [153]. They analyzed a system of N peers, with mean + degree c.log(n), where link failures occur independently with + probability e. If d>0 is fixed and c>(1+d)/(-log(e)), then the + probability of graph disconnection goes to zero as N->infinity. + Otherwise, if c<(1-d)/(-log(e)), then the probability of + disconnection goes to one as N->infinity. They presented a + localizer, which finds approximate minima to a global function of + peer degree and arbitrary link costs using only local information. + The Scamp overlay construction algorithms could support any of the + flooding and walking routing schemes above, or other epidemic and + multicasting schemes for that matter. Resilience to high churn rates + was identified for future study. + +2.2. Central Index (Napster) + + Centralized schemes like Napster [256] are significant because they + were the first to demonstrate the P2P scalability that comes from + separating the data index from the data itself. Ultimately, 36 + million Napster users lost their service not because of technical + failure, but because the single administration was vulnerable to the + legal challenges of record companies [275]. + + There has since been little research on P2P systems with central data + indexes. Such systems have also been called 'hybrid' since the index + is centralized but the data is distributed. Yang and Garcia-Molina + devised a four-way classification of hybrid systems [276]: unchained + servers, where users whose index is on one server do not see other + servers' indexes; chained servers, where the server that receives a + query forwards it to a list of servers if it does not own the index + itself; full replication, where all centralized servers keep a + complete index of all available metadata; and hashing, where keywords + are hashed to the server where the associated inverted list is kept. + The unchained architecture was used by Napster, but it has the + disadvantage that users do not see all indexed data in the system. + Strictly speaking, the other three options illustrate the distributed + data index, not the central index. The chained architecture was + recommended as the optimum for the music-swapping application at the + time. The methods by which clients update the central index were + classified as batch or incremental, with the optimum determined by + the query-to-login ratio. Measurements were derived from a clone of + Napster called OpenNap[277]. Another study of live Napster data + reported wide variation in the availability of peers, a general + unwillingness to share files (20-40% of peers share few or no files), + and a common understatement of available bandwidth so as to + discourage other peers from sharing one's link [202]. + + + + +Risson & Moors Informational [Page 12] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + Influenced by Napster's early demise, the P2P research community may + have prematurely turned its back on centralized architectures. + Chawathe, Ratnasamy, et al. opined that Google and Yahoo demonstrate + the viability of a centralized index. They argued that "the real + barriers to Napster-like designs are not technical but legal and + financial" [61]. Even this view may be a little too harsh on the + centralized architectures -- it implies that they always have an up- + front capital hurdle that is steeper than for distributed + architectures. The closer one looks at scalable 'centralized' + architectures, the less the distinction with 'distributed' + architectures seems to matter. For example, it is clear that + Google's designers consider Google a distributed, not centralized, + file system [278]. Google demonstrates the scale and performance + possible on commodity hardware, but still has a centralized master + that is critical to the operation of each Google cluster. Time may + prove that the value of emerging P2P networks, regardless of the + centralized-versus-distributed classification, is that they smooth + the capital outlays and remove the single points of failure across + the spectra of scale and geographic distribution. + +2.3. Distributed Index (Freenet) + + An important early P2P proposal for a distributed index was Freenet + [5, 71, 279]. While its primary emphasis was the anonymity of peers, + it did introduce a novel indexing scheme. Files are identified by + low-level "content-hash" keys and by "secure signed-subspace" keys, + which ensure that only a file owner can write to a file while anyone + can read from it. To find a file, the requesting peer first checks + its local table for the node with keys closest to the target. When + that node receives the query, it too checks for either a match or + another node with keys close to the target. Eventually, the query + either finds the target or exceeds time-to-live (TTL) limits. The + query response traverses the successful query path in reverse, + depositing a new routing table entry (the requested key and the data + holder) at each peer. The insert message similarly steps towards the + target node, updating routing table entries as it goes, and finally + stores the file there. Whereas early versions of Gnutella used + breadth-first flooding, Freenet uses a more economic depth-first + search [280]. + + An initial assessment has been done of Freenet's robustness. It was + shown that in a network of 1000 nodes, the median query path length + stayed under 20 hops for a failure of 30% of nodes. While the + Freenet designers considered this as evidence that the system is + "surprisingly robust against quite large failures" [71], the same + datapoint may well be outside meaningful operating bounds. How many + applications are useful when the first quartile of queries have path + lengths of several hundred hops in a network of only 1000 nodes, per + + + +Risson & Moors Informational [Page 13] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + Figure 4 of [71]? To date, there has been no analysis of Freenet's + dynamic robustness. For example, how does it perform when nodes are + continually arriving and departing? + + There have been both criticisms and extensions of the early Freenet + work. Gnutella proponents acknowledged the merit in Freenet's + avoidance of query broadcasting [281]. However, they are critical on + two counts: the exact file name is needed to construct a query; and + exactly one match is returned for each query. P2P designs using + DHTs, per Section 3, share similar characteristics -- a precise query + yields a precise response. The similarity is not surprising since + Freenet also uses a hash function to generate keys. However, the + query routing used in the DHTs has firmer theoretical foundations. + Another difference with DHTs is that Freenet will take time, when a + new node joins the network, to build an index that facilitates + efficient query routing. By the inventor's own admission, this is + damaging for a user's first impressions [282]. It was proposed to + download a copy of routing tables from seed nodes at startup, even + though the new node might be far from the seed node. Freenet's slow + startup motivated Mache, Gilbert, et al. to amend the overlay after + failed requests and to place additional index entries on successful + requests -- they claim almost an order of magnitude reduction in + average query path length [280]. Clarke also highlighted the lack of + locality or bandwidth information available for efficient query + routing decisions [282]. He proposed that each node gather response + times, connection times, and proportion of successful requests for + each entry in the query routing table. When searching for a key that + is not in its own routing table, it was proposed to estimate response + times from the routing metrics for the nearest known keys and + consequently choose the node that can retrieve the data fastest. The + response time heuristic assumed that nodes close in the key space + have similar response times. This assumption stemmed from early + deployment observations that Freenet peers seemed to specialize in + parts of the keyspace -- it has not been justified analytically. + Kronfol drew attention to Freenet's inability to do keyword searches + [283]. He suggested that peers cache lists of weighted keywords in + order to route queries to documents, using Term Frequency Inverse + Document Frequency (TFIDF) measures and inverted indexes (Section + 4.2.1). With these methods, a peer can route queries for simple + keyword lists or more complicated conjunctions and disjunctions of + keywords. Robustness analysis and simulation of Kronfol's proposal + remain open. + + The vast majority of P2P proposals in following sections rely on a + distributed index. + + + + + + +Risson & Moors Informational [Page 14] + +RFC 4981 Survey of Research on P2P Search September 2007 + + +3. Semantic Free Index + + Many of today's distributed network indexes are semantic. The + semantic index is human-readable. For example, it might associate + information with other keywords, a document, a database key, or even + an administrative domain. It makes it easy to associate objects with + particular network providers, companies, or organizations, as + evidenced in the Domain Name System (DNS). However, it can also + trigger legal tussles and frustrate content replication and migration + [216]. + + Distributed Hash Tables (DHTs) have been proposed to provide + semantic-free, data-centric references. DHTs enable one to find an + object's persistent key in a very large, changing set of hosts. They + are typically designed for [23]: + + a) low degree. If each node keeps routing information for only a + small number of other nodes, the impact of high node arrival and + departure rates is contained; + + b) low hop count. The hops and delay introduced by the extra + indirection are minimized; + + c) greedy routing. Nodes independently calculate a short path to the + target. At each hop, the query moves closer to the target; and + + d) robustness. A path to the target can be found even when links or + nodes fail. + +3.1. Origins + + To understand the origins of recent DHTs, one needs to look to three + contributions from the 1990s. The first two -- Plaxton, Rajaraman, + and Richa (PRR) [30] and Consistent Hashing [49] -- were published + within one month of each other. The third, the Scalable Distributed + Data Structure (SDDS) [52], was curiously ignored in significant + structured P2P designs despite having some similar goals [2, 6, 7]. + It has been briefly referenced in other P2P papers [46, 284-287]. + +3.1.1. Plaxton, Rajaraman, and Richa (PRR) + + PRR is the most recent of the three. It influenced the designs of + Pastry [2], Tapestry [6], and Chord [7]. The value of PRR is that it + can locate objects using fixed-length routing tables [6]. Objects + and nodes are assigned a semantic-free address, for example a 160-bit + key. Every node is effectively the root of a spanning tree. A + message routes toward an object by matching longer address suffixes, + until it encounters either the object's root node or another node + + + +Risson & Moors Informational [Page 15] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + with a 'nearby' copy. It can route around link and node failure by + matching nodes with a related suffix. The scheme has several + disadvantages [6]: global knowledge is needed to construct the + overlay; an object's root node is a single point of failure; nodes + cannot be inserted and deleted; and there is no mechanism for queries + to avoid congestion hot spots. + +3.1.2. Consistent Hashing + + Consistent Hashing [288] strongly influenced the designs of Chord [7] + and Koorde [37]. Karger, et al. introduced Consistent Hashing in the + context of the Web-caching problem [49]. Web servers could + conceivably use standard hashing to place objects across a network of + caches. Clients could use the approach to find the objects. For + normal hashing, most object references would be moved when caches are + added or deleted. On the other hand, Consistent Hashing is "smooth" + -- when caches are added or deleted, the minimum number of object + references move so as to maintain load balancing. Consistent Hashing + also ensures that the total number of caches responsible for a + particular object is limited. Whereas Litwin's Linear Hashing (LH*) + scheme requires 'buckets' to be added one at a time in sequence [50], + Consistent Hashing allows them to be added in any order [49]. There + is an open Consistent Hashing problem pertaining to the fraction of + items moved when a node is inserted [165]. Extended Consistent + Hashing was recently proposed to randomize queries over the spread of + caches to significantly reduce the load variance [289]. + Interestingly, Karger [49] referred to an older DHT algorithm by + Devine that used "a novel autonomous location discovery algorithm + that learns the buckets' locations instead of using a centralized + directory" [51]. + +3.1.3. Scalable Distributed Data Structures (LH*) + + In turn, Devine's primary point of reference was Litwin's work on + SDDSs and the associated LH* algorithm [52]. An SDDS satisfies three + design requirements: files grow to new servers only when existing + servers are well loaded; there is no centralized directory; and the + basic operations like insert, search, and split never require atomic + updates to multiple clients. Honicky and Miller suggested the first + requirement could be considered a limitation since expansion to new + servers is not under administrative control [286]. Litwin recently + noted numerous similarities and differences between LH* and Chord + [290]. He found that both implement key search. Although LH* refers + to clients and servers, nodes can operate as peers in both. Chord + 'splits' nodes when a new node is inserted, while LH* schedules + 'splits' to avoid overload. Chord requests travel O(log n) hops, + while LH* client requests need, at most, two hops to find the target. + Chord stores a small number of 'fingers' at each node. LH* servers + + + +Risson & Moors Informational [Page 16] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + store N/2 to N addresses while LH* clients store 1 to N addresses. + This trade-off between hop count and the size of the index affects + system robustness, and bears striking similarity to recent one- and + two-hop P2P schemes in Section 2. The arrival and departure of LH* + clients does not disrupt LH* server metadata at all. Given the size + of the index, the arrival and departure of LH* servers are likely to + cause more churn than that of Chord nodes. Unlike Chord, LH* has a + single point of failure, the split coordinator. It can be + replicated. Alternatively, it can be removed in later LH* variants, + though details have not been progressed for lack of practical need + [290]. + +3.2. Dependability + + We make four overall observations about their dependability. + Dependability metrics fall into two categories: static dependability, + a measure of performance before recovery mechanisms take over; and + dynamic dependability, for the most likely case in massive networks + where there is continual failure and recovery ("churn"). + +3.2.1. Static Dependability + + Observation A: Static dependability comparisons show that no O(log n) + DHT geometry is significantly more dependable than the other O(log n) + geometries. + + Gummadi, et al. compared the tree, hypercube, butterfly, ring, XOR, + and hybrid geometries. In such geometries, nodes generally know + about O(log n) neighbors and route to a destination in O(log n) hops, + where N is the number of nodes in the overlay. Gummadi, et al. asked + "Why not the ring?" They concluded that only the ring and XOR + geometries permit flexible choice of both neighbors and alternative + routes [24]. Loguinov, et al. added the de Bruijn graph to their + comparison [36]. They concluded that the classical analyses, for + example the probability that a particular node becomes disconnected, + yield no major differences between the resilience of Chord, CAN, and + de Bruijn graphs. Using bisection width (the minimum edge count + between two equal partitions) and path overlap (the likelihood that + backup paths will encounter the same failed nodes or links as the + primary path), they argued for the superior resilience of the de + Bruijn graph. In short, ring, XOR, and de Bruijn graphs all permit + flexible choice of alternative paths, but only in de Bruijn are the + alternate paths independent of each other [36]. + + + + + + + + +Risson & Moors Informational [Page 17] + +RFC 4981 Survey of Research on P2P Search September 2007 + + +3.2.2. Dynamic Dependability + + Observation B: Dynamic dependability comparisons show that DHT + dependability is sensitive to the underlying topology maintenance + algorithms. + + Li, et al. give the best comparison to date of several leading DHTs + during churn [291]. They relate the disparate configuration + parameters of Tapestry, Chord, Kademlia, Kelips, and OneHop to + fundamental design choices. For each of these DHTs, they plotted the + optimal performance in terms of lookup latency (milliseconds) and + fraction of failed lookups. The results led to several important + insights about the underlying algorithms, for example: increasing + routing table size is more cost-effective than increasing the rate of + periodic stabilization; learning about new nodes during the lookup + process sometimes eliminates the need for stabilization; and parallel + lookups reduce latency due to timeouts more effectively than faster + stabilization. Similarly, Zhuang, et al. compared keep-alive + algorithms for DHT failure detection [292]. Such algorithmic + comparisons can significantly improve the dependability of DHT + designs. + + In Figure 2, we propose a taxonomy for the topology maintenance + algorithms that influence dependability. The algorithms can be + classified by how nodes join and leave, how they first detect + failures, how they share information about topology updates, and how + they react when they receive information about topology updates. + + + + + + + + + + + + + + + + + + + + + + + + +Risson & Moors Informational [Page 18] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + Normal Updates + Joins (passive; active) [293] + Leaves (passive; active) [293] + + Fault Detection [292] + Maintenance + Proactive (periodic or keep-alive probes) + Reactive (correction-on-use, correction-on-failure) [294] + Report + Negative (all dead nodes, nodes recently failed) + Positive (all live nodes; nodes recently recovered) [292] + + Topology Sharing: yes/ no [292] + Multicast Tree (explicit, implicit) [267, 295] + Gossip (timeouts; number of contacts) [39] + + Corrective Action + Routing + Rerouting actions + (reroute once; route in parallel [291]; reject) + Routing timeouts + (TCP-style, virtual coordinates) [296] + Topology + Update action (evict/ replace/ tag node) + Update timeliness (immediate, periodic[296], delayed [297]) + + Figure 2: Topology Maintenance in Distributed Hash Tables + +3.2.3. Ephemeral or Stable Nodes -- O(log n) or O(1) Hops + + Observation C: Most DHTs use O(log n) geometries to suit ephemeral + nodes. The O(1) hop DHTs suit stable nodes and deserve more research + attention. + + Most of the DHTs in Section 3.5 assume that nodes are ephemeral, with + expected lifetimes of one to two hours. Therefore, they mostly use + an O(log n) geometry. The common assumption is that maintenance of + full routing tables in the O(1) hop DHTs will consume excessive + bandwidth when nodes are continually joining and leaving. The + corollary is that, when they run on stable infrastructure servers + [298], most of the DHTs in Section 3.5 are less than optimal -- + lookups take many more hops than necessary, wasting latency and + bandwidth budgets. The O(1) hop DHTs suit stable deployments and + high lookup rates. For a churning 1024-node network, Li, et al. + concluded that OneHop is superior to Chord, Tapestry, Kademlia, and + Kelips in terms of latency and lookup success rate [291]. For a + 3000-node network, they concluded that "OneHop is only preferable to + Chord when the deployment scenario allows a communication cost + + + +Risson & Moors Informational [Page 19] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + greater than 20 bytes per node per second" [291]. This apparent + limitation needs to be put in context. They assumed that each node + issues only one lookup every 10 minutes and has a lifetime of only 60 + minutes. It seems reasonable to expect that in some deployments, + nodes will have a lifetime of weeks or more, a maintenance bandwidth + of tens of kilobits per second, and a load of hundreds of lookups per + second. O(1) hop DHTs are superior in such situations. OneHop can + scale at least to many tens of thousands of nodes [267]. The recent + O(1) hop designs [267, 295] are vastly outnumbered by the O(log n) + DHTs in Section 3.5. Research on the algorithms of Figure 2 will + also yield improvements in the dependability of the O(1) hop DHTs. + +3.2.4. Simulation and Proof + + Observation D: Although not yet a mature science, the study of DHT + dependability is helped by recent simulation and formal development + tools. + + While there are recent reference architectures [294, 298], much of + the DHT literature in Section 3.5 does not lend itself to repeatable, + comparative studies. The best comparative work to date [291] relies + on the Peer-to-Peer Simulator (P2PSIM) [299]. At the time of + writing, it supports more DHT geometries than any other simulator. + As the study of DHTs matures, we can expect to see the simulation + emphasis shift from geometric comparison to a comparison of the + algorithms of Figure 2. + + P2P correctness proofs generally rely on less-than-complete formal + specifications of system invariants and events [7, 45, 300]. Li and + Plaxton expressed concern that "when many joins and leaves happen + concurrently, it is not clear whether the neighbor tables will remain + in a 'good' state" [47]. While acknowledging that guaranteeing + consistency in a failure-prone network is impossible, Lynch, Malkhi, + et al. sketched amendments to the Chord algorithm to guarantee + atomicity [301]. More recently, Gilbert, Lynch, et al. gave a new + algorithm for atomic read/write memory in a churning distributed + network, suggesting it to be a good match for P2P [302]. Lynch and + Stoica show in an enhancement to Chord that lookups are provably + correct when there is a limited rate of joins and failures [303]. + Fault Tolerant Active Rings is a protocol for active joins and leaves + that was formally specified and proven using B-method tools [304]. A + good starting point for a formal DHT development would be the + numerous informal API specifications [22, 305, 306]. Such work could + be informed by other efforts to formally specify routing invariants + [307, 308]. + + + + + + +Risson & Moors Informational [Page 20] + +RFC 4981 Survey of Research on P2P Search September 2007 + + +3.3. Latency + + The key metrics for DHT latency are: + + 1) Shortest-Path Distance and Diameter. In graph theory, the + shortest-path distance is the minimum number of edges in any path + between two vertices of the graph. Diameter is the largest of all + shortest-path distances in a graph [309]. Networking synonyms for + distance on a DHT are "hop count" and "lookup length". + + 2) Latency and Latency Stretch. Two types of latency are relevant + here -- network-layer latency and overlay latency. Network-layer + latency has been referred to as "proximity" or "locality" [24]. + Stretch is the cost of an overlay path between two nodes, divided + by the cost of the direct network path between those nodes [310]. + Latency stretch is also known as the "relative delay penalty" + [311]. + +3.3.1. Hop Count and the O(1)-Hop DHTs + + Hop count gives an approximate indication of path latency. O(1)-hop + DHTs have path latencies lower than the O(log n)-hop DHTs [291]. + This significant advantage is often overlooked on account of concern + about the messaging costs to maintain large routing tables (Section + 3.2.3). Such concern is justified when the mean node lifetime is + only a few hours and the mean lookup interval per node is more than a + few seconds (the classic profile of a P2P file-sharing node). + However, for a large, practical operating range (node lifetimes of + days or more, lookup rates of over tens of lookups per second per + node, up to ~100,000 nodes), the total messaging cost in O(1) hop + DHTs is lower than in O(log n) DHTs [312]. Lookups and routing table + maintenance contribute to the total messaging cost. If a deployment + fits this operating range, then O(1)-hop DHTs will give lower path + latencies and lower total messaging costs. An additional merit of + the O(1)-hop DHTs is that they yield lower lookup failure rates than + their O(log N)-hop counterparts [291]. + + Low hop count can be achieved in two ways: each node has a large O(N) + index of nodes; or the object references can be replicated on many + nodes. Beehive [313], Kelips [39], LAND [310], and Tulip [314] are + examples of the latter category. Beehive achieves O(1) hops on + average and O(log n) hops in the worst case, by proactive replication + of popular objects. Kelips replicates the 'file index'. It incurs + O(sqrt(N)) storage costs for both the node index and the file index. + LAND uses O(log n) reference pointers for each stored object and an + O(log n) index to achieve a worst-case 1+e stretch, where 0<e. The + Kelips-like Tulip [314] requires 2 hops per lookup. Each node + + + + +Risson & Moors Informational [Page 21] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + maintains 2sqrt(N)log(N) links to other nodes and objects are + replicated on O(sqrt(N)) nodes. + + The DHTs with a large O(N) node index can be divided into two groups: + those for which the index is always O(N); and those for which the + index opportunistically ranges from O(log n) to O(N). Linear Hashing + (LH*) servers [52], OneHop [267], and 1h-Calot [295] fall into the + former category. EpiChord [315] and Accordion [316] are examples of + the latter. + +3.3.2. Proximity and the O(log n)-Hop DHTs + + If one chooses not to use single-hop DHTs, hop count is a weak + indicator of end-to-end path latency. Some hops may incur large + delays because of intercontinental or satellite links. Consequently, + numerous DHT designs minimize path latency by considering the + proximity of nodes. Gummadi, et al. classified the proximity methods + as follows [24]: + + 1) Proximity Neighbor Selection (PNS). The nodes in the routing + table are chosen based on the latency of the direct hop to those + nodes. The latency may be explicitly measured [317], or it may be + estimated using one of several synthetic coordinate systems [150, + 154, 318]. As a lower bound on PNS performance, Dabek, et al. + showed that lookups on O(log n) DHTs take at least 1.5 times the + average roundtrip time of the underlying network [154]. + + 2) Proximity Route Selection (PRS). At lookup time, the choice of + the next-hop node relies on the latency of the direct hop to that + node. PRS is less effective than PNS, though it may complement it + [24]. Some of the routing geometries in Section 3.5 do not + support PNS and/or PRS [24]. + + 3) Proximity Identifier Selection (PIS). Node identifiers indicate + geographic position. PIS frustrates load balancing, increases the + risk of correlated failures, and is not often used [24]. + + The proximity study by Gummadi, et al. assumed recursive routing, + though they suggested that PNS would also be superior to PRS with + iterative routing [24]. Dabek, et al. found that recursive lookups + take 0.6 times as long as iterative lookups [150]. + + Beyond the explicit use of proximity information, redundancy can help + to avoid slow paths and servers. One may increase the number of + replicas [150], use parallel lookups [291, 316], use alternate routes + on failure [150], or use multiple gateway nodes to enter the DHT + [317]. + + + + +Risson & Moors Informational [Page 22] + +RFC 4981 Survey of Research on P2P Search September 2007 + + +3.4. Multicasting + +3.4.1. Multicasting vs. Broadcasting + + "Multicasting" here means sending a message to a subset of an + overlay's nodes. Nodes explicitly join and leave this subset, called + a "multicast group". "Broadcasting" here is a special case of + multicasting in which a message is sent to all nodes in the overlay. + Broadcasting relies on overlay membership messages -- it does not + need extra group membership messaging. Castro, et al. said + multicasting on structured overlays is either "flooding" (one overlay + per group) or "tree-based" (one tree per group) [319]. These are + synonyms for broadcasting and multicasting respectively. + + The first DHT-based designs for multicasting were CAN multicast + [320], Scribe [241], Bayeux [242], and i3 [231]. They were based on + CAN [8], Pastry [2], Tapestry [31], and Chord [7] respectively. El- + Ansary, et al. devised the first DHT-based broadcasting scheme [321]. + It was based on Chord. + + Multicast trees can be constructed using reverse-path forwarding or + forward-path forwarding. Scribe uses reverse-path forwarding [241]. + Bayeux uses forward-path forwarding [242]. Borg, a multicast design + based on Pastry, uses a combination of forward-path and reverse-path + forwarding to minimize latency [237]. + +3.4.2. Motivation for DHT-based Multicasting + + Multicasting complements DHT search capability. DHTs naturally + support exact match queries. With multicasting, they can support + more complex queries. Multicasting also enables the dissemination + and collection of global information. + + Consider, for example, aggregation queries like minimum, maximum, + count, sum, and average (Section 5.4). A node at the root of a + dissemination tree might multicast such a query [322]. The leaf + nodes return local results towards the root node. Successive parents + aggregate the result so that eventually the root node can compute the + global result. Such queries may help to monitor the capacity and + health of the overlay itself. + + Why bother with structured overlays for multicasting? In Section + 2.1, we saw that Gnutella can multicast complex queries without them + [4]. Castro, et al. posed the question, "Should we build Gnutella on + a structured overlay?" [259]. While acknowledging that their study + was preliminary, they did conclude that "we see no reason to build + Gnutella on top of an unstructured overlay" [259]. The supposedly + high maintenance costs of structured overlays were outweighed by + + + +Risson & Moors Informational [Page 23] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + query cost savings. The structured overlay ensured that nodes were + only visited once during a complex query. It also helped to + accurately limit the total number of nodes visited. Pai, et al. + acknowledged that multicast trees based on structured overlays + contribute to simple routing rules, low delay and low delay variation + [323]. However, they opted for unstructured, gossip-based + multicasting for reliability reasons: data loss near the tree root + affects all subtended nodes; interior node failures must be repaired + quickly; interior nodes are obliged to disseminate more than their + fair share of traffic, giving leaf nodes a "free ride". The most + promising research direction is to improve on the Bimodal + Multicasting approach [324]. It combines the bandwidth efficiency + and low latency of structured, best-effort multicasting trees with + the reliability of unstructured gossip protocols. + +3.4.3. Design Issues + + None of the early structured overlay multicast designs addressed all + of the following issues [325]: + + 1) Heterogeneous Node Capacity. Nodes differ in their processing, + memory, and network capacity. Multicast throughput is largely + determined by the node with smallest throughput [325]. To limit + the multicasting load on a node, one might cap its out-degree. If + the same node receives further join requests, it refers them to + its children ("pushdown") [240]. Bharambe, et al. explored + several pushdown strategies but found them inadequate to deal with + heterogeneity [326]. They concluded that the heterogeneity issue + remains open, and should be addressed before deploying DHTs for + high-bandwidth multicasting applications. Independently, Zhang et + al. partially tackled heterogeneity by allowing nodes in their + CAM-Chord and CAM-Koorde designs to vary out-degree according to + the node's capacity [325]. However, they made no mention of the + "pushdown" issue -- they did not describe topology maintenance + when the out-degree limit is reached. + + 2) Reliability (Dynamic Membership). If a multicast tree is to be + resilient, it must survive dynamic membership. There are several + ways to deal with dynamic membership: ensure that the root node of + the multicasting tree does not handle all requests to join or + leave the multicast group [242]; use multiple interior-node- + disjoint trees to avoid single points of failure in tree + structures [322]; and split the root node into several replicas + and partition members across them [241]. For example, Bayeux + requires the root node to track all group membership changes + whereas Scribe does not [241]. CAN-multicast uses a single, + well-known host to bootstrap the join operations [320]. The + earliest DHT-based broadcasting work by El-Ansary, et al. did not + + + +Risson & Moors Informational [Page 24] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + address the issue of dynamic membership [321]. Ghodsi, et al. + addressed it in a subsequent paper, though, giving two broadcast + algorithms that accommodate routing table inconsistencies [327]. + One algorithm achieves a more optimal multicasting network at the + expense of greater correction overhead. Splitstream, based on + Scribe and Pastry, redundantly striped content across multiple + interior-node-disjoint multicast trees -- if one interior node + fails, then only one stripe is lost [240]. + + 3) Large Any-Source Multicast Groups. Any group member should be + allowed to send multicast messages. The group should scale to a + very large number of hosts. CAN-based multicast was the first + application-level multicast scheme to scale to groups of several + thousands of nodes without restricting the service model to a + single source [320]. Bayeux scales to large groups but has a + single root node for each multicast group. It supports the any- + source model only by having the root node operate as a reflector + for multiple senders [242]. + +3.5. Routing Geometries + + In Sections 3.5.1 to 3.5.6, we introduce the main geometries for + simple key lookup and survey their robustness mechanisms. + +3.5.1. Plaxton Trees (Pastry, Tapestry) + + Work began in March 2000 on a structured, fault-tolerant, wide-area + Dynamic Object Location and Routing (DOLR) system called Tapestry [6, + 155]. While DHTs fix replica locations, a DOLR API enables + applications to control object placement [31]. Tapestry's basic + location and routing scheme follows Plaxton, Rajaraman, and Richa + (PRR) [30], but it remedies PRR's robustness shortcomings described + in Section 3.1. Whereas each object has one root node in PRR, + Tapestry uses several to avoid a single point of failure. Unlike + PRR, it allows nodes to be inserted and deleted. Whereas PRR + required a total ordering of nodes, Tapestry uses 'surrogate routing' + to incrementally choose root nodes. The PRR algorithm does not + address congestion, but Tapestry can put object copies close to nodes + generating high query loads. PRR nodes only know of the nearest + replica, whereas Tapestry nodes enable selection from a set of + replicas (for example, to retrieve the most up to date). To detect + routing faults, Tapestry uses TCP timeouts and UDP heartbeats for + detection, sequential secondary neighbours for rerouting, and a + 'second chance' window so that recovery can occur without the + overhead of a full node insertion. Tapestry's dependability has been + measured on a testbed of about 100 machines and on simulations of + + + + + +Risson & Moors Informational [Page 25] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + about 1000 nodes. Successful routing rates and maintenance + bandwidths were measured during instantaneous failures and ongoing + churn [31]. + + Pastry, like Tapestry, uses Plaxton-like prefix routing [2]. As in + Tapestry, Pastry nodes maintain O(log n) neighbours and route to a + target in O(log n) hops. Pastry differs from Tapestry only in the + method by which it handles network locality and replication [2]. + Each Pastry node maintains a 'leaf set' and a 'routing table'. The + leaf set contains l/2 node IDs on either side of the local node ID in + the node ID space. The routing table, in row r, column c, points to + the node ID with the same r-digit prefix as the local node, but with + an r+1 digit of c. A Pastry node periodically probes leaf set and + routing table nodes, with periodicity of Tls and Trt and a timeout + Tout. Mahajan, Castry, et al. analyzed the reliability versus + maintenance cost trade-offs in terms of the parameters l, Tls, Trt, + and Tout [328]. They concluded that earlier concerns about excessive + maintenance cost in a churning P2P network were unfounded, but + suggested follow-up work for a wider range of reliability targets, + maintenance costs, and probe periods. Rhea Geels, et al. concluded + that existing DHTs fail at high churn rates [329]. Building on a + Pastry implementation from Rice University, they found that most + lookups fail to complete when there is excessive churn. They + conjectured that short-lived nodes often leave the network with + lookups that have not yet timed out, but no evidence was provided to + confirm the theory. They identified three design issues that affect + DHT performance under churn: reactive versus periodic recovery of + peers; lookup timeouts; and choice of nearby neighbours. Since + reactive recovery was found to add traffic to already congested + links, the authors used periodic recovery in their design. For + lookup timeouts, they advocated an exponentially weighted moving + average of each neighbour's response time, over alternative fixed + timeout or 'virtual coordinate' schemes. For selection of nearby + neighbours, they found that 'global sampling' was more effective than + simply sampling a 'neighbour's neighbours' or 'inverse neighbours'. + Castro, Costa, et al. have refuted the suggestion that DHTs cannot + cope with high churn rates [330]. By implementing methods for + continuous detection and repair, their MSPastry implementation + achieved shorter routing paths and a maintenance overhead of less + than half a message per second per node. + + There have been more recent proposals based on these early Plaxton- + like schemes. Kademlia uses a bit-wise exclusive or (XOR) metric for + the 'distance' between 160-bit node identifiers [45]. Each node + keeps a list of contact nodes for each section of the node space that + is between 2^i and 2^(i+1) from itself (0.i<160). Longer-lived nodes + are deliberately given preference on this list -- it has been found + in Gnutella that the longer a node has been active, the more likely + + + +Risson & Moors Informational [Page 26] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + it is to remain active. Like Kademlia, Willow uses the XOR metric + [32]. It implements a Tree Maintenance Protocol to 'zipper' together + broken segments of a tree. Where other schemes use DHT routing to + inefficiently add new peers, Willow can merge disjoint or broken + trees in O(log n) parallel operations. + +3.5.2. Rings (Chord, DKS) + + Chord is the prototypical DHT ring, so we first sketch its operation. + Chord maps nodes and keys to an identifier ring [7, 34]. Chord + supports one main operation: find a node with the given key. It uses + Consistent Hashing (Section 3.1) to minimize disruption of keys when + nodes join and leave the network. However, Chord peers need only + track O(log n) other peers, not all peers as in the original + consistent hashing proposal [49]. It enables concurrent node + insertions and deletions, improving on PRR. Compared to Pastry, it + has a simpler join protocol. Each Chord peer tracks its predecessor, + a list of successors, and a finger table. Using the finger table, + each hop is at least half the remaining distance around the ring to + the target node, giving an average lookup hop count of (1/2)log + n(base 2). Each Chord node runs a periodic stabilization routine + that updates predecessor and successor pointers to cater to newly + added nodes. All successors of a given node need to fail for the + ring to fail. Although a node departure could be treated the same as + a failure, a departing Chord node first notifies the predecessor and + successors, so as to improve performance. + + In their definitive paper, Chord's inventors critiqued its + dependability under churn [34]. They provided proofs on the + behaviour of the Chord network when nodes in a stable network fail, + stressing that such proofs are inadequate in the general case of a + perpetually churning network. An earlier paper had posed the + question, "For lookups to be successful during churn, how regularly + do the Chord stabilization routines need to run?" [331]. Stoica, + Morris, et al. modeled a range of node join/departure rates and + stabilization periods for a Chord network of 1000 nodes. They + measured the number of timeouts (caused by a finger pointing to a + departed node) and lookup failures (caused by nodes that temporarily + point to the wrong successor during churn). They also modeled the + 'lookup stretch', the ratio of the Chord lookup time to optimal + lookup time on the underlying network. They demonstrated the latency + advantage of recursive lookups over iterative lookups, but there + remains room for delay reduction. For further work, the authors + proposed to improve resilience to network partitions, using a small + set of known nodes or 'remembered' random nodes. To reduce the + number of messages per lookup, they suggested an increase in the size + of each step around the ring, accomplished via a larger number of + fingers at each node. Much of the paper assumed independent, equally + + + +Risson & Moors Informational [Page 27] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + likely node failures. Analysis of correlated node failures, caused + by massive site or backbone failures, will be more important in some + deployments. The paper did not attempt to recommend a fixed optimal + stabilization rate. Liben-Nowell, Balakrishnan, et al. had suggested + that optimum stabilization rate might evolve according to + measurements of peers' behaviour [331] -- such a mechanism has yet to + be devised. + + Alima, El-Ansary, et al. considered the communication costs of + Chord's stabilization routines, referred to as 'active correction', + to be excessive [332]. Two other robustness issues also motivated + their Distributed K-ary Search (DKS) design, which is similar to + Chord. Firstly, the total system should evolve for an optimum + balance between the number of peers, the lookup hop count, and the + size of the routing table. Secondly, lookups should be reliable -- + P2P algorithms should be able to guarantee a successful lookup for + key/value pairs that have been inserted into the system. A similar + lookup-correctness issue was raised elsewhere by one of Chord's + authors; "Is it possible to augment the data structure to work even + when nodes (and their associated finger lists) just disappear?" [333] + Alima, El-Ansary, et al. asserted that P2Ps using active correction, + like Chord, Pastry, and Tapestry, are unable to give such a + guarantee. They propose an alternate 'correction-on-use' scheme, + whereby expired routing entries are corrected by information + piggybacking lookups and insertions. A prerequisite is that lookup + and insertion rates are significantly higher than node arrival, + departure, and failure rates. Correct lookups are guaranteed in the + presence of simultaneous node arrivals or up to f concurrent node + departures, where f is configurable. + +3.5.3. Tori (CAN) + + Ratnasamy, Francis, et al. developed the Content-Addressable Network + (CAN), another early DHT widely referenced alongside Tapestry, + Pastry, and Chord [8, 334]. It is arranged as a virtual + d-dimensional Cartesian coordinate space on a d-torus. Each node is + responsible for a zone in this coordinate space. The designers used + a heuristic thought to be important for large, churning P2P networks: + keep the number of neighbours independent of system size. + Consequently, its design differs significantly from Pastry, Tapestry, + and Chord. Whereas they have O(log n) neighbours per node and O(log + n) hops per lookup, CAN has O(d) neighbours and O(dn^(1/d)) hop + count. When CAN's system-wide parameter d is set to log(n), CAN + converges to their profile. If the number of nodes grows, a major + rearrangement of the CAN network may be required [151]. The CAN + designers considered building on PRR, but opted for the simple, low- + state-per-node CAN algorithm instead. They had reasoned that a PRR- + based design would not perform well under churn, given node + + + +Risson & Moors Informational [Page 28] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + departures and arrivals would affect a logarithmic number of nodes + [8]. + + There have been preliminary assessments of CAN's resilience. When a + node leaves the CAN in an orderly fashion, it passes its own Virtual + ID (VID), its neighbours' VIDs and IP addresses, and its key/value + pairs to a takeover node. If a node leaves abruptly, its neighbours + send recovery messages towards the designated takeover node. CAN + ensures the recovery messages reach the takeover node, even if nodes + die simultaneously, by maintaining a VID chain with Chord's + stabilization algorithm. Some initial 'proof of concept' resilience + simulations were run using the Network Simulator (NS) [335] for up to + a few hundred nodes. Average hop counts and lookup failure + probabilities were plotted against the total number of nodes for + various node failure rates [8]. The CAN team documented several open + research questions pertaining to state/hop count trade-offs, + resilience, load, locality, and heterogeneous peers [44, 334]. + +3.5.4. Butterflies (Viceroy) + + Viceroy approximates a butterfly network [46]. It generally has + constant degree like CAN. Like Chord, Tapestry, and Pastry, it has + logarithmic diameter. It improves on these systems, inasmuch as its + diameter is better than CAN and its degree is better than Chord, + Tapestry, and Pastry. As with most DHTs, it utilizes Consistent + Hashing. When a peer joins the Viceroy network, it takes a random + but permanent 'identity' and selects its 'level' within the network. + Each peer maintains general ring pointers ('predecessor' and + 'successor'), level ring pointers ('nextonlevel' and 'prevonlevel'), + and butterfly pointers ('left', 'right', and 'up'). When a peer + departs, it normally passes its key pairs to a successor, and + notifies other peers to find a replacement peer. + + The Viceroy paper scoped out the issue of robustness. It explicitly + assumed that peers do not fail [46]. It assumed that join and leave + operations do not overlap, so as to avoid the complication of + concurrency mechanisms like locking. Kaashoek and Karger were + somewhat critical of Viceroy's complexity [37]. They also pointed to + its fault-tolerance blind spot. Li and Plaxton suggested that such + constant-degree algorithms deserve further consideration [47]. They + offered several pros and cons. The limited degree may increase the + risk of a network partition, or inhibit use of local neighbours (for + the simple reason that there are less of them). On the other hand, + it may be easier to reason about the correctness of fixed-degree + networks. One of the Viceroy authors has since proposed constant- + degree peers in a two-tier, locality-aware DHT [310] -- the lower + degree maintained by each lower-tier peer purportedly improves + network adaptability. Another Viceroy author has since explored an + + + +Risson & Moors Informational [Page 29] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + alternative bounded-degree graph for P2P, namely the de Bruijn graph + [336]. + +3.5.5. de Bruijn (D2B, Koorde, Distance Halving, ODRI) + + De Bruijn graphs have had numerous refinements since their inception + [337, 338]. Schlumberger was the first to use them for networking + [339]. Two research teams independently devised the 'generalized' de + Bruijn graph that accommodates a flexible number of nodes in the + system [340, 341]. Rowley and Bose studied fault-tolerant rings + overlaid on the de Bruijn graph [342]. Lee, Liu, et al. devised a + two-level de Bruijn hierarchy, whereby clusters of local nodes are + interconnected by a second-tier ring [343]. + + Many of the algorithms discussed previously are 'greedy' in that each + time a query is forwarded, it moves closer to the destination. + Unfortunately, greedy algorithms are generally suboptimal -- for a + given degree, the routing distance is longer than necessary [344]. + Unlike these earlier P2P designs, de Bruijn graphs of degree k + achieve an asymptotically optimal diameter log n, where n is the + number of nodes in the system and k can be varied to improve + resilience. If there are O(log n) neighbours per node, the de Bruijn + hop count is O(log n/log log n). To illustrate de Bruijn's practical + advantage, consider a network with one million nodes of degree 20: + Chord has a diameter of 20, while de Bruijn has a diameter of 5 [36]. + In 2003, there were a quick succession of de Bruijn proposals -- D2B + [345], Koorde [37], Distance Halving [132, 336], and the Optimal + Diameter Routing Infrastructure (ODRI) [36]. + + Fraigniaud and Gauron began the D2B design by laying out an informal + problem statement: keys should be evenly distributed; lookup latency + should be small; traffic load should be evenly distributed; updates + of routing tables and redistribution of keys should be fast when + nodes join or leave the network. They defined a node's "congestion" + to be the probability that a lookup will traverse it. Apart from its + optimal de Bruijn diameter, they highlighted D2B's merits: a constant + expected update time when nodes join and leave (O(log n) with high + probability (w.h.p.)); the expected node congestion is O((log n)/n) + (O(((log n)^2)/n) w.h.p.) [345]. D2B's resilience was discussed only + in passing. + + Koorde extends Chord to attain the optimal de Bruijn degree/diameter + trade-off above [37]. Unlike D2B, Koorde does not constrain the + selection of node identifiers. Also unlike D2B, it caters to + concurrent joins, by extension of Chord's functionality. Kaashoek + and Karger investigated Koorde's resilience to a rather harsh failure + scenario: "in order for a network to stay connected when all nodes + fail with probability of 1/2, some nodes must have degree + + + +Risson & Moors Informational [Page 30] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + omega(log n)" [37]. They sketched a mechanism to increase Koorde's + degree for this more stringent fault tolerance, losing de Bruijn's + constant degree advantage. Similarly, to achieve a constant-factor + load balance, Koorde would have to sacrifice its degree optimality. + They suggested that the ability to trade the degree, and hence the + maintenance overhead, against the expected hop count may be important + for churning systems. They also identified an open problem: find a + load-balanced, degree optimal DHT. Datta, Girdzijauskas, et al. + showed that for arbitrary key distributions, de Bruijn graphs fail to + meet the dual goals of load balancing and search efficiency [346]. + They posed the question, "(Is there) a constant routing table sized + DHT which meets the conflicting goals of storage load balancing and + search efficiency for an arbitrary and changing key distribution?" + + Distance Halving was also inspired by de Bruijn [336] and shares its + optimal diameter. Naor and Wieder argued for a two-step + "continuous-discrete" approach for its design. The correctness of + its algorithms is proven in a continuous setting. The algorithms are + then mapped to a discrete space. The source x and target y are + points on the continuous interval [0,1). Data items are hashed to + this same interval. <str> is a string that determines how messages + leave any point on the ring: if bit t of the string is 0, the left + leg is taken; if it is 1, the right leg is taken. <str> increases by + one bit each hop, giving a sequence by which to step around the ring. + A lookup has two phases. In the first, the lookup message containing + the source, target, and the random string hops toward the midpoint of + the source and target. On each hop, the distance between <str>(x) + and <str>(y) is halved, by virtue of the specific 'left' and 'right' + functions. In the second phase, the message steps 'backward' from + the midpoint to the target, removing the last bit in <str> at each + hop. 'Join' and 'leave' algorithms were outlined but there was no + consideration of recovery times or message load on churn. Using the + Distance Halving properties, the authors devised a caching scheme to + relieve congestion in a large P2P network. They have also modified + the algorithm to be more robust in the presence of random faults + [132]. + + Solid comparisons of DHT resilience are scarce, but Loguinov, Kumar, + et al. give just that in their ODRI paper [36]. They compare Chord, + CAN, and de Bruijn in terms of routing performance, graph expansion + and clustering. At the outset, they give the optimal diameter (the + maximum hop count between any two nodes in the graph) and average hop + count for graphs of fixed degree. De Bruijn graphs converge to both + optima, and outperform Chord and CAN on both counts. These optima + impact both delay and aggregate lookup load. They present two + clustering measures (edge expansion and node expansion), which are + interesting for resilience. Unfortunately, after decades of de + Bruijn research, they have no exact solution. De Bruijn was shown to + + + +Risson & Moors Informational [Page 31] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + be superior in terms of path overlap - "de Bruijn automatically + selects backup paths that do not overlap with the best shortest path + or with each other" [36]. + +3.5.6. Skip Graphs + + Skip Graphs have been pursued by two research camps [38, 41]. They + augment the earlier Skip Lists [347, 348]. Unlike earlier balanced + trees, the Skip List is probabilistic -- its insert and delete + operations do not require tree rearrangements and so are faster by a + constant factor. The Skip List consists of layers of ordered linked + lists. All nodes participate in the bottom layer 0 list. Some of + these nodes participate in the layer 1 list with some fixed + probability. A subset of layer 1 nodes participate in the layer 2 + list, and so on. A lookup can proceed quickly through the list by + traversing the sparse upper layers until it is close to, or at, the + target. Unfortunately, nodes in the upper layers of a Skip List are + potential hot spots and single points of failure. Unlike Skip Lists, + Skip Graphs provide multiple lists at each level for redundancy, and + every node participates in one of the lists at each level. + + Each node in a Skip Graph has theta(log n) neighbours on average, + like some of the preceding DHTs. The Skip Graph's primary edge over + the DHTs is its support for prefix and proximity search. DHTs hash + objects to a random point in the graph. Consequently, they give no + guarantees over where the data is stored. Nor do they guarantee that + the path to the data will stay within the one administration as far + as possible [38]. Skip graphs, on the other hand, provide for + location-sensitive name searches. For example, to find the document + docname on the node user.company.com, the Skip Graph might step + through its ordered lists for the prefix com.company.user [38]. + Alternatively, to find an object with a numeric identifier, an + algorithm might search the lowest layer of the Skip Graph for the + first digit, the next layer for the next digit, in the same vein + until all digits are resolved. Being ordered, Skip Graphs also + facilitate range searches. In each of these examples, the Skip Graph + can be arranged such that the path to the target, as far as possible, + stays within an administrative boundary. If one administration is + detached from the rest of the Skip Graph, routing can continue within + each of the partitions. Mechanisms have been devised to merge + disconnected segments [157], though at this stage, segments are re- + merged one at a time. A parallel merge algorithm has been flagged + for future work. + + The advantages of Skip Graphs come at a cost. To be able to provide + range queries and data placement flexibility, Skip Graph nodes + require many more pointers than their DHT counterparts. An increased + number of pointers implies increased maintenance traffic. Another + + + +Risson & Moors Informational [Page 32] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + shortcoming of at least one of the early proposals was that no + algorithm was given to assign keys to machines. Consequently, there + are no guarantees on system-wide load balancing or on the distance + between adjacent keys [100]. Aspnes, Kirsch, et al. have recently + devised a scheme to reduce the inter-machine pointer count from + O(mlogm), where m is the number of data elements, to O(nlog n), where + n is the number of nodes [100]. They proposed a two-layer scheme -- + one layer for the Skip Graph itself and the second 'bucket layer'. + Each machine is responsible for a number of buckets and each bucket + elects a representative key. Nodes locally adjust their load. They + accept additional keys if they are below their threshold or disperse + keys to nearby nodes if they are above threshold. There appear to be + numerous open issues: simulations have been done but analysis is + outstanding; mechanisms are required to handle the arrival and + departure of nodes; there were only brief hints as to how to handle + nodes with different capacities. + +4. Semantic Index + + Semantic indexes capture object relationships. While the semantic- + free methods (DHTs) have firmer theoretic foundations and guarantee + that a key can be found if it exists, they do not capture the + relationships between the document name and its content or metadata + on their own. Semantic P2P designs do. However, since their design + is often driven by heuristics, they may not guarantee that scarce + items will be found. + + So what might the semantically indexed P2Ps add to an already crowded + field of distributed information architectures? At one extreme, + there are the distributed relational database management systems + (RDBMSs), with their strong consistency guarantees [284]. They + provide strong data independence, the flexibility of SQL queries, and + strong transactional semantics -- Atomicity, Consistency, Isolation + and Durability (ACID) [349]. They guarantee that the query response + is complete -- all matching results are returned. The price is + performance. They scale to perhaps 1000 nodes, as evidenced in + Mariposa [350, 351], or require query caching front ends to constrain + the load [284]. Database research has "arguably been cornered into + traditional, high-end, transactional applications" [72]. Then there + are distributed file systems, like the Network File System (NFS) or + the Serverless Network File Systems (xFS), with little data + independence, low-level file retrieval interfaces, and varied + consistency [284]. Today's eclectic mix of Content Distribution + Networks (CDNs) generally deload primary servers by redirecting Web + requests to a nearby replica. Some intercept the HTTP requests at + the DNS level and then use consistent hashing to find a replica [23]. + Since this same consistent hashing was a forerunner to the DHT + + + + +Risson & Moors Informational [Page 33] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + approaches above, CDNs are generally constrained to the same simple + key lookups. + + The opportunity for semantically indexed P2Ps, then, is to provide: + + a) graduated data independence, consistency, and query flexibility, + and + + b) probabilistically complete query responses, across + + c) very large numbers of low-cost, geographically distributed, + dynamic nodes. + +4.1. Keyword Lookup + + P2P keyword lookup is best understood by considering the structure of + the underlying index and the algorithms by which queries are routed + over that index. Figure 3 summarizes the following paragraphs by + classifying the keyword query algorithms, index structures, and + metrics. The research has largely focused on scalability, not + dependability. There have been very few studies that quantify the + impact of network churn. One exception is the work by Chawathe, et + al. on the Gia system [61]. Gia's combination of algorithms from + Figure 3 (receiver-based flow control, biased random walk, and one- + hop replication) gave 2-4 orders of magnitude improvement in query + success rates in churning networks. + + + + + + + + + + + + + + + + + + + + + + + + + +Risson & Moors Informational [Page 34] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + QUERY + Query routing + Flooding: Peers only index local files so queries must propagate + widely [4] + Policy-based: Choice of the next hop node: random; most/least + recently used; most files shared; most results [265, 352] + Random walks: Parallel [67] or biased random walks [61, 66] + Query forwarding + Iterative: Nodes perform iterative unicast searches of ultrapeers, + until the desired number of results is achieved. See Gnutella + UDP Extension for Scalable Searches (GUESS) [265, 353] + Recursive + Query flow control + Receiver-controlled: Receivers grant query tokens to senders, so + as to avoid overload [61] + Reactive: sender throttles queries when it notices receivers are + discarding packets [61, 66] + Dynamic Time To Live: In the Dynamic Query Protocol, the sender + adjusts the time-to-live on each iteration based on the number + of results received, the number of connections left, and the + number of nodes already theoretically reached by the search [354] + + INDEX + Distribution + Compression: Leaf nodes periodically send ultrapeers compressed + query routing tables, as in the Query Routing Protocol [260] + One hop replication: Nodes maintain an index of content on their + nearest neighbors [61, 352] + Partitioning + By document [210] + By keyword: Use an inverted list to find a matching document, + either locally or at another peer [21]. Partition by keyword + sets [355] + By document and keyword: Also called Multi-Level Partitioning [21] + + METRIC + Query load: Queries per second per node/link [65, 265] + Degree: The number of links per node [66, 352]. Early P2P networks + approximated power-law networks, where the number of nodes with L + links is proportional to L^(-k), where k is a constant [65] + Query delay: Reported in terms of time and hop count [61, 66] + Query success rate: The "Collapse Point" is the per-node query rate + at which the query success rate drops below 90% [61]. See + also [61, 265, 352]. + + Figure 3: Keyword Lookup in P2P Systems + + + + + +Risson & Moors Informational [Page 35] + +RFC 4981 Survey of Research on P2P Search September 2007 + + +4.1.1. Gnutella Enhancements + + Perhaps the most widely referenced P2P system for simple keyword + match is Gnutella [4]. Gnutella queries contain a string of + keywords. Gnutella peers answer when they have files whose names + contain all the keywords. As discussed in Section 2.1, early + versions of Gnutella did not forward the document index. Queries + were flooded and peers searched their own local indexes for filename + matches. An early review highlighted numerous areas for improvement + [65]. It was estimated that the query traffic alone from 50,000 + early-generation Gnutella nodes would amount to 1.7% of the total + U.S. Internet backbone traffic at December 2000 levels. It was + speculated that high-degree Gnutella nodes would impede + dependability. An unnecessarily high percentage of Gnutella traffic + crossed Autonomous System (AS) boundaries -- a locality mechanism may + have found suitable nearby peers. + + Fortunately, there have since been numerous enhancements within the + Gnutella Developer Forum. At the time of writing, it has been + reported that Gnutella has almost 350,000 unique hosts, of which + nearly 90,000 accept incoming connections [356]. One of the main + improvements is that an index of filename keywords, called the Query + Routing Table (QRT), can now be forwarded from 'leaf peers' to its + 'ultrapeers' [260]. Ultrapeers can then ensure that the leaves only + receive queries for which they have a match, dramatically reducing + the query traffic at the leaves. Ultrapeers can have connections to + many leaf nodes (~10-100) and a small number of other ultrapeers + (<10) [260]. Originally, a leaf node's QRT was not forwarded by the + parent ultrapeer to other ultrapeers. More recently, there has been + a proposal to distribute aggregated QRTs amongst ultrapeers [357]. + To further limit traffic, QRTs are compressed by hashing, according + to the Query Routing Protocol (QRP) specification [281]. This same + specification claims QRP may reduce Gnutella traffic by orders of + magnitude, but cautions that simulation is required before mass + deployment. A known shortcoming of QRP was that the extent of query + propagation was independent of the popularity of the search terms. + The Dynamic Query Protocol addressed this [358]. It required leaf + nodes to send single queries to high-degree ultrapeers that adjust + the queries' time-to-live (TTL) bounds according to the number of + received query results. An earlier proposal, called the Gnutella UDP + Extension for Scalable Searches (GUESS) [353], similarly aimed to + reduce the number of queries for widely distributed files. GUESS + reuses the non-forwarding idea (Section 2). A GUESS peer repeatedly + queries single ultrapeers with a TTL of 1, with a small timeout on + each query to limit load. It chooses the number of iterations and + selects ultrapeers so as to satisfy its search needs. For + adaptability, a small number of experimental Gnutella nodes have + + + + +Risson & Moors Informational [Page 36] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + implemented eXtensible Markup Language (XML) schemas for richer + queries [359, 360]. None of the above Gnutella proposals explicitly + assess robustness. + + The broader research community has recently been leveraging aspects + of the Gnutella design. Lv, Ratnasamy, et al. exposed one assumption + implicit in some of the early DHT work -- that designs "such as + Gnutella are inherently not scalable, and therefore should be + abandoned" [66]. They argued that by making better use of the more + powerful peers, Gnutella's scalability issues could be alleviated. + Instead of its flooding mechanism, they used random walks. Their + preliminary design to bias random walks towards high capacity nodes + did not go as far as the ultrapeer proposals in that the indexes did + not move to the high-capacity nodes. Chawathe, Ratnasamy, et al. + chose to extend the Gnutella design with their Gia system, in + response to the perceived shortcomings of DHTs in Section 1.2 [61]. + Compared to the early Gnutella designs, they incorporated several + novel features. They devise a topology adaptation algorithm so that + most peers are attached to high-degree peers. They use a random walk + search algorithm, in lieu of flooding, and bias the query load + towards higher-degree peers. For 'one-hop replication', they require + all nodes to keep pointers to content on adjacent peers. To + implement a receiver-controlled token-based flow control, a peer must + have a token from its neighbouring peer before it sends a query to + it. Chawathe, Ratnasamy, et al. show by simulations that the + combination of these features provides a scalability improvement of + three to five orders of magnitude over Gnutella "while retaining + significant robustness". The main robustness metrics they used were + the 'collapse point' query rate (the per-node query rate at which the + successful query rate falls below 90%) and the average hop count + immediately prior to collapse. Their comparison with Gnutella did + not take into account the Gnutella enhancements above -- this was + left as future work. Castro, Costa, and Rowstron argued that if + Gnutella were built on top of a structured overlay, then both the + query and overlay maintenance traffic could be reduced [259]. Yang, + Vinograd, et al. explore various policies for peer selection in the + GUESS protocol, since the issue is left open in the original proposal + [265]. For example, the peer initiating the query could choose peers + that have been "most recently used" or that have the "most files + shared". Various policy pitfalls are identified. For example, good + peers could be overloaded, victims of their own success. + Alternatively, malicious peers could encourage the querying peer to + try inactive peers. They conclude that a "most results" policy gives + the best balance of robustness and efficiency. Like Castro, Costa, + and Rowstron, they concentrated on the static network scenario. + Cholvi, Felber, et al. very briefly describe how similar "least + recently used" and "most often used" heuristics can be used by a peer + to select peer 'acquaintances' [352]. They were motivated by the + + + +Risson & Moors Informational [Page 37] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + congestion associated with Gnutella's TTL-limited flooding. + Recognizing that the busiest peers can quickly become overloaded + central hubs for the entire network, they limit the number of + acquaintances for any given peer to 25. They sketch a mechanism to + decrement a query's TTL multiple times when it traverses "interested + peers". In summary, these Gnutella-related investigations are + characterized by a bias for high-degree peers and very short directed + query paths, a disdain for flooding, and concern about excessive load + on the 'better' peers. Generally, the robustness analysis for + dynamic networks (content updates and node arrivals/departures) + remains open. + +4.1.2. Partition-by-Document, Partition-by-Keyword + + One aspect of P2P keyword search systems has received particular + attention: should the index be partitioned by document or by keyword? + The issue affects scalability. To be partitioned by document, each + node has a local index of documents for which it is responsible. + Gnutella is a prime example. Queries are generally flooded in + systems partitioned by document. On the other hand, a peer may + assume responsibility for a set of keywords. The peer uses an + inverted list to find a matching document, either locally or at + another peer. If the query contains several keywords, inverted lists + may need to be retrieved from several different peers to find the + intersection [21]. The initial assessment by Li, Loo, et al. was + that the partition-by-document approach was superior [210]. For one + scenario of a full-text Web search, they estimated the communications + costs to be about six times higher than the feasible budget. + However, wanting to exploit prior work on inverted list intersection, + they studied the partition-by-keyword strategy. They proposed + several optimizations that put the communication costs for a + partition-by-keyword system within an order of magnitude of + feasibility. There had been a couple of prior papers that suggested + partitioned-by-keyword designs incorporate DHTs to map keywords to + peers [355, 361]. In Gnawali's Keyword-set Search System (KSS), the + index is partitioned by sets of keywords [355]. Terpstra, Behnel, et + al. point out that by keeping keyword pairs or triples, the number of + lists per document in KSS is squared or tripled [362]. Shi, + Guangwen, et al. interpreted the approximations of Li, Loo, et al. to + mean that neither approach is feasible on its own [21]. Their + Multi-Level Partitioning (MLP) scheme incorporates both partitioning + approaches. They arrange nodes into a group hierarchy, with all + nodes in the single 'level 0' group, and with the same nodes sub- + divided into k logical subgroups on 'level 1'. The subgroups are + again divided, level by level, until level l. The inverted index is + partitioned by document between groups and by keyword within groups. + MLP avoids the query flooding normally associated with systems + partitioned by document, since a small number of nodes in each group + + + +Risson & Moors Informational [Page 38] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + process the query. It reduces the bandwidth overheads associated + with inverted list intersection in systems partitioned solely by + keyword, since groups can calculate the intersection independently + over the documents for which they are responsible. MLP was overlaid + on SkipNet, per Section 3.5.6 [38]. Some initial analyses of + communications costs and query latencies were provided. + +4.1.3. Partial Search, Exhaustive Search + + Much of the research above addresses partial keyword search. + Daswani, et al. highlighted the open problem of efficient, + comprehensive keyword search [25]. How can exhaustive searches be + achieved without flooding queries to every peer in the network? + Terpstra, Behnel et al. couched the keyword search problem in + rendezvous terms: dynamic keyword queries need to 'meet' with static + document lists [362]. Their Bitzipper scheme is partitioned by + document. They improved on full flooding by putting document + metadata on 2sqrt(n) nodes and forwarding queries through only + 6sqrt(n) nodes. They reported that Bitzipper nodes need only 1/166th + of the bandwidth of full-flooding Gnutella nodes for an exhaustive + search. An initial comparison of query load was given. There was + little consideration of either static or dynamic resilience; that is, + of nodes failing, of documents continually changing, or of nodes + continually joining and leaving the network. + +4.2. Information Retrieval + + The field of Information Retrieval (IR) has matured considerably + since its inception in the 1950s [363]. A taxonomy for IR models has + been formalized [262]. It consists of four elements: a + representation of documents in a collection; a representation of user + queries; a framework describing relationships between document + representations and queries; and a ranking function that quantifies + an ordering amongst documents for a particular query. Three main + issues motivate current IR research -- information relevance, query + response time, and user interaction with IR systems. The dominant IR + trends for searching large text collections are also threefold [262]. + The size of collections is increasing dramatically. More complicated + search mechanisms are being found to exploit document structure, to + accommodate heterogeneous document collections, and to deal with + document errors. Compression is in favour -- it may be quicker to + search compact text or retrieve it from external devices. In a + distributed IR system, query processing has four parts. Firstly, + particular collections are targeted for the search. Secondly, + queries are sent to the targeted collections. Queries are then + evaluated at the individual collections. Finally, results from the + collections are collated. + + + + +Risson & Moors Informational [Page 39] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + So how do P2P networks differ from distributed IR systems? Bawa, + Manku, et al. presented four differences [62]. They suggested that a + P2P network is typically larger, with tens or hundreds of thousands + of nodes. It is usually more dynamic, with node lifetimes measured + in hours. They suggested that a P2P network is usually homogeneous, + with a common resource description language. It lacks the + centralized "mediators" found in many IR systems that assume + responsibility for selecting collections, for rewriting queries, and + for merging ranked results. These distinctions are generally aligned + with the peer characteristics in Section 1. One might add that P2P + nodes display more symmetry -- peers are often both information + consumers and producers. Daswani, Garcia-Molina, et al. pointed out + that, while there are IR techniques for ranked keyword search at + moderate scale, research is required so that ranking mechanisms are + efficient at the larger scale targeted by P2P designs [25]. Joseph + and Hoshiai surveyed several P2P systems using metadata techniques + from the IR toolkit [60]. They described an assortment of IR + techniques and P2P systems, including various metadata formats, + retrieval models, bloom filters, DHTs, and trust issues. + + In the ensuing paragraphs, we survey P2P work that has incorporated + information retrieval models, particularly the Vector Model and the + Latent Semantic Indexing Model. We omit the P2P work based on + Bayesian models. Some have pointed to such work [60], but made no + explicit mention of the model [364]. One early paper on P2P + content-based image retrieval also leveraged the Bayesian model + [365]. For the former two models, we briefly describe the design, + then try to highlight robustness aspects. On robustness, we are + again stymied for lack of prior work. Indeed, a search across all + proceedings of the Annual ACM Conference on Research and Development + in Information Retrieval for the words "reliable", "available", + "dependable", or "adaptable" did not return any results at the time + of writing. In contrast, a standard text on distributed database + management systems [366] contains a whole chapter on reliability. IR + research concentrates on performance measures. Common performance + measures include recall, the fraction of the relevant documents that + has been retrieved and precision, the fraction of the retrieved + documents that is relevant [262]. Ideally, an IR system would have + high recall and high precision. Unfortunately techniques favouring + one often disadvantage the other [363]. + + + + + + + + + + + +Risson & Moors Informational [Page 40] + +RFC 4981 Survey of Research on P2P Search September 2007 + + +4.2.1. Vector Model (PlanetP, FASD, eSearch) + + The vector model [367] represents both documents and queries as term + vectors, where a term could be a word or a phrase. If a document or + query has a term, the weight of the corresponding dimension of the + vector is non-zero. The similarity of the document and query vectors + gives an indication of how well a document matches a particular + query. + + The weighting calculation is critical across the retrieval models. + Amongst the numerous proposals for the probabilistic and vector + models, there are some commonly recurring weighting factors [363]. + One is term frequency. The more a term is repeated in a document, + the more important the term is. Another is inverse document + frequency. Terms common to many documents give less information + about the content of a document. Then there is document length. + Larger documents can bias term frequencies, so weightings are + sometimes normalized against document length. The expression "TFIDF + weighting" refers to the collection of weighting calculations that + incorporate term frequency and inverse document frequency, not just + to one. Two weighting calculations have been particularly dominant + -- Okapi [368] and pivoted normalization [369]. A distributed + version of Google's Pagerank algorithm has also been devised for a + P2P environment [370]. It allows incremental, ongoing Pagerank + calculations while documents are inserted and deleted. + + A couple of early P2P systems leveraged the vector model. Building + on the vector model, PlanetP divided the ranking problem into two + steps [215]. In the first, peers are ranked for the probability that + they have matching documents. In the second, higher-priority peers + are contacted and the matching documents are ranked. An Inverse Peer + Frequency, analogous to the Inverse Document Frequency, is used to + rank relevant peers. To further constrain the query traffic, PlanetP + contacts only the first group of m peers to retrieve a relevant set + of documents. In this way, it repeatedly contacts groups of m peers + until the top k document rankings are stable. While the PlanetP + designers first quantified recall and precision, they also considered + reliability. Each PlanetP peer has a global index with a list of all + other peers, their IP addresses, and their Bloom filters. This large + volume of shared information needs to be maintained. Klampanos and + Jose saw this as PlanetP's primary shortcoming [371]. Each Bloom + filter summarized the set of terms in the local index of each peer. + The time to propagate changes, be they new documents or peer + arrivals/departures, was studied by simulation for up to 1000 peers. + The reported propagation times were in the hundreds of seconds. + Design workarounds were required for PlanetP to be viable across + slower dial-up modem connections. For future work, the authors were + + + + +Risson & Moors Informational [Page 41] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + considering some sort of hierarchy to scale to larger numbers of + peers. + + A second early system using the vector model is the Fault-tolerant, + Adaptive, Scalable Distributed (FASD) search engine [283], which + extended the Freenet design (Section 2.3) for richer queries. The + original Freenet design could find a document based on a globally + unique identifier. Kronfol's design added the ability to search, for + example, for documents about "apples AND oranges NOT bananas". It + uses a TFIDF weighting scheme to build a document's term vector. + Each peer calculates the similarity of the query vector and local + documents and forwards the query to the best downstream peer. Once + the best downstream peer returns a result, the second-best peer is + tried, and so on. Simulations with 1000 nodes gave an indication of + the query path lengths in various situations -- when routing queries + in a network with constant rates of node and document insertion, when + bootstrapping the network in a "worst-case" ring topology, or when + failing randomly and specifically selected peers. Kronfol claimed + excellent average-case performance -- less than 20 hops to retrieve + the same top n results as a centralized search engine. There were, + however, numerous cases where the worst-case path length was several + hundred hops in a network of only 1000 nodes. + + In parallel, there have been some P2P designs based on the vector + model from the University of Rochester -- pSearch [9, 372] and + eSearch [373]. The early pSearch paper suggested a couple of + retrieval models, one of which was the Vector Space Model, to search + only the nodes likely to have matching documents. To obtain + approximate global statistics for the TFIDF calculation, a spanning + tree was constructed across a subset of the peers. For the m top + terms, the term-to-document index was inserted into a Content- + Addressable Network [334]. A variant that mapped terms to document + clusters was also suggested. eSearch is a hybrid of the partition- + by-document and partition-by-term approaches (Section 4.1.2) eSearch + nodes are primarily partitioned by term. Each is responsible for the + inverted lists for some top terms. For each document in the inverted + list, the node stores the complete term list. To reduce the size of + the index, the complete term lists for a document are only kept on + nodes that are responsible for top terms in the document. eSearch + uses the Okapi term weighting to select top terms. It relies on the + Chord DHT [34] to associate terms with nodes storing the inverted + lists. It also uses automatic query expansion. This takes the + significant terms from the top document matches and automatically + adds them to the user's query to find additional relevant documents. + The eSearch performance was quantified in terms of search precision, + the number of retrieved documents, and various load-balancing + metrics. Compared to the more common proposals for partitioning by + + + + +Risson & Moors Informational [Page 42] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + keywords, eSearch consumed 6.8 times the storage space to achieve + faster search times. + +4.2.2. Latent Semantic Indexing (pSearch) + + Another retrieval model used in P2P proposals is Latent Semantic + Indexing (LSI) [374]. Its key idea is to map both the document and + query vectors to a concept space with lower dimensions. The starting + point is a t*N weighting matrix, where t is the total number of + indexed terms, N is the total number of documents, and the matrix + elements could be TFIDF rankings. Using singular value + decomposition, this matrix is reduced to a smaller number of + dimensions, while retaining the more significant term-to-document + mappings. Baeza-Yates and Ribeiro-Neto suggested that LSI's value is + a novel theoretic framework, but that its practical performance + advantage for real document collections had yet to be proven [262]. + pSearch incorporated LSI [9]. By placing the indices for + semantically similar documents close in the network, Tang, Xu, et al. + touted significant bandwidth savings relative to the early full- + flooding variant of Gnutella [372]. They plotted the number of nodes + visited by a query. They also explored the trade-off with accuracy, + the percentage match between the documents returned by the + distributed pSearch algorithm and those from a centralized LSI + baseline. In a more recent update to the pSearch work, Tang, + Dwarkadas, et al. summarized LSI's shortcomings [375]. Firstly, for + large document collections, its retrieval quality is inherently + inferior to Okapi. Secondly, singular value decomposition consumes + excessive memory and computation time. Consequently, the authors + used Okapi for searching while retaining LSI for indexing. With + Okapi, they selected the next node to be searched and selected + documents on searched nodes. With LSI, they ensured that similar + documents are clustered near each other, thereby optimizing the + network search costs. When retrieving a small number of top + documents, the precision of LSI+Okapi approached that of Okapi. + However, if retrieving a large number of documents, the LSI+Okapi + precision is inferior. The authors want to improve this in future + work. + +4.2.3. Small Worlds + + The "small world" concept originally described how people are + interconnected by short chains of acquaintances [376]. Kleinberg was + struck by the algorithmic lesson of the small world, namely "that + individuals using local information are collectively very effective + at constructing short paths between two points in a social network" + [377]. Small world networks have a small diameter and a large + clustering coefficient (a large number of connections amongst + relevant nodes) [378]. + + + +Risson & Moors Informational [Page 43] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + The small world idea has had a limited impact on peer-to-peer + algorithms. It has influenced only a few unstructured [62, 378-380] + and structured [344, 381] algorithms. The most promising work on + "small worlds" in P2P networks are those concerned with the + information retrieval metrics, precision and recall [62, 378, 380]. + +5. Queries + + Database research suggests directions for P2P research. Hellerstein + observed that, while work on fast P2P indexes is well underway, P2P + query optimization remains a promising topic for future research + [23]. Kossman reviewed the state of the art of distributed query + processing, highlighting areas for future research: simulation and + query optimization for networks of tens of thousands of servers and + millions of clients; non-relational data types (e.g., XML, text, and + images); and partial query responses since on the Internet, "failure + is the rule rather than the exception" [19]. A primary motivation + for the P2P system, PIER, was to scale from the largest database + systems of a few hundred nodes to an Internet environment in which + there are over 160 million nodes [22]. Litwin and Sahri have also + considered ways to combine distributed hashing, more specifically the + Scalable Distributed Data Structures, with SQL databases, claiming to + be first to implement scalable distributed database partitioning + [382]. Motivated by the lack of transparent distribution in current + distributed databases, they measure query execution times for + Microsoft SQL servers aggregated by means of an SDDS layer. One of + their starting assumptions was that it is too challenging to change + the SQL query optimizer. + + Database research also suggests the approach to P2P research. + Researchers of database query optimization were divided between those + looking for optimal solutions in special cases and those using + heuristics to answer all queries [383]. Gribble, et al. cast query + optimization in terms of the data placement problem, which is to + "distribute data and work so the full query workload is answered with + lowest cost under the existing bandwidth and resource constraints" + [250]. They pointed out that even the static version of this problem + is NP-complete in P2P networks. Consequently, research on massive, + dynamic P2P networks will likely progress using both strategies of + early database research - heuristics and special-case optimizations. + + If P2P networks are going to be adaptable, if they are to support a + wide range of applications, then they need to accommodate many query + types [72]. Up to this point, we have reviewed queries for keys + (Section 3) and keywords (Sections 4.1. and 4.2). Unfortunately, a + major shortcoming of the DHTs in Section 3.5 is that they primarily + support exact-match, single-key queries. Skip Graphs support range + and prefix queries, but not aggregation queries. Here we probe below + + + +Risson & Moors Informational [Page 44] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + the language syntax to identify the open research issues associated + with more expressive P2P queries [25]. Triantafillou and Pitoura + observed the disparate P2P designs for different types of queries and + so outlined a unifying framework [76]. To classify queries, they + considered the number of relations (single or multiple), the number + of attributes (single or multiple), and the type of query operator. + They described numerous operators: equality, range, join, and + "special functions". The latter referred to aggregation (like sum, + count, average, minimum, and maximum), grouping and ordering. The + following sections approximately fit their taxonomy -- range queries, + multi-attribute queries, join queries and aggregation queries. There + has been some initial P2P work on other query types -- continuous + queries [20, 22, 73], recursive queries [22, 74], and adaptive + queries [23, 75]. For these, we defer to the primary references. + +5.1. Range Queries + + The support of efficient range predicates in P2P networks was + identified as an important open research issue by Huebsch, et al. + [22]. Range partitioning has been important in parallel databases to + improve performance, so that a transaction commonly needs data from + only one disk or node [22]. One type of range search, longest prefix + match, is important because of its prevalence in routing schemes for + voice and data networks alike. In other applications, users may pose + broad, inexact queries, even though they require only a small number + of responses. Consequently, techniques to locate similar ranges are + also important [77]. Various proposals for range searches over P2P + networks are summarized in Figure 4. Since the Scalable Distributed + Data Structure (SDDS) has been an important influence on contemporary + Distributed Hash Tables (DHTs) [49-51], we also include ongoing work + on SDDS range searches. + + PEER-TO-PEER (P2P) + Locality Sensitive Hashing (Chord) [77] + Prefix Hash Trees (unspecified DHT) [78, 79] + Space Filling Curves (CAN) [80] + Space Filling Curves (Chord) [81] + Quadtrees (Chord) [82] + Skip Graphs [38, 41, 83, 100] + Mercury [84] + P-Grid [85, 86] + + SCALABLE DISTRIBUTED DATA STRUCTURES (SDDS) + RP* [87, 88] + + Figure 4: Solutions for Range Queries on P2P and SDDS Indexes + + + + + +Risson & Moors Informational [Page 45] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + The papers on P2P range search can be divided into those that rely on + an underlying DHT (the first five entries in Figure 4) and those that + do not (the subsequent three entries). Bharambe, Agrawal, et al. + argued that DHTs are inherently ill-suited to range queries [84]. + The very feature that makes for their good load balancing properties, + randomized hash functions, works against range queries. One possible + solution would be to hash ranges, but this can require a priori + partitioning. If the partitions are too large, partitions risk + overload. If they are too small, there may be too many hops. + + Despite these potential shortcomings, there have been several range + query proposals based on DHTs. If hashing ranges to nodes, it is + entirely possible that overlapping ranges map to different nodes. + Gupta, Agrawal, et al. rely on locality sensitive hashing to ensure + that, with high probability, similar ranges are mapped to the same + node [77]. They propose one particular family of locality sensitive + hash functions, called min-wise independent permutations. The number + of partitions per node and the path length were plotted against the + total numbers of peers in the system. For a network with 1000 nodes, + the hop count distribution was very similar to that of the exact- + matching Chord scheme. Was it load-balanced? For the same network + with 50,000 partitions, there were over two orders of magnitude + variation in the number of partitions at each node (first and + ninety-ninth percentiles). The Prefix Hash Tree is a trie in which + prefixes are hashed onto any DHT. The preliminary analysis suggests + efficient doubly logarithmic lookup, balanced load, and fault + resilience [78, 79]. Andrzejak and Xu were perhaps the first to + propose a mapping from ranges to DHTs [80]. They use one particular + Space Filling Curve, the Hilbert curve, over a Content Addressable + Network (CAN) construction (Section 3.5.3). They maintain two + properties: nearby ranges map to nearby CAN zones; if a range is + split into two sub-ranges, then the zones of the sub-ranges partition + the zone of the primary range. They plot path length and load proxy + measures (the total number of messages and nodes visited) for three + algorithms to propagate range queries: brute force, controlled + flooding, and directed controlled flooding. Schmidt and Parashar + also advocated Space Filling Curves to achieve range queries over a + DHT [81]. However, they point out that, while Andrzejak and Xu use + an inverse Space Filling Curve to map a one-dimensional space to d- + dimensional zones, they map a d-dimensional space back to a one- + dimensional index. Such a construction gives the ability to search + across multiple attributes (Section 5.2). Tanin, Harwood, et al. + suggested quadtrees over Chord [82], and gave preliminary simulation + results for query response times. + + Because DHTs are naturally constrained to exact-match, single-key + queries, researchers have considered other P2P indexes for range + searches. Several were based on Skip Graphs [38, 41], which, unlike + + + +Risson & Moors Informational [Page 46] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + the DHTs, do not necessitate randomizing hash functions and are + therefore capable of range searches. Unfortunately, they are not + load balanced [83]. For example, in SkipNet [48], hashing was added + to balance the load -- the Skip Graph could support range searches or + load balancing, but not both. One solution for load-balancing relies + on an increased number of 'virtual' servers [168] but, in their + search for a system that can both search for ranges and balance + loads, Bharambe, Agrawal, et al. rejected the idea [84]. The virtual + servers work assumed load imbalance stems from hashing; that is, by + skewed data insertions and deletions. In some situations, the + imbalance is triggered by a skewed query load. In such + circumstances, additional virtual servers can increase the number of + routing hops and increase the number of pointers that a Skip Graph + needs to maintain. Ganesan, Bawa, et al. devised an alternate method + to balance load [83]. They proposed two Skip Graphs, one to index + the data itself and the other to track load at each node in the + system. Each node is able to determine the load on its neighbours + and the most (least) loaded nodes in the system. They devise two + algorithms: NBRADJUST balances load on neighbouring nodes; using + REORDER, empty nodes can take over some of the tuples on heavily + loaded nodes. Their simulations focus on skewed storage load, rather + than on skewed query loads, but they surmise that the same approach + could be used for the latter. + + Other proposals for range queries avoid both the DHT and the Skip + Graph. Bharambe, Agrawal, et al. distinguish their Mercury design by + its support for multi-attribute range queries and its explicit load + balancing [84]. In Mercury, nodes are grouped into routing hubs, + each of which is responsible for various query attributes. While it + does not use hashing, Mercury is loosely similar to the DHT + approaches: nodes within hubs are arranged into rings, like Chord + [34]; for efficient routing within hubs, k long-distance links are + used, like Symphony [381]. Range lookups require O(((log n)^2)/k) + hops. Random sampling is used to estimate the average load on nodes + and to find the parts of the overlay that are lightly loaded. + Whereas Symphony assumed that nodes are responsible for ranges of + approximately equal size, Mercury's random sampling can determine the + location of the start of the range, even for non-uniform ranges [84]. + P-Grid [42] does provide for range queries, by virtue of the key + ordering in its tree structures. Ganesan, Bawa, et al. critiqued its + capabilities [83]: P-Grid assumes fixed-capacity nodes; there was no + formal characterization of imbalance ratios or balancing costs; every + P-Grid periodically contacts other nodes for load information. + + The work on Scalable Distributed Data Structures (SDDSs) has + progressed in parallel with P2P work and has addressed range queries. + Like the DHTs above, the early SDDS Linear Hashing (LH*) schemes were + not order-preserving [52]. To facilitate range queries, Litwin, + + + +Risson & Moors Informational [Page 47] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + Niemat, et al. devised a Range Parititioning variant, RP* [87]. + There are options to dispense with the index, to add indexes to + clients, and to add them to servers. In the variant without an + index, every query is issued via multicasting. The other variants + also use some multicasting. The initial RP* paper suggested + scalability to thousands of sites, but a more recent RP* simulation + was capped at 140 servers [88]. In that work, Tsangou, Ndiaye, et + al. investigated TCP and UDP mechanisms by which servers could return + range query results to clients. The primary metrics were search and + response times. Amongst the commercial parallel database management + systems, they reported that the largest seems only to scale to 32 + servers (SQL Server 2000). For future work, they planned to explore + aggregation of query results, rather than establishing a connection + between the client and every single server with a response. + + All in all, it seems there are numerous open research questions on + P2P range queries. How realistic is the maintenance of global load + statistics considering the scale and dynamism of P2P networks? + Simulations at larger scales are required. Proposals should take + into account both the storage load (insert and delete messages) and + the query load (lookup messages). Simplifying assumptions need to be + attacked. For example, how well do the above solutions work in + networks with heterogeneous nodes, where the maximum message loads + and index sizes are node-dependent? + +5.2. Multi-Attribute Queries + + There has been some work on multi-attribute P2P queries. As late as + September 2003, it was suggested that there has not been an efficient + solution [76]. + + Again, an early significant work on multi-attribute queries over + aggregated commodity nodes germinated amongst SDDSs. k-RP* [89] uses + the multi-dimensional binary search tree (or k-d tree, where k + indicates the number of dimensions of the search index) [384]. It + builds on the RP* work from the previous section and inherits their + capabilities for range search and partial match. Like the other + SDDSs, k-RP* indexes can fit into RAM for very fast lookup. For + future work, Litwin and Neimat suggested a) a formal analysis of the + range search termination algorithm and the k-d paging algorithm, b) a + comparison with other multi-attribute data structures (quad-trees and + R-trees) and c) exploration of query processing, concurrency control, + and transaction management for k-RP* files [89]. On the latter + point, others have considered transactions to be inconsequential to + the core problem of supporting more complex queries in P2P networks + [72]. + + + + + +Risson & Moors Informational [Page 48] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + In architecting their secure wide-area Service Discovery Service + (SDS), Hodes, Czerwinski, et al. considered three possible designs + for multi-criteria search -- Centralization, Mapping and Flooding + [90]. These correlate to the index classifications of Section 2 -- + Central, Distributed, and Local. They discounted the centralized, + Napster-like index for its risk of a single point of failure. They + considered the hash-based mappings of Section 3, but concluded that + it would not be possible to adequately partition data. A document + satisfying many criteria would be wastefully stored in many + partitions. They rejected full flooding for its lack of scalability. + Instead, they devised a query filtering technique, reminiscent of + Gnutella's query routing protocol (Section 4.1). Nodes push + proactive summaries of their data rather than waiting for a query. + Summaries are aggregated and stored throughout a server hierarchy, to + guide subsequent queries. Some initial prototype measurements were + provided for total load on the system, but not for load distribution. + They put several issues forward for future work. The indexing needs + to be flexible to change according to query and storage workloads. A + mesh topology might improve on their hierarchic topology since query + misses would not propagate to root servers. The choice is analogous + to BGP meshes and DNS trees. + + More recently, Cai, Frank, et al. devised the Multi-Attribute + Addressable Network (MAAN) [91]. They built on Chord to provide both + multi-attribute and range queries, claiming to be the first to + service both query types in a structured P2P system. Each MAAN node + has O(log n) neighbours, where N is the number of nodes. MAAN + multi-attribute range queries require O(log n+N*Smin) hops, where + Smin is the minimum range selectivity across all attributes. + Selectivity is the ratio of the query range to the entire identifier + range. The paper assumed that a locality preserving hash function + would ensure balanced load. Per Section 5.1, the arguments by + Bharambe, Agrawal, et al. have highlighted the shortcomings of this + assumption [84]. MAAN required that the schema must be fixed and + known in advance -- adaptable schemas were recommended for subsequent + attention. The authors also acknowledged that there is a selectivity + breakpoint at which full flooding becomes more efficient than their + scheme. This begs for a query resolution algorithm that adapts to + the profile of queries. Cai and Frank followed up with RDFPeers + [55]. They differentiate their work from other RDF proposals by a) + guaranteeing to find query results if they exist and b) removing the + requirement of prior definition of a fixed schema. They hashed + <subject, predicate, object> triples onto the MAAN and reported + routing hop metrics for their implementation. Load imbalance across + nodes was reduced to less than one order of magnitude, but the + specific measure was the number of triples stored per node - skewed + query loads were not considered. They plan to improve load balancing + with the virtual servers of Section 5.1 [168]. + + + +Risson & Moors Informational [Page 49] + +RFC 4981 Survey of Research on P2P Search September 2007 + + +5.3. Join Queries + + Two research teams have done some initial work on P2P join + operations. Harren, Hellerstein, et al. initially described a + three-layer architecture -- storage, DHT and query processing. They + implemented the join operation by modifying an existing Content + Addressable Network (CAN) simulator, reporting "significant hot-spots + in all dimensions: storage, processing, and routing" [72]. They + progressed their design more recently in the context of PIER, a + distributed query engine based on CAN [22, 385]. They implemented + two equi-join algorithms. In their design, a key is constructed from + the "namespace" and the "resource ID". There is a namespace for each + relation and the resource ID is the primary key for base tuples in + that relation. Queries are multicast to all nodes in the two + namespaces (relations) to be joined. Their first algorithm is a DHT + version of the symmetric hash join. Each node in the two namespaces + finds the relevant tuples and hashes them to a new query namespace. + The resource ID in the new namespace is the concatenation of join + attributes. In the second algorithm, called "fetch matches", one of + the relations is already hashed on the join attributes. Each node in + the second namespace finds tuples matching the query and retrieves + the corresponding tuples from the first relation. They leveraged two + other techniques, namely the symmetric semi-join rewrite and the + Bloom filter rewrite, to reduce the high bandwidth overheads of the + symmetric hash join. For an overlay of 10,000 nodes, they simulated + the delay to retrieve tuples and the aggregate network bandwidth for + these four schemes. The initial prototype was on a cluster of 64 + PCs, but it has more recently been expanded to PlanetLab. + + Triantafillou and Pitoura considered multicasting to large numbers of + peers to be inefficient [76]. They therefore allocated a limited + number of special peers, called range guards. The domain of the join + attributes was divided, one partition per range guard. Join queries + were sent only to range guards, where the query was executed. + Efficient selection of range guards and a quantitive evaluation of + their proposal were left for future work. + +5.4. Aggregation Queries + + Aggregation queries invariable rely on tree-structures to combine + results from a large number of nodes. Examples of aggregation + queries are Count, Sum, Maximum, Minimum, Average, Median, and Top-K + [92, 386, 387]. Figure 5 summarizes the tree and query + characteristics that affect dependability. + + + + + + + +Risson & Moors Informational [Page 50] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + Tree type: Doesn't use DHT [92], use internal DHT trees [95], use + independent trees on top of DHTs + Tree repair: Periodic [93], exceptional [32] + Tree count: One per key, one per overlay [56] + Tree flexibility: Static [92], dynamic + + Query interface: install, update, probe [98] + Query distribution: multicast [98], gossip [92] + Query applications: leader election, voting, resource location, + object placement and error recovery [98, 388] + Query semantics + Consistency: Best-effort, eventual [92], snapshot / interval / + single-site validity [99] + Timeliness [388] + Lifetime: Continuous [97, 99], single-shot + No. attributes: Single, multiple + Query types: Count, sum, maximum, minimum, average, median, top k + [92, 386, 387] + + Figure 5: Aggregation Trees and Queries in P2P Networks + + Key: Astrolabe [92]; Cone [93]; Distributed Approximative System + Information Service (DASIS) [95]; Scalable Distributed Information + + Management System (SDIMS) [98]; Self-Organized Metadata Overlay + (SOMO) [56]; Wildfire [99]; Willow [32]; Newscast [97] + + The fundamental design choices for aggregation trees relate to how + the overlay uses DHTs, how it repairs itself when there are failures, + how many aggregation trees there are, and whether the tree is static + or dynamic (Figure 5). Astrolabe is one of the most influential P2P + designs included in Figure 5, yet it makes no use of DHTs [92]. + Other designs make use of the internal trees of Plaxton-like DHTs. + Others build independent tree structures on top of DHTs. Most of the + designs repair the aggregation tree with periodic mechanisms similar + to those used in the DHTs themselves. Willow is an exception [32]. + It uses a Tree Maintenance Protocol to "zip" disjoint aggregation + trees together when there are major failures. Yalagandula and Dahlin + found reconfigurations at the aggregation layer to be costly, + suggesting more research on techniques to reduce the cost and + frequency of such reconfigurations [98]. Many of the designs use + multiple aggregation trees, each rooted at the DHT node responsible + for the aggregation attribute. On the other hand, the Self-Organized + Metadata Overlay [56] uses a single tree and is vulnerable to a + single point of failure at its root. + + + + + + +Risson & Moors Informational [Page 51] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + At the time of writing, researchers have just begun exploring the + performance of queries in the presence of churn. Most designs are + for best-effort queries. Bawa, et al. devised a better consistency + model, called Single-Site Validity [99] to qualify the accuracy of + results when there is churn. Its price was a five-fold increase in + the message load, when compared to an efficient but best-effort + Spanning Tree. Gossip mechanisms are resilient to churn, but they + delay aggregation results and incur high message cost for aggregation + attributes with small read-to-write ratios. + +6. Security Considerations + + An initial list of references to research on P2P security is given in + Figure 1, Section 1. This document addresses P2P search. P2P + storage, security, and applications are recommended for further + investigation in Section 8. + +7. Conclusions + + Research on peer-to-peer networks can be divided into four categories + -- search, storage, security and applications. This critical survey + has focused on search methods. While P2P networks have been + classified by the existence of an index (structured or unstructured) + or the location of the index (local, centralized, and distributed), + this survey has shown that most have evolved to have some structure, + whether it is indexes at superpeers or indexes defined by DHT + algorithms. As for location, the distributed index is most common. + The survey has characterized indexes as semantic and semantic-free. + It has also critiqued P2P work on major query types. While much of + it addresses work from 2000 or later, we have traced important + building blocks from the 1990s. + + The initial motivation in this survey was to answer the question, + "How robust are P2P search networks?" The question is key to the + deployment of P2P technology. Balakrishnan, Kaashoek, et al. argued + that the P2P architecture is appealing: the startup and growth + barriers are low; they can aggregate enormous storage and processing + resources; "the decentralized and distributed nature of P2P systems + gives them the potential to be robust to faults or intentional + attacks" [18]. If P2P is to be a disruptive technology in + applications other than casual file sharing, then robustness needs to + be practically verified [20]. + + The best comparative research on P2P dependability has been done in + the context of Distributed Hash Tables (DHTs) [291]. The entire body + of DHT research can be distilled to four main observations about + dependability (Section 3.2). Firstly, static dependability + comparisons show that no O(log n) DHT geometry is significantly more + + + +Risson & Moors Informational [Page 52] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + dependable than the other O(log n) geometries. Secondly, dynamic + dependability comparisons show that DHT dependability is sensitive to + the underlying topology maintenance algorithms (Figure 2). Thirdly, + most DHTs use O(log n) geometries to suit ephemeral nodes, whereas + the O(1) hop DHTs suit stable nodes - they deserve more research + attention. Fourthly, although not yet a mature science, the study of + DHT dependability is helped by recent simulation tools that support + multiple DHTs [299]. + + We make the following four suggestions for future P2P research: + + 1) Complete the companion P2P surveys for storage, security, and + applications. A rough outline has been suggested in Figure 1, + along with references. The need for such surveys was highlighted + within the peer-to-peer research group of the Internet Research + Task Force (IRTF) [17]. + + 2) P2P indexes are maturing. P2P queries are embryonic. Work on + more expressive queries over P2P indexes started to gain momentum + in 2003, but remains fraught with efficiency and load issues. + + 3) Isolate the low-level mechanisms affecting robustness. There is + limited value in comparing robustness of DHT geometries (like + rings versus de Bruijn graphs), when robustness is highly + sensitive to underlying topology maintenance algorithms (Figure + 2). + + 4) Build consensus on robustness metrics and their acceptable ranges. + This paper has teased out numerous measures that impinge on + robustness, for example, the median query path length for a + failure of x% of nodes, bisection width, path overlap, the number + of alternatives available for the next hop, lookup latency, + average live bandwidth (bytes/node/sec), successful routing rates, + the number of timeouts (caused by a finger pointing to a departed + node), lookup failure rates (caused by nodes that temporarily + point to the wrong successor during churn), and clustering + measures (edge expansion and node expansion). Application-level + robustness metrics need to drive a consistent assessment of the + underlying search mechanics. + +8. Acknowledgments + + This document was adapted from a paper in Elsevier's Computer + Networks: + + J. Risson & T. Moors, Survey of Research towards Robust Peer-to- + Peer Networks: Search Methods, Computer Networks 51(7)2007. + + + + +Risson & Moors Informational [Page 53] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + We thank Bill Yeager, Ali Ghodsi, and several anonymous reviewers for + thorough comments that significantly improved the quality of earlier + versions of this document. + +9. References + +9.1. Informative References + + [1] M. Roussopoulos, M. Baker, D. Rosenthal, T. Guili, P. Maniatis, + and J. Mogul, 2 P2P of Not 2 P2P?, The 3rd Int'l Workshop on + Peer-to-Peer Systems, February 26-27 2004. + + [2] A. Rowstron and P. Druschel, Pastry: Scalable, distributed + object location and routing for large-scale peer-to-peer + systems, IFIP/ACM Middleware 2001, Nov 2001. + + [3] B. Yeager and B. Bhattacharjee, Peer-to-Peer Research Group + Charter, http://www.irtf.org/charters/p2prg.html (2003) + + [4] T. Klingberg and R. Manfredi, Gnutella 0.6, (2002) + + [5] I. Clarke, A Distributed Decentralised Information Storage and + Retrieval System, Undergraduate Thesis, 1999. + + [6] B. Zhao, J. Kubiatowicz, and A. Joseph, Tapestry: an + infrastructure for fault-tolerant wide-area location and + routing, Report No. UCB/CSD-01-1141 2001. + + [7] I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. Kaashoek, + F. Dabek, and H. Balakrishnan, Chord: A scalable peer-to-peer + lookup service for internet applications, Proc. ACM SIGCOMM + 2001 2001, pp. 149-160. + + [8] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, + A scalable content-addressable network, Proc. of the conf. on + Applications, technologies, architectures and protocols for + computer communications, August 27-31 2001, pp. 161-172. + + [9] C. Tang, Z. Xu, and M. Mahalingam, pSearch: information + retrieval in structured overlays, First Workshop on Hot Topics + in Networks. Also Computer Communication Review, Volume 33, + Number 1, January 2003, Oct 28-29 2002. + + [10] W. Nejdl, S. Decker, and W. Siberski, Edutella Project, RDF- + based Metadata Infrastructure for P2P Applications, + http://edutella.jxta.org/ (2003) + + + + + +Risson & Moors Informational [Page 54] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [11] K. Aberer and M. Hauswirth, Peer-to-peer information systems: + concepts and models, state-of-the-art, and future systems, ACM + SIGSOFT Software Engineering Notes, Proc. 8th European software + engineering conference held jointly with 9th ACM SIGSOFT + international symposium on foundations of software engineering + 26 (5) (2001) + + [12] L. Zhou and R. van Renesse, P6P: a peer-to-peer approach to + internet infrastructure, The 3rd Int'l Workshop on Peer-to-Peer + Systems, February 26-27 2004. + + [13] Citeseer, Citeseer Scientific Literature Digital Library, + http://citeseer.ist.psu.edu/ (2004) + + [14] D. Milojicic, V. Kalogeraki, R. Lukose, K. Nagaraja, J. Pruyne, + B. Richard, S. Rollins, and Z. Xu, Peer-to-Peer Computing, HP + Technical Report, HPL-2002-57 2002. + + [15] K. Aberer and M. Hauswirth, An overview on peer-to-peer + information systems, Workshop on Distributed Data and + Structures WDAS-2002 2002. + + [16] F. DePaoli and L. Mariani, Dependability in Peer-to-Peer + Systems, IEEE Internet Computing 8 (4) (2004) 54-61. + + [17] B. Yeager, Proposed research tracks, Email to the Internet + Research Task Force IRTF P2P Research Group, Nov 10 2003. + + [18] H. Balakrishnan, M. F. Kaashoek, D. Karger, R. Morris, and I. + Stoica, Looking up data in P2P systems, Communications of the + ACM 46 (2) (2003) 43-48. + + [19] D. Kossmann, The state of the art in distributed query + processing, ACM Computing Surveys 32 (4) (2000) 422-469. + + [20] B. Gedik and L. Liu, Reliable peer-to-peer information + monitoring through replication, Proc. 22nd Int'l Symp. on + Reliable Distributed Systems, 6-8 Oct 2003, pp. 56-65. + + [21] S.-M. Shi, Y. Guangwen, D. Wang, J. Yu, S. Qu, and M. Chen, + Making peer-to-peer keyword searching feasible using multi- + level partitioning, The 3rd Int'l Workshop on Peer-to-Peer + Systems, February 26-27 2004. + + [22] R. Huebsch, J. M. Hellerstein, N. Lanham, B. T. Loo, S. + Shenker, and I. Stoica, Querying the Internet with PIER, Proc. + 29th Int'l Conf. on Very Large Databases VLDB'03, September + 2003. + + + +Risson & Moors Informational [Page 55] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [23] J. M. Hellerstein, Toward network data independence, ACM SIGMOD + Record 32 (3) (2003) 34-40. + + [24] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, + and I. Stoica, The impact of DHT routing geometry on resilience + and proximity, Proc. 2003 conference on Applications, + Technologies, Architectures and Protocols for Computer + Communications 2003, pp. 381-394. + + [25] N. Daswani, H. Garcia-Molina, and B. Yang, Open Problems in + Data- sharing Peer-to-peer Systems, The 9th Int'l Conf. on + Database Theory (ICDT 2003), Siena, Italy, 8-10 January (2003) + + [26] B. Cooper and H. Garcia-Molina, Studying search networks with + SIL, Second Int'l Workshop on Peer-to-Peer Systems IPTPS 03, + 20- 21 February 2003. + + [27] M. Bawa, Q. Sun, P. Vinograd, B. Yang, B. Cooper, A. Crespo, N. + Daswani, P. Ganesan, H. Garcia-Molina, S. Kamvar, S. Marti, and + M. Schlossed, Peer-to-peer research at Stanford, ACM SIGMOD + Record 32 (3) (2003) 23-28. + + [28] B. Yang and H. Garcia-Molina, Improving search in peer-to-peer + networks, Proc. 22nd IEEE Int'l Conf. on Distributed Computing + Systems, July 2002. + + [29] B. Yang and H. Garcia-Molina, Efficient search in peer-to-peer + networks, Proc. 22nd Int'l Conf. on Distributed Computing + Systems, July 2-5 2002. + + [30] C. Plaxton, R. Rajaraman, and A. Richa, Accessing nearby copies + of replicated objects in a distributed environment, ACM Symp. + on Parallel Algorithms and Architectures (1997) + + [31] B. Zhao, L. Huang, J. Stribling, S. Rhea, A. Joseph, and J. + Kubiatowicz, Tapestry: A Resilient Global-Scale overlay for + Service Deployment, IEEE Journal on Selected Areas in + Communications 22 (1) (2004) 41-53. + + [32] R. van Renesse and A. Bozdog, Willow: DHT, aggregation and + publish/subscribe in one protocol, The 3rd Int'l Workshop on + Peer-to-Peer Systems, February 26-27 2004. + + [33] P. Ganesan, G. Krishna, and H. Garcia-Molina, Canon in G Major: + Designing DHTs with Hierarchical Structure, Proc. Int'l Conf. + on Distributed Computing Systems ICDCS 2004 2004. + + + + + +Risson & Moors Informational [Page 56] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [34] I. Stoica, R. Morris, D. Liben-Nowell, D. Karger, M. Kaashoek, + F. Dabek, and H. Balakrishnan, Chord: a scalable peer-to-peer + lookup protocol for Internet applications, IEEE/ACM Trans. on + Networking 11 (1) (2003) 17-32. + + [35] S. Rhea, T. Roscoe, and J. Kubiatowicz, Structured Peer-to-Peer + Overlays Need Application-Driven Benchmarks, Proc. 2nd Int'l + Workshop on Peer-to-Peer Systems IPTPS'03, February 20-21 2003. + + [36] D. Loguinov, A. Kumar, and S. Ganesh, Graph-theoretic analysis + of structured peer-to-peer systems: routing distances and + fault resilience, Proc. 2003 conference on Applications, + Technologies, Architectures and Protocols for Computer + Communications, August 25-29 2003, pp. 395-406. + + [37] F. Kaashoek and D. Karger, Koorde: A simple degree-optimal + hash table, Second Int'l Workshop on Peer-to-Peer Systems + IPTPS'03, 20-21 February 2003. + + [38] N. Harvey, M. B. Jones, S. Saroiu, M. Theimer, and A. Wolman, + SkipNet: A Scalable Overlay Network with Practical Locality + Properties, Proc. Fourth USENIX Symp. on Internet Technologies + and Systems USITS'03, March 2003. + + [39] I. Gupta, K. Birman, P. Linga, A. Demers, and R. Van Renesse, + Kelips: Building an efficient and stable P2P DHT through + increased memory and background overhead, Second Int'l Workshop + on Peer-to-Peer Systems IPTPS 03, Feb 20-21 2003. + + [40] J. Cates, Robust and Efficient Data Management for a + Distributed Hash Table, Master's Thesis, May 2003. + + [41] J. Aspnes and G. Shah, Skip graphs, Proc. 14th annual ACM-SIAM + symposium on discrete algorithms (2003) 384-393. + + [42] K. Aberer, P. Cudre-Mauroux, A. Datta, Z. Despotovic, M. + Hauswirth, M. Punceva, and R. Schmidt, P-Grid: a self- + organizing structured P2P system, ACM SIGMOD Record 32 (3) + (2003) 29-33. + + [43] B. Zhao, Y. Duan, L. Huang, A. Joseph, and J. Kubiatowicz, + Brocade: landmark routing on overlay networks, First Int'l + Workshop on Peer-to-Peer Systems IPTPS'02, March 2002. + + [44] S. Ratnasamy, S. Shenker, and I. Stoica, Routing algorithms for + DHTs: some open questions, Proc. First Int'l Workshop on Peer + to Peer Systems, IPTPS 2002, March 2002. + + + + +Risson & Moors Informational [Page 57] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [45] P. Maymounkov and D. Mazieres, Kademlia: A peer-to-peer + information system based on the XOR metric, Proc. First Int'l + Workshop on Peer to Peer Systems, IPTPS 2002, March 7-8 2002. + + [46] D. Malkhi, M. Naor, and D. Ratajczak, Viceroy: a scalable and + dynamic emulation of the butterfly, Proc. 21st annual symposium + on principles of distributed computing PODC, July 21-24 2002, + pp. 183-192. + + [47] X. Li and C. Plaxton, On name resolution in peer to peer + networks, Proc. ACM SIGACT Annual Workshop on Principles of + Mobile Computing POMC'02 2002, pp. 82-89. + + [48] N. Harvey, J. Dunagan, M. B. Jones, S. Saroiu, M. Theimer, and + A. Wolman, SkipNet: A Scalable overlay Network with Practical + Locality Properties, Microsoft Research Technical Report MSR- + TR- 2002-92 (2002) + + [49] D. Karger, E. Lehman, T. Leighton, R. Panigraphy, M. Levin, and + D. Lewin, Consistent hashing and random trees: distributed + caching protocols for relieving hot spots on the World Wide + Web, ACM Symp. on Theory of Computing (1997) + + [50] W. Litwin, M. Neimat, and D. Schneider, LH* - a scalable, + distributed data structure, ACM Trans. on Database Systems + (TODS) 21 (4) (1996) 480-525. + + [51] R. Devine, Design and Implementation of DDH: A Distributed + Dynamic Hashing Algorithm, Proc. 4th Int'l Conf. on + Foundations of Data Organizations and Algorithms 1993. + + [52] W. Litwin, M.-A. Niemat, and D. Schneider, LH* - Linear Hashing + for Distributed Files, Proc. ACM Int'l Conf. on Mngt. of Data + SIGMOD, May 1993, pp. 327-336. + + [53] C. Tempich, S. Staab, and A. Wranik, Remindin': semantic query + routing in peer-to-peer networks, Proc. 13th conference on + World Wide Web, New York, NY, USA, May 17-20 (2004) 640-649. + + [54] B. T. Loo, R. Huebsch, I. Stoica, and J. M. Hellerstein, The + case for a hybrid P2P search infrastructure, The 3rd Int'l + Workshop on Peer-to-Peer Systems, February 26-27 2004. + + [55] M. Cai and M. Frank, RDFPeers: a scalable distributed RDF + repository based on a structured peer-to-peer network, Proc. + 13th conference on World Wide Web, May 17-20 2004, pp. 650-657. + + + + + +Risson & Moors Informational [Page 58] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [56] Z. Zhang, S.-M. Shi, and J. Zhu, SOMO: Self-organized metadata + overlay for resource management in P2P DHTs, Second Int'l + Workshop on Peer-to-Peer Systems IPTPS'03, Feb 20-21 2003. + + [57] B. Yang and H. Garcia-Molina, Designing a super-peer network, + Proc. 19th Int'l Conf. on Data Engineering ICDE, March 2003. + + [58] I. Tatarinov, P. Mork, Z. Ives, J. Madhavan, A. Halevy, D. + Suciu, N. Dalvi, X. Dong, Y. Kadiyska, and G. Miklau, The + Piazza peer data management project, ACM SIGMOD Record 32 (3) + (2003) 47-52. + + [59] W. Nejdl, W. Siberski, and M. Sintek, Design Issues and + Challenges for RDF- and schema-based peer-to-peer systems, ACM + SIGMOD Record 32 (3) (2003) 41-46. + + [60] S. Joseph and T. Hoshiai, Decentralized Meta-Data Strategies: + Effective Peer-to-Peer Search, IEICE Trans. Commun. E86-B (6 + June) (2003) 1740-1753. + + [61] Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S. + Shenker, Making gnutella-like P2P systems scalable, Proc. 2003 + conference on Applications, Technologies, Architectures and + Protocols for Computer Communications, August 25-29 2003, pp. + 407-418. + + [62] M. Bawa, G. S. Manku, and P. Raghavan, SETS: search enhanced by + topic segmentation, Proc. 26th annual international ACM SIGIR + conference on Research and Development in Information Retrieval + 2003, pp. 306-313. + + [63] H. Sunaga, M. Takemoto, and T. Iwata, Advanced peer to peer + network platform for various services - SIONet Semantic + Information Oriented Network, Proc. Second Int'l Conf. on Peer + to Peer Computing, Sept 5-7 2002, pp. 169-170. + + [64] M. Schlosser, M. Sintek, S. Decker, and W. Nejdl, HyperCuP - + Hypercubes, Ontologies and P2P Networks, Springer Lecture Notes + on Computer Science, Agents and Peer-to-Peer Systems Vol. 2530 + (2002) + + [65] M. Ripeanu, A. Iamnitchi, and P. Foster, Mapping the Gnutella + network, IEEE Internet Computing 6 (1) (2002) 50-57. + + [66] Q. Lv, S. Ratnasamy, and S. Shenker, Can Heterogeneity Make + Gnutella Scalable?, Proc. 1st Int'l Workshop on Peer-to-Peer + Systems IPTPS2002, March 7-8 2002. + + + + +Risson & Moors Informational [Page 59] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [67] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, Search and + replication in unstructured peer to peer networks, Proc. 16th + international conference on supercomputing, June 22-26 2002, + pp. 84-95. + + [68] V. Kalogaraki, D. Gunopulos, and D. Zeinalipour-Yasti, XML + schemas: integration and translation: A local search + mechanism for peer to peer networks, Proc. 11th ACM + international conference on Information and Knowledge + management 2002, pp. 300- 307. + + [69] O. Babaoglu, H. Meling, and Montresor, Anthill: a framework + for the development of agent-based peer-to-peer systems, Proc. + IEEE Int'l Conf. on Distributed Computer systems 2002, pp. 15- + 22. + + [70] M. Jovanovic, Modeling large-scale peer-to-peer networks and a + case study of Gnutella, Master's Thesis 2001. + + [71] I. Clarke, O. Sandberg, B. Wiley, and T. Hong, Freenet: A + Distributed Anonymous Information Storage and Retrieval System. + Springer, New York, USA, 2001. + + [72] J. Harren, J. Hellerstein, R. Huebsch, B. Loo, S. Shenker, and + I. Stoica, Complex queries in DHT-based peer-to-peer networks, + Proc. First Int'l Workshop on Peer to Peer Systems IPTPS 2002, + March 2002. + + [73] B. Gedik and L. Liu, PeerCQ: A Decentralized and Self- + Configuring Peer-to-Peer Information Monitoring System, Proc. + 23rd Int'l Conf. on Distributed Computing Systems ICDCS2003, + May 19-22 2003. + + [74] B. T. Loo, R. Huebsch, J. M. Hellerstein, T. Roscoe, and I. + Stoica, Analyzing P2P Overlays with Recursive Queries, + Technical Report, CSD-04-1301, January 14 2004. + + [75] R. Avnur and J. Hellerstein, Eddies: continuously adaptive + query processing, Proc. 2000 ACM SIGMOD international + conference on Management of Data 2000, pp. 261-272. + + [76] P. Triantafillou and T. Pitoura, Towards a unifying framework + for complex query processing over structured peer-to-peer data + networks, Proc. First Int'l Workshop on Databases, Information + Systems and Peer-to-Peer Computing DBISP2P, Sept 7-8 2003, pp. + 169-183. + + + + + +Risson & Moors Informational [Page 60] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [77] A. Gupta, D. Agrawal, and A. E. Abbadi, Approximate range + selection queries in peer-to-peer systems, Proc. First Biennial + Conf. on Innovative Data Systems Research CIDR 2003 2003. + + [78] S. Ratnasamy, P. Francis, and M. Handley, Range queries in + DHTs, Technical Report IRB-TR-03-009, July 2003. + + [79] S. Ramabhadran, S. Ratnasamy, J. Hellerstein, and S. Shenker, + Brief announcement: prefix hash tree, Proc. 23rd Annual ACM + SIGACT-SIGOPS Symp. on Principles of Distributed Computing, + PODC 2004, July 25-28 2004, pp. 368-368. + + [80] A. Andrzejak and Z. Xu, Scalable, efficient range queries for + grid information services, Proc. Second IEEE Int'l Conf. on + Peer to Peer Computing, September 2002. + + [81] C. Schmidt and M. Parashar, Enabling flexible queries with + guarantees in P2P systems, IEEE Internet Computing 8 (3) (2004) + 19-26. + + [82] E. Tanin, A. Harwood, and H. Samet, Indexing distributed + complex data for complex queries, Proc. National Conf. on + Digital Government Research 2004, pp. 81-90. + + [83] P. Ganesan, M. Bawa, and H. Garcia-Molina, Online Balancing of + Range-Partitioned Data with Applications to Peer-to-Peer + Systems, Proc. 30th Int'l Conf. on Very Large Data Bases VLDB + 2004, 29 August - 3 September 2004. + + [84] A. Bharambe, M. Agrawal, and S. Seshan, Mercury: Supporting + Scalable Multi-Attribute Range Queries, SIGCOMM'04, Aug 30-Sept + 3 2004. + + [85] K. Aberer, Scalable Data Access in P2P Systems Using Unbalanced + Search Trees, Workshop on Distributed Data and Structures WDAS- + 2002 2002. + + [86] K. Aberer, A. Datta, and M. Hauswirth, The Quest for Balancing + Peer Load in Structured Peer-to-Peer Systems, Technical Report + IC/2003/32 2003. + + [87] W. Litwin, M.-A. Neimat, and D. Schneider, RP*: a family of + order-preserving scalable distributed data structures, Proc. + 20th Int'l Conf. on Very Large Data Bases VLDB'94, September + 12-15 1994. + + + + + + +Risson & Moors Informational [Page 61] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [88] M. Tsangou, S. Ndiaye, M. Seck, and W. Litwin, Range queries to + scalable distributed data structure RP*, Proc. Fifth Workshop + on Distributed Data and Structures, WDAS 2003, June 2003. + + [89] W. Litwin and M.-A. Neimat, k-RP*s: a scalable distributed data + structure for high-performance multi-attributed access, Proc. + Fourth Int'l Conf. on Parallel and Distributed Information + Systems (1996) 120-131. + + [90] T. Hodes, S. Czerwinski, B. Zhao, A. Joseph, and R. Katz, An + architecture for secure wide-area service discovery, Wireless + Networks 8 (2/3) (2002) 213-230. + + [91] M. Cai, M. Frank, J. Chen, and P. Szekely, MAAN: A Multi- + Attribute Addressable Network for Grid Information Services, + Proc. Int'l Workshop on Grid Computing, November 2003. + + [92] R. van Renesse, K. P. Birman, and W. Vogels, Astrolabe: A + robust and scalable technology for distribute system + monitoring, management and data mining, ACM Trans. on Computer + Systems 21 (2) (2003) 164-206. + + [93] R. Bhagwan, G. Varghese, and G. Voelker, Cone: Augmenting DHTs + to support distributed resource discovery, Technical Report, + CS2003- 0755, July 2003. + + [94] K. Albrecht, R. Arnold, and R. Wattenhofer, Join and Leave in + Peer-to-Peer Systems: The DASIS Approach, Technical Report 427, + Department of Computer Science, November 2003. + + [95] K. Albrecht, R. Arnold, and R. Wattenhofer, Aggregating + information in peer-to-peer systems for improved join and + leave, Proc. Fourth IEEE Int'l Conf. on Peer-to-Peer Computing, + 25-27 August 2004. + + [96] A. Montresor, M. Jelasity, and O. Babaoglu, Robust aggregation + protocol for large-scale overlay networks, Technical Report + UBLCS-2003-16, December 2003. + + [97] M. Jelasity, W. Kowalczyk, and M. van Steen, An Approach to + Aggregation in Large and Fully Distributed Peer-to-Peer Overlay + Networks, Proc. 12th Euromicro Conf. on Parallel, Distributted + and Network based Processing PDP 2004, February 2004. + + [98] P. Yalagandula and M. Dahlin, A scalable distributed + information management system, SIGCOMM'04, Aug 30-Sept 3 2004. + + + + + +Risson & Moors Informational [Page 62] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [99] M. Bawa, A. Gionis, H. Garcia-Molina, and R. Motwani, The price + of validity in dynamic networks, Proc. 2004 ACM SIGMOD Int'l + Conf. on the management of data 2004, pp. 515-526. + + [100] J. Aspnes, J. Kirsch, and A. Krishnamurthy, Load Balancing and + Locality in Range-Queriable Data Structures, Proc. 23rd Annual + ACM SIGACT-SIGOPS Symp. on Principles of Distributed Computing + PODC 2004, July 25-28 2004. + + [101] G. On, J. Schmitt, and R. Steinmetz, The effectiveness of + realistic replication strategies on quality of availability for + peer-to-peer systems, Proc. Third Int'l IEEE Conf. on Peer-to- + Peer Computing, Sept 1-3 2003, pp. 57-64. + + [102] D. Geels and J. Kubiatowicz, Replica management should be a + game, Proc. SIGOPS European Workshop, September 2003. + + [103] E. Cohen and S. Shenker, Replication strategies in unstructured + peer to peer networks, Proc. 2002 conference on applications, + technologies, architectures and protocols for computer + communications 2002, pp. 177-190. + + [104] E. Cohen and S. Shenker, P2P and multicast: replication + strategies in unstructured peer to peer networks, Proc. 2002 + conference on applications, technologies, architectures and + protocols for computer communications 2002, pp. 177-190. + + [105] H. Weatherspoon and J. Kubiatowicz, Erasure coding vs + replication: a quantative comparison, Proc. First Int'l + Workshop on Peer to Peer Systems IPTPS'02, March 2002. + + [106] D. Lomet, Replicated indexes for distributed data, Proc. Fourth + Int'l Conf. on Parallel and Distributed Information Systems, + December 18-20 1996, pp. 108-119. + + [107] V. Gopalakrishnan, B. Silaghi, B. Bhattacharjee, and P. + Keleher, Adaptive Replication in Peer-to-Peer Systems, Proc. + 24th Int'l Conf. on Distributed Computing Systems ICDCS 2004, + March 23-26 2004. + + [108] S.-D. Lin, Q. Lian, M. Chen, and Z. Zhang, A practical + distributed mutual exclusion protocol in dynamic peer-to-peer + systems, The 3rd Int'l Workshop on Peer-to-Peer Systems, + February 26-27 2004. + + + + + + + +Risson & Moors Informational [Page 63] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [109] A. Adya, R. Wattenhofer, W. Bolosky, M. Castro, G. Cermak, R. + Chaiken, J. Douceur, J. Howell, J. Lorch, and M. Thiemer, + Farsite: federated, available and reliable storage for an + incompletely trusted environment, ACM SIGOPS Operating Systems + Review, Special issue on Decentralized storage systems (2002) + 1- 14. + + [110] A. Rowstron and P. Druschel, Storage management and caching in + PAST, a large-scale, persistent peer-to-peer storage utility, + Proceedings ACM SOSP'01, October 2001, pp. 188-201. + + [111] S. Rhea, C. Wells, P. Eaton, D. Geels, B. Zhao, H. + Weatherspoon, and J. Kubiatowicz, Maintenance-Free Global Data + Storage, IEEE Internet Computing 5 (5) (2001) 40-49. + + [112] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. + Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. + Wells, and B. Zhao, Oceanstore: An Architecture for global- + scale persistent storage, Proc. Ninth Int'l Conf. on + Architecture Support for Programming Languages and Operating + Systems ASPLOS 2000, November 2000, pp. 190-201. + + [113] K. Birman, The Surprising Power of Epidemic Communication, + Springer-Verlag Heidelberg Lecture Notes in Computer Science + Volume 2584/2003 (2003) 97-102. + + [114] P. Costa, M. Migliavacca, G. P. Picco, and G. Cugola, + Introducing reliability in content-based publish-subscribe + through epidemic algorithms, Proc. 2nd international workshop + on Distributed event-based systems 2003, pp. 1-8. + + [115] P. Costa, M. Migliavacca, G. P. Picco, and G. Cugola, Epidemic + Algorithms for Reliable Content-Based Publish-Subscribe: An + Evaluation, The 24th Int'l Conf. on Distributed Computing + Systems (ICDCS-2004), Mar 23-26, Tokyo University of + Technology, Hachioji, Tokyo, Japan (2004) + + [116] A. Demers, D. Greene, C. Hauser, W. Irish, J. Larson, S. + Shenker, H. Sturgis, D. Swinehart, and D. Terry, Epidemic + algorithms for replicated data management, Proc. Sixth ACM + Symp. on Principles of Distributed Computing 1987, pp. 1-12. + + [117] P. Eugster, R. Guerraoiu, A. Kermarrec, and L. Massoulie, + Epidemic information dissemination in distributed systems, IEEE + Computer 37 (5) (2004) 60-67. + + + + + + +Risson & Moors Informational [Page 64] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [118] W. Vogels, R. v. Renesse, and K. Birman, The power of + epidemics: robust communication for large-scale distributed + systems, ACM SIGCOMM Computer Communication Review 33 (1) + (2003) 131-135. + + [119] S. Voulgaris and M. van Steen, An epidemic protocol for + managing routing tables in very large peer to peer networks, + Proc. 14th IFIP/IEEE Workshop on Distributed Systems: + Operations and Management, October 2003. + + [120] I. Gupta, On the design of distributed protocols from + differential equations, Proc. 23rd Annual ACM SIGACT-SIGOPS + Symp. on Principles of Distributed Computing PODC 2004, July + 25-28 2004, pp. 216-225. + + [121] I. Gupta, K. Birman, and R. van Renesse, Fighting fire with + fire: using randomized gossip to combat stochastic scalability + limits, Cornell University Dept of Computer Science Technical + Report, March 2001. + + [122] K. Birman and I. Gupta, Building Scalable Solutions to + Distributed Computing Problems using Probabilistic Components, + Submitted to the Int'l Conf. on Dependable Systems and Networks + DSN-2004, Dependable Computing and Computing Symp. DCCS, June + 28- July 1 2004. + + [123] A. Ganesh, A.-M. Kermarrec, and L. Massoulie, Peer-to-peer + membership management for gossip-based protocols, IEEE Trans. + on Computers 52 (2) (2003) 139-149. + + [124] N. Bailey, Epidemic Theory of Infectious Diseases and its + Applications, Second Edition ed. Hafner Press, 1975. + + [125] P. Eugster, R. Guerraoiu, S. Handurukande, P. Kouznetsov, and + A.- M. Kermarrec, Lightweight probabilistic broadcast, ACM + Trans. on Computer Systems 21 (4) (2003) 341-374. + + [126] H. Weatherspoon and J. Kubiatowicz, Efficient heartbeats and + repair of softstate in decentralized object location and + routing systems, Proc. SIGOPS European Workshop, September + 2002. + + [127] G. Koloniari and E. Pitoura, Content-based Routing of Path + Queries in Peer-to-Peer Systems, Proc. 9th Int'l Conf. on + Extending DataBase Technology EDBT, March 14-18 2004. + + + + + + +Risson & Moors Informational [Page 65] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [128] A. Mohan and V. Kalogaraki, Speculative routing and update + propagation: a kundali centric approach, IEEE Int'l Conf. on + Communications ICC'03, May 2002. + + [129] G. Koloniari, Y. Petrakis, and E. Pitoura, Content-Based + Overlay Networks for XML Peers Based on Multi-Level Bloom + Filters, Proc. First Int'l Workshop on Databases, Information + Systems and Peer-to-Peer Computing DBISP2P, Sept 7-8 2003, pp. + 232-247. + + [130] G. Koloniari and E. Pitoura, Bloom-Based Filters for + Hierarchical Data, Proc. 5th Workshop on Distributed Data and + Structures (WDAS) (2003) + + [131] B. Bloom, Space/time trade-offs in hash coding with allowable + errors, Communications of the ACM 13 (7) (1970) 422-426. + + [132] M. Naor and U. Wieder, A Simple Fault Tolerant Distributed Hash + Table, Second Int'l Workshop on Peer-to-Peer Systems (IPTPS + 03), Berkeley, CA, USA, 20-21 February (2003) + + [133] P. Maymounkov and D. Mazieres, Rateless codes and big + downloads, Second Int'l Workshop on Peer-to-Peer Systems, + IPTPS'03, February 20-21 2003. + + [134] M. Krohn, M. Freedman, and D. Mazieres, On-the-fly verification + of rateless erasure codes for efficient content distribution, + Proc. IEEE Symp. on Security and Privacy, May 2004. + + [135] J. Byers, J. Considine, M. Mitzenmacher, and S. Rost, Informed + content delivery across adaptive overlay networks, Proc. 2002 + conference on applications, technologies, architectures and + protocols for computer communications 2002, pp. 47-60. + + [136] J. Plank, S. Atchley, Y. Ding, and M. Beck, Algorithms for High + Performance, Wide-Area Distributed File Downloads, Parallel + Processing Letters 13 (2) (2003) 207-223. + + [137] M. Castro, P. Rodrigues, and B. Liskov, BASE: Using + abstraction to improve fault tolerance, ACM Trans. on Computer + Systems 21 (3) (2003) 236-269. + + [138] R. Rodrigues, B. Liskov, and L. Shrira, The design of a robust + peer-to-peer system, 10th ACM SIGOPS European Workshop, Sep + 2002. + + + + + + +Risson & Moors Informational [Page 66] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [139] H. Weatherspoon, T. Moscovitz, and J. Kubiatowicz, + Introspective failure analysis: avoiding correlated failures in + peer-to-peer systems, Proc. Int'l Workshop on Reliable Peer- + to-Peer Distributed Systems, Oct 2002. + + [140] F. Dabek, R. Cox, F. Kaashoek, and R. Morris, Vivaldi: A + Decentralized Network Coordinate System, SIGCOMM'04, Aug 30- + Sept 3 2004. + + [141] E.-K. Lua, J. Crowcroft, and M. Pias, Highways: proximity + clustering for massively scaleable peer-to-peer network + routing, Proc. Fourth IEEE Int'l Conf. on Peer-to-Peer + Computing, August 25-27 2004. + + [142] F. Fessant, S. Handurukande, A.-M. Kermarrec, and L. Massoulie, + Clustering in Peer-to-Peer File Sharing Workloads, The 3rd + Int'l Workshop on Peer-to-Peer Systems, February 26-27 2004. + + [143] T. S. E. Ng and H. Zhang, Predicting internet network distance + with coordinates-based approaches, IEEE Infocom 2002, The 21st + Annual Joint Conf. of the IEEE Computer and Communication + Societies, June 23-27 2002. + + [144] K. Hildrum, R. Krauthgamer, and J. Kubiatowicz, Object Location + in Realistic Networks, Proc. Sixteenth ACM Symp. on Parallel + Algorithms and Architectures (SPAA 2004), June 2004, pp. 25-35. + + [145] P. Keleher, S. Bhattacharjee, and B. Silaghi, Are Virtualized + Overlay Networks Too Much of a Good Thing?, First Int'l + Workshop on Peer-to-Peer Systems IPTPS, March 2002. + + [146] A. Mislove and P. Druschel, Providing administrative control + and autonomy in structured peer-to-peer overlays, The 3rd Int'l + Workshop on Peer-to-Peer Systems, June 9-12 2004. + + [147] D. Karger and M. Ruhl, Diminished Chord: A Protocol for + Heterogeneous SubGroup Formation in Peer-to-Peer Networks, The + 3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27 + 2004. + + [148] B. Awerbuch and C. Scheideler, Consistent, order-preserving + data management in distributed storage systems, Proc. Sixteenth + ACM Symp. on Parallel Algorithms and Architectures SPAA 2004, + June 27-30 2004, pp. 44-53. + + [149] M. Freedman and D. Mazieres, Sloppy Hashing and Self-Organizing + Clusters, Proc. 2nd Int'l Workshop on Peer-to-Peer Systems + IPTPS + + + +Risson & Moors Informational [Page 67] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [150] F. Dabek, J. Li, E. Sit, J. Robertson, F. Kaashoek, and R. + Morris, Designing a DHT for low latency and high throughput, + Proc. First Symp. on Networked Systems Design and + Implementation (NSDI'04), San Francisco, California, March + 29-31 (2004) 85-98. + + [151] M. Ruhl, Efficient algorithms for new computational models, + Doctoral Dissertation, September 2003. + + [152] K. Sollins, Designing for scale and differentiation, Proc. ACM + SIGCOMM workshop on Future Directions in network architecture, + August 25-27 2003. + + [153] L. Massoulie, A. Kermarrec, and A. Ganesh, Network awareness + and failure resilience in self-organizing overlay networks, + Proc. 22nd Int'l Symp. on Reliable Distributed Systems, + SRDS'03, Oct 6-8 2003, pp. 47-55. + + [154] R. Cox, F. Dabek, F. Kaashoek, J. Li, and R. Morris, + Practical,distributed network coordinates, ACM SIGCOMM Computer + Communication Review 34 (1) (2004) 113-118. + + [155] K. Hildrum, J. Kubiatowicz, S. Rao, and B. Zhao, Distributed + object location in a dynamic network, Proc. 14th annual ACM + symposium on parallel algorithms and architectures 2002, pp. + 41- 52. + + [156] X. Zhang, Q. Zhang, G. Song, and W. Zhu, A Construction of + Locality-Aware Overlay Network: mOverlay and its Performance, + IEEE Journal on Selected Areas in Communications 22 (1) (2004) + 18-28. + + [157] N. Harvey, M. B. Jones, M. Theimer, and A. Wolman, Efficient + recovery from organization disconnects in Skipnet, Second Int'l + Workshop on Peer-to-Peer Systems IPTPS'03, Feb 20-21 2003. + + [158] M. Pias, J. Crowcroft, S. Wilbur, T. Harris, and S. Bhatti, + Lighthouses for scalable distributed location, Second Int'l + Workshop on Peer-to-Peer Systems IPTPS'03, February 20-21 2003. + + [159] K. Gummadi, S. Saroui, S. Gribble, and D. King, Estimating + latency between arbitrary internet end hosts, Proc. SIGCOMM + IMW 2002, November 2002. + + [160] Y. Liu, X. Liu, L. Xiao, L. Ni, and X. Zhang, Location-aware + topology matching in P2P systems, Proc. IEEE Infocomm, Mar + 7-11 2004. + + + + +Risson & Moors Informational [Page 68] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [161] G. S. Manku, Balanced binary trees for ID management and load + balance in distributed hash tables, Proc. 23rd Annual ACM + SIGACT-SIGOPS Symp. on Principles of Distributed Computing, + PODC 2004, July 25-28 2004, pp. 197-205. + + [162] J. Gao and P. Steenkiste, Design and Evaluation of a + Distributed Scalable Content Delivery System, IEEE Journal on + Selected Areas in Communications 22 (1) (2004) 54-66. + + [163] X. Wang, Y. Zhang, X. Li, and D. Loguinov, On zone-balancing of + peer-to-peer networks: analysis of random node join, Proc. + joint international conference on measurement and modeling of + computer systems, June 2004. + + [164] D. Karger and M. Ruhl, Simple efficient load balancing + algorithms for peer-to-peer systems, Proc. Sixteenth ACM Symp. + on Parallel Algorithms and Architectures SPAA 2004, June 27-30 + 2004. + + [165] D. Karger and M. Ruhl, Simple efficient load balancing + algorithms for peer-to-peer systems, The 3rd Int'l Workshop on + Peer-to-Peer Systems, February 26-27 2004. + + [166] M. Adler, E. Halperin, R. Karp, and V. Vazirani, A stochastic + process on the hypercube with applications to peer-to-peer + networks, Proc. 35th ACM symposium on Theory of Computing 2003, + pp. 575-584. + + [167] C. Baquero and N. Lopes, Towards peer to peer content indexing, + ACM SIGOPS Operating Systems Review 37 (4) (2003) 90-96. + + [168] A. Rao, K. Lakshminarayanan, S. Surana, R. Karp, and I. Stoica, + Load balancing in structured P2P systems, Proc. 2nd Int'l + Workshop on Peer-to-Peer Systems, IPTPS'03, February 20-21 + 2003. + + [169] J. Byers, J. Considine, and M. Mitzenmacher, Simple Load + Balancing for Distributed Hash Tables, Second Int'l Workshop on + Peer-to-Peer Systems IPTPS 03, 20-21 February 2003. + + [170] P. Castro, J. Lee, and A. Misra, CLASH: A Protocol for + Internet- Scale Utility-Oriented Distributed Computing, Proc. + 24th Int'l Conf. on Distributed Computing Systems ICDCS 2004, + March 23-26 2004. + + [171] A. Stavrou, D. Rubenstein, and S. Sahu, A Lightwight, Robust + P2P System to Handle Flash Crowds, IEEE Journal on Selected + Areas in Communications 22 (1) (2004) 6-17. + + + +Risson & Moors Informational [Page 69] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [172] A. Selcuk, E. Uzun, and M. R. Pariente, A reputation-based + trust management system for P2P networks, Fourth Int'l Workshop + on Global and Peer-to-Peer Computing, April 20-21 2004. + + [173] T. Papaioannou and G. Stamoulis, Effective use of reputation in + peer-to-peer environments, Fourth Int'l Workshop on Global and + Peer-to-Peer Computing, April 20-21 2004. + + [174] M. Blaze, J. Feigenbaum, and J. Lacy, Trust and Reputation in + P2P networks, + http://www.neurogrid.net/twiki/bin/view/Main/ReputationAndTrust + (2003) + + [175] E. Damiani, D. C. di Vimercati, S. Paraboschi, P. Samarati, and + F. Violante, A reputation-based approach for choosing reliable + resources in peer to peer networks, Proc. 9th conference on + computer and communications security 2002, pp. 207-216. + + [176] S. Marti, P. Ganesan, and H. Garcia-Molina, DHT routing using + social links, The 3rd Int'l Workshop on Peer-to-Peer Systems, + February 26-27 2004. + + [177] G. Caronni and M. Waldvogel, Establishing trust in distributed + storage providers, Proc. Third Int'l IEEE Conf. on Peer-to-Peer + Computing, 1-3 Sept 2003, pp. 128-133. + + [178] B. Sieka, A. Kshemkalyani, and M. Singhal, On the security of + polling protocols in peer-to-peer systems, Proc. Fourth IEEE + Int'l Conf. on Peer-to-Peer Computing, 25-27 August 2004. + + [179] M. Feldman, K. Lai, I. Stoica, and J. Chuang, Robust Incentive + Techniques for Peer-to-Peer Networks, ACM E-Commerce Conf. + EC'04, May 2004. + + [180] K. Anagnostakis and M. Greenwald, Exchange-based Incentive + Mechanism for Peer-to-Peer File Sharing, Proc. 24th Int'l Conf. + on Distributed Computing Systems ICDCS 2004, March 23-26 2004. + + [181] J. Schneidman and D. Parkes, Rationality and self-Interest in + peer to peer networks, Second Int'l Workshop on Peer-to-Peer + Systems IPTPS'03, February 20-21 2003. + + [182] C. Buragohain, D. Agrawal, and S. Subhash, A game theoretic + framework for incentives in P2P systems, Proc. Third Int'l IEEE + Conf. on Peer-to-Peer Computing, 1-3 Sept 2003, pp. 48-56. + + + + + + +Risson & Moors Informational [Page 70] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [183] W. Josephson, E. Sirer, and F. Schneider, Peer-to-Peer + Authentication with a Distributed Single Sign-On Service, The + 3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27 + 2004. + + [184] A. Fiat and J. Saia, Censorship resistant peer to peer content + addressable networks, Proc. 13th annual ACM-SIAM symposium on + discrete algorithms 2002, pp. 94-103. + + [185] N. Daswani and H. Garcia-Molina, Query-flood DoS attacks in + gnutella, Proc. 9th ACM Conf. on Computer and Communications + Security 2002, pp. 181-192. + + [186] A. Singh and L. Liu, TrustMe: anonymous management of trust + relationships in decentralized P2P systems, Proc. Third Int'l + IEEE Conf. on Peer-to-Peer Computing, Sept 1-3 2003. + + [187] A. Serjantov, Anonymizing censorship resistant systems, Proc. + Second Int'l Conf. on Peer to Peer Computing, March 2002. + + [188] S. Hazel and B. Wiley, Achord: A Variant of the Chord Lookup + Service for Use in Censorship Resistant Peer-to-Peer Publishing + Systems, Proc. Second Int'l Conf. on Peer to Peer Computing, + March 2002. + + [189] M. Freedman and R. Morris, Tarzan: a peer-to-peer anonymizing + network layer, Proc. 9th ACM Conf. on Computer and + Communications Security (2002) 193-206. + + [190] M. Feldman, C. Papadimitriou, J. Chuang, and I. Stoica, Free- + Riding and Whitewashing in Peer-to-Peer Systems, 3rd Annual + Workshop on Economics and Information Security WEIS04, May + 2004. + + [191] L. Ramaswamy and L. Liu, FreeRiding: a new challenge for peer- + to-peer file sharing systems, Proc. 2003 Hawaii Int'l Conf. on + System Sciences, P2P Track, HICSS2003, January 6-9 2003. + + [192] T.-W. Ngan, D. Wallach, and P. Druschel, Enforcing fair sharing + of peer-to-peer resources, Second Int'l Workshop on Peer-to- + Peer Systems, IPTPS'03, 20-21 February 2003. + + [193] L. Cox and B. D. Noble, Samsara: honor among thieves in peer- + to-peer storage, Proc. nineteenth ACM symposium on Operating + System Principles 2003, pp. 120-132. + + + + + + +Risson & Moors Informational [Page 71] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [194] M. Surridge and C. Upstill, Grid security: lessons for peer-to- + peer systems, Proc. Third Int'l IEEE Conf. on Peer-to-Peer + Computing, Sept 1-3 2003, pp. 2-6. + + [195] E. Sit and R. Morris, Security considerations for peer-to-peer + distributed hash tables, First Int'l Workshop on Peer-to-Peer + Systems, March 2002. + + [196] C. O'Donnel and V. Vaikuntanathan, Information leak in the + Chord lookup protocol, Proc. Fourth IEEE Int'l Conf. on Peer- + to-Peer Computing, 25-27 August 2004. + + [197] K. Berket, A. Essiari, and A. Muratas, PKI-Based Security for + Peer-to-Peer Information Sharing, Proc. Fourth IEEE Int'l Conf. + on Peer-to-Peer Computing, 25-27 August 2004. + + [198] B. Karp, S. Ratnasamy, S. Rhea, and S. Shenker, Spurring + adoption of DHTs with OpenHash, a public DHT service, The 3rd + Int'l Workshop on Peer-to-Peer Systems, February 26-27 2004. + + [199] J. Considine, M. Walfish, and D. G. Andersen, A pragmatic + approach to DHT adoption, Technical Report,, December 2003. + + [200] G. Li, Peer to Peer Networks in Action, IEEE Internet Computing + 6 (1) (2002) 37-39. + + [201] A. Mislove, A. Post, C. Reis, P. Willmann, P. Druschel, D. + Wallach, X. Bonnaire, P. Sens, J.-M. Busca, and L. Arantes- + Bezerra, POST: A Secure, Resilient, Cooperative Messaging + System, 9th Workshop on Hot Topics in Operating Systems, HotOS, + May 2003. + + [202] S. Saroiu, P. Gummadi, and S. Gribble, A measurement study of + peer-to-peer file sharing systems, Proc. Multimedia Computing + and Networking 2002 MMCN'02, January 2002. + + [203] A. Muthitacharoen, R. Morris, T. Gil, and B. Chen, Ivy: a + read/write peer-to-peer file system, ACM SIGOPS Operating + Systems Review, Special issue on Decentralized storage systems, + December 2002, pp. 31-44. + + [204] A. Muthitacharoen, R. Morris, T. Gil, and B. Chen, A read/write + peer-to-peer file system, Proc. 5th Symp. on Operating System + Design and Implementation (OSDI 2002), Boston, MA, December + (2002) + + + + + + +Risson & Moors Informational [Page 72] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [205] F. Annexstein, K. Berman, M. Jovanovic, and K. Ponnavaikko, + Indexing techniques for file sharing in scalable peer to peer + networks, 11th IEEE Int'l Conf. on Computer Communications and + Networks (2002) 10-15. + + [206] G. Kan and Y. Faybishenko, Introduction to Gnougat, First Int'l + Conf. on Peer-to-Peer Computing 2001 2001, pp. 4-12. + + [207] R. Gold and D. Tidhar, Towards a content-based aggregation + network, Proc. First Int'l Conf. on Peer to Peer Compuuting + 2001, pp. 62-68. + + [208] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica, + Wide-area cooperative storage with CFS, Proc. 18th ACM + symposium on Operating System Principles 2001, pp. 202-215. + + [209] M. Freedman, E. Freudenthal, and D. Mazieres, Democratizing + content publication with coral, Proc. First Symp. on Networked + Systems Design and Implementation NSDI'04, March 29-31 2004, + pp. 239-252. + + [210] J. Li, B. T. Loo, J. Hellerstein, F. Kaashoek, D. Karger, and + R. Morris, On the Feasibility of Peer-to-Peer Web Indexing and + Search, Second Int'l Workshop on Peer-to-Peer Systems IPTPS 03, + 20-21 February 2003. + + [211] S. Iyer, A. Rowstron, and P. Druschel, Squirrel: a + decentralized peer-to-peer web cache, Proc. 21st annual + symposium on principles of distributed computing 2002, pp. + 213-222. + + [212] M. Bawa, R. Bayardo, S. Rajagopalan, and E. Shekita, Make it + fresh, make it quick: searching a network of personal + webservers, Proc. 12th international conference on World Wide + Web 2003, pp. 577-586. + + [213] B. T. Loo, S. Krishnamurthy, and O. Cooper, Distributed web + crawling over DHTs, Technical Report, CSD-04-1305, February 9 + 2004. + + [214] M. Junginger and Y. Lee, A self-organizing publish/subscribe + middleware for dynamic peer-to-peer networks, IEEE Network 18 + (1) (2004) 38-43. + + + + + + + + +Risson & Moors Informational [Page 73] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [215] F. Cuenca-Acuna, C. Peery, R. Martin, and T. Nguyen, PlanetP: + Using Gossiping to Build Content Addressable Peer-to-Peer + Information Sharing Communities, Proc. 12th international + symposium on High Performance Distributed Computing (HPDC), + June 2002. + + [216] M. Walfish, H. Balakrishnan, and S. Shenker, Untangling the web + from DNS, Proc. First Symp. on Networked Systems Design and + Implementation NSDI'04, March 29-31 2004, pp. 225-238. + + [217] B. Awerbuch and C. Scheideler, Robust distributed name service, + The 3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27 + 2004. + + [218] A. Iamnitchi, Resource Discovery in Large Resource-Sharing + Environments, Doctoral Dissertation 2003. + + [219] R. Cox, A. Muthitacharoen, and R. Morris, Serving DNS using a + Peer-to-Peer Lookup Service, First Int'l Workshop on Peer-to- + Peer Systems (IPTPS), March 2002. + + [220] A. Chander, S. Dawson, P. Lincoln, and D. Stringer-Calvert, + NEVRLATE: scalable resource discovery, Second IEEE/ACM Int'l + Symp. on Cluster Computing and the Grid CCGRID2002 2002, pp. + 56-65. + + [221] M. Balazinska, H. Balakrishnan, and D. Karger, INS/Twine: A + scalable Peer-to-Peer architecture for Intentional Resource + Discovery, Proc. First Int'l Conf. on Pervasive Computing + (IEEE) (2002) + + [222] J. Kangasharju, K. Ross, and D. Turner, Secure and resilient + peer-to-peer E-mail: design and implementation, Proc. Third + Int'l IEEE Conf. on Peer-to-Peer Computing, 1-3 Sept 2003. + + [223] V. Lo, D. Zappala, D. Zhou, Y. Liu, and S. Zhao, Cluster + computing on the fly: P2P scheduling of idle cycles in the + internet, The 3rd Int'l Workshop on Peer-to-Peer Systems, + February 26-27 2004. + + [224] A. Iamnitchi, I. Foster, and D. Nurmi, A peer-to-peer approach + to resource discovery in grid environments, IEEE High + Performance Distributed Computing 2002. + + [225] I. Foster and A. Iamnitchi, On Death, Taxes and the Convergence + of Peer-to-Peer and Grid Computing, Second Int'l Workshop on + Peer-to-Peer Systems IPTPS 03, 20-21 February 2003. + + + + +Risson & Moors Informational [Page 74] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [226] W. Hoschek, Peer-to-Peer Grid Databases for Web Service + Discovery, Concurrency - Practice and Experience (2002) 1-7. + + [227] K. Aberer, A. Datta, and M. Hauswirth, A decentralized public + key infrastructure for customer-to-customer e-commerce, Int'l + Journal of Business Process Integration and Management (2004) + + [228] S. Ajmani, D. Clarke, C.-H. Moh, and S. Richman, ConChord: + Cooperative SDSI Certificate Storage and Name Resolution, First + Int'l Workshop on Peer-to-Peer Systems IPTPS, March 2002. + + [229] E. Sit, F. Dabek, and J. Robertson, UsenetDHT: a low overhead + Usenet server, The 3rd Int'l Workshop on Peer-to-Peer Systems, + February 26-27 2004. + + [230] H.-Y. Hsieh and R. Sivakumar, On transport layer support for + peer-to-peer networks, The 3rd Int'l Workshop on Peer-to-Peer + Systems, February 26-27 2004. + + [231] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, and S. Surana, + Internet indirection infrastructure, Proc. 2002 conference on + applications, technologies, architectures and protocols for + computer communications, August 19-23 2002, pp. 73-86. + + [232] E. Halepovic and R. Deters, Building a P2P forum system with + JXTA, Proc. Second IEEE Int'l Conf. on Peer to Peer Computing + P2P'02, September 5-7 2002. + + [233] M. Wawrzoniak, L. Peterson, and T. Roscoe, Sophia: an + Information Plane for networked systems, ACM SIGCOMM Computer + Communication Review 34 (1) (2004) 15-20. + + [234] D. Tran, K. Hua, and T. Do, A Peer-to-Peer Architecture for + Media Streaming, IEEE Journal on Selected Areas in + Communications 22 (1) (2004) 121-133. + + [235] V. Padmanabhan, H. Wang, and P. Chou, Supporting heterogeneity + and congestion control in peer-to-peer multicast streaming, The + 3rd Int'l Workshop on Peer-to-Peer Systems, February 26-27 + 2004. + + [236] A. Nicolosi and D. Mazieres, Secure acknowledgment of multicast + messages in open peer-to-peer networks, The 3rd Int'l Workshop + on Peer-to-Peer Systems, February 26-27 2004. + + + + + + + +Risson & Moors Informational [Page 75] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [237] R. Zhang and C. Hu, Borg: a hybrid protocol for scalable + application-level multicast in peer-to-peer networks, Proc. + 13th international workshop on network and operating systems + for digital audio and video 2003, pp. 172-179. + + [238] M. Sasabe, N. Wakamiya, M. Murata, and H. Miyahara, Scalable + and continuous media streaming on peer-to-peer networks, Proc. + Third Int'l IEEE Conf. on Peer-to-Peer Computing, Sept 1-3 + 2003, pp. 92-99. + + [239] M. Hefeeda, A. Habib, B. Botev, D. Xu, and B. Bhargava, + PROMISE: peer-to-peer media streaming using CollectCast, Proc. + eleventh ACM international conference on multimedia 2003, pp. + 45-54. + + [240] M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron, + and A. Singh, SplitStream: high-bandwidth multicast in + cooperative environments, Proc. 19th ACM symposium on operating + systems principles 2003, pp. 298-313. + + [241] M. Castro, P. Druschel, A.-M. Kermarrec, and A. Rowstron, + SCRIBE: a large-scale and decentralized application-level + multicast infrastructure, IEEE Journal on Selected Areas in + Communications 20 (8) (2002) + + [242] S. Zhuang, B. Zhao, A. Joseph, R. Katz, and J. Kubiatowicz, + Bayeux: an architecture for scalable and fault-tolerant wide- + area data dissemination, Proc. 11th ACM international workshop + on network and operating systems support for digital audio and + video, Jan 2001. + + [243] R. Lienhart, M. Holliman, Y.-K. Chen, I. Kozintsev, and M. + Yeung, Improving media services on P2P networks, IEEE Internet + Computing 6 (1) (2002) 58-67. + + [244] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. + Yin, and F. Yu, Data Centric Storage in Sensornets with GHT, a + geographic hash table, Mobile Networks and Applications 8 (4) + (2003) 427-442. + + [245] M. Demirbas and H. Ferhatosmanoglu, Peer-to-peer spatial + queries in sensor networks, Proc. Third Int'l IEEE Conf. on + Peer-to-Peer Computing, 1-3 Sept 2003, pp. 32-39. + + [246] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, + and S. Shenker, GHT: a geographic hash table for data-centric + storage, Proc. First ACM Int'l Workshop on Wireless Sensor + Networks and Applications (Mobicom) 2002, pp. 78-87. + + + +Risson & Moors Informational [Page 76] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [247] J. Hellerstein and W. Wang, Optimization of In-Network Data + Reduction, Proc. First Workshop on Data Management for Sensor + Networks DMSN 2004, August 30th 2004. + + [248] J. Li, J. Stribling, T. Gil, R. Morris, and F. Kaashoek, + Comparing the performance of distributed hash tables under + churn, The 3rd Int'l Workshop on Peer-to-Peer Systems, February + 26-27 2004. + + [249] S. Shenker, The data-centric revolution in networking, Keynote + Speech, 29th Int'l Conf. on Very Large Data Bases, September + 9-12 2003. + + [250] S. Gribble, A. Halevy, Z. Ives, M. Rodrig, and D. Suciu, What + can databases do for P2P?, Proc. Fourth Int'l Workshop on + Databases and the Web, WebDB2001, May 24-25 2001. + + [251] D. Clark, The design philosophy of the DARPA internet + protocols, ACM SIGCOMM Computer Communication Review, Symp. + proceedings on communications architectures and protocols 18 + (4) (1988) + + [252] J.-C. Laprie, Dependable Computing and Fault Tolerance: + Concepts and Terminology, Twenty-Fifth Int'l Symp. on Fault- + Tolerant Computing, Highlights from Twenty-Five Years 1995, pp. + 2-13. + + [253] D. Clark, J. Wroclawski, K. Sollins, and R. Braden, Tussle in + cyberspace: defining tomorrow's internet, Conf. on + Applications, Technologies, Architectures and Protocols for + Computer Communications 2002, pp. 347-356. + + [254] L. O. Alima, A. Ghodsi, and S. Haridi, "A framework for + structured peer-to-peer overlay networks," in Global computing, + vol. 3267, Lecture Notes in Computer Science: Springer Berlin / + Heidelberg, 2005, pp. 223-249. + + [255] Clip2, The Gnutella Protocol Specification, + http://www.clip2.com (2000) + + [256] Napster, http://www.napster.com (1999) + + [257] J. Mishchke and B. Stiller, A methodology for the design of + distributed search in P2P middleware, IEEE Network 18 (1) + (2004) 30-37. + + + + + + +Risson & Moors Informational [Page 77] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [258] J. Li and K. Sollins, Implementing aggregation and broadcast + over distributed hash tables. Full report, + http://krs.lcs.mit.edu/regions/docs.html (November) (2003) + + [259] M. Castro, M. Costa, and A. Rowstron, Should we build Gnutella + on a structured overlay?, ACM SIGCOMM Computer Communication + Review 34 (1) (2004) 131-136. + + [260] A. Singla and C. Rohrs, Ultrapeers: Another Step Towards + Gnutella Scalability, + http://groups.yahoo.com/group/the_gdf/files/Proposals/ + Working%20Proposals/Ultrapeer/ Version 1.0, 26 November (2002) + + [261] B. Cooper and H. Garcia-Molina, Ad hoc, Self-Supervising Peer- + to-Peer Search Networks, Technical Report, + http://www.cc.gatech.edu/~cooperb/odin/ 2003. + + [262] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information + Retrieval. Addison Wesley, Essex, England, 1999. + + [263] S. Sen and J. Wang, Analyzing peer-to-peer traffic across large + networks, IEEE/ACM Trans. on Networking 12 (2) (2004) 219-232. + + [264] H. Balakrishnan, S. Shenker, and M. Walfish, Semantic-Free + Referencing in Linked Distributed Systems, Second Int'l + Workshop on Peer-to-Peer Systems IPTPS 03, 20-21 February 2003. + + [265] B. Yang, P. Vinograd, and H. Garcia-Molina, Evaluating GUESS + and non-forwarding peer-to-peer search, The 24th Int'l Conf. on + Distributed Computing Systems ICDCS'04, Mar 23-26 2004. + + [266] A. Gupta, B. Liskov, and R. Rodrigues, One Hop Lookups for + Peer-to-Peer Overlays, 9th Workshop on Hot Topics in Operating + Systems (HotOS), 18-21 May 2003. + + [267] A. Gupta, B. Liskov, and R. Rodrigues, Efficient routing for + peer-to-peer overlays, First symp. on Networked Systems Design + and Implementation (NSDI), Mar 29-31 2004, pp. 113-126. + + [268] A. Mizrak, Y. Cheng, V. Kumar, and S. Savage, Structured + superpeers: leveraging heterogeneity to provide constant-time + lookup, IEEE Workshop on Internet Applications, June 23-24 + 2003. + + [269] L. Adamic, R. Lukose, A. Puniyani, and B. Huberman, Search in + power-law networks, Physical review E, The American Physical + Society 64 (046135) (2001) + + + + +Risson & Moors Informational [Page 78] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [270] F. Banaei-Kashani and C. Shahabi, Criticality-based analysis + and design of unstructured peer-to-peer networks as "complex + systems", Proc. 3rd IEEE/ACM Int'l Symp. on Cluster Computing + and the Grid 2003, pp. 351-358. + + [271] KaZaa, KaZaa Media Desktop, www.kazaa.com (2001) + + [272] S. Sen and J. Wang, Analyzing peer-to-peer traffic across large + networks, Proc. second ACM SIGCOMM workshop on Internet + measurement, November 06-08 2002, pp. 137-150. + + [273] DirectConnect, http:www.neo-modus.com (2001) + + [274] S. Saroiu, K. Gummadi, R. Dunn, S. Gribble, and H. Levy, An + analysis of Internet content delivery systems, ACM SIGOPS + Operating Systems Review 36 (2002) 315-327. + + [275] A. Loo, The Future or Peer-to-Peer Computing, Communications of + the ACM 46 (9) (2003) 56-61. + + [276] B. Yang and H. Garcia-Molina, Comparing Hybrid Peer-to-Peer + Systems (extended), 27th Int'l Conf. on Very Large Data Bases, + September 11-14 2001. + + [277] D. Scholl, OpenNap Home Page, http://opennap.sourceforge.net/ + (2001) + + [278] S. Ghemawat, H. Gobioff, and S.-T. Leung, The Google file + system, Proc. 19th ACM symposium on operating systems + principles 2003, pp. 29-43. + + [279] I. Clarke, S. Miller, T. Hong, O. Sandberg, and B. Wiley, + Protecting Free Expression Online with Freenet, IEEE Internet + Computing 6 (1) (2002) + + [280] J. Mache, M. Gilbert, J. Guchereau, J. Lesh, F. Ramli, and M. + Wilkinson, Request algorithms in Freenet-style peer-to-peer + systems, Proc. Second IEEE Int'l Conf. on Peer to Peer + Computing P2P'02, September 5-7 2002. + + [281] C. Rohrs, Query Routing for the Gnutella Networks, + http://www.limewire.com/developer/query_routing/ + keyword%20routing.htm Version 1.0 (2002) + + + + + + + + +Risson & Moors Informational [Page 79] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [282] I. Clarke, Freenet's Next Generation Routing Protocol, + http://freenetproject.org/index.php?page=ngrouting, 20th July + 2003. + + [283] A. Z. Kronfol, FASD: A fault-tolerant, adaptive scalable + distributed search engine, Master's Thesis + http://www.cs.princeton.edu/~akronfol/fasd/ 2002. + + [284] S. Gribble, E. Brewer, J. M. Hellerstein, and D. Culler, + Scalable, Distributed Data Structures for Internet Service + Construction, Proc. 4th Symp. on Operating Systems Design and + Implementation OSDI 2000, October 2000. + + [285] K. Aberer, Efficient Search in Unbalanced, Randomized Peer-to- + Peer Search Trees, EPFL Technical Report IC/2002/79 (2002) + + [286] R. Honicky and E. Miller, A fast algorithm for online placement + and reorganization of replicated data, Proc. 17th Int'l + Parallel and Distributed Processing Symp., April 2003. + + [287] G. S. Manku, Routing networks for distributed hash tables, + Proc. 22nd annual ACM Symp. on Principles of Distributed + Computing, PODC 2003, July 13-16 2003, pp. 133-142. + + [288] D. Lewin, Consistent hashing and random trees: algorithms for + caching in distributed networks, Master's Thesis, Department of + Electrical Engineering and Computer Science, Massachusetts + Institute of Technology (1998) + + [289] S. Lei and A. Grama, Extended consistent hashing: a framework + for distributed servers, Proc. 24th Int'l Conf. on Distributed + Computing Systems ICDCS 2004, March 23-26 2004. + + [290] W. Litwin, Re: Chord & LH*, Email to Ion Stoica, March 23 + 2004a. + + [291] J. Li, J. Stribling, R. Morris, F. Kaashoek, and T. Gil, A + performance vs. cost framework for evaluating DHT design + tradeoffs under churn, Proc. IEEE Infocom, Mar 13-17 2005. + + [292] S. Zhuang, D. Geels, I. Stoica, and R. Katz, On failure + detection algorithms in overlay networks, Proc. IEEE Infocomm, + Mar 13-17 2005. + + [293] X. Li, J. Misra, and C. G. Plaxton, Active and Concurrent + Topology Maintenance, The 18th Annual Conf. on Distributed + Computing (DISC 2004), Trippenhuis, Amsterdam, the Netherlands, + October 4 - October 7 (2004) + + + +Risson & Moors Informational [Page 80] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [294] K. Aberer, L. O. Alima, A. Ghodsi, S. Girdzijauskas, M. + Hauswirth, and S. Haridi, The essence of P2P: a reference + architecture for overlay networks, Proc. of the 5th + international conference on peer-to-peer computing, Aug 31-Sep + 2 2005. + + [295] C. Tang, M. Buco, R. Chang, S. Dwarkadas, L. Luan, E. So, and + C. Ward, Low traffic overlay networks with large routing + tables, Proc. of ACM Sigmetrics Int'l Conf. on Measurement and + Modeling of Comp. Sys., Jun 6-10 2005, pp. 14-25. + + [296] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, Handling + churn in a DHT, Proc. of the USENIX Annual Technical + Conference, June 2004. + + [297] C. Blake and R. Rodrigues, High Availability, Scalable Storage, + Dynamic Peer Networks: Pick Two, 9th Workshop on Hot Topics in + Operating Systems (HotOS), Lihue, Hawaii, 18-21 May (2003) + + [298] S. Rhea, B. Godfrey, B. Karp, J. Kubiatowicz, S. Ratnasamy, S. + Shenker, I. Stoica, and H. Yu, OpenDHT: a public DHT service + and its uses, Proc. of the conf. on Applications, technologies, + architectures and protocols for computer communications, Aug + 22-26 2005, pp. 73-84. + + [299] T. Gil, F. Kaashoek, J. Li, R. Morris, and J. Stribling, + p2psim, a simulator for peer-to-peer protocols, + http://www.pdos.lcs.mit.edu/p2psim/ (2003) + + [300] K. Hildrum, J. D. Kubiatowicz, S. Rao, and B. Y. Zhao, + Distributed object location in a dynamic network, Theory of + Computing Systems (2004) + + [301] N. Lynch, D. Malkhi, and D. Ratajczak, Atomic data access in + distributed hash tables, Proc. Int'l Peer-to-Peer Symp., March + 7-8 2002. + + [302] S. Gilbert, N. Lynch, and A. Shvartsman, RAMBO II: Rapidly + Reconfigurable Atomic Memory for Dynamic Networks, Technical + Report, MIT-CSAIL-TR-890 2004. + + [303] N. Lynch and I. Stoica, MultiChord: A resilient namespace + management algorithm, Technical Memo MIT-LCS-TR-936 2004. + + [304] J. Risson, K. Robinson, and T. Moors, Fault tolerant active + rings for structured peer-to-peer overlays, Proc. of the 30th + Annual IEEE Conf. on Local Computer Networks, Nov 15-17 2005, + pp. 18-25. + + + +Risson & Moors Informational [Page 81] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [305] B. Awerbuch and C. Scheideler, Peer-to-peer systems for prefix + search, Proc. 22nd annual ACM Symp. on Principles of + Distributed Computing 2003, pp. 123-132. + + [306] F. Dabek, B. Zhao, P. Druschel, J. Kubiatowicz, and I. Stoica, + Towards a common API for structured P2P overlays, Proc. Second + Int'l Workshop on Peer to Peer Systems IPTPS 2003, February + 2003. + + [307] N. Feamster and H. Balakrishnan, Towards a logic for wide-area + Internet routing, Proc. ACM SIGCOMM workshop on Future + Directions in Network Architecture, August 25-27 2003, pp. + 289-300. + + [308] B. Ahlgren, M. Brunner, L. Eggert, R. Hancock, and S. Schmid, + Invariants: a new design methodology for network architectures, + Proc. ACM SIGCOMM workshop on Future Direction in Network + Architecture, August 30 2004, pp. 65-70. + + [309] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction + to Algorithms, 2nd Edition. MIT Press, McGraw-Hill, Cambridge, + London, England, 2003. + + [310] I. Abraham, D. Malkhi, and O. Dubzinski, LAND:Stretch + (1+epsilon) Locality Aware Networks for DHTs, Proc. ACM-SIAM + Symp. on Discrete Algorithms SODA-04 2004. + + [311] S. Jain, R. Mahajan, and D. Wetherall, A study of the + performance potential of DHT-based overlays, Proc. of the 4th + Usenix symposium on internet technologies and systems (USITS), + Mar 2003. + + [312] J. Risson, A. Harwood, and T. Moors, Stable high-capacity one- + hop distributed hash tables, Proc. of the IEEE Symposium on + Computers and Communications (ISCC'06), Jun 26-29 2006. + + [313] V. Ramasubramanian and E. Sirer, Beehive: O(1) Lookup + Performance for Power-Law Query Distributions in Peer-to-Peer + Overlays, Proc. First Symp. on Networked Systems Design and + Implementation (NSDI'04), San Francisco, California, March + 29-31 (2004) 99-112. + + [314] I. Abraham, A. Badola, D. Bickson, D. Malkhi, S. Maloo, and S. + Ron, Practical locality-awareness for large scale information + sharing, Proc. 4th International Workshop on Peer-to-Peer + Systems, Feb 24-25 2005. + + + + + +Risson & Moors Informational [Page 82] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [315] B. Leong, B. Liskov, and E. Demaine, Epichord: parallelizing + the Chord lookup algorithm with reactive routing state + management, Proc. of the 12th International Conference on + Networks, Nov 2004. + + [316] J. Li, J. Stribling, R. Morris, and F. Kaashoek, Bandwidth- + efficient management of DHT routing tables, Proc. 2nd Symposium + on Networked Systems Design and Implementation, May 2-4 2005. + + [317] S. Rhea, B.-G. Chun, J. Kubiatowicz, and S. Shenker, Fixing the + embarrassing slowness of OpenDHT on PlanetLab, Proc. of the + Second USENIX Workshop on Real, Large Distributed Systems, Dec + 13 2005. + + [318] M. Costa, M. Castro, A. Rowstron, and P. Key, PIC: Practical + Internet coordinates for distance estimation, Proc. of the 24th + international conference on distributed computing systems, Mar + 2004. + + [319] M. Castro, M. B. Jones, A.-M. Kermarrec, A. Rowstron, M. + Theimer, H. Wang, and A. Wolman, An evaluation of scalable + application- level multicast built using peer-to-peer overlays, + Proc. of the 22nd Annual Joint Conf. of the IEEE Comp. and + Comm. Soc. (INFOCOM), 30 Mar - 3 Apr 2003, pp. 1510-1520. + + [320] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker, + Application-level multicast using content-addressable networks, + Proc. of the Third International Workshop on Networked Group + Communication, Nov 7-9 2001. + + [321] S. El-Ansary, L. Alima, P. Brand, and S. Haridi, Efficient + broadcast in structured P2P networks, Second Int'l Workshop on + Peer-to-Peer Systems (IPTPS 03), Berkeley, CA, USA, 20-21 + February (2003) + + [322] J. Li, K. Sollins, and D.-Y. Lim, Implementing aggregation and + broadcast over Distributed Hash Tables, ACM Computer + Communication Reviews 35 (1) (2005) 81-92. + + [323] V. Pai, K. Tamilmani, V. Sambamurthy, K. Kumar, and A. Mohr, + Chainsaw: eliminating trees from overlay multicast, Proc. 4th + Int'l Workshop on Peer-to-Peer Systems, February 24-25 2005. + + [324] K. Birman, M. Hayden, O. Ozkasap, Z. Xiao, and M. Budiu, + Bimodal Multicast, ACM Trans. on Computer Systems 17 (2) (1999) + 41-88. + + + + + +Risson & Moors Informational [Page 83] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [325] Z. Zhang, S. Chen, Y. Ling, and R. Chow, Resilient capacity- + aware multicasting based on overlay networks, Proc. of the 25th + IEEE Int'l Conf. on Distributed Computing Systems, 6-10 June + 2005, pp. 565-574. + + [326] A. Bharambe, S. Rao, V. Padmanabhan, S. Seshan, and H. Zhang, + The impact of heterogeneous bandwidth constraints on DHT-based + multicast protocols, Proc. 4th Int'l Workshop on Peer-to-Peer + Systems, February 24-25 2005. + + [327] A. Ghodsi, L. O. Alima, S. El-Ansary, P. Brand, and S. Haridi, + Self-correcting broadcast in distributed hash tables, Proc. of + the 15th IASTED International Conf. on Parallel and Distributed + Computing and Systems, Nov 2003. + + [328] R. Mahajan, M. Castro, and A. Rowstron, Controlling the cost of + reliability in peer-to-peer overlays, Second Int'l Workshop on + Peer-to-Peer Systems IPTPS'03, February 20-21 2003. + + [329] S. Rhea, D. Geels, T. Roscoe, and J. Kubiatowicz, Handling + churn in a DHT, Report No. UCB/CSD-03-1299, University of + California, also Proc. USENIX Annual Technical Conference, June + 2003. + + [330] M. Castro, M. Costa, and A. Rowstron, Performance and + dependability of structured peer-to-peer overlays, Microsoft + Research Technical Report MSR-TR-2003-94, December. Also 2004 + Int'l Conf. on Dependable Systems and Networks, June 28-July 1 + 2003. + + [331] D. Liben-Nowell, H. Balakrishnan, and D. Karger, Analysis of + the evolution of peer-to-peer systems, Annual ACM Symp. on + Principles of Distributed Computing 2002, pp. 233-242. + + [332] L. Alima, S. El-Ansary, P. Brand, and S. Haridi, DKS(N,k,f): a + family of low communication, scalable and fault-tolerant + infrastructures for P2P applications, Proc. 3rd IEEE/ACM Int'l + Symp. on Cluster Computing and the Grid (2003) 344-350. + + [333] D. Karger and M. Ruhl, Finding nearest neighbours in growth- + restricted metrics, Proc. 34th annual ACM symposium on Theory + of computing 2002, pp. 741-750. + + [334] S. Ratnasamy, A Scalable Content-Addressable Network, Doctoral + Dissertation 2002. + + [335] S. McCanne and S. Floyd, The LBNL/UCB Network Simulator. + + + + +Risson & Moors Informational [Page 84] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [336] M. Naor and U. Wieder, Novel architectures for P2P + applications: the continuous-discrete approach, Proc. fifteenth + annual ACM Symp. on Parallel Algorithms and Architectures, SPAA + 2003, June 7-9 2003, pp. 50-59. + + [337] N. D. de Bruijn, A combinatorial problem, Koninklijke + Netherlands: Academe Van Wetenschappen 49 (1946) 758-764. + + [338] J.-W. Mao, "The Coloring and Routing Problems on de Bruijn + Interconnection Networks," in Doctoral Dissertation, National + Sun Yat-sen University, 2003. + + [339] M. L. Schlumberger, De Bruijn communication networks, Doctoral + Dissertation 1974. + + [340] M. Imase and M. Itoh, Design to minimize diameter on building- + block network, IEEE Trans. on Computers C-30 (6) (1981) 439- + 442. + + [341] S. M. Reddy, D. K. Pradhan, and J. G. Kuhl, Direct graphs with + minimal and maximal connectivity, Technical Report, School of + Engineering, Oakland University (1980) + + [342] R. A. Rowley and B. Bose, Fault-tolerant ring embedding in de + Bruijn networks, IEEE Trans. on Computers 42 (12) (1993) 1480- + 1486. + + [343] K. Y. Lee, G. Liu, and H. F. Jordan, Hierarchical networks for + optical communications, Journal of Parallel and Distributed + Computing 60 (2000) 1-16. + + [344] M. Naor and U. Wieder, Know thy neighbor's neighbor: better + routing for skip-graphs and small worlds, The 3rd Int'l + Workshop on Peer-to-Peer Systems, February 26-27 2004. + + [345] P. Fraigniaud and P. Gauron, The content-addressable networks + D2B, Technical Report 1349, Laboratoire de Recherche en + Informatique, January 2003. + + [346] A. Datta, S. Girdzijauskas, and K. Aberer, On de Bruijn routing + in distributed hash tables: there and back again, Proc. Fourth + IEEE Int'l Conf. on Peer-to-Peer Computing, , 25-27 August + 2004. + + [347] W. Pugh, Skip lists: a probabilistic alternative to balanced + trees, Proc. Workshop on Algorithms and Data Structures, August + 17-19 1989, pp. 437-449. + + + + +Risson & Moors Informational [Page 85] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [348] W. Pugh, Skip lists: a probabilistic alternative to balanced + trees, Communications of the ACM 33 (6) (1990) 668-676. + + [349] J. Gray, The transaction concept: Virtues and limitations, + Proc. VLDB, September 1981. + + [350] B. T. Loo, J. M. Hellerstein, R. Huebsch, S. Shenker, and I. + Stoica, Enhancing P2P file-sharing with internet-scale query + processor, Proc. 30th Int'l Conf. on Very Large Data Bases VLDB + 2004, 29 August-3 September 2004. + + [351] M. Stonebraker, P. Aoki, W. Litwin, A. Pfeffer, A. Sah, J. + Sidell, C. Staelin, and A. Yu, Mariposa: a wide-area + distributed database system, THE VLDB Journal - The Int'l + Journal of Very Large Data Bases (5) (1996) 48-63. + + [352] V. Cholvi, P. Felber, and E. Biersack, Efficient Search in + Unstructured Peer-to-Peer Networks, Proc. Symp. on Parallel + Algorithms and Architectures, July 2004. + + [353] S. Daswani and A. Fisk, Gnutella UDP Extension for Scalable + Searches (GUESS) v0.1, + http://www.limewire.org/fisheye/viewrep/~raw,r=1.2/limecvs/ + core/guess_01.html (2002) + + [354] A. Fisk, Gnutella Dynamic Query Protocol v0.1, Gnutella + Developer Forum (2003) + + [355] O. Gnawali, A Keyword Set Search System for Peer-to-Peer + Networks, Master's Thesis 2002. + + [356] Limewire, Limewire Host Count, + http://www.limewire.com/english/content/netsize.shtml (2004) + + [357] A. Fisk, Gnutella Ultrapeer Query Routing, + http://groups.yahoo.com/group/the_gdf/files/Proposals/ + Working%20Proposals/search/Ultrapeer%20QRP/ v0.1 (2003) + + [358] A. Fisk, Gnutella Dynamic Query Protocol, + http://groups.yahoo.com/group/the_gdf/files/Proposals/ + Working%20Proposals/search/Dynamic%20Querying/ v0.1 (2003) + + [359] S. Thadani, Meta Data searches on the Gnutella Network + (addendum), http://www.limewire.com/developer/MetaProposal2.htm + (2001) + + + + + + +Risson & Moors Informational [Page 86] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [360] S. Thadani, Meta Information Searches on the Gnutella Networks, + http://www.limewire.com/developer/metainfo_searches.html (2001) + + [361] P. Reynolds and A. Vahdat, Efficient peer-to-peer keyword + searching, ACM/IFP/USENIX Int'l Middleware Conference, + Middleware 2003, June 16-20 2003. + + [362] W. Terpstra, S. Behnel, L. Fiege, J. Kangasharju, and A. + Buchmann, Bit Zipper Rendezvous, optimal data placement for + general P2P queries, Proc. First Int'l Workshop on Peer-to-Peer + Computing and Databases, March 14 2004. + + [363] A. Singhal, Modern Information Retrieval: A Brief Overview, + IEEE Data Engineering Bulletin 24 (4) (2001) 35-43. + + [364] E. Cohen, A. Fiat, and H. Kaplan, Associative Search in Peer to + Peer Networks: Harnessing Latent Semantics, IEEE Infocom 2003, + The 22nd Annual Joint Conf. of the IEEE Computer and + Communications Societies, March 30-April 3 2003. + + [365] W. Muller and A. Henrich, Fast retrieval of high-dimensional + feature vectors in P2P networks using compact peer data + summaries, Proc. 5th ACM SIGMM international workshop on + Multimedia Information Retrieval, November 7 2003, pp. 79-86. + + [366] M. T. Ozsu and P. Valduriez, Principles of Distributed Database + Systems, 2nd edition ed. Prentice Hall, 1999. + + [367] G. Salton, A. Wong, and C. S. Yang, A vector space model for + automatic indexing, Communications of the ACM 18 (11) (1975) + 613- 620. + + [368] S. E. Robertson, S. Walker, and M. Beaulieu, Okapi at TREC-7: + automatic ad hoc, filtering, VLC and filtering tracks, Proc. + Seventh Text REtrieval Conference, TREC-7, NIST Special + Publication 500-242, July 1999, pp. 253-264. + + [369] A. Singhal, J. Choi, D. Hindle, D. Lewis, and F. Pereira, AT&T + at TREC-7, Proc. Seventh Text REtrieval Conf. TREC-7, July + 1999, pp. 253-264. + + [370] K. Sankaralingam, S. Sethumadhavan, and J. Browne, Distributed + Pagerank for P2P Systems, Proc. 12th international symposium on + High Performance Distributed Computing HPDC, June 22-24 2003. + + [371] I. Klampanos and J. Jose, An architecture for information + retrieval over semi-collaborated peer-to-peer networks, Proc. + 2004 ACM symposium on applied computing 2004, pp. 1078-1083. + + + +Risson & Moors Informational [Page 87] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [372] C. Tang, Z. Xu, and S. Dwarkadas, Peer-to-peer information + retrieval using self-organizing semantic overlay networks, + Proc. 2003 conference on Applications, Technologies, + Architectures and Protocols for Computer Communications, August + 25-29 2003, pp. 175-186. + + [373] C. Tang and S. Dwarkadas, Hybrid global-local indexing for + efficient peer-to-peer information retrieval, Proc. First Symp. + on Networked Systems Design and Implementation NSDI'04, March + 29-31 2004, pp. 211-224. + + [374] G. W. Furnas, S. Deerwester, S. T. Dumais, T. K. Landauer, R. + A. Harshman, L. A. Streeter, and K. E. Lochbaum, Information + retrieval using a singular value decomposition model of latent + semantic structure, Proc. 11th Annual Int'l ACM SIGIR Conf. on + Research and Development in Information Retrieval 1988, pp. + 465-480. + + [375] C. Tang, S. Dwarkadas, and Z. Xu, On scaling latent semantic + indexing for large peer-to-peer systems, The 27th Annual Int'l + ACM SIGIR Conf. SIGIR'04, ACM Special Interest Group on + Information Retrieval, July 2004. + + [376] S. Milgram, The small world problem, Psychology Today 1 (61) + (1967) + + [377] J. Kleinberg, The small-world phenonemon: An algorithmic + perspective, Proc. 32nd ACM Symp. on Theory of Computing (2000) + + [378] Y. Petrakis and E. Pitoura, "On constructing small worlds in + unstructured peer-to-peer systems," in Current trends in + database technology (Proc. First Int'l Workshop on Peer-to-Peer + Computing and Databases, Heraklion, Crete, Greece, March 14), + vol. 3268, Lecture Notes in Computer Science: Springer, 2004, + pp. 415-424. + + [379] A. Iamnitchi, M. Ripeanu, and I. Foster, Locating Data in + (Small World?) P2P Scientific Collaborations, First Int'l + Workshop on Peer-to-Peer Systems (IPTPS), Cambridge, MA, March + (2002) + + + + + + + + + + + +Risson & Moors Informational [Page 88] + +RFC 4981 Survey of Research on P2P Search September 2007 + + + [380] Y. Ren, C. Sha, W. Qian, A. Zhou, B. Ooi, and K. Tan, Explore + the "small world phenomena" in pure P2P information sharing + systems, Proc. 3rd IEEE/ACM Int'l Symp. on Cluster Computing + and the Grid (2003) 232-239. + + [381] G. S. Manku, M. Bawa, and P. Raghavan, Symphony: Distributed + Hashing in a Small World, Proc. 4th USENIX Symp. on Internet + Technologies and Systems, March 26-28 2003. + + [382] W. Litwin and S. Sahri, Implementing SD-SQL Server: a Scalable + Distributed Database System, CERIA Research Rerpot 2004-04-02, + April 2004. + + [383] M. Jarke and J. Koch, Query Optimization in Database Systems, + ACM Computing Surveys 16 (2) (1984) 111-152. + + [384] J. L. Bentley, Multidimensional binary search trees used for + associative searching, Communications of the ACM 18 (9) (1975) + 509-517. + + [385] B. Chun, I. Stoica, J. Hellerstein, R. Huebsch, S. Jeffery, B. + T. Loo, S. Mardanbeigi, T. Roscoe, S. Rhea, and S. Schenker, + Querying at Internet Scale, Proc. 2004 ACM SIGMOD international + conference on management of data, demonstration session 2004, + pp. 935-936. + + [386] P. Cao and Z. Wang, Efficient top-K query calculation in + distributed networks, Proc. 23rd Annual ACM SIGACT-SIGOPS Symp. + on Principles of Distributed Computing PODC 2004, July 25-28 + 2004, pp. 206-215. + + [387] D. Psaltoulis, I. Kostoulas, I. Gupta, K. Birman, and A. + Demers, Practical algorithms for size estimation in large and + dynamic groups, Proc. Twenty-Third Annual ACM SIGACT-SIGOPS + Symp. on Principles of Distributed Computing, PODC 2004, July + 25-28 2004. + + [388] R. van Renesse, The importance of aggregation, Springer-Verlag + Lecture Notes in Computer Science "Future Directions in + Distributed Computing". A. Schiper, A. A. Shvartsman, H. + Weatherspoon, and B. Y. Zhao, editors. Springer-Verlag, + Heidelberg volume 2584 (2003) + + + + + + + + + +Risson & Moors Informational [Page 89] + +RFC 4981 Survey of Research on P2P Search September 2007 + + +Author's Addresses + + John Risson + School of Elec Eng and Telecommunications + University of New South Wales + Sydney NSW 2052 Australia + + EMail: jr@tuffit.com + + + Tim Moors + School of Elec Eng and Telecommunications + University of New South Wales + Sydney NSW 2052 Australia + + EMail: t.moors@unsw.edu.au + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Risson & Moors Informational [Page 90] + +RFC 4981 Survey of Research on P2P Search September 2007 + + +Full Copyright Statement + + Copyright (C) The IETF Trust (2007). + + This document is subject to the rights, licenses and restrictions + contained in BCP 78, and except as set forth therein, the authors + retain all their rights. + + This document and the information contained herein are provided on an + "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS + OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND + THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS + OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF + THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED + WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Intellectual Property + + The IETF takes no position regarding the validity or scope of any + Intellectual Property Rights or other rights that might be claimed to + pertain to the implementation or use of the technology described in + this document or the extent to which any license under such rights + might or might not be available; nor does it represent that it has + made any independent effort to identify any such rights. Information + on the procedures with respect to rights in RFC documents can be + found in BCP 78 and BCP 79. + + Copies of IPR disclosures made to the IETF Secretariat and any + assurances of licenses to be made available, or the result of an + attempt made to obtain a general license or permission for the use of + such proprietary rights by implementers or users of this + specification can be obtained from the IETF on-line IPR repository at + http://www.ietf.org/ipr. + + The IETF invites any interested party to bring to its attention any + copyrights, patents or patent applications, or other proprietary + rights that may cover technology that may be required to implement + this standard. Please address the information to the IETF at + ietf-ipr@ietf.org. + + + + + + + + + + + + +Risson & Moors Informational [Page 91] + |