author Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committer Thomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit 4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
tree e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc2651.txt
parent ea76e11061bda059ae9f9ad130a9895cc85607db (diff)
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc2651.txt')
-rw-r--r-- doc/rfc/rfc2651.txt 1067
1 files changed, 1067 insertions, 0 deletions
diff --git a/doc/rfc/rfc2651.txt b/doc/rfc/rfc2651.txt
new file mode 100644
index 0000000..01c8909
--- /dev/null
+++ b/doc/rfc/rfc2651.txt
@@ -0,0 +1,1067 @@
+
+
+
+
+
+
+Network Working Group                                           J. Allen
+Request for Comments: 2651                                WebTV Networks
+Category: Standards Track                                    M. Mealling
+                                                 Network Solutions, Inc.
+                                                             August 1999
+
+
+ The Architecture of the Common Indexing Protocol (CIP)
+
+Status of this Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (1999). All Rights Reserved.
+
+Abstract
+
+ The Common Indexing Protocol (CIP) is used to pass indexing
+ information from server to server in order to facilitate query
+ routing. Query routing is the process of redirecting and replicating
+ queries through a distributed database system towards servers holding
+ the desired results. This document describes the CIP framework,
+ including its architecture and the protocol specifics of exchanging
+ indices.
+
+1. Introduction
+
+1.1. History and Motivation
+
+ The Common Indexing Protocol (CIP) is an evolution and refinement of
+ distributed indexing concepts first introduced in the Whois++
+ Directory Service [RFC1913, RFC1914]. While indexing proved useful in
+ that system to promote query routing, the centroid index object which
+ is passed among Whois++ servers is specifically designed for
+ template-based databases searchable by token-based matching. With
+ alternative index objects, the index-passing technology will prove
+ useful to many more application domains, not simply Directory
+ Services and those applications which can be cast into the form of
+ template collections.
+
+
+
+
+
+
+Allen & Mealling Standards Track [Page 1]
+
+RFC 2651 The CIP Architecture August 1999
+
+
+ The indexing part of Whois++ is integrated with the data access
+ protocol. The goal in designing CIP is to extract the indexing
+ portion of Whois++, while abstracting the index objects to apply more
+ broadly to information retrieval. In addition, another kind of
+ technology reuse has been undertaken by converting the ad-hoc data
+ representations used by Whois++ into structures based on the MIME
+ specification for structured Internet mail.
+
+ Whois++ used a version number field in centroid objects to facilitate
+ future growth. The initial version was "1". Version 1 of CIP (then
+ embedded in Whois++, and not referred to separately as CIP) had
+ support for only ISO-8859-1 characters, and for only the centroid
+ index object type.
+
+ Version 2 of the Whois++ centroid was used in the Digger software by
+ Bunyip Information Systems to notify recipients that the centroid
+ carried extra character set information. Digger's centroids can carry
+ UTF-8 encoded 16-bit Unicode characters, or ISO-8859-1 characters,
+ determined by a field in the headers.
+
+ This specification is for CIP version 3. Version 3 is a major
+ overhaul to the protocol. However, by use of a short negotiation
+ sequence, CIP version 3 servers can interoperate with earlier servers
+ in an index-passing mesh.
+
+ For unclear terms the reader is referred to the glossary in Appendix
+ A.
+
+1.2 CIP's place in the Information Retrieval world
+
+ CIP facilitates query routing. CIP is a protocol used between servers
+ in a network to pass hints which make data access by clients at a
+ later date more efficient. Query routing is the act of redirecting
+ and replicating queries through a distributed database system towards
+ the servers holding the actual results via reference to indexing
+ information.
+
+ CIP is a "backend" protocol -- it is implemented in and "spoken" only
+ among network servers. These same servers must also speak some kind
+ of data access protocol to communicate with clients. During query
+ resolution in the native protocol implementation, the server will
+ refer to the indexing information collected by the CIP implementation
+ for guidance on how to route the query.
+
+ Data access protocols used with CIP must have some provision for
+ control information in the form of a referral. The syntax and
+ semantics of these referrals are outside the scope of this
+ specification.
+
+2. Related Documents
+
+ This document is one of three documents. This document describes the
+ fundamental concepts and framework of CIP.
+
+ The document "MIME Object Definitions for the Common Indexing
+ Protocol" [CIP-MIME] describes the MIME objects that make up the
+ items that are passed by the transport system.
+
+ Requirements and examples of several transport systems are specified
+ in the "CIP Transport Protocols" [CIP-TRANSPORT] document.
+
+ A second set of documents describes the specifications for
+ specific index types.
+
+3. Architecture
+
+3.1 CIP in the Information Retrieval World
+
+3.1.1 Information Retrieval in the Abstract
+
+ In order to better understand how CIP fits into the information
+ retrieval world, we need to first understand the unifying abstract
+ features of existing information retrieval technology. Next, we
+ discuss why adding indexing technology to this model results in a
+ system capable of query routing, and why query routing is useful.
+
+ An abstract view of the client/server data retrieval process includes
+ data sets and data access protocols. An individual server is
+ responsible for handling queries over a fixed domain of data. For the
+ purposes of CIP, we call this domain of data the dataset. Clients
+ make searches in the dataset and retrieve parts of it via a data
+ access protocol. There are many data access protocols, each optimized
+ for the data in question. For instance, LDAP and Whois++ are access
+ protocols that reflect the needs of the directory services
+ application domain. Other data access protocols include HTTP and
+ Z39.50.
+
+3.1.2 Indexing Information Facilitates Query Routing
+
+ The above description reflects a world without indexing, where no
+ server knows about any other server. In some cases (as with X.500
+ referrals, and HTTP redirects) a server will, as part of its reply,
+ implicate another server in the process of resolving the query.
+ However, those servers generate replies based solely on their local
+ knowledge. When indexing information is introduced into a server's
+ local database, the server now knows not only answers based on the
+ local dataset, but also answers based on external indices. These
+ indices come from peer servers, via an indexing protocol. CIP is one
+ such indexing protocol.
+
+ Replies based on index information may not be the complete answer.
+ After all, an index is not a replicated version of the remote
+ dataset, but a possibly reduced version of it. Thus, in addition to
+ giving complete replies from the local dataset, the server may give
+ referrals to other datasets. These referrals are the core feature
+ necessary for effective query routing. When servers use CIP to pass
+ indices from server to server, they make a kind of investment. At the
+ cost of some resources to create, transmit and store the indices,
+ query routing becomes possible.
+
+ Query Routing is the process of replicating and moving a query closer
+ to datasets which can satisfy the query. In some distributed systems,
+ widely distributed searches must be accomplished by replicating the
+ query to all sub-datasets. This approach can be wasteful of resources
+ both in the network, and on the servers, and is thus sometimes
+ explicitly disabled. Using indexing in such a system opens the door
+ to more efficient distributed searching.
+
+ While CIP-equipped servers provide the referrals necessary to make
+ query routing work, it is always the client's responsibility to
+ collate, filter, and chase the referrals it receives. This gives the
+ end-user (or agent, in the case that there's no human user involved
+ in the search) greatest control over the query resolution process.
+ The cost of the added client complexity is weighed against the
+ benefits of total control over query resolution. In some cases, it
+ may also be possible to decouple the referral chasing from the client
+ by introducing a proxy, allowing existing simple clients to make use
+ of query routing. Such a proxy would transparently resolve referrals
+ into concrete results before returning them to the simple-minded
+ client.
+
+3.1.3 Abstracting the CIP index object
+
+ As useful as indices seem, the fact remains that not all queries can
+ benefit from the same type of index. For example, say the index
+ consists of a simple list of keywords. With such an index, it is
+ impossible to answer queries about whether two keywords were near one
+ another, or if a keyword was present in a certain context (for
+ instance, in the title).
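+
+ As a minimal sketch of this limitation (the function names and the
+ whitespace tokenization are assumptions made for illustration, not
+ part of CIP), a keyword-list index can be modelled in Python as a
+ set of tokens. Membership is the only question it can answer:
+
```python
def build_keyword_index(documents):
    """Collapse documents into the set of distinct lowercase tokens."""
    index = set()
    for text in documents:
        index.update(text.lower().split())
    return index


def may_contain(index, keyword):
    """Presence test: the only query a plain keyword list supports."""
    return keyword.lower() in index


# Two hypothetical documents; the index forgets which token came from where.
documents = ["Common Indexing Protocol", "query routing with referrals"]
index = build_keyword_index(documents)
print(may_contain(index, "indexing"), may_contain(index, "routing"))
```
+
+ Both "indexing" and "routing" are present in this index, but nothing
+ records that they came from different documents, so proximity and
+ context queries cannot be answered from it.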
+
+ Because of the need for application domain specific indices, CIP
+ index objects are abstract; they must be defined by a separate
+ specification. The basic protocols for moving index objects are
+ widely applicable, but the specific design of the index, and the
+
+ structure of the mesh of servers which pass a particular type of
+ index is dependent on the application domain. This document describes
+ only the protocols for moving indices among servers. Companion
+ documents describe initial index objects.
+
+ The requirements that index type specifications must address are
+ specified in the [CIP-MIME] document.
+
+3.2 Architectural Details
+
+ CIP implements index passing, providing the forward knowledge
+ necessary to generate the referrals used for query routing. The core
+ of the protocol is the index object. In the following sections, the
+ structure of the index objects themselves is presented. Next, how and
+ why indices are passed from server to server is discussed. Finally,
+ the circumstances under which a server may synthesize an index object
+ based on incoming ones are discussed.
+
+3.2.1 The CIP Index Object
+
+ A CIP index object is composed of two parts, the header and the
+ payload. The header contains metadata necessary to process and make
+ use of the index object being transmitted. The actual index resides
+ in the payload.
+
+ Three particular headers warrant specific mention at this point. The
+ "type" of the index object selects one of many distinct CIP index
+ object specifications which define exactly how the index blocks are
+ to be created, parsed and used to facilitate query routing. Another
+ header of note is the "DSI", or Dataset Identifier, which uniquely
+ identifies the dataset from which the index was created. Another
+ header that is crucial for generating referrals is the "Base-URI".
+ The URI (or URIs) contained in this header form the basis of any
+ referrals generated based on this index block. The URI is also used
+ as input during the index aggregation process to constrain the kinds
+ of aggregation possible, due to multiprotocol constraints. How that
+ URI is used is defined by the aggregation algorithm. The exact
+ syntax of these headers is specified in the CIP MIME specification
+ document [CIP-MIME].
+
+ The payload is opaque to CIP itself. It is defined exclusively by the
+ index object specification associated with the object's MIME type.
+ Specifications on how to parse and use the payload are published
+ separately as "CIP index object specifications". This abstract
+ definition of the index object forms the basis of CIP's applicability
+ to indexing needs across multiple application domains.
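+
+ The two-part structure can be sketched as a small Python record.
+ The field names, the "centroid" type label, and the example values
+ are hypothetical; the real header syntax is given in [CIP-MIME],
+ and this sketch does not attempt to reproduce it.
+
```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class IndexObject:
    index_type: str   # selects the index object specification
    dsi: str          # Dataset Identifier of the originating dataset
    base_uris: tuple  # one or more URIs on which referrals are based
    payload: bytes    # opaque to CIP; its meaning is set by index_type

    def schemes(self):
        """Protocols covered, read from the scheme of each Base-URI."""
        return frozenset(urlsplit(u).scheme for u in self.base_uris)


# Hypothetical values, for illustration only.
obj = IndexObject(
    index_type="centroid",
    dsi="1.3.6.1.4.1.99999.1",
    base_uris=("ldap://ldap.example.org/o=Example",),
    payload=b"opaque index data",
)
print(sorted(obj.schemes()))
```
+
+ Keeping the payload as uninterpreted bytes mirrors the text above:
+ only code written for a particular index type should look inside it.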
+
+ A precise definition of the content and form of a CIP index block can
+ be found in the Protocol document [CIP-MIME].
+
+3.2.2 Moving Index Objects: How to Build a Mesh
+
+ Indices are transmitted among servers participating in a CIP mesh. By
+ distributing this information in anticipation of a query, efficient,
+ accurate query routing is possible at the time a query arrives.
+
+ A CIP mesh is a set of CIP servers which pass indices of the same
+ type among themselves. Typically, a mesh is arranged in a
+ hierarchical tree fashion, with servers nearer the root of the tree
+ having larger and more comprehensive indices. See Figure 1. However,
+ a CIP mesh is explicitly allowed to have lateral links in it, and
+ there may be more than one part of the mesh that has the properties
+ of a "root". Mesh administrators are encouraged to avoid loops in the
+ system, but they are not obliged to maintain a strict tree structure.
+ Clients wishing to completely resolve all referrals they receive
+ should protect against referral loops while attempting to traverse
+ the mesh to avoid wasting time and network resources. See the
+ section on "Navigating the Mesh" for a discussion of this.
+
+       base level                 index                   index
+        directory                servers                 servers
+         servers                   for                     for
+                               base level              lower-level
+                                 servers              index servers
+         _______
+        |       |
+        |   A   |__
+        |_______|  \             _______
+                    \---CIP-----|       |
+         _______                |   D   |__
+        |       |   /---CIP-----|_______|  \              ______
+        |   B   |__/                        \--CIP-------|      |
+        |_______|                                        |  F   |
+                                            /--CIP-------|______|
+                                           /
+         _______                 _______  /
+        |       |               |       |-
+        |   C   |------CIP------|   E   |
+        |_______|               |_______|-
+                                    |     \
+                                    r      \
+         _______                    e       \             ______
+        |       |                   f        \--CIP------|      |
+        |   G   |------CIP----------e--------------------|  H   |
+        |_______|                   r                    |______|
+                \--referral---|     r       --referral-/
+                              |     a     |
+                              |     l     |
+                               \ 3  | 2   | 1
+                                \--------/
+                                |        |
+                                | client |
+                                |        |
+                                 --------
+
+
+ Figure 1: Sample layout of the Index Service mesh
+
+ All indices passed in a given mesh are assumed, as of this writing,
+ to be of the same type (i.e. governed by the same CIP index object
+ specification). It may be possible to create gateways between meshes
+ carrying different index objects, but at this time that process is
+ undefined and declared to be outside the scope of this specification.
+
+ In the case where a CIP server receives an index of a type that it
+ does not understand, it _can_ pass that index forward untouched. In
+ the case where a server implementation decides not to accept unknown
+ indices, it should return an appropriate error message to the server
+ sending the index. This behavior is to allow mesh implementations to
+ attempt heterogeneous meshes. As stated above, heterogeneous meshes
+ are considered to be ill-defined and as such should be considered
+ dangerous.
+
+ Experience suggests that this index passing activity should take
+ place among CIP servers as a parallel (and possibly lower-priority)
+ job to their primary job of answering queries. Index objects travel
+ among CIP servers by protocol exchanges explicitly defined in this
+ document, not via the server's native protocol. This distinction is
+ important, and bears repeating:
+
+ Queries are answered (and referrals are sent) via the native data
+ access protocol.
+
+ Index objects are transferred via alternative means, as defined by
+ this document.
+
+ When two servers cooperate to move indexing information, the pair are
+ said to be in a "polling relationship". The server that holds the
+ data of interest, and generates the index is called the "polled
+ server". The other server, which is the one that collects the
+ generated index, is the "polling server".
+
+ In a polling relationship, the polled server is responsible for
+ notifying the polling server when it has a new index that the polling
+ server might be interested in. In response, the polling server may
+ immediately pick up the index object, or it may schedule a job to
+ pick up a copy of the new index at a more convenient time. However, a
+ polling server is not required to wait on the polled server to notify
+ it of changes. The polling server can request a new index at any
+ time.
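+
+ The polling relationship can be sketched in Python as two
+ cooperating objects. The class names, method names, and the string
+ standing in for an index object are all assumptions made for this
+ illustration; the actual exchanges are defined in [CIP-TRANSPORT].
+
```python
class PolledServer:
    """Holds the data of interest and generates the index."""

    def __init__(self, name):
        self.name = name
        self.version = 0
        self.pollers = []

    def rebuild_index(self):
        """Produce a new index and notify every polling server."""
        self.version += 1
        for poller in self.pollers:
            poller.notify(self)

    def current_index(self):
        # A string stands in for a real index object.
        return f"{self.name}-index-v{self.version}"


class PollingServer:
    """Collects indices, on notification or on its own initiative."""

    def __init__(self, fetch_immediately):
        self.fetch_immediately = fetch_immediately
        self.scheduled = []
        self.indices = {}

    def notify(self, polled):
        if self.fetch_immediately:
            self.poll(polled)
        else:
            self.scheduled.append(polled)  # pick up at a better time

    def poll(self, polled):
        """May be called at any time, with or without a notification."""
        self.indices[polled.name] = polled.current_index()

    def run_scheduled(self):
        while self.scheduled:
            self.poll(self.scheduled.pop())
```
+
+ A polling server built with fetch_immediately=False only records
+ the notification; the index is picked up when run_scheduled() is
+ called, and poll() remains available at any time.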
+
+ Independent of the symmetric polling relationship, there's another
+ way that servers can pass indices using CIP. In an "index pushing"
+ relationship, a CIP server simply sends the index to a peer whenever
+ necessary, and allows the receiver to handle the index object as it
+ chooses. The receiving server may refuse it, may accept it, then
+ silently discard it, may accept only portions of it (by accepting it
+ as is, then filtering it), or may accept it without question.
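+
+ The four choices open to the receiver of a pushed index can be
+ sketched as follows. The PushPolicy names, the receive_push
+ function, and the modelling of an index as a list of entries are
+ hypothetical and serve only to enumerate the options named above.
+
```python
from enum import Enum


class PushPolicy(Enum):
    REFUSE = "refuse"
    ACCEPT_AND_DISCARD = "accept_and_discard"
    ACCEPT_FILTERED = "accept_filtered"
    ACCEPT = "accept"


def receive_push(policy, entries, keep=lambda entry: True):
    """Return (accepted, stored_entries) for a pushed index."""
    if policy is PushPolicy.REFUSE:
        return False, []
    if policy is PushPolicy.ACCEPT_AND_DISCARD:
        return True, []          # sender sees success; nothing is kept
    if policy is PushPolicy.ACCEPT_FILTERED:
        # Accepted as is, then filtered down to the wanted portion.
        return True, [e for e in entries if keep(e)]
    return True, list(entries)   # accepted without question
```
+
+ From the sender's side, the last three outcomes look the same: the
+ push was accepted.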
+
+ The index pushing relationship is intended for use by dumb leaf nodes
+ which simply want to make their index available to the global mesh of
+ servers, but have no interest in implementing the complete CIP
+ transaction protocol. It lowers the barriers to entry for CIP leaf
+ nodes. For more information on participating in a CIP mesh in this
+ restricted manner, see the section below on "Protocol Conformance".
+
+ CIP index passing operations take place across reliable transport
+ mechanisms, including both TCP connections and Internet mail
+ messages. The precise mechanisms are described in the Transport
+ document [CIP-TRANSPORT].
+
+3.2.3 Index Object Synthesis
+
+ From the preceding discussion, it should be clear that indexing
+ servers read and write index objects as they pass them around the
+ mesh. However, a CIP server need not simply pass the in-bound indices
+ through as the out-bound ones. While it is always permissible to pass
+ an index object through to other servers, a server may choose to
+ aggregate two or more of them, thereby reducing redundancy in the
+ index, at the cost of longer referral chains.
+
+ A basic premise of index passing is that even while collapsing a body
+ of data into an index by lossy compression methods, hints useful to
+ routing queries will survive in the resulting index. Since the index
+ is not a complete copy of the original dataset, it contains less
+ information. Index objects can be passed along unchanged, but as more
+ and more information collects in the resulting index object,
+ redundancy will creep in again, and it may prove useful to apply the
+ compression again, by aggregating two or more index objects into one.
+
+ This kind of aggregation should be performed without compromising the
+ ability to correctly route queries while avoiding excessive numbers
+ of missed results. The acceptable likelihood of false negatives must
+ be established on a per-application-domain basis, and is controlled
+ by the granularity of the index and the aggregation rules defined for
+ it by the particular specification.
+
+ However, when CIP is used in a multi-protocol application domain,
+ such as a Directory Service (with contenders including Whois++, LDAP,
+ and Ph), things get significantly trickier. The fundamental problem
+ is to avoid forcing a referral chain to pass through part of the mesh
+ which does not support the protocol by which that client made the
+ query. If this ever happens, the client loses access to any hits
+ beyond that point in the referral chain, since it cannot resolve the
+ referral in its native data access protocol. This is a failure of
+ query routing, which should be avoided.
+
+ In addition to multi-protocol considerations, server managers may
+ choose not to allow index object aggregation for performance reasons.
+ As referral chains lengthen, a client needs to perform more
+ transactions to resolve a query. As the number of transactions
+ increases, so do the user-perceived delays, the system loads, and the
+ global bandwidth demands. In general, there's a tradeoff between
+ aggressive aggregation (which leads to reductions in the indexing
+ overhead) and aggressive referral chain optimization. This tradeoff,
+ which is also sensitive to the particular application domain, needs
+ to be explored more in actual operational situations.
+
+ Conceptually, a CIP index server has several index objects on hand at
+ any given time. If it holds data in addition to indexing information,
+ the server has an index object formed from its own data, called the
+ "local index". It may have one or more indices from remote servers
+ which it has collected via the index passing mechanisms. These are
+ called "in-bound indices".
+
+ Implementor's Note: It may not be necessary to keep all of these
+ structures intact and distinct in the local database. It is also
+ not required to keep the out-bound index (or indices) built and
+ ready to distribute at all times. The previous paragraph merely
+ introduces a useful model for expressing the aggregation rules.
+ Implementors are free to model index objects internally however
+ they see fit.
+
+ The following two rules control how a CIP server formulates its
+ outgoing indices:
+
+ 1. An index server may pass any of the index objects in its local
+ index and its in-bound indices through unchanged to polling
+ servers.
+
+ 2. If and only if the following three conditions are true, an index
+ server can aggregate two or more index objects into a single new
+ index object, to be added to the set of out-bound indices.
+
+ a. Each index object to be aggregated covers exactly the same set
+ of protocols, as defined by the scheme component of the
+ Base-URIs in each index object.
+
+ b. The index server supports every one of the data access
+ protocols represented by the Base-URIs in the index objects to
+ be aggregated.
+
+ c. The specification for the index object type specified by the
+ type header of the index objects explicitly defines the
+ aggregation operation.
+
+ The resulting index object must have Base-URIs characteristic of
+ the local server for each protocol it supports. The outgoing
+ objects should have the DSI of the local server.
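+
+ The three conditions of rule 2 can be checked mechanically. The
+ sketch below is one way to express them in Python; the dictionary
+ layout, the function names, and the set of index types that define
+ aggregation are assumptions for illustration, and the sketch
+ presumes a homogeneous mesh as described in section 3.2.2.
+
```python
from urllib.parse import urlsplit


def schemes(base_uris):
    """Protocols covered, read from the scheme of each Base-URI."""
    return frozenset(urlsplit(uri).scheme for uri in base_uris)


def may_aggregate(index_objects, server_protocols, types_with_aggregation):
    """Decide whether two or more index objects may be aggregated.

    index_objects: list of dicts with "type" and "base_uris" keys.
    server_protocols: schemes of the protocols this server supports.
    types_with_aggregation: index types whose specification defines
        an aggregation operation.
    """
    if len(index_objects) < 2:
        return False
    covered = [schemes(obj["base_uris"]) for obj in index_objects]
    # (a) every object covers exactly the same set of protocols
    if any(c != covered[0] for c in covered):
        return False
    # (b) the server supports every one of those protocols
    if not covered[0] <= frozenset(server_protocols):
        return False
    # (c) the index type specification defines aggregation
    return all(obj["type"] in types_with_aggregation for obj in index_objects)
```
+
+ The sketch only decides whether aggregation is permitted. How the
+ payloads are merged is left to the index object specification.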
+
+4. Navigating the mesh
+
+ With the CIP infrastructure in place to manage index objects, the
+ only problem remaining is how to successfully use the indexing
+ information to do efficient searches. CIP facilitates query routing,
+ which is essentially a client activity. A client connects to one
+ server, which redirects the query to servers "closer to" the answer.
+ This redirection message is called a referral.
+
+4.1 The Referral
+
+ The concept of a referral and the mechanism for deciding when they
+ should be issued is described by CIP. However, the referral itself
+ must be transferred to the client in the native protocol, so its
+ syntax is not directly a CIP issue. The mechanism for deciding that a
+ referral needs to be made and generating that referral resides in the
+ CIP implementation in the server. The mechanism for sending the
+ referral to the client resides in the server's native protocol
+ implementation.
+
+ A referral is made when a search against the index objects held by
+ the server shows that there may be hits available in one of the
+ datasets represented by those index objects. If more than one index
+ object indicates that a referral must be generated to a given
+ dataset, the server should generate only one referral to the given
+ dataset, as the client may not be able to detect duplicates.
+
+ Though the format of the referral is dependent on the native
+ protocol(s) of the CIP server, the baseline contents of the referral
+ are constant across all protocols. At the least, a DSI and a URI must
+ be returned. The DSI is the DSI associated with the dataset which
+ caused the hit. This must be presented to the client so that it can
+ avoid referral loops. The Base-URI parameter which travels along with
+ index objects is used to provide the other required part of a
+ referral.
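+
+ A minimal sketch of referral generation, assuming that each
+ matching index object is reduced to a (DSI, Base-URI) pair; the
+ function name and the dictionary form of a referral are
+ hypothetical, since the wire format belongs to the native protocol.
+
```python
def make_referrals(hits):
    """Build one referral per dataset from matching index objects.

    hits: iterable of (dsi, base_uri) pairs, one per index object
    that indicated a possible match. Several index objects may point
    at the same dataset; only the first is kept.
    """
    referrals = {}
    for dsi, base_uri in hits:
        referrals.setdefault(dsi, {"dsi": dsi, "uri": base_uri})
    return list(referrals.values())
```
+
+ Deduplicating on the DSI at the server spares the client from
+ having to detect duplicate referrals itself.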
+
+ The additional information in the Base-URI may be necessary for the
+ server receiving the referred query to correctly handle it. A good
+ example of this is an LDAP server, which needs a base X.500
+ distinguished name from which to search. When an LDAP server sends a
+ centroid-format index object up to a CIP indexing server, it sends a
+ Base-URI along with the name of the X.500 subtree for which the index
+ was made. When a referral is made, the Base-URI is passed back to the
+ client so that it can pass it to the original LDAP server.
+
+ As usual, in addition to sending the DSI, a DSI-Description header
+ can be optionally sent. Because a client may attempt to check with
+ the user before chasing the referral, and because this string is the
+ friendliest representation of the DSI that CIP has to offer, it
+ should be included in referrals when available (i.e. when it was sent
+ along with the index object).
+
+4.2 Cross-protocol Mappings
+
+ Each data access protocol which uses CIP will need a clearly defined
+ set of rules to map queries in the native protocol to searches
+ against an index object. These rules will vary according to the data
+ domain. In principle, this could create a bit of a scaling
+ difficulty; for N protocols and M data domains, there would be N x M
+ mappings required. In practice, this should not be the case, since
+ some access protocols will be wholly unsuited to some data domains.
+ Consider, for example, an LDAP server trying to make a search in an
+ index object composed from unorganized text based pages. What would
+ the results be? How would the client make sense of the results?
+
+ However, as pre-existing protocols are connected to CIP, and as new
+ ones are developed to work with CIP, this issue must be examined. In
+ the case of Whois++ and the CENTROID index type, there is an
+ extremely close mapping, since the two were designed together. When
+ hooking LDAP to the CENTROID index type, it will be necessary to map
+ the attribute names used in the LDAP system to attribute names which
+ are already being used in the CENTROID mesh. It will also be
+ necessary to tokenize the LDAP queries under the same rules as the
+ CENTROID indexing policy, so that searches will take place correctly.
+ These application- and protocol-specific actions must be specified in
+ the index object specification, as discussed in the [CIP-MIME]
+ document.
+
+4.3 Moving through the mesh
+
+ From a client's point of view, CIP simply pushes all the "hard work"
+ onto its shoulders. After all, it is the client which needs to track
+ down the real data. While this is true, it is very misleading.
+ Because the client has control over the query routing process, the
+ client has significant control over the size of the result set, the
+ speed with which the query progresses, and the depth of the search.
+
+ The simplest client implementation provides referrals to the user in
+ a raw, ready-to-reuse form, without attempting to follow them. For
+ instance, one Whois++ client, which interacts with the user via a
+ Web-based form, simply makes referrals into HTML hypertext links.
+ Encoded in the link via the HTML forms interface GET encoding rules
+ is the data of the referral: the hostname, port, and query. If a user
+ chooses to follow the referral link, he executes a new search on the
+ new host. A more savvy client might present the referrals to the user
+ and ask which should be followed. And, assuming appropriate limits
+ were placed on search time and bandwidth usage, it might be
+ reasonable to program a client to follow all referrals automatically.
+
+ When following all referrals, a client must show a bit of
+ intelligence. Remember that the mesh is defined as an interconnected
+ graph of CIP servers. This graph may have cycles, which could cause
+ an infinite loop of referrals, wasting the servers' time and the
+ client's too. When faced with the job of tracking down all referrals,
+ a client must use some form of a mesh traversal algorithm. Such an
+ algorithm has been documented for use with Whois++ in RFC-1914. The
+ same algorithm can be easily used with this version of CIP. In
+ Whois++ the equivalent of a DSI is called a handle. With this
+ substitution, the Whois++ mesh traversal algorithm works unchanged
+ with CIP.
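+
+ The sketch below is not the RFC-1914 algorithm itself. It is a
+ generic breadth-first traversal that remembers which DSIs it has
+ already seen, which is enough to show how a client can avoid
+ referral loops. The resolve function and the query_server callback
+ are hypothetical stand-ins for the client's native protocol code.
+
```python
from collections import deque


def resolve(entry_dsi, entry_uri, query_server):
    """Follow all referrals from an entry point without looping.

    query_server(uri) must return (results, referrals), where
    referrals is a list of (dsi, uri) pairs.
    """
    seen = {entry_dsi}            # DSIs already queried or queued
    results = []
    queue = deque([entry_uri])
    while queue:
        uri = queue.popleft()
        hits, referrals = query_server(uri)
        results.extend(hits)
        for dsi, ref_uri in referrals:
            if dsi not in seen:   # skip datasets already visited
                seen.add(dsi)
                queue.append(ref_uri)
    return results
```
+
+ Each dataset is queried at most once, so the traversal terminates
+ even when the mesh contains cycles. A real client would also bound
+ the search time and bandwidth, as noted above.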
+
+ Finally, the mesh entry point (i.e. the first server queried) can
+ have an impact on the success of the query. To avoid scaling issues,
+ it is not acceptable to use a single "root" node, and force all
+ clients to connect to it. Instead, clients should connect to a
+ reasonably well connected (with respect to the CIP mesh, not the
+ Internet infrastructure) local server. If no match can be made from
+ this entry point, the client can expand the search by asking the
+ original server who polls it. In general, those servers will have a
+ better "vantage point" on the mesh, and will turn up answers that the
+ initial search didn't. The mechanism for dynamically determining the
+ mesh structure like this exists, but is not documented here for
+ brevity. See RFC-1913 for more information on the POLLED-BY and
+ POLLED-FOR commands.
+
+ It should still be noted that, while these mesh operations are
+ important to optimizing the searches that a client should make, the
+ client still speaks its native protocol. This information must be
+ communicated to the client without causing the client to have to
+ understand CIP.
+
+5. Security Considerations
+
+ In this section, we discuss the security considerations necessary
+ when making use of this specification. There are at least three
+ levels at which security considerations come into play. Indexing
+ information can leak undesirable amounts of proprietary information,
+ unless carefully controlled. At a more fundamental level, the CIP
+ protocol itself requires external security services to operate in a
+ safe manner. Lastly, CIP itself can be used to propagate false
+ information.
+
+5.1 Secure Indexing
+
+ CIP is designed to index all kinds of data. Some of this data might
+ be considered valuable, proprietary, or even highly sensitive by the
+ data maintainer. Take, for example, a human resources database.
+ Certain bits of data, in moderation, can be very helpful for a
+ company to make public. However, the database in its entirety is a
+ very valuable asset, which the company must protect. Much experience
+ has been gained in the directory service community over the years as
+ to how best to walk this fine line between completely revealing the
+ database and making useful pieces of it available. There are also
+ legal considerations regarding what data can be collected and shared.
+
+ Another example where security becomes a problem is a data
+ publisher who would like to participate in a CIP mesh. The data that
+ the publisher creates and manages is the prime asset of the company.
+ There is a financial incentive to participate in a CIP mesh, since
+ exporting indices of the data will make it more likely that people
+ will search the publisher's database. (Profiting from the search
+ activity is left as an exercise for the entrepreneur.) Once again,
+ the index must be designed carefully to protect the database while
+ providing a useful synopsis of the data.
+
+ One of the basic premises of CIP is that data providers will be
+ willing to provide indices of their data to peer indexing servers.
+ Unless they are carefully constructed, these indices could constitute
+ a threat to the security of the database. Thus, security of the data
+ must be a prime consideration when developing a new index object
+ type. The risk of reverse engineering a database based only on the
+ index exported from it must be kept to a level consistent with the
+ value of the data and the need for fine-grained indexing.
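+
+ As a rough illustration of this trade-off (not part of the
+ specification), the Python sketch below compares a fine-grained
+ index with a coarser one built from the same records. The record
+ layout, the helper names export_index and recoverable_values, and
+ the prefix-truncation scheme are all assumptions made for the
+ example; real index object types define their own payloads.
+

```python
# Hypothetical sketch: how index granularity affects what a peer can learn.
# The record layout and helper names are illustrative assumptions, not
# anything defined by CIP.

def export_index(records, prefix_len=None):
    """Build a duplicate-free set of (attribute, token) pairs.

    If prefix_len is given, each token is truncated to that many
    characters, which makes the exported index coarser.
    """
    index = set()
    for record in records:
        for attribute, value in record.items():
            for token in value.lower().split():
                if prefix_len is not None:
                    token = token[:prefix_len]
                index.add((attribute, token))
    return index


def recoverable_values(index, attribute):
    """Return the sorted tokens that a holder of the index can list."""
    return sorted(token for attr, token in index if attr == attribute)


records = [
    {"name": "Alice Smith", "dept": "Payroll"},
    {"name": "Bob Smythe", "dept": "Research"},
]

fine = export_index(records)                  # full tokens
coarse = export_index(records, prefix_len=2)  # two-letter prefixes only

print(recoverable_values(fine, "name"))    # every name token is exposed
print(recoverable_values(coarse, "name"))  # only short prefixes remain
```

+
+ In this sketch the coarse index exposes less of the underlying
+ data, but it will also match more queries that have no real hit,
+ which is the balance an index object designer has to strike.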
+
+ Lastly, mesh organizers should be aware that the insertion of false
+ data into a mesh can be used as part of an attack. Depending on the
+ type of mesh and aggregation algorithms, a falsified index can
+ selectively prune parts of a mesh. Also, since CIP is used to
+ discover information, it will be a target for the advertisement of
+ false information. CIP does not provide a method for trusting the
+ data that it carries.
+
+Acknowledgments
+
+ Thanks to the many helpful members of the FIND working group for
+ discussions leading to this specification.
+
+ Specific acknowledgment is given to Jeff Allen, formerly of Bunyip
+ Information Systems. His original version of these documents helped
+ enormously in crystallizing the debate and consensus. Most of the
+ actual text in this document was originally authored by Jeff. Jeff
+ is no longer involved with the FIND Working Group or with editing
+ this document. His authorship is preserved by a specific decision of
+ the current editor.
+
+Authors' Addresses
+
+ Jeff R. Allen
+ 246 Hawthorne St.
+ Palo Alto, CA 94301
+
+ EMail: jeff.allen@acm.org
+
+
+ Michael Mealling
+ Network Solutions, Inc.
+ 505 Huntmar Park Drive
+ Herndon, VA 22070
+
+ Phone: (703) 742-0400
+ EMail: michael.mealling@RWhois.net
+
+References
+
+ [RFC1913] Weider, C., Fullton, J. and S. Spero, "Architecture
+ of the Whois++ Index Service", RFC 1913, February
+ 1996.
+
+ [RFC1914] Faltstrom, P., Schoultz, R. and C. Weider, "How to
+ Interact with a Whois++ Mesh", RFC 1914, February
+ 1996.
+
+ [CIP-MIME] Allen, J. and M. Mealling, "MIME Object Definitions
+ for the Common Indexing Protocol (CIP)", RFC 2652,
+ August 1999.
+
+ [CIP-TRANSPORT] Allen, J. and P. Leach, "CIP Transport Protocols",
+ RFC 2653, August 1999.
+
+Appendix A: Glossary
+
+ application domain: A problem domain to which CIP is applied and
+ whose indexing requirements are not subsumed by any existing
+ problem domain. Separate application domains require separate
+ index object specifications, and potentially separate CIP meshes.
+ See index object specification.
+
+ centroid: An index object type used with Whois++. In CIP versions
+ before version 3, the index was not extensible, and could only
+ take the form of a centroid. A centroid is a list of (template
+ name, attribute name, token) tuples with duplicates removed.
+
+ dataset: A collection of data (real or virtual) over which an index
+ is created. When a CIP server aggregates two or more indices, the
+ resultant index represents the index of a "virtual dataset"
+ spanning the original datasets.
+
+ Dataset Identifier: An identifier chosen from any part of the
+ ISO/CCITT OID space which uniquely identifies a given dataset
+ among all datasets indexed by CIP.
+
+ DSI: See Dataset Identifier.
+
+ DSI-description: A human-readable string optionally carried along
+ with DSIs to make them more user-friendly. See Dataset
+ Identifier.
+
+ index: A summary or compressed form of a body of data. Examples
+ include a unique list of words, a codified full text analysis, a
+ set of keywords, etc.
+
+ index object: The embodiment of the indices passed by CIP. An index
+ object consists of some control attributes and an opaque payload.
+
+ index object specification: A document describing an index object
+ type for use with the CIP system described in this document. See
+ index object and payload.
+
+ index pushing: The act of presenting, unsolicited, an index to a
+ peer CIP server.
+
+ MIME: See Multipurpose Internet Mail Extensions.
+
+ Multipurpose Internet Mail Extensions: A set of rules for encoding
+ Internet Mail messages that gives them richer structure. CIP uses
+ MIME rules to simplify object encoding issues. MIME is specified
+ in RFC-1521 and RFC-1522.
+
+ payload: The application domain specific indexing information stored
+ inside an index object. The format of the payload is specified
+ externally to this document, and depends on the type of the
+ containing index object.
+
+ polled server: A CIP server which receives a request to generate and
+ pass an index to a peer server.
+
+ polling server: A CIP server which generates a request to a peer
+ server for its index.
+
+ referral chain: The set of referrals generated by the process of
+ routing a query. See query routing.
+
+ query routing: Based on reference to indexing information,
+ redirecting and replicating queries through a distributed database
+ system towards the servers holding the actual results.
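+
+ The centroid, dataset, and query routing entries above can be tied
+ together in a short Python sketch. It is only an illustration under
+ stated assumptions: the function names build_centroid, aggregate,
+ and route_query, the record layout, and the DSI strings (made-up
+ OID-style values) are hypothetical, and the MIME encoding described
+ in [CIP-MIME] is ignored entirely.
+

```python
# Hypothetical sketch of centroid building, aggregation, and query routing.
# All names, record layouts, and DSI strings are illustrative assumptions.

def build_centroid(records):
    """Return the set of (template, attribute, token) tuples for a dataset.

    Each record is a (template_name, {attribute: value}) pair. Using a
    set removes duplicates, as the centroid definition requires.
    """
    centroid = set()
    for template, attributes in records:
        for attribute, value in attributes.items():
            for token in value.lower().split():
                centroid.add((template, attribute, token))
    return centroid


def aggregate(*centroids):
    """Merge centroids into one index covering a "virtual dataset"."""
    merged = set()
    for centroid in centroids:
        merged |= centroid
    return merged


def route_query(indices, template, attribute, token):
    """Return the DSIs whose index contains the queried tuple.

    `indices` maps a DSI string to the centroid held for that dataset.
    The result lists the datasets worth referring the query to.
    """
    wanted = (template, attribute, token.lower())
    return sorted(dsi for dsi, centroid in indices.items() if wanted in centroid)


# Two base datasets, each identified by a made-up OID-style DSI.
indices = {
    "1.3.6.1.4.1.99999.1": build_centroid(
        [("user", {"name": "Alice Smith", "city": "Palo Alto"})]
    ),
    "1.3.6.1.4.1.99999.2": build_centroid(
        [("user", {"name": "Bob Smith", "city": "Herndon"})]
    ),
}

# An index server holding both centroids can decide where to send a query.
print(route_query(indices, "user", "name", "Smith"))
print(route_query(indices, "user", "city", "Herndon"))

# Aggregating the two centroids yields the index of a virtual dataset.
virtual = aggregate(*indices.values())
print(len(virtual))
```

+
+ Because the aggregated index no longer records which base dataset
+ contributed each tuple, a server that receives only the aggregate
+ can route a query toward the aggregating server but not directly
+ to the base datasets.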
+
+6. Full Copyright Statement
+
+ Copyright (C) The Internet Society (1999). All Rights Reserved.
+
+ This document and translations of it may be copied and furnished to
+ others, and derivative works that comment on or otherwise explain it
+ or assist in its implementation may be prepared, copied, published
+ and distributed, in whole or in part, without restriction of any
+ kind, provided that the above copyright notice and this paragraph are
+ included on all such copies and derivative works. However, this
+ document itself may not be modified in any way, such as by removing
+ the copyright notice or references to the Internet Society or other
+ Internet organizations, except as needed for the purpose of
+ developing Internet standards in which case the procedures for
+ copyrights defined in the Internet Standards process must be
+ followed, or as required to translate it into languages other than
+ English.
+
+ The limited permissions granted above are perpetual and will not be
+ revoked by the Internet Society or its successors or assigns.
+
+ This document and the information contained herein is provided on an
+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is currently provided by the
+ Internet Society.