1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
|
Network Working Group J. Allen
Request for Comments: 2651 WebTV Networks
Category: Standards Track M. Mealling
Network Solutions, Inc.
August 1999
The Architecture of the Common Indexing Protocol (CIP)
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (1999). All Rights Reserved.
Abstract
The Common Indexing Protocol (CIP) is used to pass indexing
information from server to server in order to facilitate query
routing. Query routing is the process of redirecting and replicating
queries through a distributed database system towards servers holding
the desired results. This document describes the CIP framework,
including its architecture and the protocol specifics of exchanging
indices.
1. Introduction
1.1. History and Motivation
The Common Indexing Protocol (CIP) is an evolution and refinement of
distributed indexing concepts first introduced in the Whois++
Directory Service [RFC1913, RFC1914]. While indexing proved useful in
that system to promote query routing, the centroid index object which
is passed among Whois++ servers is specifically designed for
template-based databases searchable by token-based matching. With
alternative index objects, the index-passing technology will prove
useful to many more application domains, not simply Directory
Services and those applications which can be cast into the form of
template collections.
Allen & Mealling Standards Track [Page 1]
^L
RFC 2651 The CIP Architecture August 1999
The indexing part of Whois++ is integrated with the data access
protocol. The goal in designing CIP is to extract the indexing
portion of Whois++, while abstracting the index objects to apply more
broadly to information retrieval. In addition, another kind of
technology reuse has been undertaken by converting the ad-hoc data
representations used by Whois++ into structures based on the MIME
specification for structured Internet mail.
Whois++ used a version number field in centroid objects to facilitate
future growth. The initial version was "1". Version 1 of CIP (then
embedded in Whois++, and not referred to separately as CIP) had
support for only ISO-8895-1 characters, and for only the centroid
index object type.
Version 2 of the Whois++ centroid was used in the Digger software by
Bunyip Information Systems to notify recipients that the centroid
carried extra character set information. Digger's centroids can carry
UTF-8 encoded 16-bit Unicode characters, or ISO-8859-1 characters,
determined by a field in the headers.
This specification is for CIP version 3. Version 3 is a major
overhaul to the protocol. However, by using of a short negotiation
sequence, CIP version 3 servers can interoperate with earlier servers
in an index-passing mesh.
For unclear terms the reader is referred to the glossary in Appendix
A.
1.2 CIP's place in the Information Retrieval world
CIP facilitates query routing. CIP is a protocol used between servers
in a network to pass hints which make data access by clients at a
later date more efficient. Query routing is the act of redirecting
and replicating queries through a distributed database system towards
the servers holding the actual results via reference to indexing
information.
CIP is a "backend" protocol -- it is implemented in and "spoken" only
among network servers. These same servers must also speak some kind
of data access protocol to communicate with clients. During query
resolution in the native protocol implementation, the server will
refer to the indexing information collected by the CIP implementation
for guidance on how to route the query.
Data access protocols used with CIP must have some provision for
control information in the form of a referral. The syntax and
semantics of these referrals are outside the scope of this
specification.
Allen & Mealling Standards Track [Page 2]
^L
RFC 2651 The CIP Architecture August 1999
2. Related Documents
This document is one of three documents. This document describes the
fundamental concepts and framework of CIP.
The document "MIME Object Definitions for the Common Indexing
Protocol" [CIP-MIME] describes the MIME objects that make up the
items that are passed by the transport system.
Requirements and examples of several transport systems are specified
in the "CIP Transport Protocols" [CIP-TRANSPORT] document.
A second set of document describe the various specifications for
specific index types.
3. Architecture
3.1 CIP in the Information Retrieval World
3.1.1 Information Retrieval in the Abstract
In order to better understand how CIP fits into the information
retrieval world, we need to first understand the unifying abstract
features of existing information retrieval technology. Next, we
discuss why adding indexing technology to this model results in a
system capable of query routing, and why query routing is useful.
An abstract view of the client/server data retrieval process includes
data sets and data access protocols. An individual server is
responsible for handling queries over a fixed domain of data. For the
purposes of CIP, we call this domain of data the dataset. Clients
make searches in the dataset and retrieve parts of it via a data
access protocol. There are many data access protocols, each optimized
for the data in question. For instance, LDAP and Whois++ are access
protocols that reflect the needs of the directory services
application domain. Other data access protocols include HTTP and
Z39.50.
3.1.2 Indexing Information Facilitates Query Routing
The above description reflects a world without indexing, where no
server knows about any other server. In some cases (as with X.500
referrals, and HTTP redirects) a server will, as part of its reply,
implicate another server in the process of resolving the query.
However, those servers generate replies based solely on their local
knowledge. When indexing information is introduced into a server's
local database, the server now knows not only answers based on the
Allen & Mealling Standards Track [Page 3]
^L
RFC 2651 The CIP Architecture August 1999
local dataset, but also answers based on external indices. These
indices come from peer servers, via an indexing protocol. CIP is one
such indexing protocol.
Replies based on index information may not be the complete answer.
After all, an index is not a replicated version of the remote
dataset, but a possibly reduced version of it. Thus, in addition to
giving complete replies from the local dataset, the server may give
referrals to other datasets. These referrals are the core feature
necessary for effective query routing. When servers use CIP to pass
indices from server to server, they make a kind of investment. At the
cost of some resources to create, transmit and store the indices,
query routing becomes possible.
Query Routing is the process of replicating and moving a query closer
to datasets which can satisfy the query. In some distributed systems,
widely distributed searches must be accomplished by replicating the
query to all sub-datasets. This approach can be wasteful of resources
both in the network, and on the servers, and is thus sometimes
explicitly disabled. Using indexing in such a system opens the door
to more efficient distributed searching.
While CIP-equipped servers provide the referrals necessary to make
query routing work, it is always the client's responsibility to
collate, filter, and chase the referrals it receives. This gives the
end-user (or agent, in the case that there's no human user involved
in the search) greatest control over the query resolution process.
The cost of the added client complexity is weighed against the
benefits of total control over query resolution. In some cases, it
may also be possible to decouple the referral chasing from the client
by introducing a proxy, allowing existing simple clients to make use
of query routing. Such a proxy would transparently resolve referrals
into concrete results before returning them to the simple-minded
client.
3.1.3 Abstracting the CIP index object
As useful as indices seem, the fact remains that not all queries can
benefit from the same type of index. For example, say the index
consists of a simple list of keywords. With such an index, it is
impossible to answer queries about whether two keywords were near one
another, or if a keyword was present in a certain context (for
instance, in the title).
Because of the need for application domain specific indices, CIP
index objects are abstract; they must be defined by a separate
specification. The basic protocols for moving index objects are
widely applicable, but the specific design of the index, and the
Allen & Mealling Standards Track [Page 4]
^L
RFC 2651 The CIP Architecture August 1999
structure of the mesh of servers which pass a particular type of
index is dependent on the application domain. This document describes
only the protocols for moving indices among servers. Companion
documents describe initial index objects.
The requirements that index type specifications must address are
specified in the [CIP-MIME] document.
3.2 Architectural Details
CIP implements index passing, providing the forward knowledge
necessary to generate the referrals used for query routing. The core
of the protocol is the index object. In the following sections, the
structure of the index objects themselves is presented. Next, how and
why indices are passed from server to server is discussed. Finally,
the circumstances under which a server may synthesize an index object
based on incoming ones are discussed.
3.2.1 The CIP Index Object
A CIP index object is composed of two parts, the header and the
payload. The header contains metadata necessary to process and make
use of the index object being transmitted. The actual index resides
in the payload.
Three particular headers warrant specific mention at this point. The
"type" of the index object selects one of many distinct CIP index
object specifications which define exactly how the index blocks are
to be created, parsed and used to facilitate query routing. Another
header of note is the "DSI", or Dataset Identifier, which uniquely
identifies the dataset from which the index was created. Another
header that is crucial for generating referrals is the "Base-URI".
The URI (or URI's) contained in this header form the basis of any
referrals generated based on this index block. The URI is also used
as input during the index aggregation process to constrain the kinds
of aggregation possible, due to multiprotocol constraints. How that
URI is used is defined by the aggregation algorithm. The exact
syntax of these headers is specified in the CIP MIME specification
document [CIP-MIME].
The payload is opaque to CIP itself. It is defined exclusively by the
index object specification associated with the object's MIME type.
Specifications on how to parse and use the payload are published
separately as "CIP index object specifications". This abstract
definition of the index object forms the basis of CIP's applicability
to indexing needs across multiple application domains.
Allen & Mealling Standards Track [Page 5]
^L
RFC 2651 The CIP Architecture August 1999
A precise definition of the content and form of a CIP index block can
be found in the Protocol document [CIP-MIME]
3.2.2 Moving Index Objects: How to Build a Mesh
Indices are transmitted among servers participating in a CIP mesh. By
distributing this information in anticipation of a query, efficient,
accurate query routing is possible at the time a query arrives.
A CIP mesh is a set of CIP servers which pass indices of the same
type among themselves. Typically, a mesh is arranged in a
hierarchical tree fashion, with servers nearer the root of the tree
having larger and more comprehensive indices. See Figure 1. However,
a CIP mesh is explicitly allowed to have lateral links in it, and
there may be more than one part of the mesh that has the properties
of a "root". Mesh administrators are encouraged to avoid loops in the
system, but they are not obliged to maintain a strict tree structure.
Clients wishing to completely resolve all referrals they receive
should protect against referral loops while attempting to traverse
the mesh to avoid wasting time and network resources. See the
section on "Navigating the Mesh" for a discussion of this.
Allen & Mealling Standards Track [Page 6]
^L
RFC 2651 The CIP Architecture August 1999
base level index index
directory servers servers
servers for for
base level lower-level
servers index servers
_______
| |
| A |__
|_______| \ _______
\---CIP----| |
_______ | D |__
| | /---CIP----|_______| \ ------
| B |__/ \--CIP------| |
|_______| | F |
/--CIP------|______|
/
_______ _______ /
| | | |-
| C |-------CIP----| E |
|_______| |_______|-
| \
r \
_______ e \ ______
| | f \--CIP-----| |
| G |-------CIP---------e------------------| H |
|_______| r |______|
\--referral---| r --referral-/
| a |
| l |
\ 3 | 2 | 1
\--------/
| |
| client |
| |
--------
Figure 1: Sample layout of the Index Service mesh
Allen & Mealling Standards Track [Page 7]
^L
RFC 2651 The CIP Architecture August 1999
All indices passed in a given mesh are assumed, as of this writing,
to be of the same type (i.e. governed by the same CIP index object
specification). It may be possible to create gateways between meshes
carrying different index objects, but at this time that process is
undefined and declared to be outside the scope of this specification.
In the case where a CIP server receives an index of a type that it
does not understand it _can_ pass that index forward untouched. In
the case where a server implementation decides not to accept unknown
indices it should return an appropriate error message to the server
sending the index. This behavior is to allow mesh implementations to
attempt heterogeneous meshes. As stated above heterogeneous meshes
are considered to be ill defined and as such should be considered
dangerous.
Experience suggests that this index passing activity should take
place among CIP servers as a parallel (and possibly lower-priority)
job to their primary job of answering queries. Index objects travel
among CIP servers by protocol exchanges explicitly defined in this
document, not via the server's native protocol. This distinction is
important, and bears repeating:
Queries are answered (and referrals are sent) via the native data
access protocol.
Index objects are transferred via alternative means, as defined by
this document.
When two servers cooperate to move indexing information, the pair are
said to be in a "polling relationship". The server that holds the
data of interest, and generates the index is called the "polled
server". The other server, which is the one that collects the
generated index, is the "polling server".
In a polling relationship, the polled server is responsible for
notifying the polling server when it has a new index that the polling
server might be interested in. In response, the polling server may
immediately pick up the index object, or it may schedule a job to
pick up a copy of the new index at a more convenient time. But, a
polling server is not required to wait on the polled server to notify
it of changes. The polling server can request a new index at any
time.
Independent of the symmetric polling relationship, there's another
way that servers can pass indices using CIP. In an "index pushing"
relationship, a CIP server simply sends the index to a peer whenever
necessary, and allows the receiver to handle the index object as it
Allen & Mealling Standards Track [Page 8]
^L
RFC 2651 The CIP Architecture August 1999
chooses. The receiving server may refuse it, may accept it, then
silently discard it, may accept only portions of it (by accepting it
as is, then filtering it), or may accept it without question.
The index pushing relationship is intended for use by dumb leaf nodes
which simply want to make their index available to the global mesh of
servers, but have no interest in implementing the complete CIP
transaction protocol. It lowers the barriers to entry for CIP leaf
nodes. For more information on participating in a CIP mesh in this
restricted manner, see the section below on "Protocol Conformance".
CIP index passing operations take place across a reliable transport
mechanisms, including both TCP connections, and Internet mail
messages. The precise mechanisms are described in the Transport
document [CIP-Transport].
3.2.3 Index Object Synthesis
From the preceding discussion, it should be clear that indexing
servers read and write index objects as they pass them around the
mesh. However, a CIP server need not simply pass the in-bound indices
through as the out-bound ones. While it is always permissible to pass
an index object through to other servers, a server may choose to
aggregate two or more of them, thereby reducing redundancy in the
index, at the cost of longer referral chains.
A basic premise of index passing is that even while collapsing a body
of data into an index by lossy compression methods, hints useful to
routing queries will survive in the resulting index. Since the index
is not a complete copy of the original dataset, it contains less
information. Index objects can be passed along unchanged, but as more
and more information collects in the resulting index object,
redundancy will creep in again, and it may prove useful to apply the
compression again, by aggregating two or more index objects into one.
This kind of aggregation should be performed without compromising the
ability to correctly route queries while avoiding excessive numbers
of missed results. The acceptable likelihood of false negatives must
be established on a per-application-domain basis, and is controlled
by the granularity of the index and the aggregation rules defined for
it by the particular specification.
However, when CIP is used in a multi-protocol application domain,
such as a Directory Service (with contenders including Whois++, LDAP,
and Ph), things get significantly trickier. The fundamental problem
is to avoid forcing a referral chain to pass through part of the mesh
which does not support the protocol by which that client made the
query. If this ever happens, the client loses access to any hits
Allen & Mealling Standards Track [Page 9]
^L
RFC 2651 The CIP Architecture August 1999
beyond that point in the referral chain, since it cannot resolve the
referral in its native data access protocol. This is a failure of
query routing, which should be avoided.
In addition to multi-protocol considerations, server managers may
choose not to allow index object aggregation for performance reasons.
As referral chains lengthen, a client needs to perform more
transactions to resolve a query. As the number of transactions
increases, so do the user-perceived delays, the system loads, and the
global bandwidth demands. In general, there's a tradeoff between
aggressive aggregation (which leads to reductions in the indexing
overhead) and aggressive referral chain optimization. This tradeoff,
which is also sensitive to the particular application domain, needs
to be explored more in actual operational situations.
Conceptually, a CIP index server has several index objects on hand at
any given time. If it holds data in addition to indexing information,
the server has an index object formed from its own data, called the
"local index". It may have one or more indices from remote servers
which it has collected via the index passing mechanisms. These are
called "in-bound indices".
Implementor's Note: It may not be necessary to keep all of these
structures intact and distinct in the local database. It is also
not required to keep the out-bound index (or indices) built and
ready to distribute at all times. The previous paragraph merely
introduces a useful model for expressing the aggregation rules.
Implementors are free to model index objects internally however
they see fit.
The following two rules control how a CIP server formulates its
outgoing indices:
1. An index server may pass any of the index objects in its local
index and its in-bound indices through unchanged to polling
servers.
2. If and only if the following three conditions are true, an index
server can aggregate two or more index objects into a single new
index object, to be added to the set of out-bound indices.
a. Each index object to be aggregated covers exactly the same set
of protocols, as defined by the scheme component of the Base-
URI's in each index object.
b. The index server supports every one of the data access
protocols represented by the Base-URI's in the index objects to
be aggregated.
Allen & Mealling Standards Track [Page 10]
^L
RFC 2651 The CIP Architecture August 1999
c. The specification for the index object type specified by the
type header of the index objects explicitly defines the
aggregation operation.
The resulting index object must have Base-URI's characteristic of
the local server for each protocol it supports. The outgoing
objects should have the DSI of the local server.
4. Navigating the mesh
With the CIP infrastructure in place to manage index objects, the
only problem remaining is how to successfully use the indexing
information to do efficient searches. CIP facilitates query routing,
which is essentially a client activity. A client connects to one
server, which redirects the query to servers "closer to" the answer.
This redirection message is called a referral.
4.1 The Referral
The concept of a referral and the mechanism for deciding when they
should be issued is described by CIP. However, the referral itself
must be transferred to the client in the native protocol, so its
syntax is not directly a CIP issue. The mechanism for deciding that a
referral needs to be made and generating that referral resides in the
CIP implementation in the server. The mechanism for sending the
referral to the client resides in the server's native protocol
implementation.
A referral is made when a search against the index objects held by
the server shows that there may be hits available in one of the
datasets represented by those index objects. If more that one index
object indicates that a referral must be generated to a given
dataset, the server should generate only one referral to the given
dataset, as the client may not be able to detect duplicates.
Though the format of the referral is dependent on the native
protocol(s) of the CIP server, the baseline contents of the referral
are constant across all protocols. At the least, a DSI and a URI must
be returned. The DSI is the DSI associated with the dataset which
caused the hit. This must be presented to the client so that it can
avoid referral loops. The Base-URI parameter which travels along with
index objects is used to provide the other required part of a
referral.
The additional information in the Base-URI may be necessary for the
server receiving the referred query to correctly handle it. A good
example of this is an LDAP server, which needs a base X.500
distinguished name from which to search. When an LDAP server sends a
Allen & Mealling Standards Track [Page 11]
^L
RFC 2651 The CIP Architecture August 1999
centroid-format index object up to a CIP indexing server, it sends a
Base-URI along with the name of the X.500 subtree for which the index
was made. When a referral is made, the Base-URI is passed back to the
client so that it can pass it to the original LDAP server.
As usual, in addition to sending the DSI, a DSI-Description header
can be optionally sent. Because a client may attempt to check with
the user before chasing the referral, and because this string is the
friendliest representation of the DSI that CIP has to offer, it
should be included in referrals when available (i.e. when it was sent
along with the index object).
4.2 Cross-protocol Mappings
Each data access protocol which uses CIP will need a clearly defined
set of rules to map queries in the native protocol to searches
against an index object. These rules will vary according to the data
domain. In principle, this could create a bit of a scaling
difficulty; for N protocols and M data domains, there would be N x M
mappings required. In practice, this should not be the case, since
some access protocols will be wholly unsuited to some data domains.
Consider for example, a LDAP server trying to make a search in an
index object composed from unorganized text based pages. What would
the results be? How would the client make sense of the results?
However, as pre-existing protocols are connected to CIP, and as new
ones are developed to work with CIP, this issue must be examined. In
the case of Whois++ and the CENTROID index type, there is an
extremely close mapping, since the two were designed together. When
hooking LDAP to the CENTROID index type, it will be necessary to map
the attribute names used in the LDAP system to attribute names which
are already being used in the CENTROID mesh. It will also be
necessary to tokenize the LDAP queries under the same rules as the
CENTROID indexing policy, so that searches will take place correctly.
These application- and protocol-specific actions must be specified in
the index object specification, as discussed in the [CIP-MIME]
document.
4.3 Moving through the mesh
From a client's point of view, CIP simply pushes all the "hard work"
onto its shoulders. After all, it is the client which needs to track
down the real data. While this is true, it is very misleading.
Because the client has control over the query routing process, the
client has significant control over the size of the result set, the
speed with which the query progresses, and the depth of the search.
Allen & Mealling Standards Track [Page 12]
^L
RFC 2651 The CIP Architecture August 1999
The simplest client implementation provides referrals to the user in
a raw, ready-to-reuse form, without attempting to follow them. For
instance, one Whois++ client, which interacts with the user via a
Web-based form, simply makes referrals into HTML hypertext links.
Encoded in the link via the HTML forms interface GET encoding rules
is the data of the referral: the hostname, port, and query. If a user
chooses to follow the referral link, he executes a new search on the
new host. A more savvy client might present the referrals to the user
and ask which should be followed. And, assuming appropriate limits
were placed on search time and bandwidth usage, it might be
reasonable to program a client to follow all referrals automatically.
When following all referrals, a client must show a bit of
intelligence. Remember that the mesh is defined as an interconnected
graph of CIP servers. This graph may have cycles, which could cause
an infinite loop of referrals, wasting the servers' time and the
client's too. When faced with the job of tacking down all referrals,
a client must use some form of a mesh traversal algorithm. Such an
algorithm has been documented for use with Whois++ in RFC-1914. The
same algorithm can be easily used with this version of CIP. In
Whois++ the equivalent of a DSI is called a handle. With this
substitution, the Whois++ mesh traversal algorithm works unchanged
with CIP.
Finally, the mesh entry point (i.e. the first server queried) can
have an impact on the success of the query. To avoid scaling issues,
it is not acceptable to use a single "root" node, and force all
clients to connect to it. Instead, clients should connect to a
reasonably well connected (with respect to the CIP mesh, not the
Internet infrastructure) local server. If no match can be made from
this entry point, the client can expand the search by asking the
original server who polls it. In general, those servers will have a
better "vantage point" on the mesh, and will turn up answers that the
initial search didn't. The mechanism for dynamically determining the
mesh structure like this exists, but is not documented here for
brevity. See RFC-1913 for more information on the POLLED-BY and
POLLED-FOR commands.
It still should be noted that, while these mesh operations are
important to optimizing the searches that a client should make, the
client still speaks its native protocol. This information must be
communicated to the client without causing the client to have to
understand CIP.
Allen & Mealling Standards Track [Page 13]
^L
RFC 2651 The CIP Architecture August 1999
5. Security Considerations
In this section, we discuss the security considerations necessary
when making use of this specification. There are at least three
levels at which security considerations come into play. Indexing
information can leak undesirable amounts of proprietary information,
unless carefully controlled. At a more fundamental level, the CIP
protocol itself requires external security services to operate in a
safe manner. Lastly, CIP itself can be used to propogate false
information.
5.1 Secure Indexing
CIP is designed to index all kinds of data. Some of this data might
be considered valuable, proprietary, or even highly sensitive by the
data maintainer. Take, for example, a human resources database.
Certain bits of data, in moderation, can be very helpful for a
company to make public. However, the database in its entirety is a
very valuable asset, which the company must protect. Much experience
has been gained in the directory service community over the years as
to how best to walk this fine line between completely revealing the
database and making useful pieces of it available. There are also
legal considerations regarding what data can be collected and shared.
Another example where security becomes a problem is for a data
publisher who'd like to participate in a CIP mesh. The data that
publisher creates and manages is the prime asset of the company.
There is a financial incentive to participate in a CIP mesh, since
exporting indices of the data will make it more likely that people
will search your database. (Making profit off of the search activity
is left as an exercise to the entrepreneur.) Once again, the index
must be designed carefully to protect the database while providing a
useful synopsis of the data.
One of the basic premises of CIP is that data providers will be
willing to provide indices of their data to peer indexing servers.
Unless they are carefully constructed, these indices could constitute
a threat to the security of the database. Thus, security of the data
must be a prime consideration when developing a new index object
type. The risk of reverse engineering a database based only on the
index exported from it must be kept to a level consistent with the
value of the data and the need for fine-grained indexing.
Lastly, mesh organizers should be aware that the insertion of false
data into a mesh can be used as part of an attack. Depending on the
type of mesh and aggregation algorithms, an index can selectivly
prune parts of a mesh. Also, since CIP is used to discover
Allen & Mealling Standards Track [Page 14]
^L
RFC 2651 The CIP Architecture August 1999
information, it will be the target for the advertisement of false
information. CIP does not provide a method for trusting the data that
it contains.
Acknowledgments
Thanks to the many helpful members of the FIND working group for
discussions leading to this specification.
Specific acknowledgment is given to Jeff Allen formerly of Bunyip
Information Systems. His original version of these documents helped
enormously in crystallizing the debate and consensus. Most of the
actual text in this document was originally authored by Jeff. Jeff
is no longer involved with the FIND Working Group or with editing
this document. His authorship is preserved by a specific decision of
the current editor.
Authors' Addresses
Jeff R. Allen
246 Hawthorne St.
Palo Alto, CA 94301
EMail: jeff.allen@acm.org
Michael Mealling
Network Solutions, Inc.
505 Huntmar Park Drive
Herndon, VA 22070
Phone: (703) 742-0400
EMail: michael.mealling@RWhois.net
Allen & Mealling Standards Track [Page 15]
^L
RFC 2651 The CIP Architecture August 1999
References
[RFC1913] Weider, C., Fullton, J. and S. Spero, "Architecture
of the Whois++Index Service", RFC 1913, February
1996.
[RFC1914] Faltstrom, P., Schoultz, R. and C. Weider, "How to
Interact with a Whois++ Mesh", RFC 1914, February
1996.
[CIP-MIME] Allen, J. and M. Mealling, "MIME Object Definitions
for the Common Indexing Protocol (CIP)", RFC 2652,
August 1999.
[CIP-TRANSPORT] Allen, J. and P. Leach, "CIP Transport Protocols",
RFC 2653, August 1999.
Allen & Mealling Standards Track [Page 16]
^L
RFC 2651 The CIP Architecture August 1999
Appendix A: Glossary
application domain: A problem domain to which CIP is applied which
has indexing requirements which are not subsumed by any existing
problem domain. Separate application domains require separate
index object specifications, and potentially separate CIP meshes.
See index object specification.
centroid: An index object type used with Whois++. In CIP versions
before version 3, the index was not extensible, and could only
take the form of a centroid. A centroid is a list of (template
name, attribute name, token) tuples with duplicate removed.
dataset: A collection of data (real or virtual) over which an index
is created. When a CIP server aggregates two or more indices, the
resultant index represents the index from a "virtual dataset",
spanning the previous two datasets.
Dataset Identifier: An identifier chosen from any part of the
ISO/CCITT OID space which uniquely identifies a given dataset
among all datasets indexed by CIP.
DSI: See Dataset Identifier.
DSI-description: A human readable string optionally carried along
with DSI's to make them more user-friendly. See dataset
Identifier.
index: A summary or compressed form of a body of data. Examples
include a unique list of words, a codified full text analysis, a
set of keywords, etc.
index object: The embodiment of the indices passed by CIP. An index
object consists of some control attributes and an opaque payload.
index object specification: A document describing an index object
type for use with the CIP system described in this document. See
index object and payload.
index pushing: The act of presenting, unsolicited, an index to a
peer CIP server.
MIME: see Multipurpose Internet Mail Extensions
Allen & Mealling Standards Track [Page 17]
^L
RFC 2651 The CIP Architecture August 1999
Multipurpose Internet Mail Extensions: A set of rules for encoding
Internet Mail messages that gives them richer structure. CIP uses
MIME rules to simplify object encoding issues. MIME is specified
in RFC-1521 and RFC-1522.
payload: The application domain specific indexing information stored
inside an index object. The format of the payload is specified
externally to this document, and depends on the type of the
containing index object.
polled server: A CIP server which receives a request to generate and
pass an index to a peer server.
polling server: A CIP server which generates a request to a peer
server for its index.
referral chain: The set of referrals generated by the process of
routing a query. See query routing.
query routing: Based on reference to indexing information,
redirecting and replicating queries through a distributed database
system towards the servers holding the actual results.
Allen & Mealling Standards Track [Page 18]
^L
RFC 2651 The CIP Architecture August 1999
6. Full Copyright Statement
Copyright (C) The Internet Society (1999). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any
kind, provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of
developing Internet standards in which case the procedures for
copyrights defined in the Internet Standards process must be
followed, or as required to translate it into languages other than
English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.
Allen & Mealling Standards Track [Page 19]
^L
|