Network Working Group                                         D. Wessels
Request for Comments: 2187                                     K. Claffy
Category: Informational                  National Laboratory for Applied
                                                   Network Research/UCSD
                                                          September 1997
Application of Internet Cache Protocol (ICP), version 2
Status of this Memo
This memo provides information for the Internet community. This memo
does not specify an Internet standard of any kind. Distribution of
this memo is unlimited.
Abstract
This document describes the application of ICPv2 (Internet Cache
Protocol version 2, RFC2186) to Web caching. ICPv2 is a lightweight
message format used for communication among Web caches. Several
independent caching implementations now use ICP[3,5], making it
important to codify the existing practical uses of ICP for those
trying to implement, deploy, and extend its use.
ICP queries and replies refer to the existence of URLs (or objects)
in neighbor caches. Caches exchange ICP messages and use the
gathered information to select the most appropriate location from
which to retrieve an object. A companion document (RFC2186)
describes the format and syntax of the protocol itself. In this
document we focus on issues of ICP deployment, efficiency, security,
and interaction with other aspects of Web traffic behavior.
Table of Contents
1. Introduction................................................. 2
2. Web Cache Hierarchies........................................ 3
3. What is the Added Value of ICP?.............................. 5
4. Example Configuration of ICP Hierarchy....................... 5
4.1. Configuring the `proxy.customer.org' cache................. 6
4.2. Configuring the `cache.isp.com' cache...................... 6
5. Applying the Protocol........................................ 7
5.1. Sending ICP Queries........................................ 8
5.2. Receiving ICP Queries and Sending Replies.................. 10
5.3. Receiving ICP Replies...................................... 11
5.4. ICP Options................................................ 13
6. Firewalls.................................................... 14
7. Multicast.................................................... 14
8. Lessons Learned.............................................. 16
8.1. Differences Between ICP and HTTP........................... 16
Wessels & Claffy Informational [Page 1]
^L
RFC 2187 ICP September 1997
8.2. Parents, Siblings, Hits and Misses......................... 16
8.3. Different Roles of ICP..................................... 17
8.4. Protocol Design Flaws of ICPv2............................. 17
9. Security Considerations...................................... 18
9.1. Inserting Bogus ICP Queries................................ 19
9.2. Inserting Bogus ICP Replies................................ 19
9.3. Eavesdropping.............................................. 20
9.4. Blocking ICP Messages...................................... 20
9.5. Delaying ICP Messages...................................... 20
9.6. Denial of Service.......................................... 20
9.7. Altering ICP Fields........................................ 21
9.8. Summary.................................................... 22
10. References................................................... 23
11. Acknowledgments.............................................. 24
12. Authors' Addresses........................................... 24
1. Introduction
ICP is a lightweight message format used for communicating among Web
caches. ICP is used to exchange hints about the existence of URLs in
neighbor caches. Caches exchange ICP queries and replies to gather
information for use in selecting the most appropriate location from
which to retrieve an object.
This document describes the implementation of ICP in software. For a
description of the protocol and message format, please refer to the
companion document (RFC2186). We avoid making judgments about
whether or how ICP should be used in particular Web caching
configurations. ICP may be a "net win" in some situations, and a
"net loss" in others. We recognize that certain practices described
in this document are suboptimal. Some of these exist for historical
reasons. Some aspects have been improved in later versions. Since
this document only serves to describe current practices, we focus on
documenting rather than evaluating. However, we do address known
security problems and other shortcomings.
The remainder of this document is organized as follows. We first
describe Web cache hierarchies, explain the motivation for using ICP,
and demonstrate how to configure its use in cache hierarchies. We then
provide a step-by-step description of an ICP query-response
transaction. We then discuss ICP interaction with firewalls, and
briefly touch on multicasting ICP. We end with lessons we have
learned during protocol development and deployment thus far, and
the canonical security considerations.
ICP was initially developed by Peter Danzig et al. at the
University of Southern California as a central part of hierarchical
caching in the Harvest research project[3].
2. Web Cache Hierarchies
A single Web cache will reduce the amount of traffic generated by the
clients behind it. Similarly, a group of Web caches can benefit by
sharing another cache in much the same way. Researchers on the
Harvest project envisioned that it would be important to connect Web
caches hierarchically. In a cache hierarchy (or mesh) one cache
establishes peering relationships with its neighbor caches. There
are two types of relationship: parent and sibling. A parent cache is
essentially one level up in a cache hierarchy. A sibling cache is on
the same level. The terms "neighbor" and "peer" are used to refer to
either parents or siblings which are a single "cache-hop" away.
Figure 1 shows a simple hierarchy configuration.
But what does it mean to be "on the same level" or "one level up?"
The general flow of document requests is up the hierarchy. When a
cache does not hold a requested object, it may ask via ICP whether
any of its neighbor caches has the object. If any of the neighbors
does have the requested object (i.e., a "neighbor hit"), then the
cache will request it from them. If none of the neighbors has the
object (a "neighbor miss"), then the cache must forward the request
either to a parent, or directly to the origin server. The essential
difference between a parent and sibling is that a "neighbor hit" may
be fetched from either one, but a "neighbor miss" may NOT be fetched
from a sibling. In other words, in a sibling relationship, a cache
can only ask to retrieve objects that the sibling already has cached,
whereas the same cache can ask a parent to retrieve any object
regardless of whether or not it is cached. A parent cache's role is
to provide "transit" for the request if necessary, and accordingly
parent caches are ideally located within or on the way to a transit
Internet service provider (ISP).

     T H E   I N T E R N E T
   ===========================
        |         ||
        |         ||
        |         ||
        |         ||
        |   +----------------------+
        |   |                      |
        |   |        PARENT        |
        |   |        CACHE         |
        |   |                      |
        |   +----------------------+
        |         ||
      DIRECT      ||
    RETRIEVALS    ||
        |         ||
        |        HITS
        |        AND
        |       MISSES
        |      RESOLVED
        |         ||
        |         ||
        |         ||
        V         \/
   +------------------+                    +------------------+
   |                  |                    |                  |
   |      LOCAL       |/--------HITS-------|     SIBLING      |
   |      CACHE       |\------RESOLVED-----|      CACHE       |
   |                  |                    |                  |
   +------------------+                    +------------------+
      |  |  |  |  |
      |  |  |  |  |
      |  |  |  |  |
      V  V  V  V  V
   ===================
      CACHE CLIENTS

FIGURE 1: A Simple Web cache hierarchy. The local cache can retrieve
hits from sibling caches, hits and misses from parent caches, and
some requests directly from origin servers.
Squid and Harvest allow for complex hierarchical configurations. For
example, one could specify that a given neighbor be used for only a
certain class of requests, such as URLs from a specific DNS domain.
Additionally, it is possible to treat a neighbor as a sibling for
some requests and as a parent for others.
The cache hierarchy model described here includes a number of
features to prevent top-level caches from becoming choke points. One
is the ability to restrict parents as just described (by
domains). Another optimization is that the cache only forwards
cachable requests to its neighbors. A large class of Web requests
is inherently uncachable, including: requests requiring certain
types of authentication, session-encrypted data, highly personalized
responses, and certain types of database queries. Lower level caches
should handle these requests directly rather than burdening parent
caches.
3. What is the Added Value of ICP?
Although it is possible to maintain cache hierarchies without using
ICP, the lack of ICP or something similar prohibits the existence of
sibling meta-communicative relationships, i.e., mechanisms to query
nearby caches about a given document.
One concern over the use of ICP is the additional delay that an ICP
query/reply exchange contributes to an HTTP transaction. However, if
the ICP query can locate the object in a nearby neighbor cache, then
the ICP delay may be more than offset by the faster delivery of the
data from the neighbor. In order to minimize ICP delays, the caches
(as well as the protocol itself) are designed to answer ICP queries
quickly. Indeed, the application does minimal processing of the ICP
query; most ICP-related delay is due to transmission on the
network.
ICP also serves to provide an indication of neighbor reachability.
If ICP replies from a neighbor fail to arrive, then either the
network path is congested (or down), or the cache application is not
running on the ICP-queried neighbor machine. In either case, the
cache should not use this neighbor at this time. Additionally,
because an idle cache can turn around the replies faster than a busy
one, all other things being equal, ICP provides some form of load
balancing.
4. Example Configuration of ICP Hierarchy
Configuring caches within a hierarchy requires establishing peering
relationships, which currently involves manual configuration at both
peering endpoints. One cache must indicate that the other is a
parent or sibling. The other cache will most likely have to add the
first cache to its access control lists.
Below we show some sample configuration lines for a hypothetical
situation. We have two caches, one operated by an ISP, and another
operated by a customer. First we describe how the customer would
configure his cache to peer with the ISP. Second, we describe how
the ISP would allow the customer access to its cache.
4.1. Configuring the `proxy.customer.org' cache
In Squid, to configure parents and siblings in a hierarchy, a
`cache_host' directive is entered into the configuration file. The
format is:
cache_host hostname type http-port icp-port [options]
Where type is either `parent', `sibling', or `multicast'. For our
example, it would be:
cache_host cache.isp.com parent 8080 3130
This configuration will cause the customer cache to resolve most
cache misses through the parent (`cgi-bin' and non-GET requests would
be resolved directly). Utilizing the parent may be undesirable for
certain servers, such as servers also in the customer.org domain. To
always handle such local domains directly, the customer would add
this to his configuration file:
local_domain customer.org
It may also be the case that the customer wants to use the ISP cache
only for a specific subset of DNS domains. The need to limit
requests this way is actually more common for higher levels of cache
hierarchies, but it is illustrated here nonetheless. To limit the
ISP cache to a subset of DNS domains, the customer would use:
cache_host_domain cache.isp.com com net org
Then, any requests which are NOT in the .com, .net, or .org domains
would be handled directly.
4.2. Configuring the `cache.isp.com' cache
To configure the query-receiving side of the cache peer
relationship one uses access lists, similar to those used in routing
peers. The access lists support a large degree of customization in
the peering relationship. If there are no access lines present, the
cache allows the request by default.
Note that the cache.isp.com cache need not explicitly specify the
customer cache as a peer, nor is the type of relationship encoded
within the ICP query itself. The access control entries regulate the
relationships between this cache and its neighbors. For our example,
the ISP would use:
acl Customer src proxy.customer.org
http_access allow Customer
icp_access allow Customer
This defines an access control entry named `Customer' which specifies
a source IP address of the customer cache machine. The customer
cache would then be allowed to make any request to both the HTTP and
ICP ports (including cache misses). This configuration implies that
the ISP cache is a parent of the customer.
If the ISP wanted to enforce a sibling relationship, it would need to
deny access to cache misses. This would be done as follows:
miss_access deny Customer
Of course the ISP should also communicate this to the customer, so
that the customer will change his configuration from parent to
sibling. Otherwise, if the customer requests an object not in the
ISP cache, an error message is generated.
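The effect of these access lists can be sketched in Python. This is a simplified model under stated assumptions, not Squid's implementation: `PeerPolicy' and `handle_http_request' are hypothetical names, and a real cache matches each request against its ACL entries rather than consulting three boolean fields.

```python
from dataclasses import dataclass


@dataclass
class PeerPolicy:
    """Hypothetical per-client policy distilled from the access lists."""
    http_access: bool = True   # no access lines present: allowed by default
    icp_access: bool = True
    miss_access: bool = True   # False models `miss_access deny Customer'


def handle_http_request(policy: PeerPolicy, url: str, cached_urls: set) -> str:
    """Return how the ISP cache answers an HTTP request from the client."""
    if not policy.http_access:
        return "DENIED"
    if url in cached_urls:
        return "HIT"            # served from the cache for parents and siblings
    if policy.miss_access:
        return "MISS_FETCHED"   # parent behaviour: fetch on the client's behalf
    return "ERROR"              # sibling behaviour: cache misses are refused


cached = {"http://www.example.com/a.html"}
parent_policy = PeerPolicy()
sibling_policy = PeerPolicy(miss_access=False)
```

With `sibling_policy', a request for an uncached URL yields the error case described above, which is why both ends of the relationship need to agree on its type.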
5. Applying the Protocol
The following sections describe the ICP implementation in the
Harvest[3] (research version) and Squid Web cache[5] packages. In
terms of version numbers, this means version 1.4pl2 for Harvest and
version 1.1.10 for Squid.
The basic sequence of events in an ICP transaction is as follows:
1. Local cache receives an HTTP[1] request from a cache client.
2. The local cache sends ICP queries (section 5.1).
3. The peer cache(s) receive the queries and send ICP replies
(section 5.2).
4. The local cache receives the ICP replies and decides where to
forward the request (section 5.3).
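To make step 2 concrete, the following Python sketch builds an ICP_OP_QUERY message. The 20-byte header layout and the opcode value are taken from the companion document (RFC2186), not from this memo. The function names are hypothetical, and the address fields are simply zero-filled here as a simplification.

```python
import struct

ICP_OP_QUERY = 1      # opcode value defined in RFC 2186
ICP_VERSION = 2

# Header: opcode, version, message length, request number, options,
# option data, sender host address (all in network byte order).
HEADER = struct.Struct("!BBHIII4s")


def build_query(request_number: int, url: str, options: int = 0) -> bytes:
    """Build an ICP_OP_QUERY message for `url` (illustrative only)."""
    # Query payload: 4-byte requester address (zero-filled in this sketch)
    # followed by the null-terminated URL.
    payload = b"\x00" * 4 + url.encode("ascii") + b"\x00"
    length = HEADER.size + len(payload)
    header = HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length,
                         request_number, options, 0, b"\x00" * 4)
    return header + payload


def parse_header(message: bytes):
    """Return (opcode, version, length, request_number, options)."""
    opcode, version, length, reqnum, options, _optdata, _sender = \
        HEADER.unpack_from(message)
    return opcode, version, length, reqnum, options


query = build_query(42, "http://www.example.com/")
```

The request number lets the local cache match replies to the outstanding query; the resulting bytes would be sent as a single UDP datagram to each selected peer.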
5.1. Sending ICP Queries
5.1.1. Determine whether to use ICP at all
Not every HTTP request requires an ICP query to be sent. Obviously,
cache hits will not need ICP because the request is satisfied
immediately. For origin servers very close to the cache, we do not
want to use any neighbor caches. In Squid and Harvest, the
administrator specifies what constitutes a `local' server with the
`local_domain' and `local_ip' configuration options. The cache
always contacts a local server directly, never querying a peer cache.
There are other classes of requests that the cache (or the
administrator) may prefer to forward directly to the origin server.
In Squid and Harvest, one such class includes all non-GET request
methods. A Squid cache can also be configured to not use peers for
URLs matching the `hierarchy_stoplist'.
In order for an HTTP request to yield an ICP transaction, it must:
o not be a cache hit
o not be to a local server
o be a GET request, and
o not match the `hierarchy_stoplist' configuration.
We call this a "hierarchical" request. A "non-hierarchical" request
is one that doesn't generate any ICP traffic. To avoid processing
requests that are likely to lower cache efficiency, one can configure
the cache to not consult the hierarchy for URLs that contain certain
strings (e.g. `cgi-bin').
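The four conditions above can be expressed as a single predicate. The Python sketch below is illustrative only: `CacheConfig' and `is_hierarchical' are hypothetical names, the `local_ip' option is omitted, and domain matching is reduced to a suffix comparison.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class CacheConfig:
    local_domains: list = field(default_factory=list)       # models `local_domain'
    hierarchy_stoplist: list = field(default_factory=list)  # models `hierarchy_stoplist'


def is_hierarchical(method: str, url: str, is_cache_hit: bool,
                    config: CacheConfig) -> bool:
    """Return True if the request should generate ICP queries."""
    if is_cache_hit:
        return False                      # satisfied locally
    if method != "GET":
        return False                      # non-GET requests go direct
    host = urlsplit(url).hostname or ""
    if any(host == d or host.endswith("." + d) for d in config.local_domains):
        return False                      # local servers are contacted directly
    if any(s in url for s in config.hierarchy_stoplist):
        return False                      # URL matches the stoplist
    return True


example_config = CacheConfig(local_domains=["customer.org"],
                             hierarchy_stoplist=["cgi-bin"])
```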
5.1.2. Determine which peers to query
By default, a cache sends an ICP_OP_QUERY message to each peer,
unless any one of the following is true:
o Restrictions prevent querying a peer for this request, based on
the configuration directive `cache_host_domain', which specifies
a set of DNS domains (from the URLs) for which the peer should
or should not be queried. In Squid, a more flexible directive
(`cache_host_acl') supports restrictions on other parts of the
request (method, port number, source, etc.).
o The peer is a sibling, and the HTTP request includes a "Pragma:
no-cache" header. This is because the sibling would be asked to
transit the request, which is not allowed.
o The peer is configured to never be sent ICP queries (i.e. with
the `no-query' option).
If the determination yields only one queryable ICP peer, and the
Squid configuration directive `single_parent_bypass' is set, then one
can bypass waiting for the single ICP response and just send the HTTP
request directly to the peer cache.
The Squid configuration option `source_ping' configures a Squid cache
to send a ping to the origin server at the same time as its ICP
queries, in case the origin is closer than any of the caches.
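The peer selection rules above can be sketched as a filter over the configured peers. This Python example is a simplification under stated assumptions: the `Peer' class and `peers_to_query' function are hypothetical, only the "should be queried" form of `cache_host_domain' is modelled, and `cache_host_acl', `single_parent_bypass' and `source_ping' are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    kind: str                                     # "parent" or "sibling"
    domains: list = field(default_factory=list)   # models `cache_host_domain'; empty means unrestricted
    no_query: bool = False                        # models the `no-query' option


def domain_allowed(peer: Peer, host: str) -> bool:
    """True if the peer may be queried for URLs on `host'."""
    if not peer.domains:
        return True
    return any(host == d or host.endswith("." + d) for d in peer.domains)


def peers_to_query(peers, host: str, no_cache: bool):
    """Return the peers that should receive an ICP_OP_QUERY."""
    selected = []
    for peer in peers:
        if not domain_allowed(peer, host):
            continue                  # domain restriction excludes this peer
        if peer.kind == "sibling" and no_cache:
            continue                  # a sibling may not transit the request
        if peer.no_query:
            continue                  # configured never to receive ICP queries
        selected.append(peer)
    return selected


example_peers = [
    Peer("cache.isp.com", "parent", domains=["com", "net", "org"]),
    Peer("sibling.example.net", "sibling"),
    Peer("quiet.example.net", "parent", no_query=True),
]
```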
5.1.3. Calculate the expected number of ICP replies
Harvest and Squid want to maximize the chance to get a HIT reply from
one of the peers. Therefore, the cache waits for all ICP replies to
be received. Normally, we expect to receive an ICP reply for each
query sent, except:
o When the peer is believed to be down. If the peer is down, Squid
and Harvest continue to send it ICP queries, but do not expect
the peer to reply. When an ICP reply is again received from the
peer, its status will be changed to up.
The determination of up/down status has varied a little bit as
the Harvest and Squid software evolved. Both Harvest and Squid
mark a peer down when it fails to reply to 20 consecutive ICP
queries. Squid also marks a peer down when a TCP connection
fails, and up again when a diagnostic TCP connection succeeds.
o When sending to a multicast address. In this case we'll
probably expect to receive more than one reply, and have no way
to definitively determine how many to expect. We discuss
multicast issues in section 7 below.
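A minimal sketch of this bookkeeping, in Python, follows. The class and function names are hypothetical. The sketch counts every unanswered query toward the 20-query limit and excludes multicast addresses from the expected count; both are simplifying assumptions, and the TCP-based checks that Squid performs are not modelled.

```python
DOWN_THRESHOLD = 20   # consecutive unanswered ICP queries before a peer is marked down


class PeerStatus:
    """Tracks whether a peer is believed to be up, based on ICP replies."""

    def __init__(self, name: str, multicast: bool = False):
        self.name = name
        self.multicast = multicast
        self.unanswered = 0

    @property
    def is_up(self) -> bool:
        return self.unanswered < DOWN_THRESHOLD

    def query_sent(self) -> None:
        self.unanswered += 1      # queries are still sent to peers marked down

    def reply_received(self) -> None:
        self.unanswered = 0       # any reply marks the peer up again


def expected_replies(queried) -> int:
    """Count the replies we expect: one per unicast peer believed to be up."""
    return sum(1 for p in queried if p.is_up and not p.multicast)
```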
5.1.4. Install timeout event
Because ICP uses UDP as its underlying transport, ICP queries and replies
may sometimes be dropped by the network. The cache installs a
timeout event in case not all of the expected replies arrive. By
default Squid and Harvest use a two-second timeout. If object
retrieval has not commenced when the timeout occurs, a source is
selected as described in section 5.3.9 below.
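The interaction between the expected reply count and the timeout can be illustrated with a small simulation. The Python sketch below does not use real sockets or timers; `collect_replies' is a hypothetical helper that replays a list of (arrival time, opcode) pairs against the two-second default.

```python
DEFAULT_TIMEOUT = 2.0   # seconds, the Squid and Harvest default


def collect_replies(arrivals, expected: int, timeout: float = DEFAULT_TIMEOUT):
    """Replay reply arrivals until all expected replies are in or time runs out.

    `arrivals' is an iterable of (seconds_since_query, opcode) pairs sorted
    by arrival time. Returns (received_opcodes, timed_out).
    """
    received = []
    for elapsed, opcode in arrivals:
        if elapsed > timeout:
            break                         # reply arrived after the timeout event
        received.append(opcode)
        if len(received) >= expected:
            break                         # everything we expected has arrived
    timed_out = len(received) < expected
    return received, timed_out
```

When `timed_out' is true, the cache proceeds with whatever replies it has, as if all remaining peers had missed.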
5.2. Receiving ICP Queries and Sending Replies
When an ICP_OP_QUERY message is received, the cache examines it and
decides which reply message is to be sent. It will send one of the
following reply opcodes, tested for use in the order listed:
5.2.1. ICP_OP_ERR
The URL is extracted from the payload and parsed. If parsing fails,
an ICP_OP_ERR message is returned.
5.2.2. ICP_OP_DENIED
The access controls are checked. If the peer is not allowed to make
this request, ICP_OP_DENIED is returned. Squid counts the number of
ICP_OP_DENIED messages sent to each peer. Once more than 100 replies
have been sent to a peer and more than 95% of them were denials, no
further reply is sent at all.
This prevents misconfigured caches from endlessly sending unnecessary
ICP messages back and forth.
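The denial threshold amounts to a simple ratio test. The Python sketch below uses a hypothetical function name and integer arithmetic to avoid rounding surprises; how Squid maintains the two counters is not shown.

```python
def should_suppress_replies(replies_sent: int, denied_sent: int) -> bool:
    """True once more than 100 replies were sent and over 95% were denials."""
    return replies_sent > 100 and denied_sent * 100 > 95 * replies_sent
```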
5.2.3. ICP_OP_HIT
If the cache reaches this point without already matching one of the
previous opcodes, it means the request is allowed and we must
determine if it will be HIT or MISS, so we check if the URL exists in
the local cache. If so, and if the cached entry is fresh for at
least the next 30 seconds, we can return an ICP_OP_HIT message. The
stale/fresh determination uses the local refresh (or TTL) rules.
Note that a race condition exists for ICP_OP_HIT replies to sibling
peers. The ICP_OP_HIT means that a subsequent HTTP request for the
named URL would result in a cache hit. We assume that the HTTP
request will come very quickly after the ICP_OP_HIT. However, there
is a slight chance that the object might be purged from this cache
before the HTTP request is received. If this happens, and the
replying peer has applied Squid's `miss_access' configuration then
the user will receive a very confusing access denied message.
5.2.3.1. ICP_OP_HIT_OBJ
Before returning the ICP_OP_HIT message, we see if we can send an
ICP_OP_HIT_OBJ message instead. We can use ICP_OP_HIT_OBJ if:
o The ICP_OP_QUERY message had the ICP_FLAG_HIT_OBJ flag set.
o The entire object (plus URL) will fit in an ICP message. The
maximum ICP message size is 16 Kbytes, but an application may
choose to set a smaller maximum value for ICP_OP_HIT_OBJ
replies.
Normally ICP replies are sent immediately after the query is
received, but the ICP_OP_HIT_OBJ message cannot be sent until the
object data is available to copy into the reply message. For Squid
and Harvest this means the object must be "swapped in" from disk if
it is not already in memory. Therefore, on average, an
ICP_OP_HIT_OBJ reply will have higher latency than ICP_OP_HIT.
5.2.4. ICP_OP_MISS_NOFETCH
At this point we have a cache miss. ICP has two types of miss
replies. If the cache does not want the peer to request the object
from it, it sends an ICP_OP_MISS_NOFETCH message.
5.2.5. ICP_OP_MISS
Finally, an ICP_OP_MISS reply is returned as the default. If the
replying cache is a parent of the querying cache, the ICP_OP_MISS
indicates an invitation to fetch the URL through the replying cache.
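The order of tests in sections 5.2.1 through 5.2.5 can be summarized as one decision function. This Python sketch is illustrative: `Entry' and `choose_reply' are hypothetical names, URL parsing is reduced to a scheme and host check, and the size estimate for ICP_OP_HIT_OBJ assumes the RFC2186 layout (20-byte header, null-terminated URL, 2-byte object size, object data).

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

MAX_ICP_MESSAGE = 16384    # maximum ICP message size: 16 Kbytes
ICP_HEADER_SIZE = 20       # per RFC 2186
MIN_FRESH_SECONDS = 30     # entry must stay fresh at least this long


@dataclass
class Entry:
    body: bytes
    fresh_for: float       # seconds until the entry becomes stale


def choose_reply(url: str, allowed: bool, entry: Optional[Entry],
                 hit_obj_requested: bool = False, nofetch: bool = False) -> str:
    """Pick the reply opcode for an ICP_OP_QUERY, testing in the listed order."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return "ICP_OP_ERR"
    if not parts.scheme or not parts.netloc:
        return "ICP_OP_ERR"                      # URL could not be parsed
    if not allowed:
        return "ICP_OP_DENIED"                   # access controls refuse the peer
    if entry is not None and entry.fresh_for >= MIN_FRESH_SECONDS:
        size = ICP_HEADER_SIZE + len(url) + 1 + 2 + len(entry.body)
        if hit_obj_requested and size <= MAX_ICP_MESSAGE:
            return "ICP_OP_HIT_OBJ"              # object fits in the reply
        return "ICP_OP_HIT"
    if nofetch:
        return "ICP_OP_MISS_NOFETCH"             # do not fetch through this cache
    return "ICP_OP_MISS"
```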
5.3. Receiving ICP Replies
Some ICP replies will be ignored; specifically, when any of the
following is true:
o The reply message originated from an unknown peer.
o The object named by the URL does not exist.
o The object is already being fetched.
5.3.1. ICP_OP_DENIED
If more than 100 replies have been received from a peer cache and more
than 95% of them were ICP_OP_DENIED, such a high denial rate most
likely indicates a configuration error, either locally or at the peer.
For this reason, no further queries will be sent to the peer for the
lifetime of the cache process.
5.3.2. ICP_OP_HIT
Object retrieval commences immediately from the replying peer.
5.3.3. ICP_OP_HIT_OBJ
The object data is extracted from the ICP message and the retrieval
is complete. If there is some problem with the ICP_OP_HIT_OBJ
message (e.g. missing data) the reply will be treated like a standard
ICP_OP_HIT.
5.3.4. ICP_OP_SECHO
Object retrieval commences immediately from the origin server because
the ICP_OP_SECHO reply arrived prior to any ICP_OP_HIT reply. If an
ICP_OP_HIT had arrived first, this ICP_OP_SECHO reply would be
ignored because the retrieval would already have started.
5.3.5. ICP_OP_DECHO
An ICP_OP_DECHO reply is handled like an ICP_OP_MISS. Non-ICP peers
must always be configured as parents; a non-ICP sibling makes no
sense. One serious problem with the ICP_OP_DECHO feature is that
since it bounces messages off the peer's UDP echo port, it does not
indicate that the peer cache is actually running -- only that network
connectivity exists between the pair.
5.3.6. ICP_OP_MISS
If the peer is a sibling, the ICP_OP_MISS reply is ignored.
Otherwise, the peer may be "remembered" for future use in case no HIT
replies are received later (section 5.3.9).
Harvest and Squid remember the first parent to return an ICP_OP_MISS
message. With Squid, the parents may be weighted so that the "first
parent to miss" may not actually be the first reply received. We
call this the FIRST_PARENT_MISS. Remember that sibling misses are
entirely ignored; we only care about misses from parents. The parent
miss RTTs can be weighted because sometimes the closest parent is
not the one people want to use.
Also, recent versions of Squid may remember the parent with the
lowest RTT to the origin server, using the ICP_FLAG_SRC_RTT option.
We call this the CLOSEST_PARENT_MISS.
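The weighted selection of the FIRST_PARENT_MISS can be sketched as follows. This memo does not give the weighting formula, so the Python example assumes that each parent's measured ICP round-trip time is divided by its configured weight and the smallest result wins; that formula and the function name are assumptions made for illustration.

```python
def first_parent_miss(misses, weights=None):
    """Pick the parent to remember from a list of ICP_OP_MISS replies.

    `misses' is a list of (peer_name, peer_kind, rtt_ms) tuples. `weights'
    maps peer names to weights (default 1). The score rtt / weight is an
    assumed weighting scheme, not one taken from this document.
    """
    weights = weights or {}
    best_name, best_score = None, None
    for name, kind, rtt_ms in misses:
        if kind != "parent":
            continue                              # sibling misses are ignored
        score = rtt_ms / weights.get(name, 1)
        if best_score is None or score < best_score:
            best_name, best_score = name, score
    return best_name


example_misses = [
    ("sibling.example.net", "sibling", 5),
    ("parent1.example.net", "parent", 20),
    ("parent2.example.net", "parent", 30),
]
```

Without weights the first (fastest) parent to miss is chosen; giving `parent2.example.net' a weight of 2 makes it preferred even though its reply arrives later.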
5.3.7. ICP_OP_MISS_NOFETCH
This reply is essentially ignored. A cache must not forward a
request to a peer that returns ICP_OP_MISS_NOFETCH.
5.3.8. ICP_OP_ERR
Silently ignored.
5.3.9. When all peers MISS.
For ICP_OP_HIT and ICP_OP_SECHO the request is forwarded immediately.
For ICP_OP_HIT_OBJ there is no need to forward the request. For all
other reply opcodes, we wait until the expected number of replies
has been received. When we have all of the expected replies, or
when the query timeout occurs, it is time to forward the request.
Since MISS replies were received from all peers, we must select
either a parent cache or the origin server.
o If the peers are using the ICP_FLAG_SRC_RTT feature, we forward
the request to the peer with the lowest RTT to the origin
server. If the local cache is also measuring RTTs to origin
servers, and is closer than any of the parents, the request is
forwarded directly to the origin server.
o If there is a FIRST_PARENT_MISS parent available, the request
will be forwarded there.
o If the ICP query/reply exchange did not produce any appropriate
parents, the request will be sent directly to the origin server
(unless firewall restrictions prevent it).
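The three rules above form a fallback chain, sketched here in Python. The function and parameter names are hypothetical; `parent_rtts' stands for the origin-server RTTs reported by parents via ICP_FLAG_SRC_RTT, and `direct_allowed' stands in for the firewall restriction.

```python
def select_destination(parent_rtts, local_rtt=None, first_parent=None,
                       direct_allowed=True):
    """Choose where to forward a request after all peers have missed.

    parent_rtts:  dict of parent name -> RTT to the origin server (may be empty)
    local_rtt:    the local cache's own RTT to the origin server, if measured
    first_parent: the FIRST_PARENT_MISS parent, if any
    Returns a parent name, "DIRECT", or None if nothing is usable.
    """
    if parent_rtts:
        closest = min(parent_rtts, key=parent_rtts.get)
        if (direct_allowed and local_rtt is not None
                and local_rtt < parent_rtts[closest]):
            return "DIRECT"               # local cache is closer than any parent
        return closest                    # CLOSEST_PARENT_MISS
    if first_parent is not None:
        return first_parent               # FIRST_PARENT_MISS
    if direct_allowed:
        return "DIRECT"                   # no appropriate parent was found
    return None
```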
5.4. ICP Options
The following options were added to Squid to support some new
features while maintaining backward compatibility with the Harvest
implementation.
5.4.1. ICP_FLAG_HIT_OBJ
This flag is off by default and will be set in an ICP_OP_QUERY
message only if these three criteria are met:
o It is enabled in the cache configuration file with `udp_hit_obj
on'.
o The peer must be using ICP version 2.
o The HTTP request must not include the "Pragma: no-cache" header.
5.4.2. ICP_FLAG_SRC_RTT
This flag is off by default and will be set in an ICP_OP_QUERY
message only if these two criteria are met:
o It is enabled in the cache configuration file with `query_icmp
on'.
o The peer must be using ICP version 2.
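The criteria for both flags can be combined into one function that computes the Options field of an outgoing query. In this Python sketch the flag values are those defined in RFC2186; the function and parameter names are hypothetical stand-ins for the `udp_hit_obj' and `query_icmp' settings.

```python
ICP_FLAG_HIT_OBJ = 0x80000000   # value defined in RFC 2186
ICP_FLAG_SRC_RTT = 0x40000000   # value defined in RFC 2186


def query_options(udp_hit_obj: bool, query_icmp: bool,
                  peer_icp_version: int, no_cache: bool) -> int:
    """Compute the Options field for an ICP_OP_QUERY sent to one peer."""
    options = 0
    if peer_icp_version != 2:
        return options                    # both flags require ICP version 2
    if udp_hit_obj and not no_cache:
        options |= ICP_FLAG_HIT_OBJ       # not used with "Pragma: no-cache"
    if query_icmp:
        options |= ICP_FLAG_SRC_RTT
    return options
```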
6. Firewalls
Operating a Web cache behind a firewall or in a private network poses
some interesting problems. The hard part is figuring out whether the
cache is able to connect to the origin server. Harvest and Squid
provide an `inside_firewall' configuration directive to list DNS
domains on the near side of a firewall. Everything else is assumed
to be on the far side of a firewall. Squid also has a `firewall_ip'
directive so that inside hosts can be specified by IP addresses as
well.
In a simple configuration, a Squid cache behind a firewall will have
only one parent cache (which is on the firewall itself). In this
case, Squid must use that parent for all servers beyond the firewall,
so there is no need to utilize ICP.
In a more complex configuration, there may be a number of peer caches
also behind the firewall. Here, ICP may be used to check for cache
hits in the peers. Occasionally, when ICP is being used, there may
not be any replies received. If the cache were not behind a
firewall, the request would be forwarded directly to the origin
server. But in this situation, the cache must pick a parent cache,
either randomly or due to configuration information. For example,
Squid allows a parent cache to be designated as a default choice when
no others are available.
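The decision described above, for the case where an ICP round produces no replies, might look roughly like the following. The domain-suffix matching is an assumption about how an `inside_firewall' list could be interpreted, and all function and parameter names are hypothetical.

```python
import random


def is_inside(host, inside_domains):
    """True if host falls within one of the domains listed as inside."""
    host = host.lower().rstrip(".")
    for domain in inside_domains:
        domain = domain.lower().strip(".")
        if host == domain or host.endswith("." + domain):
            return True
    return False


def choose_when_no_replies(host, inside_domains, parents, default_parent=None):
    """Pick a next hop after an ICP round that produced no replies."""
    if is_inside(host, inside_domains):
        return "DIRECT"            # origin server is on the near side
    if default_parent is not None:
        return default_parent      # parent designated as the default choice
    if parents:
        return random.choice(parents)  # otherwise pick a parent at random
    raise LookupError("no parent available for a host beyond the firewall")
```

The suffix test compares whole labels, so a host such as "badexample.com" is not treated as part of "example.com".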
7. Multicast
For efficient distribution, a cache may deliver ICP queries to a
multicast address, and neighbor caches may join the multicast group
to receive such queries.
Current practice is that caches send ICP replies only to unicast
addresses, for several reasons:
o Multicasting ICP replies would not reduce the number of packets
sent.
o It prevents other group members from receiving unexpected
replies.
o The reply should follow unicast routing paths to indicate
(unicast) connectivity between the receiver and the sender since
the subsequent HTTP request will be unicast routed.
Trust is an important aspect of inter-cache relationships. A Web
cache should not automatically trust any cache which replies to a
multicast ICP query. Caches should ignore ICP messages from
addresses not specifically configured as neighbors. Otherwise, one
could easily pollute a cache mesh by running an illegitimate cache
and having it join a group, return ICP_OP_HIT for all requests, and
then deliver bogus content.
When sending to multicast groups, cache administrators must be
careful to use the minimum multicast TTL required to reach all group
members. Joining a multicast group requires no special privileges
and there is no way to prevent anyone from joining "your" group. Two
groups of caches utilizing the same multicast address could overlap,
which would cause a cache to receive ICP replies from unknown
neighbors. The unknown neighbors would not be used to retrieve the
object data, but the cache would constantly receive ICP replies that
it must always ignore.
To prevent an overlapping cache mesh, caches should thus limit the
scope of their ICP queries with appropriate TTLs; an application such
as mtrace[6] can determine appropriate multicast TTLs.
As mentioned in section 5.1.3, we need to estimate the number of
expected replies for an ICP_OP_QUERY message. For unicast we expect
one reply for each query if the peer is up. However, for multicast
we generally expect more than one reply, but have no way of knowing
exactly how many replies to expect. Squid regularly (every 15
minutes) sends out test ICP_OP_QUERY messages to only the multicast
group peers. As with a real ICP query, a timeout event is installed
and the replies are counted until the timeout occurs. We have found
that the received count varies considerably. Therefore, the number
of replies to expect is calculated as a moving average, rounded down
to the nearest integer.
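A minimal sketch of such an estimator follows. The text does not say which kind of moving average is used, so a simple average over a fixed window of recent probes is assumed here; the class name and the window size are hypothetical.

```python
from collections import deque


class ReplyEstimator:
    """Track reply counts from periodic multicast test queries."""

    def __init__(self, window=10):
        # Assumption: a simple moving average over the last `window` probes.
        self.counts = deque(maxlen=window)

    def record_probe(self, replies_counted):
        """Store the number of replies counted before the probe timed out."""
        self.counts.append(replies_counted)

    def expected_replies(self):
        """Moving average of recent counts, rounded down to an integer."""
        if not self.counts:
            return 0
        return sum(self.counts) // len(self.counts)
```

For example, with a window of three probes that counted 4, 5, and 7 replies, the estimate is 5; after a fourth probe counting 8, the oldest sample is dropped and the estimate becomes 6.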
8. Lessons Learned
8.1. Differences Between ICP and HTTP
ICP is notably different from HTTP. HTTP supports a rich and
sophisticated set of features. In contrast, ICP was designed to be
simple, small, and efficient. HTTP request and reply headers consist
of lines of ASCII text delimited by a CRLF pair, whereas ICP uses a
fixed size header and represents numbers in binary. The only thing
ICP and HTTP have in common is the URL.
Note that the ICP message does not even include the HTTP request
method. The original implementation assumed that only GET requests
would be cachable and there would be no need to locate non-GET
requests in neighbor caches. Thus, the current version of ICP does
not accommodate non-GET requests, although the next version of this
protocol will likely include a field for the request method.
HTTP defines features that are important for caching but not
expressible with the current ICP protocol. Among these are Pragma:
no-cache, If-Modified-Since, and all of the Cache-Control features of
HTTP/1.1. An ICP_OP_HIT_OBJ message may deliver an object which may
not obey all of the request header constraints. These differences
between ICP and HTTP are the reason we discourage the use of the
ICP_OP_HIT_OBJ feature.
8.2. Parents, Siblings, Hits and Misses
Note that the ICP message does not have a field to indicate the
intent of the querying cache. That is, nowhere in the ICP request or
reply does it say that the two caches have a sibling or parent
relationship. A sibling cache can only respond with HIT or MISS, not
"you can retrieve this from me" or "you can not retrieve this from
me." The querying cache must apply the HIT or MISS reply to its
local configuration to prevent it from resolving misses through a
sibling cache. This constraint is awkward, because this aspect of
the relationship can be configured only in the cache originating the
requests, and indirectly via the access controls configured in the
queried cache as described earlier in section 4.2.
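Because the reply carries no relationship information, the querying cache has to combine the opcode with its own configuration. A simplified sketch, with a hypothetical function name and with opcodes and peer types represented as strings purely for readability:

```python
def may_forward(peer_type, reply_opcode):
    """Decide whether an HTTP request may be sent to the replying peer.

    peer_type comes from the querying cache's local configuration
    ("parent" or "sibling"); it is never carried in the ICP message.
    """
    if reply_opcode == "ICP_OP_HIT":
        return True                    # hits may be fetched from any peer
    if reply_opcode == "ICP_OP_MISS":
        return peer_type == "parent"   # misses are resolved only via parents
    return False                       # other opcodes are not handled here
```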
8.3. Different Roles of ICP
There are two different understandings of what exactly the role of
ICP is in a cache mesh. One understanding is that ICP's role is only
object location, specifically, to provide hints about whether or not
a named object exists in a neighbor cache. An implied assumption is
that cache hits are highly desirable, and ICP is used to maximize the
chance of getting them. If an ICP message is lost due to congestion,
then nothing significant is lost; the request will be satisfied
regardless.
ICP is increasingly being tasked to fill a more complex role:
conveying cache usage policy. For example, many organizations (e.g.
universities) will install a Web cache on the border of their
network. Such organizations may be happy to establish sibling
relationships with other, nearby caches, subject to the following
terms:
o Any of the organization's customers or users may request any
object (cached or not).
o Anyone may request an object already in the cache.
o Anyone may request any object from the organization's servers
behind the cache.
o All other requests are denied; specifically, the organization
will not provide transit for requests in which neither the
client nor the server falls within its domain.
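The four terms above amount to a short decision procedure applied to each HTTP request. The sketch below is illustrative only; the function name and its three boolean inputs are hypothetical, and how a cache would determine them is outside its scope.

```python
def sibling_request_allowed(client_is_local, object_is_cached, server_is_local):
    """Apply the four terms in order; return (allowed, reason)."""
    if client_is_local:
        return True, "local client"         # own users may request anything
    if object_is_cached:
        return True, "cache hit"            # anyone may fetch cached objects
    if server_is_local:
        return True, "local origin server"  # anyone may reach local servers
    return False, "transit refused"         # neither end is in the domain
```

Note that the second input depends on the state of the cache when the HTTP request arrives, which is the outcome an earlier ICP reply can only predict.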
To successfully convey policy the ICP exchange must very accurately
predict the result (hit, miss) of a subsequent HTTP request. The
result may often depend on other request fields, such as
Cache-Control. So it's not possible for ICP to accurately predict the
result without more, or perhaps all, of the HTTP request.
8.4. Protocol Design Flaws of ICPv2
We recognize certain flaws with the original design of ICP, and make
note of them so that future versions can avoid the same mistakes.
o The NULL-terminated URL in the payload field requires stepping
through the message an octet at a time to find some of the
fields (i.e. the beginning of object data in an ICP_OP_HIT_OBJ
message).
o Two fields (Sender Host Address and Requester Host Address) are
IPv4 specific. However, neither of these fields is used in
practice; they are normally zero-filled. If IP addresses have a
role in the ICP message, there needs to be an address family
descriptor for each address, and clients need to be able to say
whether they want to hear IPv6 responses or not.
o Options are limited to 32 option flags and 32 bits of option
data. This should be more like TCP, with an option descriptor
followed by option data.
o Although currently used as the cache key, the URL string no
longer serves this role adequately. Some HTTP responses now
vary according to the requestor's User-Agent and other headers.
A cache key must incorporate all non-transport headers present
in the client's request. All non-hop-by-hop request headers
should be sent in an ICP query.
o ICPv2 uses different opcode values for queries and responses.
ICP should use the same opcode for both sides of a two-sided
transaction, with a "query/response" indicator telling which
side is which.
o ICPv2 does not include any authentication fields.
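To illustrate the first flaw in the list above, the following sketch parses an ICP_OP_HIT_OBJ message. The layout assumed here (a 20-octet fixed header, then the NULL-terminated URL, a 16-bit object size, and the object data) and the opcode value follow the ICP version 2 specification; the function name is hypothetical. The position of the object size field cannot be computed from the header alone and is known only after the URL terminator has been found.

```python
import struct

# opcode, version, message length, request number, options,
# option data, sender host address: 20 octets in network byte order.
ICP_HEADER = struct.Struct("!BBHIIII")
ICP_OP_HIT_OBJ = 23


def parse_hit_obj(message):
    """Split an ICP_OP_HIT_OBJ message into (url, object_data)."""
    if len(message) < ICP_HEADER.size:
        raise ValueError("short ICP message")
    opcode, _version, length, _reqnum, _opts, _optdata, _sender = \
        ICP_HEADER.unpack_from(message, 0)
    if opcode != ICP_OP_HIT_OBJ or length != len(message):
        raise ValueError("not a well-formed ICP_OP_HIT_OBJ message")
    # The only way to locate the next field is to scan for the NUL octet.
    end = message.find(b"\0", ICP_HEADER.size)
    if end < 0:
        raise ValueError("URL is not NULL-terminated")
    url = message[ICP_HEADER.size:end].decode("ascii")
    size_at = end + 1
    if size_at + 2 > len(message):
        raise ValueError("missing object size")
    (obj_size,) = struct.unpack_from("!H", message, size_at)
    data = message[size_at + 2:size_at + 2 + obj_size]
    if len(data) != obj_size:
        raise ValueError("truncated object data")
    return url, data
```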
9. Security Considerations
Security is an issue with ICP over UDP because of its connectionless
nature. Below we consider various vulnerabilities and methods of
attack, and their implications.
Our first line of defense is to check the source IP address of the
ICP message, e.g. as given by recvfrom(2). ICP query messages should
be processed if the access control rules allow the querying address
access to the cache. However, ICP reply messages must only be
accepted from known neighbors; a cache must ignore replies from
unknown addresses.
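The source address check might be sketched as follows. The function name and the representation of the access rules as a list of network prefixes are assumptions made for illustration; a real cache would apply its own access control machinery.

```python
import ipaddress

ICP_OP_QUERY = 1


def accept_icp(opcode, source_ip, neighbors, query_acl):
    """Decide whether to process an ICP message based on its source address.

    source_ip is the address reported by the socket layer, e.g. by
    recvfrom(2). neighbors is a list of configured neighbor addresses and
    query_acl a list of network prefixes allowed to query (both assumed).
    """
    addr = ipaddress.ip_address(source_ip)
    if opcode == ICP_OP_QUERY:
        # Queries are processed if access controls admit the sender.
        return any(addr in ipaddress.ip_network(net) for net in query_acl)
    # Everything else is treated as a reply: known neighbors only.
    return addr in {ipaddress.ip_address(n) for n in neighbors}
```

This check offers no protection against spoofed source addresses, as discussed below.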
Because we trust the validity of an address in an IP packet, ICP is
susceptible to IP address spoofing. In this document we address some
consequences of IP address spoofing. Normally, spoofed addresses can
only be detected by routers, not by hosts. However, the IP
Authentication Header[7,8] can be used underneath ICP to provide
cryptographic authentication of the entire IP packet containing the
ICP protocol, thus eliminating the risk of IP address spoofing.
9.1. Inserting Bogus ICP Queries
Processing an ICP_OP_QUERY message has no known security
implications, so long as the requesting address is granted access to
the cache.
9.2. Inserting Bogus ICP Replies
Here we are concerned with a third party generating ICP reply
messages which are returned to the querying cache before the real
reply arrives, or before any replies arrive. The third party may
insert bogus ICP replies which appear to come from legitimate
neighbors. There are three vulnerabilities:
o Preventing a certain neighbor from being used
If a third-party could send an ICP_OP_MISS_NOFETCH reply back
before the real reply arrived, the (falsified) neighbor would
not be used.
A third-party could blast a cache with ICP_OP_DENIED messages
until the threshold described in section 5.3.1 is reached,
thereby causing the neighbor relationship to be temporarily
terminated.
o Forcing a certain neighbor to be used
If a third-party could send an ICP_OP_HIT reply back before the
real reply arrived, the (falsified) neighbor would be used.
This may violate the terms of a sibling relationship; ICP_OP_HIT
replies mean a subsequent HTTP request will also be a hit.
Similarly, if bogus ICP_OP_SECHO messages can be generated, the
cache would retrieve requests directly from the origin server.
o Cache poisoning
The ICP_OP_HIT_OBJ message is especially sensitive to security
issues since it contains actual object data. In combination
with IP address spoofing, this option opens up the likely
possibility of having the cache polluted with invalid objects.
9.3. Eavesdropping
Multicasting ICP queries provides a very simple method for others to
"snoop" on ICP messages. If enabling multicast, cache administrators
should configure the application to use the minimum required
multicast TTL, using a tool such as mtrace[6]. Note that the IP
Encapsulating Security Payload [7,9] mechanism can be used to provide
protection against eavesdropping of ICP messages.
Eavesdropping on ICP traffic can provide third parties with a list of
URLs being browsed by cache users. Because the Requester Host
Address is zero-filled by Squid and Harvest, the URLs cannot be
mapped back to individual host systems.
By default, Squid and Harvest do not send ICP messages for URLs
containing `cgi-bin' or `?'. These URLs sometimes contain sensitive
information as argument parameters. Cache administrators need to be
aware that altering the configuration to make ICP queries for such
URLs may expose sensitive information to outsiders, especially when
multicast is used.
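The default behavior described above reduces to a substring test on the URL. A minimal sketch, with a hypothetical function name:

```python
def should_send_icp_query(url):
    """Return False for URLs that by default are not sent in ICP queries."""
    return "cgi-bin" not in url and "?" not in url
```

A URL such as "http://example.com/search?q=secret" would therefore never appear in an ICP query unless the administrator changes the configuration.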
9.4. Blocking ICP Messages
Intentionally blocked (or discarded) ICP queries or replies will
appear to reflect link failure or congestion, and will prevent the
use of a neighbor as well as lead to timeouts (see section 5.1.4).
If all messages are blocked, the cache will assume the neighbor is
down and remove it from the selection algorithm. However, if, for
example, every other query is blocked, the neighbor will remain
"alive," but every other request will suffer the ICP timeout.
9.5. Delaying ICP Messages
The neighbor selection algorithm normally waits for all ICP MISS
replies to arrive. Delaying queries or replies, so that they arrive
later than they normally would, will cause additional delay for the
subsequent HTTP request. Of course, if messages are delayed so that
they arrive after the timeout, the behavior is the same as "blocking"
above.
9.6. Denial of Service
A denial-of-service attack, in which the ICP port is flooded with a
continuous stream of bogus messages, exploits three vulnerabilities:
o The application may log every bogus ICP message and eventually
fill up a disk partition.
o The socket receive queue may fill up, causing legitimate
messages to be dropped.
o The host may waste some CPU cycles receiving the bogus messages.
9.7. Altering ICP Fields
Here we assume a third party is able to change one or more of the ICP
reply message fields.
Opcode
Changing the opcode field is much like inserting bogus messages
described above. Changing a hit to a miss would prevent the peer
from being used. Changing a miss to a hit would force the peer to
be used.
Version
Altering the ICP version field may have unpredictable consequences
if the new version number is recognized and supported. The
receiving application should ignore messages with invalid version
numbers. At the time of this writing, both version numbers 2 and
3 are in use. These two versions use some fields (e.g. Options)
in a slightly different manner.
Message Length
An incorrect message length should be detected by the receiving
application as an invalid ICP message.
Request Number
The request number is often used as a part of the cache key.
Harvest does not use the request number. Squid uses the request
number in conjunction with the URL to create a cache key.
Altering the request number will cause a lookup of the cache key
to fail. This is similar to blocking the ICP reply altogether.
There is no requirement that a cache use both the URL and the
request number to locate HTTP requests with outstanding ICP
queries (however both Squid and Harvest do). The request number
must always be the same in the query and the reply. However, if
the querying cache uses only the request number to locate pending
requests, there is some possibility that a replying cache might
increment the request number in the reply to give the false
impression that the two caches are closer than they really are.
In other words, assuming that there are a few ICP requests "in
flight" at any given time, incrementing the reply request number
tricks the querying cache into seeing a smaller round-trip time
than really exists.
Options
There is little risk in having the Options bitfields altered. Any
option bit must only be set in a reply if it was also set in a
query. Changing a bit from clear to set is detectable by the
querying cache, and such a message must be ignored. Changing a
bit from set to clear is allowed and has no negative side effects.
Option Data
ICP_FLAG_SRC_RTT is the only option which uses the Option Data
field. Altering the RTT values returned here can affect the
neighbor selection algorithm, either forcing or preventing the use
of a neighbor.
URL
The URL and Request Number are used to generate the cache key.
Altering the URL will cause a lookup of the cache key to fail, and
the ICP reply to be entirely ignored. This is similar to blocking
the ICP reply altogether.
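The checks described under Request Number, Options, and URL can be combined when matching a reply to an outstanding query. The sketch below assumes a hypothetical table of pending queries keyed by the pair (request number, URL), with the Options word of each query as the value.

```python
def match_reply(pending, reqnum, url, reply_options):
    """Return True if a reply corresponds to an outstanding query.

    pending maps (request number, URL) to the Options word that was sent
    in the query; this table layout is an assumption for illustration.
    """
    query_options = pending.get((reqnum, url))
    if query_options is None:
        return False  # altered request number or URL: the lookup fails
    if reply_options & ~query_options:
        return False  # an option bit is set that the query never set
    return True       # bits changed from set to clear are acceptable
```

An altered request number or URL thus has the same effect as a lost reply, while a reply that sets an unrequested option bit is detected and ignored.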
9.8. Summary
o ICP_OP_HIT_OBJ is particularly vulnerable to security problems
because it includes object data. For this, and other reasons,
its use is discouraged.
o Falsifying, altering, inserting, or blocking ICP messages can
cause an HTTP request to fail only in two situations:
- If the cache is behind a firewall and cannot directly
connect to the origin server.
- If a false ICP_OP_HIT reply causes the HTTP request to be
forwarded to a sibling, where the request is a cache miss
and the sibling refuses to continue forwarding the request
on behalf of the originating cache.
o Falsifying, altering, inserting, or blocking ICP messages can
easily cause HTTP requests to be forwarded (or not forwarded) to
certain neighbors. If the neighbor cache has also been
compromised, then it could serve bogus content and pollute a
cache hierarchy.
o Blocking or delaying ICP messages can cause HTTP requests to be
further delayed, but still satisfied.
10. References
[1] Fielding, R., et al., "Hypertext Transfer Protocol -- HTTP/1.1",
RFC 2068, UC Irvine, January 1997.
[2] Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform Resource
Locators (URL)", RFC 1738, CERN, Xerox PARC, University of Minnesota,
December 1994.
[3] Bowman M., Danzig P., Hardy D., Manber U., Schwartz M., and
Wessels D., "The Harvest Information Discovery and Access System",
Internet Research Task Force - Resource Discovery,
http://harvest.transarc.com/.
[4] Wessels D., Claffy K., "ICP and the Squid Web Cache", National
Laboratory for Applied Network Research,
http://www.nlanr.net/~wessels/Papers/icp-squid.ps.gz.
[5] Wessels D., "The Squid Internet Object Cache", National
Laboratory for Applied Network Research,
http://squid.nlanr.net/Squid/
[6] mtrace, Xerox PARC,
ftp://ftp.parc.xerox.com/pub/net-research/ipmulti/.
[7] Atkinson, R., "Security Architecture for the Internet Protocol",
RFC 1825, NRL, August 1995.
[8] Atkinson, R., "IP Authentication Header", RFC 1826, NRL, August
1995.
[9] Atkinson, R., "IP Encapsulating Security Payload (ESP)", RFC
1827, NRL, August 1995.
11. Acknowledgments
The authors wish to thank Paul A Vixie <paul@vix.com> for providing
excellent feedback on this document, Martin Hamilton
<martin@mrrl.lut.ac.uk> for pushing the development of multicast ICP,
Eric Rescorla <ekr@terisa.com> and Randall Atkinson <rja@home.net>
for assisting with security issues, and especially Allyn Romanow for
keeping us on the right track.
12. Authors' Addresses
Duane Wessels
National Laboratory for Applied Network Research
10100 Hopkins Drive
La Jolla, CA 92093
EMail: wessels@nlanr.net
K. Claffy
National Laboratory for Applied Network Research
10100 Hopkins Drive
La Jolla, CA 92093
EMail: kc@nlanr.net