Independent Submission                                          E. Kline
Request for Comments: 8805                                      Loon LLC
Category: Informational                                        K. Duleba
ISSN: 2070-1721                                                   Google
                                                             Z. Szamonek
                                                                S. Moser
                                                 Google Switzerland GmbH
                                                               W. Kumari
                                                                  Google
                                                             August 2020


            A Format for Self-Published IP Geolocation Feeds

Abstract

   This document records a format whereby a network operator can publish
   a mapping of IP address prefixes to simplified geolocation
   information, colloquially termed a "geolocation feed".  Interested
   parties can poll and parse these feeds to update or merge with other
   geolocation data sources and procedures.  This format intentionally
   only allows specifying coarse-level location.

   Some technical organizations operating networks that move from one
   conference location to the next have already experimentally published
   small geolocation feeds.

   This document describes a currently deployed format.  At least one
   consumer (Google) has incorporated these feeds into a geolocation
   data pipeline, and a significant number of ISPs are using it to
   inform them where their prefixes should be geolocated.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for informational purposes.

   This is a contribution to the RFC Series, independently of any other
   RFC stream.  The RFC Editor has chosen to publish this document at
   its discretion and makes no statement about its value for
   implementation or deployment.  Documents approved for publication by
   the RFC Editor are not candidates for any level of Internet Standard;
   see Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc8805.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Table of Contents

   1.  Introduction
     1.1.  Motivation
     1.2.  Requirements Notation
     1.3.  Assumptions about Publication
   2.  Self-Published IP Geolocation Feeds
     2.1.  Specification
       2.1.1.  Geolocation Feed Individual Entry Fields
         2.1.1.1.  IP Prefix
         2.1.1.2.  Alpha2code (Previously: 'country')
         2.1.1.3.  Region
         2.1.1.4.  City
         2.1.1.5.  Postal Code
       2.1.2.  Prefixes with No Geolocation Information
       2.1.3.  Additional Parsing Requirements
     2.2.  Examples
   3.  Consuming Self-Published IP Geolocation Feeds
     3.1.  Feed Integrity
     3.2.  Verification of Authority
     3.3.  Verification of Accuracy
     3.4.  Refreshing Feed Information
   4.  Privacy Considerations
   5.  Relation to Other Work
   6.  Security Considerations
   7.  Planned Future Work
   8.  Finding Self-Published IP Geolocation Feeds
     8.1.  Ad Hoc 'Well-Known' URIs
     8.2.  Other Mechanisms
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Sample Python Validation Code
   Acknowledgements
   Authors' Addresses

1.  Introduction

1.1.  Motivation

   Providers of services over the Internet have grown to depend on best-
   effort geolocation information to improve the user experience.
   Locality information can aid in directing traffic to the nearest
   serving location, inferring likely native language, and providing
   additional context for services involving search queries.

   When an ISP, for example, changes the location where an IP prefix is
   deployed, services that make use of geolocation information may begin
   to suffer degraded performance.  This can lead to customer
   complaints, possibly to the ISP directly.  Dissemination of correct
   geolocation data is complicated by the lack of any centralized means
   to coordinate and communicate geolocation information to all
   interested consumers of the data.

   This document records a format whereby a network operator (an ISP, an
   enterprise, or any organization that deems the geolocation of its IP
   prefixes to be of concern) can publish a mapping of IP address
   prefixes to simplified geolocation information, colloquially termed a
   "geolocation feed".  Interested parties can poll and parse these
   feeds to update or merge with other geolocation data sources and
   procedures.

   This document describes a currently deployed format.  At least one
   consumer (Google) has incorporated these feeds into a geolocation
   data pipeline, and a significant number of ISPs are using it to
   inform them where their prefixes should be geolocated.

1.2.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   As this is an informational document about a data format and set of
   operational practices presently in use, requirements notation
   captures the design goals of the authors and implementors.

1.3.  Assumptions about Publication

   This document describes both a format and a mechanism for publishing
   data, with the assumption that the network operator to whom
   operational responsibility has been delegated for any published data
   wishes it to be public.  Any privacy risk is bounded by the format,
   and feed publishers MAY omit prefixes or any location field
   associated with a given prefix to further protect privacy (see
   Section 2.1 for details about which fields exactly may be omitted).
   Feed publishers assume the responsibility of determining which data
   should be made public.

   This document does not incorporate a mechanism to communicate
   acceptable use policies for self-published data.  Publication itself
   is inferred as a desire by the publisher for the data to be usefully
   consumed, similar to the publication of information like host names,
   cryptographic keys, and Sender Policy Framework (SPF) records
   [RFC7208] in the DNS.

2.  Self-Published IP Geolocation Feeds

   The format described here was developed to address the need of
   network operators to rapidly and usefully share geolocation
   information changes.  Originally, there arose a specific case where
   regional operators found it desirable to publish location changes
   rather than wait for geolocation algorithms to "learn" about them.
   Later, technical conferences that frequently use the same network
   prefixes advertised from different conference locations experimented
   by publishing geolocation feeds updated in advance of network
   location changes in order to better serve conference attendees.

   At its simplest, the mechanism consists of a network operator
   publishing a file (the "geolocation feed") that contains several text
   entries, one per line.  Each entry is keyed by a unique (within the
   feed) IP prefix (or single IP address) followed by a sequence of
   network locality attributes to be ascribed to the given prefix.

2.1.  Specification

   For operational simplicity, every feed should contain data about all
   IP addresses the provider wants to publish.  Alternatives, like
   publishing only entries for IP addresses whose geolocation data has
   changed or differs from currently observed geolocation behavior "at
   large", are likely to be too operationally complex.

   Feeds MUST use UTF-8 [RFC3629] character encoding.  Lines are
   delimited by a line break (CRLF, as specified in [RFC4180]), and
   blank lines are ignored.  Text from a '#' character to the end of the
   current line is treated as a comment only and is similarly ignored
   (note that this does not strictly follow [RFC4180], which has no
   support for comments).

   Feed lines that are not comments MUST be formatted as comma-separated
   values (CSV), as described in [RFC4180].  Each feed entry is a text
   line of the form:

   ip_prefix,alpha2code,region,city,postal_code

   The IP prefix field is REQUIRED; all others are OPTIONAL (they can be
   empty), though the requisite minimum number of commas SHOULD be
   present.
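   As a minimal sketch (not the validator in Appendix A), a consumer
   might split a feed into records as follows.  It assumes that text
   after the first '#' is always a comment and that whitespace around a
   whole line is insignificant; the function name parse_feed_lines is
   hypothetical.

```python
import csv


def parse_feed_lines(text):
    """Yield the list of fields for each non-blank, non-comment feed line.

    Hypothetical helper: everything from the first '#' to the end of a line
    is dropped as a comment, and the remainder is parsed as one RFC 4180
    CSV record.
    """
    for raw in text.splitlines():  # handles CRLF as well as bare LF
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        yield next(csv.reader([line]))
```

   Splitting on '#' before CSV parsing means a quoted field containing
   '#' would be truncated; this sketch accepts that simplification.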

2.1.1.  Geolocation Feed Individual Entry Fields

2.1.1.1.  IP Prefix

   REQUIRED: Each IP prefix field MUST be either a single IP address or
   an IP prefix in Classless Inter-Domain Routing (CIDR) notation in
   conformance with Section 3.1 of [RFC4632] for IPv4 or Section 2.3 of
   [RFC4291] for IPv6.

   Examples include "192.0.2.1" and "192.0.2.0/24" for IPv4 and
   "2001:db8::1" and "2001:db8::/32" for IPv6.

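   A consumer can check this field with Python's standard ipaddress
   module.  The sketch below is one possible policy, not a normative
   one: it rejects prefixes with host bits set (such as 192.0.2.1/24),
   and it explicitly rejects netmask forms (such as
   192.0.2.0/255.255.255.0), which ip_network() would otherwise accept.
   The function name is hypothetical.

```python
import ipaddress


def parse_ip_prefix(field):
    """Return an IPv4Network/IPv6Network for the field, or None if invalid.

    A bare address becomes a host prefix (/32 for IPv4, /128 for IPv6).
    """
    if "/" in field and not field.rsplit("/", 1)[1].isdigit():
        return None  # not CIDR notation, e.g. "192.0.2.0/255.255.255.0"
    try:
        # strict=True raises ValueError when host bits are set.
        return ipaddress.ip_network(field, strict=True)
    except ValueError:
        return None
```
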
2.1.1.2.  Alpha2code (Previously: 'country')

   OPTIONAL: The alpha2code field, if non-empty, MUST be a 2-letter ISO
   country code conforming to ISO 3166-1 alpha 2 [ISO.3166.1alpha2].
   Parsers SHOULD treat this field case-insensitively.

   Earlier versions of this document called this field "country", and it
   may still be referred to as such in existing tools/interfaces.

   Parsers MAY additionally support other 2-letter codes outside the ISO
   3166-1 alpha 2 codes, such as the 2-letter codes from the
   "Exceptionally reserved codes" [ISO-GLOSSARY] set.

   Examples include "US" for the United States, "JP" for Japan, and "PL"
   for Poland.
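   A case-insensitive check of this field might look like the following
   sketch.  The set of valid codes is supplied by the caller, since the
   ISO table is not reproduced here; parsers that also accept the
   exceptionally reserved codes would simply include them in that set.
   The function name is hypothetical.

```python
def normalize_alpha2(field, known_codes):
    """Return the upper-case alpha2code, "" if empty, or None if invalid.

    known_codes: caller-supplied set of upper-case ISO 3166-1 alpha-2 codes.
    """
    if field == "":
        return ""  # the field is OPTIONAL
    # Require two ASCII letters before upper-casing, so that non-ASCII
    # input cannot be case-mapped into a valid-looking code.
    if len(field) != 2 or not (field.isascii() and field.isalpha()):
        return None
    code = field.upper()  # parsers SHOULD be case-insensitive
    return code if code in known_codes else None
```

   For example, with known_codes = {"US", "JP", "PL"}, the input "pl"
   normalizes to "PL", while "XX" and "USA" are rejected.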

2.1.1.3.  Region

   OPTIONAL: The region field, if non-empty, MUST be an ISO region code
   conforming to ISO 3166-2 [ISO.3166.2].  Parsers SHOULD treat this
   field case-insensitively.

   Examples include "ID-RI" for the Riau province of Indonesia and "NG-
   RI" for the Rivers province in Nigeria.
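   ISO 3166-2 codes consist of the alpha-2 country code, a hyphen, and
   up to three alphanumeric characters, so a syntactic check is
   straightforward.  The sketch below also rejects a region whose
   country part disagrees with the entry's alpha2code; that consistency
   rule is an assumption of this example, not a requirement stated
   here.  Validation against the actual list of subdivisions is left
   out, and the function name is hypothetical.

```python
import re

# Alpha-2 country code, hyphen, then one to three alphanumerics.
REGION_RE = re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}")


def normalize_region(field, alpha2=""):
    """Return the upper-case region code, "" if empty, or None if invalid."""
    if field == "":
        return ""  # the field is OPTIONAL
    if not field.isascii():
        return None
    code = field.upper()  # parsers SHOULD be case-insensitive
    if not REGION_RE.fullmatch(code):
        return None
    # Assumption of this sketch: the country part must match alpha2code.
    if alpha2 and code[:2] != alpha2.upper():
        return None
    return code
```
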

2.1.1.4.  City

   OPTIONAL: The city field, if non-empty, SHOULD be free UTF-8 text,
   excluding the comma (',') character.

   Examples include "Dublin", "New York", and "Sao Paulo" (specifically
   "S" followed by 0xc3, 0xa3, and "o Paulo").

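   The byte sequence given above for the São Paulo example can be
   confirmed in Python, and the comma exclusion is a one-line check.
   The helper name validate_city is hypothetical.

```python
def validate_city(field):
    """Return the city text, or None if it contains a comma."""
    return None if "," in field else field


# U+00E3 ('ã') encodes in UTF-8 as the two bytes 0xc3 0xa3.
encoded = "S\u00e3o Paulo".encode("utf-8")
print(encoded)  # b'S\xc3\xa3o Paulo'
```
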
2.1.1.5.  Postal Code

   OPTIONAL, DEPRECATED: The postal code field, if non-empty, SHOULD be
   free UTF-8 text, excluding the comma (',') character.  The use of
   this field is deprecated; consumers of feeds should be able to parse
   feeds containing these fields, but new feeds SHOULD NOT include this
   field due to the granularity of this information.  See Section 4 for
   additional discussion.

   Examples include "106-6126" (in Minato ward, Tokyo, Japan).

2.1.2.  Prefixes with No Geolocation Information

   Feed publishers may indicate that some IP prefixes should not have
   any associated geolocation information.  It may be that some prefixes
   under their administrative control are reserved, not yet allocated or
   deployed, or in the process of being redeployed elsewhere, and that
   existing geolocation information can, from the perspective of the
   publisher, safely be discarded.

   This special case can be indicated by explicitly leaving blank all
   fields that specify any degree of geolocation information.  For
   example:

   192.0.2.0/24,,,,
   2001:db8:1::/48,,,,
   2001:db8:2::/48,,,,

   Historically, the user-assigned alpha2code identifier of "ZZ" has
   been used for this same purpose.  This is not necessarily preferred,
   and no specific interpretation of any of the other user-assigned
   alpha2code codes is currently defined.
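   Recognizing such entries amounts to checking that every field after
   the prefix is empty.  In the sketch below, treating the historical
   "ZZ" code the same way is an opt-in assumption, and a record with
   missing trailing commas is treated as having empty fields.  The
   function name is hypothetical.

```python
def has_no_geolocation(fields, treat_zz_as_empty=False):
    """True if no location field after the IP prefix carries a value.

    fields: the parsed CSV fields of one entry, IP prefix first.
    """
    location = list(fields[1:])
    if treat_zz_as_empty and location and location[0].upper() == "ZZ":
        location[0] = ""  # opt-in: historical "no geolocation" marker
    return all(value == "" for value in location)
```
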

2.1.3.  Additional Parsing Requirements

   Feed entries that do not have an IP address or prefix field or have
   an IP address or prefix field that fails to parse correctly MUST be
   discarded.

   While publishers SHOULD follow [RFC5952] for IPv6 prefix fields,
   consumers MUST nevertheless accept all valid string representations.

   Duplicate IP address or prefix entries MUST be considered an error,
   and consumer implementations SHOULD log the repeated entries for
   further administrative review.  Publishers SHOULD take measures to
   ensure there is one and only one entry per IP address and prefix.
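   Because consumers accept any valid textual form, duplicates are best
   detected on parsed values rather than raw strings; otherwise
   "2001:DB8:0::/32" and "2001:db8::/32" would pass as distinct.  A
   minimal sketch, assuming every prefix field has already been
   validated; the function name is hypothetical:

```python
import ipaddress
from collections import Counter


def find_duplicate_prefixes(prefix_fields):
    """Return the networks that appear more than once in a feed.

    Comparison is on parsed networks, so differently written forms of the
    same prefix count as duplicates.  Sorting by str() avoids comparing
    IPv4 and IPv6 networks directly, which would raise TypeError.
    """
    counts = Counter(ipaddress.ip_network(p) for p in prefix_fields)
    return sorted((net for net, n in counts.items() if n > 1), key=str)
```

   str() of a parsed IPv6 network yields a compressed, lower-case form,
   which is convenient when logging repeated entries for review.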

   Multiple entries that constitute nested prefixes are permitted.
   Consumers SHOULD consider the entry with the longest matching prefix
   (i.e., the "most specific") to be the best matching entry for a given
   IP address.
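   A longest-match lookup can be sketched as a linear scan, shown here
   for clarity only; a consumer handling hundreds of thousands of
   prefixes would more likely use a radix tree or similar structure.
   The function name and the (network, location) pair layout are
   assumptions of this example.

```python
import ipaddress


def best_match(address, entries):
    """Return the (network, location) entry most specific for address.

    entries: iterable of (ip_network, location) pairs.  Returns None when
    no entry contains the address.
    """
    addr = ipaddress.ip_address(address)
    best = None
    for network, location in entries:
        if addr.version == network.version and addr in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, location)
    return best
```

   With the example entries from Section 2.2 loaded, a lookup of
   192.0.2.5 selects the 192.0.2.5 (Alabaster) entry rather than the
   covering 192.0.2.0/25 entry.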

   Feed entries with non-empty optional fields that fail to parse,
   either in part or in full, SHOULD be discarded.  It is RECOMMENDED
   that they also be logged for further administrative review.

   For compatibility with future additional fields, a parser MUST ignore
   any fields beyond those it expects.  The data from fields that are
   expected and that parse successfully MUST still be considered valid.
   Per Section 7, no extensions to this format are in use nor are any
   anticipated.
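   One way to honor this rule is to normalize every record to the five
   expected fields before interpreting it.  Padding short records with
   empty fields is a lenient choice made for this sketch; it accepts
   lines such as the ICANN entries in Section 2.2 that omit the
   trailing comma.  The names are hypothetical.

```python
EXPECTED_FIELDS = 5  # ip_prefix, alpha2code, region, city, postal_code


def normalize_field_count(record):
    """Drop fields beyond the expected five and pad short records with ""."""
    fields = list(record[:EXPECTED_FIELDS])
    fields.extend([""] * (EXPECTED_FIELDS - len(fields)))
    return fields
```
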

2.2.  Examples

   Example entries using different IP address formats and describing
   locations at alpha2code ("country code"), region, and city
   granularity level, respectively:

   192.0.2.0/25,US,US-AL,,
   192.0.2.5,US,US-AL,Alabaster,
   192.0.2.128/25,PL,PL-MZ,,
   2001:db8::/32,PL,,,
   2001:db8:cafe::/48,PL,PL-MZ,,

   The IETF network publishes geolocation information for its meeting
   prefixes; its operators generally just comment out the previous
   meeting's information and append the new meeting's information.  The
   [GEO_IETF] feed, at the time of this writing, contains:

   # IETF106 (Singapore) - November 2019 - Singapore, SG
   130.129.0.0/16,SG,SG-01,Singapore,
   2001:df8::/32,SG,SG-01,Singapore,
   31.133.128.0/18,SG,SG-01,Singapore,
   31.130.224.0/20,SG,SG-01,Singapore,
   2001:67c:1230::/46,SG,SG-01,Singapore,
   2001:67c:370::/48,SG,SG-01,Singapore,

   Experimentally, RIPE has published geolocation information for their
   conference network prefixes, which change location in accordance with
   each new event.  [GEO_RIPE_NCC], at the time of writing, contains:

   193.0.24.0/21,NL,NL-ZH,Rotterdam,
   2001:67c:64::/48,NL,NL-ZH,Rotterdam,

   Similarly, ICANN has published geolocation information for their
   portable conference network prefixes.  [GEO_ICANN], at the time of
   writing, contains:

   199.91.192.0/21,MA,MA-07,Marrakech
   2620:f:8000::/48,MA,MA-07,Marrakech

   A longer example is the [GEO_Google] Google Corp Geofeed, which lists
   the geolocation information for Google corporate offices.

   At the time of writing, Google processes approximately 400 feeds
   comprising more than 750,000 IPv4 and IPv6 prefixes.

3.  Consuming Self-Published IP Geolocation Feeds

   Consumers MAY treat published feed data as a hint only and MAY choose
   to prefer other sources of geolocation information for any given IP
   prefix.  Regardless of a consumer's stance with respect to a given
   published feed, there are some points of note for sensibly and
   effectively consuming published feeds.

3.1.  Feed Integrity

   The integrity of published information SHOULD be protected by
   securing the means of publication, for example, by using HTTP over
   TLS [RFC2818].  Whenever possible, consumers SHOULD prefer retrieving
   geolocation feeds in a manner that guarantees integrity of the feed.
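   A consumer that adopts a strict reading might refuse non-HTTPS URLs
   outright, as in the sketch below.  Python's urllib.request.urlopen()
   verifies server certificates by default for HTTPS.  The function
   name, the strict policy, and the timeout value are assumptions of
   this example.

```python
import urllib.parse
import urllib.request


def fetch_feed(url, timeout=30):
    """Download a feed as text, refusing any URL that is not https://."""
    if urllib.parse.urlsplit(url).scheme.lower() != "https":
        raise ValueError(f"refusing non-HTTPS feed URL: {url}")
    # urlopen() uses a default SSL context that verifies the certificate.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```
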

3.2.  Verification of Authority

   Consumers of self-published IP geolocation feeds SHOULD perform some
   form of verification that the publisher is in fact authoritative for
   the addresses in the feed.  The actual means of verification is
   likely dependent upon the way in which the feed is discovered.  Ad
   hoc shared URIs, for example, will likely require an ad hoc
   verification process.  Future automated means of feed discovery
   SHOULD have an accompanying automated means of verification.

   A consumer should only trust geolocation information for IP addresses
   or prefixes for which the publisher has been verified as
   administratively authoritative.  All other geolocation feed entries
   should be ignored and logged for further administrative review.
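   Once the set of prefixes for which a publisher is verified has been
   established by whatever means, filtering a feed against it can be
   sketched as follows.  How the verified set is obtained is outside
   this example, and the function name is hypothetical.

```python
import ipaddress
import logging


def filter_authoritative(entries, verified_prefixes):
    """Split entries into (accepted, rejected) by verified authority.

    entries: list of (ip_network, fields) pairs parsed from a feed.
    verified_prefixes: prefix strings the publisher has been verified to
    administer.  Rejected entries are logged for administrative review.
    """
    verified = [ipaddress.ip_network(p) for p in verified_prefixes]
    accepted, rejected = [], []
    for network, fields in entries:
        # subnet_of() raises TypeError across IP versions, so check first.
        if any(network.version == v.version and network.subnet_of(v)
               for v in verified):
            accepted.append((network, fields))
        else:
            logging.warning("ignoring non-authoritative entry %s", network)
            rejected.append((network, fields))
    return accepted, rejected
```
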

3.3.  Verification of Accuracy

   Errors and inaccuracies may occur at many levels, and publication and
   consumption of geolocation data are no exceptions.  To the extent
   practical, consumers SHOULD take steps to verify the accuracy of
   published locality.  Verification methodology, resolution of
   discrepancies, and preference for alternative sources of data are
   left to the discretion of the feed consumer.

   Consumers SHOULD decide on discrepancy thresholds and SHOULD flag,
   for administrative review, feed entries that exceed set thresholds.

3.4.  Refreshing Feed Information

   As a publisher can change geolocation data at any time and without
   notification, consumers SHOULD implement mechanisms to periodically
   refresh local copies of feed data.  In the absence of any other
   refresh timing information, consumers SHOULD refresh feeds at least
   weekly, but no more often than is likely to burden the publisher.

   For feeds available via HTTPS (or HTTP), the publisher MAY
   communicate refresh timing information by means of the standard HTTP
   expiration model ([RFC7234]).  Specifically, publishers can include
   either an Expires header (Section 5.3 of [RFC7234]) or a Cache-
   Control header (Section 5.2 of [RFC7234]) specifying the max-age.
   Where practical, consumers SHOULD refresh feed information before the
   expiry time is reached.
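   A consumer might derive its next refresh time from these headers
   roughly as follows.  This sketch lets max-age take precedence over
   Expires (as HTTP caching does), falls back to one week, and clamps
   the result between a floor and one week.  The one-hour floor is an
   arbitrary stand-in for "not so often that it burdens the publisher",
   so a very short max-age is stretched past the stated expiry.  Header
   names are assumed to have been lower-cased by the caller, and the
   names below are hypothetical.

```python
import datetime as dt
import email.utils
import re

WEEK = dt.timedelta(days=7)
MAX_AGE = re.compile(r"max-age\s*=\s*(\d+)", re.IGNORECASE)


def next_refresh(headers, now, floor=dt.timedelta(hours=1)):
    """Return the datetime at which the feed should next be fetched.

    headers: mapping of lower-case HTTP header names to values.
    now: timezone-aware datetime of the current fetch.
    """
    lifetime = None
    match = MAX_AGE.search(headers.get("cache-control", ""))
    if match:
        lifetime = dt.timedelta(seconds=int(match.group(1)))
    elif "expires" in headers:
        try:
            expires = email.utils.parsedate_to_datetime(headers["expires"])
            lifetime = expires - now
        except (TypeError, ValueError):
            lifetime = None  # unparseable or naive date: use the default
    if lifetime is None:
        lifetime = WEEK  # no timing information: refresh at least weekly
    return now + min(max(lifetime, floor), WEEK)
```
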

4.  Privacy Considerations

   Publishers of geolocation feeds are advised to have fully considered
   any and all privacy implications of the disclosure of such
   information for the users of the described networks prior to
   publication.  A thorough comprehension of the security considerations
   (Section 13 of [RFC6772]) of a chosen geolocation policy is highly
   recommended, including an understanding of some of the limitations of
   information obscurity (Section 13.5 of [RFC6772]).

   As noted in Section 2.1, each location field in an entry is optional,
   in order to support expressing only the level of specificity that the
   publisher has deemed acceptable.  There is no requirement that the
   level of specificity be consistent across all entries within a feed.
   In particular, the Postal Code field (Section 2.1.1.5) can provide
   very specific geolocation, sometimes within a building.  Such
   specific Postal Code values MUST NOT be published in geofeeds without
   the express consent of the parties being located.

   Operators who publish geolocation information are strongly encouraged
   to inform affected users/customers of this fact and of the potential
   privacy-related consequences and trade-offs.

5.  Relation to Other Work

   While not originally done in conjunction with the GEOPRIV Working
   Group [GEOPRIV], Richard Barnes observed that this work is
   nevertheless consistent with that which the group has defined, both
   for address format and for privacy.  The data elements in geolocation
   feeds are equivalent to the following XML structure ([RFC5139]
   [W3C.REC-xml-20081126]):

   <civicAddress>
     <country>country</country>
     <A1>region</A1>
     <A2>city</A2>
     <PC>postal_code</PC>
   </civicAddress>
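   The mapping above can be illustrated by building the element from an
   entry's fields with Python's xml.etree.ElementTree.  This sketch
   omits the RFC 5139 XML namespace for brevity and leaves out empty
   fields rather than emitting empty elements; both are choices of this
   example, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def to_civic_address(alpha2code, region, city, postal_code=""):
    """Serialize a feed entry's location fields as a civicAddress element."""
    root = ET.Element("civicAddress")
    for tag, value in (("country", alpha2code), ("A1", region),
                       ("A2", city), ("PC", postal_code)):
        if value:  # skip empty optional fields
            ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")
```
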

   Providing geolocation information at this granularity is equivalent
   to the following privacy policy (the 'building' level of disclosure
   defined in Section 6.5.1 of [RFC6772]):

   <ruleset>
     <rule>
       <conditions/>
       <actions/>
       <transformations>
         <provide-location profile="civic-transformation">
           <provide-civic>building</provide-civic>
         </provide-location>
       </transformations>
     </rule>
   </ruleset>

6.  Security Considerations

   As there is no true security in the obscurity of the location of any
   given IP address, self-publication of this data fundamentally opens
   no new attack vectors.  For publishers, self-published data may
   increase the ease with which such location data might be exploited
   (it can, for example, make it easier to distinguish prefixes populated
   with customers from prefixes not generally in use).

   For consumers, feed retrieval processes may receive input from
   potentially hostile sources (e.g., in the event of hijacked traffic).
   As such, proper input validation and defense measures MUST be taken
   (see the discussion in Section 3.1).

   Similarly, consumers who do not perform sufficient verification of
   published data bear the same risks as from other forms of geolocation
   configuration errors (see the discussion in Sections 3.2 and 3.3).

   Validation of a feed's contents includes verifying that the publisher
   is authoritative for the IP prefixes included in the feed.  Failure
   to verify IP prefix authority would, for example, allow ISP Bob to
   make geolocation statements about IP space held by ISP Alice.  At
   this time, only out-of-band verification methods are implemented
   (e.g., an ISP's feed may be verified against publicly available IP
   allocation data).

7.  Planned Future Work

   In order to more flexibly support future extensions, use of a more
   expressive feed format has been suggested.  Use of JavaScript Object
   Notation (JSON) [RFC8259], specifically, has been discussed.
   However, at the time of writing, neither a specification nor an
   implementation of such a format exists.  Work on extensions is
   therefore deferred until a more suitable format has been selected.

   The authors plan to write a document describing such a new format.
   This document describes a currently deployed and used format.  Given
   the extremely limited extensibility of the present format, no
   extensions to it are anticipated.  Extensibility
   requirements are instead expected to be integral to the development
   of a new format.

8.  Finding Self-Published IP Geolocation Feeds

   The issue of finding, and later verifying, geolocation feeds is not
   formally specified in this document.  At this time, only ad hoc feed
   discovery and verification have a modicum of established practice
   (see below); discussion of other mechanisms has been removed for
   clarity.

8.1.  Ad Hoc 'Well-Known' URIs

   To date, geolocation feeds have been shared informally in the form of
   HTTPS URIs exchanged in email threads.  Three example URIs
   ([GEO_IETF], [GEO_RIPE_NCC], and [GEO_ICANN]) describe networks that
   change locations periodically, the operators and operational
   practices of which are well known within their respective technical
   communities.

   The contents of the feeds are verified by a similarly ad hoc process,
   including:

   *  personal knowledge of the parties involved in the exchange and

   *  comparison of feed-advertised prefixes with the BGP-advertised
      prefixes of Autonomous System Numbers known to be operated by the
      publishers.

   Ad hoc mechanisms, while useful for early experimentation by
   producers and consumers, are unlikely to be adequate for long-term,
   widespread use by multiple parties.  Future versions of any such
   self-published geolocation feed mechanism SHOULD address scalability
   concerns by defining a means for automated discovery and verification
   of operational authority of advertised prefixes.

8.2.  Other Mechanisms

   Previous versions of this document referenced use of the WHOIS
   service [RFC3912] operated by Regional Internet Registries (RIRs), as
   well as possible DNS-based schemes to discover and validate geofeeds.
   To the authors' knowledge, support for such mechanisms has never been
   implemented, and this speculative text has been removed to avoid
   ambiguity.

9.  IANA Considerations

   This document has no IANA actions.

10.  References

10.1.  Normative References

   [ISO.3166.1alpha2]
              ISO, "ISO 3166-1 decoding table",
              <http://www.iso.org/iso/home/standards/country_codes/iso-
              3166-1_decoding_table.htm>.

   [ISO.3166.2]
              ISO, "ISO 3166-2:2007",
              <http://www.iso.org/iso/home/standards/
              country_codes.htm#2012_iso3166-2>.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
              2003, <https://www.rfc-editor.org/info/rfc3629>.

   [RFC4180]  Shafranovich, Y., "Common Format and MIME Type for Comma-
              Separated Values (CSV) Files", RFC 4180,
              DOI 10.17487/RFC4180, October 2005,
              <https://www.rfc-editor.org/info/rfc4180>.

   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, DOI 10.17487/RFC4291, February
              2006, <https://www.rfc-editor.org/info/rfc4291>.

   [RFC4632]  Fuller, V. and T. Li, "Classless Inter-domain Routing
              (CIDR): The Internet Address Assignment and Aggregation
              Plan", BCP 122, RFC 4632, DOI 10.17487/RFC4632, August
              2006, <https://www.rfc-editor.org/info/rfc4632>.

   [RFC5952]  Kawamura, S. and M. Kawashima, "A Recommendation for IPv6
              Address Text Representation", RFC 5952,
              DOI 10.17487/RFC5952, August 2010,
              <https://www.rfc-editor.org/info/rfc5952>.

   [RFC7234]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke,
              Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching",
              RFC 7234, DOI 10.17487/RFC7234, June 2014,
              <https://www.rfc-editor.org/info/rfc7234>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [W3C.REC-xml-20081126]
              Bray, T., Paoli, J., Sperberg-McQueen, M., Maler, E., and
              F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fifth
              Edition)", World Wide Web Consortium Recommendation REC-
              xml-20081126, November 2008,
              <http://www.w3.org/TR/2008/REC-xml-20081126>.

10.2.  Informative References

   [GEOPRIV]  IETF, "Geographic Location/Privacy (geopriv)",
              <http://datatracker.ietf.org/wg/geopriv/>.

   [GEO_Google]
              Google, LLC, "Google Corp Geofeed",
              <https://www.gstatic.com/geofeed/corp_external>.

   [GEO_ICANN]
              ICANN, "ICANN Meeting Geolocation Data",
              <https://meeting-services.icann.org/geo/google.csv>.

   [GEO_IETF] Kumari, W., "IETF Meeting Network Geolocation Data",
              <https://noc.ietf.org/geo/google.csv>.

   [GEO_RIPE_NCC]
              Schepers, M., "RIPE NCC Meeting Geolocation Data",
              <https://meetings.ripe.net/geo/google.csv>.

   [IPADDR_PY]
              Shields, M. and P. Moody, "Google's Python IP address
              manipulation library",
              <http://code.google.com/p/ipaddr-py/>.

   [ISO-GLOSSARY]
              ISO, "Glossary for ISO 3166",
              <https://www.iso.org/glossary-for-iso-3166.html>.

   [RFC2818]  Rescorla, E., "HTTP Over TLS", RFC 2818,
              DOI 10.17487/RFC2818, May 2000,
              <https://www.rfc-editor.org/info/rfc2818>.

   [RFC3912]  Daigle, L., "WHOIS Protocol Specification", RFC 3912,
              DOI 10.17487/RFC3912, September 2004,
              <https://www.rfc-editor.org/info/rfc3912>.

   [RFC5139]  Thomson, M. and J. Winterbottom, "Revised Civic Location
              Format for Presence Information Data Format Location
              Object (PIDF-LO)", RFC 5139, DOI 10.17487/RFC5139,
              February 2008, <https://www.rfc-editor.org/info/rfc5139>.

   [RFC6772]  Schulzrinne, H., Ed., Tschofenig, H., Ed., Cuellar, J.,
              Polk, J., Morris, J., and M. Thomson, "Geolocation Policy:
              A Document Format for Expressing Privacy Preferences for
              Location Information", RFC 6772, DOI 10.17487/RFC6772,
              January 2013, <https://www.rfc-editor.org/info/rfc6772>.

   [RFC7208]  Kitterman, S., "Sender Policy Framework (SPF) for
              Authorizing Use of Domains in Email, Version 1", RFC 7208,
              DOI 10.17487/RFC7208, April 2014,
              <https://www.rfc-editor.org/info/rfc7208>.

   [RFC8259]  Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
              Interchange Format", STD 90, RFC 8259,
              DOI 10.17487/RFC8259, December 2017,
              <https://www.rfc-editor.org/info/rfc8259>.

Appendix A.  Sample Python Validation Code

   Included here is a simple format validator in Python for
   self-published ipgeo feeds.  This tool reads CSV data in the
   self-published ipgeo feed format from the standard input and
   performs basic validation.  It is intended for use by feed
   publishers before launching a feed.  Note that this validator checks
   only the syntax of each individual line; it does not verify that
   every IP prefix is unique across the feed as a whole.  A complete
   validator MUST also ensure IP prefix uniqueness.
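   The uniqueness requirement can be checked with a separate pass over
   the whole feed.  The following is a minimal sketch, not part of the
   validator in this appendix: it uses Python 3's standard "ipaddress"
   module instead of ipaddr, and the function name
   find_duplicate_prefixes is hypothetical.  It assumes that a bare
   address and the equivalent host prefix (/32 or /128) denote the same
   entry, and it reports only exact duplicates after normalization, not
   overlapping prefixes.

```python
import csv
import ipaddress


def find_duplicate_prefixes(lines):
    """Return (prefix, first_line, duplicate_line) for repeated prefixes.

    Hypothetical helper.  Lines whose first field does not parse as an
    IP address or prefix are skipped; reporting those is left to the
    per-line syntax validator.
    """
    first_seen = {}   # normalized network -> line number of first use
    duplicates = []
    for line_number, line in enumerate(lines, start=1):
        # Drop comments and surrounding whitespace; skip empty results.
        data = line.split('#')[0].strip()
        if not data:
            continue
        first_field = next(csv.reader([data]))[0].strip()
        try:
            # strict=True rejects prefixes with host bits set; a bare
            # address becomes a /32 or /128 network.
            network = ipaddress.ip_network(first_field, strict=True)
        except ValueError:
            continue
        if network in first_seen:
            duplicates.append(
                (str(network), first_seen[network], line_number))
        else:
            first_seen[network] = line_number
    return duplicates
```

   The function accepts the same kind of iterable as Validate(), for
   example sys.stdin.  Because "2001:db8::/32" and "2001:0db8::/32"
   normalize to the same network, they are reported as duplicates.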

   The main source file "ipgeo_feed_validator.py" follows.  It requires
   the open source ipaddr Python library for IP address and CIDR
   parsing and validation [IPADDR_PY].

   <CODE BEGINS>
   #!/usr/bin/python
   #
   # Copyright (c) 2012 IETF Trust and the persons identified as
   # authors of the code.  All rights reserved.  Redistribution and use
   # in source and binary forms, with or without modification, is
   # permitted pursuant to, and subject to the license terms contained
   # in, the Simplified BSD License set forth in Section 4.c of the
   # IETF Trust's Legal Provisions Relating to IETF
   # Documents (http://trustee.ietf.org/license-info).

   """Simple format validator for self-published ipgeo feeds.

   This tool reads CSV data in the self-published ipgeo feed format
   from the standard input and performs basic validation.  It is
   intended for use by feed publishers before launching a feed.
   """

   import csv
   import ipaddr
   import re
   import sys


   class IPGeoFeedValidator(object):
     def __init__(self):
       self.prefixes = {}
       self.line_number = 0
       self.output_log = {}
       self.SetOutputStream(sys.stderr)

     def Validate(self, feed):
       """Check validity of an IPGeo feed.

       Args:
         feed: iterable with feed lines
       """

       for line in feed:
         self._ValidateLine(line)

     def SetOutputStream(self, logfile):
       """Controls where the output messages go (STDERR by default).

       Use None to disable logging.

       Args:
         logfile: a file object (e.g., sys.stdout) or None.
       """
       self.output_stream = logfile

     def CountErrors(self, severity):
       """Returns the number of messages logged at the given severity."""
       return len(self.output_log.get(severity, []))

     ############################################################
     def _ValidateLine(self, line):
       line = line.rstrip('\r\n')
       self.line_number += 1
       self.line = line.split('#')[0]
       self.is_correct_line = True

       if self._ShouldIgnoreLine(line):
         return

       fields = [field for field in csv.reader([line])][0]

       self._ValidateFields(fields)
       self._FlushOutputStream()

     def _ShouldIgnoreLine(self, line):
       line = line.strip()
       if line.startswith('#'):
         return True
       return len(line) == 0

     ############################################################
     def _ValidateFields(self, fields):
       assert(len(fields) > 0)

       is_correct = self._IsIPAddressOrPrefixCorrect(fields[0])

       if len(fields) > 1:
         if not self._IsAlpha2CodeCorrect(fields[1]):
           is_correct = False

       if len(fields) > 2 and not self._IsRegionCodeCorrect(fields[2]):
         is_correct = False

       if len(fields) != 5:
         self._ReportWarning('5 fields were expected (got %d).'
                             % len(fields))

     ############################################################
     def _IsIPAddressOrPrefixCorrect(self, field):
       if '/' in field:
         return self._IsCIDRCorrect(field)
       return self._IsIPAddressCorrect(field)

     def _IsCIDRCorrect(self, cidr):
       try:
         ipprefix = ipaddr.IPNetwork(cidr)
         if ipprefix.network != ipprefix.ip:
           self._ReportError('Incorrect IP Network.')
           return False
         if ipprefix.is_private:
           self._ReportError('IP Address must not be private.')
           return False
       except ValueError:
         self._ReportError('Incorrect IP Network.')
         return False
       return True

     def _IsIPAddressCorrect(self, ipaddress):
       try:
         ip = ipaddr.IPAddress(ipaddress)
       except ValueError:
         self._ReportError('Incorrect IP Address.')
         return False
       if ip.is_private:
         self._ReportError('IP Address must not be private.')
         return False
       return True

     ############################################################
     def _IsAlpha2CodeCorrect(self, alpha2code):
       if len(alpha2code) == 0:
         return True
       if len(alpha2code) != 2 or not alpha2code.isalpha():
         self._ReportError(
             'Alpha 2 code must be in the ISO 3166-1 alpha 2 format.')
         return False
       return True

     def _IsRegionCodeCorrect(self, region_code):
       if len(region_code) == 0:
         return True
       if '-' not in region_code:
         self._ReportError('Region code must be in ISO 3166-2 format.')
         return False

       parts = region_code.split('-')
       if not self._IsAlpha2CodeCorrect(parts[0]):
         return False
       return True

     ############################################################
     def _ReportError(self, message):
       self._ReportWithSeverity('ERROR', message)

     def _ReportWarning(self, message):
       self._ReportWithSeverity('WARNING', message)

     def _ReportWithSeverity(self, severity, message):
       self.is_correct_line = False
       output_line = '%s: %s\n' % (severity, message)

       if severity not in self.output_log:
         self.output_log[severity] = []
       self.output_log[severity].append(output_line)

       if self.output_stream is not None:
         self.output_stream.write(output_line)

     def _FlushOutputStream(self):
       if self.is_correct_line: return
       if self.output_stream is None: return

       self.output_stream.write('line %d: %s\n\n'
                                % (self.line_number, self.line))


   ############################################################
   def main():
     feed_validator = IPGeoFeedValidator()
     feed_validator.Validate(sys.stdin)

     if feed_validator.CountErrors('ERROR'):
       sys.exit(1)

   if __name__ == '__main__':
     main()
   <CODE ENDS>
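   The validator above depends on the third-party ipaddr library.  On
   Python 3, the standard "ipaddress" module offers similar classes,
   but a direct port needs care: its is_private attribute also covers
   other special-purpose ranges, such as the documentation prefixes,
   so it would flag lines this validator accepts (for example,
   2001:db8:cafe::/48) as private.  The following is a minimal sketch
   of the prefix checks only, not part of this document's code: the
   helper name check_prefix is hypothetical, and it assumes "private"
   means RFC 1918 space plus the IPv6 unique local range fc00::/7,
   matching the private-address cases in the unit tests.  It requires
   Python 3.7 or later for subnet_of().

```python
import ipaddress

# Assumption: "private" means RFC 1918 IPv4 space plus IPv6 unique local
# addresses (fc00::/7), matching the unit tests' expectations.  Python 3's
# built-in is_private is broader, so it is deliberately not used here.
PRIVATE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ('10.0.0.0/8', '172.16.0.0/12', '192.168.0.0/16',
                 'fc00::/7')
]


def check_prefix(field):
    """Return an error message for an address/prefix field, or None.

    Hypothetical helper.  A bare address is treated as a /32 or /128.
    """
    try:
        # strict=True raises ValueError when host bits are set,
        # e.g. "55.66.77.88/24"; malformed input also raises ValueError.
        network = ipaddress.ip_network(field, strict=True)
    except ValueError:
        return 'Incorrect IP Network.'
    for private in PRIVATE_NETWORKS:
        # subnet_of() raises TypeError on mixed versions, so compare
        # only networks of the same IP version.
        if (network.version == private.version
                and network.subnet_of(private)):
            return 'IP Address must not be private.'
    return None
```

   For instance, check_prefix('55.66.77.88/24') returns 'Incorrect IP
   Network.' because host bits are set, mirroring the corresponding
   expectation in the unit tests.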

   A unit test file, "ipgeo_feed_validator_test.py", is provided as
   well.  It provides basic test coverage of "ipgeo_feed_validator.py",
   though it does not test correct handling of non-ASCII UTF-8 strings.

   <CODE BEGINS>
   #!/usr/bin/python
   #
   # Copyright (c) 2012 IETF Trust and the persons identified as
   # authors of the code.  All rights reserved.  Redistribution and use
   # in source and binary forms, with or without modification, is
   # permitted pursuant to, and subject to the license terms contained
   # in, the Simplified BSD License set forth in Section 4.c of the
   # IETF Trust's Legal Provisions Relating to IETF
   # Documents (http://trustee.ietf.org/license-info).

   import sys
   from ipgeo_feed_validator import IPGeoFeedValidator

   class IPGeoFeedValidatorTest(object):
     def __init__(self):
       self.validator = IPGeoFeedValidator()
       self.validator.SetOutputStream(None)
       self.successes = 0
       self.failures = 0

     def Run(self):
       self.TestFeedLine('# asdf', 0, 0)
       self.TestFeedLine('   ', 0, 0)
       self.TestFeedLine('', 0, 0)

       self.TestFeedLine('asdf', 1, 1)
       self.TestFeedLine('asdf,US,,,', 1, 0)
       self.TestFeedLine('aaaa::,US,,,', 0, 0)
       self.TestFeedLine('zzzz::,US', 1, 1)
       self.TestFeedLine(',US,,,', 1, 0)
       self.TestFeedLine('55.66.77', 1, 1)
       self.TestFeedLine('55.66.77.888', 1, 1)
       self.TestFeedLine('55.66.77.asdf', 1, 1)

       self.TestFeedLine('2001:db8:cafe::/48,PL,PL-MZ,,02-784', 0, 0)
       self.TestFeedLine('2001:db8:cafe::/48', 0, 1)

       self.TestFeedLine('55.66.77.88,PL', 0, 1)
       self.TestFeedLine('55.66.77.88,PL,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,ZZ,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,US,,,', 0, 0)
       self.TestFeedLine('55.66.77.88,USA,,,', 1, 0)
       self.TestFeedLine('55.66.77.88,99,,,', 1, 0)

       self.TestFeedLine('55.66.77.88,US,US-CA,,', 0, 0)
       self.TestFeedLine('55.66.77.88,US,USA-CA,,', 1, 0)
       self.TestFeedLine('55.66.77.88,USA,USA-CA,,', 2, 0)

       self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,', 0, 0)
       self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043',
                         0, 0)
       self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043,'
                         '1600 Amphitheatre Parkway', 0, 1)

       self.TestFeedLine('55.66.77.0/24,US,,,', 0, 0)
       self.TestFeedLine('55.66.77.88/24,US,,,', 1, 0)
       self.TestFeedLine('55.66.77.88/32,US,,,', 0, 0)
       self.TestFeedLine('55.66.77/24,US,,,', 1, 0)
       self.TestFeedLine('55.66.77.0/35,US,,,', 1, 0)

       self.TestFeedLine('172.15.30.1,US,,,', 0, 0)
       self.TestFeedLine('172.28.30.1,US,,,', 1, 0)
       self.TestFeedLine('192.167.100.1,US,,,', 0, 0)
       self.TestFeedLine('192.168.100.1,US,,,', 1, 0)
       self.TestFeedLine('10.0.5.9,US,,,', 1, 0)
       self.TestFeedLine('10.0.5.0/24,US,,,', 1, 0)
       self.TestFeedLine('fc00::/48,PL,,,', 1, 0)
       self.TestFeedLine('fe00::/48,PL,,,', 0, 0)

       print ('%d tests passed, %d failed'
         % (self.successes, self.failures))

     def IsOutputLogCorrectAtSeverity(self, severity,
                                      expected_msg_count):
       msg_count = self.validator.CountErrors(severity)

       if msg_count != expected_msg_count:
         print('TEST FAILED: %s\nexpected %d %s[s], observed %d\n%s\n'
               % (self.validator.line, expected_msg_count, severity,
                  msg_count, str(self.validator.output_log[severity])))
         return False
       return True

     def IsOutputLogCorrect(self, new_errors, new_warnings):
       retval = True

       if not self.IsOutputLogCorrectAtSeverity('ERROR', new_errors):
         retval = False
       if not self.IsOutputLogCorrectAtSeverity('WARNING',
                                                new_warnings):
         retval = False

       return retval

     def TestFeedLine(self, line, error_count, warning_count):
       # Reset the per-line logs so counts reflect only this line.
       self.validator.output_log['WARNING'] = []
       self.validator.output_log['ERROR'] = []
       self.validator._ValidateLine(line)

       if not self.IsOutputLogCorrect(error_count, warning_count):
         self.failures += 1
         return False

       self.successes += 1
       return True


   if __name__ == '__main__':
     IPGeoFeedValidatorTest().Run()
   <CODE ENDS>
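   As noted above, the unit tests do not exercise non-ASCII UTF-8
   input.  A minimal sketch of such a check follows, assuming Python 3,
   where raw feed bytes are decoded to text before CSV parsing.  The
   helper parse_feed_fields and both test functions are hypothetical
   and not part of the validator; the sample line is illustrative only.

```python
import csv
import io


def parse_feed_fields(raw_bytes):
    """Decode raw feed bytes as UTF-8 and return each data line's fields.

    Hypothetical helper: raises UnicodeDecodeError if the feed is not
    valid UTF-8.  Comment-only and blank lines are skipped.
    """
    text = raw_bytes.decode('utf-8')
    rows = []
    for line in io.StringIO(text):
        data = line.split('#')[0].strip()   # drop comments, CR/LF, spaces
        if data:
            rows.append(next(csv.reader([data])))
    return rows


def test_non_ascii_city():
    raw = '192.0.2.0/24,CH,CH-ZH,Zürich,8002\n'.encode('utf-8')
    fields = parse_feed_fields(raw)[0]
    assert len(fields) == 5
    # The city is one 6-code-point string, not a sequence of raw bytes.
    assert fields[3] == 'Zürich'
    assert len(fields[3]) == 6


def test_invalid_utf8_is_rejected():
    # 0xFC is the Latin-1 encoding of "ü" and is not valid UTF-8.
    try:
        parse_feed_fields(b'192.0.2.0/24,CH,CH-ZH,Z\xfcrich,8002\n')
    except UnicodeDecodeError:
        return
    raise AssertionError('Latin-1 input should not decode as UTF-8')


if __name__ == '__main__':
    test_non_ascii_city()
    test_invalid_utf8_is_rejected()
    print('UTF-8 tests passed')
```

   Decoding explicitly, rather than relying on the platform's default
   encoding for standard input, keeps the result independent of the
   publisher's locale settings.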

Acknowledgements

   The authors would like to express their gratitude to reviewers and
   early implementors, including but not limited to Mikael Abrahamsson,
   Andrew Alston, Ray Bellis, John Bond, Alissa Cooper, Andras Erdei,
   Stephen Farrell, Marco Hogewoning, Mike Joseph, Maciej Kuzniar,
   George Michaelson, Menno Schepers, Justyna Sidorska, Pim van Pelt,
   and Bjoern A. Zeeb.

   In particular, Richard L. Barnes and Andy Newton contributed
   substantial review, text, and advice.

Authors' Addresses

   Erik Kline
   Loon LLC
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   United States of America

   Email: ek@loon.com


   Krzysztof Duleba
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   United States of America

   Email: kduleba@google.com


   Zoltan Szamonek
   Google Switzerland GmbH
   Brandschenkestrasse 110
   CH-8002 Zürich
   Switzerland

   Email: zszami@google.com


   Stefan Moser
   Google Switzerland GmbH
   Brandschenkestrasse 110
   CH-8002 Zürich
   Switzerland

   Email: smoser@google.com


   Warren Kumari
   Google
   1600 Amphitheatre Parkway
   Mountain View, CA 94043
   United States of America

   Email: warren@kumari.net