From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc8805.txt | 1050 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1050 insertions(+) create mode 100644 doc/rfc/rfc8805.txt (limited to 'doc/rfc/rfc8805.txt') diff --git a/doc/rfc/rfc8805.txt b/doc/rfc/rfc8805.txt new file mode 100644 index 0000000..ac5d2bc --- /dev/null +++ b/doc/rfc/rfc8805.txt @@ -0,0 +1,1050 @@ + + + + +Independent Submission E. Kline +Request for Comments: 8805 Loon LLC +Category: Informational K. Duleba +ISSN: 2070-1721 Google + Z. Szamonek + S. Moser + Google Switzerland GmbH + W. Kumari + Google + August 2020 + + + A Format for Self-Published IP Geolocation Feeds + +Abstract + + This document records a format whereby a network operator can publish + a mapping of IP address prefixes to simplified geolocation + information, colloquially termed a "geolocation feed". Interested + parties can poll and parse these feeds to update or merge with other + geolocation data sources and procedures. This format intentionally + only allows specifying coarse-level location. + + Some technical organizations operating networks that move from one + conference location to the next have already experimentally published + small geolocation feeds. + + This document describes a currently deployed format. At least one + consumer (Google) has incorporated these feeds into a geolocation + data pipeline, and a significant number of ISPs are using it to + inform them where their prefixes should be geolocated. + +Status of This Memo + + This document is not an Internet Standards Track specification; it is + published for informational purposes. + + This is a contribution to the RFC Series, independently of any other + RFC stream. The RFC Editor has chosen to publish this document at + its discretion and makes no statement about its value for + implementation or deployment. Documents approved for publication by + the RFC Editor are not candidates for any level of Internet Standard; + see Section 2 of RFC 7841. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + https://www.rfc-editor.org/info/rfc8805. + +Copyright Notice + + Copyright (c) 2020 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. + +Table of Contents + + 1. Introduction + 1.1. Motivation + 1.2. Requirements Notation + 1.3. Assumptions about Publication + 2. Self-Published IP Geolocation Feeds + 2.1. Specification + 2.1.1. Geolocation Feed Individual Entry Fields + 2.1.1.1. IP Prefix + 2.1.1.2. Alpha2code (Previously: 'country') + 2.1.1.3. Region + 2.1.1.4. City + 2.1.1.5. Postal Code + 2.1.2. Prefixes with No Geolocation Information + 2.1.3. Additional Parsing Requirements + 2.2. Examples + 3. Consuming Self-Published IP Geolocation Feeds + 3.1. Feed Integrity + 3.2. Verification of Authority + 3.3. Verification of Accuracy + 3.4. Refreshing Feed Information + 4. Privacy Considerations + 5. Relation to Other Work + 6. Security Considerations + 7. Planned Future Work + 8. Finding Self-Published IP Geolocation Feeds + 8.1. Ad Hoc 'Well-Known' URIs + 8.2. Other Mechanisms + 9. IANA Considerations + 10. References + 10.1. Normative References + 10.2. Informative References + Appendix A. Sample Python Validation Code + Acknowledgements + Authors' Addresses + +1. Introduction + +1.1. Motivation + + Providers of services over the Internet have grown to depend on best- + effort geolocation information to improve the user experience. + Locality information can aid in directing traffic to the nearest + serving location, inferring likely native language, and providing + additional context for services involving search queries. + + When an ISP, for example, changes the location where an IP prefix is + deployed, services that make use of geolocation information may begin + to suffer degraded performance. This can lead to customer + complaints, possibly to the ISP directly. Dissemination of correct + geolocation data is complicated by the lack of any centralized means + to coordinate and communicate geolocation information to all + interested consumers of the data. + + This document records a format whereby a network operator (an ISP, an + enterprise, or any organization that deems the geolocation of its IP + prefixes to be of concern) can publish a mapping of IP address + prefixes to simplified geolocation information, colloquially termed a + "geolocation feed". Interested parties can poll and parse these + feeds to update or merge with other geolocation data sources and + procedures. + + This document describes a currently deployed format. At least one + consumer (Google) has incorporated these feeds into a geolocation + data pipeline, and a significant number of ISPs are using it to + inform them where their prefixes should be geolocated. + +1.2. Requirements Notation + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and + "OPTIONAL" in this document are to be interpreted as described in + BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all + capitals, as shown here. + + As this is an informational document about a data format and set of + operational practices presently in use, requirements notation + captures the design goals of the authors and implementors. + +1.3. Assumptions about Publication + + This document describes both a format and a mechanism for publishing + data, with the assumption that the network operator to whom + operational responsibility has been delegated for any published data + wishes it to be public. Any privacy risk is bounded by the format, + and feed publishers MAY omit prefixes or any location field + associated with a given prefix to further protect privacy (see + Section 2.1 for details about which fields exactly may be omitted). + Feed publishers assume the responsibility of determining which data + should be made public. + + This document does not incorporate a mechanism to communicate + acceptable use policies for self-published data. Publication itself + is inferred as a desire by the publisher for the data to be usefully + consumed, similar to the publication of information like host names, + cryptographic keys, and Sender Policy Framework (SPF) records + [RFC7208] in the DNS. + +2. Self-Published IP Geolocation Feeds + + The format described here was developed to address the need of + network operators to rapidly and usefully share geolocation + information changes. Originally, there arose a specific case where + regional operators found it desirable to publish location changes + rather than wait for geolocation algorithms to "learn" about them. + Later, technical conferences that frequently use the same network + prefixes advertised from different conference locations experimented + by publishing geolocation feeds updated in advance of network + location changes in order to better serve conference attendees. + + At its simplest, the mechanism consists of a network operator + publishing a file (the "geolocation feed") that contains several text + entries, one per line. Each entry is keyed by a unique (within the + feed) IP prefix (or single IP address) followed by a sequence of + network locality attributes to be ascribed to the given prefix. + +2.1. Specification + + For operational simplicity, every feed should contain data about all + IP addresses the provider wants to publish. Alternatives, like + publishing only entries for IP addresses whose geolocation data has + changed or differ from current observed geolocation behavior "at + large", are likely to be too operationally complex. + + Feeds MUST use UTF-8 [RFC3629] character encoding. Lines are + delimited by a line break (CRLF) (as specified in [RFC4180]), and + blank lines are ignored. Text from a '#' character to the end of the + current line is treated as a comment only and is similarly ignored + (note that this does not strictly follow [RFC4180], which has no + support for comments). + + Feed lines that are not comments MUST be formatted as comma-separated + values (CSV), as described in [RFC4180]. Each feed entry is a text + line of the form: + + ip_prefix,alpha2code,region,city,postal_code + + The IP prefix field is REQUIRED, all others are OPTIONAL (can be + empty), though the requisite minimum number of commas SHOULD be + present. + +2.1.1. Geolocation Feed Individual Entry Fields + +2.1.1.1. IP Prefix + + REQUIRED: Each IP prefix field MUST be either a single IP address or + an IP prefix in Classless Inter-Domain Routing (CIDR) notation in + conformance with Section 3.1 of [RFC4632] for IPv4 or Section 2.3 of + [RFC4291] for IPv6. + + Examples include "192.0.2.1" and "192.0.2.0/24" for IPv4 and + "2001:db8::1" and "2001:db8::/32" for IPv6. + +2.1.1.2. Alpha2code (Previously: 'country') + + OPTIONAL: The alpha2code field, if non-empty, MUST be a 2-letter ISO + country code conforming to ISO 3166-1 alpha 2 [ISO.3166.1alpha2]. + Parsers SHOULD treat this field case-insensitively. + + Earlier versions of this document called this field "country", and it + may still be referred to as such in existing tools/interfaces. + + Parsers MAY additionally support other 2-letter codes outside the ISO + 3166-1 alpha 2 codes, such as the 2-letter codes from the + "Exceptionally reserved codes" [ISO-GLOSSARY] set. + + Examples include "US" for the United States, "JP" for Japan, and "PL" + for Poland. + +2.1.1.3. Region + + OPTIONAL: The region field, if non-empty, MUST be an ISO region code + conforming to ISO 3166-2 [ISO.3166.2]. Parsers SHOULD treat this + field case-insensitively. + + Examples include "ID-RI" for the Riau province of Indonesia and "NG- + RI" for the Rivers province in Nigeria. + +2.1.1.4. City + + OPTIONAL: The city field, if non-empty, SHOULD be free UTF-8 text, + excluding the comma (',') character. + + Examples include "Dublin", "New York", and "Sao Paulo" (specifically + "S" followed by 0xc3, 0xa3, and "o Paulo"). + +2.1.1.5. Postal Code + + OPTIONAL, DEPRECATED: The postal code field, if non-empty, SHOULD be + free UTF-8 text, excluding the comma (',') character. The use of + this field is deprecated; consumers of feeds should be able to parse + feeds containing these fields, but new feeds SHOULD NOT include this + field due to the granularity of this information. See Section 4 for + additional discussion. + + Examples include "106-6126" (in Minato ward, Tokyo, Japan). + +2.1.2. Prefixes with No Geolocation Information + + Feed publishers may indicate that some IP prefixes should not have + any associated geolocation information. It may be that some prefixes + under their administrative control are reserved, not yet allocated or + deployed, or in the process of being redeployed elsewhere and + existing geolocation information can, from the perspective of the + publisher, safely be discarded. + + This special case can be indicated by explicitly leaving blank all + fields that specify any degree of geolocation information. For + example: + + 192.0.2.0/24,,,, + 2001:db8:1::/48,,,, + 2001:db8:2::/48,,,, + + Historically, the user-assigned alpha2code identifier of "ZZ" has + been used for this same purpose. This is not necessarily preferred, + and no specific interpretation of any of the other user-assigned + alpha2code codes is currently defined. + +2.1.3. Additional Parsing Requirements + + Feed entries that do not have an IP address or prefix field or have + an IP address or prefix field that fails to parse correctly MUST be + discarded. + + While publishers SHOULD follow [RFC5952] for IPv6 prefix fields, + consumers MUST nevertheless accept all valid string representations. + + Duplicate IP address or prefix entries MUST be considered an error, + and consumer implementations SHOULD log the repeated entries for + further administrative review. Publishers SHOULD take measures to + ensure there is one and only one entry per IP address and prefix. + + Multiple entries that constitute nested prefixes are permitted. + Consumers SHOULD consider the entry with the longest matching prefix + (i.e., the "most specific") to be the best matching entry for a given + IP address. + + Feed entries with non-empty optional fields that fail to parse, + either in part or in full, SHOULD be discarded. It is RECOMMENDED + that they also be logged for further administrative review. + + For compatibility with future additional fields, a parser MUST ignore + any fields beyond those it expects. The data from fields that are + expected and that parse successfully MUST still be considered valid. + Per Section 7, no extensions to this format are in use nor are any + anticipated. + +2.2. Examples + + Example entries using different IP address formats and describing + locations at alpha2code ("country code"), region, and city + granularity level, respectively: + + 192.0.2.0/25,US,US-AL,, + 192.0.2.5,US,US-AL,Alabaster, + 192.0.2.128/25,PL,PL-MZ,, + 2001:db8::/32,PL,,, + 2001:db8:cafe::/48,PL,PL-MZ,, + + The IETF network publishes geolocation information for the meeting + prefixes, and generally just comment out the last meeting information + and append the new meeting information. The [GEO_IETF], at the time + of this writing, contains: + + # IETF106 (Singapore) - November 2019 - Singapore, SG + 130.129.0.0/16,SG,SG-01,Singapore, + 2001:df8::/32,SG,SG-01,Singapore, + 31.133.128.0/18,SG,SG-01,Singapore, + 31.130.224.0/20,SG,SG-01,Singapore, + 2001:67c:1230::/46,SG,SG-01,Singapore, + 2001:67c:370::/48,SG,SG-01,Singapore, + + Experimentally, RIPE has published geolocation information for their + conference network prefixes, which change location in accordance with + each new event. [GEO_RIPE_NCC], at the time of writing, contains: + + 193.0.24.0/21,NL,NL-ZH,Rotterdam, + 2001:67c:64::/48,NL,NL-ZH,Rotterdam, + + Similarly, ICANN has published geolocation information for their + portable conference network prefixes. [GEO_ICANN], at the time of + writing, contains: + + 199.91.192.0/21,MA,MA-07,Marrakech + 2620:f:8000::/48,MA,MA-07,Marrakech + + A longer example is the [GEO_Google] Google Corp Geofeed, which lists + the geolocation information for Google corporate offices. + + At the time of writing, Google processes approximately 400 feeds + comprising more than 750,000 IPv4 and IPv6 prefixes. + +3. Consuming Self-Published IP Geolocation Feeds + + Consumers MAY treat published feed data as a hint only and MAY choose + to prefer other sources of geolocation information for any given IP + prefix. Regardless of a consumer's stance with respect to a given + published feed, there are some points of note for sensibly and + effectively consuming published feeds. + +3.1. Feed Integrity + + The integrity of published information SHOULD be protected by + securing the means of publication, for example, by using HTTP over + TLS [RFC2818]. Whenever possible, consumers SHOULD prefer retrieving + geolocation feeds in a manner that guarantees integrity of the feed. + +3.2. Verification of Authority + + Consumers of self-published IP geolocation feeds SHOULD perform some + form of verification that the publisher is in fact authoritative for + the addresses in the feed. The actual means of verification is + likely dependent upon the way in which the feed is discovered. Ad + hoc shared URIs, for example, will likely require an ad hoc + verification process. Future automated means of feed discovery + SHOULD have an accompanying automated means of verification. + + A consumer should only trust geolocation information for IP addresses + or prefixes for which the publisher has been verified as + administratively authoritative. All other geolocation feed entries + should be ignored and logged for further administrative review. + +3.3. Verification of Accuracy + + Errors and inaccuracies may occur at many levels, and publication and + consumption of geolocation data are no exceptions. To the extent + practical, consumers SHOULD take steps to verify the accuracy of + published locality. Verification methodology, resolution of + discrepancies, and preference for alternative sources of data are + left to the discretion of the feed consumer. + + Consumers SHOULD decide on discrepancy thresholds and SHOULD flag, + for administrative review, feed entries that exceed set thresholds. + +3.4. Refreshing Feed Information + + As a publisher can change geolocation data at any time and without + notification, consumers SHOULD implement mechanisms to periodically + refresh local copies of feed data. In the absence of any other + refresh timing information, it is recommended that consumers SHOULD + refresh feeds no less often than weekly and no more often than is + likely to cause issues to the publisher. + + For feeds available via HTTPS (or HTTP), the publisher MAY + communicate refresh timing information by means of the standard HTTP + expiration model ([RFC7234]). Specifically, publishers can include + either an Expires header (Section 5.3 of [RFC7234]) or a Cache- + Control header (Section 5.2 of [RFC7234]) specifying the max-age. + Where practical, consumers SHOULD refresh feed information before the + expiry time is reached. + +4. Privacy Considerations + + Publishers of geolocation feeds are advised to have fully considered + any and all privacy implications of the disclosure of such + information for the users of the described networks prior to + publication. A thorough comprehension of the security considerations + (Section 13 of [RFC6772]) of a chosen geolocation policy is highly + recommended, including an understanding of some of the limitations of + information obscurity (Section 13.5 of [RFC6772]) (see also + [RFC6772]). + + As noted in Section 2.1, each location field in an entry is optional, + in order to support expressing only the level of specificity that the + publisher has deemed acceptable. There is no requirement that the + level of specificity be consistent across all entries within a feed. + In particular, the Postal Code field (Section 2.1.1.5) can provide + very specific geolocation, sometimes within a building. Such + specific Postal Code values MUST NOT be published in geofeeds without + the express consent of the parties being located. + + Operators who publish geolocation information are strongly encouraged + to inform affected users/customers of this fact and of the potential + privacy-related consequences and trade-offs. + +5. Relation to Other Work + + While not originally done in conjunction with the GEOPRIV Working + Group [GEOPRIV], Richard Barnes observed that this work is + nevertheless consistent with that which the group has defined, both + for address format and for privacy. The data elements in geolocation + feeds are equivalent to the following XML structure ([RFC5139] + [W3C.REC-xml-20081126]): + + + country + region + city + postal_code + + + Providing geolocation information to this granularity is equivalent + to the following privacy policy (the definition of the 'building' + Section 6.5.1 of [RFC6772] level of disclosure): + + + + + + + + building + + + + + +6. Security Considerations + + As there is no true security in the obscurity of the location of any + given IP address, self-publication of this data fundamentally opens + no new attack vectors. For publishers, self-published data may + increase the ease with which such location data might be exploited + (it can, for example, make easy the discovery of prefixes populated + with customers as distinct from prefixes not generally in use). + + For consumers, feed retrieval processes may receive input from + potentially hostile sources (e.g., in the event of hijacked traffic). + As such, proper input validation and defense measures MUST be taken + (see the discussion in Section 3.1). + + Similarly, consumers who do not perform sufficient verification of + published data bear the same risks as from other forms of geolocation + configuration errors (see the discussion in Sections 3.2 and 3.3). + + Validation of a feed's contents includes verifying that the publisher + is authoritative for the IP prefixes included in the feed. Failure + to verify IP prefix authority would, for example, allow ISP Bob to + make geolocation statements about IP space held by ISP Alice. At + this time, only out-of-band verification methods are implemented + (i.e., an ISP's feed may be verified against publicly available IP + allocation data). + +7. Planned Future Work + + In order to more flexibly support future extensions, use of a more + expressive feed format has been suggested. Use of JavaScript Object + Notation (JSON) [RFC8259], specifically, has been discussed. + However, at the time of writing, no such specification nor + implementation exists. Nevertheless, work on extensions is deferred + until a more suitable format has been selected. + + The authors are planning on writing a document describing such a new + format. This document describes a currently deployed and used + format. Given the extremely limited extensibility of the present + format no extensions to it are anticipated. Extensibility + requirements are instead expected to be integral to the development + of a new format. + +8. Finding Self-Published IP Geolocation Feeds + + The issue of finding, and later verifying, geolocation feeds is not + formally specified in this document. At this time, only ad hoc feed + discovery and verification has a modicum of established practice (see + below); discussion of other mechanisms has been removed for clarity. + +8.1. Ad Hoc 'Well-Known' URIs + + To date, geolocation feeds have been shared informally in the form of + HTTPS URIs exchanged in email threads. Three example URIs + ([GEO_IETF], [GEO_RIPE_NCC], and [GEO_ICANN]) describe networks that + change locations periodically, the operators and operational + practices of which are well known within their respective technical + communities. + + The contents of the feeds are verified by a similarly ad hoc process, + including: + + * personal knowledge of the parties involved in the exchange and + + * comparison of feed-advertised prefixes with the BGP-advertised + prefixes of Autonomous System Numbers known to be operated by the + publishers. + + Ad hoc mechanisms, while useful for early experimentation by + producers and consumers, are unlikely to be adequate for long-term, + widespread use by multiple parties. Future versions of any such + self-published geolocation feed mechanism SHOULD address scalability + concerns by defining a means for automated discovery and verification + of operational authority of advertised prefixes. + +8.2. Other Mechanisms + + Previous versions of this document referenced use of the WHOIS + service [RFC3912] operated by Regional Internet Registries (RIRs), as + well as possible DNS-based schemes to discover and validate geofeeds. + To the authors' knowledge, support for such mechanisms has never been + implemented, and this speculative text has been removed to avoid + ambiguity. + +9. IANA Considerations + + This document has no IANA actions. + +10. References + +10.1. Normative References + + [ISO.3166.1alpha2] + ISO, "ISO 3166-1 decoding table", + . + + [ISO.3166.2] + ISO, "ISO 3166-2:2007", + . + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + . + + [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO + 10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November + 2003, . + + [RFC4180] Shafranovich, Y., "Common Format and MIME Type for Comma- + Separated Values (CSV) Files", RFC 4180, + DOI 10.17487/RFC4180, October 2005, + . + + [RFC4291] Hinden, R. and S. Deering, "IP Version 6 Addressing + Architecture", RFC 4291, DOI 10.17487/RFC4291, February + 2006, . + + [RFC4632] Fuller, V. and T. Li, "Classless Inter-domain Routing + (CIDR): The Internet Address Assignment and Aggregation + Plan", BCP 122, RFC 4632, DOI 10.17487/RFC4632, August + 2006, . + + [RFC5952] Kawamura, S. and M. Kawashima, "A Recommendation for IPv6 + Address Text Representation", RFC 5952, + DOI 10.17487/RFC5952, August 2010, + . + + [RFC7234] Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke, + Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching", + RFC 7234, DOI 10.17487/RFC7234, June 2014, + . + + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC + 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, + May 2017, . + + [W3C.REC-xml-20081126] + Bray, T., Paoli, J., Sperberg-McQueen, M., Maler, E., and + F. Yergeau, "Extensible Markup Language (XML) 1.0 (Fifth + Edition)", World Wide Web Consortium Recommendation REC- + xml-20081126, November 2008, + . + +10.2. Informative References + + [GEOPRIV] IETF, "Geographic Location/Privacy (geopriv)", + . + + [GEO_Google] + Google, LLC, "Google Corp Geofeed", + . + + [GEO_ICANN] + ICANN, "ICANN Meeting Geolocation Data", + . + + [GEO_IETF] Kumari, W., "IETF Meeting Network Geolocation Data", + . + + [GEO_RIPE_NCC] + Schepers, M., "RIPE NCC Meeting Geolocation Data", + . + + [IPADDR_PY] + Shields, M. and P. Moody, "Google's Python IP address + manipulation library", + . + + [ISO-GLOSSARY] + ISO, "Glossary for ISO 3166", + . + + [RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, + DOI 10.17487/RFC2818, May 2000, + . + + [RFC3912] Daigle, L., "WHOIS Protocol Specification", RFC 3912, + DOI 10.17487/RFC3912, September 2004, + . + + [RFC5139] Thomson, M. and J. Winterbottom, "Revised Civic Location + Format for Presence Information Data Format Location + Object (PIDF-LO)", RFC 5139, DOI 10.17487/RFC5139, + February 2008, . + + [RFC6772] Schulzrinne, H., Ed., Tschofenig, H., Ed., Cuellar, J., + Polk, J., Morris, J., and M. Thomson, "Geolocation Policy: + A Document Format for Expressing Privacy Preferences for + Location Information", RFC 6772, DOI 10.17487/RFC6772, + January 2013, . + + [RFC7208] Kitterman, S., "Sender Policy Framework (SPF) for + Authorizing Use of Domains in Email, Version 1", RFC 7208, + DOI 10.17487/RFC7208, April 2014, + . + + [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data + Interchange Format", STD 90, RFC 8259, + DOI 10.17487/RFC8259, December 2017, + . + +Appendix A. Sample Python Validation Code + + Included here is a simple format validator in Python for self- + published ipgeo feeds. This tool reads CSV data in the self- + published ipgeo feed format from the standard input and performs + basic validation. It is intended for use by feed publishers before + launching a feed. Note that this validator does not verify the + uniqueness of every IP prefix entry within the feed as a whole but + only verifies the syntax of each single line from within the feed. A + complete validator MUST also ensure IP prefix uniqueness. + + The main source file "ipgeo_feed_validator.py" follows. It requires + use of the open source ipaddr Python library for IP address and CIDR + parsing and validation [IPADDR_PY]. + + + #!/usr/bin/python + # + # Copyright (c) 2012 IETF Trust and the persons identified as + # authors of the code. All rights reserved. Redistribution and use + # in source and binary forms, with or without modification, is + # permitted pursuant to, and subject to the license terms contained + # in, the Simplified BSD License set forth in Section 4.c of the + # IETF Trust's Legal Provisions Relating to IETF + # Documents (http://trustee.ietf.org/license-info). + + """Simple format validator for self-published ipgeo feeds. + + This tool reads CSV data in the self-published ipgeo feed format + from the standard input and performs basic validation. It is + intended for use by feed publishers before launching a feed. + """ + + import csv + import ipaddr + import re + import sys + + + class IPGeoFeedValidator(object): + def __init__(self): + self.prefixes = {} + self.line_number = 0 + self.output_log = {} + self.SetOutputStream(sys.stderr) + + def Validate(self, feed): + """Check validity of an IPGeo feed. + + Args: + feed: iterable with feed lines + """ + + for line in feed: + self._ValidateLine(line) + + def SetOutputStream(self, logfile): + """Controls where the output messages go do (STDERR by default). + + Use None to disable logging. + + Args: + logfile: a file object (e.g., sys.stdout) or None. + """ + self.output_stream = logfile + + def CountErrors(self, severity): + """How many ERRORs or WARNINGs were generated.""" + return len(self.output_log.get(severity, [])) + + ############################################################ + def _ValidateLine(self, line): + line = line.rstrip('\r\n') + self.line_number += 1 + self.line = line.split('#')[0] + self.is_correct_line = True + + if self._ShouldIgnoreLine(line): + return + + fields = [field for field in csv.reader([line])][0] + + self._ValidateFields(fields) + self._FlushOutputStream() + + def _ShouldIgnoreLine(self, line): + line = line.strip() + if line.startswith('#'): + return True + return len(line) == 0 + + ############################################################ + def _ValidateFields(self, fields): + assert(len(fields) > 0) + + is_correct = self._IsIPAddressOrPrefixCorrect(fields[0]) + + if len(fields) > 1: + if not self._IsAlpha2CodeCorrect(fields[1]): + is_correct = False + + if len(fields) > 2 and not self._IsRegionCodeCorrect(fields[2]): + is_correct = False + + if len(fields) != 5: + self._ReportWarning('5 fields were expected (got %d).' + % len(fields)) + + ############################################################ + def _IsIPAddressOrPrefixCorrect(self, field): + if '/' in field: + return self._IsCIDRCorrect(field) + return self._IsIPAddressCorrect(field) + + def _IsCIDRCorrect(self, cidr): + try: + ipprefix = ipaddr.IPNetwork(cidr) + if ipprefix.network._ip != ipprefix._ip: + self._ReportError('Incorrect IP Network.') + return False + if ipprefix.is_private: + self._ReportError('IP Address must not be private.') + return False + except: + self._ReportError('Incorrect IP Network.') + return False + return True + + def _IsIPAddressCorrect(self, ipaddress): + try: + ip = ipaddr.IPAddress(ipaddress) + except: + self._ReportError('Incorrect IP Address.') + return False + if ip.is_private: + self._ReportError('IP Address must not be private.') + return False + return True + + ############################################################ + def _IsAlpha2CodeCorrect(self, alpha2code): + if len(alpha2code) == 0: + return True + if len(alpha2code) != 2 or not alpha2code.isalpha(): + self._ReportError( + 'Alpha 2 code must be in the ISO 3166-1 alpha 2 format.') + return False + return True + + def _IsRegionCodeCorrect(self, region_code): + if len(region_code) == 0: + return True + if '-' not in region_code: + self._ReportError('Region code must be in ISO 3166-2 format.') + return False + + parts = region_code.split('-') + if not self._IsAlpha2CodeCorrect(parts[0]): + return False + return True + + ############################################################ + def _ReportError(self, message): + self._ReportWithSeverity('ERROR', message) + + def _ReportWarning(self, message): + self._ReportWithSeverity('WARNING', message) + + def _ReportWithSeverity(self, severity, message): + self.is_correct_line = False + output_line = '%s: %s\n' % (severity, message) + + if severity not in self.output_log: + self.output_log[severity] = [] + self.output_log[severity].append(output_line) + + if self.output_stream is not None: + self.output_stream.write(output_line) + + def _FlushOutputStream(self): + if self.is_correct_line: return + if self.output_stream is None: return + + self.output_stream.write('line %d: %s\n\n' + % (self.line_number, self.line)) + + + ############################################################ + def main(): + feed_validator = IPGeoFeedValidator() + feed_validator.Validate(sys.stdin) + + if feed_validator.CountErrors('ERROR'): + sys.exit(1) + + if __name__ == '__main__': + main() + + + A unit test file, "ipgeo_feed_validator_test.py" is provided as well. + It provides basic test coverage of the code above, though does not + test correct handling of non-ASCII UTF-8 strings. + + + #!/usr/bin/python + # + # Copyright (c) 2012 IETF Trust and the persons identified as + # authors of the code. All rights reserved. Redistribution and use + # in source and binary forms, with or without modification, is + # permitted pursuant to, and subject to the license terms contained + # in, the Simplified BSD License set forth in Section 4.c of the + # IETF Trust's Legal Provisions Relating to IETF + # Documents (http://trustee.ietf.org/license-info). + + import sys + from ipgeo_feed_validator import IPGeoFeedValidator + + class IPGeoFeedValidatorTest(object): + def __init__(self): + self.validator = IPGeoFeedValidator() + self.validator.SetOutputStream(None) + self.successes = 0 + self.failures = 0 + + def Run(self): + self.TestFeedLine('# asdf', 0, 0) + self.TestFeedLine(' ', 0, 0) + self.TestFeedLine('', 0, 0) + + self.TestFeedLine('asdf', 1, 1) + self.TestFeedLine('asdf,US,,,', 1, 0) + self.TestFeedLine('aaaa::,US,,,', 0, 0) + self.TestFeedLine('zzzz::,US', 1, 1) + self.TestFeedLine(',US,,,', 1, 0) + self.TestFeedLine('55.66.77', 1, 1) + self.TestFeedLine('55.66.77.888', 1, 1) + self.TestFeedLine('55.66.77.asdf', 1, 1) + + self.TestFeedLine('2001:db8:cafe::/48,PL,PL-MZ,,02-784', 0, 0) + self.TestFeedLine('2001:db8:cafe::/48', 0, 1) + + self.TestFeedLine('55.66.77.88,PL', 0, 1) + self.TestFeedLine('55.66.77.88,PL,,,', 0, 0) + self.TestFeedLine('55.66.77.88,,,,', 0, 0) + self.TestFeedLine('55.66.77.88,ZZ,,,', 0, 0) + self.TestFeedLine('55.66.77.88,US,,,', 0, 0) + self.TestFeedLine('55.66.77.88,USA,,,', 1, 0) + self.TestFeedLine('55.66.77.88,99,,,', 1, 0) + + self.TestFeedLine('55.66.77.88,US,US-CA,,', 0, 0) + self.TestFeedLine('55.66.77.88,US,USA-CA,,', 1, 0) + self.TestFeedLine('55.66.77.88,USA,USA-CA,,', 2, 0) + + self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,', 0, 0) + self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043', + 0, 0) + self.TestFeedLine('55.66.77.88,US,US-CA,Mountain View,94043,' + '1600 Ampthitheatre Parkway', 0, 1) + + self.TestFeedLine('55.66.77.0/24,US,,,', 0, 0) + self.TestFeedLine('55.66.77.88/24,US,,,', 1, 0) + self.TestFeedLine('55.66.77.88/32,US,,,', 0, 0) + self.TestFeedLine('55.66.77/24,US,,,', 1, 0) + self.TestFeedLine('55.66.77.0/35,US,,,', 1, 0) + + self.TestFeedLine('172.15.30.1,US,,,', 0, 0) + self.TestFeedLine('172.28.30.1,US,,,', 1, 0) + self.TestFeedLine('192.167.100.1,US,,,', 0, 0) + self.TestFeedLine('192.168.100.1,US,,,', 1, 0) + self.TestFeedLine('10.0.5.9,US,,,', 1, 0) + self.TestFeedLine('10.0.5.0/24,US,,,', 1, 0) + self.TestFeedLine('fc00::/48,PL,,,', 1, 0) + self.TestFeedLine('fe00::/48,PL,,,', 0, 0) + + print ('%d tests passed, %d failed' + % (self.successes, self.failures)) + + def IsOutputLogCorrectAtSeverity(self, severity, + expected_msg_count): + msg_count = self.validator.CountErrors(severity) + + if msg_count != expected_msg_count: + print ('TEST FAILED: %s\nexpected %d %s[s], observed %d\n%s\n' + % (self.validator.line, expected_msg_count, severity, + msg_count, + str(self.validator.output_log[severity]))) + return False + return True + + def IsOutputLogCorrect(self, new_errors, new_warnings): + retval = True + + if not self.IsOutputLogCorrectAtSeverity('ERROR', new_errors): + retval = False + if not self.IsOutputLogCorrectAtSeverity('WARNING', + new_warnings): + retval = False + + return retval + + def TestFeedLine(self, line, warning_count, error_count): + self.validator.output_log['WARNING'] = [] + self.validator.output_log['ERROR'] = [] + self.validator._ValidateLine(line) + + if not self.IsOutputLogCorrect(warning_count, error_count): + self.failures += 1 + return False + + self.successes += 1 + return True + + + if __name__ == '__main__': + IPGeoFeedValidatorTest().Run() + + +Acknowledgements + + The authors would like to express their gratitude to reviewers and + early implementors, including but not limited to Mikael Abrahamsson, + Andrew Alston, Ray Bellis, John Bond, Alissa Cooper, Andras Erdei, + Stephen Farrell, Marco Hogewoning, Mike Joseph, Maciej Kuzniar, + George Michaelson, Menno Schepers, Justyna Sidorska, Pim van Pelt, + and Bjoern A. Zeeb. + + In particular, Richard L. Barnes and Andy Newton contributed + substantial review, text, and advice. + +Authors' Addresses + + Erik Kline + Loon LLC + 1600 Amphitheatre Parkway + Mountain View, CA 94043 + United States of America + + Email: ek@loon.com + + + Krzysztof Duleba + Google + 1600 Amphitheatre Parkway + Mountain View, CA 94043 + United States of America + + Email: kduleba@google.com + + + Zoltan Szamonek + Google Switzerland GmbH + Brandschenkestrasse 110 + CH-8002 Zürich + Switzerland + + Email: zszami@google.com + + + Stefan Moser + Google Switzerland GmbH + Brandschenkestrasse 110 + CH-8002 Zürich + Switzerland + + Email: smoser@google.com + + + Warren Kumari + Google + 1600 Amphitheatre Parkway + Mountain View, CA 94043 + United States of America + + Email: warren@kumari.net -- cgit v1.2.3