diff --git a/doc/rfc/rfc2517.txt b/doc/rfc/rfc2517.txt
new file mode 100644
index 0000000..4ce72ff
--- /dev/null
+++ b/doc/rfc/rfc2517.txt
@@ -0,0 +1,395 @@
+
+
+
+
+
+
+Network Working Group R. Moats
+Request for Comments: 2517 R. Huber
+Category: Informational AT&T
+ February 1999
+
+
+ Building Directories from DNS: Experiences from WWWSeeker
+
+Status of this Memo
+
+ This memo provides information for the Internet community. It does
+ not specify an Internet standard of any kind. Distribution of this
+ memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (1999). All Rights Reserved.
+
+Abstract
+
+ There has been much discussion and several documents written about
+ the need for an Internet Directory. Recently, this discussion has
+ focused on ways to discover an organization's domain name without
+ relying on use of DNS as a directory service. This memo discusses
+ lessons that were learned during InterNIC Directory and Database
+ Services' development and operation of WWWSeeker, an application that
+ finds a web site given information about the name and location of an
+ organization. The back end database that drives this application was
+ built from information obtained from domain registries via WHOIS and
+ other protocols. We present this information to help future
+ implementors avoid some of the blind alleys that we have already
+ explored. This work builds on the Netfind system that was created by
+ Mike Schwartz and his team at the University of Colorado at Boulder
+ [1].
+
+1. Introduction
+
+ Over time, there have been several RFCs [2, 3, 4] about approaches
+ for providing Internet Directories. Many of the earlier documents
+ discussed white pages directories that supply mappings from a
+ person's name to their telephone number, email address, etc.
+
+ More recently, there has been discussion of directories that map from
+ a company name to a domain name or web site. Many people are using
+ DNS as a directory today to find this type of information about a
+ given company. Typically when DNS is used, users guess the domain
+ name of the company they are looking for and then prepend "www.".
+ This makes it highly desirable for a company to have an easily
+
+
+
+Moats & Huber Informational [Page 1]
+
+RFC 2517 Building Directories from DNS February 1999
+
+
+ guessable name.
+
+ There are two major problems here. As the number of assigned names
+ increases, it becomes more difficult to get an easily guessable name.
+ Also, the TLD must be guessed as well as the name. While many users
+ just guess ".COM" as the "default" TLD today, there are many two-
+ letter country code top-level domains in current use as well as other
+ gTLDs (.NET, .ORG, and possibly .EDU) with the prospect of additional
+ gTLDs in the future. As the number of TLDs in general use increases,
+ guessing gets more difficult.
+
+ Between July 1996 and our shutdown in March 1998, the InterNIC
+ Directory and Database Services project maintained the Netfind search
+ engine [1] and the associated database that maps organization
+ information to domain names. This database thus acted as the type of
+ Internet directory that associates company names with domain names.
+ We also built WWWSeeker, a system that used the Netfind database to
+ find web sites associated with a given organization. The experience
+ gained from maintaining and growing this database provides valuable
+ insight into the issues of providing a directory service. We present
+ it here to allow future implementors to avoid some of the blind
+ alleys that we have already explored.
+
+2. Directory Population
+
+2.1 What to do?
+
+ There are two issues in populating a directory: finding all the
+ domain names (building the skeleton) and associating those domains
+ with entities (adding the meat). These two issues are discussed
+ below.
+
+2.2 Building the skeleton
+
+ In "building the skeleton", it is popular to suggest using a variant
+ of a "tree walk" to determine the domains that need to be added to
+ the directory. Our experience is that this is neither a reasonable
+ nor an efficient proposal for maintaining such a directory. Except
+ for some infrequent and long-standing DNS surveys [5], DNS "tree
+ walks" tend to be discouraged by the Internet community, especially
+ given that the frequency of DNS changes would require a new tree walk
+ monthly (if not more often). Instead, our experience has shown that
+ data on allocated DNS domains can usually be retrieved in bulk
+ fashion with FTP, HTTP, or Gopher (we have used each of these for
+ particular TLDs). This has the added advantage of both "building the
+ skeleton" and "adding the meat" at the same time. Our favorite
+ method for finding a server that has allocated DNS domain information
+ is to start with the list maintained at
+ http://www.alldomains.com/countryindex.html and go from there.
+ Before this was available, it was necessary to hunt for a registry
+ using trial and error.
+
+ When maintaining the database, existing domains may be verified via
+ direct DNS lookups rather than a "tree walk." "Tree walks" should
+ therefore be the choice of last resort for directory population, and
+ bulk retrieval should be used whenever possible.
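+
+ As an illustration only (this is not the WWWSeeker tooling), the
+ verification pass described above can be sketched in Python. The
+ function names are hypothetical, and treating a domain as present
+ when either the bare name or its "www." host has an address record is
+ an assumption; a stricter check would query NS records, which needs a
+ DNS library outside the standard library.
+
```python
import socket
from typing import Callable, Iterable, List, Tuple


def resolves(name: str) -> bool:
    """Return True if the name has at least one address record."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def verify_domains(
    domains: Iterable[str],
    lookup: Callable[[str], bool] = resolves,
) -> Tuple[List[str], List[str]]:
    """Split known domains into those that still resolve and those that do not.

    Assumption: a domain counts as present if either the bare name or its
    "www." host resolves. This is a direct lookup per domain, not a tree walk.
    """
    present: List[str] = []
    absent: List[str] = []
    for domain in domains:
        if lookup(domain) or lookup("www." + domain):
            present.append(domain)
        else:
            absent.append(domain)
    return present, absent
```
+
+ Passing a stub for the lookup argument allows the logic to be
+ exercised without network access; domains reported as absent are
+ candidates for removal or re-checking rather than immediate deletion.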
+
+2.3 Adding the meat
+
+ A possibility for populating a directory ("adding the meat") is to
+ use an automated system that makes repeated queries using the WHOIS
+ protocol to gather information about the organization that owns a
+ domain. The queries would be made against a WHOIS server located
+ with the above method. At the conclusion of the InterNIC Directory
+ and Database Services project, our backend database contained about
+ 2.9 million records built from data that could be retrieved via
+ WHOIS. The entire database contained 3.25 million records, with the
+ additional records coming from sources other than WHOIS.
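+
+ A single WHOIS exchange is simple: the client opens a TCP connection
+ to port 43, sends the query followed by CRLF, and reads until the
+ server closes the connection. The following Python sketch shows one
+ such query. The function names are hypothetical, no particular
+ registry server is implied, and the reply is returned unparsed
+ because the record layout differs from registry to registry.
+
```python
import socket


def build_query(domain: str) -> bytes:
    """Format a WHOIS request: the query text followed by CRLF."""
    return domain.strip().lower().encode("ascii") + b"\r\n"


def whois_query(domain: str, server: str, port: int = 43,
                timeout: float = 10.0) -> str:
    """Send one WHOIS query and return the raw text of the reply."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_query(domain))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    # latin-1 maps every byte to a character, so decoding cannot fail
    return b"".join(chunks).decode("latin-1")
```
+
+ An automated gatherer would call whois_query() once per domain with
+ the server found for that TLD, and should pause between queries so
+ that the registry's server is not overloaded.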
+
+ In our experience this information contains many factual and
+ typographical errors and requires further examination and processing
+ to improve its quality. Further, TLD registrars that support WHOIS
+ typically only support WHOIS information for second level domains
+ (e.g., ne.us) as opposed to lower level domains (e.g.,
+ windrose.omaha.ne.us). Also, there are TLDs without registrars, TLDs
+ without WHOIS support, and still other TLDs that use other methods
+ (HTTP, FTP, Gopher) for providing organizational information. Based
+ on our experience, an implementor of an Internet directory needs to
+ support multiple protocols for directory population. An automated
+ WHOIS search tool is necessary, but isn't enough.
+
+3. Directory Updating: Full Rebuilds vs Incremental Updates
+
+ Given the size of our database in April 1998 when it was last
+ generated, a complete rebuild of the database that is available from
+ WHOIS lookups would require between 134.2 and 167.8 days just for
+ WHOIS lookups from a Sun SPARCstation 20. This estimate does not
+ include other considerations (for example, inverting the token tree
+ required about 24 hours processing time on a Sun SPARCstation 20)
+ that would increase the amount of time to rebuild the entire
+ database.
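+
+ The day counts above can be cross-checked with simple arithmetic.
+ The per-lookup times of 4 and 5 seconds used below are not stated in
+ this memo; they are inferred from the quoted range and from the 2.9
+ million WHOIS-derived records mentioned in Section 2.3, so they
+ should be read as an assumption.
+
```python
SECONDS_PER_DAY = 24 * 60 * 60
WHOIS_RECORDS = 2_900_000  # records retrievable via WHOIS (Section 2.3)


def lookup_days(records: int, seconds_per_lookup: float) -> float:
    """Days needed to look up every record serially on one machine."""
    return records * seconds_per_lookup / SECONDS_PER_DAY


# Assumed per-lookup times (inferred from the quoted range, not stated).
low = lookup_days(WHOIS_RECORDS, 4.0)
high = lookup_days(WHOIS_RECORDS, 5.0)
print(f"{low:.2f} to {high:.2f} days")
```
+
+ Under these assumptions the result is roughly 134 to 168 days, which
+ agrees with the range quoted above to within a tenth of a day.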
+
+ Whether this is feasible depends on the frequency of database updates
+ provided. Because of the rate of growth of allocated domain names
+ (150K-200K new allocated domains per month in early 1998), we
+ provided monthly updates of the database. To rebuild the database
+ each month (based on the above time estimate) would require between 3
+ and 5 machines to be dedicated full time (independent of machine
+ architecture). Instead, we checkpointed the allocated domain list
+ and rebuilt on an incremental basis during one weekend of the month.
+ This allowed us to complete the update on between 1 and 4 machines (3
+ Sun SPARCstation 20s and a dual-processor Sparcserver 690) without
+ full dedication over a couple of days. Further, by coupling
+ incremental updates with periodic refresh of existing data (which can
+ be done during another part of the month and doesn't require full
+ dedication of machine hardware), older records would be periodically
+ updated when the underlying information changes. The tradeoff is
+ timeliness and accuracy of data (some data in the database may be
+ old) against hardware and processing costs.
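+
+ The checkpointing idea can be sketched as a set comparison between
+ the previous month's allocated domain list and the current one. This
+ is a minimal Python illustration with hypothetical names, not the
+ scripts used in the project; it assumes both lists fit in memory.
+
```python
from typing import Iterable, Set, Tuple


def plan_incremental_update(
    checkpoint: Iterable[str], current: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Compare two allocated-domain lists.

    Returns (added, dropped): domains that need fresh lookups and
    domains that no longer appear in the current list.
    """
    # Domain names are case-insensitive, so normalise before comparing.
    old = {name.lower() for name in checkpoint}
    new = {name.lower() for name in current}
    return new - old, old - new
```
+
+ Only the domains in the first set need new lookups during the monthly
+ update; unchanged domains are left for the periodic refresh pass.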
+
+4. Directory Presentation: Distributed vs Monolithic
+
+ While a distributed directory is a desirable goal, we maintained our
+ database as a monolithic structure. Given past growth, it is not
+ clear at what point migrating to a distributed directory becomes
+ actually necessary to support customer queries. Our last database
+ contained over 3.25 million records in a flat ASCII file. Searching
+ was done by a PERL script over an inverted tree (also produced by a
+ PERL script). While admittedly primitive, this configuration
+ supported over 200,000 database queries per month from our production
+ servers.
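+
+ The general technique of searching through an inverted structure can
+ be illustrated as follows. This Python sketch uses a dictionary-based
+ inverted index from tokens to domain names; it is a stand-in chosen
+ for brevity, not a reconstruction of the PERL scripts or of the
+ on-disk tree format, and the record layout shown is hypothetical.
+
```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the set of domains whose description contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for domain, description in records.items():
        for token in tokenize(description):
            index[token].add(domain)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the domains whose descriptions contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```
+
+ A query such as search(index, "example omaha") intersects the entries
+ for each token, so the cost depends on the size of those entries
+ rather than on the total number of records.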
+
+ Increasing the database size only requires more disk space to hold
+ the database and inverted tree. Of course, using database technology
+ would probably improve performance and scalability, but we had not
+ reached the point where this technology was required.
+
+5. Security Considerations
+
+ The underlying data for the type of directory discussed in this
+ document is already generally available through WHOIS, DNS, and other
+ standard interfaces. No new information is made available by using
+ these techniques, though many types of search become much easier. To
+ the extent that easier access to this data makes it easier to find
+ specific sites or machines to attack, security may be decreased.
+
+ The protocols discussed here do not have built-in security features.
+ If one source machine is spoofed while the directory data is being
+ gathered, substantial amounts of incorrect and misleading data could
+ be pulled in to the directory and be spread to a wider audience.
+
+ In general, building a directory from registry data will not open any
+ new security holes since the data is already available to the public.
+ Existing security and accuracy problems with the data sources are
+ likely to be amplified.
+
+6. Acknowledgments
+
+ The work described in this document was partially supported by the
+ National Science Foundation under Cooperative Agreement NCR-9218179.
+
+7. References
+
+ [1] Schwartz, M. F. and C. Pu, "Applying an Information Gathering
+ Architecture to Netfind: A White Pages Tool for a Changing and
+ Growing Internet", University of Colorado Technical Report
+ CU-CS-656-93, December 1993, revised July 1994.
+
+ URL:ftp://ftp.cs.colorado.edu/pub/cs/techreports/schwartz/Netfind
+
+ [2] Sollins, K., "Plan for Internet Directory Services", RFC 1107,
+ July 1989.
+
+ [3] Hardcastle-Kille, S., Huizer, E., Cerf, V., Hobby, R. and S.
+ Kent, "A Strategic Plan for Deploying an Internet X.500 Directory
+ Service", RFC 1430, February 1993.
+
+ [4] Postel, J. and C. Anderson, "White Pages Meeting Report", RFC
+ 1588, February 1994.
+
+ [5] Lottor, M., "Network Wizards Internet Domain Survey", available
+ from http://www.nw.com/zone/WWW/top.html
+
+8. Authors' Addresses
+
+ Ryan Moats
+ AT&T
+ 15621 Drexel Circle
+ Omaha, NE 68135-2358
+ USA
+
+ EMail: jayhawk@att.com
+
+
+ Rick Huber
+ AT&T
+ Room C3-3B30, 200 Laurel Ave. South
+ Middletown, NJ 07748
+ USA
+
+ EMail: rvh@att.com
+
+9. Full Copyright Statement
+
+ Copyright (C) The Internet Society (1999). All Rights Reserved.
+
+ This document and translations of it may be copied and furnished to
+ others, and derivative works that comment on or otherwise explain it
+ or assist in its implementation may be prepared, copied, published
+ and distributed, in whole or in part, without restriction of any
+ kind, provided that the above copyright notice and this paragraph are
+ included on all such copies and derivative works. However, this
+ document itself may not be modified in any way, such as by removing
+ the copyright notice or references to the Internet Society or other
+ Internet organizations, except as needed for the purpose of
+ developing Internet standards in which case the procedures for
+ copyrights defined in the Internet Standards process must be
+ followed, or as required to translate it into languages other than
+ English.
+
+ The limited permissions granted above are perpetual and will not be
+ revoked by the Internet Society or its successors or assigns.
+
+ This document and the information contained herein is provided on an
+ "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+ TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+ BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+ HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+ MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.