Diffstat (limited to 'doc/rfc/rfc2517.txt')
-rw-r--r-- | doc/rfc/rfc2517.txt | 395 |
1 files changed, 395 insertions, 0 deletions
diff --git a/doc/rfc/rfc2517.txt b/doc/rfc/rfc2517.txt
new file mode 100644
index 0000000..4ce72ff
--- /dev/null
+++ b/doc/rfc/rfc2517.txt
@@ -0,0 +1,395 @@

Network Working Group                                          R. Moats
Request for Comments: 2517                                     R. Huber
Category: Informational                                            AT&T
                                                           February 1999

        Building Directories from DNS: Experiences from WWWSeeker

Status of this Memo

This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1999). All Rights Reserved.

Abstract

There has been much discussion and several documents written about the need for an Internet Directory. Recently, this discussion has focused on ways to discover an organization's domain name without relying on use of DNS as a directory service. This memo discusses lessons that were learned during InterNIC Directory and Database Services' development and operation of WWWSeeker, an application that finds a web site given information about the name and location of an organization. The back end database that drives this application was built from information obtained from domain registries via WHOIS and other protocols. We present this information to help future implementors avoid some of the blind alleys that we have already explored. This work builds on the Netfind system that was created by Mike Schwartz and his team at the University of Colorado at Boulder [1].

1. Introduction

Over time, there have been several RFCs [2, 3, 4] about approaches for providing Internet Directories. Many of the earlier documents discussed white pages directories that supply mappings from a person's name to their telephone number, email address, etc.

More recently, there has been discussion of directories that map from a company name to a domain name or web site. Many people are using DNS as a directory today to find this type of information about a given company. Typically when DNS is used, users guess the domain name of the company they are looking for and then prepend "www.". This makes it highly desirable for a company to have an easily guessable name.

There are two major problems here. As the number of assigned names increases, it becomes more difficult to get an easily guessable name. Also, the TLD must be guessed as well as the name. While many users just guess ".COM" as the "default" TLD today, there are many two-letter country code top-level domains in current use as well as other gTLDs (.NET, .ORG, and possibly .EDU), with the prospect of additional gTLDs in the future. As the number of TLDs in general use increases, guessing gets more difficult.

Between July 1996 and our shutdown in March 1998, the InterNIC Directory and Database Services project maintained the Netfind search engine [1] and the associated database that maps organization information to domain names. This database thus acted as the type of Internet directory that associates company names with domain names. We also built WWWSeeker, a system that used the Netfind database to find web sites associated with a given organization. The experience gained from maintaining and growing this database provides valuable insight into the issues of providing a directory service. We present it here to allow future implementors to avoid some of the blind alleys that we have already explored.

2. Directory Population

2.1 What to do?

There are two issues in populating a directory: finding all the domain names (building the skeleton) and associating those domains with entities (adding the meat). These two issues are discussed below.

2.2 Building the skeleton

In "building the skeleton", it is popular to suggest using a variant of a "tree walk" to determine the domains that need to be added to the directory. Our experience is that this is neither a reasonable nor an efficient approach for maintaining such a directory. Except for some infrequent and long-standing DNS surveys [5], DNS "tree walks" tend to be discouraged by the Internet community, especially given that the frequency of DNS changes would require a new tree walk monthly (if not more often). Instead, our experience has shown that data on allocated DNS domains can usually be retrieved in bulk fashion with FTP, HTTP, or Gopher (we have used each of these for particular TLDs). This has the added advantage of both "building the skeleton" and "adding the meat" at the same time. Our favorite method for finding a server that has allocated DNS domain information is to start with the list maintained at http://www.alldomains.com/countryindex.html and go from there. Before this was available, it was necessary to hunt for a registry using trial and error.

When maintaining the database, existing domains may be verified via direct DNS lookups rather than a "tree walk", as in the sketch below. "Tree walks" should therefore be the choice of last resort for directory population, and bulk retrieval should be used whenever possible.
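As a rough illustration of that verification step, the following is a minimal sketch that checks whether already-known domains still have an NS delegation. It is not the project's own tooling: it assumes the third-party dnspython package, and the domain list in the example is made up.

    # Sketch: re-verify known domains via direct DNS lookups instead of a
    # "tree walk".  Assumes the dnspython package (dns.resolver).
    import dns.resolver
    import dns.exception

    def domain_still_exists(domain: str) -> bool:
        """Return True if `domain` still has an NS delegation in the DNS."""
        try:
            dns.resolver.resolve(domain, "NS")
            return True
        except dns.resolver.NXDOMAIN:
            return False          # name is gone; drop or flag the record
        except (dns.resolver.NoAnswer, dns.resolver.NoNameservers):
            return True           # name exists but the answer was empty or broken
        except dns.exception.Timeout:
            return True           # inconclusive; keep the record and retry later

    if __name__ == "__main__":
        # Hypothetical spot check; a real run would read the existing domain list.
        for name in ["example.com", "no-such-domain-xyzzy.invalid"]:
            print(name, domain_still_exists(name))

Checking the NS delegation rather than an address record avoids falsely dropping registered domains that have no host at the apex.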
2.3 Adding the meat

A possibility for populating a directory ("adding the meat") is to use an automated system that makes repeated queries using the WHOIS protocol to gather information about the organization that owns a domain. The queries would be made against a WHOIS server located with the above method. At the conclusion of the InterNIC Directory and Database Services project, our backend database contained about 2.9 million records built from data that could be retrieved via WHOIS. The entire database contained 3.25 million records, with the additional records coming from sources other than WHOIS.

In our experience this information contains many factual and typographical errors and requires further examination and processing to improve its quality. Further, TLD registrars that support WHOIS typically only support WHOIS information for second-level domains (e.g. ne.us) as opposed to lower-level domains (e.g. windrose.omaha.ne.us). Also, there are TLDs without registrars, TLDs without WHOIS support, and still other TLDs that use other methods (HTTP, FTP, Gopher) for providing organizational information. Based on our experience, an implementor of an Internet directory needs to support multiple protocols for directory population. An automated WHOIS search tool, such as the one sketched below, is necessary, but isn't enough.
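A WHOIS client of the kind implied here is small: open a TCP connection to port 43, send the query followed by CRLF, and read the free-form reply until the server closes the connection. The sketch below is illustrative only; the server name and example domain are assumptions, not the project's actual configuration.

    # Sketch of an automated WHOIS lookup (TCP port 43, query + CRLF, read reply).
    import socket

    def whois_query(domain: str, server: str, timeout: float = 10.0) -> str:
        """Return the raw WHOIS response for `domain` from `server`."""
        with socket.create_connection((server, 43), timeout=timeout) as sock:
            sock.sendall((domain + "\r\n").encode("ascii"))
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:      # server closes the connection after replying
                    break
                chunks.append(data)
        return b"".join(chunks).decode("latin-1", errors="replace")

    if __name__ == "__main__":
        # Server name and domain are illustrative; each registry runs its own server.
        print(whois_query("example.com", "whois.verisign-grs.com")[:400])

Because the reply format is free-form and differs per registry, the parsing and error-correction work described above dominates; the lookup itself is the easy part.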
3. Directory Updating: Full Rebuilds vs Incremental Updates

Given the size of our database in April 1998 when it was last generated, a complete rebuild of the database that is available from WHOIS lookups would require between 134.2 and 167.8 days just for WHOIS lookups from a Sun SPARCstation 20 (roughly 4 to 5 seconds for each of the 2.9 million WHOIS-derived records). This estimate does not include other considerations (for example, inverting the token tree required about 24 hours of processing time on a Sun SPARCstation 20) that would increase the amount of time needed to rebuild the entire database.

Whether this is feasible depends on the frequency of database updates provided. Because of the rate of growth of allocated domain names (150K-200K newly allocated domains per month in early 1998), we provided monthly updates of the database. To rebuild the database each month (based on the above time estimate) would require between 3 and 5 machines to be dedicated full time (independent of machine architecture). Instead, we checkpointed the allocated domain list and rebuilt on an incremental basis during one weekend of the month. This allowed us to complete the update on between 1 and 4 machines (3 Sun SPARCstation 20s and a dual-processor SPARCserver 690), without full dedication, over a couple of days. Further, by coupling incremental updates with a periodic refresh of existing data (which can be done during another part of the month and doesn't require full dedication of machine hardware), older records would be periodically updated when the underlying information changed. The tradeoff is timeliness and accuracy of data (some data in the database may be old) against hardware and processing costs.

4. Directory Presentation: Distributed vs Monolithic

While a distributed directory is a desirable goal, we maintained our database as a monolithic structure. Given past growth, it is not clear at what point migrating to a distributed directory becomes actually necessary to support customer queries. Our last database contained over 3.25 million records in a flat ASCII file. Searching was done via a PERL script that consulted an inverted tree (itself also produced by a PERL script). While admittedly primitive, this configuration supported over 200,000 database queries per month from our production servers.

Increasing the database size only requires more disk space to hold the database and the inverted tree. Of course, using database technology would probably improve performance and scalability, but we had not reached the point where this technology was required.
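For concreteness, here is a small sketch of the flat-file-plus-inverted-index style of lookup described in section 4. The original system used PERL scripts over the Netfind data; this Python version, and the assumed one-record-per-line "domain|organization" layout and file name, are illustrative only.

    # Sketch: flat ASCII record file plus an inverted index from organization-name
    # tokens to byte offsets.  The "domain|organization" layout is an assumption.
    import re
    from collections import defaultdict

    def build_index(path):
        """Map each lower-cased token of the organization field to byte offsets."""
        index = defaultdict(set)
        with open(path, "rb") as f:
            while True:
                offset = f.tell()
                raw = f.readline()
                if not raw:
                    break
                line = raw.decode("ascii", errors="replace").rstrip("\r\n")
                _, _, org = line.partition("|")
                for token in re.findall(r"[a-z0-9]+", org.lower()):
                    index[token].add(offset)
        return index

    def search(path, index, query):
        """Return records whose organization field contains every query token."""
        tokens = re.findall(r"[a-z0-9]+", query.lower())
        if not tokens:
            return []
        hits = set.intersection(*(index.get(t, set()) for t in tokens))
        results = []
        with open(path, "rb") as f:
            for offset in sorted(hits):
                f.seek(offset)
                results.append(f.readline().decode("ascii", errors="replace").rstrip("\r\n"))
        return results

    if __name__ == "__main__":
        # Hypothetical flat file of "domain|organization" records.
        idx = build_index("domains.txt")
        for record in search("domains.txt", idx, "University of Colorado"):
            print(record)

Building the index once and intersecting per-token posting sets is what keeps a linear flat file usable for queries; as the text notes, growth then costs only disk space until real database technology becomes worthwhile.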
5. Security Considerations

The underlying data for the type of directory discussed in this document is already generally available through WHOIS, DNS, and other standard interfaces. No new information is made available by using these techniques, though many types of search become much easier. To the extent that easier access to this data makes it easier to find specific sites or machines to attack, security may be decreased.

The protocols discussed here do not have built-in security features. If one source machine is spoofed while the directory data is being gathered, substantial amounts of incorrect and misleading data could be pulled into the directory and spread to a wider audience.

In general, building a directory from registry data will not open any new security holes, since the data is already available to the public. Existing security and accuracy problems with the data sources are likely to be amplified.

6. Acknowledgments

The work described in this document was partially supported by the National Science Foundation under Cooperative Agreement NCR-9218179.

7. References

[1] Schwartz, M. F. and C. Pu, "Applying an Information Gathering Architecture to Netfind: A White Pages Tool for a Changing and Growing Internet", University of Colorado Technical Report CU-CS-656-93, December 1993, revised July 1994. URL: ftp://ftp.cs.colorado.edu/pub/cs/techreports/schwartz/Netfind

[2] Sollins, K., "Plan for Internet Directory Services", RFC 1107, July 1989.

[3] Hardcastle-Kille, S., Huizer, E., Cerf, V., Hobby, R. and S. Kent, "A Strategic Plan for Deploying an Internet X.500 Directory Service", RFC 1430, February 1993.

[4] Postel, J. and C. Anderson, "White Pages Meeting Report", RFC 1588, February 1994.

[5] Lottor, M., "Network Wizards Internet Domain Survey", available from http://www.nw.com/zone/WWW/top.html

8. Authors' Addresses

Ryan Moats
AT&T
15621 Drexel Circle
Omaha, NE 68135-2358
USA

EMail: jayhawk@att.com

Rick Huber
AT&T
Room C3-3B30, 200 Laurel Ave. South
Middletown, NJ 07748
USA

EMail: rvh@att.com

9. Full Copyright Statement

Copyright (C) The Internet Society (1999). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.