Diffstat (limited to 'doc/rfc/rfc1625.txt')
-rw-r--r-- | doc/rfc/rfc1625.txt | 395 |
1 files changed, 395 insertions, 0 deletions
diff --git a/doc/rfc/rfc1625.txt b/doc/rfc/rfc1625.txt
new file mode 100644
index 0000000..0c70aa6
--- /dev/null
+++ b/doc/rfc/rfc1625.txt
@@ -0,0 +1,395 @@
+
+
+
+
+
+
+Network Working Group                                      M. St. Pierre
+Request for Comments: 1625                                    WAIS, Inc.
+Category: Informational                                       J. Fullton
+                                                                   CNIDR
+                                                               K. Gamiel
+                                                                   CNIDR
+                                                              J. Goldman
+                                                 Thinking Machines Corp.
+                                                                B. Kahle
+                                                              WAIS, Inc.
+                                                                J. Kunze
+                                                             UC Berkeley
+                                                               H. Morris
+                                                              WAIS, Inc.
+                                                        F. Schiettecatte
+                                                           FS Consulting
+                                                               June 1994
+
+
+                         WAIS over Z39.50-1988
+
+Status of this Memo
+
+   This memo provides information for the Internet community. This memo
+   does not specify an Internet standard of any kind. Distribution of
+   this memo is unlimited.
+
+1. Introduction
+
+   The network publishing system, Wide Area Information Servers (WAIS),
+   is designed to help users find information over a computer network.
+   The principles guiding WAIS development are:
+
+      1. A wide-area, network-based information system for searching,
+         browsing, and publishing.
+      2. Based on standards.
+      3. Easy to use.
+      4. Flexible and growth oriented.
+
+   From this basis, a large group of developers, publishers, standards
+   bodies, libraries, government agencies, schools, and users have been
+   helping further the WAIS system.
+
+   The WAIS software architecture has four main components: the client,
+   the server, the database, and the protocol. The WAIS client is a
+   user-interface program that sends requests for information to local
+   or remote servers. Clients are available for most popular desktop
+   environments. The WAIS server is a program that services client
+
+
+
+IIIR Working Group                                              [Page 1]
+
+RFC 1625                 WAIS over Z39.50-1988                 June 1994
+
+
+   requests, and is available on a variety of UNIX platforms. The
+   server generally runs on a machine containing one or more information
+   sources, or WAIS databases.
+   The protocol, Z39.50-1988, is used to connect WAIS clients and
+   servers and is based on the 1988 Version of the NISO Z39.50
+   Information Retrieval Service and Protocol Standard. The goal of the
+   WAIS network publishing system is to create an open architecture of
+   information clients and servers by using a standard
+   computer-to-computer protocol that enables clients to communicate
+   with servers.
+
+   WAIS development began in October 1989, with the first Internet
+   release occurring in April 1991. From the beginning, WAIS committed
+   to use the Z39.50-1988 standard as the information retrieval protocol
+   between WAIS clients and servers. The implementation is still in use
+   today by existing WAIS clients and servers, resulting in over 50,000
+   users of Z39.50-1988 on the Internet.
+
+2. Purpose
+
+   The purpose of this memo is to initiate a discussion of a migration
+   path for the WAIS technology from Z39.50-1988 Information Retrieval
+   Service Definitions and Protocol Specification for Library
+   Applications [1] to Z39.50-1992 [2] and then to Z39.50-1994 [3]. The
+   purpose of this memo is not to provide a detailed implementation
+   specification, but rather to describe the high-level design goals and
+   functional assumptions made in the WAIS implementation of
+   Z39.50-1988. WAIS use of the Z39.50-1992 and Z39.50-1994 standards
+   will be the subject of future RFCs.
+
+3. Historical Design Goals of WAIS
+
+   As an aid to understanding the original WAIS implementation and its
+   use of Z39.50-1988, the historical design goals of WAIS are presented
+   in this section. Included with each goal is a brief description of
+   the assumptions used to meet these design goals.
+
+      1. Provide users access to bibliographic and non-bibliographic
+         information, including full-text and images.
+
+   Because Z39.50-1988 grew out of the bibliographic community,
+   additional assumptions about the protocol were required to serve
+   non-bibliographic information.
+   They were also necessary to serve documents existing in multiple
+   formats (e.g., rtf, postscript, gif).
+
+      2. Keep the client/server interface simple and independent of
+         changes in the functionality of the server.
+
+   To achieve this, the text string entered by the user was transmitted
+   to the server without parsing the string into a Type-1 RPN
+   (reverse-polish notation) query, as is common for bibliographic
+   applications. Instead, WAIS defined a new Type-3 query containing the
+   text string. In this way, neither the client nor the user needed to
+   know which Z39.50 Attributes the server supported, as they must in
+   many existing Z39.50 implementations. In addition, the client
+   software did not require modification to support the evolving
+   functionality of the server.
+
+      3. Provide relevance feedback capability.
+
+   Relevance feedback is the ability to select a document, or portion of
+   a document, and find a set of documents similar to the selection.
+   WAIS included documents used in relevance feedback as part of the
+   Type-3 query.
+
+      4. Permit the server to operate in a stateless manner.
+
+   A WAIS server was designed to be "stateless", meaning that search
+   result sets were not stored by the server. In Z39.50 terms, the
+   server exercised its right to unilaterally delete a result set as
+   soon as it sent the search response. For this reason, the Present
+   Facility of Z39.50 was not used, and retrievals were performed using
+   the Search Facility. Relaxing this constraint in future
+   implementations may prove the most prudent path.
+
+      5. Provide the ability for a client to retrieve documents in
+         pieces.
+
+   Because retrieval of a portion of a document could be done several
+   ways with Z39.50-1988, specific assumptions were made to implement
+   this functionality.
+   Accessing a portion of a document was required for both retrieval
+   and relevance feedback.
+
+      6. Run over TCP.
+
+   The Z39.50-1988 standard was designed to run in the application layer
+   using the presentation services provided by the Open Systems
+   Interconnection (OSI) Reference Model. Due to the popularity of
+   TCP/IP and the Internet, WAIS was designed to run over TCP. Use of
+   Z39.50 over TCP is described in [4].
+
+4. WAIS Implementation of Z39.50-1988
+
+   By working with the Z39.50 Implementors Group (ZIG), the WAIS
+   developers used a recommended subset of Z39.50-1988 and specific
+   assumptions to fulfill the requirements of WAIS. Over time, many of
+   these requirements have gone into the definition of subsequent
+   versions of Z39.50. As new requirements become apparent, WAIS will
+   document any additional assumptions and work with the ZIG in
+   developing extensions.
+
+   WAIS supported the Init and Search Facilities of Z39.50-1988. Both
+   search and retrieval were implemented using the Search Facility, as
+   described in this section.
+
+   Search was initiated by the client with a Search Request APDU
+   (Application Protocol Data Unit) using a Type-3 query. The query
+   contained two main fields:
+
+      1. The "seed words", or text, typed by the user.
+      2. A list of document objects, where a document object is a
+         full document, or portion thereof, to be used in relevance
+         feedback. Each document object contains a document
+         identifier (Doc-ID) [5], type, chunk-code, and start and
+         end locations. The Doc-ID and type specify the location and
+         format, respectively, of the document. The chunk-code
+         determines the unit of measure for the start and end
+         locations. Examples of chunk-codes used include
+         byte, line, paragraph, and full document. If the chunk-code
+         is a full document, the start and end locations are ignored.
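The two fields of the Type-3 query described above can be pictured as a small data structure. The following Python sketch is illustrative only: the class names, field names, and the string spellings of the chunk-codes are assumptions made for this example, and it models the logical content of the query rather than the encoding WAIS put on the wire.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Chunk-code names follow the memo's examples; the string values are
# placeholders chosen for this sketch, not the wire encoding.
CHUNK_CODES = ("byte", "line", "paragraph", "full document")


@dataclass
class DocumentObject:
    """A full document, or a portion of one, used for relevance feedback."""
    doc_id: str                        # location of the document
    doc_type: str                      # format of the document
    chunk_code: str = "full document"  # unit of measure for start/end
    start: int = 0
    end: int = 0

    def span(self) -> Optional[Tuple[int, int]]:
        """Return (start, end), or None when the whole document is meant."""
        if self.chunk_code not in CHUNK_CODES:
            raise ValueError("unknown chunk-code: %r" % self.chunk_code)
        if self.chunk_code == "full document":
            # Start and end locations are ignored for full documents.
            return None
        return (self.start, self.end)


@dataclass
class Type3Query:
    """Logical content of a Type-3 query: seed words plus feedback documents."""
    seed_words: str
    documents: List[DocumentObject] = field(default_factory=list)
```

A query with no document objects is a plain text search; adding document objects turns the same request into a relevance feedback search.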
+
+   A Search Response APDU returned by the server contained a relevance
+   ranked list of records, or WAIS Citations. A WAIS Citation refers to
+   a document on the server. Each WAIS Citation contains the following
+   fields:
+
+      1. Headline - a set of words that convey the main idea of the
+         document.
+      2. Rank - the numerical score of the document based on its
+         relevance to the query, normalized to a top score of 1000.
+      3. List of available formats - e.g. text, postscript, tiff, etc.
+      4. Doc-ID - the location of the document.
+      5. Length - the length of the document in bytes.
+
+   The number of WAIS Citations returned was limited by the preferred
+   message size negotiated during the Init.
+
+   Retrieval of a document was initiated by the client with a Search
+   Request APDU using a Type-1 query. The query contained up to four
+   terms:
+
+      1. Term: Doc-ID
+         Use Attribute: system-control-number         code = "un"
+         Relation Attribute: equal                    code = "re"
+      2. Term: the requested document format
+         Use Attribute: data-type                     code = "wt"
+         Relation Attribute: equal                    code = "re"
+      3. Term: the start location
+         Use Attribute: paragraph, line, byte         code = "wp", "wl",
+                                                             "wb"
+         Relation Attribute: greater-than-or-equal    code = "ro"
+      4. Term: the end location
+         Use Attribute: paragraph, line, byte         code = "wp", "wl",
+                                                             "wb"
+         Relation Attribute: less-than                code = "rl"
+
+   Because full-text and images were often larger in size than the
+   receive buffer of the client, clients were designed to optionally
+   retrieve documents in chunks, specifying the start and end positions
+   of the chunk in the query.
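The four retrieval terms and the chunked retrieval described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the tuple representation of a term, and the choice to treat every term except the Doc-ID as optional are hypothetical, and the sketch does not produce the actual Type-1 encoding. Only the attribute codes ("un", "wt", "wp", "wl", "wb", "re", "ro", "rl") come from the memo.

```python
from typing import Iterator, List, Optional, Tuple

# One term as (use attribute code, relation attribute code, term value).
Term = Tuple[str, str, object]

# Use attribute codes for the start and end locations, from the list above.
UNIT_CODES = {"paragraph": "wp", "line": "wl", "byte": "wb"}


def retrieval_terms(doc_id: str,
                    fmt: Optional[str] = None,
                    unit: str = "byte",
                    start: Optional[int] = None,
                    end: Optional[int] = None) -> List[Term]:
    """Build the terms of a retrieval query, all joined by AND."""
    terms: List[Term] = [("un", "re", doc_id)]      # Doc-ID, equal
    if fmt is not None:
        terms.append(("wt", "re", fmt))             # data-type, equal
    if start is not None:
        terms.append((UNIT_CODES[unit], "ro", start))  # >= start
    if end is not None:
        terms.append((UNIT_CODES[unit], "rl", end))    # < end
    return terms


def chunk_ranges(length: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) ranges that cover `length` units."""
    for start in range(0, length, chunk_size):
        yield start, min(start + chunk_size, length)
```

Because the start relation is greater-than-or-equal and the end relation is less-than, consecutive half-open ranges such as (0, 2000) and (2000, 4000) cover a document without overlap. A client could take the Length field of a WAIS Citation as the `length` argument and issue one retrieval query per range.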
+   An example of a fully-specified retrieval query is:
+
+      query = ( ( use = "un", relation = "re", term = <Doc-ID> )
+                AND
+                ( use = "wt", relation = "re", term = postscript )
+                AND
+                ( use = "wb", relation = "ro", term = 0 )
+                AND
+                ( use = "wb", relation = "rl", term = 2000 )
+              )
+
+   A retrieval response was issued by the server with a Search Response
+   APDU. In this case a single record corresponding to the requested
+   document, or portion thereof, was returned in the specified format.
+
+5. Security Considerations
+
+   Security issues are not discussed in this memo.
+
+6. References
+
+   [1] National Information Standards Organization (NISO). American
+       National Standard Z39.50, Information Retrieval Service
+       Definition and Protocol Specifications for Library Applications,
+       New Brunswick, NJ, Transaction Publishers; 1988.
+
+   [2] ANSI/NISO Z39.50-1992 (version 2) Information Retrieval Service
+       and Protocol: American National Standard, Information Retrieval
+       Application Service Definition and Protocol Specification for
+       Open Systems Interconnection, 1992.
+
+   [3] "Z39.50 Version 3: Draft 8", October 1993. Maintenance Agency
+       Reference: Z39.50MA-034.
+
+   [4] Lynch, C., "Using the Z39.50 Information Retrieval Protocol
+       in the Internet Environment", Work in Progress, November 1993.
+
+   [5] "Document Identifiers, or International Standard Book Numbers
+       for the Electronic Age", Brewster Kahle, Thinking Machines
+       Corporation, see URL=<ftp://wais.com/pub/protocol/doc-ids.txt>,
+       September 1991.
+
+7. Authors' Addresses
+
+   Margaret St. Pierre
+   WAIS Incorporated
+   1040 Noel Drive
+   Menlo Park, California 94025
+
+   Phone: (415) 327-WAIS
+   Fax: (415) 327-6513
+   EMail: saint@wais.com
+
+
+   Jim Fullton
+   Clearinghouse for Networked Information
+   Discovery & Retrieval
+   3021 Cornwallis Road
+   Research Triangle Park, North Carolina 27709-2889
+
+   Phone: (919)-248-9247
+   Fax: (919)-248-1101
+   EMail: jim.fullton@cnidr.org
+
+
+   Kevin Gamiel
+   Clearinghouse for Networked Information
+   Discovery & Retrieval
+   3021 Cornwallis Road
+   Research Triangle Park, North Carolina 27709-2889
+
+   Phone: (919)-248-9247
+   Fax: (919)-248-1101
+   EMail: kevin.gamiel@cnidr.org
+
+
+   Jonathan Goldman
+   Thinking Machines Corporation
+   1010 El Camino Real, Suite 310
+   Menlo Park, California 94025
+
+   Phone: (415) 329-9300 x229
+   Fax: (415) 329-9329
+   EMail: jonathan@think.com
+
+
+   Brewster Kahle
+   WAIS Incorporated
+   1040 Noel Drive
+   Menlo Park, California 94025
+
+   Phone: (415) 327-WAIS
+   Fax: (415) 327-6513
+   EMail: brewster@wais.com
+
+
+   John A. Kunze
+   UC Berkeley
+   289 Evans Hall
+   Berkeley, California 94720
+
+   Phone: (510) 642-1530
+   Fax: (510) 643-5385
+   EMail: jak@violet.berkeley.edu
+
+
+   Harry Morris
+   WAIS Incorporated
+   1040 Noel Drive
+   Menlo Park, California 94025
+
+   Phone: (415) 327-WAIS
+   Fax: (415) 327-6513
+   EMail: morris@wais.com
+
+
+   Francois Schiettecatte
+   FS Consulting
+   435 Highland Avenue
+   Rochester, New York 14620
+
+   Phone: (716) 256-2850
+   EMail: francois@wais.com