diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc219.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc219.txt')
-rw-r--r-- | doc/rfc/rfc219.txt | 395 |
1 files changed, 395 insertions, 0 deletions
diff --git a/doc/rfc/rfc219.txt b/doc/rfc/rfc219.txt new file mode 100644 index 0000000..12b4c03 --- /dev/null +++ b/doc/rfc/rfc219.txt @@ -0,0 +1,395 @@ + + + + + + +Network Working Group R. Winter +Request for Comments: 219 CCA +NIC: 7549 3 September 1971 +Category: +Updates: None +Obsoletes: None + + User's View of the Datacomputer + +MEMORANDUM + +TO: Datacomputer Design File + +FROM: R.A. Winter + +SUBJECT: User's View of the Datacomputer + +Date: September 3, 1971 + +________________________________________________________________________ + +Introduction + + The datacomputer is a specialized node of the ARPA network that is + dedicated to the management of a large, shared database. By large we + mean several trillion bits of data, of which at least one trillion + are on-line. Shared may mean, for some files, shared by nearly all + the users in the ARPA network. + + The name, datacomputer, derives from the idea that the system is + dedicated to data handling. Though the processor is capable of + general computation, it will not be used for that purpose. The + processor, like the mass storage device, is only a component of an + integrated system, which appears to the user as a black box. + + There is one language for addressing the black box: data language. + This language defines everything it can do. + + All the information presented in this memorandum is about the first + of a series of service offerings. We use the term access method to + refer collectively to a structure and the operations on it. Being + too modest to call the first one AM-1 (Access Method-1) we named it + DCAM-1 (Datacomputer Access Method-d). We expect subsequent DCAMs to + generalize DCAM-1. If the need arises, we will design parallel + services. All services will use the same data language. + + + + + + +Winter [Page 1] + +RFC 219 User's View of the Datacomputer September 1971 + + +System Overview + + The users of the datacomputer are programs running on other + computers, retrieving data from, and storing it in, the data base. + The environments, capabilities, and applications of these programs + vary widely; however, a chief design goal is to allow them to share + the data. + + There is further variation among users in physical connection. + Remotely-located users' access is over a narrow link to the data- + computer's low-speed port. Local users are connected to the high- + speed port through a link 80 times wider. + + Through its ports, the datacomputer accepts two kinds of input: data + and requests for services. Data is output through the ports as + requested. + + In the data base, descriptions are stored separately from the data, + and data elements are named, typed and ordered according to them. A + measure of structure independence is obtained by writing access + requests in terms of the symbolic names of items in the data + description. + + Directories are maintained by the system. A hierarchical naming + scheme is used, and access controls for privacy and data integrity + are provided. + + Redundant copies of data and/or journals of changes are maintained by + the system and used to effect recovery under system control in case + of system error. These facilities can be operated under user control + if there is external error. + + Since the datacomputer's only interface with the outside world is + through its ports, it sees the universe as a group of data streams. + Specifically, these are record streams, if one views all transactions + (in the data transfer protocol sense) as records. Associated with + each record stream is a data description, allowing the datacomputer + to parse the records into named, typed elements. + + Thus all data elements--stream elements and data base--are named and + fully described. Data type conversion proceeds automatically, as a + function of old and new data types, and optional information supplied + by the user. Reconfiguration above the element level is a matter of + arrangement of elements in records; a full set of capabilities is + provided for this. In general, the using program is concerned with + the configuration of the stream records that comprise its interface + with the datacomputer. The internal configuration of data affects + the user only as it limits the data's accessibility or malleability. + + + +Winter [Page 2] + +RFC 219 User's View of the Datacomputer September 1971 + + + In fact, the user should not generally have to be aware of the + internal data configuration. + + Although support on some level for all types of applications is + attempted, the first implementation gives particular attention to + large, simply-organized, shared files. Emphasis is placed on + allowing the user of such files to describe precisely what data is + really of interest to him, so that nothing but the desired + information is transmitted. This is crucial for avoiding overload of + the narrow link, and is accepted as a central design goal. + +Data Base Organization + + The database contains all information stored in the datacomputer. It + is a set of files, which are named, physically distinct, collections + of data. + + The location of one file, the file directory, is known to the system. + It contains the names, locations, and certain attributes of all the + other files. Access to this file is restricted. + + Internally, each file has its own organization, but each organization + is a particular application of a general model. The particular + application is defined by a file description associated with the + file. + + In the general model, each file contains uniquely numbered records. + Each record contains named fields. A field of a certain name may + occur more than once in a given record, and a unique number is + associated with each occurrence. A field contains an elementary + piece of data, the value of the field. + + The records are variable in format and size. Fields are variable in + length. + + In addition to the records themselves; each file can contain an + index. The system maintains the index to the specifications of the + user. Conceptually, the index contains lists of pointers to records + having certain properties. A typical list might point to the records + containing the field STATE with the value MASSACHUSETTS. + + The system supplies a unique, permanent, identifier for each record. + This identifier maps trivially into a location in the file, or at + worst, into a small region in which the record can be quickly + located. The identifier is used to pointers to the record, both from + the index and from other records. + + + + + +Winter [Page 3] + +RFC 219 User's View of the Datacomputer September 1971 + + + Besides the physical ordering, defined by record location, a logical + ordering will be maintained on request by the system. This can be + based on some simple function of record contents, such as the value + of certain fields. Alternatively, the user can compute the function, + and simply supply the result (for example, by saying "insert this + record after that one"). Retrieval from such ordered files can be + made either in physical or logical order. + + In all such ordered files, if insertions are made, space must be + reserved for them and garbage collection must be done periodically. + A single field value is viewed as a homogeneous string of characters + or basic data units. It is described by giving the type (e.g., + ASCII, BIT, binary integer, etc.) and the length is some unit + associated with the type. When the length of a field is constant + throughout the file, it is stored in the file description; otherwise + it appears with each occurrence of the field. The type of a field is + constant. + + The information in the file description is sufficient to parse a + record into (field name, value) pairs. Also, given such a set of + pairs, and a file description, the system can produce a record + satisfying the description. Mapping in either direction, there is + only one possible result. + + With a record, a file description, and a (field name, value) pair to + store in the record, there is also only one new record that can + result. + + Thus a file description defines all the possible formats for a record + from a particular file. + +Stream Organization + + Streams are sequences of records passed from using programs to the + datacomputer or vice versa. The format of the records is defined as + in the file description. Thus streams have the same organization as + files, except they cannot be indexed. The operations defined on + streams are more limited than those defined on files, since the + records must be accessed in sequence. + + There is no concept of permanent storage for streams. The records + move past the datacomputer one at a time, as though they were on a + conveyor belt. + + + + + + + + +Winter [Page 4] + +RFC 219 User's View of the Datacomputer September 1971 + + + One record, the current record, is available to the datacomputer in + each stream. To begin formatting the subsequent record in an output + stream, the datacomputer transmits the current record. To access the + next record in an input stream, the datacomputer relinquishes access + to the current one. + +Operations + + When the user is interested in the contents of his whole file in + solving the problem at hand, the datacomputer's job is simple in + terms of information retrieval. There may be reformatting or + reordering, but location of the right data to operate on is trivial. + However, this will not be the standard usage of the datacomputer, + particularly for the remote user. + + For most problems, the datacomputer expects to subset the file before + doing anything else. The larger the file compared to the subset, the + less acceptable it is to transact with the full file in order to form + the subset. And the datacomputer will have such enormous files that + using anything but a very small subset in one problem is most + unusual. Thus, subsetting without examining the entire file is a + fundamental requirement. + + Normally, the subset will be considered formed when a list of the + relevant record id's or record addresses is known. + + The index of the datacomputer file can be thought of as a collection + of primitive record id lists that the file designer expected to be + useful in forming interesting subsets. The values of all important + fields can be indexed. For example, every word in a field containing + a string of text might be indexed. In fact, an arbitrary function of + the contents of the record, or the relation of the record to other + records can be indexed. + + The common logical operators (AND, OR and NOT) are defined for record + subsets. Arbitrarily complex expressions of them can be evaluated + with relatively little processor time or I/O. The ease of this + operation results from careful design of the index and strategies, + the most important of which is the parallel evaluation of the Boolean + functions on large groups of records. Certain statistical + operations--like counting the number of records satisfying a certain + Boolean condition--can be done directly on the index. This can be + used to derive question-answering strategies heuristically, or can be + the direct input to a statistical study. + + Once the index has done all it can in subsetting, attention turns to + the records themselves. Certain conditions cannot be evaluated using + the index; an obvious case is the selection of records based on the + + + +Winter [Page 5] + +RFC 219 User's View of the Datacomputer September 1971 + + + value of an unindexed field. Also, certain data structures cannot be + explicitly represented in the file:record:field model. These must be + constructed by the user, out of groups of records linked by pointers, + or using other special mechanisms. The class of operations that is + useful in further record selection consists of field content testing, + pointer chasing, simple computation in the numerical and symbolic + senses, and various operations below the data element level, such as + pattern matching, string manipulation, etc. Such operations require + a control structure approaching that of the general purpose higher + level language. It is our intention to make all of this available, + though not with the goal of providing a computation facility, but + rather, a data management facility that is capable of using as much + knowledge as the programmer can supply. + + A simple set of primitives is required for file maintenance in the + data structure we are talking about. The operations are: + + 1. add a field/record + 2. delete a field/record + 3. replace a field/record. + + The difficult part, as in retrieval, is locating the element to be + operated on. Notice that individual record formats can be changed + at will: the set of possible formats is limited only by the file + description. + + When record contents are changed, index entries that are a function + of them must be changed also. When the function determining what is + to be indexed is part of the file description, the maintenance of the + index is automatically performed by the system. Otherwise, this is + the responsibility of the user. + + All fields in a record can be optional, variable length, allowed to + occur an arbitrary number of times (up to some fixed limit for each + field). Fields can be present and later be deleted from any record. + Fields can be added to the file description at any time. The only + reason for limiting the flexibility of a record format is to reduce + storage. + +Applications + + The system outlined here is intended to be suitable for many + applications; some examples are: + + 1. Storage and retrieval of dumps and other unstructured files. The + system will happily pack away your one enormous record, as quickly + and painlessly as possible. + + + + +Winter [Page 6] + +RFC 219 User's View of the Datacomputer September 1971 + + + 2. Applications that would normally be set up on tape: sequentially + accessed files that are copied over when they are changed. Most + record formats should be able to remain just as they are. If you + want to operate this way, the datacomputer imposes no overhead + (such as indexing) on you. The datacomputer willingly acts as + unsophisticated as a tape drive; it will pass your file, adding + and changing records as it copies them. It will pull off the + interesting ones, reconfigure if desired, and transmit them to + you. When you describe the data, you have solved the data sharing + problem for this application. + + 3. Simple-minded direct access applications. The great hairy index + structure neatly degenerates to imitate indexed sequential, simple + directly-addressed files, and other old standbys in the direct + access world. + + 4. Text/document retrieval. The indexing is made for this kind of + applications. In addition, documents can point to subdocuments, + related documents, etc. + + 5. Content-oriented, rapid retrieval applications are the specialty + of the house. + + 6. Large data bases used for statistical analysis or modeling such as + the census, the common social science data bases, etc. + + 7. Applications in which data element groups (such as records) are + related in a complex fashion, and the intelligence of the + datacomputer, which is close to the data and remote from the + computational facility, can be put to good use. + + In all of these, an important consideration is size. We hope to + handle these applications properly on the datacomputer, even when the + files are of extraordinary size. + + + [ This RFC was put into machine readable form for entry ] + [ into the online RFC archives by Sandy Ginoza 9/2001. ] + [ Original has hand-written note in Postel's handwriting: ] + [ "Received 21 Sept 71". ] + + + + + + + + + + + +Winter [Page 7] + |