summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc219.txt
diff options
context:
space:
mode:
authorThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committerThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
treee3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc219.txt
parentea76e11061bda059ae9f9ad130a9895cc85607db (diff)
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc219.txt')
-rw-r--r--doc/rfc/rfc219.txt395
1 files changed, 395 insertions, 0 deletions
diff --git a/doc/rfc/rfc219.txt b/doc/rfc/rfc219.txt
new file mode 100644
index 0000000..12b4c03
--- /dev/null
+++ b/doc/rfc/rfc219.txt
@@ -0,0 +1,395 @@
+
+
+
+
+
+
+Network Working Group R. Winter
+Request for Comments: 219 CCA
+NIC: 7549 3 September 1971
+Category:
+Updates: None
+Obsoletes: None
+
+ User's View of the Datacomputer
+
+MEMORANDUM
+
+TO: Datacomputer Design File
+
+FROM: R.A. Winter
+
+SUBJECT: User's View of the Datacomputer
+
+Date: September 3, 1971
+
+________________________________________________________________________
+
+Introduction
+
+ The datacomputer is a specialized node of the ARPA network that is
+ dedicated to the management of a large, shared database. By large we
+ mean several trillion bits of data, of which at least one trillion
+ are on-line. Shared may mean, for some files, shared by nearly all
+ the users in the ARPA network.
+
+ The name, datacomputer, derives from the idea that the system is
+ dedicated to data handling. Though the processor is capable of
+ general computation, it will not be used for that purpose. The
+ processor, like the mass storage device, is only a component of an
+ integrated system, which appears to the user as a black box.
+
+ There is one language for addressing the black box: data language.
+ This language defines everything it can do.
+
+ All the information presented in this memorandum is about the first
+ of a series of service offerings. We use the term access method to
+ refer collectively to a structure and the operations on it. Being
+ too modest to call the first one AM-1 (Access Method-1) we named it
+ DCAM-1 (Datacomputer Access Method-d). We expect subsequent DCAMs to
+ generalize DCAM-1. If the need arises, we will design parallel
+ services. All services will use the same data language.
+
+
+
+
+
+
+Winter [Page 1]
+
+RFC 219 User's View of the Datacomputer September 1971
+
+
+System Overview
+
+ The users of the datacomputer are programs running on other
+ computers, retrieving data from, and storing it in, the data base.
+ The environments, capabilities, and applications of these programs
+ vary widely; however, a chief design goal is to allow them to share
+ the data.
+
+ There is further variation among users in physical connection.
+ Remotely-located users' access is over a narrow link to the data-
+ computer's low-speed port. Local users are connected to the high-
+ speed port through a link 80 times wider.
+
+ Through its ports, the datacomputer accepts two kinds of input: data
+ and requests for services. Data is output through the ports as
+ requested.
+
+ In the data base, descriptions are stored separately from the data,
+ and data elements are named, typed and ordered according to them. A
+ measure of structure independence is obtained by writing access
+ requests in terms of the symbolic names of items in the data
+ description.
+
+ Directories are maintained by the system. A hierarchical naming
+ scheme is used, and access controls for privacy and data integrity
+ are provided.
+
+ Redundant copies of data and/or journals of changes are maintained by
+ the system and used to effect recovery under system control in case
+ of system error. These facilities can be operated under user control
+ if there is external error.
+
+ Since the datacomputer's only interface with the outside world is
+ through its ports, it sees the universe as a group of data streams.
+ Specifically, these are record streams, if one views all transactions
+ (in the data transfer protocol sense) as records. Associated with
+ each record stream is a data description, allowing the datacomputer
+ to parse the records into named, typed elements.
+
+ Thus all data elements--stream elements and data base--are named and
+ fully described. Data type conversion proceeds automatically, as a
+ function of old and new data types, and optional information supplied
+ by the user. Reconfiguration above the element level is a matter of
+ arrangement of elements in records; a full set of capabilities is
+ provided for this. In general, the using program is concerned with
+ the configuration of the stream records that comprise its interface
+ with the datacomputer. The internal configuration of data affects
+ the user only as it limits the data's accessibility or malleability.
+
+
+
+Winter [Page 2]
+
+RFC 219 User's View of the Datacomputer September 1971
+
+
+ In fact, the user should not generally have to be aware of the
+ internal data configuration.
+
+ Although support on some level for all types of applications is
+ attempted, the first implementation gives particular attention to
+ large, simply-organized, shared files. Emphasis is placed on
+ allowing the user of such files to describe precisely what data is
+ really of interest to him, so that nothing but the desired
+ information is transmitted. This is crucial for avoiding overload of
+ the narrow link, and is accepted as a central design goal.
+
+Data Base Organization
+
+ The database contains all information stored in the datacomputer. It
+ is a set of files, which are named, physically distinct, collections
+ of data.
+
+ The location of one file, the file directory, is known to the system.
+ It contains the names, locations, and certain attributes of all the
+ other files. Access to this file is restricted.
+
+ Internally, each file has its own organization, but each organization
+ is a particular application of a general model. The particular
+ application is defined by a file description associated with the
+ file.
+
+ In the general model, each file contains uniquely numbered records.
+ Each record contains named fields. A field of a certain name may
+ occur more than once in a given record, and a unique number is
+ associated with each occurrence. A field contains an elementary
+ piece of data, the value of the field.
+
+ The records are variable in format and size. Fields are variable in
+ length.
+
+ In addition to the records themselves; each file can contain an
+ index. The system maintains the index to the specifications of the
+ user. Conceptually, the index contains lists of pointers to records
+ having certain properties. A typical list might point to the records
+ containing the field STATE with the value MASSACHUSETTS.
+
+ The system supplies a unique, permanent, identifier for each record.
+ This identifier maps trivially into a location in the file, or at
+ worst, into a small region in which the record can be quickly
+ located. The identifier is used to pointers to the record, both from
+ the index and from other records.
+
+
+
+
+
+Winter [Page 3]
+
+RFC 219 User's View of the Datacomputer September 1971
+
+
+ Besides the physical ordering, defined by record location, a logical
+ ordering will be maintained on request by the system. This can be
+ based on some simple function of record contents, such as the value
+ of certain fields. Alternatively, the user can compute the function,
+ and simply supply the result (for example, by saying "insert this
+ record after that one"). Retrieval from such ordered files can be
+ made either in physical or logical order.
+
+ In all such ordered files, if insertions are made, space must be
+ reserved for them and garbage collection must be done periodically.
+ A single field value is viewed as a homogeneous string of characters
+ or basic data units. It is described by giving the type (e.g.,
+ ASCII, BIT, binary integer, etc.) and the length is some unit
+ associated with the type. When the length of a field is constant
+ throughout the file, it is stored in the file description; otherwise
+ it appears with each occurrence of the field. The type of a field is
+ constant.
+
+ The information in the file description is sufficient to parse a
+ record into (field name, value) pairs. Also, given such a set of
+ pairs, and a file description, the system can produce a record
+ satisfying the description. Mapping in either direction, there is
+ only one possible result.
+
+ With a record, a file description, and a (field name, value) pair to
+ store in the record, there is also only one new record that can
+ result.
+
+ Thus a file description defines all the possible formats for a record
+ from a particular file.
+
+Stream Organization
+
+ Streams are sequences of records passed from using programs to the
+ datacomputer or vice versa. The format of the records is defined as
+ in the file description. Thus streams have the same organization as
+ files, except they cannot be indexed. The operations defined on
+ streams are more limited than those defined on files, since the
+ records must be accessed in sequence.
+
+ There is no concept of permanent storage for streams. The records
+ move past the datacomputer one at a time, as though they were on a
+ conveyor belt.
+
+
+
+
+
+
+
+
+Winter [Page 4]
+
+RFC 219 User's View of the Datacomputer September 1971
+
+
+ One record, the current record, is available to the datacomputer in
+ each stream. To begin formatting the subsequent record in an output
+ stream, the datacomputer transmits the current record. To access the
+ next record in an input stream, the datacomputer relinquishes access
+ to the current one.
+
+Operations
+
+ When the user is interested in the contents of his whole file in
+ solving the problem at hand, the datacomputer's job is simple in
+ terms of information retrieval. There may be reformatting or
+ reordering, but location of the right data to operate on is trivial.
+ However, this will not be the standard usage of the datacomputer,
+ particularly for the remote user.
+
+ For most problems, the datacomputer expects to subset the file before
+ doing anything else. The larger the file compared to the subset, the
+ less acceptable it is to transact with the full file in order to form
+ the subset. And the datacomputer will have such enormous files that
+ using anything but a very small subset in one problem is most
+ unusual. Thus, subsetting without examining the entire file is a
+ fundamental requirement.
+
+ Normally, the subset will be considered formed when a list of the
+ relevant record id's or record addresses is known.
+
+ The index of the datacomputer file can be thought of as a collection
+ of primitive record id lists that the file designer expected to be
+ useful in forming interesting subsets. The values of all important
+ fields can be indexed. For example, every word in a field containing
+ a string of text might be indexed. In fact, an arbitrary function of
+ the contents of the record, or the relation of the record to other
+ records can be indexed.
+
+ The common logical operators (AND, OR and NOT) are defined for record
+ subsets. Arbitrarily complex expressions of them can be evaluated
+ with relatively little processor time or I/O. The ease of this
+ operation results from careful design of the index and strategies,
+ the most important of which is the parallel evaluation of the Boolean
+ functions on large groups of records. Certain statistical
+ operations--like counting the number of records satisfying a certain
+ Boolean condition--can be done directly on the index. This can be
+ used to derive question-answering strategies heuristically, or can be
+ the direct input to a statistical study.
+
+ Once the index has done all it can in subsetting, attention turns to
+ the records themselves. Certain conditions cannot be evaluated using
+ the index; an obvious case is the selection of records based on the
+
+
+
+Winter [Page 5]
+
+RFC 219 User's View of the Datacomputer September 1971
+
+
+ value of an unindexed field. Also, certain data structures cannot be
+ explicitly represented in the file:record:field model. These must be
+ constructed by the user, out of groups of records linked by pointers,
+ or using other special mechanisms. The class of operations that is
+ useful in further record selection consists of field content testing,
+ pointer chasing, simple computation in the numerical and symbolic
+ senses, and various operations below the data element level, such as
+ pattern matching, string manipulation, etc. Such operations require
+ a control structure approaching that of the general purpose higher
+ level language. It is our intention to make all of this available,
+ though not with the goal of providing a computation facility, but
+ rather, a data management facility that is capable of using as much
+ knowledge as the programmer can supply.
+
+ A simple set of primitives is required for file maintenance in the
+ data structure we are talking about. The operations are:
+
+ 1. add a field/record
+ 2. delete a field/record
+ 3. replace a field/record.
+
+ The difficult part, as in retrieval, is locating the element to be
+ operated on. Notice that individual record formats can be changed
+ at will: the set of possible formats is limited only by the file
+ description.
+
+ When record contents are changed, index entries that are a function
+ of them must be changed also. When the function determining what is
+ to be indexed is part of the file description, the maintenance of the
+ index is automatically performed by the system. Otherwise, this is
+ the responsibility of the user.
+
+ All fields in a record can be optional, variable length, allowed to
+ occur an arbitrary number of times (up to some fixed limit for each
+ field). Fields can be present and later be deleted from any record.
+ Fields can be added to the file description at any time. The only
+ reason for limiting the flexibility of a record format is to reduce
+ storage.
+
+Applications
+
+ The system outlined here is intended to be suitable for many
+ applications; some examples are:
+
+ 1. Storage and retrieval of dumps and other unstructured files. The
+ system will happily pack away your one enormous record, as quickly
+ and painlessly as possible.
+
+
+
+
+Winter [Page 6]
+
+RFC 219 User's View of the Datacomputer September 1971
+
+
+ 2. Applications that would normally be set up on tape: sequentially
+ accessed files that are copied over when they are changed. Most
+ record formats should be able to remain just as they are. If you
+ want to operate this way, the datacomputer imposes no overhead
+ (such as indexing) on you. The datacomputer willingly acts as
+ unsophisticated as a tape drive; it will pass your file, adding
+ and changing records as it copies them. It will pull off the
+ interesting ones, reconfigure if desired, and transmit them to
+ you. When you describe the data, you have solved the data sharing
+ problem for this application.
+
+ 3. Simple-minded direct access applications. The great hairy index
+ structure neatly degenerates to imitate indexed sequential, simple
+ directly-addressed files, and other old standbys in the direct
+ access world.
+
+ 4. Text/document retrieval. The indexing is made for this kind of
+ applications. In addition, documents can point to subdocuments,
+ related documents, etc.
+
+ 5. Content-oriented, rapid retrieval applications are the specialty
+ of the house.
+
+ 6. Large data bases used for statistical analysis or modeling such as
+ the census, the common social science data bases, etc.
+
+ 7. Applications in which data element groups (such as records) are
+ related in a complex fashion, and the intelligence of the
+ datacomputer, which is close to the data and remote from the
+ computational facility, can be put to good use.
+
+ In all of these, an important consideration is size. We hope to
+ handle these applications properly on the datacomputer, even when the
+ files are of extraordinary size.
+
+
+ [ This RFC was put into machine readable form for entry ]
+ [ into the online RFC archives by Sandy Ginoza 9/2001. ]
+ [ Original has hand-written note in Postel's handwriting: ]
+ [ "Received 21 Sept 71". ]
+
+
+
+
+
+
+
+
+
+
+
+Winter [Page 7]
+