From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001 From: Thomas Voss Date: Wed, 27 Nov 2024 20:54:24 +0100 Subject: doc: Add RFC documents --- doc/rfc/rfc677.txt | 563 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 563 insertions(+) create mode 100644 doc/rfc/rfc677.txt (limited to 'doc/rfc/rfc677.txt') diff --git a/doc/rfc/rfc677.txt b/doc/rfc/rfc677.txt new file mode 100644 index 0000000..7ef7edd --- /dev/null +++ b/doc/rfc/rfc677.txt @@ -0,0 +1,563 @@ + + + + + + +Network Working Group Paul R. Johnson (BBN-TENEX) +RFC # 677 Robert H. Thomas (BBN-TENEX) +NIC # 31507 January 27, 1975 + + + + + + The Maintenance of Duplicate Databases + + +Preface: + +This RFC is a working paper on the problem of maintaining duplicated +databases in an ARPA-like network. It briefly discusses the general +duplicate database problem, and then outlines in some detail a solution +for a particular type of duplicate database. The concepts developed +here were used in the design of the User Identification Database for the +TIP user authentication and accounting system. We believe that these +concepts are generally applicable to distributed database problems. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Johnson & Thomas [Page 1] + +RFC 677 The Maintenance of Duplicate Databases January 1975 + + +Introduction + + There are a number of motivations for maintaining redundant, +duplicate copies of databases in a distributed network environment. Two +important motivations are: + + - to increase reliability of data access. + + The accessibility of critical data can be increased by redundantly + maintaining it. The database used for TIP login and accounting is + redundantly distributed to achieve highly reliable access. + + - to insure efficiency of data access. + + Data can be more quickly and efficiently accessed when it is "near" + the accessing process. A copy of the TIP user ID database is + maintained at each site supporting the TIP login service to insure + rapid, efficient access. (Reliability considerations dictate that + this database be redundantly maintained, and efficiency + considerations dictate that a copy be maintained at each + authentication site.) + + The design of a system to maintain redundant, duplicate databases is +a challenging task because of the inherent communication delay between +copies of the database, as well as the real world constraints of system +crashes, operator error, communication channel failure, etc. This paper +discusses some of the problems we encountered in designing such a +system, and outlines a system design for maintaining a particular type +of database which solves those problems. + + +The Model + + A system for supporting duplicate copies of a database can be +modeled by a group of independent database management processes (DBMPs) +each maintaining its own copy of the database. These processes +communicate with each other over network communication paths. Each DBMP +has complete control over its copy of the database. It handles all +accesses to and modifications of the database in response to requests +from other processes. Though the DBMPs act only upon requests, in the +following they will often be said to be actually causing or originating +the modifications. + + An important design consideration is that the communication paths +between the DBMPs are subject to failures. Thus one DBMP may have its +interactions with other DBMPs interrupted and/or have to wait until +communication paths are re-established before it can communicate with +other DBMPs. An assumption made in this paper about the communication + + + +Johnson & Thomas [Page 2] + +RFC 677 The Maintenance of Duplicate Databases January 1975 + + +paths is that messages from one process to another are delivered in the +same order that they are sent. This is true of the ARPANET. For networks +that make no such guarantee, communication protocols between the DBMPs +can be used to correctly order the messages. + + In order to proceed further, it is necessary to be more precise +about the nature of the duplicated database and the operations allowed +on it. A constant, read-only database is at one extreme. The task of the +DBMPs is trivial in this case. They simply respond to value retrieval +requests. At the other extreme is a general shared database where +functional modification requests (such as "X := f(X,Y,Z)") are allowed +and/or where it is necessary to completely restrict access to entries +while they are being modified. In this case all the problems of shared +databases on a single computer system arise (e.g., the need for +synchronization mechanisms and the resulting potential deadlock +situations), as well as those unique to having multiple database copies +distributed among independent computers. For example, a completely +general system must deal with the possibility of communication failures +which cause the network to become partitioned into two or more sub- +networks. Any solution which relies on locking an element of the +database for synchronized modification must cope with the possibility of +processes in non-communicating sub-networks attempting to lock the same +element. Either they both must be allowed to do so (which violates the +lock discipline), or they both must wait till the partition ceases +(which may take arbitrarily long), or some form of centralized or +hierarchical control must be used, with a resulting dependency of some +DBMPs on others for all modifications and perhaps accesses as well. + + +The Database + + The type of database to be examined in this paper can be represented +as a collection of entries which are (Selector,Value) pairs. Each +selector is unique and the values are atomic entities as far as the +DBMPs are concerned. The mechanisms to be presented may be extended to +handle databases with greater structure - where the values may +themselves be collections of (selector,value) pairs - but this extension +will not be considered further here. + +Four operations are to be allowed on the database: + + 1) Selection - given a selector, the current associated value is + returned. + + 2) Assignment - a selector and a value are given and the given value + replaces the old value associated with the selector. + + + + + +Johnson & Thomas [Page 3] + +RFC 677 The Maintenance of Duplicate Databases January 1975 + + + 3) Creation - a new selector and an initial value are given and a + new (selector,initial value) entry is added to the database. + + 4) Deletion- a selector is given and the existing (selector,value) + entry is removed from the database. + +Note that value modification is limited to assignment. Functional +modification requests - such as "Change X to be Factorial(X)" - are +specifically ruled out. Allowing them would force the use of system wide +synchronization interlocks. + + +Consistency + + The extent to which the copies of the database can be kept +"identical" must be examined. Because of the inherent delay in +communications between DBMPs, it is impossible to guarantee that the +data bases are identical at all times. Rather, our goal is to guarantee +that the copies are "consistent" with each other. By this we mean that +given a cessation of update activity to any entry, and enough time for +each DBMP to communicate with all other DBMPs, then the state of that +entry (its existence and value) will be identical in all copies of the +database. + + +Timestamps + + We permit modifications to the database to originate at any of the +DBMPs maintaining it. These changes must, of course, be communicated to +the other DBMPs. To insure consistency, all of the DBMPs must make the +same decision as to which modification to a particular entry is to be +considered "final". It is desirable to select the "most recent" change. +However, since there is no way to absolutely determine the time sequence +of events in a distributed system without a universal, always accessible +sequence number generator (a network time standard should suffice), +"most recent" can only be approximated. We accomplish the approximation +by associating a timestamp with each modification and with each entry, +the latter being the timestamp of the modification which set its current +value.(1) Since the uniqueness of timestamps given out at more than one +---------- +(1) Time is useful in this context because it has the desired properties +of being monotonically increasing, and of being available with a +reasonable degree of accuracy. Any other sequence numbering scheme with +these properties can be used, "time-of-day" was chosen because it is +simple to obtain. Its main faults are that it is often manually set (and +thus prone to error), and it may stop during system service + + + + + +Johnson & Thomas [Page 4] + +RFC 677 The Maintenance of Duplicate Databases January 1975 + + +interruptions. As computer systems learn to adapt to a network +environment, the use of an independent time source should become more +common. This is beginning to happen with the TENEX sites on the ARPANET. + +DBMP can not be guaranteed, a "DBMP of origin" is included as part of +each timestamp. By (arbitrarily) ordering the DBMPs, we thus have a +means of uniquely ordering timestamps. Each + timestamp is a pair (T,D): T is a time, D is a DBMP identifier. For two +timestamps (T1,D1) and (T2,D2) we have the following: + + (T1,D1)>(T2,D2) <=> (T1>T2) or (T1=T2 and D1>D2) + (T1,D1) is said to be "more recent" than (T2,D2) + + If D1=D2 and T1=T2 it is assumed that the two modifications are + really two copies of the same modification request. + + In order to insure the uniqueness of timestamps, it is necessary +that each individual DBMP associate different times with different +modifications. This is certainly possible to do, though the fineness of +the unit of time may restrict the frequency of modifications at a single +DBMP site. + + Each entry in the data base is now a triple: + E ::= (S,V,T), where + S is the selector + V is the associated value + T is the timestamp (a Time,DBMP pair) of the last change to the + entry + + The task of each DBMP is to keep its copy of the database up-to- +date, given the information on modifications that it has received so +far. At the same time it must insure that each of its entries stays +consistent with those of all the other DBMPs. This must be done despite +the fact that the order in which it receives modifications may be very +different from the order in which they are received by other DBMPs. In +the remainder of this paper we examine the allowable database operations +with respect to their effect on DBMP operation. + + +Assignment + + Consider the case of assignment to an existing entry. When the +assignment is initiated (by a person or process) the DBMP involved makes +the change locally, and creates a copy of the modified entry and an +associated list of DBMPs to which the change must be sent. As the +modification is delivered to the other DBMPs, they are removed from the +list until no DBMPs remain. The copy of the modification is then +deleted. This distribution mechanism must be error free (i.e., receipt + + + +Johnson & Thomas [Page 5] + +RFC 677 The Maintenance of Duplicate Databases January 1975 + + +of a modification must be positively confirmed before removing a DBMP +from the list of recipients).(2) + + When a DBMP receives an assignment modification (either locally or +from another DBMP) it compares the timestamp of the modification with +the timestamp of the copy of the entry in its database and keeps +whichever is more "recent" as defined by the ordering given above. Thus +when all existing assignments to a given entry have been distributed to +all the DBMPs, they are guaranteed have the same "latest" value +associated with that entry. + + +Creation + + Creation and deletion of entries pose more of a problem. Note that +the ability to create new, previously unknown entries requires that a +DBMP be able to handle assignments to unknown entries. For example, +consider the case of an entry XYZ created by DBMP A, and the following +sequence of events: DBMP A tells DBMP B about the new entry, and +subsequently B assigns a new value to XYZ; DBMP B then tells DBMP C +about the assignment before C has heard from A about the creation. DBMP +C must either save the assignment to XYZ until it hears about the +creation, or simply assume the creation will be coming and use the "new" +entry right away. The latter is more in the spirit of trying to keep the +database as "up-to-date" as possible and leads to no inconsistencies. + + +Deletion + + Deletion of entries is the main problem for this database system. +If deletion is taken to mean immediate removal from the database, then +problems arise. Consider the following scenario: + XYZ is an entry known by all DBMPs. + XYZ is deleted at DBMP A. + XYZ is modified at DBMP B (before B is notified of the deletion + by A). +Now, consider a third DBMP, C. C may hear from DBMP B before DBMP A, in +which case XYZ ends up deleted at DBMP C. C may however hear from DBMP A + +---------- +(2) This same process (local modification and queuing for remote +distribution) is, of course, performed for the other possible operations +on the database. The details of how the local modification is done +safely, how the messages are queued, how confirmation of delivery is +done, etc., though important, will not be discussed here. The use of an +addressee list attached to the modification to be delivered is +conceptually easy to work with and not difficult to implement in +practice. + + + +Johnson & Thomas [Page 6] + +RFC 677 The Maintenance of Duplicate Databases January 1975 + + +before DBMP B. In this case, if C removes XYZ from its database, then +the assignment to XYZ initiated by DBMP B will result in the re-creation +of XYZ at DBMP C. To prevent this C must remember that XYZ has been +deleted until it is "safe" to completely forget about XYZ. + + One approach to this problem is to never actually remove an entry +from the database. Deletion just marks the entry as being deleted by +setting a "deleted" flag associated each entry. However, the problem of +receiving assignments to deleted entries still exists. For example, DBMP +A may receive an assignment from DBMP B to an entry which A has marked +as deleted. DBMP A can not tell whether B has not heard about the +deletion, or has heard about it but has also received a re-creation +request for the entry, which hasn't reached DBMP A. To enable A to +resolve such situations we include another timestamp in all entries: the +timestamp of the entry's creation. Thus in the above example, DBMP A can +compare the creation timestamps of the assignment and the deleted entry. +The one with the later creation timestamp is kept. Indeed whenever a +modification with an old creation timestamp is received it can be +ignored. + + We now have a 5-tuple for entries and modifications: + E ::= (S,V,F,CT,T) + S is the selector + V is the associated value + F is the deleted/not-deleted flag + CT is the timestamp of creation + T is the timestamp of this (last) modification + + Note that the values of the F, CT, and T components of a +modification uniquely specify the type of modification. Thus only the +5-tuple to become the new entry for a selector, not the type of +modification, need be communicated: + F = not deleted, CT = T => creation + F = not deleted, CT < T => assignment + F = deleted => deletion + + The mechanism described above handles all the desired operations on +the distributed database in a way that guarantees the consistency of all +copies. A modification to the database will take effect at each DBMP as +soon as it receives the request from the DBMP originating the change. + + A deficiency with this scheme is that deleted entries are never +removed from the database. A method which permits "garbage collection" +of deleted entries is discussed below. + + + + + + + +Johnson & Thomas [Page 7] + +RFC 677 The Maintenance of Duplicate Databases January 1975 + + +Removal of Deleted Entries + + The basic constraint is that a DBMP should not remove a deleted +entry until it will never receive any assignments with the same selector +(S) and the same or older create time (CT). If it fails to do this, then +it will be unable to distinguish these "out of date" assignments from +assignments to a newly created entry for the same S. To be able do +this, each DBMP must know for each deleted entry whether all other DBMPs +have heard about the deletion. To accomplish this, each DBMP could +notify the other DBMPs whenever it hears about a deletion. If these +notifications are transmitted in order with the "normal" sequence of +modifications, then upon receipt of such a notification a DBMP can be +sure that the sending DBMP has delivered any outstanding assignments to +the deleted entry, has marked it as deleted, and will not generate any +new assignments to it. Keeping track of, and exchanging messages about, +each individual deleted entry in this manner is, however, somewhat more +elaborate than necessary. + + Having each DBMP deliver all its own modifications in sequential +order (by timestamp) allows the following simplification. We have all +DBMPs maintain a table of the timestamps of the last modification +received from each other DBMP. Any DBMP, say A, is then guaranteed to +have received all modifications originating at another DBMP, say B. +which have timestamps earlier than (or equal to) the entry for B in A s +copy of this table. If this table is exchanged between DBMPs, then all +DBMPs would have a second N*N (N= number of DBMPs) table where entry +(I,J) would be the timestamp of the last modification received by DBMP I +from DBMP J. Thus DBMP A can remove a deleted entry whose deletion +originated at DBMP K when all entries in the Kth column of this table at +DBMP A are later than or equal to the timestamp of the deleted entry. +When a DBMP receives a deletion modification, in addition to acting on +it and acknowledging receipt of it, the DBMP should also send its table +of last timestamps received to all other DBMPs. This is sent in a +timestamped message which is queued with the "normal" modification +messages. + + A refinement to this approach, which reduces the amount of +information exchanged and the size of the tables, is to have the DBMPs +exchange only the oldest of the entries in the first table (of +timestamps of last modifications received from other DBMPs). These would +then be saved in a 1*N table, replacing the N*N table. A DBMP receiving +a modification with a timestamp equal to or older than the oldest +timestamp in its second table knows that the modification has been +confirmed as being received by all other DBMPs. A deleted entry can thus +be removed when its timestamp satisfies this condition. A DBMP would, +upon receipt of a deletion modification, queue up a message with this +"timestamp of oldest last modification received" for delivery to all +other DBMPs. + + + +Johnson & Thomas [Page 8] + +RFC 677 The Maintenance of Duplicate Databases January 1975 + + +Summary of solution: + + An entry in the database is a 5-tuple: + (S,V,F,CT,T) where + S is an selector used in all references to this entry. + V is its present value. + F is a deleted/undeleted flag. + CT is the timestamp of the creation of this entry. + T is the timestamp of the modification which set the current V + and/or F of the entry. + A timestamp is a pair (time,DBMP) where the DBMP identifies the + site at which the time was generated, and the DBMPs are + (arbitrarily) ordered, so that timestamps are completely + ordered. + + A modification is a pair (Set-of-DBMPs,Entry) where Set-of-DBMPs is + the set of DBMPs to which the Entry has yet to be delivered. + + An ordered (by timestamp) list of modifications is kept at each + DBMP. The DBMP periodically attempts to deliver modification + requests to those DBMPs which remain in the Set-of-DBMPs associated + with each modification. Modification entries are removed from this + list when they have been delivered to all DBMPs. + + When a DBMP receives a modification request from another DBMP, it + compares the timestamps of the request with the timestamps of the + corresponding entry (if any) in its database, and acts upon or + disregards the new entry accordingly. + + Each DBMP keeps a vector of the timestamp (T) of the last + modification received by it from each other DBMP. + + When a DBMP receives a deletion modification, it sends a timestamped + message to all other DBMPs containing the oldest timestamp in its + vector of timestamps of last modification received. Each DBMP keeps + a second vector of the last of these timestamps received from each + other DBMP. + + A deleted entry may be removed from the database when its timestamp + (T) is older than all the timestamps in this second vector of + timestamps received from other DBMPs. + + + + + + + + + + +Johnson & Thomas [Page 9] + +RFC 677 The Maintenance of Duplicate Databases January 1975 + + +Conclusion + + This paper has presented techniques by which a number of loosely +coupled processes can maintain duplicate copies of a database, despite +the unreliability of their only means of communication. The copies of +the database can be kept "consistent'. However it is possible for +seemingly anomalous behavior to occur. For example a user may assign a +value to a selector using one DBMP, later use another DBMP and assign a +new value, and still later find that the first value is the one that +remains in the database. This can occur if the clocks used by the two +DBMPs for their timestamps are sufficiently out of synchrony that the +second assignment is timestamped as having taken place before the first +assignment. To the extent that the communication paths can be made +reliable, and the clocks used by the processes kept close to synchrony, +the probability of seemingly strange behavior can be made very small. +However, the distributed nature of the system dictates that this +probability can never be zero. + + The major innovation presented here is the explicit representation +of the time sequence of modifications through timestamps for both +modifications and entries. This enables the each DBMP to select the same +"most-recent" modification of an entry. In addition, timestamps enable +the DBMPs to decide when a deleted entry is no longer referenced (i.e., +still active anywhere) and can be deallocated. These techniques should +have broader utility in building and modeling other systems of +concurrent, cooperating processes. + + + + + + + + + + + [ This RFC was put into machine readable form for entry ] + [ into the online RFC archives by Alex McKenzie with ] + [ support from GTE, formerly BBN Corp. 12/99 ] + + + + + + + + + + + + +Johnson & Thomas [Page 10] + -- cgit v1.2.3