Diffstat (limited to 'doc/rfc/rfc672.txt')
-rw-r--r-- | doc/rfc/rfc672.txt | 543 |
1 files changed, 543 insertions, 0 deletions
diff --git a/doc/rfc/rfc672.txt b/doc/rfc/rfc672.txt
new file mode 100644
index 0000000..b206f89
--- /dev/null
+++ b/doc/rfc/rfc672.txt
@@ -0,0 +1,543 @@
Network Working Group                          Richard Schantz (BBN-TENEX)
Request for Comments: 672                      Dec 1974
NIC #31440


                  A Multi-Site Data Collection Facility


Preface:

   This RFC reproduces most of a working document prepared during the design and implementation of the protocols for the TIP-TENEX integrated system for handling TIP accounting.  Bernie Cosell (BBN-TIP) and Bob Thomas (BBN-TENEX) have contributed to various aspects of this work.  The system has been partially operational for about a month on selected hosts.  We feel that the techniques described here have wide applicability beyond TIP accounting.


Section I

Protocols for a Multi-site Data Collection Facility


Introduction

   The development of computer networks has provided the groundwork for distributed computation: one in which a job or task is comprised of components from various computer systems.  In a single computer system, the unavailability or malfunction of any of the job components (e.g. program, file, device, etc.) usually necessitates job termination.  With computer networks, it becomes feasible to duplicate certain job components which previously had no basis for duplication.  (In a single system, it does not matter how many times a process that performs a certain function is duplicated; a system crash makes all unavailable.)  It is such resource duplication that enables us to utilize the network to achieve high reliability and load leveling.  In order to realize the potential of resource duplication, it is necessary to have protocols which provide for the orderly use of these resources.  In this document, we first discuss in general terms a problem of protocol definition for interacting with a multiply defined resource (server).  The problem deals with providing a highly reliable data collection facility, by supporting it at many sites throughout the network.  In the second section of this document, we describe in detail a particular implementation of the protocol which handles the problem of utilizing multiple data collector processes for collecting accounting data generated by the network TIPs.  This example also illustrates the specialization of hosts to perform parts of a computation they are best equipped to handle.  The large network hosts (TENEX systems) perform the accounting function for the small network access TIPs.

   The situation to be discussed is the following: a data generating process needs to use a data collection service which is duplicately provided by processes on a number of network machines.  A request to a server involves sending the data to be collected.


An Initial Approach

   The data generator could proceed by selecting a particular server and sending its request to that server.  It might also take the attitude that if the message reaches the destination host (the communication subsystem will indicate this) the message will be properly processed to completion.  Failure of the request message would then lead to selecting another server, until the request succeeds or all servers have been tried.
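   As a rough sketch (in present-day Python, with a hypothetical send_request primitive that merely reports whether the subnetwork delivered the message; this is not how any TIP code is written), the initial approach amounts to:

    def deliver(data, servers, send_request):
        """Naive strategy: try collectors in turn and trust that a message
        which reaches the destination host will be processed to completion."""
        for server in servers:
            if send_request(server, data):   # True = subnetwork reports arrival
                return server                # assume the collector finishes the job
        return None                          # every server tried; the data is lost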
   Such a simple strategy is a poor one.  It makes sense to require that the servicing process send a positive acknowledgement to the requesting process.  If nothing else, the reply indicates that the server process itself is still functioning.  Waiting for such a reply also implies that there is a strategy for selecting another server if the reply is not forthcoming.  Herein lies a problem.  If the expected reply is timed out, and then a new request is sent to another server, we run the risk of receiving the (delayed) original acknowledgement at a later time.  This could result in having the data entered into the collection system twice (data duplication).  If the request is re-transmitted to the same server only, we face the possibility of not being able to access a collector (data loss).  In addition, for load leveling purposes, we may wish to send new requests to some (or all) servers.  We can then use their reply (or lack of reply) as an indicator of load on that particular instance of the service.  Doing this without data duplication requires more than a simple request and acknowledgement protocol*.

--------------------
* If the servers are independent of each other to the extent that if two or more servers all act on the same request, the end result is the same as having a single server act on the request, then a simple request/acknowledgement protocol is adequate.  Such may be the case, for example, if we subject the totality of collected data (i.e. all data collected by all collectors for a certain period) to a duplicate detection scan.  If we could store enough context in each entry to be able to determine duplicates, then having two or more servers act on the data would be functionally equivalent to processing by a single server.


Extension of the Protocol

   The general protocol developed to handle multiple collection servers involves having the data generator send the data request to some (or all) data collectors.  Those willing to handle the request reply with an "I've got it" message.  They then await further notification before finalizing the processing of the data.  The data generator sends a "go ahead" message to one of the replying collectors, and a "discard" message to all other replying collectors.  The "go ahead" message is the signal to process the data (i.e. collect permanently), while the "discard" message indicates that the data is being collected elsewhere and should not be retained.
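   A minimal sketch of the generator's side of this exchange (the message names follow the text; the transport primitives send and replies, and the timeout, are assumed for illustration and are not the actual TIP or TENEX interfaces):

    def collect(data, collectors, send, replies, timeout):
        """Offer DATA to several collectors; have exactly one of them keep it."""
        for c in collectors:
            send(c, {"type": "REQUEST", "data": data})
        chosen = None
        for c in replies(timeout):              # collectors answering "I've got it"
            if chosen is None:
                send(c, {"type": "GO AHEAD"})   # first responder keeps the data
                chosen = c
            else:
                send(c, {"type": "DISCARD"})    # everyone else must drop it
        return chosen                           # None: no collector was willing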
   The question now arises as to whether or not the collector process should acknowledge receipt of the "go ahead" message with a reply of its own, and then should the generator process acknowledge this acknowledgement, etc.  We would like to send as few messages as possible to achieve reliable communication.  Therefore, when a state is reached for which further acknowledgements lead to a previously visited state, or when the cost of further acknowledgements outweighs the increase in reliability they bring, further acknowledgements become unnecessary.

   The initial question was should the collector process acknowledge the "go ahead" message?  Assume for the moment that it should not send such an acknowledgement.  The data generator could verify, through the communication subsystem, the transmission of the "go ahead" message to the host of the collector.  If this message did not arrive correctly, the generator has the option of re-transmitting it or sending a "go ahead" to another collector which has acknowledged receipt of the data.  Either strategy involves no risk of duplication.  If the "go ahead" message arrives correctly, and a collector acknowledgement to the "go ahead" message is not required, then we incur a vulnerability to (collector host) system crash from the time the "go ahead" message is accepted by the host until the time the data is totally processed.  Call the data processing time P.  Once the data generator has selected a particular collector (on the basis of receiving its "I've got it" message), we also incur a vulnerability to malfunction of this collector process.  The vulnerable period is from the time the collector sends its "I've got it" message until the time the data is processed.  This amounts to two network transit times (2N) plus IMP and host overhead for message delivery (O) plus data processing time (P).  [Total time = 2N + P + O.]  A malfunction (crash) in this period can cause the loss of data.  There is no potential for duplication.

   Now, assume that the data collector process must acknowledge the "go ahead" message.  The question then arises as to when such an acknowledgement should be sent.  The reasonable choices are either immediately before final processing of the data (i.e. before the data is permanently recorded) or immediately after final processing.  It can be argued that unless another acknowledgement is required (by the generator to the collector) to this acknowledgement BEFORE the actual data update, then the best time for the collector to acknowledge the "go ahead" is after final processing.  This is so because receiving the acknowledgement conveys more information if it is sent after processing, while not receiving it (timeout), in either case, leaves us in an unknown state with respect to the data update.  Depending on the relative speeds of various network and system components, the data may or may not be permanently entered.  Therefore if we interpret the timeout as a signal to have the data processed at another site, we run the risk of duplication of data.  To avoid data duplication, the timeout strategy must only involve re-sending the "go ahead" message to the same collector.  This will only help if the lack of reply is due to a lost network message.  Our vulnerability intervals to system and process malfunction remain as before.

   It is our conjecture (to be analyzed further) that any further acknowledgements to these acknowledgements will have virtually no effect on reducing the period of vulnerability outlined above.  As such, the protocol with the fewest messages required is superior.


Data Dependent Aspects of the Protocol

   As discussed above, a main issue is which process should be the last to respond (send an acknowledgement).  If the data generator sends the last message (i.e. "go ahead"), we can only check on its correct arrival at the destination host.  We must "take on faith" the ability of the collector to correctly complete the transaction.  This strategy is geared toward avoiding data duplication.  If, on the other hand, the protocol specifies that the collector is to send the last message, with the timeout of such a message causing the data generator to use another collector, then the protocol is geared toward the best efforts of recording the data somewhere, at the expense of possible duplication.
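   The two alternatives can be sketched side by side (same assumed primitives as before; neither sketch is the TIP algorithm, they only contrast the two recovery rules):

    def generator_sends_last(chosen, send, delivered):
        # Duplication-averse: the "go ahead" is the final message.  Its arrival
        # at the collector's host can be checked, but completion is taken on faith.
        send(chosen, {"type": "GO AHEAD"})
        return delivered(chosen)            # subnetwork-level confirmation only

    def collector_sends_last(candidates, send, completion_ack, timeout):
        # Loss-averse: after a "go ahead", wait for the collector's own "data
        # filed" reply; if it never arrives, give the go-ahead to another
        # collector that buffered the data, accepting possible duplication.
        for c in candidates:                # collectors that said "I've got it"
            send(c, {"type": "GO AHEAD"})
            if completion_ack(c, timeout):
                return c
        return None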
   Thus, the nature of the problem will dictate which of the protocols is appropriate for a given situation.  The next section deals in the specifics of an implementation of a data collection protocol to handle the problem of collecting TIP accounting data by using the TENEX systems for running the collection server processes.  It is shown how the general protocol is optimized for the accounting data collection.


Section II

Protocol for TIP-TENEX Accounting Server Information Exchange


Overview of the Facility

   When a user initially requests service from a TIP, the TIP will perform a broadcast ICP to find an available RSEXEC which maintains an authentication data base.  The user must then complete a login sequence in order to authenticate himself.  If he is successful the RSEXEC will transmit his unique ID code to the TIP.  Failure will cause the RSEXEC to close the connection and the TIP to hang up on the user.  After the user is authenticated, the TIP will accumulate accounting data for the user session.  The data includes a count of messages sent on behalf of the user, and the connect time for the user.  From time to time the TIP will transmit intermediate accounting data to Accounting Server (ACTSER) processes scattered throughout the network.  These accounting servers will maintain files containing intermediate raw accounting data.  The raw accounting data will periodically be collected and sorted to produce an accounting data base.  Providing a number of accounting servers reduces the possibility of being unable to find a repository for the intermediate data, which otherwise would be lost due to buffering limitations in the TIPs.  The multitude of accounting servers can also serve to reduce the load on the individual hosts providing this facility.

   The rest of this document details the protocol that has been developed to ensure delivery of TIP accounting data to one of the available accounting servers for storage in the intermediate accounting files.


Adapting the Protocol

   The TIP to Accounting Server data exchange uses a protocol that allows the TIP to select for data transmission one, some, or all server hosts either sequentially or in parallel, yet insures that the data that becomes part of the accounting file does not contain duplicate information.  The protocol also minimizes the amount of data buffering that must be done by the limited capacity TIPs.  The protocol is applicable to a wide class of data collection problems which use a number of data generators and collectors.  The following describes how the protocol works for TIP accounting.
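   The walkthrough that follows refers repeatedly to the data carried in each TIP accounting message.  As a preview, the following sketch names only the kind of information packed into the single (less than 8K-bit) message; the field names and layout are illustrative assumptions, not the actual message format:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class UserEntry:
        user_id: int        # unique ID code obtained at RSEXEC login
        connect_time: int   # connect-time quanta accumulated since the last deposit
        messages: int       # messages sent on the user's behalf since the last deposit

    @dataclass
    class AccountingMessage:
        sequence: int                                             # TIP-maintained sequence number
        entries: List[UserEntry] = field(default_factory=list)   # one entry per active port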
   Each TIP is responsible for maintaining in its memory the cells indicating the connect time and the number of messages sent for each of its current users.  These cells are incremented by the TIP for every quantum of connect time and message sent, as the case may be.  This is the data generation phase.  Periodically, the TIP will scan all its active counters, and along with each user ID code, pack the accumulated data into one network message (i.e. less than 8K bits).  The TIP then transmits this data to a set of Accounting Server processes residing throughout the network.  The data transfer is over a specially designated host-host link.  The accounting servers utilize the raw network message facility of TENEX 1.32 in order to directly access that link.  When an ACTSER receives a data message from a TIP, it buffers the data and replies by returning the entire message to the originating TIP.  The TIP responds with a positive acknowledgement ("go ahead") to the first ACTSER which returns the data, and responds with a negative acknowledgement ("discard") to all subsequent ACTSER data return messages for this series of transfers.  If the TIP does not receive a reply from any ACTSER, it accumulates new data (i.e. the TIP has all the while been incrementing its local counters to reflect the increased connect time and message count; the current values will comprise new data transfers) and sends the new data to the Accounting Server processes.  When an ACTSER receives a positive acknowledgement from a TIP (i.e. "go ahead"), it appends the appropriate parts of the buffered data to the locally maintained accounting information file.  On receiving a negative acknowledgement from the TIP (i.e. "discard"), the ACTSER discards the data buffered for this TIP.  In addition, when the TIP responds with a "go ahead" to the first ACTSER which has accepted the data (acknowledged by returning the data along with the "I've got it"), the TIP decrements the connect time and message counters for each user by the amount indicated in the data returned by the ACTSER.  This data will already be accounted for in the intermediate accounting files.

   As an aid in determining which ACTSER replies are to current requests, and which are tardy replies to old requests, the TIP maintains a sequence number indicator, and appends this number to each data message sent to an ACTSER.  On receiving a reply from an ACTSER, the TIP merely checks the returned sequence number to see if this is the first reply to the current set of TIP requests.  If the returned sequence number is the same as the current sequence number, then this is the first reply; a positive acknowledgement is sent off, the counters are decremented by the returned data, and the sequence number is incremented.  If the returned sequence number is not the same as the current one (i.e. not the one we are now seeking a reply for) then a negative acknowledgement is sent to the replying ACTSER.  After a positive acknowledgement to an ACTSER (and the implied incrementing of the sequence number), the TIP can wait for more information to accumulate, and then start transmitting again using the new sequence number.
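   In outline, the TIP's handling of an ACTSER echo is the following (the counter and message structures here are illustrative; only the decision logic, namely compare the echoed sequence number, decrement the counters, and advance, is taken from the text):

    SEQ_MODULUS = 1 << 16            # 16-bit, cyclic sequence number (see point 6 below)

    def handle_echo(tip, echo, send):
        if echo["sequence"] == tip["sequence"]:
            # First reply to the current transfer: let this ACTSER file the data.
            send(echo["actser"], {"type": "GO AHEAD"})
            for entry in echo["entries"]:
                # Subtract only what this ACTSER will record; anything accumulated
                # since the message was built remains in the counters.
                port = tip["counters"][entry["user_id"]]
                port["connect_time"] -= entry["connect_time"]
                port["messages"] -= entry["messages"]
            tip["sequence"] = (tip["sequence"] + 1) % SEQ_MODULUS
        else:
            # Tardy reply to an older transfer: that data is superseded or already
            # recorded elsewhere, so tell this server to drop its copy.
            send(echo["actser"], {"type": "DISCARD"})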
Further Clarification of the Protocol

   There are a number of points concerning the protocol that should be noted.

   1.  The data generator (TIP) can send different (i.e. updated) versions of the data to different data collectors (accounting servers) as part of the same logical transmission sequence.  This is possible because the TIP does not account for the data sent until it receives the acknowledgement of the data echo.  This strategy relieves the TIP of any buffering in conjunction with re-transmission of data which hasn't been acknowledged.

   2.  A new data request to an accounting server from a TIP will also serve as a negative acknowledgement concerning any data already buffered by the ACTSER for that TIP, but not yet acknowledged.  The old data will be discarded, and the new data will be buffered and echoed as an acknowledgement.  This allows the TIP the option of not sending a negative acknowledgement when it is not convenient to do so, without having to remember that it must be sent at a later time.  There is one exception to this convention.  If the new data message has the same sequence number as the old buffered message, then the new data must be discarded, and the old data kept and re-echoed.  This is to prevent a slow acknowledgement to the old data from being accepted by the TIP, after the TIP has already sent the new data to the slow host.  This caveat can be avoided if the TIP does not resend to a non-responding server within the time period that a message could possibly be stuck in the network, but could still be delivered.  Ignoring this situation may result in some accounting data being counted twice.  Because of the rule to keep old data when confronted with matching sequence numbers, on restarting after a crash, the TIP should send a "discard" message to all servers in order to clear any data which has been buffered for it prior to the crash.  An alternative to this would be for the TIP to initialize its sequence number from a varying source such as time of day.

   3.  The accounting server similarly need not acknowledge receipt of data (by echoing) if it finds itself otherwise occupied.  This will mean that the ACTSER is not buffering the data, and hence is not a candidate for entering the data into the file.  However, the TIP may try this ACTSER at a later time (even with the same data), with no ill effects.

   4.  Because of 2 and 3 above, the protocol is robust with respect to lost or garbled transmissions of TIP data requests and accounting server echo replies.  That is, in the event of loss of such a message, a re-transmission will occur as the normal procedure.

   5.  There is no synchronization problem with respect to the sequence number used for duplicate detection, since this number is maintained only at the TIP site.  The accounting server merely echoes the sequence number it has received as part of the data.

   6.  There are, however, some constraints on the size of the sequence number field.  It must be large enough so that ALL traces of the previous use of a given sequence number are totally removed from the network before the number is re-used by the TIP.  The sequence number is modulo the size of the largest number represented by the number of bits allocated, and is cyclic.  Problems generally arise when a host proceeds from a service interruption while it was holding on to a reply.  If during the service interruption we have cycled through our sequence numbers exactly N times (where N is any integer), this VERY tardy reply could be mistaken for a reply to the new data, which has the same sequence number (i.e. N revolutions of sequence numbers later).  By utilizing a sufficiently large sequence number field (16 bits), and by allowing sufficient time between instances of sending new data, we can effectively reduce the probability of such an error to zero.

   7.  Since the data involved in this problem is the source of accounting information, care must be taken to avoid duplicate entries.  This must be done at the expense of potentially losing data in certain instances.  Other than the obvious TIP malfunction, there are two known ways of losing data.  One is the situation where no accounting server responds to a TIP for an extended period of time, causing the TIP counters to overflow (highly unlikely if there are sufficient Accounting Servers).  In this case, the TIP can hold the counters at their maximum value until a server comes up, thereby keeping the lost accounting data at its minimum.  The other situation results from adapting the protocol to our insistence on no duplicate data in the incremental files.
We are vulnerable to data loss with no recourse from the time the server receives the "go ahead" to update the file with the buffered data (i.e. positive acknowledgement) until the time the update is completed and the file is closed.  An accounting server crash during this period will cause that accounting data to be lost.  In our initial implementation, we have slightly extended this period of vulnerability in order to save the TIP from having to buffer the acknowledged data for a short period of time.  By updating TIP counters from the returned data in parallel with sending the "go ahead" acknowledgement, we relieve the TIP of the burden of buffering this data until the Request for Next Message (RFNM) from the accounting server IMP is received.  This adds slightly to our period of vulnerability to malfunction, moving the beginning of the period from the point when the ACTSER host receives the "go ahead" back to the point when the TIP sends off the "go ahead" (i.e. a period of one network transit time plus some IMP processing time).  However, loss of data in this period is detectable through the Host Dead or Incomplete Transmission return in place of the RFNM.  We intend to record such occurrences with the Network Control Center.  If this data loss becomes intolerable, the TIP program will be modified to await the RFNM for the positive acknowledgement before updating its counters.  In such a case, if the RFNM does not come, the TIP can discard the buffered data and re-transmit new data to other servers.

   8.  There is adequate protection against the entry of forged data into the intermediate accounting files.  This is primarily due to the system enforced limited access to Host-IMP messages and Host-Host links.  In addition, messages received on such designated limited access links can be easily verified as coming from a TIP.  The IMP subnet appends the signature (address) of the sending host to all of its messages, so there can be no forging.  The Accounting Server is in a position to check if the source of the message is in fact a TIP data generator.


Current Parameters of the Protocol

   In the initial implementation, the TIP sends its accumulated accounting data about once every half hour.  If it gets no positive acknowledgement, it tries to send with greater frequency (about every 5 minutes) until it finally succeeds.  It can then return to the normal waiting period.  (A TIP user logout introduces an exception to this behavior.  In order to re-use the TIP port and its associated counters as soon as possible, a user terminating his TIP session causes the accounting data to be sent immediately.)

   Initially, our implementation calls for each TIP to remember a "favored" accounting server.  At the wait period expiration, the TIP will try to deposit the data at its "favored" site.  If successful within a short timeout period, this site remains the favored site, and the wait interval is reset.  If unsuccessful within the short timeout, the data can be sent to all servers*.  The one replying first will update its file with the data and also become the "favored" server for this TIP.  With these parameters, a host would have to undergo a proceedable service interruption of more than a year in order for the potential sequence number problem outlined in (6) above to occur.

--------------------
* The sequence number can be incremented for this new set of data messages, and the new data can also be sent to the slow host.  In this way we won't be giving the tardy response from the old favored host unfair advantage in determining which server can respond most quickly.  If there is no reply to this series of messages, the TIP can continue to resend the new data.  However, the sequence number should not be incremented, since no reply was received, and since indiscriminate incrementing of the sequence number increases the chance of recycling during the lifetime of a message.
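   The "more than a year" figure can be checked with a rough calculation (assuming, per point 6 and the parameters just given, a 16-bit field that the TIP advances roughly once per half-hour deposit while the interrupted host holds on to its tardy reply; this is a back-of-the-envelope estimate, not a precise bound):

    # Back-of-the-envelope check of the sequence number recycling time.
    field_bits = 16
    wait_period_hours = 0.5                  # normal interval between successful deposits
    cycle_hours = (2 ** field_bits) * wait_period_hours
    print(cycle_hours / (24 * 365))          # roughly 3.7 years per full cycle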
Concluding Remarks

   When the implementation is complete, we will have a general data accumulation and collection system which can be used to gather a wide variety of information.  The protocol as outlined is geared to gathering data which is either independent of the previously accumulated data items (e.g. recording names), or data which adheres to a commutative relationship (e.g. counting).  This is a consequence of the policy of retransmission of different versions of the data to different potential collectors (to relieve TIP buffering problems).

   In the specified version of the protocol, care was taken to avoid duplicate data entries, at the cost of possibly losing some data through collector malfunction.  Data collection problems which require avoiding such loss (at the cost of possible duplication of some data items) can easily be accommodated with a slight adjustment to the protocol.  Collected data which does not adhere to the commutative relationship indicated above can also be handled by utilizing more buffer space at the data generator sites.
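   The commutative requirement can be seen in miniature in the counter handling itself: because the TIP subtracts exactly the amounts a collector acknowledges, increments that arrive while an echo is in flight do not disturb the result (a small illustration, not TIP code):

    counter = 7           # accumulated so far; a snapshot of 7 is sent to the collectors
    counter += 3          # more activity accumulates while the echo is in flight
    counter -= 7          # the echoed snapshot is acknowledged; subtract what was filed
    assert counter == 3   # the interleaving does not change the final value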