diff --git a/doc/rfc/rfc636.txt b/doc/rfc/rfc636.txt
new file mode 100644
index 0000000..cfe7c87
--- /dev/null
+++ b/doc/rfc/rfc636.txt
@@ -0,0 +1,446 @@
+
+NWG/RFC# 636 JDB BPC RST DCW3 MLK 23-OCT-75 22:27 30490
+TIP/TENEX Reliability Improvements
+
+
+
+RFC 636 J. Burchfiel - BBN-TENEX
+ B. Cosell - BBN-NET
+NIC 30490 R. Tomlinson - BBN-TENEX
+ D. Walden - BBN-NET
+ 10 June 1974
+
+ TIP/TENEX Reliability Improvements
+
+
+
+
+
+During the past months we have felt strong pressure to improve the
+reliability of TIP/TENEX network connections. Because TENEXs and TIPs
+are numerous and highly visible, more reliable connections between
+them would have a major impact on the apparent reliability of the
+network as a whole. Despite the emphasis on TIP/TENEX interaction, all
+of this work applies equally well to interactions between Hosts of any
+type.
+
+The remainder of this RFC gives a sketch of our plan for improving the
+reliability of connections between TIPs and TENEXs. Major portions of
+this plan have already been implemented (TIP version 322; TENEX version
+1.32) and are now undergoing final test prior to release throughout the
+network. Completion of the implementation of the plan is expected in
+the next quarter.
+
+Our plan for improving the reliability of TIP/TENEX connections is
+concerned with obtaining and maintaining TIP/TENEX connections,
+gracefully recovering from lost connections, and providing clear
+messages to the user whenever the state of his connection changes.
+
+When a TIP user attempts to open a connection to any Host, the Host may
+be down. In this case it would be helpful to provide the user with
+information about the extent of the Host's unavailability. To facilitate
+this, we modified the IMP program to accept and utilize information from
+a Host about when the Host will be back up and for what reason it is
+down. TENEX is to be modified to supply such information before it goes
+down, or through manual means, after it has gone down. When the TIP
+user then attempts to connect to the down TENEX, the IMP local to the
+TENEX returns the information about why and for how long TENEX will be
+down. The TIP is to be modified to report this sort of information to
+the user; e.g., "Host unavailable because of hardware maintenance --
+expected available Tuesday at 16:30 GMT".
+
+The TIP's logger is presently not reentrant. Thus, no single TIP user
+can be allowed to tie up the logger for too long at a time; and the TIP
+therefore enforces a timeout of arbitrary length (about 60 seconds) on
+logger use. However, a heavily loaded Host cannot be guaranteed always
+to respond within 60 seconds to a TIP login request, and at present TIP
+users sometimes cannot get connected to a heavily loaded TENEX. To
+correct this problem, the TIP logger will be made reentrant and the
+timeout on logger use will be eliminated.
+
+One notorious soft spot in the Host/Host protocol that degrades the
+reliability of connections is its incremental allocate mechanism.
+Low-frequency software bugs, intermittent hardware faults, and the like
+can cause the incremental allocates associated with a connection to get
+out of synchronization. When this happens, it usually appears to the
+user as if the connection has simply "hung up". A slight addition to
+the Host/Host protocol that allows connection allocates to be
+resynchronized has been designed and implemented for both the TIP and
+TENEX.
+
+TENEX has a number of internal consistency checks (called "bughalts")
+which occasionally cause TENEX to halt. Frequently, after diagnosis by
+system personnel, TENEX can be made to proceed without loss from the
+viewpoint of local users. A mechanism is being provided which allows
+TENEX to proceed in this case from the point of view of TIP users of
+TENEX.
+
+The appropriate mechanism entails the following: TENEX will not drop
+its ready line during a bughalt (from which TENEX can usually proceed
+successfully), nor will it clear its NCP tables and abort all
+connections. Instead, after a bughalt TENEX will: discard the message
+it is currently receiving, as the IMP has returned an Incomplete
+Transmission to the source for this message; reinitialize the interface
+to the IMP; and resynchronize, on all connections possible, Host/Host
+protocol allocate inconsistencies due to lost messages, RFNMs, etc. The
+latter is done with the same mechanism described above. This procedure
+is not guaranteed to save all data -- a tiny bit may be lost -- but this
+is of secondary importance to maintaining the connection over the TENEX
+bughalt.
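A minimal sketch of the post-bughalt sequence, assuming a simplified NCP model (`Ncp`, `Conn` and `proceed_after_bughalt` are hypothetical names). It reads "resynchronize, on all connections possible" as: send an RAS on each connection this Host sends on and, since the appendix gives the initiative to the sender, only an RAP on connections it receives on. Waiting for outstanding RFNMs before the RAS (Section A.3) is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Conn:
    link: int
    we_send: bool      # True if this Host is the sender on `link`
    open: bool = True

@dataclass
class Ncp:
    conns: List[Conn]
    partial_msg: Optional[bytes] = None  # message being received at the halt
    interface_ok: bool = False
    ready_line_up: bool = True           # the ready line is NOT dropped
    outbox: List[Tuple[str, int]] = field(default_factory=list)

    def proceed_after_bughalt(self) -> None:
        # NCP tables are kept and no RST is sent, so open connections survive.
        # 1) Discard the partly received message; the IMP has already returned
        #    an Incomplete Transmission to its source.
        self.partial_msg = None
        # 2) Reinitialize the IMP interface (hardware details omitted).
        self.interface_ok = True
        # 3) Resynchronize allocation on every open connection. Only the sender
        #    may start the exchange (RAS); as receiver we can only suggest it (RAP).
        for c in self.conns:
            if c.open:
                self.outbox.append(("RAS" if c.we_send else "RAP", c.link))
```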
+
+The TIP user must be kept fully informed as TENEX halts and then
+continues. Therefore, the TIP has been modified to report "Host not
+responding -- connection suspended" when it senses that TENEX has halted
+(it does this by properly interpreting messages returned by the
+destination IMP). When TENEX resumes service after proceeding from a
+bughalt, the above procedure notifies the TIP that service is restored,
+and the TIP has been modified to report "Service resumed" to all users
+of that Host.
+
+On the other hand, the service interruption may not be proceedable and
+TENEX may have to do a total system reload and restart. In this case
+TENEX will clear its NCP connection tables and send a Host/Host protocol
+reset command to all other Hosts. On receiving this reset command, the
+TIP will report "Host reset -- connection closed" to all users of that
+Host with suspended connections. The TIP user can then re-login to the
+TENEX or to some other Host.
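The user-visible side of these transitions can be modelled as a small per-connection state machine. The three messages are quoted from the text; the state and event names are assumptions of this sketch, and transitions the text does not describe (for example, a reset arriving on a connection that is not suspended) are simply ignored.

```python
from enum import Enum, auto
from typing import Optional, Tuple

class State(Enum):
    OPEN = auto()
    SUSPENDED = auto()
    CLOSED = auto()

# Message texts quoted from RFC 636; the event names are hypothetical.
MESSAGES = {
    "host_dead": "Host not responding -- connection suspended",  # from destination IMP replies
    "service_resumed": "Service resumed",                        # TENEX proceeded from a bughalt
    "host_reset": "Host reset -- connection closed",             # Host/Host reset command received
}

TRANSITIONS = {
    (State.OPEN, "host_dead"): State.SUSPENDED,
    (State.SUSPENDED, "service_resumed"): State.OPEN,
    (State.SUSPENDED, "host_reset"): State.CLOSED,
}

def tip_event(state: State, event: str) -> Tuple[State, Optional[str]]:
    """Return (new state, message to print) for one TIP user's connection."""
    new_state = TRANSITIONS.get((state, event))
    if new_state is None:
        return state, None  # not described in the text: no change, no message
    return new_state, MESSAGES[event]
```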
+
+Of course, the user may not have the patience to wait for service to
+resume after a TENEX bughalt. Instead, he may unilaterally choose to
+connect to some other Host, ignoring the previously suspended
+connection. If TENEX is then able to proceed, its NCP will still think
+its connection to the TIP is good and suitable for use. Thus, we have a
+connection which the TIP thinks is closed and TENEX thinks is open, a
+phenomenon known as the "half-closed connection". An automatic
+procedure for cleanly completing the closing of such a connection has
+been specified and implemented for the TIP and TENEX.
+
+Since TENEX will maintain connections across service interruptions, the
+TIP user will be required to follow the security procedure of telling
+the TIP to "forget" his suspended connection before abandoning his
+terminal. The command @H 0 (for example) will guarantee that his
+connection will not be reestablished on resumption of service. Otherwise, his job
+would be left at the mercy of anyone who acquires that terminal.
+
+An appendix follows which describes the Host/Host protocol changes made.
+These changes are backward compatible (with the exception that Hosts
+which have not implemented these changes will sometimes receive
+unrecognizable Host/Host protocol commands which they presumably discard
+without suffering harm). These protocol changes are ad hoc in nature
+but in light of their backward compatibility and potential utility, ARPA
+okayed their addition to the TIP and TENEX NCPs without (we believe) any
+implication that other Hosts have to implement them (although we would
+encourage their widespread implementation).
+
+ Appendix - Ad Hoc Change to Host-Host Protocol
+
+ A.1 Introduction
+
+ The current Host-Host protocol (NIC #8246) contains no provisions
+ for resynchronizing the status information kept at the two ends of
+ each connection. In particular, if either host suffers a service
+ interruption, or if a control message is lost or corrupted in an
+ interface or in the subnet, the status information at the two ends
+ of the connection will be inconsistent.
+
+ Since the current protocol provides no way to correct this
+ condition, the NCPs at the two ends stay "confused" forever. An
+ occasional frustrating symptom of this effect is the "lost
+ allocate" phenomenon, where the receiving NCP believes that it has
+ bit and message allocations outstanding, while the sending NCP
+ believes that it does not have any allocation. As a result,
+ information flow over that connection can never be restarted.
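The "lost allocate" deadlock is easy to reproduce in a toy model. In this sketch (all names hypothetical) each end keeps its own idea of the outstanding allocation; a single ALL lost in transit leaves the receiver believing that 1 message and 1000 bits are outstanding while the sender believes it has nothing, so neither side ever acts again.

```python
from dataclasses import dataclass

@dataclass
class Side:
    msgs: int = 0  # outstanding message allocation, as this end sees it
    bits: int = 0  # outstanding bit allocation, as this end sees it

def send_all(sender: Side, receiver: Side, msgs: int, bits: int, lost: bool) -> None:
    """The receiver issues an incremental ALL, which may be lost in transit."""
    receiver.msgs += msgs  # receiver counts it as outstanding immediately
    receiver.bits += bits
    if not lost:           # sender only learns of it if the ALL arrives
        sender.msgs += msgs
        sender.bits += bits

def can_send(sender: Side, nbits: int) -> bool:
    return sender.msgs >= 1 and sender.bits >= nbits

snd, rcv = Side(), Side()
send_all(snd, rcv, msgs=1, bits=1000, lost=True)
# The receiver now waits for data against its outstanding allocation, while
# the sender waits for an allocation: the connection is hung.
```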
+
+ Use of the Host-Host RST (reset) command is inappropriate here, as
+ it destroys all connections between the two hosts. What is needed
+ is a way to resynchronize only the affected connection without
+ disturbing any others.
+
+ A second troublesome symptom of inconsistency in status
+ information is the "half-closed" connection: after a service
+ interruption or network partitioning, one NCP may believe that a
+ connection is still open, while the other believes that the
+ connection is closed (does not exist). When such an inconsistency
+ is discovered, the "open" end of the connection should be closed.
+
+ A.2 The RAR, RAS and RAP commands
+
+ To achieve resynchronization of allocation, we add the following
+ three commands to the host-host protocol.
+
+                  8 bits   8 bits
+                -------------------
+                !        !        !
+             16 !  RAR   !  link  !
+                !        !        !
+                -------------------
+                Reset Allocation by Receiver
+
+                  8 bits   8 bits
+                -------------------
+                !        !        !
+             17 !  RAS   !  link  !
+                !        !        !
+                -------------------
+                Reset Allocation by Sender
+
+                  8 bits   8 bits
+                -------------------
+                !        !        !
+             20 !  RAP   !  link  !
+                !        !        !
+                -------------------
+                Reset Allocation Please
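These commands, and the NXR and NXS commands of Section A.5, share the same 16-bit layout, so a single encoder covers them. The diagrams do not state the radix of the opcode values; this sketch assumes they are octal, as was common in ARPANET documents of the period (16, 17 and 20 octal are 14, 15 and 16 decimal). The function names are illustrative.

```python
# Opcodes as printed in the diagrams, ASSUMED to be octal (0o16 == 14).
# Swap in decimal literals if that assumption turns out to be wrong.
OPCODES = {"RAR": 0o16, "RAS": 0o17, "RAP": 0o20, "NXR": 0o21, "NXS": 0o22}
NAMES = {v: k for k, v in OPCODES.items()}

def encode(cmd: str, link: int) -> bytes:
    """Pack a 16-bit command: 8-bit opcode followed by 8-bit link."""
    if not 0 <= link <= 0xFF:
        raise ValueError("link must fit in 8 bits")
    return bytes([OPCODES[cmd], link])

def decode(data: bytes) -> tuple:
    """Unpack a 2-byte command into (name, link)."""
    if len(data) != 2:
        raise ValueError("expected exactly 2 bytes")
    return NAMES[data[0]], data[1]
```

A real NCP would find these commands concatenated with others inside a control message; `decode` here raises `KeyError` on an unknown opcode, whereas the text expects older Hosts to discard unrecognized commands.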
+
+ The RAS command is sent from the Host sending on "link" to the
+ Host receiving on "link". This command may be sent whenever the
+ sending Host desires to resynch the status information associated
+ with the connection (and doesn't have a message in transit through
+ the network). Some circumstances in which the sending Host may
+ choose to do this are:
+
+ 1) After a timeout when there is traffic to move but no
+ allocation (assumes that an allocation has been lost);
+
+ 2) When an inconsistent event occurs associated with that
+ connection (e.g. an outstanding allocation in excess of 2^32
+ bits or 2^16 messages);
+
+ 3) After the sending host has suffered an interruption of
+ network service;
+
+ 4) In response to a RAP (see below).
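The four circumstances above can be folded into one decision function for the sending NCP. The timeout value is not given in the text and is a placeholder; the field names are hypothetical. The RFNM check reflects the requirement that an RAS be sent only when no message is in transit.

```python
from dataclasses import dataclass

MAX_BITS = 2**32  # outstanding bit allocation beyond this is inconsistent
MAX_MSGS = 2**16  # outstanding message allocation beyond this is inconsistent

@dataclass
class SendSide:
    alloc_msgs: int
    alloc_bits: int
    queued_bits: int           # traffic waiting to be sent
    secs_without_alloc: float  # how long traffic has waited with no allocation
    rfnm_outstanding: bool     # a message is still in transit
    had_interruption: bool = False
    rap_received: bool = False

def should_send_ras(s: SendSide, timeout: float = 30.0) -> bool:
    if s.rfnm_outstanding:
        return False  # an RAS may be sent only with no message in transit
    # 1) traffic to move but no allocation, for longer than the timeout
    stalled = (s.queued_bits > 0
               and (s.alloc_msgs == 0 or s.alloc_bits == 0)
               and s.secs_without_alloc >= timeout)
    # 2) an impossible outstanding allocation
    inconsistent = s.alloc_bits > MAX_BITS or s.alloc_msgs > MAX_MSGS
    # 3) after our own service interruption; 4) in response to an RAP
    return stalled or inconsistent or s.had_interruption or s.rap_received
```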
+
+ The RAR command is sent from the Host receiving on "link" to the
+ Host sending on "link" in response to an RAS. It marks the
+ completion of the connection resynchronization. When the RAR is
+ returned the connection is in the known state of having no
+ messages in transit in either direction and the allocations are
+ zero. The receiving Host may then start afresh with a new
+ allocation and normal message transmission can proceed. Since the
+ RAR may be sent ONLY in response to an RAS, there are no races in
+ the resynchronization. All of the initiative lies with the
+ sending Host.
+
+ If the receiving Host detects an anomalous situation, however,
+ there is no way to inform the sending Host that a
+ resynchronization is desirable. For this purpose, the RAP command
+ is provided. It constitutes a "suggestion" on the part of the
+
+ receiving Host that the sending Host resynchronize; the sending
+ Host is free to honor it or not as it sees fit. Since there is no
+ obligatory response to a RAP, the receiving Host may send them as
+ frequently as it chooses and no harm can occur. For example, if a
+ message in excess of the allocation arrives, the receiving Host
+ might send RAPs every few seconds until the sending Host replies,
+ without fear of races if one or more RAPs pass an RAS in the
+ network.
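Because an RAP demands no reply, the receiver can simply repeat it on a timer. A minimal sketch, assuming a hypothetical `RapNagger` helper and a 5-second interval standing in for "every few seconds":

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RapNagger:
    """Receiver-side helper: repeat RAP until the sender answers with an RAS."""
    link: int
    interval: float = 5.0           # "every few seconds" (hypothetical value)
    anomaly: bool = False
    last_rap: Optional[float] = None

    def on_excess_message(self) -> None:
        self.anomaly = True         # e.g. a message exceeded the allocation

    def on_ras(self) -> None:
        self.anomaly = False        # the sender has started resynchronizing

    def tick(self, now: float) -> List[str]:
        """Return the commands to send at time `now` (possibly none)."""
        due = self.last_rap is None or now - self.last_rap >= self.interval
        if self.anomaly and due:
            self.last_rap = now
            return [f"RAP {self.link}"]
        return []
```

An RAP still in flight when the RAS arrives is harmless: at worst the sender resynchronizes once more, which the text notes is always safe.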
+
+ A.3 Resynchronization Procedure
+
+ The resynchronization sequence below may be initiated only by the
+ sender either for internally generated reasons or upon the receipt
+ of a RAP.
+
+ a) Sender - decision to resynch
+
+ 1) Set state to "Wait-for-RAR" (Defer transmission of
+ message.)
+ 2) Wait until no RFNM outstanding
+ 3) Send RAS
+ 4) Zero allocation
+ 5) Ignore allocates until RAR received
+ 6) Set state to "Open" (Resume normal message transmission
+ subject to flow control.)
+
+ b) Receiver - receipt of RAS
+
+ 1) Send RAR
+ 2) Zero allocation
+ 3) Send a new allocation
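The two procedures translate almost line for line into a pair of small state machines. This is a sketch rather than a reference implementation: the class and method names are hypothetical, control commands are represented as strings, and it assumes control commands from one Host arrive at the other in the order sent, which is what lets the sender accept the fresh ALL that follows the RAR. Comments cite the step numbers above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Sender:
    link: int
    state: str = "Open"
    alloc_msgs: int = 0
    alloc_bits: int = 0
    rfnm_outstanding: bool = False
    resync_pending: bool = False
    out: List[str] = field(default_factory=list)

    def decide_resync(self) -> None:
        self.state = "Wait-for-RAR"  # a.1: defer message transmission
        self.resync_pending = True
        self._try_send_ras()

    def on_rfnm(self) -> None:
        self.rfnm_outstanding = False
        self._try_send_ras()         # a.2: the RAS waits for the RFNM

    def _try_send_ras(self) -> None:
        if self.resync_pending and not self.rfnm_outstanding:
            self.out.append(f"RAS {self.link}")    # a.3
            self.alloc_msgs = self.alloc_bits = 0  # a.4
            self.resync_pending = False

    def on_all(self, msgs: int, bits: int) -> None:
        if self.state == "Wait-for-RAR":
            return                   # a.5: ignore allocates until RAR
        self.alloc_msgs += msgs
        self.alloc_bits += bits

    def on_rar(self) -> None:
        self.state = "Open"          # a.6: resume, subject to flow control

@dataclass
class Receiver:
    link: int
    alloc_msgs: int = 0
    alloc_bits: int = 0
    out: List[str] = field(default_factory=list)

    def on_ras(self, msgs: int, bits: int) -> None:
        self.out.append(f"RAR {self.link}")                # b.1
        self.alloc_msgs = self.alloc_bits = 0              # b.2
        self.out.append(f"ALL {self.link} {msgs} {bits}")  # b.3
        self.alloc_msgs, self.alloc_bits = msgs, bits
```

In the lost-allocate case, the receiver starts out believing 1 message and 1000 bits are outstanding and the sender believes it has none; after `decide_resync`, `on_rfnm`, `on_ras`, `on_rar` and the fresh `on_all`, both ends hold the same allocation.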
+
+ When the sender is in the "Wait-for-RAR" state it is not permitted
+ to send new regular messages. (Note that steps 4 and 5 will
+ ensure this in the normal course of events.) With the return of
+ the RAR, the pipeline contains no messages and no allocates, and the
+ outstanding allocation variables at both ends are forced into
+ agreement by setting them both to zero. The receiver will then
+ reconsider bit and message allocation, and send an ALL command for
+ any allocation it cares to do.
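On the wire, the receiver's RAR and its fresh ALL can travel together in one control message, with the RAR first. The ALL layout used here (8-bit opcode, 8-bit link, 16-bit message space, 32-bit bit space) and its opcode value 4 come from the Host-Host protocol document cited in Section A.1 rather than from this RFC, and the octal reading of the RAR code is an assumption; treat both as such. Bundling the two commands is a choice of this sketch, not a requirement of the text.

```python
import struct

ALL_OP = 4     # ALL opcode per the Host-Host protocol (NIC #8246); assumed here
RAR_OP = 0o16  # RAR as printed in RFC 636, assumed to be octal

def rar_then_all(link: int, msgs: int, bits: int) -> bytes:
    """Receiver's reply to an RAS: RAR, then a fresh ALL, in one control message."""
    if not (0 <= msgs < 2**16 and 0 <= bits < 2**32):
        raise ValueError("message space is 16 bits, bit space is 32 bits")
    rar = struct.pack(">BB", RAR_OP, link)                   # 2 bytes
    all_cmd = struct.pack(">BBHI", ALL_OP, link, msgs, bits)  # 8 bytes, big-endian
    return rar + all_cmd
```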
+
+ A.4 The Problem of Half-closed Connections
+
+ The above procedures provide a way to resynchronize a connection
+ after a brief lapse by a communications component, which results
+ in lost messages or allocates for an open connection.
+
+ A longer and more severe interruption of communication may result
+ from a partitioning of the subnet or from a service interruption
+ on one of the communicating hosts. It is undesirable to tie up
+ resources indefinitely under such circumstances, so the user is
+ provided with the option of freeing up these resources (including
+ himself) by unilaterally dissolving the connection. Here
+ "unilaterally" means sending the CLS command and closing the
+ connection without receiving the CLS acknowledgement. Note that
+ this is legal only if the subnet indicates that the destination is
+ dead.
+
+ When service is restored after such an interruption, the status
+ information at the two ends of the connection is out of
+ synchronization. One end believes that the connection is open,
+ and may proceed to use the connection. The disconnecting end
+ believes that the connection is closed (does not exist), and may
+ proceed to re-initialize communication by opening a new connection
+ (RTS or STR command) using the same socket pair or same link.
+
+ The resynchronization needed here is to properly close the open
+ end of the connection when the inconsistency is detected. We will
+ accomplish this by specifying consistency checks and adding a new
+ pair of commands.
+
+ A.5 The NXS and NXR Commands
+
+ The "missing CLS" situation described above can manifest itself in
+ two ways. The first way involves action taken by the NCP at the
+ "open" end of the connection. It may continue to send regular
+ messages on the link of the half-closed connection, or control
+ messages referencing its link. The closed end should respond with
+ an NXS if the message referred to a non-existent transmit link
+ (e.g. was an ALL) or NXR if the message referred to a non-existent
+ receive link (e.g. a data message). On receipt of such an NXS or
+ NXR message, the NCP at the "open" end should close the connection
+ by modifying its tables (without sending any CLS command) thereby
+ bringing both ends into agreement.
+
+                  8 bits   8 bits
+                -------------------
+                !        !        !
+             21 !  NXR   !  link  !
+                !        !        !
+                -------------------
+                Non-existent Receive Link
+
+                  8 bits   8 bits
+                -------------------
+                !        !        !
+             22 !  NXS   !  link  !
+                !        !        !
+                -------------------
+                Non-existent Send Link
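Both halves of the NXR/NXS rule fit in a few lines. The sketch below keeps one link table per direction for a single foreign Host; `Ncp`, `reply_to_unknown` and `on_nx` are hypothetical names, and only regular messages and ALLs are checked, matching the two examples in the text.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass
class Ncp:
    send_links: Set[int] = field(default_factory=set)  # links this Host transmits on
    recv_links: Set[int] = field(default_factory=set)  # links this Host receives on

    def reply_to_unknown(self, kind: str, link: int) -> Optional[str]:
        """Closed-end check. kind is "DATA" (a regular message on `link`)
        or "ALL" (an allocate for `link`). Returns the reply, if any."""
        if kind == "DATA" and link not in self.recv_links:
            return f"NXR {link}"  # non-existent receive link
        if kind == "ALL" and link not in self.send_links:
            return f"NXS {link}"  # non-existent send link
        return None

    def on_nx(self, cmd: str, link: int) -> None:
        """Open-end handling: update tables only, send no CLS."""
        if cmd == "NXR":
            self.send_links.discard(link)  # we were sending on a link the peer lacks
        elif cmd == "NXS":
            self.recv_links.discard(link)  # we were allocating for a link the peer lacks
```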
+
+ A.6 Consistency Checks
+
+ A second way this inconsistency can show up involves actions
+ initiated by the NCP at the "closed" end. It may (thinking the
+ connection is closed) send an STR or RTS to reopen the connection.
+ The NCP at the "open" end should detect the inconsistency when it
+ receives such an RTS or STR command, because it specifies the same
+ socket pair as an existing open connection, or, in the case of an
+ RTS, the same link. In this case, the NCP at the "open" end
+ should close the connection (without sending any CLS command) to
+ bring the two ends into agreement before responding to the
+ RTS/STR.
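A sketch of the consistency check at the "open" end, run before normal RTS/STR processing. It assumes one foreign Host and socket pairs already normalized to (local, foreign) order; the names are hypothetical. The same-link test applies only to an RTS, and only to connections this Host sends on, because an RTS carries the link chosen by the receiving side.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

SocketPair = Tuple[int, int]  # (local socket, foreign socket), already normalized

@dataclass
class Conn:
    link: int
    we_send: bool  # True if this Host transmits on `link`

@dataclass
class OpenEnd:
    conns: Dict[SocketPair, Conn] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def _close_silently(self, pair: SocketPair) -> None:
        # Modify tables only: no CLS is sent, as the text requires.
        conn = self.conns.pop(pair)
        self.log.append(f"closed {pair} link {conn.link} without CLS")

    def on_request(self, cmd: str, pair: SocketPair, link: Optional[int] = None) -> None:
        """Consistency check for an incoming RTS or STR, then normal processing."""
        if pair in self.conns:
            self._close_silently(pair)            # same socket pair
        if cmd == "RTS" and link is not None:
            for p, c in list(self.conns.items()):
                if c.we_send and c.link == link:  # same link
                    self._close_silently(p)
        self.log.append(f"process {cmd} for {pair}")  # stands in for usual handling
```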
+
+ A.7 Conclusion
+
+ The scheme presented in Section A.2 to resynchronize allocation
+ has one very important property: the data stream is preserved
+ through the exchange. Since no data is lost, it is safe to
+ initiate resynchronization from either end at any time. When in
+ doubt, resynchronize.
+
+ The consistency checks for RTS and STR, and the NXR and NXS
+ commands provide the synchronization needed to complete the
+ closing of "half-closed" connections.
+
+ The protocol changes above