1 files changed, 446 insertions, 0 deletions
diff --git a/doc/rfc/rfc636.txt b/doc/rfc/rfc636.txt
new file mode 100644
index 0000000..cfe7c87
--- /dev/null
+++ b/doc/rfc/rfc636.txt
@@ -0,0 +1,446 @@
+
+NWG/RFC# 636                 JDB BPC RST DCW3 MLK 23-OCT-75 22:27  30490
+TIP/TENEX Reliability Improvements
+
+
+
+RFC 636                                    J. Burchfiel  - BBN-TENEX
+                                           B. Cosell     - BBN-NET
+NIC 30490                                  R. Tomlinson  - BBN-TENEX
+                                           D. Walden     - BBN-NET
+                                                            10 June 1974
+                                                                       
+                   TIP/TENEX Reliability Improvements
+
+
+
+                                                                       
+
+During the past months we have felt strong pressure to improve the
+reliability of TIP/TENEX network connections, as improvement in the
+reliability of users' connections between TENEXs and TIPs would have a
+major impact on the appearance of overall network reliability due to the
+large number and high visibility of TENEXs and TIPs.  Despite the
+emphasis on TIP/TENEX interaction, all work done applies equally well to
+interactions between Hosts of any type.                                
+
+The remainder of this RFC gives a sketch of our plan for improving the
+reliability of connections between TIPs and TENEXs.  Major portions of
+this plan have already been implemented (TIP version 322; TENEX version
+1.32) and are now undergoing final test prior to release throughout the
+network.  Completion of the implementation of the plan is expected in
+the next quarter.                                                      
+
+Our plan for improving the reliability of TIP/TENEX connections is
+concerned with obtaining and maintaining TIP/TENEX connections,
+gracefully recovering from lost connections, and providing clear
+messages to the user whenever the state of his connection changes.     
+
+When a TIP user attempts to open a connection to any Host, the Host may
+be down.  In this case it would be helpful to provide the user with
+information about the extent of the Host's unavailability. To facilitate
+this, we modified the IMP program to accept and utilize information from
+a Host about when the Host will be back up and for what reason it is
+down.  TENEX is to be modified to supply such information before it goes
+down, or through manual means, after it has gone down.  When the TIP
+user then attempts to connect to the down TENEX, the IMP local to the
+TENEX returns the information about why and for how long TENEX will be
+down.  The TIP is to be modified to report this sort of information to
+the user; e.g., "Host unavailable because of hardware maintenance --
+expected available Tuesday at 16:30 GMT".                              
+
+The TIP's logger is presently not reentrant.  Thus, no single TIP user
+can be allowed to tie up the logger for too long at a time; and the TIP
+
+NWG/RFC# 636                 JDB BPC RST DCW3 MLK 23-OCT-75 22:27  30490
+TIP/TENEX Reliability Improvements
+
+
+
+therefore enforces a timeout of arbitrary length (about 60 seconds) on
+logger use.  However, a heavily loaded Host cannot be guaranteed always
+to respond within 60 seconds to a TIP login request, and at present TIP
+users sometimes cannot get connected to a heavily loaded TENEX.  To
+correct this problem, the TIP logger will be made reentrant and the
+timeout on logger use will be eliminated.                              
+
+One notorious soft spot in the Host/Host protocol which degrades the
+reliability of connections is the Host/Host protocol incremental
+allocate mechanism.  Low frequency software bugs, intermittent hardware
+bugs, etc., can lead to the incremental allocates associated with a
+connection getting out of synchronization.  When this happens it usually
+appears to the user as if the connection just "hung up".  A slight
+addition to the Host/Host protocol to allow connection allocates to be
+resynchronized has been designed and implemented for both the TIP and
+TENEX.                                                                 
+
+TENEX has a number of internal consistency checks (called "bughalts")
+which occasionally cause TENEX to halt.  Frequently, after diagnosis by
+system personnel, TENEX can be made to proceed without loss from the
+viewpoint of local users.  A mechanism is being provided which allows
+TENEX to proceed in this case from the point of view of TIP users of
+TENEX.                                                                 
+
+The appropriate mechanism entails the following:  TENEX will not drop
+its ready line during a bughalt (from which TENEX can usually proceed
+successfully), nor will it clear its NCP tables and abort all
+connections.  Instead, after a bughalt TENEX will:  discard the message
+it is currently receiving, as the IMP has returned an Incomplete
+Transmission to the source for this message; reinitialize the interface
+to the IMP; and resynchronize, on all connections possible, Host/Host
+protocol allocate inconsistencies due to lost messages, RFNMs etc.  The
+latter is done with the same mechanism described above.  This procedure
+is not guaranteed to save all data -- a tiny bit may be lost -- but this
+is of secondary importance to maintaining the connection over the TENEX
+bughalt.                                                              
+
+The TIP user must be kept fully informed as TENEX halts and then
+continues.  Therefore, the TIP has been modified to report "Host not
+responding -- connection suspended" when it senses that TENEX has halted
+(it does this by properly interpreting messages returned by the
+destination IMP).  When TENEX resumes service after proceeding from a
+bughalt, the above procedure notifies the TIP that service is restored,
+and the TIP has been modified to report "Service resumed" to all users
+of that Host.                                                         
+
+On the other hand, the service interruption may not be proceedable and
+
+
+
+
+
+                                   1
+
+NWG/RFC# 636                 JDB BPC RST DCW3 MLK 23-OCT-75 22:27  30490
+TIP/TENEX Reliability Improvements
+
+
+
+TENEX may have to do a total system reload and restart.  In this case
+TENEX will clear its NCP connection tables and send a Host/Host protocol
+reset command to all other Hosts.  On receiving this reset command, the
+TIP will report "Host reset -- connection closed" to all users of that
+Host with suspended connections.  The TIP user can then re-login to the
+TENEX or to some other Host.                                          
+
+Of course, the user may not have the patience to wait for service to
+resume after a TENEX bughalt.  Instead, he may unilaterally choose to
+connect to some other Host, ignoring the previously suspended
+connection.  If TENEX is then able to proceed, its NCP will still think
+its connection to the TIP is good and suitable for use.  Thus, we have a
+connection which the TIP thinks is closed and TENEX thinks is open, a
+phenomenon known as the "half-closed connection".  An automatic
+procedure for cleanly completing the closing of such a connection has
+been specified and implemented for the TIP and TENEX.                 
+
+Since TENEX will maintain connections across service interruptions, the
+TIP user will be required to follow the security procedure of telling the
+TIP to "forget" his suspended connection before abandoning his terminal.
+The command @H 0 (for example) will guarantee that his connection will
+not be reestablished on resumption of service.  Otherwise, his job
+would be left at the mercy of anyone who acquires that terminal.      
+
+An appendix follows which describes the Host/Host protocol changes made.
+These changes are backward compatible (with the exception that Hosts
+which have not implemented these changes will sometimes receive
+unrecognizable Host/Host protocol commands which they presumably discard
+without suffering harm).  These protocol changes are ad hoc in nature
+but in light of their backward compatibility and potential utility, ARPA
+okayed their addition to the TIP and TENEX NCPs without (we believe) any
+implication that other Hosts have to implement them (although we would
+encourage their widespread implementation).                           
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+                                   2
+
+NWG/RFC# 636                 JDB BPC RST DCW3 MLK 23-OCT-75 22:27  30490
+TIP/TENEX Reliability Improvements
+
+
+
+             Appendix - Ad Hoc Change to Host-Host Protocol           
+
+   A.1  Introduction                                                 
+
+      The current Host-Host protocol (NIC #8246) contains no provisions
+      for resynchronizing the status information kept at the two ends of
+      each connection.  In particular, if either host suffers a service
+      interruption, or if a control message is lost or corrupted in an
+      interface or in the subnet, the status information at the two ends
+      of the connection will be inconsistent.                       
+
+      Since the current protocol provides no way to correct this
+      condition, the NCPs at the two ends stay "confused" forever.  An
+      occasional frustrating symptom of this effect is the "lost
+      allocate" phenomenon, where the receiving NCP believes that it has
+      bit and message allocations outstanding, while the sending NCP
+      believes that it does not have any allocation.  As a result,
+      information flow over that connection can never be restarted. 
+
+      Use of the Host-Host RST (reset) command is inappropriate here, as
+      it destroys all connections between the two hosts.  What is needed
+      is a way to resynchronize only the affected connection without
+      disturbing any others.                                        
+
+      A second troublesome symptom of inconsistency in status
+      information is the "half-closed" connection:  after a service
+      interruption or network partitioning, one NCP may believe that a
+      connection is still open, while the other believes that the
+      connection is closed (does not exist).  When such an inconsistency
+      is discovered, the "open" end of the connection should be closed.
+                                                                    
+   A.2  The RAR, RAS and RAP commands                               
+
+      To achieve resynchronization of allocation, we add the following
+      three commands to the host-host protocol.                     
+
+              8 bits   8 bits
+            -------------------
+            !        !        !
+         16 !  RAR   !  link  !
+            !        !        !
+            -------------------
+         Reset Allocation by Receiver
+
+              8 bits   8 bits
+            -------------------
+            !        !        !
+
+
+
+
+
+                                   3
+
+NWG/RFC# 636                 JDB BPC RST DCW3 MLK 23-OCT-75 22:27  30490
+TIP/TENEX Reliability Improvements
+
+
+
+         17 !  RAS   !  link  !
+            !        !        !
+            -------------------
+         Reset Allocation by Sender
+
+              8 bits   8 bits
+            -------------------
+            !        !        !
+         20 !  RAP   !  link  !
+            !        !        !
+            -------------------
+         Reset Allocation Please
+
+      The RAS command is sent from the Host sending on "link" to the
+      Host receiving on "link".  This command may be sent whenever the
+      sending Host desires to resynch the status information associated
+      with the connection (and doesn't have a message in transit through
+      the network).  Some circumstances in which the sending Host may
+      choose to do this are:                                        
+
+         1)  After a timeout when there is traffic to move but no
+         allocation (assumes that an allocation has been lost);
+
+         2)  When an inconsistent event occurs associated with that
+         connection (e.g. an outstanding allocation in excess of 2^32
+         bits or 2^16 messages);
+
+         3)  After the sending host has suffered an interruption of
+         network service;
+
+         4)  In response to a RAP (see below).
+
+      The RAR command is sent from the Host receiving on "link" to the
+      Host sending on "link" in response to an RAS.  It marks the
+      completion of the connection resynchronization.  When the RAR is
+      returned the connection is in the known state of having no
+      messages in transit in either direction and the allocations are
+      zero.  The receiving Host may then start afresh with a new
+      allocation and normal message transmission can proceed.  Since the
+      RAR may be sent ONLY in response to an RAS, there are no races in
+      the resynchronization.  All of the initiative lies with the
+      sending Host.                                                 
+
+      If the receiving Host detects an anomalous situation, however,
+      there is no way to inform the sending Host that a
+      resynchronization is desirable.  For this purpose, the RAP command
+      is provided.  It constitutes a "suggestion" on the part of the
+
+
+
+
+
+                                   4
+
+NWG/RFC# 636                 JDB BPC RST DCW3 MLK 23-OCT-75 22:27  30490
+TIP/TENEX Reliability Improvements
+
+
+
+      receiving Host that the sending Host resynchronize; the sending
+      Host is free to honor it or not as it sees fit.  Since there is no
+      obligatory response to a RAP, the receiving Host may send them as
+      frequently as it chooses and no harm can occur.  For example, if a
+      message in excess of the allocate arrives, the receiving Host
+      might send RAPs every few seconds until the sending Host replies,
+      with no fear of races if one or more RAPs pass an RAS in the
+      network.                                                      
+
+   A.3  Resynchronization Procedure                                 
+
+      The resynchronization sequence below may be initiated only by the
+      sender either for internally generated reasons or upon the receipt
+      of a RAP.                                                     
+
+         a)  Sender - decision to resynch
+
+            1)  Set state to "Wait-for-RAR" (Defer transmission of
+            message.)
+            2)  Wait until no RFNM outstanding
+            3)  Send RAS
+            4)  Zero allocation
+            5)  Ignore allocates until RAR received
+            6)  Set state to "Open" (Resume normal message transmission
+            subject to flow control.)
+
+         b)  Receiver - receipt of RAS
+
+            1)  Send RAR
+            2)  Zero allocation
+            3)  Send a new allocation
+
+      When the sender is in the "Wait-for-RAR" state it is not permitted
+      to send new regular messages.  (Note that steps 4 and 5 will
+      ensure this in the normal course of events.)  With the return of
+      the RAR the pipeline contains no messages and no allocates, and the
+      outstanding allocation variables at both ends are forced into
+      agreement by setting them both to zero.  The receiver will then
+      reconsider bit and message allocation, and send an ALL command for
+      any allocation it cares to do.                                
+
+   A.4  The Problem of Half-closed Connections                      
+
+      The above procedures provide a way to resynchronize a connection
+      after a brief lapse by a communications component, which results
+      in lost messages or allocates for an open connection.         
+
+
+
+
+
+
+                                   5
+
+NWG/RFC# 636                 JDB BPC RST DCW3 MLK 23-OCT-75 22:27  30490
+TIP/TENEX Reliability Improvements
+
+
+
+      A longer and more severe interruption of communication may result
+      from a partitioning of the subnet or from a service interruption
+      on one of the communicating hosts.  It is undesirable to tie up
+      resources indefinitely under such circumstances, so the user is
+      provided with the option of freeing up these resources (including
+      himself) by unilaterally dissolving the connection.  Here
+      "unilaterally" means sending the CLS command and closing the
+      connection without receiving the CLS acknowledgement.  Note that
+      this is legal only if the subnet indicates that the destination is
+      dead.                                                         
+
+      When service is restored after such an interruption, the status
+      information at the two ends of the connection is out of
+      synchronization.  One end believes that the connection is open,
+      and may proceed to use the connection.  The disconnecting end
+      believes that the connection is closed (does not exist), and may
+      proceed to re-initialize communication by opening a new connection
+      (RTS or STR command) using the same socket pair or same link. 
+
+      The resynchronization needed here is to properly close the open
+      end of the connection when the inconsistency is detected.  We will
+      accomplish this by specifying consistency checks and adding a new
+      pair of commands.                                             
+
+   A.5  The NXS and NXR Commands                                    
+
+      The "missing CLS" situation described above can manifest itself in
+      two ways.  The first way involves action taken by the NCP at the
+      "open" end of the connection.  It may continue to send regular
+      messages on the link of the half-closed connection, or control
+      messages referencing its link.  The closed end should respond with
+      an NXS if the message referred to a non-existent transmit link
+      (e.g. was an ALL) or NXR if the message referred to a non-existent
+      receive link (e.g. a data message).  On receipt of such an NXS or
+      NXR message, the NCP at the "open" end should close the connection
+      by modifying its tables (without sending any CLS command) thereby
+      bringing both ends into agreement.                            
+
+              8 bits   8 bits
+            -------------------
+            !        !        !
+         21 !  NXR   !  link  !
+            !        !        !
+            -------------------
+         Non-existent Receive Link
+
+              8 bits   8 bits
+
+
+
+
+
+                                   6
+
+NWG/RFC# 636                 JDB BPC RST DCW3 MLK 23-OCT-75 22:27  30490
+TIP/TENEX Reliability Improvements
+
+
+
+            -------------------
+            !        !        !
+         22 !  NXS   !  link  !
+            !        !        !
+            -------------------
+         Non-existent Send Link
+
+   A.6  Consistency Checks                                          
+
+      A second way this inconsistency can show up involves actions
+      initiated by the NCP at the "closed" end.  It may (thinking the
+      connection is closed) send an STR or RTS to reopen the connection.
+      The NCP at the "open" end should detect the inconsistency when it
+      receives such an RTS or STR command, because it specifies the same
+      socket pair as an existing open connection, or, in the case of an
+      RTS, the same link.  In this case, the NCP at the "open" end
+      should close the connection (without sending any CLS command) to
+      bring the two ends into agreement before responding to the
+      RTS/STR.                                                      
+
+   A.7  Conclusion                                                  
+
+      The scheme presented in Section A.2 to resynchronize allocation
+      has one very important property:  the data stream is preserved
+      through the exchange.  Since no data is lost, it is safe to
+      initiate resynchronization from either end at any time.  When in
+      doubt, resynchronize.                                         
+
+      The consistency checks for RTS and STR, and the NXR and NXS
+      commands provide the synchronization needed to complete the
+      closing of "half-closed" connections.                         
+
+      The protocol changes above