                     A NETWORK/440 PROTOCOL CONCEPT

Network Working Group                                   Douglas B. McKay
Request for Comments #187                                 Donald P. Karp
NIC #7131                           IBM Thomas J. Watson Research Center
Categories: C3,C4,C5,C6,D7                    Yorktown Heights, New York
Update: None
Obsoletes: None

                   This RFC is being circulated as an
                   informational RFC. Its intent is to
                   convey some of the thinking and
                   philosophy that went into IBM's
                   network protocol and overall
                   network design.

INTRODUCTION

Network/440 is an experimental project in computer netting that was
undertaken by the Computer Science Department of IBM Research. The
primary objectives of the project have been to understand netting,
identify design problems, and implement solutions to these problems.

These objectives have been met: a network has been built and is
presently being operated by the project. Implementation discussions
were held with another department at Research in order to define a
realistic user-system interface. The protocol defined for the project's
network is also the basis for the operation of an IBM OS network.

The Network/440 project has also explored the philosophical and
architectural concepts of network systems. The basic premise of our work
is the concept of a logical network machine.(1) The main theme is to
treat all systems involved in the network as parts of a single (large)
multiprocessor system. Although many of the ideas have been based on
hypothetical concepts, an equal number were derived from our network
implementation and operating experience.

This paper describes the philosophy and definition of a network
protocol that is not restricted to any physical configuration.
This is exemplified by the fact that a major portion of the ideas has
been implemented in IBM's two major operational networks, one of which
is a distributed configuration and the other a star configuration.

(1) Intenet - Report 2, February 1, 1970, Computer Science Department,
    IBM Corporation, T. J. Watson Research Center, Yorktown Heights,
    New York.

BASIC ASSUMPTIONS

Setting up an operating protocol required delineating many network
functions. These functions included switching control, buffer control,
message control, and operating control. The operating control function
becomes more complicated because the user is able to program the
network as if it were a single operating system. The protocol had to be
broken down further into detailed functions in order to cope with error
recovery and error handling.

The original idea for handling these functions was to provide two basic
realms of control. Net control is a higher-level function that
recognizes and controls all aspects of net jobs and the execution of job
steps in the network machine. In addition, a communication control
facility (referred to as the "Express Exchange", or 'EE') was
incorporated to provide fast service for all messages that are to be
moved between user systems without intervention by the Net Controller
('NC').

                        ---------------
                        |     NC      |
                        ---------------      ^
                               |            /
           ----> in            |       out /
               --------------------------------
               |       Express Exchange       |
               --------------------------------
           <---- out                      in ^
                                              \
                                               \

The figure above illustrates the two major functions: messages travel
in both directions directly through the Express Exchange, except for
messages that must be acted on by the Net Controller. These messages are
explained in detail later.
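The two paths in the figure can be illustrated with a small Python sketch of the decision an Express Exchange makes for each block. It is a hypothetical model, not the project's implementation: the `Block` fields, the `for_nc` flag (standing in for the routing role this paper gives the Action Code), and the `next_hop` table are all assumptions. A block that has not reached its destination is forwarded without involving the Net Controller; at the destination it goes either to the 'NC' or to the local user system.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical transmission block: only the fields this dispatch needs."""
    dest: str        # destination system identifier
    for_nc: bool     # assumed flag: True if the Net Controller must act on it
    text: str = ""


def express_exchange_step(station: str, block: Block,
                          next_hop: dict[str, str]) -> tuple[str, str]:
    """Decide where one block goes at one station.

    Returns (action, target):
      ("forward", neighbour) - passed straight through the EE, no NC overhead
      ("nc", station)        - handed to the local Net Controller
      ("deliver", station)   - handed to the local user system
    """
    if block.dest != station:
        # Store-and-forward: intermediate stations never involve the NC.
        return ("forward", next_hop[block.dest])
    if block.for_nc:
        return ("nc", station)
    return ("deliver", station)


# Example: stations A - B - C in a line (hypothetical routing table at B).
routes_at_b = {"A": "A", "C": "C"}
print(express_exchange_step("B", Block(dest="C", for_nc=True), routes_at_b))
# ('forward', 'C')
print(express_exchange_step("C", Block(dest="C", for_nc=True), {}))
# ('nc', 'C')
print(express_exchange_step("C", Block(dest="C", for_nc=False), {}))
# ('deliver', 'C')
```

Because intermediate stations consult only their routing table, an 'NC' need not be present at every station, which matches the distributed-net case discussed in this paper.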

These two functions can exist on any system and operate in any physical
configuration, provided the control information reflects the
configuration so that proper operation can be maintained. This paper
makes no reference to a particular physical configuration because the
protocol is flexible and adapts to any configuration. For example, in a
distributed net, the Express Exchange would pass messages directly to
the next station without any 'NC' overhead. The 'NC' would come into
play only at the final destination; by the same reasoning, the 'NC'
would not have to be present at every station.

DEFINITIONS

Before proceeding with the discussion of protocol and control, the basic
message content and concepts must be defined.

A transmission block is a physical entity that consists of a header and
text. A (logical) message consists of one or more transmission blocks.

        ------------------------------------
        |     Header     |       Text      |
        ------------------------------------

The primary purpose of the network is to deliver messages from one user
system to another in an orderly, controlled manner. To provide all the
information necessary to maintain control, the header contains a set of
operational fields. These fields are listed below with the rationale for
each.

Action Code
This code (AC) selects the immediate destination of the transmitted
blocks: the data may be transmitted directly to the user system named in
the DSID (Destination System Identifier) field, sent to the 'NC', or
used by the 'EE'. Any conflict between the information in this field
and any other field in the header causes an error message to be
returned to the originating station. The AC serves a similar function at
the receiving system, indicating to the communications interface (CI)
whether the data block is destined for a user routine or contains
control information for the CI.
[The CI is the function that interfaces directly with the local
operating system.]

Transmission Block Number
Each transmission block within the network contains a sequential number
(TBN) inserted by the transmitting station. As the block flows through
the network, every station inserts its own number into the block,
overwriting the previous station's number. The purpose of this sequence
number is to guarantee that no blocks are lost in the physical
communications process.

Network Job Identifier
This field (NJID) associates a transmission block with the network job
to which it belongs. The identifier is assigned to the network job, and
to each associated transmission block, by the user system or by the
'NC'. To establish a unique name for each job within the network, the
user node identifier (i.e., the name of the user system originating the
net job) is concatenated with a number generated by the originating
user system.

Job Step (Marker)
This field uniquely identifies a job step within a network job. The 'NC'
assigns this name, since it maintains control of all network jobs.

Originating System Identifier
To route a block of data from one user system to another, a unique name
must be associated with each user system. The name is assigned by the
network control group when the user system is accepted as a network
participant. The station originating a block of data places its
assigned identifier in this field of every block originating at its
system.

Message Priority
This field indicates the transmission priority (not to be confused with
processing priority) of each block within the queue for a particular
user system.

Destination System Identifier
This field (DSID) is similar to the Originating System Identifier,
except that the identifier inserted is that of the node for which the
block is destined.
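Two of the header fields above lend themselves to a short sketch. The Python below is a hypothetical illustration, not the project's code: it builds a Network Job Identifier (NJID) by concatenating the originating node's name with a locally generated counter, and it models the Transmission Block Number (TBN) as a sequence number that each sending station overwrites, so the receiving end of each link can detect a gap. The one-counter-per-link assumption, the `-` separator, the four-digit padding, the node name `YKT1`, and the class names are all illustrative choices.

```python
import itertools


class NodeJobNamer:
    """Hypothetical NJID generator for one originating user system."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self._counter = itertools.count(1)

    def next_njid(self) -> str:
        # Node identifier concatenated with a locally generated number:
        # unique network-wide as long as node identifiers are unique.
        return f"{self.node_id}-{next(self._counter):04d}"


class Link:
    """Hypothetical model of TBN handling on one station-to-station link."""

    def __init__(self):
        self.next_tbn = 0       # next number the sending station will stamp
        self.expected_tbn = 0   # next number the receiving station expects

    def stamp(self, block: dict) -> dict:
        # Overwrite whatever TBN the previous station inserted.
        block["tbn"] = self.next_tbn
        self.next_tbn += 1
        return block

    def receive(self, block: dict) -> list[int]:
        """Return the TBNs that are missing before this block, if any."""
        missing = list(range(self.expected_tbn, block["tbn"]))
        self.expected_tbn = block["tbn"] + 1
        return missing


namer = NodeJobNamer("YKT1")            # hypothetical node identifier
print(namer.next_njid())                # YKT1-0001

link = Link()
b0, b1, b2 = (link.stamp({"njid": "YKT1-0001"}) for _ in range(3))
print(link.receive(b0))                 # []
print(link.receive(b2))                 # [1]: block 1 was lost on this link
```

Because each station overwrites the TBN, a gap detected this way identifies the link on which a block was lost; it says nothing about whether a whole message has arrived end to end.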

Logical Message Flags
The message flags mark the first and last blocks of a message;
intermediate blocks are identified by the absence of both flags.
Together with the Logical Message Sequence Number, the flag field
enables the user to determine whether any blocks are missing from a
message, and it also provides an identifier that can be used to recover
missing blocks. When both the first and last indicators are set in a
single block, the whole message is contained within that block.

Logical Message Sequence Number
This field (LMSN) numbers the blocks within a message sequentially. The
first block (marked by the first-block flag) contains the lowest number
assigned within the message (not necessarily 1), and the last block
contains the highest. Unlike the TBN, this number remains intact
throughout the block's journey through the network. It is used, together
with the logical message flags, for error detection and recovery.

Logical Message Identifier
Since all communications lines in the network can be multiplexed (blocks
of one message may be interleaved with blocks of other messages), a
message identifier (LMID) is necessary to reassemble the message at the
user destination. Therefore, each block within a message contains an
identifier unique to that message. In the simple case where the message
is contained in one block, the identifier serves no reassembly function.

When multiple blocks make up a message, the LMID enables the user to
reassemble it. Any number of physical blocks may be associated with a
logical message. It is important that this LMID also be used in the
messages generated by the CI in response to 'NC' commands.
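Taken together, the Logical Message Flags, the Logical Message Sequence Number (LMSN) and the Logical Message Identifier (LMID) are enough for a receiver to reassemble interleaved messages and to say exactly which blocks are missing. The Python below is a hypothetical sketch of that check, not the CI's actual logic; the `Block` tuple and the `reassemble` function are illustrative names, and a duplicate block simply overwrites the earlier copy.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Block(NamedTuple):
    lmid: str    # Logical Message Identifier
    lmsn: int    # Logical Message Sequence Number
    first: bool  # first-block flag
    last: bool   # last-block flag
    text: str


def reassemble(blocks):
    """Rebuild messages from blocks that may arrive interleaved.

    Returns (complete, gaps):
      complete: LMID -> reassembled text, for messages with no missing blocks
      gaps:     LMID -> sorted missing LMSNs, or None if the first or last
                block has not been seen yet (so the range is still unknown)
    """
    by_lmid = defaultdict(dict)
    for b in blocks:
        by_lmid[b.lmid][b.lmsn] = b

    complete: dict[str, str] = {}
    gaps: dict[str, Optional[list[int]]] = {}
    for lmid, parts in by_lmid.items():
        lo = next((n for n, blk in parts.items() if blk.first), None)
        hi = next((n for n, blk in parts.items() if blk.last), None)
        if lo is None or hi is None:
            gaps[lmid] = None
            continue
        missing = [n for n in range(lo, hi + 1) if n not in parts]
        if missing:
            gaps[lmid] = missing
        else:
            complete[lmid] = "".join(parts[n].text for n in range(lo, hi + 1))
    return complete, gaps


# Three messages interleaved on one line; M1 starts at LMSN 7, M3 lost LMSN 4.
stream = [
    Block("M1", 7, True, False, "HEL"),
    Block("M2", 1, True, True, "ONE-BLOCK"),
    Block("M1", 8, False, False, "LO "),
    Block("M3", 3, True, False, "AB"),
    Block("M1", 9, False, True, "WORLD"),
    Block("M3", 5, False, True, "EF"),
]
complete, gaps = reassemble(stream)
print(complete)   # {'M1': 'HELLO WORLD', 'M2': 'ONE-BLOCK'}
print(gaps)       # {'M3': [4]}
```

Reporting the missing LMSNs, rather than only "incomplete", is what lets the flag and sequence fields serve recovery: the receiver can request exactly those blocks of that LMID.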

Length of Text
This field contains a binary number equal to the number of characters in
the text portion of the transmission block. Although there are other
means of obtaining this number, it is included in the header for
redundancy-checking purposes.

Logical Message Structuring
The Net Controller maintains control of every submitted user job by its
NJID. The following hierarchical structure is set up for the message
configuration, so that any message pertaining to any step in a network
job can be tracked and retransmitted if necessary. The structure maps the
logical structure of each network job onto its message configuration.

                             Net Controller
               -------------------------------------------
               |                    |                    |
            NJID(1)              NJID(2)      - - -   NJID(N)
       -------------------        . . .
       |                 |
   Stepname          Stepname
   ---------------------------------------
   |                  |                  |
 LMID(1)            LMID(2)    - - -   LMID(n)
 -------------------------------------
 |                 |                 |
LMSN(1)          LMSN(2)   - - -   LMSN(n)

The Express Exchange is a combination of functions; it is basically a
communications handler and a store-and-forward switch. The 'EE' can keep
track of all messages in the network by TBN (defined earlier). It is
therefore possible to record and reflect the status of the entire
network down to any desired level of detail.

PROTOCOL

The protocol for operating a network system has several levels of
control. The 'EE' must exercise control over the communication link
between any pair of stations, while the 'NC' maintains control at the
net job level. However, the functions that each unit performs are
combined to handle special control cases. These complementary functions
are discussed in detail as they arise in the protocol discussion.

First of all, a series of initialization messages must be sent from one
station to another before any actual message transmission takes place.
These messages are exchanged between each pair of stations, and positive
acknowledgments must be received to complete the initial handshaking.

At any point during the transmission of messages, an error can occur; it
is detected by a negative acknowledgment. The message in error is
retransmitted several times. If the error persists, the line is timed
out and retried later, on the assumption that the line may be
temporarily noisy and needs time to quiesce.

A station receiving an initialization message can respond in several
ways, depending on the status of its user system:

(1) Ready to receive and transmit.
(2) Temporarily unable to receive certain logical messages (actual data
    transmissions) but able to receive special control messages. This
    option allows a user system to process net jobs selectively as
    facilities on its system become available.
(3) Unable to receive any traffic (the user system is logically or
    physically disconnected from the network).
(4) Unable to accept new network job requests but able to handle
    traffic for jobs in progress. The user system may have several jobs
    in progress that are transmitting and receiving messages; this
    acknowledgment allows those jobs to continue normal processing.

The last alternative gives the CI at each user system a mechanism to
selectively demultiplex itself so that it handles only one logical
message. The temporarily deactivated.

Thus, any user system can selectively halt messages throughout the
entire network. The destination system can halt all messages for a
given NJID, or selectively halt individual logical messages within a net
job.
The adjacent system would keep accepting messages until its buffers were
filled to an operational threshold, which must be maintained to keep the
network from coming to a complete standstill; it would then issue
selective halts to the systems sending to it. As a result, the blocks of
one logical message might be stored in segments distributed throughout
the network.

The selective halt mechanism can be reversed by a resume message, which
can apply to the entire set of messages for a net job or to selected
logical messages within a job. Transmission is reinitiated between any
two stations that wish to allow more message blocks to be transmitted.
The destination station must resume a particular logical message to
allow that message to reach its final destination and complete its
transmission through the network. The LMID in the message header
enables the 'EE' and 'NC' to cooperate in controlling and cleaning up
network operation. This cooperation between logical levels not only
reduces duplication of effort but also makes the control realistic and
practical. Complete separation of the communications and control
functions could cause a loss of useful information that cannot be
obtained by other means.

For example, suppose a file transmission consisted of many blocks and a
transmission error occurred from which the network was unable to
recover. The 'EE' would notify the 'NC' of the error on this file
transmission, and the 'NC' would then issue purge messages to the 'EE's
for those particular logical message blocks. This mechanism allows a
general 'clean-up' and management of all file transmissions.

A receiving system may also go down. When this occurs, a number of
network jobs may be involved with that user system.
If the user system remains down for an extended period and the 'EE'
buffer resources fill to the threshold limit, it may be necessary to
purge pending message blocks. The 'EE' notifies the 'NC' that the user
system is down, and the 'NC' issues purge commands to the 'EE' for all
pending messages of the net jobs involved with the down user system.
However, in our present implementation the 'EE' uses disk storage as a
logical extension of core for message buffering, so freeing real core
buffers is simply a matter of moving the messages onto disk for later
retrieval. In some cases a file may be stored in segments at several
locations until the receiving system is able to receive it. Network
buffer resources are treated as a single logical entity that may be
physically distributed.

When the user system comes back on the air, the affected network jobs
are restarted by issuing resume-transmit commands to the 'EE'. If the
user is an interactive user controlling the network, he is notified of
the problem and of the status of his file transmission, and he can then
reissue his command at a later time. A batch network job is restarted at
a point where no unnecessary retransmission occurs.

It has not yet been determined how long files should reside in a
store-and-forward node before being purged from the network. If a
backing storage device is available to network operation, a file can
remain longer, but still not indefinitely.

NC PROTOCOL

The File Transmission Protocol of the 'NC' is primarily concerned with
the control and transfer of user files for storage, temporary use at a
remote system, and execution.

The commands and status messages that pertain to the second-level logic
of the 'NC' are sent and interpreted by the sending and receiving
systems.
All file transfers are initiated by direct user commands to the 'NC'.

The sending system is first interrogated to determine whether the file
is resident at that system. If the file is not catalogued there, the
user must provide the information needed to locate it, namely physical
attributes such as volume and serial number. A negative acknowledgment
to this message results in termination of the net job step, with the
reason for termination returned to the originator.

When the 'NC' receives a positive acknowledgment, it has two options.
It first determines the amount of unused buffer space in the 'EE' and,
based on the size of the file to be transferred, decides whether to have
the data set sent immediately or to wait for an acknowledgment to the
'receive' message.

If the 'NC' decides to move the file regardless of the state of the
receiving system, it issues the 'send' and 'receive' messages to the two
systems simultaneously. A negative response to the 'receive' message is
taken as a definite refusal by the receiving system to accept the data
transmission; this may result from insufficient resources to handle the
job. If the file has already been transmitted from the sending system
and is resident in the network storage facilities, the user is notified
of its exact location so that he may move it from that point at a later
time. If the 'NC' chose the second option, the file would still be
resident at the originating system.

A positive acknowledgment allows the file to continue its normal flow
through the network. Queuing in the 'EE' is always arranged so that
'receive' messages are sent before the actual data files. The possible
dispositions of the file include loading it directly into the job stream
(this assumes the appropriate JCL is included in the text of the file),
cataloguing it at the remote system, or storing it for temporary
immediate use.
All network files are catalogued with a unique name consisting of the
User ID (unique at the user's home node), the home node ID (unique in
the network), and the user's own data name (unique within his own work).
The 'receive' message may also contain special instructions to print or
punch a file.

When the sending and receiving stations have completed the file
transfer, they send status messages back to the 'NC' indicating the
completed action. These status messages enable the 'NC' to keep a
record of user network job steps and their progress through the
network. They also play an important part in ensuring proper
checkpoint/restart for the network.

Files routed specifically for execution require a third status message
from the receiving user system, indicating when and how the job
completed execution. This status message also contains the appropriate
accounting information to allow dynamic updating of network user and
system accounting records. It is not yet clear what should be accounted
for in the network, but this is an area of prime concern to operational
networks.

An error at the second logic level can occur during file transmission,
for example while moving a file from a device into the line buffers or
while reading from the line buffers. When this occurs, the operating
system must pass this information to the 'NC'. The 'NC' then terminates
the task involved in this job step and purges all network buffers
containing blocks of this message transmission.

When the 'NC' receives the file error message, it immediately sends a
'release' message to all the network tasks supporting this job step,
causing the user systems to end all pending tasks associated with this
net job step. In addition, a purge message for that job step is sent to
the 'EE' to purge the message from its buffers.
If more than one 'EE' is involved, the purge message is passed to all
the other 'EE's.

This is another example of the 'EE' and 'NC' combining their functional
capabilities to provide effective management of network traffic. The
mapping of messages into job steps allows the 'NC' to select precisely
the messages it wishes to purge.

The protocol for interactive use of the network is different. Some
standard message types are provided for interactive use to ensure proper
message recognition from one system to another. Terminal traffic is sent
across the network through the normal netting interface. The control
information that a terminal sends to the operating system must be
incorporated into the network protocol by the 'CI'.

The interactive user can request a direct connection to the remote
system through the 'NC'. The 'NC' notifies the remote system of the
user's request and establishes the user's direct link. From then on,
the 'NC' monitors the conversation but is no longer involved with the
messages; subsequent conversational messages are sent back and forth
through the 'EE' with no interaction by the 'NC'. If one of the systems
goes down, breaking the logical link, the 'NC' must notify the other
system to terminate the waiting task. In most cases a user system is
separated from the second user system by other stations, and the 'NC'
is a convenient way of notifying other user systems about the
"disaster."

Once the user's connection is established, three types of messages may
be generated; they are identified by the 'AC' field in the header.
The three basic transmission types covered by the protocol are:

(1) A response-requested message, with or without text included in the
    message.
(2) A text message, which is either a response to the first type or
    simply data to be printed at the user's terminal.
(3) An interrupt message, which indicates that the user wishes to stop a
    task or talk directly to the operating system.

It is important to note that, regardless of the conditions that exist,
enough buffers are always left to receive an interrupt message and to
terminate or flush any existing task and the associated operation it may
be supporting.

CONCLUSION

The protocol concepts discussed in this paper were developed to
facilitate the transfer of data between two or more independent systems.
The protocol is able to handle the various pathological cases that may
arise during network operation. A fundamental design consideration in
developing these concepts was to provide complete recovery from any
recoverable error condition.

Many of the concepts have been used in an operational star network, with
a single 'EE' and 'NC' located in the central system and a 'CI' located
at each participating system. The successful operation of the network
has demonstrated the feasibility of this protocol.

ACKNOWLEDGMENT

The authors wish to acknowledge the design and implementation effort of
the contributing members of the Computer Science Department of the T. J.
Watson Research Center.

        [ This RFC was put into machine readable form for entry ]
        [ into the online RFC archives by Tim Buck 5/97 ]