summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc83.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc83.txt')
-rw-r--r--doc/rfc/rfc83.txt731
1 files changed, 731 insertions, 0 deletions
diff --git a/doc/rfc/rfc83.txt b/doc/rfc/rfc83.txt
new file mode 100644
index 0000000..5e1d9fc
--- /dev/null
+++ b/doc/rfc/rfc83.txt
@@ -0,0 +1,731 @@
+
+
+
+
+
+
+Network Working Group R. Anderson
+Request for Comments: 83 A. Harslem
+NIC: 5621 J. Heafner
+ RAND
+ 18 December 1970
+
+
+ LANGUAGE-MACHINE FOR DATA RECONFIGURATION
+
+Introduction
+
+ In NWG/RFC #80 we mentioned the needs for data reconfiguration along
+ with a complier/executor version of a Form Machine to perform those
+ manipulations.
+
+ This note proposes a different approach to the Form Machine.
+ Specifically, we describe a syntax-driven interpreter that operates
+ on a grammar which is an ordered set of replacement rules. Following
+ the interpreter description are some "real-world" examples of
+ required data reconfigurations that must occur between RAND consoles
+ and the Remote Job System on the UCLA 360/91. Lastly, we suggest
+ that the Protocol Manager mentioned in NWG/RFC #80 can be simplified
+ by using the Form Machine and two system forms (specified a priori in
+ the code).
+
+ Caveat: The Form Machine is not intended to be a general purpose
+ programming language. Note the absence of declaration statements,
+ etc.
+
+THE FORM MACHINE
+
+I. Forms
+
+ A form is an ordered set of rules.
+
+ F = {R1, ...,Rn}
+
+ The first rule (R1) is the rule of highest priority; the last rule
+ (Rn) is the rule of lowest priority.
+
+ The form machine gets as input: 1) a list of addresses and lengths
+ that delimit the input stream(s); 2) a list of addresses and lengths
+ that delimit the output area(s); 3) a pointer to a list of form(s);
+ 4) a pointer to the starting position of the input stream; and 5) a
+ pointer to the starting position of the output area. The Form
+ Machine applies a form to the input string emitting an output string
+ in the output area. The form is applied in the following manner:
+
+
+
+
+Anderson, et. al. [Page 1]
+
+RFC 83 Language Machine For Data 18 December 1970
+
+
+ Step 1: R1 is made the current rule.
+
+ Step 2: The current rule is applied to the input data.
+
+ Step3: a) If the rule fails, the rule of priority one lower is
+ made current.
+
+ b) If the rule succeeds, the rule of highest priority is
+ made current
+
+ c) When the rule of lowest priority fails, the form fails
+ and application of the form to the input data
+ terminates.
+
+ Step 4: Continue at Step 2.
+
+ In addition, during Step 2, if the remainder of the input string is
+ insufficient to satisfy a rule, then that rule fails and partial
+ results are not emitted. If a rule fills the output string,
+ application of the form is terminated.
+
+II. Rules
+
+ A rule is a replacement operation of the form:
+
+ left-hand-side -> right-hand-side
+
+ Both sides of a rule consists of a series of zero or more _terms_
+ (see below) separated by commas.
+
+ The left-hand-side of the rule is applied to the input string at the
+ current position as a pattern-match operation. If it exactly
+ describes the input, 1) the current input position pointer is
+ advanced over the matched input, 2) the right-hand-side emits data at
+ the current position in the output string, and 3) the current output
+ position pointer is advanced over the emitted data.
+
+III. Terms
+
+ A term is a variable that describes the input string to be matched or
+ the output string to be emitted. A term has three formats.
+
+
+
+
+
+
+
+
+
+
+Anderson, et. al. [Page 2]
+
+RFC 83 Language Machine For Data 18 December 1970
+
+
+Term Format 1
++---------------------------------------------------------------------+
+| |
+| name ( data replication . value : length ) |
+| type expression expression expression |
+| |
+|_____________________________________________________________________|
+
+ Any of the fields may be absent.
+
+ The _name_ is a symbolic name of the term in the usual programming
+ language sense. It is a single, lower-case alphabetic that is unique
+ within a rule.
+
+ The _data type_ describes the kind of data that the term represents.
+ It is a member of the set:
+
+ {D, O, X, A, E, B}
+
+ Data types have the following meanings and implied unit lengths:
+
+ Char. Meaning Length
+ ----- -------- -------
+ D decimal number 1 bit
+ O octal number 3 bits
+ X hexadecimal number 4 bits
+ A ASCII character 8 bits
+ E EBCDIC character 8 bits
+ B binary number 1 bit
+
+ The _replication expression_ is a multiplier of the value expression.
+ A replication expression has the formats.
+
+ 1) an arithmetic expression of the members of the set:
+
+ {v(name), L(name) , numerals, programming variables}
+
+ The v(name) is a value operator that generates a numeric value of
+ the named data type and L(name) is a length operator that
+ generates a numeric value of the named string length.
+
+ The programming variable is described under term format three.
+ Arithmetic operators are shown below and have their usual
+ meanings.
+
+ {*, /, +, -}
+
+
+
+
+
+Anderson, et. al. [Page 3]
+
+RFC 83 Language Machine For Data 18 December 1970
+
+
+ or 2) the terminal '#' which means an arbitrary multiple of the value
+ expression.
+
+ The _value expression_ is the unit value of a term expressed in the
+ format indicated by the data type. The value expression is repeated
+ according to the replication expression. A value expression has the
+ format:
+
+ 1) same as part 1) of the replication expression where again
+ v(name) produces a numeric value
+
+ or 2) a single member of the set
+
+ {v(name), quoted literal}
+
+ where v(name) produces a data type (E or A) value). (Note that
+ concatenation is accomplished through multiple terms.)
+
+ The _length expression_ is the length of the field containing the
+ value expression as modified by the replication expression. It has
+ the same formats as a replication expression.
+
+ Thus, the term
+
+ x(E(7.'F'):L(x)) is named x, is of type EBCDIC, has the value
+ 'FFFFFFF' and is of length 7.
+
+ The term
+
+ y(A:8) on the left-hand-side of a rule would be assigned the next
+ 64 bits of input as its value; on the right-hand-side it would
+ only cause the output pointer to be advanced 64 bit positions
+ because is has no value expression (contents) to generate data in
+ the output area.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Anderson, et. al. [Page 4]
+
+RFC 83 Language Machine For Data 18 December 1970
+
+
+Term Format 2
++---------------------------------------------------------------------+
+| |
+| name (label) |
+| |
++---------------------------------------------------------------------+
+
+ The _label_ is a symbolic reference to a previously named term in the
+ rule. It has the same value as the term by that name.
+
+ The identity operation below illustrates the use of the _label_
+ notation.
+
+ a(A:10) -> (a)
+
+ The (a) on the right-hand side causes the term a to be emitted in the
+ output area. It is equivalent to the rule below.
+
+ a(A:10) -> (Av(a):L(a))
+
+
+Term Format 3
++---------------------------------------------------------------------+
+| |
+| name ( programming connective operand ) |
+| variable expression |
+| |
++---------------------------------------------------------------------+
+
+ A _programming variable_ is a user-controlled data item that does not
+ explicitly appear in the input/output streams. Its value can be
+ compared to input data, to constants, and used to generate output
+ data. Programming variables are single, lower case Greek symbols.
+
+ They are used: to generate indices, counters, etc. in the output
+ area; to compare indices, counters, etc. in the input area, and; to
+ bind replacement rules where the data is context sensitive (explained
+ later).
+
+ A _connective_ is a member of the set:
+
+ {<-, =, !=, >=, <=, <, >}
+
+ The left arrow denotes replacement of the left part by the right
+ part; the other connectives are comparators.
+
+
+
+
+
+
+Anderson, et. al. [Page 5]
+
+RFC 83 Language Machine For Data 18 December 1970
+
+
+ The _operand expression_ is an arithmetic expression of members of
+ the set:
+
+ {programming variables, v(name), l(name), numerals}
+
+ For example, if the programming variable [alpha] has the value 0 and
+ the rule
+
+ a(H[alpha]:1) -> (a), ([alpha]<-[alpha]+1), (H[alpha]:1)
+
+ is applied exhaustively to string of hexadecimal digits
+
+ 0 1 2 3 4 5
+
+ the output would be the hexadecimal string
+
+ 0 1 1 2 2 3 3 4 4 5 5 6 .
+
+ Note: the above rule is equivalent to
+
+ a(B[alpha]:4) -> (a), ([alpha]<-[alpha]+1), (B[alpha]:4)
+
+
+IV. Restrictions and Interpretations of Term Functions
+
+ When a rule succeeds output will be generated. In the rule
+
+ a(A:#),(A'/':1)->(Ev(a):74),(E'?':1)
+
+ the input string is searched for an arbitrary number of ASCIIs
+ followed by a terminal '/'. The ASCIIs (a) are converted to EBCDIC
+ in a 74-byte field followed by a terminal '?'. This brings out three
+ issues:
+
+ 1. Arbitrary length terms must be separated by literals since the
+ data is not type-specific.
+
+ 2. The # may only be used on the left-hand-side of a rule.
+
+ 3. A truncation padding scheme is needed.
+
+
+
+
+
+
+
+
+
+
+
+Anderson, et. al. [Page 6]
+
+RFC 83 Language Machine For Data 18 December 1970
+
+
+ The truncation padding scheme is as follows:
+
+ a. Character to Character (types: A, E)
+
+ Output is left-justified with truncation or padding (with
+ blanks) on the right.
+
+ b. Character to Numeric (A, E to D, O, H, B)
+
+ c. Numeric to Character (D, O, H, B to A, E)
+
+ d. Numeric to Numeric (D, O, H, B)
+
+ Output is right-justified with padding or truncation on the
+ left. Padding is zeros if output is numeric.
+
+
+EXAMPLES OF SOME DATA RECONFIGURATIONS
+
+ The following are examples of replacement rule types for specifically
+ needed applications.
+
+ Literal Insertion
+
+ To insert a literal, separate the left-hand-side terms for its
+ insertion on the right.
+
+ a(A:10),b(A:70)->(a),(E'LIT':3),(b)
+
+ The 80 ASCII characters are emitted in the output area with the
+ EBCDIC literal LIT inserted after the first 10 ASCII characters.
+
+ Deletion
+
+ Terms on the left are separated so that the right side may omit
+ unwanted terms.
+
+ (B:7),a(A:10)->(Ev(a):L(a))
+
+ Only the 10 ASCII characters are emitted (as EBCDIC) in the output
+ area, the 7 binary digits are discarded.
+
+ Spacing in the Output Buffer
+
+ Where a pre-formatted output buffer exists (typically a display
+ buffer) spacing can be realized by omitting the replication and
+ value functions from a term on the right.
+
+
+
+
+Anderson, et. al. [Page 7]
+
+RFC 83 Language Machine For Data 18 December 1970
+
+
+ a(A:74)->(E:6),(Ev(a):74)
+
+ The (E:6) causes 48 bit positions to be skipped over in the output
+ area, then the 74 ASCII characters are converted to EBCDIC and
+ emitted at the current output position.
+
+ Arbitrary Lengths
+
+ Some devices/programs generate a variable number of characters per
+ line and it is desirable to produce fixed-length records from
+ them.
+
+ a(A:#) -> (Ev(a):74)
+
+ The ASCII characters are truncated or padded as required and
+ converted to EBCDIC in a 74 character field.
+
+ Transposition
+
+ Fields to be transposed should be isolated as terms on the left.
+
+ a(X:2),b(A:#)->(Ev(b):L(b)),(a)
+
+ String Length Computation
+
+ Some formats require the string length as part of the data stream.
+ This can be accomplished by the length function.
+
+ a(E:10),b(X'FF':2)->(BL(a)+L(b)+8:8),(Av(a):L(a)),(b)
+
+ The length term is emitted first, in a 8 bit field. In this case
+ the length includes the length field as well as the ASCII
+ character field.
+
+ Expansion and Compression of repeated Symbols
+
+ The following rule packs repeated symbols.
+
+ a(E:1), b(E#*v(a):L(b)) -> (BL(b)+1:8),(a)
+
+ Given the input string below, three successive applications of the
+ rule will emit the output string shown.
+
+ Input: XXXXYYZZZZZZZ
+
+ Output: 4X2Y7Z
+
+
+
+
+
+Anderson, et. al. [Page 8]
+
+RFC 83 Language Machine For Data 18 December 1970
+
+
+ APPLICATION OF THE FORM MACHINE TO PROGRAM PROTOCOLS
+
+ The Protocol Manager mentioned in NWG/RFC #80 needs several
+ interesting features that are properties of the above Form Machine.
+
+ In certain instances during a protocol dialog it might be acceptable
+ to get either an accept on connection A or an allocation on connect
+ B, that is, the order is sometimes unimportant. The defined
+ procedure for applying rules allows for order independence.
+
+ A logger might send us a socket number embedded in a regular message
+ -- the socket number is intended to be the first of a contiguous set
+ of sockets that we can use to establish connections with some
+ program. We wish to extract the socket number field from the regular
+ message, perhaps convert it to another format, and add to it to get
+ the additional socket names. As a result of the regular message we
+ wish to emit several INIT system calls that include the socket
+ numbers that we have computed. The value operator and the arithmetic
+ operators of the Form Machine can do this.
+
+ A third property of the Form Machine that is applicable to protocols
+ is inter- and intra-rule binding to resolve context sensitive
+ information. In general we wish rules to be order independent but in
+ certain cases we wish to impose an ordering. Using the logger in
+ NWG/RFC #66 as an example, the close that is sent by the logger can
+ have two different meanings depending upon its context. If the close
+ is sent before the regular message containing the socket number then
+ it means call refused. If the regular message precedes the close
+ then the call is accepted. Since the close has contextual meaning,
+ we must bind it to the regular message to avoid introducing IF and
+ THEN into the Form Machine language.
+
+ Assume for a moment that we can express system calls in Form Machine
+ notation. (The notation below is for _illustration only_ and is not
+ part of the Form Machine language.) We have two ways to bind the
+ regular message to the close. By intra-rule binding we insist that
+ the close be preceded by a regular message.
+
+ Reg. Msg , Close ->
+
+ Now assume for a moment that the remote party must have an echo after
+ each transmission. Since we must emit an echo after receiving the
+ regular message and before the close is sent, then we must use
+ inter-rule binding. This can be accomplished with the programming
+ variable. It is assigned a value when the regular message is
+ received and the value is tested when the close is received.
+
+ Reg. Msg -> Echo , ([lambda]+1)
+
+
+
+Anderson, et. al. [Page 9]
+
+RFC 83 Language Machine For Data 18 December 1970
+
+
+ Close, ([lambda]=1) ->
+
+ To illustrate inter-rule binding via the programming variable the
+ connection protocol in NWG/RFC #66 could be represented by passing
+ the following form to a protocol manager. (The notation below is for
+ _illustration only_ and is not part of the Form Machine language).
+
+ 1. ->INIT(parameters) , ([alpha]<-0)
+
+ Send an INIT(RTS).
+
+ 2. INIT(parameters) -> ALLOCATE(parameters)
+
+ Send an allocate in response to the connection completion (an STR
+ received).
+
+ 3. Reg. Msg (parameters) -> ([alpha]<-1)
+
+ When the messages bearing link numbers is received, set an
+ internal indicator. (The extraction of the link is not
+ illustrated.)
+
+ 4. CLOSE(parameters),([alpha]=1) ->
+ INIT(parameters),INIT(parameters)
+
+ When the close is received following the regular message [2] is
+ checked to see that the regular message was received before
+ establishing the duplex connection. If the close is received with
+ no regular message preceding it (call refused) the form will fail
+ (since no rules is satisfied).
+
+ This protocol can be handled via a single form containing four
+ replacement rules. We have examined similar representations for more
+ complex protocol sequences. Such protocol sequences, stored by name,
+ are an asset to the user; he can request a predefined sequence to be
+ executed automatically.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Anderson, et. al. [Page 10]
+
+RFC 83 Language Machine For Data 18 December 1970
+
+
+Two System Forms to Handle Protocol Statements
+
+ Assume that we have a Protocol Manager that manages protocol
+ sequences between consoles and the Network. The consoles generate
+ and accept EBCDIC character strings and the Network transmits binary
+ digits. The console user has a language similar to system calls in
+ which he can create and store protocol sequences via Protocol
+ Manager, and at the same time he can indicate which commands are
+ expected to be sent and which are to be received. Upon command the
+ Protocol Manager can execute this sequence with the Network,
+ generating commands and validating those received. Assume also that
+ the Protocol Manager displays the dialog for the console user as it
+ progresses.
+
+ In order to translate between console and Network for generating,
+ comparing, and displaying commands, the Protocol Manager can use the
+ Form Machine. Two system forms are needed, see Fig. 1. One is a
+ console-to-Network set of rules containing EBCDIC to binary for all
+ legal commands; the other is a mirror image for Network-to-console.
+
+REQUEST
+
+ Since language design is not our forte, we would like comments from
+ those with more experience than we.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Anderson, et. al. [Page 11]
+
+RFC 83 Language Machine For Data 18 December 1970
+
+
+ System form:
+ C -> N
+ +----------+
+ | one rule |
+ | for each |
+ | legal |
+ | command |
+ +-------|- - - - - |<----+
+ | +----------+ |
+ Binary | | EBCDIC
+ | |
+ +----------+ | | +----------+
+ | |<---+ +------| |
+ | Network | | Consoles |
+ | |----+ +----->| |
+ +----------+ | | +----------+
+ | Binary EBCDIC |
+ | |
+ | |
+ | System form: |
+ | N -> C |
+ | +----------+ |
+ +------>|- - - - - |-----+
+ | one rule |
+ | for each |
+ | legal |
+ | response |
+ +----------+
+
+ Figure 1 -- Application of System Form for Protocol Management
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Anderson, et. al. [Page 12]
+
+RFC 83 Language Machine For Data 18 December 1970
+
+
+Distribution List
+-----------------
+
+ Alfred Cocanower - MERIT
+ Gerry Cole - SDC
+ Les Earnest - Stanford
+ Bill English - SRI
+ James Forgie - Lincoln Laboratory
+ Jennings Computer Center - Case
+ Nico Haberman - Carnegie-Melon
+ Robert Kahn - BB&N
+ Peggy Karp - MITRE
+ Benita Kirstel - UCLA
+ Tom Lawrence - RADC/ISIM
+ James Madden - University of Illinois
+ George Mealy - Harvard
+ Thomas O'Sullivan - Raytheon
+ Larry Roberts - ARPA
+ Ron Stoughton - UCSB
+ Albert Vezza- MIT
+ Barry Wessler - Utah
+
+
+
+ [The original document included non-ASCII characters. The Greek
+ letters Alpha and Lambda have been spelled out and enclosed in
+ square brackets "[ ]". A curly "l" character
+ has been replaced by capital L. Left and right arrows have been
+ replaced by "<-" and "->" respectively. RFC-Editor]
+
+
+ [This RFC was put into machine readable form for entry]
+ [into the online RFC archives by Lorrie Shiota, 10/01]
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Anderson, et. al. [Page 13]
+