diff options
| author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 | 
|---|---|---|
| committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 | 
| commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
| tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc468.txt | |
| parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) | |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc468.txt')
| -rw-r--r-- | doc/rfc/rfc468.txt | 395 | 
1 files changed, 395 insertions, 0 deletions
| diff --git a/doc/rfc/rfc468.txt b/doc/rfc/rfc468.txt new file mode 100644 index 0000000..10ce629 --- /dev/null +++ b/doc/rfc/rfc468.txt @@ -0,0 +1,395 @@ + + + + + + +Network Working Group                                         R.  Braden +Request for Comment: 468                                        UCLA/CCN +NIC: 14742                                                 March 8, 1973 + + +                          FTP DATA COMPRESSION + +I.  INTRODUCTION + +APOLOGIA + +   Major design objectives of the proposed File Transfer Protocol (FTP) +   are reliability and efficiency for transmission of large files. +   Efficiency has two faces: efficiency of the host CPU's, and efficient +   use of the Network bandwidth.  Block mode is intended to minimize CPU +   overhead for bandwidth efficiency, there is a mode called "HASP" in +   RFC 454.  The "HASP" mode of FTP is really transmission with data +   compression, i.e., an encoding scheme to reduce the information +   redundancy in the messages. + +   RFC 454 contains no explicit definition of the "HASP" or compressed +   mode, but instead notes that a future RFC by yours truly will define +   the mode.  Students of FTP may find this scarcely credible, but you +   are now reading the promised RFC.  It turned out to be much farther +   in the future than any of us expected.  Mea Culpa. + +GENERAL CONSIDERATIONS + +   In the early years of the Network, its major uses have been remote +   terminal interactions and the small-to-medium-sized file transmission +   typical of remote job entry.  As facilities such as the Illiac IV and +   the Data Machine become operational on the Network, and the Network +   community begins to include users with heavy data transmission +   requirements, large file transmission will become a major mode of +   Network use.  For example, one user of CCN expects to send 2 x 10**8 +   bits of data _each_ _day_ over the Network. + +   Local byte compression of the type proposed here is particular +   effective for reducing the size of "printer" files such as those +   transmitted under the Network RJE protocol.  Experience with CCN's +   RJS service has shown a typical compression of print files by a +   factor of between two and three.  Since FTP was intended to contain +   the data transfer part of Network RJE protocol as a subset, it is +   appropriate to include a print file compression mechanism in FTP. +   These considerations led the FTP committee to include a compressed +   mode within FTP. + + + + + +Braden                                                          [Page 1] + +RFC 468                   FTP Data Compression                March 1973 + + +   The two main arguments for data compression are economics and +   convenience (usability).  Consider first economics, which is +   essentially a trade-off between CPU time and transmission costs.  Of +   course, as long as Network use is a free commodity, the economics of +   data compression are all bad.  That happy state won't last forever. +   What does data compression cost? + +   Let us consider only simple linear compression schemes, such as the +   one proposed here.  By linear, I mean that the CPU time to examine a +   source record is proportional to number of bytes in the record.  A +   simple linear scheme could detect repeated single characters, for +   example.  One could imagine quadratic schemes, which detected +   repeated substrings; but except for possible special circumstance +   where the source stings have some structure known to the compression +   algorithm, the CPU economics don't favor quadratic compression. + +   Assuming a reasonable figure for large-scale CPU costs in the +   generation of CCN's 360/91, we concluded that an upper bound on CPU +   costs for total compression and decompression would be 5 cents per +   megabit; this is based on very loose coding of a simple linear +   algorithm.  This may be compared with the projected Network +   transmission costs of over 30 cents per megabit (possibly a lot +   over). + +   Thus, the CPU time to conserve bandwidth costs significantly less +   than the bandwidth saved.  Both CPU costs and bandwidth costs are +   trending downward, but it seems exceedingly unlikely that the ratio +   of CPU cost to bandwidth cost for linear compression will reverse in +   the next few years.  On the other hand, this calculation clearly +   discourages one from using quadratic compression. + +WHY HASP + +   CCN's batch remote job entry protocol NETRJS (see RFC #189, July 15, +   1971) was designed to include two data transfer modes, truncated and +   compressed.  The NETRJS truncated mode is essentially identical to +   current FTP block mode record structure (except for minor bit format +   differences).  The compressed mode of NETRJS uses an adaptation of +   the particular compression scheme which is incorporated in the +   "Multileaving protocol" of the binary synchronous rje support in +   IBM's HASP system. + +   Although it isn't really necessary for the purpose of defining a +   compression scheme in FTP, I have included an appendix summarizing +   very briefly the nature of HASP and its rje package.  That appendix +   may be considered cultural enrichment for those in the Network +   Community who have been denied the privilege of being an IBM +   customer.  After all, I know a lot of HASP experts who never heard of + + + +Braden                                                          [Page 2] + +RFC 468                   FTP Data Compression                March 1973 + + +   TENEX! More seriously, because HASP is widely used on IBM machines, +   the HASP compression scheme is also widely used.  In designing +   NETRJS, we chose the HASP scheme of compression because of its +   ubiquity and plausibility. + +   However, certain details of the HASP bit formats are inappropriate or +   sub-optimal for FTP.  Therefore, our proposal for compressed mode of +   FTP is only an adaptation of the HASP compression scheme. + +   It should be clear from Appendix A that the compression scheme of +   HASP, even if used literally, is a very minor and incidental part of +   that system.  Although we ought to properly credit the intellectual +   origin of FTP's compressed mode, it seems a little strange to call +   the compressed mode in FTP the "HASP mode".  I trust this will be +   rectified by the forthcoming FTP meeting. + +II.  PROPOSED FTP COMPRESSED DATA MODE + +   Byte size is B bits.  Figures above boxes are field lengths in bits. + +                                  n bytes of data +                               /--------/\--------\ +              1    B-1        /   B              B \ +             +---+------+    +--------+     +--------+ +Byte String: | 0 |  n   |    |   d    |. . .|   d    | +             |   |      |    |    1   |     |    n   | +             +---+------+    +--------+     +--------+ +                  String of n data bytes d(1),...,d(n) +                  Count n must be positive + +                     2     B-2            B +                   +----+------+    +---------+ +Replicated Byte:   | 1 0|   n  |    |    d    | +                   +----+------+    +---------+ +                 String consisting of n replications of the data byte d + +                     2    B-2 +                  +----+------+ +Filler String:    | 1 1|   n  | +                  +----+------+ +                 String of n filler bytes.  The filler byte is a "space" +                 character for ASCII or EBCDIC type, or a binary zero +                 byte for Image or Local Byte Type. + +                                B            B +                          +----------+ +----------+ +Control Escape Sequence:  | 0......0 | |      C   |    (see below) +                          +----------+ +----------+ + + + +Braden                                                          [Page 3] + +RFC 468                   FTP Data Compression                March 1973 + + +   The control byte "C" which is the second byte of a control escape +   sequence is to have the same coding as the descriptor byte in Block +   Mode.  This includes end-of-file and end-of-record indications.  I +   will not specify this further because there is some question at +   present about the exact coding of the Block Mode descriptor byte. + +   Following the example of APL*, we have let the meaning of the filler +   (blank or 0) be determined by the type: character (ASCII|EBCDIC) vs. +   binary (Image|Local Byte).  If byte size is equal to the word size of +   the transmitting host, the compressed mode allows one to send sparse +   notices with reasonable efficiency. + +   * Compare 1 (take) 0 1\`A' with 1 (take) 0 1\2 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Braden                                                          [Page 4] + +RFC 468                   FTP Data Compression                March 1973 + + +APPENDIX A: HASP MULTILEAVING + +   HASP (Houston Automatic Spooling Program) is a subsystem which +   essentially runs within OS/360 as a job but takes over the batch +   processing management functions from the operating system.  That is, +   HASP handles spooling of card input and printer and punch output, +   queueing and scheduling of job execution, and the operator control +   interface.  It is a tightly-written and efficient system for running +   a large and varied job load through a large-scale machine.  The name +   results from the historical fact that HASP was originally by a local +   IBM group for one particular customer, NASA Houston. + +   HASP has always been an anomaly in the IBM scheme of things.  The +   system was written around 1965 by two programmers; the HASP group has +   probably averaged three programmers during most of its life.  The +   leader of the group has been "Mr. HASP", Tom Simpson.  The HASP +   system spread rapidly through (more or less) underground channels to +   many of the medium and large scale 360's.  At least once, only +   intense customer pressure prevented IBM from killing the HASP effort. +   HASP generated an astonishing emotional mystique among its users. +   The HASP sessions at SHARE Meetings were reminiscent of revival +   meetings.  For years every SHARE Meeting has included HASP song +   sessions around the piano during the nightly open bar.  HASP forms a +   fascinating chapter in the history of IBM's large machine business. + +   The core concepts in HASP are pseudo-devices, and the general +   technique of intercepting supervisor calls to augment operating +   system functions without changing the operating system itself.  A +   generation of OS/360 system programmers learned these techniques from +   HASP.  (These important techniques are hardly ever described in the +   literature, and "practical" programmers don't read the literature +   anyway). + +   When HASP starts up (in supervisor state), it overlays an instruction +   in the I/O Supervisor with a branch to its own code.  A user program +   is written as if it were doing real I/O to card readers and printers. +   HASP intercepts and interprets these I/O operations to handle job I/O +   in a manner transparent to OS/360.  It similarly intercepts and +   interprets operator console I/O. + +   HASP includes batch remote job entry using binary synchronous +   communication.  The HASP communication protocol and message formats +   use a scheme developed by Simpson's group called "Multileaving +   Protocol".  The HASP rje system, by far the best rje package IBM has +   produced, finally replaced two competitive IBM packages and has +   effectively become the IBM standard for rje.  CCN's RJS system not +   only adopted the Multileaving Protocol but essentially copied its +   binary synchronous communication line handler directly form HASP. + + + +Braden                                                          [Page 5] + +RFC 468                   FTP Data Compression                March 1973 + + +   The Multileaving Protocol is described in the HASP manual(1) as the +   "fully synchronized, pseudo-simultaneous, bidirectional transmission +   of a variable number of data streams between two or more computers +   using binary synchronous communications facilities".  It allows a +   remote batch terminal to operate a variable number of card readers +   and printers simultaneously at different speeds over one +   communication line.  It is not surprising that HASP Multileaving +   contains in miniature many of the features of IMP-IMP Protocol and a +   little host-host protocol.  Specifically, Multileaving includes the +   following general features: + +      (1) "Conversational" transmission line protocol using transparency +          (DLE STX, etc.). + +      (2) "Strong" error control and retransmission using a 16-bit CRC +          and a modulo-16 block sequence number. + +      (3) Flow control for multiple streams in both directions.  This +          includes the interchanging of matching control records +          ("RFC's") to open a stream, and a set of flow control bits in +          each block.  Each flow control bit is logically equivalent to +          an ALLOcate command for one "message" (buffer) for a +          particular stream. + +      (4) Optional Special Control Information for remote devices.  This +          includes printer carriage control, switching card reader +          hoppers, etc. + +      (5) Multiplexing ("multileaving") multiple streams into a single +          block for transmission. + +      (6) Marking end of file and ends of records within each stream. + +      (7) Compressing transmitted text by encoding repeated blanks and +          repeated single characters. + + + + + + + + + + + + + + + + +Braden                                                          [Page 6] + +RFC 468                   FTP Data Compression                March 1973 + + +   Finally, we have reached the (only) aspect of HASP relevant to FTP: +   its compression scheme.  HASP uses the following encoding: + +                       8 +                  +---------+ +   End of Record: | 0 ... 0 | +                  +---------+ +                    2     6             8              8 +                  +---+---------+   +-------+     +--------+ +   Data String:   |1 1|     N   |   |    d  | ... |   d    | +                  |   |         |   |     1 |     |    N   | +                  +---+---------+   +-------+     +--------+ +                         3      5 +                       +---+--------+ +   N Duplicate Blanks  |100|     N  | +                       +---+--------+ +                               3       5           8 +                             +---+---------+  +---------+ +   N Replicated Characters D |101|    N    |  |    D    | +                             +---+---------+  +---------+ + +   HASP is concerned only with 8-bit bytes.  However, there is a +   provision (which was never implemented) in the Multileaving Protocol +   to set the unit of the counts N as 1 byte, 2 bytes, or 4 bytes. + +   Reference: + +   (1) HASP II System Manual, IBM Corporation (February 26, 1971) + + + + + + +         [ This RFC was put into machine readable form for entry ] +         [ into the online RFC archives by Via Genie         4/00] + + + + + + + + + + + + + + + +Braden                                                          [Page 7] + |