1 files changed, 1627 insertions, 0 deletions
diff --git a/doc/rfc/rfc3284.txt b/doc/rfc/rfc3284.txt
new file mode 100644
index 0000000..69c906d
--- /dev/null
+++ b/doc/rfc/rfc3284.txt
@@ -0,0 +1,1627 @@
+
+
+
+
+
+
+Network Working Group                                            D. Korn
+Request for Comments: 3284                                     AT&T Labs
+Category: Standards Track                                   J. MacDonald
+                                                             UC Berkeley
+                                                                J. Mogul
+                                                 Hewlett-Packard Company
+                                                                   K. Vo
+                                                               AT&T Labs
+                                                               June 2002
+
+
+      The VCDIFF Generic Differencing and Compression Data Format
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements.  Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol.  Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2002).  All Rights Reserved.
+
+Abstract
+
+   This memo describes VCDIFF, a general, efficient and portable data
+   format suitable for encoding compressed and/or differencing data so
+   that they can be easily transported among computers.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Korn, et. al.               Standards Track                     [Page 1]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+Table of Contents
+
+    1.  Executive Summary ...........................................  2
+    2.  Conventions .................................................  4
+    3.  Delta Instructions ..........................................  5
+    4.  Delta File Organization .....................................  6
+    5.  Delta Instruction Encoding .................................. 12
+    6.  Decoding a Target Window .................................... 20
+    7.  Application-Defined Code Tables ............................. 21
+    8.  Performance ................................................. 22
+    9.  Further Issues .............................................. 24
+   10.  Summary ..................................................... 25
+   11.  Acknowledgements ............................................ 25
+   12.  Security Considerations ..................................... 25
+   13.  Source Code Availability .................................... 25
+   14.  Intellectual Property Rights ................................ 26
+   15.  IANA Considerations ......................................... 26
+   16.  References .................................................. 26
+   17.  Authors' Addresses .......................................... 28
+   18.  Full Copyright Statement .................................... 29
+
+1.  Executive Summary
+
+   Compression and differencing techniques can greatly improve storage
+   and transmission of files and file versions.  Since files are often
+   transported across machines with distinct architectures and
+   performance characteristics, such data should be encoded in a form
+   that is portable and can be decoded with little or no knowledge of
+   the encoders.  This document describes Vcdiff, a compact portable
+   encoding format designed for these purposes.
+
+   Data differencing is the process of computing a compact and
+   invertible encoding of a "target file" given a "source file".  Data
+   compression is similar, but without the use of source data.  The UNIX
+   utilities diff, compress, and gzip are well-known examples of data
+   differencing and compression tools.  For data differencing, the
+   computed encoding is called a "delta file", and for data compression,
+   it is called a "compressed file".  Delta and compressed files are
+   good for storage and transmission as they are often smaller than the
+   originals.
+
+   Data differencing and data compression are traditionally treated as
+   distinct types of data processing.  However, as shown in the Vdelta
+   technique by Korn and Vo [1], compression can be thought of as a
+   special case of differencing in which the source data is empty.  The
+   basic idea is to unify the string parsing scheme used in the Lempel-
+   Ziv'77 (LZ'77) style compressors [2] and the block-move technique of
+   Tichy [3].  Loosely speaking, this works as follows:
+
+
+
+Korn, et. al.               Standards Track                     [Page 2]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+      a. Concatenate source and target data.
+      b. Parse the data from left to right as in LZ'77 but make sure
+         that a parsed segment starts the target data.
+      c. Start to output when reaching target data.
+
+   Parsing is based on string matching algorithms, such as suffix trees
+   [4] or hashing with different time and space performance
+   characteristics.  Vdelta uses a fast string matching algorithm that
+   requires less memory than other techniques [5,6].  However, even with
+   this algorithm, the memory requirement can still be prohibitive for
+   large files.  A common way to deal with memory limitation is to
+   partition an input file into chunks called "windows" and process them
+   separately.  Here, except for unpublished work by Vo, little has been
+   done on designing effective windowing schemes.  Current techniques,
+   including Vdelta, simply use source and target windows with
+   corresponding addresses across source and target files.
+
+   String matching and windowing algorithms have great influence on the
+   compression rate of delta and compressed files.  However, it is
+   desirable to have a portable encoding format that is independent of
+   such algorithms.  This enables the construction of client-server
+   applications in which a server may serve clients with unknown
+   computing characteristics.  Unfortunately, all current differencing
+   and compressing tools, including Vdelta, fall short in this respect.
+   Their storage formats are closely intertwined with the implemented
+   string matching and/or windowing algorithms.
+
+   The encoding format Vcdiff proposed here addresses the above issues.
+   Vcdiff achieves the characteristics below:
+
+      Output compactness:
+         The basic encoding format compactly represents compressed or
+         delta files.  Applications can further extend the basic
+         encoding format with "secondary encoders" to achieve more
+         compression.
+
+      Data portability:
+         The basic encoding format is free from machine byte order and
+         word size issues.  This allows data to be encoded on one
+         machine and decoded on a different machine with different
+         architecture.
+
+      Algorithm genericity:
+         The decoding algorithm is independent from string matching and
+         windowing algorithms.  This allows competition among
+         implementations of the encoder while keeping the same decoder.
+
+
+
+
+
+Korn, et. al.               Standards Track                     [Page 3]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+      Decoding efficiency:
+         Except for secondary encoder issues, the decoding algorithm
+         runs in time proportionate to the size of the target file and
+         uses space proportionate to the maximal window size.  Vcdiff
+         differs from more conventional compressors in that it uses only
+         byte-aligned data, thus avoiding bit-level operations, which
+         improves decoding speed at the slight cost of compression
+         efficiency.
+
+   The combined differencing and compression method is called "delta
+   compression" [14].  As this way of data processing treats compression
+   as a special case of differencing, we shall use the term "delta file"
+   to indicate the compressed output for both cases.
+
+2. Conventions
+
+   The basic data unit is a byte.  For portability, Vcdiff shall limit a
+   byte to its lower eight bits even on machines with larger bytes.  The
+   bits in a byte are ordered from right to left so that the least
+   significant bit (LSB) has value 1, and the most significant bit
+   (MSB), has value 128.
+
+   For purposes of exposition in this document, we adopt the convention
+   that the LSB is numbered 0, and the MSB is numbered 7.  Bit numbers
+   never appear in the encoded format itself.
+
+   Vcdiff encodes unsigned integer values using a portable, variable-
+   sized format (originally introduced in the Sfio library [7]).  This
+   encoding treats an integer as a number in base 128.  Then, each digit
+   in this representation is encoded in the lower seven bits of a byte.
+   Except for the least significant byte, other bytes have their most
+   significant bit turned on to indicate that there are still more
+   digits in the encoding.  The two key properties of this integer
+   encoding that are beneficial to a data compression format are:
+
+      a. The encoding is portable among systems using 8-bit bytes, and
+      b. Small values are encoded compactly.
+
+   For example, consider the value 123456789, which can be represented
+   with four 7-bit digits whose values are 58, 111, 26, 21 in order from
+   most to least significant.  Below is the 8-bit byte encoding of these
+   digits.  Note that the MSBs of 58, 111 and 26 are on.
+
+              +-------------------------------------------+
+              | 10111010 | 11101111 | 10011010 | 00010101 |
+              +-------------------------------------------+
+                MSB+58     MSB+111    MSB+26     0+21
+
+
+
+
+Korn, et. al.               Standards Track                     [Page 4]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   Henceforth, the terms "byte" and "integer" will refer to a byte and
+   an unsigned integer as described.
+
+   Algorithms in the C language are occasionally exhibited to clarify
+   the descriptions.  Such C code is meant for clarification only, and
+   is not part of the actual specification of the Vcdiff format.
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in BCP 14, RFC 2119 [12].
+
+3.  Delta Instructions
+
+   A large target file is partitioned into non-overlapping sections
+   called "target windows".  These target windows are processed
+   separately and sequentially based on their order in the target file.
+
+   A target window T, of length t, may be compared against some source
+   data segment S, of length s.  By construction, this source data
+   segment S comes either from the source file, if one is used, or from
+   a part of the target file earlier than T.  In this way, during
+   decoding, S is completely known when T is being decoded.
+
+   The choices of T, t, S and s are made by some window selection
+   algorithm, which can greatly affect the size of the encoding.
+   However, as seen later, these choices are encoded so that no
+   knowledge of the window selection algorithm is needed during
+   decoding.
+
+   Assume that S[j] represents the jth byte in S, and T[k] represents
+   the kth byte in T.  Then, for the delta instructions, we treat the
+   data windows S and T as substrings of a superstring U, formed by
+   concatenating them like this:
+
+         S[0]S[1]...S[s-1]T[0]T[1]...T[t-1]
+
+   The "address" of a byte in S or T is referred to by its location in
+   U.  For example, the address of T[k] is s+k.
+
+   The instructions to encode and direct the reconstruction of a target
+   window are called delta instructions.  There are three types:
+
+      ADD:  This instruction has two arguments, a size x and a sequence
+            of x bytes to be copied.
+      COPY: This instruction has two arguments, a size x and an address
+            p in the string U.  The arguments specify the substring of U
+            that must be copied.  We shall assert that such a substring
+            must be entirely contained in either S or T.
+
+
+
+Korn, et. al.               Standards Track                     [Page 5]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+      RUN:  This instruction has two arguments, a size x and a byte b,
+            that will be repeated x times.
+
+   Below are example source and target windows and the delta
+   instructions that encode the target window in terms of the source
+   window.
+
+         a b c d e f g h i j k l m n o p
+         a b c d w x y z e f g h e f g h e f g h e f g h z z z z
+
+         COPY  4, 0
+         ADD   4, w x y z
+         COPY  4, 4
+         COPY 12, 24
+         RUN   4, z
+
+   Thus, the first letter 'a' in the target window is at location 16 in
+   the superstring.  Note that the fourth instruction, "COPY 12, 24",
+   copies data from T itself since address 24 is position 8 in T.  This
+   instruction also shows that it is fine to overlap the data to be
+   copied with the data being copied from, as long as the latter starts
+   earlier.  This enables efficient encoding of periodic sequences,
+   i.e., sequences with regularly repeated subsequences.  The RUN
+   instruction is a compact way to encode a sequence repeating the same
+   byte even though such a sequence can be thought of as a periodic
+   sequence with period 1.
+
+   To reconstruct the target window, one simply processes one delta
+   instruction at a time and copies the data, either from the source
+   window or the target window being reconstructed, based on the type of
+   the instruction and the associated address, if any.
+
+4.  Delta File Organization
+
+   A Vcdiff delta file starts with a Header section followed by a
+   sequence of Window sections.  The Header section includes magic bytes
+   to identify the file type, and information concerning data processing
+   beyond the basic encoding format.  The Window sections encode the
+   target windows.
+
+   Below is the overall organization of a delta file.  The indented
+   items refine the ones immediately above them.  An item in square
+   brackets may or may not be present in the file depending on the
+   information encoded in the Indicator byte above it.
+
+
+
+
+
+
+
+Korn, et. al.               Standards Track                     [Page 6]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+      Header
+          Header1                                  - byte
+          Header2                                  - byte
+          Header3                                  - byte
+          Header4                                  - byte
+          Hdr_Indicator                            - byte
+          [Secondary compressor ID]                - byte
+          [Length of code table data]              - integer
+          [Code table data]
+              Size of near cache                   - byte
+              Size of same cache                   - byte
+              Compressed code table data
+      Window1
+          Win_Indicator                            - byte
+          [Source segment size]                    - integer
+          [Source segment position]                - integer
+          The delta encoding of the target window
+              Length of the delta encoding         - integer
+              The delta encoding
+                  Size of the target window        - integer
+                  Delta_Indicator                  - byte
+                  Length of data for ADDs and RUNs - integer
+                  Length of instructions and sizes - integer
+                  Length of addresses for COPYs    - integer
+                  Data section for ADDs and RUNs   - array of bytes
+                  Instructions and sizes section   - array of bytes
+                  Addresses section for COPYs      - array of bytes
+      Window2
+      ...
+
+4.1 The Header Section
+
+   Each delta file starts with a header section organized as below.
+   Note the convention that square-brackets enclose optional items.
+
+         Header1                                  - byte = 0xD6
+         Header2                                  - byte = 0xC3
+         Header3                                  - byte = 0xC4
+         Header4                                  - byte
+         Hdr_Indicator                            - byte
+         [Secondary compressor ID]                - byte
+         [Length of code table data]              - integer
+         [Code table data]
+
+
+
+
+
+
+
+
+Korn, et. al.               Standards Track                     [Page 7]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   The first three Header bytes are the ASCII characters 'V', 'C' and
+   'D' with their most significant bits turned on (in hexadecimal, the
+   values are 0xD6, 0xC3, and 0xC4).  The fourth Header byte is
+   currently set to zero.  In the future, it might be used to indicate
+   the version of Vcdiff.
+
+   The Hdr_Indicator byte shows if there is any initialization data
+   required to aid in the reconstruction of data in the Window sections.
+   This byte MAY have non-zero values for either, both, or neither of
+   the two bits VCD_DECOMPRESS and VCD_CODETABLE below:
+
+       7 6 5 4 3 2 1 0
+      +-+-+-+-+-+-+-+-+
+      | | | | | | | | |
+      +-+-+-+-+-+-+-+-+
+                   ^ ^
+                   | |
+                   | +-- VCD_DECOMPRESS
+                   +---- VCD_CODETABLE
+
+   If bit 0 (VCD_DECOMPRESS) is non-zero, this indicates that a
+   secondary compressor may have been used to further compress certain
+   parts of the delta encoding data as described in Sections 4.3 and 6.
+   In that case, the ID of the secondary compressor is given next.  If
+   this bit is zero, the compressor ID byte is not included.
+
+   If bit 1 (VCD_CODETABLE) is non-zero, this indicates that an
+   application-defined code table is to be used for decoding the delta
+   instructions.  This table itself is compressed.  The length of the
+   data comprising this compressed code table and the data follow next.
+   Section 7 discusses application-defined code tables.  If this bit is
+   zero, the code table data length and the code table data are not
+   included.
+
+   If both bits are set, then the compressor ID byte is included before
+   the code table data length and the code table data.
+
+4.2 The Format of a Window Section
+
+   Each Window section is organized as follows:
+
+      Win_Indicator                            - byte
+      [Source segment length]                  - integer
+      [Source segment position]                - integer
+      The delta encoding of the target window
+
+
+
+
+
+
+Korn, et. al.               Standards Track                     [Page 8]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   Below are the details of the various items:
+
+      Win_Indicator:
+          This byte is a set of bits, as shown:
+
+          7 6 5 4 3 2 1 0
+         +-+-+-+-+-+-+-+-+
+         | | | | | | | | |
+         +-+-+-+-+-+-+-+-+
+                      ^ ^
+                      | |
+                      | +-- VCD_SOURCE
+                      +---- VCD_TARGET
+
+         If bit 0 (VCD_SOURCE) is non-zero, this indicates that a
+         segment of data from the "source" file was used as the
+         corresponding source window of data to encode the target
+         window.  The decoder will use this same source data segment to
+         decode the target window.
+
+         If bit 1 (VCD_TARGET) is non-zero, this indicates that a
+         segment of data from the "target" file was used as the
+         corresponding source window of data to encode the target
+         window.  As above, this same source data segment is used to
+         decode the target window.
+
+         The Win_Indicator byte MUST NOT have more than one of the bits
+         set (non-zero).  It MAY have none of these bits set.
+
+         If one of these bits is set, the byte is followed by two
+         integers to indicate respectively, the length and position of
+         the source data segment in the relevant file.  If the indicator
+         byte is zero, the target window was compressed by itself
+         without comparing against another data segment, and these two
+         integers are not included.
+
+      The delta encoding of the target window:
+
+         This contains the delta encoding of the target window, either
+         in terms of the source data segment (i.e., VCD_SOURCE or
+         VCD_TARGET was set) or by itself if no source window is
+         specified.  This data format is discussed next.
+
+
+
+
+
+
+
+
+
+Korn, et. al.               Standards Track                     [Page 9]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+4.3 The Delta Encoding of a Target Window
+
+   The delta encoding of a target window is organized as follows:
+
+      Length of the delta encoding            - integer
+      The delta encoding
+          Length of the target window         - integer
+          Delta_Indicator                     - byte
+          Length of data for ADDs and RUNs    - integer
+          Length of instructions section      - integer
+          Length of addresses for COPYs       - integer
+          Data section for ADDs and RUNs      - array of bytes
+          Instructions and sizes section      - array of bytes
+          Addresses section for COPYs         - array of bytes
+
+         Length of the delta encoding:
+            This integer gives the total number of remaining bytes that
+            comprise the data of the delta encoding for this target
+            window.
+
+         The delta encoding:
+            This contains the data representing the delta encoding which
+            is described next.
+
+         Length of the target window:
+            This integer indicates the actual size of the target window
+            after decompression.  A decoder can use this value to
+            allocate memory to store the uncompressed data.
+
+         Delta_Indicator:
+            This byte is a set of bits, as shown:
+
+          7 6 5 4 3 2 1 0
+         +-+-+-+-+-+-+-+-+
+         | | | | | | | | |
+         +-+-+-+-+-+-+-+-+
+                    ^ ^ ^
+                    | | |
+                    | | +-- VCD_DATACOMP
+                    | +---- VCD_INSTCOMP
+                    +------ VCD_ADDRCOMP
+
+              VCD_DATACOMP:   bit value 1.
+              VCD_INSTCOMP:   bit value 2.
+              VCD_ADDRCOMP:   bit value 4.
+
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 10]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+         As discussed, the delta encoding consists of COPY, ADD and RUN
+         instructions.  The ADD and RUN instructions have accompanying
+         unmatched data (that is, data that does not specifically match
+         any data in the source window or in some earlier part of the
+         target window) and the COPY instructions have addresses of
+         where the matches occur.  OPTIONALLY, these types of data MAY
+         be further compressed using a secondary compressor.  Thus,
+         Vcdiff separates the encoding of the delta instructions into
+         three parts:
+
+            a. The unmatched data in the ADD and RUN instructions,
+            b. The delta instructions and accompanying sizes, and
+            c. The addresses of the COPY instructions.
+
+         If the bit VCD_DECOMPRESS (Section 4.1) was on, each of these
+         sections may have been compressed using the specified secondary
+         compressor.  The bit positions 0 (VCD_DATACOMP), 1
+         (VCD_INSTCOMP), and 2 (VCD_ADDRCOMP) respectively indicate, if
+         non-zero, that the corresponding parts are compressed.  Then,
+         these parts MUST be decompressed before decoding the delta
+         instructions.
+
+      Length of data for ADDs and RUNs:
+         This is the length (in bytes) of the section of data storing
+         the unmatched data accompanying the ADD and RUN instructions.
+
+      Length of instructions section:
+         This is the length (in bytes) of the delta instructions and
+         accompanying sizes.
+
+      Length of addresses for COPYs:
+         This is the length (in bytes) of the section storing the
+         addresses of the COPY instructions.
+
+      Data section for ADDs and RUNs:
+         This sequence of bytes encodes the unmatched data for the ADD
+         and RUN instructions.
+
+      Instructions and sizes section:
+         This sequence of bytes encodes the instructions and their
+         sizes.
+
+      Addresses section for COPYs:
+         This sequence of bytes encodes the addresses of the COPY
+         instructions.
+
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 11]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+5. Delta Instruction Encoding
+
+   The delta instructions described in Section 3 represent the results
+   of string matching.  For many data differencing applications in which
+   the changes between source and target data are small, any
+   straightforward representation of these instructions would be
+   adequate.  However, for applications including differencing of binary
+   files or data compression, it is important to encode these
+   instructions well to achieve good compression rates.  The keys to
+   this achievement is to efficiently encode the addresses of COPY
+   instructions and the sizes of all delta instructions.
+
+5.1 Address Encoding Modes of COPY Instructions
+
+   Addresses of COPY instructions are locations of matches and often
+   occur close by or even exactly equal to one another.  This is because
+   data in local regions are often replicated with minor changes.  In
+   turn, this means that coding a newly matched address against some
+   recently matched addresses can be beneficial.  To take advantage of
+   this phenomenon and encode addresses of COPY instructions more
+   efficiently, the Vcdiff data format supports the use of two different
+   types of address caches.  Both the encoder and decoder maintain these
+   caches, so that decoder's caches remain synchronized with the
+   encoder's caches.
+
+   a. A "near" cache is an array with "s_near" slots, each containing an
+      address used for encoding addresses nearby to previously encoded
+      addresses (in the positive direction only).  The near cache also
+      maintains a "next_slot" index to the near cache.  New entries to
+      the near cache are always inserted in the next_slot index, which
+      maintains a circular buffer of the s_near most recent addresses.
+
+   b. A "same" cache is an array with "s_same", with a multiple of 256
+      slots, each containing an address.  The same cache maintains a
+      hash table of recent addresses used for repeated encoding of the
+      exact same address.
+
+   By default, the parameters s_near and s_same are respectively set to
+   4 and 3.  An encoder MAY modify these values, but then it MUST encode
+   the new values in the encoding itself, as discussed in Section 7, so
+   that the decoder can properly set up its own caches.
+
+   At the start of processing a target window, an implementation
+   (encoder or decoder) initializes all of the slots in both caches to
+   zero.  The next_slot pointer of the near cache is set to point to
+   slot zero.
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 12]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   Each time a COPY instruction is processed by the encoder or decoder,
+   the implementation's caches are updated as follows, where "addr" is
+   the address in the COPY instruction.
+
+   a. The slot in the near cache referenced by the next_slot index is
+      set to addr.  The next_slot index is then incremented modulo
+      s_near.
+
+   b. The slot in the same cache whose index is addr%(s_same*256) is set
+      to addr.  [We use the C notations of % for modulo and * for
+      multiplication.]
+
+5.2 Example code for maintaining caches
+
+   To make clear the above description, below are examples of cache data
+   structures and algorithms to initialize and update them:
+
+   typedef struct _cache_s
+   {
+       int*  near;      /* array of size s_near        */
+       int   s_near;
+       int   next_slot; /* the circular index for near */
+       int*  same;      /* array of size s_same*256    */
+       int   s_same;
+   } Cache_t;
+
+   cache_init(Cache_t* ka)
+   {
+       int   i;
+
+       ka->next_slot = 0;
+       for(i = 0; i < ka->s_near; ++i)
+            ka->near[i] = 0;
+
+       for(i = 0; i < ka->s_same*256; ++i)
+            ka->same[i] = 0;
+   }
+
+   cache_update(Cache_t* ka, int addr)
+   {
+       if(ka->s_near > 0)
+       {   ka->near[ka->next_slot] = addr;
+           ka->next_slot = (ka->next_slot + 1) % ka->s_near;
+       }
+
+       if(ka->s_same > 0)
+           ka->same[addr % (ka->s_same*256)] = addr;
+   }
+
+
+
+Korn, et. al.               Standards Track                    [Page 13]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+5.3 Encoding of COPY instruction addresses
+
+   The address of a COPY instruction is encoded using different modes,
+   depending on the type of cached address used, if any.
+
+   Let "addr" be the address of a COPY instruction to be decoded and
+   "here" be the current location in the target data (i.e., the start of
+   the data about to be encoded or decoded).  Let near[j] be the jth
+   element in the near cache, and same[k] be the kth element in the same
+   cache.  Below are the possible address modes:
+
+      VCD_SELF: This mode has value 0.  The address was encoded by
+         itself as an integer.
+
+      VCD_HERE: This mode has value 1.  The address was encoded as the
+         integer value "here - addr".
+
+      Near modes: The "near modes" are in the range [2,s_near+1].  Let m
+         be the mode of the address encoding.  The address was encoded
+         as the integer value "addr - near[m-2]".
+
+      Same modes: The "same modes" are in the range
+         [s_near+2,s_near+s_same+1].  Let m be the mode of the encoding.
+         The address was encoded as a single byte b such that "addr ==
+         same[(m - (s_near+2))*256 + b]".
+
+5.4 Example code for encoding and decoding of COPY instruction addresses
+
+   We show example algorithms below to demonstrate the use of address
+   modes more clearly.  The encoder has the freedom to choose address
+   modes, the sample addr_encode() algorithm merely shows one way of
+   picking the address mode.  The decoding algorithm addr_decode() will
+   uniquely decode addresses, regardless of the encoder's algorithm
+   choice.
+
+   Note that the address caches are updated immediately after an address
+   is encoded or decoded.  In this way, the decoder is always
+   synchronized with the encoder.
+
+
+
+
+
+
+
+
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 14]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   int addr_encode(Cache_t* ka, int addr, int here, int* mode)
+   {
+       int  i, d, bestd, bestm;
+
+       /* Attempt to find the address mode that yields the
+        * smallest integer value for "d", the encoded address
+        * value, thereby minimizing the encoded size of the
+        * address. */
+
+       bestd = addr; bestm = VCD_SELF;      /* VCD_SELF == 0 */
+
+       if((d = here-addr) < bestd)
+           { bestd = d; bestm = VCD_HERE; } /* VCD_HERE == 1 */
+
+       for(i = 0; i < ka->s_near; ++i)
+           if((d = addr - ka->near[i]) >= 0 && d < bestd)
+               { bestd = d; bestm = i+2; }
+
+       if(ka->s_same > 0 && ka->same[d = addr%(ka->s_same*256)] == addr)
+           { bestd = d%256; bestm = ka->s_near + 2 + d/256; }
+
+       cache_update(ka,addr);
+
+       *mode = bestm; /* this returns the address encoding mode */
+       return  bestd; /* this returns the encoded address       */
+   }
+
+   Note that the addr_encode() algorithm chooses the best address mode
+   using a local optimization, but that may not lead to the best
+   encoding efficiency because different modes lead to different
+   instruction encodings, as described below.
+
+   The functions addrint() and addrbyte() used in addr_decode(), obtain
+   from the "Addresses section for COPYs" (Section 4.3), an integer or a
+   byte, respectively.  These utilities will not be described here.  We
+   simply recall that an integer is represented as a compact variable-
+   sized string of bytes, as described in Section 2 (i.e., base 128).
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 15]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   int addr_decode(Cache_t* ka, int here, int mode)
+   {   int  addr, m;
+
+       if(mode == VCD_SELF)
+            addr = addrint();
+       else if(mode == VCD_HERE)
+            addr = here - addrint();
+       else if((m = mode - 2) >= 0 && m < ka->s_near) /* near cache */
+            addr = ka->near[m] + addrint();
+       else /* same cache */
+       {    m = mode - (2 + ka->s_near);
+            addr = ka->same[m*256 + addrbyte()];
+       }
+
+       cache_update(ka, addr);
+
+       return addr;
+   }
+
+5.4 Instruction Codes
+
+   Matches are often short in lengths and separated by small amounts of
+   unmatched data.  That is, the lengths of COPY and ADD instructions
+   are often small.  This is particularly true of binary data such as
+   executable files or structured data, such as HTML or XML.  In such
+   cases, compression can be improved by combining the encoding of the
+   sizes and the instruction types, as well as combining the encoding of
+   adjacent delta instructions with sufficiently small data sizes.
+   Effective choices of when to perform such combinations depend on many
+   factors including the data being processed and the string matching
+   algorithm in use.  For example, if many COPY instructions have the
+   same data sizes, it may be worthwhile to encode these instructions
+   more compactly than others.
+
+   The Vcdiff data format is designed so that a decoder does not need to
+   be aware of the choices made in encoding algorithms.  This is
+   achieved with the notion of an "instruction code table", containing
+   256 entries.  Each entry defines, either a single delta instruction
+   or a pair of instructions that have been combined.  Note that the
+   code table itself only exists in main memory, not in the delta file
+   (unless using an application-defined code table, described in Section
+   7).  The encoded data simply includes the index of each instruction
+   and, since there are only 256 indices, each index can be represented
+   as a single byte.
+
+
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 16]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   Each instruction code entry contains six fields, each of which is a
+   single byte with an unsigned value:
+
+          +-----------------------------------------------+
+          | inst1 | size1 | mode1 | inst2 | size2 | mode2 |
+          +-----------------------------------------------+
+
+   Each triple (inst,size,mode) defines a delta instruction.  The
+   meanings of these fields are as follows:
+
+      inst: An "inst" field can have one of the four values: NOOP (0),
+            ADD (1), RUN (2) or COPY (3) to indicate the instruction
+            types.  NOOP means that no instruction is specified.  In
+            this case, both the corresponding size and mode fields will
+            be zero.
+
+      size: A "size" field is zero or positive.  A value zero means that
+            the size associated with the instruction is encoded
+            separately as an integer in the "Instructions and sizes
+            section" (Section 6).  A positive value for "size" defines
+            the actual data size.  Note that since the size is
+            restricted to a byte, the maximum value for any instruction
+            with size implicitly defined in the code table is 255.
+
+      mode: A "mode" field is significant only when the associated delta
+            instruction is a COPY.  It defines the mode used to encode
+            the associated addresses.  For other instructions, this is
+            always zero.
+
+5.6 The Code Table
+
+   Following the discussions on address modes and instruction code
+   tables, we define a "Code Table" to have the data below:
+
+         s_near: the size of the near cache,
+         s_same: the size of the same cache,
+         i_code: the 256-entry instruction code table.
+
+   Vcdiff itself defines a "default code table" in which s_near is 4 and
+   s_same is 3.  Thus, there are 9 address modes for a COPY instruction.
+   The first two are VCD_SELF (0) and VCD_HERE (1).  Modes 2, 3, 4 and 5
+   are for addresses coded against the near cache.  And modes 6, 7  and
+   8, are for addresses coded against the same cache.
+
+
+
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 17]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+        TYPE      SIZE     MODE    TYPE     SIZE     MODE     INDEX
+       ---------------------------------------------------------------
+    1.  RUN         0        0     NOOP       0        0        0
+    2.  ADD    0, [1,17]     0     NOOP       0        0      [1,18]
+    3.  COPY   0, [4,18]     0     NOOP       0        0     [19,34]
+    4.  COPY   0, [4,18]     1     NOOP       0        0     [35,50]
+    5.  COPY   0, [4,18]     2     NOOP       0        0     [51,66]
+    6.  COPY   0, [4,18]     3     NOOP       0        0     [67,82]
+    7.  COPY   0, [4,18]     4     NOOP       0        0     [83,98]
+    8.  COPY   0, [4,18]     5     NOOP       0        0     [99,114]
+    9.  COPY   0, [4,18]     6     NOOP       0        0    [115,130]
+   10.  COPY   0, [4,18]     7     NOOP       0        0    [131,146]
+   11.  COPY   0, [4,18]     8     NOOP       0        0    [147,162]
+   12.  ADD       [1,4]      0     COPY     [4,6]      0    [163,174]
+   13.  ADD       [1,4]      0     COPY     [4,6]      1    [175,186]
+   14.  ADD       [1,4]      0     COPY     [4,6]      2    [187,198]
+   15.  ADD       [1,4]      0     COPY     [4,6]      3    [199,210]
+   16.  ADD       [1,4]      0     COPY     [4,6]      4    [211,222]
+   17.  ADD       [1,4]      0     COPY     [4,6]      5    [223,234]
+   18.  ADD       [1,4]      0     COPY       4        6    [235,238]
+   19.  ADD       [1,4]      0     COPY       4        7    [239,242]
+   20.  ADD       [1,4]      0     COPY       4        8    [243,246]
+   21.  COPY        4      [0,8]   ADD        1        0    [247,255]
+       ---------------------------------------------------------------
+
+   The default instruction code table is depicted above, in a compact
+   representation that we use only for descriptive purposes.  See
+   section 7 for the specification of how an instruction code table is
+   represented in the Vcdiff encoding format.  In the depiction, a zero
+   value for size indicates that the size is separately coded.  The mode
+   of non-COPY instructions is represented as 0, even though they are
+   not used.
+
+   In the depiction, each numbered line represents one or more entries
+   in the actual instruction code table (recall that an entry in the
+   instruction code table may represent up to two combined delta
+   instructions.)  The last column ("INDEX") shows which index value, or
+   range of index values, of the entries are covered by that line.  (The
+   notation [i,j] means values from i through j, inclusively.)  The
+   first 6 columns of a line in the depiction, describe the pairs of
+   instructions used for the corresponding index value(s).
+
+   If a line in the depiction includes a column entry using the [i,j]
+   notation, this means that the line is instantiated for each value in
+   the range from i to j, inclusively.  The notation "0, [i,j]" means
+   that the line is instantiated for the value 0 and for each value in
+   the range from i to j, inclusively.
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 18]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   If a line in the depiction includes more than one entry using the
+   [i,j] notation, implying a "nested loop" to convert the line to a
+   range of table entries, the first such [i,j] range specifies the
+   outer loop, and the second specifies the inner loop.
+
+   The below examples should make clear the above description:
+
+   Line 1 shows the single RUN instruction with index 0.  As the size
+   field is 0, this RUN instruction always has its actual size encoded
+   separately.
+
+   Line 2 shows the 18 single ADD instructions.  The ADD instruction
+   with size field 0 (i.e., the actual size is coded separately) has
+   index 1.  ADD instructions with sizes from 1 to 17 use code indices 2
+   to 18 and their sizes are as given (so they will not be separately
+   encoded.)
+
+   Following the single ADD instructions are the single COPY
+   instructions ordered by their address encoding modes.  For example,
+   line 11 shows the COPY instructions with mode 8, i.e., the last of
+   the same cache.  In this case, the COPY instruction with size field 0
+   has index 147.  Again, the actual size of this instruction will be
+   coded separately.
+
+   Lines 12 to 21 show the pairs of instructions that are combined
+   together.  For example, line 12 depicts the 12 entries in which an
+   ADD instruction is combined with an immediately following COPY
+   instruction.  The entries with indices 163, 164, 165 represent the
+   pairs in which the ADD instructions all have size 1, while the COPY
+   instructions have mode 0 (VCD_SELF) and sizes 4, 5 and 6
+   respectively.
+
+   The last line, line 21, shows the eight instruction pairs, where the
+   first instruction is a COPY and the second is an ADD.  In this case,
+   all COPY instructions have size 4 with mode ranging from 0 to 8 and
+   all the ADD instructions have size 1.  Thus, the entry with the
+   largest index 255 combines a COPY instruction of size 4 and mode 8
+   with an ADD instruction of size 1.
+
+   The choice of the minimum size 4 for COPY instructions in the default
+   code table was made from experiments that showed that excluding small
+   matches (less then 4 bytes long) improved the compression rates.
+
+
+
+
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 19]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+6. Decoding a Target Window
+
+   Section 4.3 discusses that the delta instructions and associated data
+   are encoded in three arrays of bytes:
+
+         Data section for ADDs and RUNs,
+         Instructions and sizes section, and
+         Addresses section for COPYs.
+
+   Further, these data sections may have been further compressed by some
+   secondary compressor.  Assuming that any such compressed data has
+   been decompressed so that we now have three arrays:
+
+         inst: bytes coding the instructions and sizes.
+         data: unmatched data associated with ADDs and RUNs.
+         addr: bytes coding the addresses of COPYs.
+
+   These arrays are organized as follows:
+
+      inst: a sequence of (index, [size1], [size2]) tuples, where
+            "index" is an index into the instruction code table, and
+            size1 and size2 are integers that MAY or MAY NOT be included
+            in the tuple as follows.  The entry with the given "index"
+            in the instruction code table potentially defines two delta
+            instructions.  If the first delta instruction is not a
+            VCD_NOOP and its size is zero, then size1 MUST be present.
+            Otherwise, size1 MUST be omitted and the size of the
+            instruction (if it is not VCD_NOOP) is as defined in the
+            table.  The presence or absence of size2 is defined
+            similarly with respect to the second delta instruction.
+
+      data: a sequence of data values, encoded as bytes.
+
+      addr: a sequence of address values.  Addresses are normally
+            encoded as integers as described in Section 2 (i.e., base
+            128).  However, since the same cache emits addresses in the
+            range [0,255], same cache addresses are always encoded as a
+            single byte.
+
+   To summarize, each tuple in the "inst" array includes an index to
+   some entry in the instruction code table that determines:
+
+   a. Whether one or two instructions were encoded and their types.
+
+   b. If the instructions have their sizes encoded separately, these
+      sizes will follow, in order, in the tuple.
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 20]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   c. If the instructions have accompanying data, i.e., ADDs or RUNs,
+      their data will be in the array "data".
+
+   d. Similarly, if the instructions are COPYs, the coded addresses are
+      found in the array "addr".
+
+   The decoding procedure simply processes the arrays by reading one
+   code index at a time, looking up the corresponding instruction code
+   entry, then consuming the respective sizes, data and addresses
+   following the directions in this entry.  In other words, the decoder
+   maintains an implicit next-element pointer for each array;
+   "consuming" an instruction tuple, data, or address value implies
+   incrementing the associated pointer.
+
+   For example, if during the processing of the target window, the next
+   unconsumed tuple in the inst array has an index value of 19, then the
+   first instruction is a COPY, whose size is found as the immediately
+   following integer in the inst array.  Since the mode of this COPY
+   instruction is VCD_SELF, the corresponding address is found by
+   consuming the next integer in the addr array.  The data array is left
+   intact.  As the second instruction for code index 19 is a NOOP, this
+   tuple is finished.
+
+7. APPLICATION-DEFINED CODE TABLES
+
+   Although the default code table used in Vcdiff is good for general
+   purpose encoders, there are times when other code tables may perform
+   better.  For example, to code a file with many identical segments of
+   data, it may be advantageous to have a COPY instruction with the
+   specific size of these data segments, so that the instruction can be
+   encoded in a single byte.  Such a special code table MUST then be
+   encoded in the delta file so that the decoder can reconstruct it
+   before decoding the data.
+
+   Vcdiff allows an application-defined code table to be specified in a
+   delta file with the following data:
+
+         Size of near cache            - byte
+         Size of same cache            - byte
+         Compressed code table data
+
+   The "compressed code table data" encodes the delta between the
+   default code table (source) and the new code table (target) in the
+   same manner as described in Section 4.3 for encoding a target window
+   in terms of a source window.  This delta is computed using the
+   following steps:
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 21]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   a. Convert the new instruction code table into a string, "code", of
+      1536 bytes using the below steps in order:
+
+       i. Add in order the 256 bytes representing the types of the first
+          instructions in the instruction pairs.
+      ii. Add in order the 256 bytes representing the types of the
+          second instructions in the instruction pairs.
+     iii. Add in order the 256 bytes representing the sizes of the first
+          instructions in the instruction pairs.
+      iv. Add in order the 256 bytes representing the sizes of the
+          second instructions in the instruction pairs.
+       v. Add in order the 256 bytes representing the modes of the first
+          instructions in the instruction pairs.
+      vi. Add in order the 256 bytes representing the modes of the
+          second instructions in the instruction pairs.
+
+   b. Similarly, convert the default code table into a string "dflt".
+
+   c. Treat the string "code" as a target window and "dflt" as the
+      corresponding source data and apply an encoding algorithm to
+      compute the delta encoding of "code" in terms of "dflt".  This
+      computation MUST use the default code table for encoding the delta
+      instructions.
+
+   The decoder can then reverse the above steps to decode the compressed
+   table data using the method of Section 6, employing the default code
+   table, to generate the new code table.  Note that the decoder does
+   not need to know about the details of the encoding algorithm used in
+   step (c).  It is able to decode the new code table because the Vcdiff
+   format is independent from the choice of encoding algorithm, and
+   because the encoder in step (c) uses the known, default code table.
+
+8. Performance
+
+   The encoding format is compact.  For compression only, using the LZ-
+   77 string parsing strategy and without any secondary compressors, the
+   typical compression rate is better than Unix compress and close to
+   gzip.  For differencing, the data format is better than all known
+   methods in terms of its stated goal, which is primarily decoding
+   speed and encoding efficiency.
+
+   We compare the performance of compress, gzip and Vcdiff using the
+   archives of three versions of the Gnu C compiler, gcc-2.95.1.tar,
+   gcc-2.95.2.tar and gcc-2.95.3.tar.  Gzip was used at its default
+   compression level.  The Vcdiff data were obtained using the
+   Vcodex/Vcdiff software (Section 13).
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 22]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   Below are the different Vcdiff runs:
+
+      Vcdiff: vcdiff is used as a compressor only.
+
+      Vcdiff-d: vcdiff is used as a differencer only.  That is, it only
+         compares target data against source data.  Since the files
+         involved are large, they are broken into windows.  In this
+         case, each target window, starting at some file offset in the
+         target file, is compared against a source window with the same
+         file offset (in the source file).  The source window is also
+         slightly larger than the target window to increase matching
+         opportunities.
+
+      Vcdiff-dc: This is similar to Vcdiff-d, but vcdiff can also
+         compare target data against target data as applicable.  Thus,
+         vcdiff both computes differences and compresses data.  The
+         windowing algorithm is the same as above.  However, the above
+         hint is recinded in this case.
+
+      Vcdiff-dcw: This is similar to Vcdiff-dc but the windowing
+         algorithm uses a content-based heuristic to select a source
+         window that is more likely to match with a given target window.
+         Thus, the source data segment selected for a target window
+         often will not be aligned with the file offsets of this target
+         window.
+
+                       gcc-2.95.1     gcc-2.95.2     gcc-2.95.3
+      ---------------------------------------------------------
+      1. raw size      55,746,560     55,797,760     55,787,520
+      2. compress         -           19,939,390     19,939,453
+      3. gzip             -           12,973,443     12,998,097
+      4. Vcdiff           -           15,358,786     15,371,737
+      5. Vcdiff-d         -              100,971     26,383,849
+      6. Vcdiff-dc        -               97,246     14,461,203
+      7. Vcdiff-dcw       -              256,445      1,248,543
+
+   The above table shows the raw sizes of the tar files and the sizes of
+   the compressed results.  The differencing results in the gcc-2.95.2
+   column were obtained by compressing gcc-2.95.2, given gcc-2.95.1.
+   The same results for the column gcc-2.95.3 were obtained by
+   compressing gcc-2.95.3, given gcc-2.95.2.
+
+   Rows 2, 3 and 4 show that, for compression only, the compression rate
+   from Vcdiff is worse than gzip and better than compress.
+
+
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 23]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   The last three rows in the column gcc-2.95.2 show that when two file
+   versions are very similar, differencing can give dramatically good
+   compression rates.  Vcdiff-d and Vcdiff-dc use the same simple window
+   selection method of aligning by file offsets, but Vcdiff-dc also does
+   compression so its output is slightly smaller.  Vcdiff-dcw uses a
+   content-based algorithm to search for source data that likely will
+   match a given target window.  Although it does a good job, the
+   algorithm does not always find the best matches, which in this case,
+   are given by the simple algorithm of Vcdiff-d.  As a result, the
+   output size for Vcdiff-dcw is slightly larger.
+
+   The situation is reversed in the gcc-2.95.3 column.  Here, the files
+   and their contents were sufficiently rearranged or changed between
+   the making of the gcc-2.95.3.tar archive and the gcc-2.95.2 archive
+   so that the simple method of aligning windows by file offsets no
+   longer works.  As a result, Vcdiff-d and Vcdiff-dc do not perform
+   well.  By allowing compression, along with differencing, Vcdiff-dc
+   manages to beat Vcdiff-c, which does compression only.  The content-
+   based window matching algorithm in Vcdiff-dcw is effective in
+   matching the right source and target windows so that Vcdiff-dcw is
+   the overall winner.
+
+9. Further Issues
+
+   This document does not address a few issues:
+
+   Secondary compressors:
+      As discussed in Section 4.3, certain sections in the delta
+      encoding of a window may be further compressed by a secondary
+      compressor.  In our experience, the basic Vcdiff format is
+      adequate for most purposes so that secondary compressors are
+      seldom needed.  In particular, for normal use of data
+      differencing, where the files to be compared have long stretches
+      of matches, much of the gain in compression rate is already
+      achieved by normal string matching.  Thus, the use of secondary
+      compressors is seldom needed in this case.  However, for
+      applications beyond differencing of such nearly identical files,
+      secondary compressors may be needed to achieve maximal compressed
+      results.
+
+      Therefore, we recommend leaving the Vcdiff data format defined as
+      in this document so that the use of secondary compressors can be
+      implemented when they become needed in the future.  The formats of
+      the compressed data via such compressors or any compressors that
+      may be defined in the future are left open to their
+      implementations.  These could include Huffman encoding, arithmetic
+      encoding, and splay tree encoding [8,9].
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 24]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   Large file system vs. small file system:
+      As discussed in Section 4, a target window in a large file may be
+      compared against some source window in another file or in the same
+      file (from some earlier part).  In that case, the file offset of
+      the source window is specified as a variable-sized integer in the
+      delta encoding.  There is a possibility that the encoding was
+      computed on a system supporting much larger files than in a system
+      where the data may be decoded (e.g., 64-bit file systems vs. 32-
+      bit file systems).  In that case, some target data may not be
+      recoverable.  This problem could afflict any compression format,
+      and ought to be resolved with a generic negotiation mechanism in
+      the appropriate protocol(s).
+
+10.  Summary
+
+   We have described Vcdiff, a general and portable encoding format for
+   compression and differencing.  The format is good in that it allows
+   implementing a decoder without knowledge of the encoders.  Further,
+   ignoring the use of secondary compressors not defined within the
+   format, the decoding algorithms run in linear time and requires
+   working space proportional to window size.
+
+11. Acknowledgements
+
+   Thanks are due to Balachander Krishnamurthy, Jeff Mogul and Arthur
+   Van Hoff who provided much encouragement to publicize Vcdiff.  In
+   particular, Jeff helped in clarifying the description of the data
+   format presented here.
+
+12. Security Considerations
+
+   Vcdiff only provides a format to encode compressed and differenced
+   data.  It does not address any issues concerning how such data are,
+   in fact, stored in a given file system or the run-time memory of a
+   computer system.  Therefore, we do not anticipate any security issues
+   with respect to Vcdiff.
+
+13. Source Code Availability
+
+   Vcdiff is implemented as a data transforming method in Phong Vo's
+   Vcodex library.  AT&T Corp. has made the source code for Vcodex
+   available for anyone to use to transmit data via HTTP/1.1 Delta
+   Encoding [10,11].  The source code and according license is
+   accessible at the below URL:
+
+      http://www.research.att.com/sw/tools
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 25]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+14. Intellectual Property Rights
+
+   The IETF has been notified of intellectual property rights claimed in
+   regard to some or all of the specification contained in this
+   document.  For more information consult the online list of claimed
+   rights, at <http://www.ietf.org/ipr.html>.
+
+   The IETF takes no position regarding the validity or scope of any
+   intellectual property or other rights that might be claimed to
+   pertain to the implementation or use of the technology described in
+   this document or the extent to which any license under such rights
+   might or might not be available; neither does it represent that it
+   has made any effort to identify any such rights.  Information on the
+   IETF's procedures with respect to rights in standards-track and
+   standards-related documentation can be found in BCP 11.  Copies of
+   claims of rights made available for publication and any assurances of
+   licenses to be made available, or the result of an attempt made to
+   obtain a general license or permission for the use of such
+   proprietary rights by implementors or users of this specification can
+   be obtained from the IETF Secretariat.
+
+15. IANA Considerations
+
+   The Internet Assigned Numbers Authority (IANA) administers the number
+   space for Secondary Compressor ID values.  Values and their meaning
+   must be documented in an RFC or other peer-reviewed, permanent, and
+   readily available reference, in sufficient detail so that
+   interoperability between independent implementations is possible.
+   Subject to these constraints, name assignments are First Come, First
+   Served - see RFC 2434 [13].  Legal ID values are in the range 1..255.
+
+   This document does not define any values in this number space.
+
+16. References
+
+   [1]  D.G. Korn and K.P. Vo, Vdelta: Differencing and Compression,
+        Practical Reusable Unix Software, Editor B. Krishnamurthy, John
+        Wiley & Sons, Inc., 1995.
+
+   [2]  J. Ziv and A. Lempel, A Universal Algorithm for Sequential Data
+        Compression, IEEE Trans. on Information Theory, 23(3):337-343,
+        1977.
+
+   [3]  W. Tichy, The String-to-String Correction Problem with Block
+        Moves, ACM Transactions on Computer Systems, 2(4):309-321,
+        November 1984.
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 26]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+   [4]  E.M. McCreight, A Space-Economical Suffix Tree Construction
+        Algorithm, Journal of the ACM, 23:262-272, 1976.
+
+   [5]  J.J. Hunt, K.P. Vo, W. Tichy, An Empirical Study of Delta
+        Algorithms, IEEE Software Configuration and Maintenance
+        Workshop, 1996.
+
+   [6]  J.J. Hunt, K.P. Vo, W. Tichy, Delta Algorithms: An Empirical
+        Analysis, ACM Trans. on Software Engineering and Methodology,
+        7:192-214, 1998.
+
+   [7]  D.G. Korn, K.P. Vo, Sfio: A buffered I/O Library, Proc. of the
+        Summer '91 Usenix Conference, 1991.
+
+   [8]  D. W. Jones, Application of Splay Trees to Data Compression,
+        CACM, 31(8):996:1007.
+
+   [9]  M. Nelson, J. Gailly, The Data Compression Book, ISBN 1-55851-
+        434-1, M&T Books, New York, NY, 1995.
+
+   [10] J.C. Mogul, F. Douglis, A. Feldmann, and B. Krishnamurthy,
+        Potential benefits of delta encoding and data compression for
+        HTTP, SIGCOMM '97, Cannes, France, 1997.
+
+   [11] Mogul, J., Krishnamurthy, B., Douglis, F., Feldmann, A., Goland,
+        Y. and A. Van Hoff, "Delta Encoding in HTTP", RFC 3229, January
+        2002.
+
+   [12] Bradner, S., "Key words for use in RFCs to Indicate Requirement
+        Levels", BCP 14, RFC 2119, March 1997.
+
+   [13] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA
+        Considerations Section in RFCs", BCP 26, RFC 2434, October 1998.
+
+   [14] D.G. Korn and K.P. Vo, Engineering a Differencing and
+        Compression Data Format, Submitted to Usenix'2002, 2001.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 27]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+17. Authors' Addresses
+
+   Kiem-Phong Vo (main contact)
+   AT&T Labs, Room D223
+   180 Park Avenue
+   Florham Park, NJ 07932
+
+   Phone: 1 973 360 8630
+   EMail: kpv@research.att.com
+
+
+   David G. Korn
+   AT&T Labs, Room D237
+   180 Park Avenue
+   Florham Park, NJ 07932
+
+   Phone: 1 973 360 8602
+   EMail: dgk@research.att.com
+
+
+   Jeffrey C. Mogul
+   Western Research Laboratory
+   Hewlett-Packard Company
+   1501 Page Mill Road, MS 1251
+   Palo Alto, California, 94304, U.S.A.
+
+   Phone: 1 650 857 2206 (email preferred)
+   EMail: JeffMogul@acm.org
+
+
+   Joshua P. MacDonald
+   Computer Science Division
+   University of California, Berkeley
+   345 Soda Hall
+   Berkeley, CA 94720
+
+   EMail: jmacd@cs.berkeley.edu
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 28]
+
+RFC 3284                         VCDIFF                        June 2002
+
+
+18.  Full Copyright Statement
+
+   Copyright (C) The Internet Society (2002).  All Rights Reserved.
+
+   This document and translations of it may be copied and furnished to
+   others, and derivative works that comment on or otherwise explain it
+   or assist in its implementation may be prepared, copied, published
+   and distributed, in whole or in part, without restriction of any
+   kind, provided that the above copyright notice and this paragraph are
+   included on all such copies and derivative works.  However, this
+   document itself may not be modified in any way, such as by removing
+   the copyright notice or references to the Internet Society or other
+   Internet organizations, except as needed for the purpose of
+   developing Internet standards in which case the procedures for
+   copyrights defined in the Internet Standards process must be
+   followed, or as required to translate it into languages other than
+   English.
+
+   The limited permissions granted above are perpetual and will not be
+   revoked by the Internet Society or its successors or assigns.
+
+   This document and the information contained herein is provided on an
+   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
+   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
+   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
+   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
+   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Acknowledgement
+
+   Funding for the RFC Editor function is currently provided by the
+   Internet Society.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Korn, et. al.               Standards Track                    [Page 29]
+