From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001
From: Thomas Voss
Date: Wed, 27 Nov 2024 20:54:24 +0100
Subject: doc: Add RFC documents

---
 doc/rfc/rfc3229.txt | 2747 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 2747 insertions(+)
 create mode 100644 doc/rfc/rfc3229.txt

diff --git a/doc/rfc/rfc3229.txt b/doc/rfc/rfc3229.txt
new file mode 100644
index 0000000..9d53081
--- /dev/null
+++ b/doc/rfc/rfc3229.txt
@@ -0,0 +1,2747 @@
+
+
+
+
+
+
+Network Working Group                                           J. Mogul
+Request for Comments: 3229                                    Compaq WRL
+Category: Standards Track                               B. Krishnamurthy
+                                                              F. Douglis
+                                                                    AT&T
+                                                             A. Feldmann
+                                                   Univ. of Saarbruecken
+                                                               Y. Goland
+                                                             A. van Hoff
+                                                                 Marimba
+                                                          D. Hellerstein
+                                                                ERS/USDA
+                                                            January 2002
+
+
+                         Delta encoding in HTTP
+
+Status of this Memo
+
+   This document specifies an Internet standards track protocol for the
+   Internet community, and requests discussion and suggestions for
+   improvements. Please refer to the current edition of the "Internet
+   Official Protocol Standards" (STD 1) for the standardization state
+   and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+   Copyright (C) The Internet Society (2002). All Rights Reserved.
+
+Abstract
+
+   This document describes how delta encoding can be supported as a
+   compatible extension to HTTP/1.1.
+
+   Many HTTP (Hypertext Transfer Protocol) requests cause the retrieval
+   of slightly modified instances of resources for which the client
+   already has a cache entry. Research has shown that such modifying
+   updates are frequent, and that the modifications are typically much
+   smaller than the actual entity. In such cases, HTTP would make more
+   efficient use of network bandwidth if it could transfer a minimal
+   description of the changes, rather than the entire new instance of
+   the resource. This is called "delta encoding."
+
+
+
+
+
+
+
+
+Mogul, et al.
Standards Track [Page 1]
+
+RFC 3229                 Delta encoding in HTTP             January 2002
+
+
+Table of Contents
+
+   1 Introduction...................................................  3
+       1.1 Related research and proposals...........................  4
+   2 Goals..........................................................  5
+   3 Terminology....................................................  6
+   4 The HTTP message-generation sequence...........................  8
+       4.1 Relationship between deltas and ranges................... 11
+   5 Basic mechanisms............................................... 13
+       5.1 Background: an overview of HTTP cache validation......... 13
+       5.2 Requesting the transmission of deltas.................... 14
+       5.3 Choice of delta algorithm and format..................... 16
+       5.4 Identification of delta-encoded responses................ 16
+       5.5 Guaranteeing cache safety................................ 17
+       5.6 Transmission of delta-encoded responses.................. 18
+       5.7 Examples of requests combining Range and delta encoding.. 19
+   6 Encoding algorithms and formats................................ 22
+   7 Management of base instances................................... 23
+       7.1 Multiple entity tags in the If-None-Match header......... 24
+       7.2 Hints for managing the client cache...................... 25
+   8 Deltas and intermediate caches................................. 27
+   9 Digests for data integrity..................................... 28
+   10 Specification................................................. 28
+       10.1 Protocol parameter specifications....................... 28
+       10.2 IANA Considerations..................................... 30
+       10.3 Basic requirements for delta-encoded responses.......... 30
+       10.4 Status code specifications.............................. 30
+            10.4.1 226 IM Used...................................... 31
+       10.5 Header specifications................................... 31
+            10.5.1 Delta-Base....................................... 31
+            10.5.2 IM............................................... 32
+            10.5.3 A-IM............................................. 33
+       10.6 Caching rules for 226 responses......................... 35
+       10.7 Rules for deltas in the presence of content-codings..... 36
+            10.7.1 Rules for generating deltas in the presence of
+                   content-codings.................................. 37
+            10.7.2 Rules for applying deltas in the presence of
+                   content-codings.................................. 37
+            10.7.3 Examples for using A-IM, IM, and content-codings. 38
+       10.8 New Cache-Control directives............................ 40
+            10.8.1 Retain directive................................. 40
+            10.8.2 IM directive..................................... 40
+       10.9 Use of compression with delta encoding.................. 41
+       10.10 Delta encoding and multipart/byteranges................ 42
+   11 Quantifying the protocol overhead............................. 42
+   12 Security Considerations....................................... 44
+   13 Acknowledgements.............................................. 44
+   14 Intellectual Property Rights.................................. 44
+   15 References.................................................... 44
+   16 Authors' addresses............................................ 47
+   17 Full Copyright Statement...................................... 49
+
+1 Introduction
+
+   The World Wide Web is a distributed system, and so often benefits
+   from caching to reduce retrieval delays. Retrieval of a Web resource
+   (such as a document, image, icon, or applet) over the Internet or
+   other wide-area networks usually takes enough time that the delay is
+   over the human threshold of perception. Often, that delay is
+   measured in seconds. Caching can often eliminate or significantly
+   reduce retrieval delays.
+ + Many Web resources change over time, so a practical caching approach + must include a coherency mechanism, to avoid presenting stale + information to the user. Originally, the Hypertext Transfer Protocol + (HTTP) provided little support for caching, but under operational + pressures, it quickly evolved to support a simple mechanism for + maintaining cache coherency. + + In HTTP/1.0 [2], the server may supply a "last-modified" timestamp + with a response. If a client stores this response in a cache entry, + and then later wishes to re-use the response, it may transmit a + request message with an "If-modified-since" field containing that + timestamp; this is known as a conditional retrieval. Upon receiving + a conditional request, the server may either reply with a full + response, or, if the resource has not changed, it may send an + abbreviated reply, indicating that the client's cache entry is still + valid. HTTP/1.0 also includes a means for the server to indicate, + via an "Expires" timestamp, that a response will be valid until that + time; if so, a client may use a cached copy of the response until + that time, without first validating it using a conditional retrieval. + + HTTP/1.1 [10] adds many new features to improve cache coherency and + performance. However, it preserves the all-or-none model for + responses to conditional retrievals: either the server indicates that + the resource value has not changed at all, or it must transmit the + entire current value. + + Common sense suggests (and traces confirm), however, that even when a + Web resource does change, the new instance is often substantially + similar to the old one. If the difference, or "delta", between the + two instances could be sent to the client instead of the entire new + instance, a client holding a cached copy of the old instance could + apply the delta to construct the new version. In a world of finite + bandwidth, the reduction in response size and delay could be + significant. 
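As a rough illustration of the idea (not any of the delta formats this document names later), the following Python sketch encodes a new instance as copy and insert operations against an old one, and rebuilds the new instance from the old one plus that list. The function names, the operation tuples, and the sample HTML are assumptions made for this example; Python's difflib stands in for a real differencing algorithm.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Encode `new` as copy/insert operations against `old` (hypothetical format)."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes the client already holds: refer to them by offset.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Only bytes that are new need to travel over the network.
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no operation: the old bytes are simply not copied.
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new instance from the cached old instance and a delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


old = b"<html><body>Price: 10 USD</body></html>"
new = b"<html><body>Price: 20 USD</body></html>"
delta = make_delta(old, new)
rebuilt = apply_delta(old, delta)
# Number of literal bytes the delta has to carry.
shipped = sum(len(op[1]) for op in delta if op[0] == "insert")
```

In this toy case the delta carries far fewer literal bytes than the full new instance, which is the saving the paragraph above describes. A real format would also need a compact wire encoding for the copy operations.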
+ + + +Mogul, et al. Standards Track [Page 3] + +RFC 3229 Delta encoding in HTTP January 2002 + + + One can think of deltas as a way to squeeze as much benefit as + possible from client and proxy caches. Rather than treating an + entire response as the "cache line", with deltas we can treat + arbitrary pieces of a cached response as the replaceable unit, and + avoid transferring pieces that have not changed. + + This document proposes a set of compatible extensions to HTTP/1.1 + that allow clients and servers to use delta encoding with minimal + overhead. + + We assume that the reader is familiar with the HTTP/1.1 + specification. + +1.1 Related research and proposals + + The idea of delta encoding to reduce communication or storage costs + is not new. For example, the MPEG-1 video compression standard + transmits occasional still-image frames, but most of the frames sent + are encoded (to oversimplify) as changes from an adjacent frame. The + SCCS and RCS [27] systems for software version control represent + intermediate versions as deltas; SCCS starts with an original version + and encodes subsequent ones with forward deltas, whereas RCS encodes + previous versions as reverse deltas from their successors. + Jacobson's technique for compressing IP and TCP headers over slow + links [17] uses a clever, highly specialized form of delta encoding. + + In spite of this history, it appears to have taken several years + before anyone thought of applying delta encoding to HTTP, perhaps + because the development of HTTP caching has been somewhat haphazard. + The first published suggestion for delta encoding appears to have + been by Williams et al. in a paper about HTTP cache removal policies + [30], but these authors did not elaborate on their design until later + [29]. + + The WebExpress project [15] appears to be the first published + description of an implementation of delta encoding for HTTP (which + they call "differencing"). 
WebExpress is aimed specifically at + wireless environments, and includes a number of orthogonal + optimizations. Also, the WebExpress design does not propose changing + the HTTP protocol itself, but rather uses a pair of interposed + proxies to convert the HTTP message stream into an optimized form. + The results reported for WebExpress differencing are impressive, but + are limited to a few selected benchmarks. + + Banga et al. [1] describe the use of optimistic deltas, in which a + layer of interposed proxies on either end of a slow link collaborate + to reduce latency. If the client-side proxy has a cached copy of a + resource, the server-side proxy can simply send a delta (or a 304 + + + +Mogul, et al. Standards Track [Page 4] + +RFC 3229 Delta encoding in HTTP January 2002 + + + [Not Modified] response). If only the server-side proxy has a cached + copy, it may optimistically send its (possibly stale) copy to the + client-side proxy, followed (if necessary) by a delta once the + server-side proxy has validated its own cache entry with the origin + server. The use of optimistic deltas, unlike delta encoding, + actually increases the number of bytes sent over the network, in an + attempt to improve latency by anticipating a "Not Modified" response + from the origin server. The optimistic delta paper, like the + WebExpress paper, did not propose a change to the HTTP protocol + itself, and reported results only for a small set of selected URLs. + + Mogul et al. [23] collected lengthy traces, at two different sites, + of the full contents of HTTP messages, to quantify the potential + benefits of delta-encoded responses. They showed that delta encoding + can provide remarkable improvements in response-size and response- + delay for an important subset of HTTP content types. They proposed a + set of HTTP extensions, but without the level of detail required for + a specification. Douglis et al. 
[8] used the same sets of full-content
+   traces to quantify the rate at which resources change in the Web.
+
+   The HTTP Distribution and Replication Protocol (DRP), proposed to W3C
+   by Marimba, Netscape, Sun, Novell, and At Home, aims to provide a
+   collection of new features for HTTP, to support "the efficient
+   replication of data over HTTP" [13]. One aspect of the DRP proposal
+   is the use of "differential downloading," which is essentially a form
+   of delta encoding. The original DRP proposal uses a different
+   approach from the one described here, but a forthcoming revision of
+   DRP will conform to the proposal in this document.
+
+   Tridgell and Mackerras [28] describe the "rsync" algorithm, which
+   accomplishes something similar to delta encoding. In rsync, the
+   client breaks a cache entry into a series of fixed-sized blocks,
+   computes a digest value for each block, and sends the series of
+   digest values to the server as part of its request. The origin
+   server does the same block-based computation, and returns only those
+   blocks whose digest values differ. We believe that it might be
+   possible to support rsync using the "instance manipulation" framework
+   described later in this document, but this has not been worked out in
+   any detail.
+
+2 Goals
+
+   The goals of this proposal are:
+
+      1. Reduce the mean size of HTTP responses, thereby improving
+         latency and network utilization.
+
+      2. Avoid any extra network round trips.
+
+      3. Minimize the amount of per-request and per-response overheads.
+
+      4. Support a variety of encoding algorithms and formats.
+
+      5. Interoperate with HTTP/1.0 and HTTP/1.1.
+
+      6. Be fully optional for clients, proxies, and servers.
+
+      7. Allow moderately simple implementations.
+
+   The goals do not include:
+
+      - Reducing the number of HTTP requests sent to an origin server.
+ + - Reducing the size of every HTTP message. + + - Increasing the cache-hit ratio of HTTP caches. + + - Allowing excessively simplistic implementations of delta + encoding. + + - Delta encoding of request messages, or of responses to methods + other than GET. + + Nothing in this specification specifically precludes the use of + a delta encoding for the body of a PUT request. However, no + mechanism currently exists for the client to discover if the + server can interpret such messages, and so we do not attempt to + specify how they might be used. + +3 Terminology + + HTTP/1.1 [10] defines the following terms: + + resource A network data object or service that can be + identified by a URI, as defined in section 3.2. + Resources may be available in multiple + representations (e.g. multiple languages, data + formats, size, resolutions) or vary in other ways. + + entity The information transferred as the payload of a + request or response. An entity consists of + metainformation in the form of entity-header fields + and content in the form of an entity-body, as + described in section 7. + + + + +Mogul, et al. Standards Track [Page 6] + +RFC 3229 Delta encoding in HTTP January 2002 + + + variant A resource may have one, or more than one, + representation(s) associated with it at any given + instant. Each of these representations is termed a + `variant.' Use of the term `variant' does not + necessarily imply that the resource is subject to + content negotiation. + + The dictionary definition for "entity" is "something that has + separate and distinct existence and objective or conceptual reality" + [21]. Unfortunately, the definition for "entity" in HTTP/1.1 is + similar to that used in MIME [12], based on a false analogy between + MIME and HTTP. + + In MIME, electronic mail messages do have distinct and separate + existences. 
MIME defines "entity" as something that "refers + specifically to the MIME-defined header fields and contents of either + a message or one of the parts in the body of a multipart entity." + + In HTTP, however, a response message to a GET does not have a + distinct and separate existence. Rather, it reflects the current + state of a resource (or a variant, subject to a set of constraints). + The HTTP/1.1 specification has no term to describe "the value that + would be returned in response to a GET request at the current time + for the selected variant of the specified resource." This leads to + awkward wordings in the HTTP/1.1 specification in places where this + concept is necessary. + + To express this concept, we define a new term, for use in this + document: + + instance The entity that would be returned in a status-200 + response to a GET request, at the current time, for + the selected variant of the specified resource, with + the application of zero or more content-codings, but + without the application of any instance manipulations + (see below) or transfer-codings. + + It is convenient to think of an entity tag, in HTTP/1.1, as being + associated with an instance, rather than an entity. That is, for a + given resource, two different response messages might include the + same entity tag, but two different instances of the resource should + never be associated with the same (strong) entity tag. + + We will informally use the term "delta," in this document, to mean an + HTTP response encoded as the difference between two instances. + + + + + + +Mogul, et al. Standards Track [Page 7] + +RFC 3229 Delta encoding in HTTP January 2002 + + + More formally, delta encodings are members of a potentially larger + class of transformations on instances, leading to this new term: + + instance manipulation + An operation on one or more instances which may + result in an instance being conveyed from server to + client in parts, or in more than one response + message. 
For example, a range selection or a delta + encoding. Instance manipulations are end-to-end, and + often involve the use of a cache at the client. + + For reasons that will become clear later on, it is convenient to + think about subrange selection as a form of instance manipulation. + In some contexts, compression might also be treated as an instance + manipulation, rather than as a content-coding or transfer-coding. + +4 The HTTP message-generation sequence + + HTTP/1.1 supports a number of different transformations on the body + of a value: + + Content-coding According to the specification, "Content coding + values indicate an encoding transformation that has + been or can be applied to an entity. Content codings + are primarily used to allow a document to be + compressed or otherwise usefully transformed without + losing the identity of its underlying media type and + without loss of information. Frequently, the entity + is stored in coded form, transmitted directly, and + only decoded by the recipient." Content-codings are + normally end-to-end transformations; i.e., once + applied at the sender, they are not removed except at + the ultimate recipient. An intermediate server may + apply a content-coding, in appropriate circumstances. + + Transfer-coding According to the specification, "Transfer coding + values are used to indicate an encoding + transformation that has been, can be, or may need to + be applied to an entity-body in order to ensure "safe + transport" through the network. This differs from a + content coding in that the transfer coding is a + property of the message, not of the original entity." + Transfer-codings are explicitly hop-by-hop + transformations (although, as an optimization, an + intermediate proxy may store the transfer-coded + version of a message if this behavior is not + inconsistent with its externally visible function.) + + + + +Mogul, et al. 
Standards Track [Page 8] + +RFC 3229 Delta encoding in HTTP January 2002 + + + Ranges An HTTP client, using the Range header, may request + that the server return one or more subranges of the + instance, rather than the entire instance value. + HTTP/1.1 only supports byte-ranges, although there is + some possibility that future extensions will allow + for other kinds of range-specifiers (such as chapters + of a document). + + A client signals its willingness to receive a content-coding by + sending an "Accept-Encoding" header, listing the set of content- + codings that it understands. It may optionally include information + about which content-codings it prefers. If a server uses any non- + identity content-coding(s), it includes a "Content-Encoding" header + field in the response, listing these content-codings in their order + of application. + + RFC 2068 [9] did not include an analogous mechanism for negotiating + the use of transfer-codings, although it does include an analogous + "Transfer-Encoding" header for marking the response. A new "TE" + header has since been added to HTTP/1.1 [10], analogous to the + "Accept-Encoding" header. + + In this document, we add new, optional message headers to support the + use of instance manipulations. A client signals its willingness to + receive an instance-manipulation by sending an "A-IM" header (short + for "Accept-Instance-Manipulation", which is far too long to spell + out), analogous to the "Accept-Encoding" header. Similarly, a server + lists the set of instance-manipulations it has applied using an "IM" + header. + + One must understand the relationship between these transformations in + order to see how delta encoding applies to HTTP responses. + + Conceptually, the various transformations are applied in the + following sequence: + + 1. Upon receiving a GET request, the server uses the URI in the + request to identify the requested resource. + + 2. 
Optionally, it uses information from the request (and perhaps
+         additional information) to select a variant of that resource.
+
+      3. At this point, the server may apply a non-identity
+         content-coding to the instance, or one might have been inherent
+         in its generation. This also results in a Content-Encoding
+         header.
+
+      4. The result of the first three steps, at the time when the
+         request is processed, is an instance. The instance includes a
+         body (possibly empty) and possibly some instance headers. The
+         entity tag, if any, is assigned at this point. That is, an
+         entity tag is associated with an instance, NOT an entity.
+
+      5. The server may then apply an instance-manipulation. For
+         example, if the request included a Range header, the server may
+         optionally produce a range response, consisting of the original
+         set of headers, a Content-Range header, and the appropriate
+         range(s) from the (possibly encoded) body. Delta encodings are
+         instance-manipulations, and are computed at this stage.
+
+      6. The result of the fifth step becomes the entity, consisting of
+         entity headers and an entity body.
+
+      7. The server may then apply a non-identity transfer-coding;
+         on-the-fly compression could be done in this step. If so, a
+         Transfer-Encoding header is added to the message.
+
+      8. The result of the seventh step is the message, consisting of a
+         message body (the transfer-coded version of the entity body),
+         the entity headers, and additional response and general
+         headers.
+
+      Note: Section 14.13 of the HTTP/1.1 specification [10] says "The
+      Content-Length entity-header field indicates the size of the
+      entity-body." In other words, Content-Length measures the length
+      of an entity, not of an instance or of a variant.
For example, if
+      the message is a delta encoding, Content-Length gives the length
+      of the delta encoding, not the length of the current instance.
+
+   Diagrammatically, the sequence is:
+
+      datatype        operation leading to next datatype
+      ========        ==================================
+      resource
+         |            choose acceptable variant, if needed
+         v
+      variant
+         |            apply content-coding, if any
+         v
+      <unnamed>
+         |            compute/assign entity tag
+         v
+      instance
+         |            apply instance manipulation, if any
+         v            (delta encoding, range selection, etc.)
+      entity-body
+         |            apply transfer-coding, if any
+         v
+      message-body
+
+   This formalization of the HTTP message generation sequence has not
+   previously been described. However, it is clear that Range selection
+   needs to be done after the entity tag has been assigned and after any
+   content-coding has been applied, and before any transfer-coding is
+   applied. Therefore, this formalization is fully consistent with
+   previous practice and specification.
+
+4.1 Relationship between deltas and ranges
+
+   If both Ranges and delta encodings are forms of instance
+   manipulation, which should be applied first? This depends on how the
+   Range is being used.
+
+   Ranges are used for two main purposes, at the discretion of the
+   requesting client:
+
+      1. to complete a partial response after a premature termination of
+         a message transmission.
+
+      2. to obtain just selected sections of an instance.
+
+   In the first use of Range, it would have to be applied after any
+   delta encoding, since the intended use is to recover an intact copy
+   of the delta-encoded instance. In the second use of Range, it would
+   have to be applied before any delta encoding, because otherwise the
+
+Mogul, et al.
Standards Track [Page 11]
+
+   offsets specified in the Range request would be meaningless (the
+   client generally cannot know how a server's delta encoding maps
+   instance byte offsets to entity byte offsets).
+
+   Therefore, we need a mechanism to allow the client to specify the
+   order in which two or more instance-manipulations should be applied.
+   This is easily provided as part of the specification of the "A-IM"
+   header (see section 10.5.3), where we require that the server apply
+   instance-manipulations in the order that they are listed in the
+   "A-IM" header. We also include a "range" literal in the set of
+   registered instance-manipulations, to allow the client to specify (by
+   its ordering with respect to other instance-manipulations) whether
+   range selection is done before or after delta encoding.
+
+   We also need a mechanism for the server to indicate in which order
+   two or more instance-manipulations have been applied; this is part of
+   the specification of the "IM" header (see section 10.5.2), where we
+   follow the same practice used for the "Content-Encoding" header: the
+   "IM" header lists the instance-manipulations in the order that they
+   were applied (including, perhaps, the special "range" literal).
+
+   A similar issue arises when Ranges are combined with compression. If
+   the client is using a Range to complete a partial response after a
+   premature termination of a compressed message, then the Range would
+   have to be applied after the compression. This is feasible in
+   unmodified HTTP/1.1, because the compression can be done as a
+   content-coding. However, if the client is using a Range to obtain
+   selected sections of an instance, it would normally be able to
+   specify offsets only in terms of the uncompressed variant.
If the
+   selected portion was large enough to warrant compression, the client
+   could request a compressed transfer-coding, but this is a hop-by-hop
+   transformation and is not the most efficient approach (especially if
+   an HTTP/1.0 proxy is in the path).
+
+   We can resolve this issue by supporting the use of compression as an
+   instance-manipulation (in addition to its use as a content-coding or
+   transfer-coding), and by using the new mechanism that allows the
+   client to specify that the compression instance-manipulation is done
+   after the Range instance-manipulation.
+
+   This also allows the client to control whether compression is done
+   before or after delta encoding, since some simple differencing
+   algorithms (such as the UNIX "diff" command) require post-compression
+   of their output to yield the best results.
+
+5 Basic mechanisms
+
+   In this section, we explain the concepts behind delta encoding. This
+   is not meant as a formal specification of the proposed extensions;
+   see section 10 for that.
+
+5.1 Background: an overview of HTTP cache validation
+
+   When a client has a response in its cache, and wishes to ensure that
+   this cache entry is current, HTTP/1.1 allows the client to do a
+   "conditional GET", using one of two forms of "cache validators." In
+   the traditional form, available in both HTTP/1.0 and in HTTP/1.1, the
+   client may use the "If-Modified-Since" request-header to present to
+   the server the "Last-Modified" timestamp (if any) that the server
+   provided with the response. If the server's timestamp for the
+   resource has not changed, it may send a response with a status code
+   of 304 (Not Modified), which does not transmit the body of the
+   resource. If the timestamp has changed, the server would normally
+   send a response with a status code of 200 (OK), which carries a
+   complete copy of the resource, and a new Last-Modified timestamp.
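The timestamp-based validation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a server implementation: the function name and its two string arguments are hypothetical, both values are assumed to be HTTP-date strings in GMT, and anything the parser rejects is treated as an unconditional request.

```python
from email.utils import parsedate_to_datetime


def status_for_conditional_get(if_modified_since, last_modified):
    """Pick 304 or 200 for a GET, given two HTTP-date strings.

    if_modified_since is the value the client presented (None for an
    unconditional request); last_modified is the server's current value.
    """
    if if_modified_since is None:
        return 200  # unconditional request: send the full resource
    try:
        presented = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable validator: fall back to a full response
    current = parsedate_to_datetime(last_modified)
    # Not modified since the client's copy was fetched: no body is needed.
    return 304 if current <= presented else 200
```

Because HTTP-dates have one-second resolution, this check cannot detect two changes made within the same second.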
+ + This timestamp-based approach is prone to error because of the lack + of timestamp resolution: if a resource changes twice during one + second, the change might not be detectable. Therefore, HTTP/1.1 also + allows the server to provide an entity tag with a response. An + entity tag is an opaque string, constructed by the server according + to its own needs; the protocol specification imposes a bare minimum + of requirements on entity tags. (In particular, a "strong" entity + tag must change if the value of the resource changes.) In this case, + the client may validate its cache entry by sending its conditional + request using the "If-None-Match" request-header, presenting the + entity tag associated with the cached response. (The protocol + defines several other ways to transmit entity tags, such as the "If- + Range" header, used for short-circuiting an otherwise necessary round + trip.) If the presented entity tag matches the server's current tag + for the resource, the server should send a 304 (Not Modified) + response. Otherwise, the server should send a 200 (OK) response, + along with a complete copy of the resource. + + In the existing HTTP protocol (HTTP/1.0 or HTTP/1.1), a client + sending a conditional request can expect either of two responses: + + - status = 200 (OK), with a full copy of the resource, because + the server's copy of the resource is presumably different from + the client's cached copy. + + + + + + +Mogul, et al. Standards Track [Page 13] + +RFC 3229 Delta encoding in HTTP January 2002 + + + - status = 304 (Not Modified), with no body, because the server's + copy of the resource is presumably the same as the client's + cached copy. + + Informally, one could think of these as "deltas" of 100% and 0% of + the resource, respectively. Note that these deltas are relative to a + specific cached response. That is, a client cannot request a delta + without specifying, somehow, which two instances of a resource are + being differenced. 
The "new" instance is implicitly the current
+   instance that the server would return for an unconditional request,
+   and the "old" instance is the one that is currently in the client's
+   cache. The cache validator (last-modified time or entity tag) is
+   what is used to communicate to the server the identity of the old
+   instance.
+
+5.2 Requesting the transmission of deltas
+
+   In order to support the transmission of actual deltas, an extension
+   to HTTP/1.1 needs to provide these features:
+
+      1. A way to mark a request as conditional.
+
+      2. A way to specify the old instance, to which the delta will be
+         applied by the client.
+
+      3. A way to indicate that the client is able to apply one or more
+         specific forms of delta encoding.
+
+      4. A way to mark a response as being delta-encoded in a particular
+         format.
+
+   The first two features are already provided by HTTP/1.1: the presence
+   of a conditional request-header (such as "If-Modified-Since" or
+   "If-None-Match") marks a request as conditional, and the value of
+   that header uniquely specifies the old instance (ignoring the problem
+   of last-modified timestamp granularity).
+
+   We defer discussion of the fourth feature until section 5.6.
+
+   The third feature, a way for the client to indicate that it is able
+   to apply deltas (aside from the trivial 0% and 100% deltas), can be
+   accomplished by transmitting a list of acceptable delta-encoding
+   formats in a request-header field; specifically, the "A-IM" header.
+   The presence of this list in a conditional request indicates that the
+   client is able to apply delta-encoded cache updates.
+
+   For example, a client might send this request:
+
+      GET /foo.html HTTP/1.1
+      Host: bar.example.net
+      If-None-Match: "123xyz"
+      A-IM: vcdiff, diffe, gzip
+
+   The meaning of this request is that:
+
+      - The client wants to obtain the current value of /foo.html.
+ + - It already has a cached response (instance) for that resource, + whose entity tag is "123xyz". + + - It is willing to accept delta-encoded updates using either of + two formats, "diffe" (i.e., output from the UNIX "diff -e" + command), and "vcdiff". (Encoding algorithms and formats, such + as "vcdiff", are described in section 6.) + + - It is willing to accept responses that have been compressed + using "gzip," whether or not these are delta-encoded. (It + might be useful to compress the output of "diff -e".) However, + based on the mandatory ordering constraint specified in section + 10.5.3, if both delta encoding and compression are applied, + then this "A-IM" request header specifies that compression + should be done last. + + If, in this example, the server's current entity tag for the resource + is still "123xyz", then it should simply return a 304 (Not Modified) + response, as would a traditional server. + + If the entity tag has changed, presumably but not necessarily because + of a modification of the resource, the server could instead compute + the delta between the instance whose entity tag was "123xyz" and the + current instance. + + We defer discussion of what the server needs to store, in order to + compute deltas, until section 7. + + We note that if a client indicates it is willing to accept deltas, + but the server does not support this form of instance-manipulation, + the server will simply ignore this aspect of the request. (HTTP + always allows an implementation to ignore a header that is not + required by a specification that the implementation complies with, + and the specification of "A-IM" allows the server to ignore an + instance-manipulation it does not understand.) So if a server either + does not implement the A-IM header at all, or does not implement any + + + + +Mogul, et al. 
Standards Track [Page 15] + +RFC 3229 Delta encoding in HTTP January 2002 + + + of the instance manipulations listed in the A-IM header, it acts as + if the client had not requested a delta-encoded response: the server + generates a status-200 response. + +5.3 Choice of delta algorithm and format + + The server is not required to transmit a delta-encoded response. For + example, the result might be larger than the current size of the + resource. The server might not be able to compute a delta for this + type of resource (e.g., a compressed binary format); the server might + not have sufficient CPU cycles for the delta computation; the server + might not support any of the delta formats supported by the client; + or, the network bandwidth might be high enough that the delay + involved in computing the delta is not worth the delay avoided by + sending a smaller response. + + However, if the server does want to compute a delta, and the set of + encodings it supports has more than one encoding in common with the + set offered by the client, which encoding should it use? This is + mostly at the option of the server, although the client can express + preferences using "Quality Values" (or "qvalues") in the "A-IM" + header. The HTTP/1.1 specification [10] describes qvalues in more + detail. (Clients may prefer one delta encoding format over another + that generates a smaller encoding, if the decoding costs for the + first format are lower and the client is resource-constrained.) + + Server implementations have a number of possible approaches. For + example, if CPU cycles are plentiful and network bandwidth is scarce, + the server might compute each of the possible encodings and then send + the smallest result. Or the server might use heuristics to choose an + encoding format, based on things such as the content-type of the + resource, the current size of the resource, and the expected amount + of change between instances of the resource. 
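The qvalue-based choice described in section 5.3 can be sketched as follows. This is only one possible server policy, not a required one: it picks the mutually supported delta format with the highest qvalue and breaks ties in favor of the format listed first. The function names and the "server_formats" parameter are hypothetical, and the parser assumes that no quoted-string parameter contains a comma or semicolon.

```python
def parse_a_im(value):
    """Parse an A-IM header value into (token, qvalue) pairs, in header order."""
    prefs = []
    for element in value.split(","):
        element = element.strip()
        if not element:
            continue
        parts = [p.strip() for p in element.split(";")]
        token = parts[0].lower()
        q = 1.0  # a missing qvalue is treated as 1.0
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        prefs.append((token, q))
    return prefs


def choose_delta_format(a_im_value, server_formats):
    """Return the shared delta format with the highest qvalue, or None.

    server_formats is the (hypothetical) set of delta formats this
    server can generate; ties go to the format the client listed first.
    """
    best = None
    for token, q in parse_a_im(a_im_value):
        if q <= 0 or token not in server_formats:
            continue
        if best is None or q > best[1]:
            best = (token, q)
    return best[0] if best else None
```

A server that returns None here would simply fall back to a status-200 response, as section 5.2 describes.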
+ + Note that it might pay to cache the deltas internally to the server, + if a resource is typically requested by several different delta- + capable clients between modifications. In this case, the cost of + computing a delta may be amortized over many responses, and so the + server might use a more expensive computation. + +5.4 Identification of delta-encoded responses + + A response using delta encoding must be identified as such. This is + done using the "IM" response-header, specified in section 10.5.2. + + However, a simplistic application of this approach would cause + serious problems if a delta-encoded response flows through an + intermediate (proxy) cache that is not cognizant of the delta + + + +Mogul, et al. Standards Track [Page 16] + +RFC 3229 Delta encoding in HTTP January 2002 + + + mechanism. Because the Internet still includes a significant number + of HTTP/1.0 caches, which might never be entirely replaced, and + because the HTTP specifications insist that message recipients ignore + any header field that they do not understand, a non-delta-capable + proxy cache that receives a delta-encoded response might store that + response, and might later return it to a non-delta-capable client + that has made a request for the same resource. This naive client + would believe that it has received a valid copy of the entire + resource, with predictably unpleasant results. + + To solve this problem, we propose that delta-encoded responses + (actually, all instance-manipulated responses) be identified as such + using a new HTTP status code. For specificity in the discussion that + follows, we will use the (currently unassigned) code of 226, with a + reason phrase of "IM Used". (We see no benefit in spelling out the + words "Instance Manipulation Used," since this requires the + transmission of unnecessary bytes, and this Reason-phrase should not + normally be seen by human users.) 
There is some precedent for this + approach: the HTTP/1.1 specification introduces the 206 (Partial + Content) status code, for the transmission of sub-ranges of a + resource. Existing proxies apparently forward responses with unknown + status codes, and do not attempt to cache them. + + An alternative to using a new status code would be to use the + "Expires" header to prevent HTTP/1.0 caches from storing the + response, then use "Cache-Control: max-age" (defined in HTTP/1.1) to + allow more modern caches to store delta-encoded responses. This adds + many bytes to the response headers, and so would reduce the + effectiveness of delta encoding. It is also not entirely clear that + this approach suppresses all caching by all HTTP/1.0 proxies. + + We were reluctant to define an additional status code as part of + the support for delta encoding. However, we see no other + efficient way to remain compatible with the deployed base of + HTTP/1.0 cache implementations. + +5.5 Guaranteeing cache safety + + Although we are not aware of any HTTP/1.1 proxy implementations that + would attempt to cache a response with an unknown 2xx status code, + the HTTP/1.1 specification does allow this behavior if the response + carries an Expires or Cache-Control header field that explicitly + allows caching. This would present a problem when a 226 (IM Used) + response carries such headers. + + + + + + + +Mogul, et al. Standards Track [Page 17] + +RFC 3229 Delta encoding in HTTP January 2002 + + + The solution in that case is to exploit the Cache Control Extensions + mechanism from the HTTP/1.1 specification. We define a new cache- + directive, "im", which indicates that the "no-store" cache-directive + may be ignored by implementations that conform to the specification + for the IM and A-IM headers. + + For example, this response: + + HTTP/1.1 226 IM Used + ETag: "489uhw" + IM: vcdiff + Date: Tue, 25 Nov 1997 18:30:05 GMT + Cache-Control: no-store, im, max-age=30 + + ... 
+ + "MUST NOT" be stored by a cache that complies with the HTTP/1.1 + specification (which states that the max-age cache-directive "implies + that the response is cacheable [...] unless some other, more + restrictive cache directive is also present."). However, a cache + that does comply with the specification for the im cache-directive + (i.e., a cache that complies with the specification for the A-IM and + IM header fields, and the 226 status code) ignores the no-store + directive, and therefore sees the max-age directive as allowing + caching. + + We are not entirely sure that all HTTP/1.1 caches obey the rule + that the max-age directive is overridden by the no-store + directive. If operational testing reveals this to be a problem, + more elaborate solutions are possible. + + Warning to origin server implementors: it does not suffice to send + + Vary: If-None-Match, A-IM + + in status-226 responses. We have discovered at least one scenario + where this does not prevent a proxy cache that does not implement IM + and A-IM from incorrectly "validating" a cached 226 response. + +5.6 Transmission of delta-encoded responses + + A delta-encoded response differs from a standard response in four + ways: + + 1. It carries a status code of 226 (IM Used). + + 2. It carries an "IM" response-header field, indicating which + delta encoding is used in this response. + + + +Mogul, et al. Standards Track [Page 18] + +RFC 3229 Delta encoding in HTTP January 2002 + + + 3. Its message-body is a delta encoding of the current instance, + rather than a full copy of the instance. + + 4. It might carry several other new headers, as described later in + this document. + + For example, a response to the request given in section 5.2 might + look like: + + HTTP/1.1 226 IM Used + ETag: "489uhw" + IM: vcdiff + Date: Tue, 25 Nov 1997 18:30:05 GMT + + ... + + (We do not show the actual contents of the response body, since this + is a binary format.) 
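A client can tell the three outcomes of a delta-capable conditional request apart from the status code and the "IM" header, as in the following minimal sketch. The function name, the lower-cased header dictionary, and the decision to raise an error for a 226 response that lacks an "IM" header are assumptions of this sketch.

```python
# Delta-coding tokens named in this document (section 10.1).
DELTA_CODINGS = {"vcdiff", "diffe", "gdiff"}


def classify_response(status, headers):
    """Classify the response to a conditional GET that carried A-IM.

    headers is assumed to be a dict with lower-cased field names.
    Returns (kind, im_tokens).
    """
    if status == 304:
        return ("unchanged", [])  # the cached instance is still current
    if status == 200:
        return ("full", [])  # the body is a complete instance
    if status == 226:
        im = [t.strip().lower() for t in headers.get("im", "").split(",") if t.strip()]
        if not im:
            # Assumption of this sketch: treat a 226 without IM as an error.
            raise ValueError("226 response without an IM header")
        if any(t in DELTA_CODINGS for t in im):
            return ("delta", im)
        return ("instance-manipulated", im)
    raise ValueError("status code not handled by this sketch: %d" % status)
```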
+ + Note: the Etag header in a 226 response with a delta encoding + provides the entity tag of the current instance of the resource + variant. It is not meaningful to associate an entity tag with the + delta value, which is not an instance. + +5.7 Examples of requests combining Range and delta encoding + + In the example used in section 5.2, the client sends: + + GET /foo.html HTTP/1.1 + Host: bar.example.net + If-None-Match: "123xyz" + A-IM: vcdiff, diffe, gzip + + and the server either responds with a 304 (Not Modified) response, or + with the appropriate delta encoding. + + Here are a few more examples, to clarify how the client request + should be interpreted. + + If the client sends + + GET /foo.html HTTP/1.1 + Host: bar.example.net + If-None-Match: "123xyz" + A-IM: vcdiff, diffe, gzip, range + Range: bytes=0-99 + + + + + +Mogul, et al. Standards Track [Page 19] + +RFC 3229 Delta encoding in HTTP January 2002 + + + then the meaning is the same as in the example above, except that + after the delta encoding (and compression, if any) is computed, the + server then returns only the first 100 bytes of the output of the + delta encoding. (If it is shorter than 100 bytes, the entire delta + encoding is returned.) Because the "range" token appears last in the + "A-IM" header, this tells the origin server to apply any range + selection after the other instance-manipulations. + + The interaction between the If-Range mechanism and delta encoding is + somewhat complex. (If-Range means, informally, "if the entity is + unchanged, send me the part(s) that I am missing; otherwise, send me + the entire new entity.") Here is an example that should clarify the + use of this combination. + + Suppose that the client wants to have the complete current instance + of http://bar.example.net/foo.html. 
It already has a (complete) + cache entry for this URI, with entity tag "A", so it issues this + request: + + GET /foo.html HTTP/1.1 + host: bar.example.net + If-None-Match: "A" + A-IM: vcdiff + + Suppose that the server's current instance has entity tag "B", and + that the server also has retained a copy of the instance with entity + tag "A". Then, the server could compute the difference between "B" + and "A", and respond with: + + HTTP/1.1 226 IM Used + Etag: "B" + IM: vcdiff + Date: Tue, 25 Nov 1997 18:30:05 GMT + Content-Length: 1000 + + ... + + but the network connection is terminated after the client has + received exactly 900 bytes of the message body for the delta-encoded + content. + + The client wants to retrieve the remaining 100 bytes of the delta + encoding that was being sent in the interrupted response. It + therefore should send: + + + + + + + +Mogul, et al. Standards Track [Page 20] + +RFC 3229 Delta encoding in HTTP January 2002 + + + GET /foo.html HTTP/1.1 + host: bar.example.net + If-None-Match: "A" + If-Range: "B" + A-IM: vcdiff,range + Range: bytes=900- + + This rather elaborate request has a well-defined meaning, which + depends on the current entity tag Tcur of the instance when the + server receives the request: + + Tcur = "A" (i.e., for some reason, the instance has reverted to + the value already in the client's cache). The server + should return a 304 (Not Modified) response, as + required by the HTTP/1.1 specification for "If-None- + Match". + + Tcur = "B" (i.e., the instance has not changed again). The + HTTP/1.1 specification for "If-None-Match", in this + case, is that the header field is ignored (by a + server that does not understand delta encoding). + Therefore, this is equivalent to the client's + previous request, except that the Range selection is + applied after the vcdiff instance manipulation (if + both are to be applied). 
So the (delta-aware) server + again computes the delta between the "A" instance and + the "B" instance (or uses a cached computation of the + delta), then applies the Range selection, and returns + a 226 (IM Used) response, with a message-body + containing bytes 900 to 999 of the result of the + vcdiff encoding, with an "IM:vcdiff,range" response + header. + + Tcur = "C" (i.e., the instance has changed again). In this + case, the HTTP/1.1 specification for "If-None-Match" + again means that this is equivalent to an + unconditional request for the current instance. The + specification for "If-Range" requires the server to + return the entire current instance. However, a + delta-aware server can construct the delta between + the "A" instance described by the "If-None-Match" + field and the current ("C") instance, and return a + 226 (IM Used) response, with an "IM:vcdiff" response + header. + + If the client's request had not included the "If-None-Match: "A"" + header field, the server could not have computed a delta, since it + would not have known which entire instance was already available to + the client. If the request had not included the "If-Range: "B"" + header field, the server could not have distinguished between the + latter two cases (Tcur = "B" or Tcur = "C") and would not have been + able to apply the Range selection to the result of delta encoding. + + On the other hand, suppose that the client has a cache entry for the + "A" instance of http://bar.example.net/foo.html, and it has already + received the first 900 bytes of a new instance "B" (perhaps as the + result of an aborted transfer). 
Now the client wants to receive the + entire current instance, so it could send this request: + + GET /foo.html HTTP/1.1 + host: bar.example.net + If-None-Match: "A" + If-Range: "B" + A-IM: range,vcdiff + Range: bytes=900- + + In this example, as in the previous example, if Tcur = "A" then the + server should send 304 (Not Modified), and if Tcur = "C", then the + server should send the entire new instance, either as a 200 response + or as a delta encoding against instance "A". + + However, if Tcur = "B", in this case the server should first select + the specified range (bytes 900 through the end) from both instances + "A" and "B", then compute the delta encoding between these ranges + (using vcdiff), and then transmit the result using a 226 (IM Used) + response with an "IM:range,vcdiff" response header. + +6 Encoding algorithms and formats + + A number of delta encoding algorithms and formats have been described + in the literature: + + diff -e The UNIX "diff" program is ubiquitously available, + and is relatively fast for both encoding and decoding + (decoding is actually done using the "ed" program). + However, the size of the resulting deltas is + relatively large. This algorithm can only be used on + text-format files. + + diff -e | gzip Running the output of "diff" through a compression + algorithm such as "gzip" [5] (or, perhaps better, + "deflate" [7, 6]) yields a more compact encoding, but + the costs of encoding and decoding are much higher + than for "diff" by itself. This algorithm can only + be used on text-format files. + + + + +Mogul, et al. Standards Track [Page 22] + +RFC 3229 Delta encoding in HTTP January 2002 + + + vcdiff (vdelta) The algorithm that generates the "vcdiff" format [19, + 20] inherently compresses its output, and generally + produces smaller results than the combination of + "diff" and "gzip". The algorithm also runs much + faster, and can be applied to binary-format input. 
+ The "vcdiff" format is based on previous work on an + algorithm named "vdelta." (Note that the "vcdiff" + format can be used either for delta encoding or as a + compressed format, so two different instance- + manipulation values would have to be registered in + order to distinguish these two uses, should its use + as a compressed format be adopted.) The most recent + published study suggests that "vdelta" is the best + overall delta algorithm [16]. + + gdiff The gdiff format [14] was specified as a generic, + algorithm-independent format for expressing deltas. + Because it is more generic it is easy to implement, + but it may not be the most compact encoding format. + + Our proposal does not recommend any specific algorithm or format, but + rather encourages client and server implementors to choose the most + appropriate one(s). However, to avoid the possibility of excessively + long "A-IM" headers, we suggest that, after some period of + experimentation, it might be reasonable to specify a "recommended" + set of delta formats for general-purpose HTTP implementations. + + We suspect that it should be possible to devise a delta encoding + algorithm appropriate for use on typical image encodings, such as GIF + and JPEG. Although experiments with vdelta have not shown much + potential [23], this may simply be because these experiments used + vdelta directly on the already-compressed forms of these encodings. + However, it might be necessary to devise a delta encoding algorithm + that is aware of the two-dimensional nature of images. We have some + expectation that this is possible, since MPEG compression relies on + computing deltas between successive frames of a video stream. + +7 Management of base instances + + If the time between modifications of a resource is less than the + typical eviction time for responses in client caches, this means that + the "old instance" indicated in a client's conditional request might + not refer to the most recent prior instance. 
This raises the + question of how many old instances of a resource should be maintained + by the server, if any. We call these old instances "base instances." + + + + + + +Mogul, et al. Standards Track [Page 23] + +RFC 3229 Delta encoding in HTTP January 2002 + + + There are many possible options for server implementors. For + example: + + - The server might not store any old instances, and so would + never respond with a delta. + + - The server might only store the most recent prior instance; + requests attempting to validate this instance could be answered + with a delta, but requests attempting to validate older + instances would be answered with a full copy of the resource. + + - The server might store all prior instances, allowing it to + provide a delta response for any client request. + + - The server might store only a subset of the prior instances. + The use of a Least Recently Used (LRU) algorithm to determine + this kind of subset has proved effective in some similar + circumstances, such as cache replacement. + + The server might not have to store prior instances explicitly. It + might, instead, store just the deltas between specific base instances + and subsequent instances (or the inverse deltas between base + instances and prior instances). This approach might be integrated + with a cache of computed deltas. + + None of these approaches necessarily requires additional protocol + support. However, if a server administrator wants to store only a + subset of the prior instances, but would like the server to be able + to respond using deltas as often as possible, then the client needs + some additional information. Otherwise, the client's "If-None-Match" + header might specify a base instance not stored at the server, even + though an appropriate base instance is held in the client's cache. + + We identify two additional protocol changes to help solve this + problem. 
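As an illustration of the "subset of the prior instances" option, the following sketch keeps a bounded number of base instances and evicts the least recently used one. The class name, its methods, and the in-memory storage are assumptions made for the example; a real server might store deltas or use persistent storage instead.

```python
from collections import OrderedDict


class BaseInstanceStore:
    """Hypothetical LRU store of base instances, keyed by entity tag."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._instances = OrderedDict()  # entity tag -> instance body (bytes)

    def put(self, etag, body):
        """Remember an instance, evicting the least recently used if full."""
        self._instances[etag] = body
        self._instances.move_to_end(etag)
        while len(self._instances) > self.capacity:
            self._instances.popitem(last=False)

    def get(self, etag):
        """Return the stored body for etag, or None if it was never
        stored or has been evicted; a hit marks the entry as recent."""
        body = self._instances.get(etag)
        if body is not None:
            self._instances.move_to_end(etag)
        return body
```

When get() returns None for every entity tag in the request, the server has no usable base instance and answers with a full copy of the resource.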
+ +7.1 Multiple entity tags in the If-None-Match header + + Although the examples we have given so far show only one entity tag + in an "If-None-Match" header, the HTTP/1.1 specification allows the + header to carry more than one entity-tag. This feature was included + in HTTP/1.1 to support efficient caching of multiple variants of a + resource, but it is not restricted to that use. + + Suppose that a client has kept more than one instance of a resource + in its cache. That is, not only does it keep the most recent + instance, but it also holds onto copies of one or more prior, invalid + instances. (Alternatively, it might retain sufficient delta or + + + +Mogul, et al. Standards Track [Page 24] + +RFC 3229 Delta encoding in HTTP January 2002 + + + inverse-delta information to reconstruct older instances.) In this + case, it could use its conditional request to tell the server about + all of the instances it could apply a delta to. For example, the + client might send: + + GET /foo.html HTTP/1.1 + host: bar.example.net + If-None-Match: "123xyz", "337pey", "489uhw" + A-IM: vcdiff + + to indicate that it has three instances of this resource in its + cache. If the server is able to generate a delta from any of these + prior instances, it can select the appropriate base instance, compute + the delta, and return the result to the client. + + In this case, however, the server must also tell the client which + base instance to use, and so we need to define a response header, + named "Delta-Base", for this purpose. For example, the server might + reply: + + HTTP/1.1 226 IM Used + ETag: "1acl059" + IM: vcdiff + Delta-Base: "337pey" + Date: Tue, 25 Nov 1997 18:30:05 GMT + + This response tells the client to apply the delta to the cached + response with entity tag "337pey", and to associate the entity tag + "1acl059" with the result. 
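The client-side handling of "Delta-Base" can be sketched as below. The delta decoder is passed in by the caller as "apply_delta", a hypothetical callable, because no particular decoder (for example, for vcdiff) is assumed to be available; the cache is modeled as a plain dictionary from entity tag to instance body, and header names are assumed to be lower-cased.

```python
def apply_delta_response(cache, requested_etags, headers, delta_body, apply_delta):
    """Apply a 226 delta response to the right cached base instance.

    cache:           dict mapping entity tag -> instance body (bytes)
    requested_etags: entity tags the client sent in If-None-Match
    headers:         response headers, lower-cased names (assumption)
    apply_delta:     hypothetical decoder, called as apply_delta(base, delta)
    """
    delta_base = headers.get("delta-base")
    if delta_base is None:
        # Without Delta-Base, the base is only known if exactly one
        # entity tag was offered in the request.
        if len(requested_etags) != 1:
            raise ValueError("cannot determine base instance without Delta-Base")
        delta_base = requested_etags[0]
    base_body = cache[delta_base]
    new_body = apply_delta(base_body, delta_body)
    # Associate the result with the entity tag of the current instance.
    cache[headers["etag"]] = new_body
    return new_body
```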
+ + Of course, if the server has retained more than one of the prior + instances identified by the client, this could complicate the problem + of choosing the optimal delta to return, since now the server has a + choice not only of the delta format, but also of the base instance to + use. + +7.2 Hints for managing the client cache + + Support for multiple entity tags in choosing the base instance + implies that a client might benefit from storing multiple old + instances of a resource in its cache. A client with finite space + would not want to keep all old instances, so it must manage its cache + for maximal effectiveness by saving those instances most likely to be + useful for future deltas. Although this could be accomplished using + information purely local to the client (e.g., an LRU algorithm), + certain "hint" information from the server could improve the client's + ability to manage its cache. The use of hints for improving Web + cache performance has been described previously [4, 22]. + + + +Mogul, et al. Standards Track [Page 25] + +RFC 3229 Delta encoding in HTTP January 2002 + + + If the server intends to retain certain instances and not others, it + can label the responses that transmit the retained instances. This + would help the client manage its cache, since it would not have to + retain all prior instances on the possibility that only some of them + might be useful later. The label is a hint to the client, not a + promise that the server will indefinitely retain an instance. + + We propose adding a new directive to the existing "Cache-Control" + header for this purpose, named "retain". For example, in response to + an unconditional request, the server might send: + + HTTP/1.1 200 OK + ETag: "337pey" + Date: Tue, 25 Nov 1997 18:30:05 GMT + Cache-Control: retain + + to suggest that a delta-capable client should retain this instance. 
+ The "retain" directive could also appear in a delta response, + referring to the current instance: + + HTTP/1.1 226 IM Used + ETag: "1acl059" + Date: Tue, 25 Nov 1997 18:30:05 GMT + Cache-Control: retain + IM: vcdiff + Delta-Base: "337pey" + + The "retain" directive includes an optional timeout parameter, which + the server can use if it expects to delete an old base instance at a + particular time. For example, + + HTTP/1.1 200 OK + ETag: "337pey" + Date: Tue, 25 Nov 1997 18:30:05 GMT + Cache-Control: retain=3600 + + means that the server intends to retain this base instance for one + hour. + + Another situation where a server can provide a hint to a client is + where the server supports the delta mechanism in general, but does + not intend to provide delta-encoded responses for a particular + resource. By sending a "retain=0" directive, it indicates that the + client should not waste request-header bytes attempting to obtain a + delta-encoded response using this base instance (and, by implication, + for this resource). It also indicates that the client ought not + waste cache space on this instance after it has become stale. To + avoid wasting response-header bytes, a server ought not send + "retain=0", except in reply to a request that attempts to obtain a + delta-encoded response. + + Note that the "retain" directive is orthogonal to the "max-age" + directive. The "max-age" directive indicates how long a cache + entry remains fresh (i.e., can be used without contacting the + origin server for revalidation); the "retain" directive is of + interest to a client AFTER the cache entry has become stale. + + In practice, the "Cache-Control" response-header field might already + be present, so the cost (in bytes) of sending this directive might be + smaller than these examples imply. 
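A client might extract the "retain" hint from a "Cache-Control" value along the following lines. This is a minimal sketch: the function name and return convention are assumptions, and the splitting on commas assumes that no other directive carries a quoted-string containing a comma.

```python
def parse_retain(cache_control):
    """Return (present, seconds) for the "retain" directive.

    seconds is None when "retain" appears without a timeout, and 0 for
    "retain=0", which hints that the instance is not worth keeping as
    a base for future deltas.
    """
    for directive in cache_control.split(","):
        name, sep, value = directive.strip().partition("=")
        if name.strip().lower() == "retain":
            return (True, int(value) if sep else None)
    return (False, None)
```

For example, parse_retain("retain=3600") reports a one-hour retention hint, matching the example above.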
+ +8 Deltas and intermediate caches + + Although we have designed the delta-encoded responses so that they + will not be stored by naive proxy caches, if a proxy does understand + the delta mechanism, it might be beneficial for it to participate in + sending and receiving deltas. + + A proxy could participate in several independent ways: + + - In addition to forwarding a delta-encoded response, the proxy + might store it, and then use it to reply to a subsequent + request with a compatible "If-None-Match" field (i.e., one that + is either a superset of the corresponding field of the request + that first elicited the response, or one that includes the + "Delta-Base" value in the cached response), and with a + compatible "IM" response-header field (one that includes the + actual delta-encoding format used in the response.) Of course, + such uses are subject to all of the other HTTP rules concerning + the validity of cache entries. + + - In addition to forwarding a delta-encoded response, the proxy + might apply the delta to the appropriate entry in its own + cache, which could then be used for later responses (even from + non-delta-capable clients). + + - When the proxy receives a conditional request from a delta- + capable client, and the proxy has a complete copy of an up-to- + date ("fresh," in HTTP/1.1 terminology) response in its cache, + it could generate a delta locally and return it to the + requesting client. + + - When the proxy receives a request from a non-delta-capable + client, it might convert this into a delta request before + forwarding it to the server, and then (after applying a + + + +Mogul, et al. Standards Track [Page 27] + +RFC 3229 Delta encoding in HTTP January 2002 + + + resulting delta response to one of its own cache entries) it + would return a full-body response to the client (or a response + with status code 206 or 304, as appropriate). 
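The compatibility test in the first item above might be sketched as follows. This is our reading of that item, not a normative rule: we take the format condition to mean that the new request's "A-IM" list includes every instance-manipulation named in the stored response's "IM" header. All names are hypothetical, and freshness and the other HTTP cache validity rules are deliberately left out.

```python
def can_reuse_stored_delta(request_etags, request_a_im,
                           original_request_etags, stored_delta_base, stored_im):
    """Decide whether a stored 226 response may answer a new request.

    request_etags:          entity tags in the new If-None-Match field
    request_a_im:           tokens in the new A-IM field
    original_request_etags: entity tags of the request that first
                            elicited the stored response
    stored_delta_base:      Delta-Base value of the stored response, or None
    stored_im:              tokens in the stored response's IM field
    """
    etags = set(request_etags)
    is_superset = bool(original_request_etags) and etags >= set(original_request_etags)
    names_base = stored_delta_base is not None and stored_delta_base in etags
    accepted = {t.lower() for t in request_a_im}
    format_ok = all(t.lower() in accepted for t in stored_im)
    return (is_superset or names_base) and format_ok
```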
+ + All of these optional techniques increase proxy software complexity, + and might increase proxy storage or CPU requirements. However, if + applied carefully, they should help to reduce the latencies seen by + end users, and load on the network. Generally, CPU speed and disk + costs are improving faster than network latencies, so we expect to + see increasing value available from complex proxy implementations. + +9 Digests for data integrity + + When a recipient reassembles a complete HTTP response from several + individual messages, it might be necessary to check the integrity of + the complete response. For example, the client's cache might be + corrupt, or the implementation of delta encoding (either at client or + server) might have a bug. + + HTTP/1.1 includes mechanisms for ensuring the integrity of individual + messages. A message may include a "Content-MD5" response header, + which provides an MD5 message digest of the body of the message (but + not the headers). The Digest Authentication mechanism [11] provides + a similar message-digest function, except that it includes certain + header fields. Neither of these mechanisms makes any provision for + covering a set of data transmitted over several messages, as would be + the case for the result of applying a delta-encoded response (or, for + that matter, a Range response). + + Data integrity for reassembled messages requires the introduction of + a new message header. Such a mechanism is proposed in a separate + document [24]. One might still want to use the Digest Authentication + mechanism, or something stronger, to protect delta messages against + tampering. + +10 Specification + + In this specification, the key words "MUST", "MUST NOT", "SHOULD", + "SHOULD NOT", and "MAY" are to be interpreted as described in RFC + 2119 [3]. + +10.1 Protocol parameter specifications + + This specification defines a new HTTP parameter type, an instance- + manipulation: + + + + + + +Mogul, et al. 
Standards Track [Page 28] + +RFC 3229 Delta encoding in HTTP January 2002 + + + instance-manipulation = token [imparams] + + imparams = ";" imparam-name [ "=" ( token | quoted-string ) ] + imparam-name = token + + Note that the imparam-name MUST NOT be "q", to avoid ambiguity with + the use of qvalues (see [10]). + + The set of instance-manipulation values is initially: + + - vcdiff + A delta using the "vcdiff" encoding format [19, 20]. + + - diffe + The output of the UNIX "diff -e" command [26]. + + - gdiff + The GDIFF encoding format [14]. + + - gzip + Same definition as the HTTP "gzip" content-coding. + + - deflate + Same definition as the HTTP "deflate" content-coding. + + - range + A token indicating that the result is partial content, as the + result of a range selection. + + - identity + A token used only in the A-IM header (not in the IM header), to + indicate whether or not the identity instance-manipulation is + acceptable. + + For convenience in the rest of this specification, we define a subset + of instance-manipulation values as delta-coding values: + + delta-coding = "vcdiff" | "diffe" | "gdiff" | token + + Future instance-manipulation values might also be included in this + list. + + + + + + + + + + +Mogul, et al. Standards Track [Page 29] + +RFC 3229 Delta encoding in HTTP January 2002 + + +10.2 IANA Considerations + + The Internet Assigned Numbers Authority (IANA) administers the name + space for instance-manipulation values. Values and their meaning + must be documented in an RFC or other peer-reviewed, permanent, and + readily available reference, in sufficient detail so that + interoperability between independent implementations is possible. + Subject to these constraints, name assignments are First Come, First + Served (see RFC 2434 [25]). + + This specification also inserts a new value in the IANA HTTP Status + Code Registry (see RFC 2817 [18]). See section 10.4.1 for the + specification of this code. 
+ +10.3 Basic requirements for delta-encoded responses + + A server MAY send a delta-encoded response if all of these conditions + are true: + + 1. The server would be able to send a 200 (OK) response for the + request. + + 2. The client's request includes an A-IM header field listing at + least one delta-coding. + + 3. The client's request includes an If-None-Match header field + listing at least one valid entity tag for an instance of the + Request-URI (a "base instance"). + + A delta-encoded response: + + - MUST carry a status code of 226 (IM Used). + + - MUST include an IM header field listing, at least, the delta- + coding employed. + + - MAY include a Delta-Base header field listing the entity tag of + the base-instance. + +10.4 Status code specifications + + The following new status code is defined for HTTP. + + + + + + + + + +Mogul, et al. Standards Track [Page 30] + +RFC 3229 Delta encoding in HTTP January 2002 + + +10.4.1 226 IM Used + + The server has fulfilled a GET request for the resource, and the + response is a representation of the result of one or more instance- + manipulations applied to the current instance. The actual current + instance might not be available except by combining this response + with other previous or future responses, as appropriate for the + specific instance-manipulation(s). If so, the headers of the + resulting instance are the result of combining the headers from the + status-226 response and the other instances, following the rules in + section 13.5.3 of the HTTP/1.1 specification [10]. + + The request MUST have included an A-IM header field listing at least + one instance-manipulation. The response MUST include an Etag header + field giving the entity tag of the current instance. + + A response received with a status code of 226 MAY be stored by a + cache and used in reply to a subsequent request, subject to the HTTP + expiration mechanism and any Cache-Control headers, and to the + requirements in section 10.6. 
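The three conditions of section 10.3, together with the header requirements for a 226 response, can be sketched as follows. The helper names (`pick_delta`, `delta_response_headers`), the plain-dict header representation, and the `known_etags` set are all hypothetical. The sketch ignores qvalues and splits header lists naively on commas, so it is an illustration under those assumptions rather than a complete server-side check.

```python
# Delta-codings named in section 10.1; case-insensitive matching is an assumption.
DELTA_CODINGS = ("vcdiff", "diffe", "gdiff")


def _items(value):
    """Split a comma-separated header value (naive: ignores commas inside quotes)."""
    return [part.strip() for part in value.split(",") if part.strip()]


def pick_delta(headers, can_send_200, known_etags):
    """Return (delta_coding, base_etag) when section 10.3 permits a delta, else None.

    `headers` is a dict of request header fields; `known_etags` is the set of
    entity tags for which the server still holds an instance.
    """
    if not can_send_200:                                   # condition 1
        return None
    offered = [i.split(";")[0].strip().lower() for i in _items(headers.get("A-IM", ""))]
    codings = [c for c in offered if c in DELTA_CODINGS]   # condition 2
    bases = [t for t in _items(headers.get("If-None-Match", ""))
             if t in known_etags]                          # condition 3
    if not codings or not bases:
        return None
    return codings[0], bases[0]


def delta_response_headers(coding, base_etag, current_etag):
    """Status code and header fields carried by a delta-encoded response."""
    # 226 and IM are required; Etag is required by section 10.4.1;
    # Delta-Base is optional here but cheap to include.
    return 226, {"IM": coding, "Etag": current_etag, "Delta-Base": base_etag}
```

Returning `None` simply means the server falls back to an ordinary response; the MAY in section 10.3 never obliges it to send a delta.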
+ + A response received with a status code of 226 MAY be used by a cache, + in conjunction with a cache entry for the base instance, to create a + cache entry for the current instance. + +10.5 Header specifications + + The following headers are defined, for use as entity-headers. (Due + to the terminological confusion discussed in section 3, some entity- + headers are more properly associated with instances than with + entities.) + +10.5.1 Delta-Base + + The Delta-Base entity-header field is used in a delta-encoded + response to specify the entity tag of the base instance. + + Delta-Base = "Delta-Base" ":" entity-tag + + A Delta-Base header field MUST be included in a response with an IM + header that includes a delta-coding, if the request included more + than one entity tag in its If-None-Match header field. + + Any response with an IM header that includes a delta-coding MAY + include a Delta-Base header. + + + + + + +Mogul, et al. Standards Track [Page 31] + +RFC 3229 Delta encoding in HTTP January 2002 + + + We are not aware of other cases where a delta-encoded response + MUST or SHOULD include a Delta-Base header, but we have not done + an exhaustive or formal analysis. Implementors might be wise to + include a Delta-Base header in every delta-encoded response. + + A cache or proxy that receives a delta-encoded response that lacks a + Delta-base header MAY add a Delta-Base header whose value is the + entity tag given in the If-None-Match field of the request (but only + if that field lists exactly one entity tag). + +10.5.2 IM + + The IM response-header field is used to indicate the instance- + manipulations, if any, that have been applied to the instance + represented by the response. Typical instance manipulations include + delta encoding and compression. + + IM = "IM" ":" #(instance-manipulation) + + Instance-manipulations are defined in section 10.1. 
+ + As a special case, if the instance-manipulations include both range + selection and at least one other non-identity instance-manipulation, + the IM header field MUST be used to indicate the order in which all + of these instance-manipulations, including range selection, were + applied. If the IM header lists the "range" instance-manipulation, + the response MUST include either a Content-Range header or a + multipart/byteranges Content-Type in which each part contains a + Content-Range header. (See section 10.10 for specific discussion of + combining delta encoding and multipart/byteranges.) + + Responses that include an IM header MUST carry a response status code + of 226 (IM Used), as specified in section 10.4.1. + + The server SHOULD omit the IM header if it would list only the + "range" instance-manipulation. Such responses would normally be sent + with response status code 206 (Partial Content), as specified by + HTTP/1.1 [10]. + + Examples of the use of the IM header include: + + IM: vcdiff + + This example indicates that the entity-body is a delta encoding of + the instance, using the vcdiff encoding. + + IM: diffe, deflate, range + + + + +Mogul, et al. Standards Track [Page 32] + +RFC 3229 Delta encoding in HTTP January 2002 + + + This example indicates that the instance has first been delta-encoded + using the diffe encoding, then the result of that has been compressed + using deflate, and finally one or more ranges of that compressed + encoding have been selected. + + IM: range, vcdiff + + This example indicates that one or more ranges of the instance have + been selected, and the result has then been delta encoded against + identical ranges of a previous base instance. + + A cache using a response received in reply to one request to reply to + a subsequent request MUST follow the rules in section 10.6 if the + cached response includes an IM header field. 
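Because the IM header lists instance-manipulations in the order they were applied, a recipient would presumably undo them in the reverse order. The sketch below illustrates this with the two single-input compression manipulations only, using Python's standard `gzip` and `zlib` modules (the HTTP "deflate" coding is the zlib format). The function names are hypothetical, and delta-codings and "range" are deliberately left out.

```python
import gzip
import zlib

# Single-input manipulations that the standard library can apply and undo.
ENCODERS = {"gzip": gzip.compress, "deflate": zlib.compress}
DECODERS = {"gzip": gzip.decompress, "deflate": zlib.decompress}


def im_names(im_header):
    """Names from an IM header value, in listed order (imparams dropped)."""
    return [item.split(";")[0].strip().lower()
            for item in im_header.split(",") if item.strip()]


def apply_in_order(im_header, body):
    """Apply each manipulation in the order the IM header lists it."""
    for name in im_names(im_header):
        body = ENCODERS[name](body)
    return body


def undo_in_reverse(im_header, body):
    """Undo the manipulations, last-applied first."""
    for name in reversed(im_names(im_header)):
        body = DECODERS[name](body)
    return body
```

With `IM: deflate, gzip`, the outermost layer of the body is gzip, so the recipient has to remove gzip before it can inflate the result.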
+ +10.5.3 A-IM + + The A-IM request-header field is similar to Accept, but restricts the + instance-manipulations (section 10.1) that are acceptable in the + response. As specified in section 10.5.2, a response may be the + result of applying multiple instance-manipulations. + + A-IM = "A-IM" ":" #( instance-manipulation + [ ";" "q" "=" qvalue ] ) + + When an A-IM request-header field includes one or more delta-coding + values, the request MUST contain an If-None-Match header field, + listing one or more entity tags from prior responses for the + request-URI. + + A server tests whether an instance-manipulation (among the ones it is + capable of employing) is acceptable, according to a given A-IM header + field, using these rules: + + 1. If the instance-manipulation is listed in the A-IM field, then + it is acceptable, unless it is accompanied by a qvalue of 0. + (As defined in section 3.9 of the HTTP/1.1 specification [10], + a qvalue of 0 means "not acceptable.") A server MUST NOT use a + non-identity instance-manipulation for a response unless the + instance-manipulation is listed in an A-IM header in the + request. + + 2. If multiple but incompatible instance-manipulations are + acceptable, then the acceptable instance-manipulation with the + highest non-zero qvalue is preferred. + + + + + + +Mogul, et al. Standards Track [Page 33] + +RFC 3229 Delta encoding in HTTP January 2002 + + + 3. The "identity" instance-manipulation is always acceptable, + unless specifically refused because the A-IM field includes + "identity;q=0". + + If an A-IM field is present in a request, and if the server cannot + send a response which is acceptable according to the A-IM header, + then the server SHOULD send an error response with the 406 (Not + Acceptable) status code. + + If a response uses more than one instance-manipulation, the + instance-manipulations MUST be applied in the order in which they + appear in the A-IM request-header field. 
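The acceptability rules above can be sketched as a small preference table. The function names are hypothetical. Treating a missing qvalue as 1.0 follows the usual HTTP convention, while ranking an unlisted "identity" below every listed value is an assumption of this sketch, and imparams other than `q` are ignored.

```python
def parse_a_im(value):
    """Map each instance-manipulation name in an A-IM value to its qvalue."""
    prefs = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # no qvalue given: fully acceptable
        for param in params.split(";"):
            key, _, val = param.strip().partition("=")
            if key.strip().lower() == "q" and val.strip():
                q = float(val)
        prefs[name.strip().lower()] = q
    return prefs


def acceptable(name, prefs):
    """Rules 1 and 3: listed with q > 0, or an unrefused 'identity'."""
    name = name.lower()
    if name in prefs:
        return prefs[name] > 0
    return name == "identity"


def preferred(candidates, prefs):
    """Rule 2: among acceptable candidates, pick the highest qvalue."""
    usable = [c for c in candidates if acceptable(c, prefs)]
    if not usable:
        return None  # the server SHOULD answer 406 (Not Acceptable)
    # An unlisted identity ranks last; that ordering is an assumption.
    return max(usable, key=lambda c: prefs.get(c.lower(), 0.0))
```

Given `A-IM: vcdiff, gdiff;q=0.3`, a server able to produce both formats would select vcdiff, and it would never select gzip because gzip is not listed.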
+ + The server's choice about whether to apply an instance-manipulation + SHOULD be independent of its choice to apply any subsequent two-input + instance-manipulations to the response. (Two-input instance- + manipulations include delta-codings, because they take two different + values as input. Compression and "range" instance-manipulations take + only one input. Other instance-manipulations may be defined in the + future.) + + Note: the intent of this requirement is to prevent the server from + generating a delta-encoded response that the client can only + decode by first applying an instance-manipulation encoding to its + cached base instance. A server implementor might wish to consider + what the client would logically have in its cache, when deciding + which instance-manipulations to apply prior to a delta-coding. + + Examples: + + A-IM: vcdiff, gdiff + + This example means that the client will accept a delta encoding in + either vcdiff or gdiff format. + + A-IM: vcdiff, gdiff;q=0.3 + + This example means that the client will accept a delta encoding in + either vcdiff or gdiff format, but prefers the vcdiff format. + + A-IM: vcdiff, diffe, gzip + + This example means that the client will accept a delta encoding in + either vcdiff or diffe format, and will accept the output of the + delta encoding compressed with gzip. It also means that the client + will accept a gzip compression of the instance, without any delta + encoding, because A-IM provides no way to insist that gzip be used + only if diffe is used. + + + +Mogul, et al. Standards Track [Page 34] + +RFC 3229 Delta encoding in HTTP January 2002 + + + It is left to the server implementor to choose useful combinations of + acceptable instance-manipulations (for example, following diffe by + gzip is useful, but following vcdiff by gzip probably is not useful). 
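The ordering requirement of this section (manipulations are applied in the order they appear in the A-IM field) lends itself to a simple consistency check on a response. The sketch below is illustrative only: `follows_a_im_order` is a hypothetical name, and qvalues and imparams are dropped before comparing.

```python
def _names(value):
    """Manipulation names from an A-IM or IM value, parameters dropped."""
    return [item.split(";")[0].strip().lower()
            for item in value.split(",") if item.strip()]


def follows_a_im_order(im_value, a_im_value):
    """True if every IM entry was requested and keeps the A-IM ordering."""
    requested = _names(a_im_value)
    used = _names(im_value)
    if any(name not in requested for name in used):
        return False
    positions = [requested.index(name) for name in used]
    # Positions must strictly increase: same relative order, no repeats.
    return all(a < b for a, b in zip(positions, positions[1:]))
```

For the last example above, `A-IM: vcdiff, diffe, gzip`, a response carrying `IM: diffe, gzip` or just `IM: gzip` passes the check, while `IM: gzip, diffe` does not.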
+ +10.6 Caching rules for 226 responses + + When a client or proxy receives a 226 (IM Used) response, it MAY use + this response to create a cache entry in three ways: + + 1. It MAY decode all of the instance-manipulations to recover the + original instance, and store that instance in the cache. In + this case, the recovered instance is stored as a status-200 + response, and MUST be used in accordance with the normal HTTP + caching rules. + + 2. It MAY decode all of the instance-manipulations except for + range selection(s), and store the result in the cache. In this + case, the result is stored as a status-206 response, and MUST + be used in accordance with the normal HTTP caching rules for + Partial Content. + + 3. It MAY store the status-226 (IM Used) response as a cache + entry. + + A status-226 cache entry MUST NOT be used in response to a subsequent + request under any of these conditions (a cache that never stores + status-226 responses may ignore these tests): + + 1. If any of the instance-manipulation values from the IM header + field in the cached response do not appear in the subsequent + request's A-IM header field. The comparison between the + headers is done using an exact match on each instance- + manipulation value including any associated imparams values + (see section 10.1). + + 2. If the order of instance-manipulation values appearing in the + cached IM header field differs from the order of that set of + instance-manipulations in the A-IM header field of the + subsequent request. + + 3. If the cache implementation is not aware of, or is not at least + conditionally compliant with, the specification of any of the + instance-manipulation values in the cached IM header field. + + + + + + + + +Mogul, et al. Standards Track [Page 35] + +RFC 3229 Delta encoding in HTTP January 2002 + + + Note: This rule allows for extending the set of instance- + manipulations without causing deployed cache implementations to + commit errors. 
The specification of new instance-manipulations + may include additional caching rules to improve cache-hit rates + in cognizant implementations. + + 4. If any of the instance-manipulation values in the cached IM + header field is a delta-coding, and the cache entry includes a + Delta-Base header field, and that Delta-Base entity tag is not + one of the entity tags listed in an If-None-Match header field + of the subsequent request. + + 5. If any of the instance-manipulation values in the cached IM + header field is a delta-coding, the cache entry does not + include a Delta-Base header field, and the If-None-Match header + field of the request that led to that cache entry does not + match the If-None-Match header field of the subsequent request. + + If the IM header field of the cached response includes the "range" + instance-manipulation, then a status-226 cache entry MUST NOT be used + in response to a subsequent request if the cached response is + inconsistent with the Range header field value(s) in the request, as + would be the case for a cached 206 (Partial Content) response. + + Note: we know of no existing, published formal specification for + deciding if a cached status-206 response is consistent with a + subsequent request. We believe that either of these conditions is + sufficient: + + 1. The ranges specified in the headers of the request that led + to the cached response are the same as specified in the + headers of the subsequent request. + + 2. The ranges specified in the cached response are the same as + specified in the headers of the subsequent request. + + Further analysis might be necessary. + +10.7 Rules for deltas in the presence of content-codings + + The use of delta encoding with content-encoded instances adds some + slight complexity. When a client (perhaps a proxy) has received a + delta encoded response, either or both of that new response and a + cached previous response may have non-identity content-codings. 
We + specify rules for the server and client, to prevent situations where + the client is unable to make sense of the server's response. + + + + + +Mogul, et al. Standards Track [Page 36] + +RFC 3229 Delta encoding in HTTP January 2002 + + +10.7.1 Rules for generating deltas in the presence of content-codings + + When a server generates a delta-encoded response, the list of + content-codings the server uses (i.e., the value of the response's + Content-Encoding header field) SHOULD be a prefix of the list of + content-codings the server would have used had it not generated a + delta encoding. + + This requirement allows a client receiving a delta-encoded response + to apply the delta to a cached base instance without having to apply + any content-codings during the process (although the client might, of + course, be required to decode some content-codings). + +10.7.2 Rules for applying deltas in the presence of content-codings + + When a client receives a delta response with one or more non-identity + content codings: + + 1. If both the new (delta) response and the cached response + (instance) have exactly the same set of content-codings, the + client applies the delta response to the cached response + without removing the content-codings from either response. + + 2. If the new (delta) response and the cached response have a + different set of content-codings, before applying the delta the + client decodes one or more content-codings from the cached + response, until the result has the same set of content-codings + as the delta response. + + 3. If a proxy or cache is forwarding the result of applying the + delta response to a cached base instance response, or later + forwards this result from a cache entry, the forwarded response + MUST carry the same Content-Encoding header field as the new + (delta) response (and so it must be content-encoded as + indicated by that header field). 
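Rules 1 and 2 amount to stripping content-codings from the cached response until its list matches that of the delta response. The sketch below assumes, following section 10.7.1, that the delta response's Content-Encoding list is a prefix of the cached response's list, and it relies on the HTTP/1.1 convention that codings are listed in the order applied, so the last one is outermost. `codings_to_strip` is a hypothetical name, and raising an error when the prefix relation does not hold is this sketch's own choice.

```python
def _codings(content_encoding):
    """Content-codings from a Content-Encoding value, in applied order."""
    return [c.strip().lower() for c in content_encoding.split(",") if c.strip()]


def codings_to_strip(cached_encoding, delta_encoding):
    """Codings to decode from the cached response, outermost first.

    An empty result corresponds to rule 1 (identical codings, apply the
    delta directly); a non-empty result corresponds to rule 2.
    """
    cached = _codings(cached_encoding)
    delta = _codings(delta_encoding)
    if cached[:len(delta)] != delta:
        # Decoding alone cannot reconcile the two lists.
        raise ValueError("delta codings are not a prefix of the cached codings")
    return list(reversed(cached[len(delta):]))
```

A cached `Content-Encoding: gzip` entry combined with a delta response that carries no Content-Encoding gives `["gzip"]`: the client gunzips its cached entity before applying the delta.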
+ + The intent of these rules (and in particular, rule #3) is that the + results are always consistent with the rule that the entity tag is + associated with the result of the content-coding, and that any + recipient after the application of the delta-coding receives exactly + the same response it would have received as a status-200 response + from the origin server (without any delta-coding). + + + + + + + + + +Mogul, et al. Standards Track [Page 37] + +RFC 3229 Delta encoding in HTTP January 2002 + + +10.7.3 Examples for using A-IM, IM, and content-codings + + Suppose a client, with an empty cache, sends this request: + + GET /foo.html HTTP/1.1 + Host: example.com + Accept-encoding: gzip + + and the origin server responds with: + + HTTP/1.1 200 OK + Date: Wed, 24 Dec 1997 14:00:00 GMT + Etag: "abc" + Content-encoding: gzip + + We will use the notation URI;entity-tag to denote specific instances, + so this response would cause the client to store in its cache the + entity GZIP(foo.html;"abc"). + + Then suppose that the client, a minute later, issues this conditional + request: + + GET /foo.html HTTP/1.1 + Host: example.com + If-none-match: "abc" + Accept-encoding: gzip + A-IM: vcdiff + + If the server is able to generate a delta-encoded response, it might + choose one of two alternatives. The first is to compute the delta + from the compressed instances (although this might not yield the most + efficient coding): + + HTTP/1.1 226 IM Used + Date: Wed, 24 Dec 1997 14:01:00 GMT + Etag: "def" + Delta-base: "abc" + Content-encoding: gzip + IM: vcdiff + + The body of this response would be the result of + VCDIFF_DELTA(GZIP(foo.html;"abc"), GZIP(foo.html;"def")). The client + would store as a new cache entry the entity GZIP(foo.html;"def"), + after recovering that entity by applying the delta to its previous + cache entry. + + The server's other alternative would be to compute the delta from the + uncompressed values, returning: + + + +Mogul, et al. 
Standards Track [Page 38] + +RFC 3229 Delta encoding in HTTP January 2002 + + + HTTP/1.1 226 IM Used + Date: Wed, 24 Dec 1997 14:01:00 GMT + Delta-base: "abc" + Etag: "ghi" + IM: vcdiff + + The body of this response would be the result of + VCDIFF_DELTA(GUNZIP(GZIP(foo.html;"abc")), foo.html;"ghi"), or more + simply VCDIFF_DELTA(foo.html;"abc", foo.html;"ghi"). The client + would store as a new cache entry the entity foo.html;"ghi" (i.e., + without any content-coding), after recovering that entity by applying + the delta to its previous cache entry. + + Note that the new value of foo.html (at 14:01:00 GMT) without the + gzip content-coding must have a different entity tag from the + compressed instance of the same underlying file. + + The client's second request might have been: + + GET /foo.html HTTP/1.1 + Host: example.com + If-none-match: "abc" + Accept-encoding: gzip + A-IM: diffe, gzip + + The client lists gzip in both the Accept-Encoding and A-IM headers, + because if the server does not support delta encoding, the client + would at least like to achieve the benefits of compression (as a + content-coding). However, if the server does support the diffe + delta-coding, the client would like the result to be compressed, and + this must be done as an instance-manipulation. + + A server that does support diffe might reply: + + HTTP/1.1 226 IM Used + Date: Wed, 24 Dec 1997 14:01:00 GMT + Delta-base: "abc" + Etag: "ghi" + IM: diffe, gzip + + The body of this response would be the result of + GZIP(DIFFE_DELTA(GUNZIP(GZIP(foo.html;"abc")), foo.html;"ghi")), or + more simply GZIP(DIFFE_DELTA(foo.html;"abc", foo.html;"ghi")). + Because the gzip compression is, in this case, an instance- + manipulation and not a content-coding, it is not retained when the + reassembled response is stored or forwarded, so the client would + store as a new cache entry the entity foo.html;"ghi" (without any + content-coding or compression). + + + +Mogul, et al. 
Standards Track [Page 39] + +RFC 3229 Delta encoding in HTTP January 2002 + + +10.8 New Cache-Control directives + + We define two new cache-directives (see section 14.9 of RFC 2616 [10] + for the specification of cache-directive). + +10.8.1 Retain directive + + The set of cache-response-directive values is augmented to include + the retain directive. + + cache-response-directive = ... + | "retain" [ "=" delta-seconds ] + + A retain directive is always a "hint" from a server to a client; it + never specifies a mandatory action for the recipient. + + The presence of a retain directive indicates that a delta-capable + client ought to retain the instance in the response in its cache, + space permitting, and ought to use the corresponding entity tag in a + future request for a delta-encoded response. I.e., the server is + likely to provide delta-encoded responses using the corresponding + instance as a base instance. By implication, if a client has + retrieved and cached several instances of a resource, some of which + are marked with "retain" and some not, then there is no point in + caching the instances not marked with "retain". + + If the retain directive includes a delta-seconds value, then the + server is likely to stop using the corresponding instance as a base + instance after the specified number of seconds. A client ought not + use the corresponding entity tag in a future request for a delta- + encoded response after that interval ends. The interval is measured + from the time that the response is generated, so a client ought to + include the response's Age in its calculations. + + If the retain directive includes a delta-seconds value of zero, a + client SHOULD NOT use the corresponding entity tag in a future + request for a delta-encoded response. + + Note: We recommend that server implementors consider the bandwidth + implications of sending the "retain=0" directive to clients or + proxies that might not have the ability to make use of it. 
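The retain hint can be sketched as a small calculation over the Cache-Control value and the response's Age. The names `retain_window` and `worth_offering` are hypothetical, the directive parsing is naive (no quoted values), and how a client treats a response with no retain directive at all is left to the caller.

```python
import math


def retain_window(cache_control):
    """Return the retain interval in seconds.

    None means no retain directive was present; math.inf means "retain"
    appeared without a delta-seconds value.
    """
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.strip().lower() == "retain":
            return int(value) if value.strip() else math.inf
    return None


def worth_offering(window, age_at_receipt, seconds_held):
    """Should the client still offer this entity tag as a delta base?

    The interval runs from when the response was generated, so the Age
    reported on receipt is added to the time the entry has been held.
    Call this only when retain_window() returned a value.
    """
    return age_at_receipt + seconds_held < window
```

With `retain=120` and an Age of 30 seconds on receipt, the entity tag stays worth offering for about 90 more seconds; `retain=0` is never worth offering.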
+ +10.8.2 IM directive + + The set of cache-response-directive values is augmented to include + the im directive. + + + + + +Mogul, et al. Standards Track [Page 40] + +RFC 3229 Delta encoding in HTTP January 2002 + + + cache-response-directive = ... + | "im" + + A cache that complies with the specification for the IM header, the + A-IM header, and the 226 response-status code SHOULD ignore a no- + store cache-directive if an im directive is present in the same + response. All other implementations MUST ignore the im directive + (i.e., MUST observe a no-store directive, if present). + +10.9 Use of compression with delta encoding + + The application of data compression to the diffe and gdiff delta + codings has been shown to greatly reduce the size of the resulting + message bodies, in many cases. (The vcdiff coding, on the other + hand, is inherently compressed and does not benefit from further + compression.) Therefore, it is strongly recommended that + implementations that support the diffe and/or gdiff delta codings + also support the gzip and/or deflate compression codings. (The + deflate coding provides a more compact result.) However, this is not + a requirement for the use of delta encoding, primarily because the + CPU-time costs associated with compression and decompression may be + excessive in some environments. + + A client that supports both delta encoding and compression as + instance-manipulations signals this by, for example + + A-IM: diffe, deflate + + The ordering rule stated in section 10.5.3 requires, if the server + uses both instance-manipulations in the response, that compression be + applied to the result of the delta encoding, rather than vice versa. + I.e., the response in this case would include + + IM: diffe, deflate + + Note that a client might accept compression either as a content- + coding or as an instance-manipulation. 
For example: + + Accept-Encoding: gzip + A-IM: gzip, gdiff + + In this example, the server may apply the gzip compression, either as + a content-coding or as an instance-manipulation, before delta + encoding. Remember that the entity tag is assigned after content- + coding but before instance-manipulation, so this choice does affect + the semantics of delta encoding. + + + + + +Mogul, et al. Standards Track [Page 41] + +RFC 3229 Delta encoding in HTTP January 2002 + + +10.10 Delta encoding and multipart/byteranges + + A client may request multiple, non-contiguous byte ranges in a single + request. The server's response uses the "multipart/byteranges" media + type (section 19.2 of [10]) to convey multiple ranges in a response. + If a multipart/byteranges response is delta encoded (i.e., uses a + delta-coding as an instance-manipulation), the delta-related headers + are associated with the entire response, not with the individual + parts. (This is because there is only one base instance and one + current instance involved.) A delta-encoded response with multiple + ranges MUST use the same delta-coding for all of the ranges. + + If a server chooses to use a delta encoding for a + multipart/byteranges response, it MUST generate a response in + accordance with the following rules. + + When a multipart/byteranges response uses a delta-coding prior to a + range selection, the A-IM and IM header fields list the delta-coding + before the "range" literal. (Recall that this is the approach taken + to obtain a partial response after a premature termination of a + message transmission.) The server first generates a sequence of + bytes representing the difference (delta) between the base instance + and the current instance, then selects the specified ranges of bytes, + and transmits each such range in a part of the multipart/byteranges + media type. 
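In the delta-before-range case just described, the byte ranges index into the delta byte sequence rather than into the instance. The sketch below treats the delta as opaque bytes and only shows the range selection step. `select_ranges` is a hypothetical helper, and using the length of the delta as the total in each Content-Range value is an assumption of this sketch.

```python
def select_ranges(delta_bytes, ranges):
    """Cut (first, last) inclusive byte ranges out of an already-computed delta.

    Returns a list of (content_range_value, part_body) pairs, one per
    satisfiable range, as they might appear in multipart/byteranges parts.
    """
    total = len(delta_bytes)  # assumed to be the length reported after "/"
    parts = []
    for first, last in ranges:
        last = min(last, total - 1)  # clamp a range that runs past the end
        if first > last:
            continue                 # nothing to send for this range
        parts.append((f"bytes {first}-{last}/{total}", delta_bytes[first:last + 1]))
    return parts
```

A client that lost a connection after receiving the first 4 bytes of a 10-byte delta could ask for the remainder and would receive a single part labelled `bytes 4-9/10`.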
+ + When a multipart/byteranges response uses a delta-coding after a + range selection, the A-IM and IM header fields list the delta-coding + after the "range" literal. (Recall that this is the approach taken + to obtain an updated version just of selected sections of an + instance.) The server first selects the specified ranges from the + current instance, and also selects the same specified ranges from the + base instance. (Some of these selected ranges might be the empty + sequence, if the instance is not long enough.) The server then + generates the individual differences (deltas) between the pairs of + ranges, and transmits each such difference in a part of the + multipart/byteranges media type. + +11 Quantifying the protocol overhead + + The proposed protocol changes increase the size of the HTTP message + headers slightly. In the simplest case, a conditional request (i.e., + one for a URI for which the client already has a cache entry) would + include one more header, e.g.: + + A-IM:vcdiff + + + + + +Mogul, et al. Standards Track [Page 42] + +RFC 3229 Delta encoding in HTTP January 2002 + + + This is about 13 extra bytes. A recent study [23] reports mean + request sizes from two different traces of 281 and 306 bytes, so the + net increase in request size would be between 4% and 5%. + + Because a client must have an existing cache entry to use as a base + for a delta-encoded response, it would never send "A-IM: vcdiff" (or + listing other delta encoding formats) for its unconditional requests. + The same study showed that at least 46% of the requests in lengthy + traces were for URLs not seen previously in the trace; this means + that no more than about half of typical client requests could be + conditional (and the actual fraction is likely to be smaller, given + the finite size of real caches). + + The study also showed that 64% of the responses in a lengthy trace + were for image content-types (GIF and JPEG). 
As noted in section 6, + we do not currently know of a delta-encoding format suitable for such + image types. Unless a client did support such a delta-encoding + format, it would presumably not ask for a delta when making a + conditional request for image content-types. + + Taken together, these factors suggest that the mean increase in + request header size would be much less than 5%, and probably below + 1%. + + Delta-encoded responses carry slightly longer headers. In the + simplest case, a response carries one more header, e.g.: + + IM:vcdiff + + This is about 11 bytes. Other headers (such as "Delta-Base") might + also be included. However, none of these extra headers would be + included except in cases where a delta encoding is actually employed, + and the sender of the response can avoid sending a delta encoding if + this results in a net increase in response size. Thus, a delta- + encoded response should never be larger than a regular response for + the same request. + + Simulations suggest that, when delta encoding pays off at all, it + saves several thousand bytes [23]. Thus, adding a few dozen bytes to + the response headers should almost never obviate the savings in the + message-body size. + + Finally, the use of the "retain" Cache-Control directive might cause + some additional overhead. Some server heuristics might be successful + in limiting the use of these headers to situations where they would + probably optimize future responses. Neither of these headers is + necessary for the simpler uses of delta encoding. + + + + +Mogul, et al. Standards Track [Page 43] + +RFC 3229 Delta encoding in HTTP January 2002 + + +12 Security Considerations + + We are not aware of any aspects of the basic delta encoding mechanism + that affect the existing security considerations for the HTTP/1.1 + protocol. + +13 Acknowledgements + + Phong Vo has provided a great deal of guidance in the choice of delta + encoding algorithms and formats. 
Issac Goldstand and Mike Dahlin + provided a number of useful comments on the specification. Dave + Kristol suggested many textual corrections. + +14 Intellectual Property Rights + + The IETF has been notified of intellectual property rights claimed in + regard to some or all of the specification contained in this + document. For more information consult the online list of claimed + rights, at . + + The IETF takes no position regarding the validity or scope of any + intellectual property or other rights that might be claimed to + pertain to the implementation or use of the technology described in + this document or the extent to which any license under such rights + might or might not be available; neither does it represent that it + has made any effort to identify any such rights. Information on the + IETF's procedures with respect to rights in standards-track and + standards-related documentation can be found in BCP 11. Copies of + claims of rights made available for publication and any assurances of + licenses to be made available, or the result of an attempt made to + obtain a general license or permission for the use of such + proprietary rights by implementors or users of this specification can + be obtained from the IETF Secretariat. + +15 References + + 1. Gaurav Banga, Fred Douglis, and Michael Rabinovich. Optimistic + Deltas for WWW Latency Reduction. Proc. 1997 USENIX Technical + Conference, Anaheim, CA, January, 1997, pp. 289-303. + + 2. Berners-Lee, T., Fielding, R. and H. Frystyk, "Hypertext Transfer + Protocol -- HTTP/1.0", RFC 1945, May 1996. + + 3. Bradner, S., "Key words for use in RFCs to Indicate Requirement + Levels", BCP 14, RFC 2119, March 1997. + + + + + + +Mogul, et al. Standards Track [Page 44] + +RFC 3229 Delta encoding in HTTP January 2002 + + + 4. Edith Cohen, Balachander Krishnamurthy, and Jennifer Rexford. + Improving End-to-End Performance of the Web Using Server Volumes + and Proxy Filters. Proc. SIGCOMM '98, September, 1998, pp. 
241- + 253. + + 5. Deutsch, P., "GZIP file format specification version 4.3", RFC + 1952, May 1996. + + 6. Deutsch, P., "DEFLATE Compressed Data Format Specification + version 1.3", RFC 1951, May 1996. + + 7. Deutsch, P. and J-L. Gailly, "ZLIB Compressed Data Format + Specification version 3.3", RFC 1950, May 1996. + + 8. Fred Douglis, Anja Feldmann, Balachander Krishnamurthy, and + Jeffrey Mogul. Rate of Change and Other Metrics: a Live Study + of the World Wide Web. Proc. Symposium on Internet Technologies + and Systems, USENIX, Monterey, CA, December, 1997, pp. 147-158. + + 9. Fielding, R., Gettys, J., Mogul, J., Nielsen, H. and T. Berners- + Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068, January + 1997. + + 10. Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L., + Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol -- + HTTP/1.1", RFC 2616, June 1999. + + 11. Franks, J., Hallam-Baker, P., Hostetler, J., Leach, P., Luotonen, + A., Luotonen, L. and L. Stewart, "HTTP Authentication: Basic and + Digest Access Authentication", RFC 2617, June 1999. + + 12. Freed, N. and N. Borenstein, "Multipurpose Internet Mail + Extensions (MIME) Part One: Format of Internet Message Bodies", + RFC 2045, November 1996. + + 13. Arthur van Hoff, John Giannandrea, Mark Hapner, Steve Carter, and + Milo Medin. The HTTP Distribution and Replication Protocol. + Technical Report NOTE-DRP, World Wide Web Consortium, August, + 1997. + + 14. Arthur van Hoff and Jonathan Payne. Generic Diff Format + Specification. Technical Report NOTE-GDIFF, World Wide Web + Consortium, August, 1997. + + + + + + + + +Mogul, et al. Standards Track [Page 45] + +RFC 3229 Delta encoding in HTTP January 2002 + + + 15. Barron C. Housel and David B. Lindquist. WebExpress: A System + for Optimizing Web Browsing in a Wireless Environment. Proc. 2nd + Annual Intl. Conf. on Mobile Computing and Networking, ACM, Rye, + New York, November, 1996, pp. 108-116. + + 16. James J. 
Hunt, Kiem-Phong Vo, and Walter F. Tichy. An Empirical + Study of Delta Algorithms. IEEE Soft. Config. and Maint. + Workshop, 1996. + + 17. Jacobson, V., "Compressing TCP/IP Headers for Low-Speed Serial + Links", RFC 1144, February 1990. + + 18. Khare, R. and S. Lawrence, "Upgrading to TLS Within HTTP/1.1", + RFC 2817, May 2000. + + 19. David G. Korn and Kiem-Phong Vo. A Generic Differencing and + Compression Data Format. Technical Report HA1630000-021899-02TM, + AT&T Labs - Research, February, 1999. + + 20. Korn, D. and K. Vo, "The VCDIFF Generic Differencing and + Compression Data Format", Work in Progress. + + 21. Merriam-Webster. Webster's Seventh New Collegiate Dictionary. + G. & C. Merriam Co., Springfield, MA, 1963. + + 22. Jeffrey C. Mogul. Hinted caching in the Web. Proc. Seventh ACM + SIGOPS European Workshop, Connemara, Ireland, September, 1996, + pp. 103-108. + + 23. Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander + Krishnamurthy. Potential benefits of delta encoding and data + compression for HTTP. Research Report 97/4, DECWRL, July, 1997. + + 24. Mogul, J. and A. Van Hoff, "Instance Digests in HTTP", RFC 3230, + January 2002. + + 25. Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA + Considerations Section in RFCs", BCP 26, RFC 2434, October 1998. + + 26. The Open Group. The Single UNIX Specification, Version 2 - 6 Vol + Set for UNIX 98. Document number T912, The Open Group, February, + 1997. + + + + + + + + + +Mogul, et al. Standards Track [Page 46] + +RFC 3229 Delta encoding in HTTP January 2002 + + + 27. W. Tichy. "RCS - A System For Version Control". Software - + Practice and Experience 15, 7 (July 1985), 637-654. + + 28. Andrew Tridgell and Paul Mackerras. The rsync algorithm. + Technical Report TR-CS-96-05, Department of Computer Science, + Australian National University, June, 1996. + + 29. Stephen Williams. Personal communication. + http://ei.cs.vt.edu/~williams/DIFF/prelim.html. + + 30. 
Stephen Williams, Marc Abrams, Charles R. Standridge, Ghaleb + Abdulla, and Edward A. Fox. Removal Policies in Network Caches + for World-Wide Web Documents. Proc. SIGCOMM '96, Stanford, CA, + August, 1996, pp. 293-305. + +16 Authors' addresses + + Jeffrey C. Mogul + Western Research Laboratory + Compaq Computer Corporation + 250 University Avenue + Palo Alto, California, 94305, U.S.A. + + Phone: 1 650 617 3304 (email preferred) + EMail: JeffMogul@acm.org + + Balachander Krishnamurthy + AT&T Labs - Research + 180 Park Ave, Room D-229 + Florham Park, NJ 07932-0971, U.S.A. + + EMail: bala@research.att.com + + Fred Douglis + AT&T Labs - Research + 180 Park Ave, Room B-137 + Florham Park, NJ 07932-0971, U.S.A. + + Phone: 1 973 360-8775 + EMail: douglis@research.att.com + + Anja Feldmann + University of Saarbruecken, Germany, + Computer Science Department + Im Stadtwald, Geb. 36.1, Zimmer 310 + D-66123 Saarbruecken, Germany + + EMail: anja@cs.uni-sb.de + + + +Mogul, et al. Standards Track [Page 47] + +RFC 3229 Delta encoding in HTTP January 2002 + + + Yaron Y. Goland + + Email: yaron@goland.org + + Arthur van Hoff + Marimba, Inc. + 440 Clyde Avenue + Mountain View, CA 94043, U.S.A. + + Phone: 1 650 930 5283 + EMail: avh@marimba.com + + Daniel M. Hellerstein + Economic Research Service, USDA + 1909 Franwall Ave, Wheaton MD 20902 + + Phone: 1 202 694-5613 or 1 301 649-4728 + EMail: danielh@crosslink.net or webmaster@srehttp.org + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Mogul, et al. Standards Track [Page 48] + +RFC 3229 Delta encoding in HTTP January 2002 + + +17 Full Copyright Statement + + Copyright (C) The Internet Society (2002). All Rights Reserved. 
+ + This document and translations of it may be copied and furnished to + others, and derivative works that comment on or otherwise explain it + or assist in its implementation may be prepared, copied, published + and distributed, in whole or in part, without restriction of any + kind, provided that the above copyright notice and this paragraph are + included on all such copies and derivative works. However, this + document itself may not be modified in any way, such as by removing + the copyright notice or references to the Internet Society or other + Internet organizations, except as needed for the purpose of + developing Internet standards in which case the procedures for + copyrights defined in the Internet Standards process must be + followed, or as required to translate it into languages other than + English. + + The limited permissions granted above are perpetual and will not be + revoked by the Internet Society or its successors or assigns. + + This document and the information contained herein is provided on an + "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING + TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING + BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION + HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF + MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + + + + + + + + + + + + + +Mogul, et al. Standards Track [Page 49] + -- cgit v1.2.3