summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc1071.txt
diff options
context:
space:
mode:
authorThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
committerThomas Voss <mail@thomasvoss.com> 2024-11-27 20:54:24 +0100
commit4bfd864f10b68b71482b35c818559068ef8d5797 (patch)
treee3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc1071.txt
parentea76e11061bda059ae9f9ad130a9895cc85607db (diff)
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc1071.txt')
-rw-r--r--doc/rfc/rfc1071.txt1417
1 files changed, 1417 insertions, 0 deletions
diff --git a/doc/rfc/rfc1071.txt b/doc/rfc/rfc1071.txt
new file mode 100644
index 0000000..3b94108
--- /dev/null
+++ b/doc/rfc/rfc1071.txt
@@ -0,0 +1,1417 @@
+
+
+
+
+Network Working Group R. Braden
+Request for Comments: 1071 ISI
+ D. Borman
+ Cray Research
+ C. Partridge
+ BBN Laboratories
+ September 1988
+
+
+ Computing the Internet Checksum
+
+
+Status of This Memo
+
+ This memo summarizes techniques and algorithms for efficiently
+ computing the Internet checksum. It is not a standard, but a set of
+ useful implementation techniques. Distribution of this memo is
+ unlimited.
+
+1. Introduction
+
+ This memo discusses methods for efficiently computing the Internet
+ checksum that is used by the standard Internet protocols IP, UDP, and
+ TCP.
+
+ An efficient checksum implementation is critical to good performance.
+ As advances in implementation techniques streamline the rest of the
+ protocol processing, the checksum computation becomes one of the
+ limiting factors on TCP performance, for example. It is usually
+ appropriate to carefully hand-craft the checksum routine, exploiting
+ every machine-dependent trick possible; a fraction of a microsecond
+ per TCP data byte can add up to a significant CPU time savings
+ overall.
+
+ In outline, the Internet checksum algorithm is very simple:
+
+ (1) Adjacent octets to be checksummed are paired to form 16-bit
+ integers, and the 1's complement sum of these 16-bit integers is
+ formed.
+
+ (2) To generate a checksum, the checksum field itself is cleared,
+ the 16-bit 1's complement sum is computed over the octets
+ concerned, and the 1's complement of this sum is placed in the
+ checksum field.
+
+ (3) To check a checksum, the 1's complement sum is computed over the
+ same set of octets, including the checksum field. If the result
+ is all 1 bits (-0 in 1's complement arithmetic), the check
+ succeeds.
+
+ Suppose a checksum is to be computed over the sequence of octets
+
+
+
+Braden, Borman, & Partridge [Page 1]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ A, B, C, D, ... , Y, Z. Using the notation [a,b] for the 16-bit
+ integer a*256+b, where a and b are bytes, then the 16-bit 1's
+ complement sum of these bytes is given by one of the following:
+
+ [A,B] +' [C,D] +' ... +' [Y,Z] [1]
+
+ [A,B] +' [C,D] +' ... +' [Z,0] [2]
+
+ where +' indicates 1's complement addition. These cases
+ correspond to an even or odd count of bytes, respectively.
+
+ On a 2's complement machine, the 1's complement sum must be
+ computed by means of an "end around carry", i.e., any overflows
+ from the most significant bits are added into the least
+ significant bits. See the examples below.
+
+ Section 2 explores the properties of this checksum that may be
+ exploited to speed its calculation. Section 3 contains some
+ numerical examples of the most important implementation
+ techniques. Finally, Section 4 includes examples of specific
+ algorithms for a variety of common CPU types. We are grateful
+ to Van Jacobson and Charley Kline for their contribution of
+ algorithms to this section.
+
+ The properties of the Internet checksum were originally
+ discussed by Bill Plummer in IEN-45, entitled "Checksum Function
+ Design". Since IEN-45 has not been widely available, we include
+ it as an extended appendix to this RFC.
+
+ 2. Calculating the Checksum
+
+ This simple checksum has a number of wonderful mathematical
+ properties that may be exploited to speed its calculation, as we
+ will now discuss.
+
+
+ (A) Commutative and Associative
+
+ As long as the even/odd assignment of bytes is respected, the
+ sum can be done in any order, and it can be arbitrarily split
+ into groups.
+
+ For example, the sum [1] could be split into:
+
+ ( [A,B] +' [C,D] +' ... +' [J,0] )
+
+ +' ( [0,K] +' ... +' [Y,Z] ) [3]
+
+
+
+
+
+
+
+Braden, Borman, & Partridge [Page 2]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ (B) Byte Order Independence
+
+ The sum of 16-bit integers can be computed in either byte order.
+ Thus, if we calculate the swapped sum:
+
+ [B,A] +' [D,C] +' ... +' [Z,Y] [4]
+
+ the result is the same as [1], except the bytes are swapped in
+ the sum! To see why this is so, observe that in both orders the
+ carries are the same: from bit 15 to bit 0 and from bit 7 to bit
+ 8. In other words, consistently swapping bytes simply rotates
+ the bits within the sum, but does not affect their internal
+ ordering.
+
+ Therefore, the sum may be calculated in exactly the same way
+ regardless of the byte order ("big-endian" or "little-endian")
+ of the underlaying hardware. For example, assume a "little-
+ endian" machine summing data that is stored in memory in network
+ ("big-endian") order. Fetching each 16-bit word will swap
+ bytes, resulting in the sum [4]; however, storing the result
+ back into memory will swap the sum back into network byte order.
+
+ Byte swapping may also be used explicitly to handle boundary
+ alignment problems. For example, the second group in [3] can be
+ calculated without concern to its odd/even origin, as:
+
+ [K,L] +' ... +' [Z,0]
+
+ if this sum is byte-swapped before it is added to the first
+ group. See the example below.
+
+ (C) Parallel Summation
+
+ On machines that have word-sizes that are multiples of 16 bits,
+ it is possible to develop even more efficient implementations.
+ Because addition is associative, we do not have to sum the
+ integers in the order they appear in the message. Instead we
+ can add them in "parallel" by exploiting the larger word size.
+
+ To compute the checksum in parallel, simply do a 1's complement
+ addition of the message using the native word size of the
+ machine. For example, on a 32-bit machine we can add 4 bytes at
+ a time: [A,B,C,D]+'... When the sum has been computed, we "fold"
+ the long sum into 16 bits by adding the 16-bit segments. Each
+ 16-bit addition may produce new end-around carries that must be
+ added.
+
+ Furthermore, again the byte order does not matter; we could
+ instead sum 32-bit words: [D,C,B,A]+'... or [B,A,D,C]+'... and
+ then swap the bytes of the final 16-bit sum as necessary. See
+ the examples below. Any permutation is allowed that collects
+
+
+
+Braden, Borman, & Partridge [Page 3]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ all the even-numbered data bytes into one sum byte and the odd-
+ numbered data bytes into the other sum byte.
+
+
+ There are further coding techniques that can be exploited to speed up
+ the checksum calculation.
+
+ (1) Deferred Carries
+
+ Depending upon the machine, it may be more efficient to defer
+ adding end-around carries until the main summation loop is
+ finished.
+
+ One approach is to sum 16-bit words in a 32-bit accumulator, so
+ the overflows build up in the high-order 16 bits. This approach
+ typically avoids a carry-sensing instruction but requires twice
+ as many additions as would adding 32-bit segments; which is
+ faster depends upon the detailed hardware architecture.
+
+ (2) Unwinding Loops
+
+ To reduce the loop overhead, it is often useful to "unwind" the
+ inner sum loop, replicating a series of addition commands within
+ one loop traversal. This technique often provides significant
+ savings, although it may complicate the logic of the program
+ considerably.
+
+ (3) Combine with Data Copying
+
+ Like checksumming, copying data from one memory location to
+ another involves per-byte overhead. In both cases, the
+ bottleneck is essentially the memory bus, i.e., how fast the
+ data can be fetched. On some machines (especially relatively
+ slow and simple micro-computers), overhead can be significantly
+ reduced by combining memory-to-memory copy and the checksumming,
+ fetching the data only once for both.
+
+ (4) Incremental Update
+
+ Finally, one can sometimes avoid recomputing the entire checksum
+ when one header field is updated. The best-known example is a
+ gateway changing the TTL field in the IP header, but there are
+ other examples (for example, when updating a source route). In
+ these cases it is possible to update the checksum without
+ scanning the message or datagram.
+
+ To update the checksum, simply add the differences of the
+ sixteen bit integers that have been changed. To see why this
+ works, observe that every 16-bit integer has an additive inverse
+ and that addition is associative. From this it follows that
+ given the original value m, the new value m', and the old
+
+
+
+Braden, Borman, & Partridge [Page 4]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ checksum C, the new checksum C' is:
+
+ C' = C + (-m) + m' = C + (m' - m)
+
+
+3. Numerical Examples
+
+ We now present explicit examples of calculating a simple 1's
+ complement sum on a 2's complement machine. The examples show the
+ same sum calculated byte by bye, by 16-bits words in normal and
+ swapped order, and 32 bits at a time in 3 different orders. All
+ numbers are in hex.
+
+ Byte-by-byte "Normal" Swapped
+ Order Order
+
+ Byte 0/1: 00 01 0001 0100
+ Byte 2/3: f2 03 f203 03f2
+ Byte 4/5: f4 f5 f4f5 f5f4
+ Byte 6/7: f6 f7 f6f7 f7f6
+ --- --- ----- -----
+ Sum1: 2dc 1f0 2ddf0 1f2dc
+
+ dc f0 ddf0 f2dc
+ Carrys: 1 2 2 1
+ -- -- ---- ----
+ Sum2: dd f2 ddf2 f2dd
+
+ Final Swap: dd f2 ddf2 ddf2
+
+
+ Byte 0/1/2/3: 0001f203 010003f2 03f20100
+ Byte 4/5/6/7: f4f5f6f7 f5f4f7f6 f7f6f5f4
+ -------- -------- --------
+ Sum1: 0f4f7e8fa 0f6f4fbe8 0fbe8f6f4
+
+ Carries: 0 0 0
+
+ Top half: f4f7 f6f4 fbe8
+ Bottom half: e8fa fbe8 f6f4
+ ----- ----- -----
+ Sum2: 1ddf1 1f2dc 1f2dc
+
+ ddf1 f2dc f2dc
+ Carrys: 1 1 1
+ ---- ---- ----
+ Sum3: ddf2 f2dd f2dd
+
+ Final Swap: ddf2 ddf2 ddf2
+
+
+
+
+
+Braden, Borman, & Partridge [Page 5]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ Finally, here an example of breaking the sum into two groups, with
+ the second group starting on a odd boundary:
+
+
+ Byte-by-byte Normal
+ Order
+
+ Byte 0/1: 00 01 0001
+ Byte 2/ : f2 (00) f200
+ --- --- -----
+ Sum1: f2 01 f201
+
+ Byte 4/5: 03 f4 03f4
+ Byte 6/7: f5 f6 f5f6
+ Byte 8/: f7 (00) f700
+ --- --- -----
+ Sum2: 1f0ea
+
+ Sum2: f0ea
+ Carry: 1
+ -----
+ Sum3: f0eb
+
+ Sum1: f201
+ Sum3 byte swapped: ebf0
+ -----
+ Sum4: 1ddf1
+
+ Sum4: ddf1
+ Carry: 1
+ -----
+ Sum5: ddf2
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Braden, Borman, & Partridge [Page 6]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+4. Implementation Examples
+
+ In this section we show examples of Internet checksum implementation
+ algorithms that have been found to be efficient on a variety of
+ CPU's. In each case, we show the core of the algorithm, without
+ including environmental code (e.g., subroutine linkages) or special-
+ case code.
+
+4.1 "C"
+
+ The following "C" code algorithm computes the checksum with an inner
+ loop that sums 16-bits at a time in a 32-bit accumulator.
+
+ in 6
+ {
+ /* Compute Internet Checksum for "count" bytes
+ * beginning at location "addr".
+ */
+ register long sum = 0;
+
+ while( count > 1 ) {
+ /* This is the inner loop */
+ sum += * (unsigned short) addr++;
+ count -= 2;
+ }
+
+ /* Add left-over byte, if any */
+ if( count > 0 )
+ sum += * (unsigned char *) addr;
+
+ /* Fold 32-bit sum to 16 bits */
+ while (sum>>16)
+ sum = (sum & 0xffff) + (sum >> 16);
+
+ checksum = ~sum;
+ }
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Braden, Borman, & Partridge [Page 7]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+4.2 Motorola 68020
+
+ The following algorithm is given in assembler language for a Motorola
+ 68020 chip. This algorithm performs the sum 32 bits at a time, and
+ unrolls the loop with 16 replications. For clarity, we have omitted
+ the logic to add the last fullword when the length is not a multiple
+ of 4. The result is left in register d0.
+
+ With a 20MHz clock, this routine was measured at 134 usec/kB summing
+ random data. This algorithm was developed by Van Jacobson.
+
+
+ movl d1,d2
+ lsrl #6,d1 | count/64 = # loop traversals
+ andl #0x3c,d2 | Then find fractions of a chunk
+ negl d2
+ andb #0xf,cc | Clear X (extended carry flag)
+
+ jmp pc@(2$-.-2:b,d2) | Jump into loop
+
+ 1$: | Begin inner loop...
+
+ movl a0@+,d2 | Fetch 32-bit word
+ addxl d2,d0 | Add word + previous carry
+ movl a0@+,d2 | Fetch 32-bit word
+ addxl d2,d0 | Add word + previous carry
+
+ | ... 14 more replications
+ 2$:
+ dbra d1,1$ | (NB- dbra doesn't affect X)
+
+ movl d0,d1 | Fold 32 bit sum to 16 bits
+ swap d1 | (NB- swap doesn't affect X)
+ addxw d1,d0
+ jcc 3$
+ addw #1,d0
+ 3$:
+ andl #0xffff,d0
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Braden, Borman, & Partridge [Page 8]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+4.3 Cray
+
+ The following example, in assembler language for a Cray CPU, was
+ contributed by Charley Kline. It implements the checksum calculation
+ as a vector operation, summing up to 512 bytes at a time with a basic
+ summation unit of 32 bits. This example omits many details having to
+ do with short blocks, for clarity.
+
+ Register A1 holds the address of a 512-byte block of memory to
+ checksum. First two copies of the data are loaded into two vector
+ registers. One is vector-shifted right 32 bits, while the other is
+ vector-ANDed with a 32 bit mask. Then the two vectors are added
+ together. Since all these operations chain, it produces one result
+ per clock cycle. Then it collapses the result vector in a loop that
+ adds each element to a scalar register. Finally, the end-around
+ carry is performed and the result is folded to 16-bits.
+
+ EBM
+ A0 A1
+ VL 64 use full vectors
+ S1 <32 form 32-bit mask from the right.
+ A2 32
+ V1 ,A0,1 load packet into V1
+ V2 S1&V1 Form right-hand 32-bits in V2.
+ V3 V1>A2 Form left-hand 32-bits in V3.
+ V1 V2+V3 Add the two together.
+ A2 63 Prepare to collapse into a scalar.
+ S1 0
+ S4 <16 Form 16-bit mask from the right.
+ A4 16
+ CK$LOOP S2 V1,A2
+ A2 A2-1
+ A0 A2
+ S1 S1+S2
+ JAN CK$LOOP
+ S2 S1&S4 Form right-hand 16-bits in S2
+ S1 S1>A4 Form left-hand 16-bits in S1
+ S1 S1+S2
+ S2 S1&S4 Form right-hand 16-bits in S2
+ S1 S1>A4 Form left-hand 16-bits in S1
+ S1 S1+S2
+ S1 #S1 Take one's complement
+ CMR At this point, S1 contains the checksum.
+
+
+
+
+
+
+
+
+
+
+
+Braden, Borman, & Partridge [Page 9]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+4.4 IBM 370
+
+ The following example, in assembler language for an IBM 370 CPU, sums
+ the data 4 bytes at a time. For clarity, we have omitted the logic
+ to add the last fullword when the length is not a multiple of 4, and
+ to reverse the bytes when necessary. The result is left in register
+ RCARRY.
+
+ This code has been timed on an IBM 3090 CPU at 27 usec/KB when
+ summing all one bits. This time is reduced to 24.3 usec/KB if the
+ trouble is taken to word-align the addends (requiring special cases
+ at both the beginning and the end, and byte-swapping when necessary
+ to compensate for starting on an odd byte).
+
+ * Registers RADDR and RCOUNT contain the address and length of
+ * the block to be checksummed.
+ *
+ * (RCARRY, RSUM) must be an even/odd register pair.
+ * (RCOUNT, RMOD) must be an even/odd register pair.
+ *
+ CHECKSUM SR RSUM,RSUM Clear working registers.
+ SR RCARRY,RCARRY
+ LA RONE,1 Set up constant 1.
+ *
+ SRDA RCOUNT,6 Count/64 to RCOUNT.
+ AR RCOUNT,RONE +1 = # times in loop.
+ SRL RMOD,26 Size of partial chunk to RMOD.
+ AR RADDR,R3 Adjust addr to compensate for
+ S RADDR,=F(64) jumping into the loop.
+ SRL RMOD,1 (RMOD/4)*2 is halfword index.
+ LH RMOD,DOPEVEC9(RMOD) Use magic dope-vector for offset,
+ B LOOP(RMOD) and jump into the loop...
+ *
+ * Inner loop:
+ *
+ LOOP AL RSUM,0(,RADDR) Add Logical fullword
+ BC 12,*+6 Branch if no carry
+ AR RCARRY,RONE Add 1 end-around
+ AL RSUM,4(,RADDR) Add Logical fullword
+ BC 12,*+6 Branch if no carry
+ AR RCARRY,RONE Add 1 end-around
+ *
+ * ... 14 more replications ...
+ *
+ A RADDR,=F'64' Increment address ptr
+ BCT RCOUNT,LOOP Branch on Count
+ *
+ * Add Carries into sum, and fold to 16 bits
+ *
+ ALR RCARRY,RSUM Add SUM and CARRY words
+ BC 12,*+6 and take care of carry
+
+
+
+Braden, Borman, & Partridge [Page 10]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ AR RCARRY,RONE
+ SRDL RCARRY,16 Fold 32-bit sum into
+ SRL RSUM,16 16-bits
+ ALR RCARRY,RSUM
+ C RCARRY,=X'0000FFFF' and take care of any
+ BNH DONE last carry
+ S RCARRY,=X'0000FFFF'
+ DONE X RCARRY,=X'0000FFFF' 1's complement
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Braden, Borman, & Partridge [Page 11]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ IEN 45
+ Section 2.4.4.5
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ TCP Checksum Function Design
+
+
+
+ William W. Plummer
+
+
+ Bolt Beranek and Newman, Inc.
+ 50 Moulton Street
+ Cambridge MA 02138
+
+
+ 5 June 1978
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Braden, Borman, & Partridge [Page 12]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ Internet Experiment Note 45 5 June 1978
+ TCP Checksum Function Design William W. Plummer
+
+ 1. Introduction
+
+ Checksums are included in packets in order that errors
+ encountered during transmission may be detected. For Internet
+ protocols such as TCP [1,9] this is especially important because
+ packets may have to cross wireless networks such as the Packet
+ Radio Network [2] and Atlantic Satellite Network [3] where
+ packets may be corrupted. Internet protocols (e.g., those for
+ real time speech transmission) can tolerate a certain level of
+ transmission errors and forward error correction techniques or
+ possibly no checksum at all might be better. The focus in this
+ paper is on checksum functions for protocols such as TCP where
+ the required reliable delivery is achieved by retransmission.
+
+ Even if the checksum appears good on a message which has been
+ received, the message may still contain an undetected error. The
+ probability of this is bounded by 2**(-C) where C is the number
+ of checksum bits. Errors can arise from hardware (and software)
+ malfunctions as well as transmission errors. Hardware induced
+ errors are usually manifested in certain well known ways and it
+ is desirable to account for this in the design of the checksum
+ function. Ideally no error of the "common hardware failure" type
+ would go undetected.
+
+ An example of a failure that the current checksum function
+ handles successfully is picking up a bit in the network interface
+ (or I/O buss, memory channel, etc.). This will always render the
+ checksum bad. For an example of how the current function is
+ inadequate, assume that a control signal stops functioning in the
+ network interface and the interface stores zeros in place of the
+ real data. These "all zero" messages appear to have valid
+ checksums. Noise on the "There's Your Bit" line of the ARPANET
+ Interface [4] may go undetected because the extra bits input may
+ cause the checksum to be perturbed (i.e., shifted) in the same
+ way as the data was.
+
+ Although messages containing undetected errors will occasionally
+ be passed to higher levels of protocol, it is likely that they
+ will not make sense at that level. In the case of TCP most such
+ messages will be ignored, but some could cause a connection to be
+ aborted. Garbled data could be viewed as a problem for a layer
+ of protocol above TCP which itself may have a checksuming scheme.
+
+ This paper is the first step in design of a new checksum function
+ for TCP and some other Internet protocols. Several useful
+ properties of the current function are identified. If possible
+
+ - 1 -
+
+
+
+Braden, Borman, & Partridge [Page 13]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ Internet Experiment Note 45 5 June 1978
+ TCP Checksum Function Design William W. Plummer
+
+ these should be retained in any new function. A number of
+ plausible checksum schemes are investigated. Of these only the
+ "product code" seems to be simple enough for consideration.
+
+ 2. The Current TCP Checksum Function
+
+ The current function is oriented towards sixteen-bit machines
+ such as the PDP-11 but can be computed easily on other machines
+ (e.g., PDP-10). A packet is thought of as a string of 16-bit
+ bytes and the checksum function is the one's complement sum (add
+ with end-around carry) of those bytes. It is the one's
+ complement of this sum which is stored in the checksum field of
+ the TCP header. Before computing the checksum value, the sender
+ places a zero in the checksum field of the packet. If the
+ checksum value computed by a receiver of the packet is zero, the
+ packet is assumed to be valid. This is a consequence of the
+ "negative" number in the checksum field exactly cancelling the
+ contribution of the rest of the packet.
+
+ Ignoring the difficulty of actually evaluating the checksum
+ function for a given packet, the way of using the checksum
+ described above is quite simple, but it assumes some properties
+ of the checksum operator (one's complement addition, "+" in what
+ follows):
+
+ (P1) + is commutative. Thus, the order in which
+ the 16-bit bytes are "added" together is
+ unimportant.
+
+ (P2) + has at least one identity element (The
+ current function has two: +0 and -0). This
+ allows the sender to compute the checksum
+ function by placing a zero in the packet checksum
+ field before computing the value.
+
+ (P3) + has an inverse. Thus, the receiver may
+ evaluate the checksum function and expect a zero.
+
+ (P4) + is associative, allowing the checksum field
+ to be anywhere in the packet and the 16-bit bytes
+ to be scanned sequentially.
+
+ Mathematically, these properties of the binary operation "+" over
+ the set of 16-bit numbers forms an Abelian group [5]. Of course,
+ there are many Abelian groups but not all would be satisfactory
+ for use as checksum operators. (Another operator readily
+
+ - 2 -
+
+
+
+Braden, Borman, & Partridge [Page 14]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ Internet Experiment Note 45 5 June 1978
+ TCP Checksum Function Design William W. Plummer
+
+ available in the PDP-11 instruction set that has all of these
+ properties is exclusive-OR, but XOR is unsatisfactory for other
+ reasons.)
+
+ Albeit imprecise, another property which must be preserved in any
+ future checksum scheme is:
+
+ (P5) + is fast to compute on a variety of machines
+ with limited storage requirements.
+
+ The current function is quite good in this respect. On the
+ PDP-11 the inner loop looks like:
+
+ LOOP: ADD (R1)+,R0 ; Add the next 16-bit byte
+ ADC R0 ; Make carry be end-around
+ SOB R2,LOOP ; Loop over entire packet.
+
+ ( 4 memory cycles per 16-bit byte )
+
+ On the PDP-10 properties P1-4 are exploited further and two
+ 16-bit bytes per loop are processed:
+
+ LOOP: ILDB THIS,PTR ; Get 2 16-bit bytes
+ ADD SUM,THIS ; Add into current sum
+ JUMPGE SUM,CHKSU2 ; Jump if fewer than 8 carries
+ LDB THIS,[POINT 20,SUM,19] ; Get left 16 and carries
+ ANDI SUM,177777 ; Save just low 16 here
+ ADD SUM,THIS ; Fold in carries
+ CHKSU2: SOJG COUNT,LOOP ; Loop over entire packet
+
+ ( 3.1 memory cycles per 16-bit byte )
+
+ The "extra" instruction in the loops above are required to
+ convert the two's complement ADD instruction(s) into a one's
+ complement add by making the carries be end-around. One's
+ complement arithmetic is better than two's complement because it
+ is equally sensitive to errors in all bit positions. If two's
+ complement addition were used, an even number of 1's could be
+ dropped (or picked up) in the most significant bit channel
+ without affecting the value of the checksum. It is just this
+ property that makes some sort of addition preferable to a simple
+ exclusive-OR which is frequently used but permits an even number
+ of drops (pick ups) in any bit channel. RIM10B paper tape format
+ used on PDP-10s [10] uses two's complement add because space for
+ the loader program is extremely limited.
+
+ - 3 -
+
+
+
+
+Braden, Borman, & Partridge [Page 15]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ Internet Experiment Note 45 5 June 1978
+ TCP Checksum Function Design William W. Plummer
+
+ Another property of the current checksum scheme is:
+
+ (P6) Adding the checksum to a packet does not change
+ the information bytes. Peterson [6] calls this a
+ "systematic" code.
+
+ This property allows intermediate computers such as gateway
+ machines to act on fields (i.e., the Internet Destination
+ Address) without having to first decode the packet. Cyclical
+ Redundancy Checks used for error correction are not systematic
+ either. However, most applications of CRCs tend to emphasize
+ error detection rather than correction and consequently can send
+ the message unchanged, with the CRC check bits being appended to
+ the end. The 24-bit CRC used by ARPANET IMPs and Very Distant
+ Host Interfaces [4] and the ANSI standards for 800 and 6250 bits
+ per inch magnetic tapes (described in [11]) use this mode.
+
+ Note that the operation of higher level protocols are not (by
+ design) affected by anything that may be done by a gateway acting
+ on possibly invalid packets. It is permissible for gateways to
+ validate the checksum on incoming packets, but in general
+ gateways will not know how to do this if the checksum is a
+ protocol-specific feature.
+
+ A final property of the current checksum scheme which is actually
+ a consequence of P1 and P4 is:
+
+ (P7) The checksum may be incrementally modified.
+
+ This property permits an intermediate gateway to add information
+ to a packet, for instance a timestamp, and "add" an appropriate
+ change to the checksum field of the packet. Note that the
+ checksum will still be end-to-end since it was not fully
+ recomputed.
+
+ 3. Product Codes
+
+ Certain "product codes" are potentially useful for checksuming
+ purposes. The following is a brief description of product codes
+ in the context of TCP. More general treatment can be found in
+ Avizienis [7] and probably other more recent works.
+
+ The basic concept of this coding is that the message (packet) to
+ be sent is formed by transforming the original source message and
+ adding some "check" bits. By reading this and applying a
+ (possibly different) transformation, a receiver can reconstruct
+
+ - 4 -
+
+
+
+Braden, Borman, & Partridge [Page 16]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ Internet Experiment Note 45 5 June 1978
+ TCP Checksum Function Design William W. Plummer
+
+ the original message and determine if it has been corrupted
+ during transmission.
+
+ Mo Ms Mr
+
+ ----- ----- -----
+ | A | code | 7 | decode | A |
+ | B | ==> | 1 | ==> | B |
+ | C | | 4 | | C |
+ ----- |...| -----
+ | 2 | check plus "valid" flag
+ ----- info
+
+ Original Sent Reconstructed
+
+ With product codes the transformation is Ms = K * Mo . That is,
+ the message sent is simply the product of the original message
+ Mo and some well known constant K . To decode, the received
+ Ms is divided by K which will yield Mr as the quotient and
+ 0 as the remainder if Mr is to be considered the same as Mo .
+
+ The first problem is selecting a "good" value for K, the "check
+ factor". K must be relatively prime to the base chosen to
+ express the message. (Example: Binary messages with K
+ incorrectly chosen to be 8. This means that Ms looks exactly
+ like Mo except that three zeros have been appended. The only
+ way the message could look bad to a receiver dividing by 8 is if
+ the error occurred in one of those three bits.)
+
+ For TCP the base R will be chosen to be 2**16. That is, every
+ 16-bit byte (word on the PDP-11) will be considered as a digit of
+ a big number and that number is the message. Thus,
+
+ Mo = SIGMA [ Bi * (R**i)] , Bi is i-th byte
+ i=0 to N
+
+ Ms = K * Mo
+
+ Corrupting a single digit of Ms will yield Ms' = Ms +or-
+ C*(R**j) for some radix position j . The receiver will compute
+ Ms'/K = Mo +or- C(R**j)/K. Since R and K are relatively prime,
+ C*(R**j) cannot be any exact multiple of K. Therefore, the
+ division will result in a non-zero remainder which indicates that
+ Ms' is a corrupted version of Ms. As will be seen, a good
+ choice for K is (R**b - 1), for some b which is the "check
+ length" which controls the degree of detection to be had for
+
+ - 5 -
+
+
+
+Braden, Borman, & Partridge [Page 17]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ Internet Experiment Note 45 5 June 1978
+ TCP Checksum Function Design William W. Plummer
+
+ burst errors which affect a string of digits (i.e., 16-bit bytes)
+ in the message. In fact b will be chosen to be 1, so K will
+ be 2**16 - 1 so that arithmetic operations will be simple. This
+ means that all bursts of 15 or fewer bits will be detected.
+ According to [7] this choice for b results in the following
+ expression for the fraction of undetected weight 2 errors:
+
+ f = 16(k-1)/[32(16k-3) + (6/k)] where k is the message length.
+
+ For large messages f approaches 3.125 per cent as k goes to
+ infinity.
+
+ Multiple precision multiplication and division are normally quite
+ complex operations, especially on small machines which typically
+ lack even single precision multiply and divide operations. The
+ exception to this is exactly the case being dealt with here --
+ the factor is 2**16 - 1 on machines with a word length of 16
+ bits. The reason for this is due to the following identity:
+
+ Q*(R**j) = Q, mod (R-1) 0 <= Q < R
+
+ That is, any digit Q in the selected radix (0, 1, ... R-1)
+ multiplied by any power of the radix will have a remainder of Q
+ when divided by the radix minus 1.
+
+ Example: In decimal R = 10. Pick Q = 6.
+
+ 6 = 0 * 9 + 6 = 6, mod 9
+ 60 = 6 * 9 + 6 = 6, mod 9
+ 600 = 66 * 9 + 6 = 6, mod 9 etc.
+
+ More to the point, rem(31415/9) = rem((30000+1000+400+10+5)/9)
+ = (3 mod 9) + (1 mod 9) + (4 mod 9) + (1 mod 9) + (5 mod 9)
+ = (3+1+4+1+5) mod 9
+ = 14 mod 9
+ = 5
+
+ So, the remainder of a number divided by the radix minus one can
+ be found by simply summing the digits of the number. Since the
+ radix in the TCP case has been chosen to be 2**16 and the check
+ factor is 2**16 - 1, a message can quickly be checked by summing
+ all of the 16-bit words (on a PDP-11), with carries being
+ end-around. If zero is the result, the message can be considered
+ valid. Thus, checking a product coded message is exactly the
+ same complexity as with the current TCP checksum!
+
+ - 6 -
+
+
+
+
+Braden, Borman, & Partridge [Page 18]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ Internet Experiment Note 45 5 June 1978
+ TCP Checksum Function Design William W. Plummer
+
+ In order to form Ms, the sender must multiply the multiple
+ precision "number" Mo by 2**16 - 1. Or, Ms = (2**16)Mo - Mo.
+ This is performed by shifting Mo one whole word's worth of
+ precision and subtracting Mo. Since carries must propagate
+ between digits, but it is only the current digit which is of
+ interest, one's complement arithmetic is used.
+
+ (2**16)Mo = Mo0 + Mo1 + Mo2 + ... + MoX + 0
+ - Mo = - ( Mo0 + Mo1 + ......... + MoX)
+ --------- ----------------------------------
+ Ms = Ms0 + Ms1 + ... - MoX
+
+ A loop which implements this function on a PDP-11 might look
+ like:
+ LOOP: MOV -2(R2),R0 ; Next byte of (2**16)Mo
+ SBC R0 ; Propagate carries from last SUB
+ SUB (R2)+,R0 ; Subtract byte of Mo
+ MOV R0,(R3)+ ; Store in Ms
+ SOB R1,LOOP ; Loop over entire message
+ ; 8 memory cycles per 16-bit byte
+
+ Note that the coding procedure is not done in-place since it is
+ not systematic. In general the original copy, Mo, will have to
+ be retained by the sender for retransmission purposes and
+ therefore must remain readable. Thus the MOV R0,(R3)+ is
+ required which accounts for 2 of the 8 memory cycles per loop.
+
+ The coding procedure will add exactly one 16-bit word to the
+ message since Ms < (2**16)Mo . This additional 16 bits will be
+ at the tail of the message, but may be moved into the defined
+ location in the TCP header immediately before transmission. The
+ receiver will have to undo this to put Ms back into standard
+ format before decoding the message.
+
+ The code in the receiver for fully decoding the message may be
+ inferred by observing that any word in Ms contains the
+ difference between two successive words of Mo minus the carries
+ from the previous word, and the low order word contains minus the
+ low word of Mo. So the low order (i.e., rightmost) word of Mr is
+ just the negative of the low order byte of Ms. The next word of
+ Mr is the next word of Ms plus the just computed word of Mr
+ plus the carry from that previous computation.
+
+ A slight refinement of the procedure is required in order to
+ protect against an all-zero message passing to the destination.
+ This will appear to have a valid checksum because Ms'/K = 0/K
+
+ - 7 -
+
+
+
+Braden, Borman, & Partridge [Page 19]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ Internet Experiment Note 45 5 June 1978
+ TCP Checksum Function Design William W. Plummer
+
+ = 0 with 0 remainder. The refinement is to make the coding be
+ Ms = K*Mo + C where C is some arbitrary, well-known constant.
+ Adding this constant requires a second pass over the message, but
+ this will typically be very short since it can stop as soon as
+ carries stop propagating. Chosing C = 1 is sufficient in most
+ cases.
+
+ The product code checksum must be evaluated in terms of the
+ desired properties P1 - P7. It has been shown that a factor of
+ two more machine cycles are consumed in computing or verifying a
+ product code checksum (P5 satisfied?).
+
+ Although the code is not systematic, the checksum can be verified
+ quickly without decoding the message. If the Internet
+ Destination Address is located at the least significant end of
+ the packet (where the product code computation begins) then it is
+ possible for a gateway to decode only enough of the message to
+ see this field without having to decode the entire message.
+ Thus, P6 is at least partially satisfied. The algebraic
+ properties P1 through P4 are not satisfied, but only a small
+ amount of computation is needed to account for this -- the
+ message needs to be reformatted as previously mentioned.
+
+ P7 is satisfied since the product code checksum can be
+ incrementally updated to account for an added word, although the
+ procedure is somewhat involved. Imagine that the original
+ message has two halves, H1 and H2. Thus, Mo = H1*(R**j) + H2.
+ The timestamp word is to be inserted between these halves to form
+ a modified Mo' = H1*(R**(j+1)) + T*(R**j) + H2. Since K has
+ been chosen to be R-1, the transmitted message Ms' = Mo'(R-1).
+ Then,
+
+ Ms' = Ms*R + T(R-1)(R**j) + P2((R-1)**2)
+
+ = Ms*R + T*(R**(j+1)) + T*(R**j) + P2*(R**2) - 2*P2*R - P2
+
+ Recalling that R is 2**16, the word size on the PDP-11,
+ multiplying by R means copying down one word in memory. So,
+ the first term of Ms' is simply the unmodified message copied
+ down one word. The next term is the new data T added into the
+ Ms' being formed beginning at the (j+1)th word. The addition is
+ fairly easy here since after adding in T all that is left is
+ propagating the carry, and that can stop as soon as no carry is
+ produced. The other terms can be handle similarly.
+
+ - 8 -
+
+
+
+
+
+Braden, Borman, & Partridge [Page 20]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ Internet Experiment Note 45 5 June 1978
+ TCP Checksum Function Design William W. Plummer
+
+ 4. More Complicated Codes
+
+ There exists a wealth of theory on error detecting and correcting
+ codes. Peterson [6] is an excellent reference. Most of these
+ "CRC" schemes are designed to be implemented using a shift
+ register with a feedback network composed of exclusive-ORs.
+ Simulating such a logic circuit with a program would be too slow
+ to be useful unless some programming trick is discovered.
+
+ One such trick has been proposed by Kirstein [8]. Basically, a
+ few bits (four or eight) of the current shift register state are
+ combined with bits from the input stream (from Mo) and the result
+ is used as an index to a table which yields the new shift
+ register state and, if the code is not systematic, bits for the
+ output stream (Ms). A trial coding of an especially "good" CRC
+ function using four-bit bytes showed showed this technique to be
+ about four times as slow as the current checksum function. This
+ was true for both the PDP-10 and PDP-11 machines. Of the
+ desirable properties listed above, CRC schemes satisfy only P3
+ (It has an inverse.), and P6 (It is systematic.). Placement of
+ the checksum field in the packet is critical and the CRC cannot
+ be incrementally modified.
+
+ Although the bulk of coding theory deals with binary codes, most
+ of the theory works if the alphabet contains q symbols, where
+ q is a power of a prime number. For instance q taken as 2**16
+ should make a great deal of the theory useful on a word-by-word
+ basis.
+
+ 5. Outboard Processing
+
+ When a function such as computing an involved checksum requires
+ extensive processing, one solution is to put that processing into
+ an outboard processor. In this way "encode message" and "decode
+ message" become single instructions which do not tax the main
+ host processor. The Digital Equipment Corporation VAX/780
+ computer is equipped with special hardware for generating and
+ checking CRCs [13]. In general this is not a very good solution
+ since such a processor must be constructed for every different
+ host machine which uses TCP messages.
+
+ It is conceivable that the gateway functions for a large host may
+ be performed entirely in an "Internet Frontend Machine". This
+ machine would be responsible for forwarding packets received
+
+ - 9 -
+
+
+
+
+
+Braden, Borman, & Partridge [Page 21]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ Internet Experiment Note 45 5 June 1978
+ TCP Checksum Function Design William W. Plummer
+
+ either from the network(s) or from the Internet protocol modules
+ in the connected host, and for reassembling Internet fragments
+ into segments and passing these to the host. Another capability
+ of this machine would be to check the checksum so that the
+ segments given to the host are known to be valid at the time they
+ leave the frontend. Since computer cycles are assumed to be both
+ inexpensive and available in the frontend, this seems reasonable.
+
+ The problem with attempting to validate checksums in the frontend
+ is that it destroys the end-to-end character of the checksum. If
+ anything, this is the most powerful feature of the TCP checksum!
+ There is a way to make the host-to-frontend link be covered by
+ the end-to-end checksum. A separate, small protocol must be
+ developed to cover this link. After having validated an incoming
+ packet from the network, the frontend would pass it to the host
+ saying "here is an Internet segment for you. Call it #123". The
+ host would save this segment, and send a copy back to the
+ frontend saying, "Here is what you gave me as #123. Is it OK?".
+ The frontend would then do a word-by-word comparison with the
+ first transmission, and tell the host either "Here is #123
+ again", or "You did indeed receive #123 properly. Release it to
+ the appropriate module for further processing."
+
+ The headers on the messages crossing the host-frontend link would
+ most likely be covered by a fairly strong checksum so that
+ information like which function is being performed and the
+ message reference numbers are reliable. These headers would be
+ quite short, maybe only sixteen bits, so the checksum could be
+ quite strong. The bulk of the message would not be checksumed of
+ course.
+ The reason this scheme reduces the computing burden on the host
+ is that all that is required in order to validate the message
+ using the end-to-end checksum is to send it back to the frontend
+ machine. In the case of the PDP-10, this requires only 0.5
+ memory cycles per 16-bit byte of Internet message, and only a few
+ processor cycles to setup the required transfers.
+
+ 6. Conclusions
+
+ There is an ordering of checksum functions: first and simplest is
+ none at all which provides no error detection or correction.
+ Second, is sending a constant which is checked by the receiver.
+ This also is extremely weak. Third, the exclusive-OR of the data
+ may be sent. XOR takes the minimal amount of computer time to
+ generate and check, but is not a good checksum. A two's
+ complement sum of the data is somewhat better and takes no more
+
+ - 10 -
+
+
+
+Braden, Borman, & Partridge [Page 22]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ Internet Experiment Note 45 5 June 1978
+ TCP Checksum Function Design William W. Plummer
+
+ computer time to compute. Fifth, is the one's complement sum
+ which is what is currently used by TCP. It is slightly more
+ expensive in terms of computer time. The next step is a product
+ code. The product code is strongly related to one's complement
+ sum, takes still more computer time to use, provides a bit more
+ protection against common hardware failures, but has some
+ objectionable properties. Next is a genuine CRC polynomial code,
+ used for checking purposes only. This is very expensive for a
+ program to implement. Finally, a full CRC error correcting and
+ detecting scheme may be used.
+
+ For TCP and Internet applications the product code scheme is
+ viable. It suffers mainly in that messages must be (at least
+ partially) decoded by intermediate gateways in order that they
+ can be forwarded. Should product codes not be chosen as an
+ improved checksum, some slight modification to the existing
+ scheme might be possible. For instance the "add and rotate"
+ function used for paper tape by the PDP-6/10 group at the
+ Artificial Intelligence Laboratory at M.I.T. Project MAC [12]
+ could be useful if it can be proved that it is better than the
+ current scheme and that it can be computed efficiently on a
+ variety of machines.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ - 11 -
+
+
+
+
+
+
+
+Braden, Borman, & Partridge [Page 23]
+
+RFC 1071 Computing the Internet Checksum September 1988
+
+
+ Internet Experiment Note 45 5 June 1978
+ TCP Checksum Function Design William W. Plummer
+
+ References
+
+ [1] Cerf, V.G. and Kahn, Robert E., "A Protocol for Packet Network
+ Communications," IEEE Transactions on Communications, vol.
+ COM-22, No. 5, May 1974.
+
+ [2] Kahn, Robert E., "The Organization of Computer Resources into
+ a Packet Radio Network", IEEE Transactions on Communications,
+ vol. COM-25, no. 1, pp. 169-178, January 1977.
+
+ [3] Jacobs, Irwin, et al., "CPODA - A Demand Assignment Protocol
+ for SatNet", Fifth Data Communications Symposium, September
+ 27-9, 1977, Snowbird, Utah
+
+ [4] Bolt Beranek and Newman, Inc. "Specifications for the
+ Interconnection of a Host and an IMP", Report 1822, January
+ 1976 edition.
+
+ [5] Dean, Richard A., "Elements of Abstract Algebra", John Wyley
+ and Sons, Inc., 1966
+
+ [6] Peterson, W. Wesley, "Error Correcting Codes", M.I.T. Press
+ Cambridge MA, 4th edition, 1968.
+
+ [7] Avizienis, Algirdas, "A Study of the Effectiveness of Fault-
+ Detecting Codes for Binary Arithmetic", Jet Propulsion
+ Laboratory Technical Report No. 32-711, September 1, 1965.
+
+ [8] Kirstein, Peter, private communication
+
+ [9] Cerf, V. G. and Postel, Jonathan B., "Specification of
+ Internetwork Transmission Control Program Version 3",
+ University of Southern California Information Sciences
+ Institute, January 1978.
+
+ [10] Digital Equipment Corporation, "PDP-10 Reference Handbook",
+ 1970, pp. 114-5.
+
+ [11] Swanson, Robert, "Understanding Cyclic Redundancy Codes",
+ Computer Design, November, 1975, pp. 93-99.
+
+ [12] Clements, Robert C., private communication.
+
+ [13] Conklin, Peter F., and Rodgers, David P., "Advanced
+ Minicomputer Designed by Team Evaluation of Hardware/Software
+ Tradeoffs", Computer Design, April 1978, pp. 136-7.
+
+ - 12 -
+
+
+
+Braden, Borman, & Partridge [Page 24]
+