diff options
author | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
---|---|---|
committer | Thomas Voss <mail@thomasvoss.com> | 2024-11-27 20:54:24 +0100 |
commit | 4bfd864f10b68b71482b35c818559068ef8d5797 (patch) | |
tree | e3989f47a7994642eb325063d46e8f08ffa681dc /doc/rfc/rfc1071.txt | |
parent | ea76e11061bda059ae9f9ad130a9895cc85607db (diff) |
doc: Add RFC documents
Diffstat (limited to 'doc/rfc/rfc1071.txt')
-rw-r--r-- | doc/rfc/rfc1071.txt | 1417 |
1 files changed, 1417 insertions, 0 deletions
diff --git a/doc/rfc/rfc1071.txt b/doc/rfc/rfc1071.txt new file mode 100644 index 0000000..3b94108 --- /dev/null +++ b/doc/rfc/rfc1071.txt @@ -0,0 +1,1417 @@ + + + + +Network Working Group R. Braden +Request for Comments: 1071 ISI + D. Borman + Cray Research + C. Partridge + BBN Laboratories + September 1988 + + + Computing the Internet Checksum + + +Status of This Memo + + This memo summarizes techniques and algorithms for efficiently + computing the Internet checksum. It is not a standard, but a set of + useful implementation techniques. Distribution of this memo is + unlimited. + +1. Introduction + + This memo discusses methods for efficiently computing the Internet + checksum that is used by the standard Internet protocols IP, UDP, and + TCP. + + An efficient checksum implementation is critical to good performance. + As advances in implementation techniques streamline the rest of the + protocol processing, the checksum computation becomes one of the + limiting factors on TCP performance, for example. It is usually + appropriate to carefully hand-craft the checksum routine, exploiting + every machine-dependent trick possible; a fraction of a microsecond + per TCP data byte can add up to a significant CPU time savings + overall. + + In outline, the Internet checksum algorithm is very simple: + + (1) Adjacent octets to be checksummed are paired to form 16-bit + integers, and the 1's complement sum of these 16-bit integers is + formed. + + (2) To generate a checksum, the checksum field itself is cleared, + the 16-bit 1's complement sum is computed over the octets + concerned, and the 1's complement of this sum is placed in the + checksum field. + + (3) To check a checksum, the 1's complement sum is computed over the + same set of octets, including the checksum field. If the result + is all 1 bits (-0 in 1's complement arithmetic), the check + succeeds. + + Suppose a checksum is to be computed over the sequence of octets + + + +Braden, Borman, & Partridge [Page 1] + +RFC 1071 Computing the Internet Checksum September 1988 + + + A, B, C, D, ... , Y, Z. Using the notation [a,b] for the 16-bit + integer a*256+b, where a and b are bytes, then the 16-bit 1's + complement sum of these bytes is given by one of the following: + + [A,B] +' [C,D] +' ... +' [Y,Z] [1] + + [A,B] +' [C,D] +' ... +' [Z,0] [2] + + where +' indicates 1's complement addition. These cases + correspond to an even or odd count of bytes, respectively. + + On a 2's complement machine, the 1's complement sum must be + computed by means of an "end around carry", i.e., any overflows + from the most significant bits are added into the least + significant bits. See the examples below. + + Section 2 explores the properties of this checksum that may be + exploited to speed its calculation. Section 3 contains some + numerical examples of the most important implementation + techniques. Finally, Section 4 includes examples of specific + algorithms for a variety of common CPU types. We are grateful + to Van Jacobson and Charley Kline for their contribution of + algorithms to this section. + + The properties of the Internet checksum were originally + discussed by Bill Plummer in IEN-45, entitled "Checksum Function + Design". Since IEN-45 has not been widely available, we include + it as an extended appendix to this RFC. + + 2. Calculating the Checksum + + This simple checksum has a number of wonderful mathematical + properties that may be exploited to speed its calculation, as we + will now discuss. + + + (A) Commutative and Associative + + As long as the even/odd assignment of bytes is respected, the + sum can be done in any order, and it can be arbitrarily split + into groups. + + For example, the sum [1] could be split into: + + ( [A,B] +' [C,D] +' ... +' [J,0] ) + + +' ( [0,K] +' ... +' [Y,Z] ) [3] + + + + + + + +Braden, Borman, & Partridge [Page 2] + +RFC 1071 Computing the Internet Checksum September 1988 + + + (B) Byte Order Independence + + The sum of 16-bit integers can be computed in either byte order. + Thus, if we calculate the swapped sum: + + [B,A] +' [D,C] +' ... +' [Z,Y] [4] + + the result is the same as [1], except the bytes are swapped in + the sum! To see why this is so, observe that in both orders the + carries are the same: from bit 15 to bit 0 and from bit 7 to bit + 8. In other words, consistently swapping bytes simply rotates + the bits within the sum, but does not affect their internal + ordering. + + Therefore, the sum may be calculated in exactly the same way + regardless of the byte order ("big-endian" or "little-endian") + of the underlaying hardware. For example, assume a "little- + endian" machine summing data that is stored in memory in network + ("big-endian") order. Fetching each 16-bit word will swap + bytes, resulting in the sum [4]; however, storing the result + back into memory will swap the sum back into network byte order. + + Byte swapping may also be used explicitly to handle boundary + alignment problems. For example, the second group in [3] can be + calculated without concern to its odd/even origin, as: + + [K,L] +' ... +' [Z,0] + + if this sum is byte-swapped before it is added to the first + group. See the example below. + + (C) Parallel Summation + + On machines that have word-sizes that are multiples of 16 bits, + it is possible to develop even more efficient implementations. + Because addition is associative, we do not have to sum the + integers in the order they appear in the message. Instead we + can add them in "parallel" by exploiting the larger word size. + + To compute the checksum in parallel, simply do a 1's complement + addition of the message using the native word size of the + machine. For example, on a 32-bit machine we can add 4 bytes at + a time: [A,B,C,D]+'... When the sum has been computed, we "fold" + the long sum into 16 bits by adding the 16-bit segments. Each + 16-bit addition may produce new end-around carries that must be + added. + + Furthermore, again the byte order does not matter; we could + instead sum 32-bit words: [D,C,B,A]+'... or [B,A,D,C]+'... and + then swap the bytes of the final 16-bit sum as necessary. See + the examples below. Any permutation is allowed that collects + + + +Braden, Borman, & Partridge [Page 3] + +RFC 1071 Computing the Internet Checksum September 1988 + + + all the even-numbered data bytes into one sum byte and the odd- + numbered data bytes into the other sum byte. + + + There are further coding techniques that can be exploited to speed up + the checksum calculation. + + (1) Deferred Carries + + Depending upon the machine, it may be more efficient to defer + adding end-around carries until the main summation loop is + finished. + + One approach is to sum 16-bit words in a 32-bit accumulator, so + the overflows build up in the high-order 16 bits. This approach + typically avoids a carry-sensing instruction but requires twice + as many additions as would adding 32-bit segments; which is + faster depends upon the detailed hardware architecture. + + (2) Unwinding Loops + + To reduce the loop overhead, it is often useful to "unwind" the + inner sum loop, replicating a series of addition commands within + one loop traversal. This technique often provides significant + savings, although it may complicate the logic of the program + considerably. + + (3) Combine with Data Copying + + Like checksumming, copying data from one memory location to + another involves per-byte overhead. In both cases, the + bottleneck is essentially the memory bus, i.e., how fast the + data can be fetched. On some machines (especially relatively + slow and simple micro-computers), overhead can be significantly + reduced by combining memory-to-memory copy and the checksumming, + fetching the data only once for both. + + (4) Incremental Update + + Finally, one can sometimes avoid recomputing the entire checksum + when one header field is updated. The best-known example is a + gateway changing the TTL field in the IP header, but there are + other examples (for example, when updating a source route). In + these cases it is possible to update the checksum without + scanning the message or datagram. + + To update the checksum, simply add the differences of the + sixteen bit integers that have been changed. To see why this + works, observe that every 16-bit integer has an additive inverse + and that addition is associative. From this it follows that + given the original value m, the new value m', and the old + + + +Braden, Borman, & Partridge [Page 4] + +RFC 1071 Computing the Internet Checksum September 1988 + + + checksum C, the new checksum C' is: + + C' = C + (-m) + m' = C + (m' - m) + + +3. Numerical Examples + + We now present explicit examples of calculating a simple 1's + complement sum on a 2's complement machine. The examples show the + same sum calculated byte by bye, by 16-bits words in normal and + swapped order, and 32 bits at a time in 3 different orders. All + numbers are in hex. + + Byte-by-byte "Normal" Swapped + Order Order + + Byte 0/1: 00 01 0001 0100 + Byte 2/3: f2 03 f203 03f2 + Byte 4/5: f4 f5 f4f5 f5f4 + Byte 6/7: f6 f7 f6f7 f7f6 + --- --- ----- ----- + Sum1: 2dc 1f0 2ddf0 1f2dc + + dc f0 ddf0 f2dc + Carrys: 1 2 2 1 + -- -- ---- ---- + Sum2: dd f2 ddf2 f2dd + + Final Swap: dd f2 ddf2 ddf2 + + + Byte 0/1/2/3: 0001f203 010003f2 03f20100 + Byte 4/5/6/7: f4f5f6f7 f5f4f7f6 f7f6f5f4 + -------- -------- -------- + Sum1: 0f4f7e8fa 0f6f4fbe8 0fbe8f6f4 + + Carries: 0 0 0 + + Top half: f4f7 f6f4 fbe8 + Bottom half: e8fa fbe8 f6f4 + ----- ----- ----- + Sum2: 1ddf1 1f2dc 1f2dc + + ddf1 f2dc f2dc + Carrys: 1 1 1 + ---- ---- ---- + Sum3: ddf2 f2dd f2dd + + Final Swap: ddf2 ddf2 ddf2 + + + + + +Braden, Borman, & Partridge [Page 5] + +RFC 1071 Computing the Internet Checksum September 1988 + + + Finally, here an example of breaking the sum into two groups, with + the second group starting on a odd boundary: + + + Byte-by-byte Normal + Order + + Byte 0/1: 00 01 0001 + Byte 2/ : f2 (00) f200 + --- --- ----- + Sum1: f2 01 f201 + + Byte 4/5: 03 f4 03f4 + Byte 6/7: f5 f6 f5f6 + Byte 8/: f7 (00) f700 + --- --- ----- + Sum2: 1f0ea + + Sum2: f0ea + Carry: 1 + ----- + Sum3: f0eb + + Sum1: f201 + Sum3 byte swapped: ebf0 + ----- + Sum4: 1ddf1 + + Sum4: ddf1 + Carry: 1 + ----- + Sum5: ddf2 + + + + + + + + + + + + + + + + + + + + + + +Braden, Borman, & Partridge [Page 6] + +RFC 1071 Computing the Internet Checksum September 1988 + + +4. Implementation Examples + + In this section we show examples of Internet checksum implementation + algorithms that have been found to be efficient on a variety of + CPU's. In each case, we show the core of the algorithm, without + including environmental code (e.g., subroutine linkages) or special- + case code. + +4.1 "C" + + The following "C" code algorithm computes the checksum with an inner + loop that sums 16-bits at a time in a 32-bit accumulator. + + in 6 + { + /* Compute Internet Checksum for "count" bytes + * beginning at location "addr". + */ + register long sum = 0; + + while( count > 1 ) { + /* This is the inner loop */ + sum += * (unsigned short) addr++; + count -= 2; + } + + /* Add left-over byte, if any */ + if( count > 0 ) + sum += * (unsigned char *) addr; + + /* Fold 32-bit sum to 16 bits */ + while (sum>>16) + sum = (sum & 0xffff) + (sum >> 16); + + checksum = ~sum; + } + + + + + + + + + + + + + + + + + + +Braden, Borman, & Partridge [Page 7] + +RFC 1071 Computing the Internet Checksum September 1988 + + +4.2 Motorola 68020 + + The following algorithm is given in assembler language for a Motorola + 68020 chip. This algorithm performs the sum 32 bits at a time, and + unrolls the loop with 16 replications. For clarity, we have omitted + the logic to add the last fullword when the length is not a multiple + of 4. The result is left in register d0. + + With a 20MHz clock, this routine was measured at 134 usec/kB summing + random data. This algorithm was developed by Van Jacobson. + + + movl d1,d2 + lsrl #6,d1 | count/64 = # loop traversals + andl #0x3c,d2 | Then find fractions of a chunk + negl d2 + andb #0xf,cc | Clear X (extended carry flag) + + jmp pc@(2$-.-2:b,d2) | Jump into loop + + 1$: | Begin inner loop... + + movl a0@+,d2 | Fetch 32-bit word + addxl d2,d0 | Add word + previous carry + movl a0@+,d2 | Fetch 32-bit word + addxl d2,d0 | Add word + previous carry + + | ... 14 more replications + 2$: + dbra d1,1$ | (NB- dbra doesn't affect X) + + movl d0,d1 | Fold 32 bit sum to 16 bits + swap d1 | (NB- swap doesn't affect X) + addxw d1,d0 + jcc 3$ + addw #1,d0 + 3$: + andl #0xffff,d0 + + + + + + + + + + + + + + + + +Braden, Borman, & Partridge [Page 8] + +RFC 1071 Computing the Internet Checksum September 1988 + + +4.3 Cray + + The following example, in assembler language for a Cray CPU, was + contributed by Charley Kline. It implements the checksum calculation + as a vector operation, summing up to 512 bytes at a time with a basic + summation unit of 32 bits. This example omits many details having to + do with short blocks, for clarity. + + Register A1 holds the address of a 512-byte block of memory to + checksum. First two copies of the data are loaded into two vector + registers. One is vector-shifted right 32 bits, while the other is + vector-ANDed with a 32 bit mask. Then the two vectors are added + together. Since all these operations chain, it produces one result + per clock cycle. Then it collapses the result vector in a loop that + adds each element to a scalar register. Finally, the end-around + carry is performed and the result is folded to 16-bits. + + EBM + A0 A1 + VL 64 use full vectors + S1 <32 form 32-bit mask from the right. + A2 32 + V1 ,A0,1 load packet into V1 + V2 S1&V1 Form right-hand 32-bits in V2. + V3 V1>A2 Form left-hand 32-bits in V3. + V1 V2+V3 Add the two together. + A2 63 Prepare to collapse into a scalar. + S1 0 + S4 <16 Form 16-bit mask from the right. + A4 16 + CK$LOOP S2 V1,A2 + A2 A2-1 + A0 A2 + S1 S1+S2 + JAN CK$LOOP + S2 S1&S4 Form right-hand 16-bits in S2 + S1 S1>A4 Form left-hand 16-bits in S1 + S1 S1+S2 + S2 S1&S4 Form right-hand 16-bits in S2 + S1 S1>A4 Form left-hand 16-bits in S1 + S1 S1+S2 + S1 #S1 Take one's complement + CMR At this point, S1 contains the checksum. + + + + + + + + + + + +Braden, Borman, & Partridge [Page 9] + +RFC 1071 Computing the Internet Checksum September 1988 + + +4.4 IBM 370 + + The following example, in assembler language for an IBM 370 CPU, sums + the data 4 bytes at a time. For clarity, we have omitted the logic + to add the last fullword when the length is not a multiple of 4, and + to reverse the bytes when necessary. The result is left in register + RCARRY. + + This code has been timed on an IBM 3090 CPU at 27 usec/KB when + summing all one bits. This time is reduced to 24.3 usec/KB if the + trouble is taken to word-align the addends (requiring special cases + at both the beginning and the end, and byte-swapping when necessary + to compensate for starting on an odd byte). + + * Registers RADDR and RCOUNT contain the address and length of + * the block to be checksummed. + * + * (RCARRY, RSUM) must be an even/odd register pair. + * (RCOUNT, RMOD) must be an even/odd register pair. + * + CHECKSUM SR RSUM,RSUM Clear working registers. + SR RCARRY,RCARRY + LA RONE,1 Set up constant 1. + * + SRDA RCOUNT,6 Count/64 to RCOUNT. + AR RCOUNT,RONE +1 = # times in loop. + SRL RMOD,26 Size of partial chunk to RMOD. + AR RADDR,R3 Adjust addr to compensate for + S RADDR,=F(64) jumping into the loop. + SRL RMOD,1 (RMOD/4)*2 is halfword index. + LH RMOD,DOPEVEC9(RMOD) Use magic dope-vector for offset, + B LOOP(RMOD) and jump into the loop... + * + * Inner loop: + * + LOOP AL RSUM,0(,RADDR) Add Logical fullword + BC 12,*+6 Branch if no carry + AR RCARRY,RONE Add 1 end-around + AL RSUM,4(,RADDR) Add Logical fullword + BC 12,*+6 Branch if no carry + AR RCARRY,RONE Add 1 end-around + * + * ... 14 more replications ... + * + A RADDR,=F'64' Increment address ptr + BCT RCOUNT,LOOP Branch on Count + * + * Add Carries into sum, and fold to 16 bits + * + ALR RCARRY,RSUM Add SUM and CARRY words + BC 12,*+6 and take care of carry + + + +Braden, Borman, & Partridge [Page 10] + +RFC 1071 Computing the Internet Checksum September 1988 + + + AR RCARRY,RONE + SRDL RCARRY,16 Fold 32-bit sum into + SRL RSUM,16 16-bits + ALR RCARRY,RSUM + C RCARRY,=X'0000FFFF' and take care of any + BNH DONE last carry + S RCARRY,=X'0000FFFF' + DONE X RCARRY,=X'0000FFFF' 1's complement + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Braden, Borman, & Partridge [Page 11] + +RFC 1071 Computing the Internet Checksum September 1988 + + + IEN 45 + Section 2.4.4.5 + + + + + + + + + + + + + + + + + TCP Checksum Function Design + + + + William W. Plummer + + + Bolt Beranek and Newman, Inc. + 50 Moulton Street + Cambridge MA 02138 + + + 5 June 1978 + + + + + + + + + + + + + + + + + + + + + + + +Braden, Borman, & Partridge [Page 12] + +RFC 1071 Computing the Internet Checksum September 1988 + + + Internet Experiment Note 45 5 June 1978 + TCP Checksum Function Design William W. Plummer + + 1. Introduction + + Checksums are included in packets in order that errors + encountered during transmission may be detected. For Internet + protocols such as TCP [1,9] this is especially important because + packets may have to cross wireless networks such as the Packet + Radio Network [2] and Atlantic Satellite Network [3] where + packets may be corrupted. Internet protocols (e.g., those for + real time speech transmission) can tolerate a certain level of + transmission errors and forward error correction techniques or + possibly no checksum at all might be better. The focus in this + paper is on checksum functions for protocols such as TCP where + the required reliable delivery is achieved by retransmission. + + Even if the checksum appears good on a message which has been + received, the message may still contain an undetected error. The + probability of this is bounded by 2**(-C) where C is the number + of checksum bits. Errors can arise from hardware (and software) + malfunctions as well as transmission errors. Hardware induced + errors are usually manifested in certain well known ways and it + is desirable to account for this in the design of the checksum + function. Ideally no error of the "common hardware failure" type + would go undetected. + + An example of a failure that the current checksum function + handles successfully is picking up a bit in the network interface + (or I/O buss, memory channel, etc.). This will always render the + checksum bad. For an example of how the current function is + inadequate, assume that a control signal stops functioning in the + network interface and the interface stores zeros in place of the + real data. These "all zero" messages appear to have valid + checksums. Noise on the "There's Your Bit" line of the ARPANET + Interface [4] may go undetected because the extra bits input may + cause the checksum to be perturbed (i.e., shifted) in the same + way as the data was. + + Although messages containing undetected errors will occasionally + be passed to higher levels of protocol, it is likely that they + will not make sense at that level. In the case of TCP most such + messages will be ignored, but some could cause a connection to be + aborted. Garbled data could be viewed as a problem for a layer + of protocol above TCP which itself may have a checksuming scheme. + + This paper is the first step in design of a new checksum function + for TCP and some other Internet protocols. Several useful + properties of the current function are identified. If possible + + - 1 - + + + +Braden, Borman, & Partridge [Page 13] + +RFC 1071 Computing the Internet Checksum September 1988 + + + Internet Experiment Note 45 5 June 1978 + TCP Checksum Function Design William W. Plummer + + these should be retained in any new function. A number of + plausible checksum schemes are investigated. Of these only the + "product code" seems to be simple enough for consideration. + + 2. The Current TCP Checksum Function + + The current function is oriented towards sixteen-bit machines + such as the PDP-11 but can be computed easily on other machines + (e.g., PDP-10). A packet is thought of as a string of 16-bit + bytes and the checksum function is the one's complement sum (add + with end-around carry) of those bytes. It is the one's + complement of this sum which is stored in the checksum field of + the TCP header. Before computing the checksum value, the sender + places a zero in the checksum field of the packet. If the + checksum value computed by a receiver of the packet is zero, the + packet is assumed to be valid. This is a consequence of the + "negative" number in the checksum field exactly cancelling the + contribution of the rest of the packet. + + Ignoring the difficulty of actually evaluating the checksum + function for a given packet, the way of using the checksum + described above is quite simple, but it assumes some properties + of the checksum operator (one's complement addition, "+" in what + follows): + + (P1) + is commutative. Thus, the order in which + the 16-bit bytes are "added" together is + unimportant. + + (P2) + has at least one identity element (The + current function has two: +0 and -0). This + allows the sender to compute the checksum + function by placing a zero in the packet checksum + field before computing the value. + + (P3) + has an inverse. Thus, the receiver may + evaluate the checksum function and expect a zero. + + (P4) + is associative, allowing the checksum field + to be anywhere in the packet and the 16-bit bytes + to be scanned sequentially. + + Mathematically, these properties of the binary operation "+" over + the set of 16-bit numbers forms an Abelian group [5]. Of course, + there are many Abelian groups but not all would be satisfactory + for use as checksum operators. (Another operator readily + + - 2 - + + + +Braden, Borman, & Partridge [Page 14] + +RFC 1071 Computing the Internet Checksum September 1988 + + + Internet Experiment Note 45 5 June 1978 + TCP Checksum Function Design William W. Plummer + + available in the PDP-11 instruction set that has all of these + properties is exclusive-OR, but XOR is unsatisfactory for other + reasons.) + + Albeit imprecise, another property which must be preserved in any + future checksum scheme is: + + (P5) + is fast to compute on a variety of machines + with limited storage requirements. + + The current function is quite good in this respect. On the + PDP-11 the inner loop looks like: + + LOOP: ADD (R1)+,R0 ; Add the next 16-bit byte + ADC R0 ; Make carry be end-around + SOB R2,LOOP ; Loop over entire packet. + + ( 4 memory cycles per 16-bit byte ) + + On the PDP-10 properties P1-4 are exploited further and two + 16-bit bytes per loop are processed: + + LOOP: ILDB THIS,PTR ; Get 2 16-bit bytes + ADD SUM,THIS ; Add into current sum + JUMPGE SUM,CHKSU2 ; Jump if fewer than 8 carries + LDB THIS,[POINT 20,SUM,19] ; Get left 16 and carries + ANDI SUM,177777 ; Save just low 16 here + ADD SUM,THIS ; Fold in carries + CHKSU2: SOJG COUNT,LOOP ; Loop over entire packet + + ( 3.1 memory cycles per 16-bit byte ) + + The "extra" instruction in the loops above are required to + convert the two's complement ADD instruction(s) into a one's + complement add by making the carries be end-around. One's + complement arithmetic is better than two's complement because it + is equally sensitive to errors in all bit positions. If two's + complement addition were used, an even number of 1's could be + dropped (or picked up) in the most significant bit channel + without affecting the value of the checksum. It is just this + property that makes some sort of addition preferable to a simple + exclusive-OR which is frequently used but permits an even number + of drops (pick ups) in any bit channel. RIM10B paper tape format + used on PDP-10s [10] uses two's complement add because space for + the loader program is extremely limited. + + - 3 - + + + + +Braden, Borman, & Partridge [Page 15] + +RFC 1071 Computing the Internet Checksum September 1988 + + + Internet Experiment Note 45 5 June 1978 + TCP Checksum Function Design William W. Plummer + + Another property of the current checksum scheme is: + + (P6) Adding the checksum to a packet does not change + the information bytes. Peterson [6] calls this a + "systematic" code. + + This property allows intermediate computers such as gateway + machines to act on fields (i.e., the Internet Destination + Address) without having to first decode the packet. Cyclical + Redundancy Checks used for error correction are not systematic + either. However, most applications of CRCs tend to emphasize + error detection rather than correction and consequently can send + the message unchanged, with the CRC check bits being appended to + the end. The 24-bit CRC used by ARPANET IMPs and Very Distant + Host Interfaces [4] and the ANSI standards for 800 and 6250 bits + per inch magnetic tapes (described in [11]) use this mode. + + Note that the operation of higher level protocols are not (by + design) affected by anything that may be done by a gateway acting + on possibly invalid packets. It is permissible for gateways to + validate the checksum on incoming packets, but in general + gateways will not know how to do this if the checksum is a + protocol-specific feature. + + A final property of the current checksum scheme which is actually + a consequence of P1 and P4 is: + + (P7) The checksum may be incrementally modified. + + This property permits an intermediate gateway to add information + to a packet, for instance a timestamp, and "add" an appropriate + change to the checksum field of the packet. Note that the + checksum will still be end-to-end since it was not fully + recomputed. + + 3. Product Codes + + Certain "product codes" are potentially useful for checksuming + purposes. The following is a brief description of product codes + in the context of TCP. More general treatment can be found in + Avizienis [7] and probably other more recent works. + + The basic concept of this coding is that the message (packet) to + be sent is formed by transforming the original source message and + adding some "check" bits. By reading this and applying a + (possibly different) transformation, a receiver can reconstruct + + - 4 - + + + +Braden, Borman, & Partridge [Page 16] + +RFC 1071 Computing the Internet Checksum September 1988 + + + Internet Experiment Note 45 5 June 1978 + TCP Checksum Function Design William W. Plummer + + the original message and determine if it has been corrupted + during transmission. + + Mo Ms Mr + + ----- ----- ----- + | A | code | 7 | decode | A | + | B | ==> | 1 | ==> | B | + | C | | 4 | | C | + ----- |...| ----- + | 2 | check plus "valid" flag + ----- info + + Original Sent Reconstructed + + With product codes the transformation is Ms = K * Mo . That is, + the message sent is simply the product of the original message + Mo and some well known constant K . To decode, the received + Ms is divided by K which will yield Mr as the quotient and + 0 as the remainder if Mr is to be considered the same as Mo . + + The first problem is selecting a "good" value for K, the "check + factor". K must be relatively prime to the base chosen to + express the message. (Example: Binary messages with K + incorrectly chosen to be 8. This means that Ms looks exactly + like Mo except that three zeros have been appended. The only + way the message could look bad to a receiver dividing by 8 is if + the error occurred in one of those three bits.) + + For TCP the base R will be chosen to be 2**16. That is, every + 16-bit byte (word on the PDP-11) will be considered as a digit of + a big number and that number is the message. Thus, + + Mo = SIGMA [ Bi * (R**i)] , Bi is i-th byte + i=0 to N + + Ms = K * Mo + + Corrupting a single digit of Ms will yield Ms' = Ms +or- + C*(R**j) for some radix position j . The receiver will compute + Ms'/K = Mo +or- C(R**j)/K. Since R and K are relatively prime, + C*(R**j) cannot be any exact multiple of K. Therefore, the + division will result in a non-zero remainder which indicates that + Ms' is a corrupted version of Ms. As will be seen, a good + choice for K is (R**b - 1), for some b which is the "check + length" which controls the degree of detection to be had for + + - 5 - + + + +Braden, Borman, & Partridge [Page 17] + +RFC 1071 Computing the Internet Checksum September 1988 + + + Internet Experiment Note 45 5 June 1978 + TCP Checksum Function Design William W. Plummer + + burst errors which affect a string of digits (i.e., 16-bit bytes) + in the message. In fact b will be chosen to be 1, so K will + be 2**16 - 1 so that arithmetic operations will be simple. This + means that all bursts of 15 or fewer bits will be detected. + According to [7] this choice for b results in the following + expression for the fraction of undetected weight 2 errors: + + f = 16(k-1)/[32(16k-3) + (6/k)] where k is the message length. + + For large messages f approaches 3.125 per cent as k goes to + infinity. + + Multiple precision multiplication and division are normally quite + complex operations, especially on small machines which typically + lack even single precision multiply and divide operations. The + exception to this is exactly the case being dealt with here -- + the factor is 2**16 - 1 on machines with a word length of 16 + bits. The reason for this is due to the following identity: + + Q*(R**j) = Q, mod (R-1) 0 <= Q < R + + That is, any digit Q in the selected radix (0, 1, ... R-1) + multiplied by any power of the radix will have a remainder of Q + when divided by the radix minus 1. + + Example: In decimal R = 10. Pick Q = 6. + + 6 = 0 * 9 + 6 = 6, mod 9 + 60 = 6 * 9 + 6 = 6, mod 9 + 600 = 66 * 9 + 6 = 6, mod 9 etc. + + More to the point, rem(31415/9) = rem((30000+1000+400+10+5)/9) + = (3 mod 9) + (1 mod 9) + (4 mod 9) + (1 mod 9) + (5 mod 9) + = (3+1+4+1+5) mod 9 + = 14 mod 9 + = 5 + + So, the remainder of a number divided by the radix minus one can + be found by simply summing the digits of the number. Since the + radix in the TCP case has been chosen to be 2**16 and the check + factor is 2**16 - 1, a message can quickly be checked by summing + all of the 16-bit words (on a PDP-11), with carries being + end-around. If zero is the result, the message can be considered + valid. Thus, checking a product coded message is exactly the + same complexity as with the current TCP checksum! + + - 6 - + + + + +Braden, Borman, & Partridge [Page 18] + +RFC 1071 Computing the Internet Checksum September 1988 + + + Internet Experiment Note 45 5 June 1978 + TCP Checksum Function Design William W. Plummer + + In order to form Ms, the sender must multiply the multiple + precision "number" Mo by 2**16 - 1. Or, Ms = (2**16)Mo - Mo. + This is performed by shifting Mo one whole word's worth of + precision and subtracting Mo. Since carries must propagate + between digits, but it is only the current digit which is of + interest, one's complement arithmetic is used. + + (2**16)Mo = Mo0 + Mo1 + Mo2 + ... + MoX + 0 + - Mo = - ( Mo0 + Mo1 + ......... + MoX) + --------- ---------------------------------- + Ms = Ms0 + Ms1 + ... - MoX + + A loop which implements this function on a PDP-11 might look + like: + LOOP: MOV -2(R2),R0 ; Next byte of (2**16)Mo + SBC R0 ; Propagate carries from last SUB + SUB (R2)+,R0 ; Subtract byte of Mo + MOV R0,(R3)+ ; Store in Ms + SOB R1,LOOP ; Loop over entire message + ; 8 memory cycles per 16-bit byte + + Note that the coding procedure is not done in-place since it is + not systematic. In general the original copy, Mo, will have to + be retained by the sender for retransmission purposes and + therefore must remain readable. Thus the MOV R0,(R3)+ is + required which accounts for 2 of the 8 memory cycles per loop. + + The coding procedure will add exactly one 16-bit word to the + message since Ms < (2**16)Mo . This additional 16 bits will be + at the tail of the message, but may be moved into the defined + location in the TCP header immediately before transmission. The + receiver will have to undo this to put Ms back into standard + format before decoding the message. + + The code in the receiver for fully decoding the message may be + inferred by observing that any word in Ms contains the + difference between two successive words of Mo minus the carries + from the previous word, and the low order word contains minus the + low word of Mo. So the low order (i.e., rightmost) word of Mr is + just the negative of the low order byte of Ms. The next word of + Mr is the next word of Ms plus the just computed word of Mr + plus the carry from that previous computation. + + A slight refinement of the procedure is required in order to + protect against an all-zero message passing to the destination. + This will appear to have a valid checksum because Ms'/K = 0/K + + - 7 - + + + +Braden, Borman, & Partridge [Page 19] + +RFC 1071 Computing the Internet Checksum September 1988 + + + Internet Experiment Note 45 5 June 1978 + TCP Checksum Function Design William W. Plummer + + = 0 with 0 remainder. The refinement is to make the coding be + Ms = K*Mo + C where C is some arbitrary, well-known constant. + Adding this constant requires a second pass over the message, but + this will typically be very short since it can stop as soon as + carries stop propagating. Chosing C = 1 is sufficient in most + cases. + + The product code checksum must be evaluated in terms of the + desired properties P1 - P7. It has been shown that a factor of + two more machine cycles are consumed in computing or verifying a + product code checksum (P5 satisfied?). + + Although the code is not systematic, the checksum can be verified + quickly without decoding the message. If the Internet + Destination Address is located at the least significant end of + the packet (where the product code computation begins) then it is + possible for a gateway to decode only enough of the message to + see this field without having to decode the entire message. + Thus, P6 is at least partially satisfied. The algebraic + properties P1 through P4 are not satisfied, but only a small + amount of computation is needed to account for this -- the + message needs to be reformatted as previously mentioned. + + P7 is satisfied since the product code checksum can be + incrementally updated to account for an added word, although the + procedure is somewhat involved. Imagine that the original + message has two halves, H1 and H2. Thus, Mo = H1*(R**j) + H2. + The timestamp word is to be inserted between these halves to form + a modified Mo' = H1*(R**(j+1)) + T*(R**j) + H2. Since K has + been chosen to be R-1, the transmitted message Ms' = Mo'(R-1). + Then, + + Ms' = Ms*R + T(R-1)(R**j) + P2((R-1)**2) + + = Ms*R + T*(R**(j+1)) + T*(R**j) + P2*(R**2) - 2*P2*R - P2 + + Recalling that R is 2**16, the word size on the PDP-11, + multiplying by R means copying down one word in memory. So, + the first term of Ms' is simply the unmodified message copied + down one word. The next term is the new data T added into the + Ms' being formed beginning at the (j+1)th word. The addition is + fairly easy here since after adding in T all that is left is + propagating the carry, and that can stop as soon as no carry is + produced. The other terms can be handle similarly. + + - 8 - + + + + + +Braden, Borman, & Partridge [Page 20] + +RFC 1071 Computing the Internet Checksum September 1988 + + + Internet Experiment Note 45 5 June 1978 + TCP Checksum Function Design William W. Plummer + + 4. More Complicated Codes + + There exists a wealth of theory on error detecting and correcting + codes. Peterson [6] is an excellent reference. Most of these + "CRC" schemes are designed to be implemented using a shift + register with a feedback network composed of exclusive-ORs. + Simulating such a logic circuit with a program would be too slow + to be useful unless some programming trick is discovered. + + One such trick has been proposed by Kirstein [8]. Basically, a + few bits (four or eight) of the current shift register state are + combined with bits from the input stream (from Mo) and the result + is used as an index to a table which yields the new shift + register state and, if the code is not systematic, bits for the + output stream (Ms). A trial coding of an especially "good" CRC + function using four-bit bytes showed showed this technique to be + about four times as slow as the current checksum function. This + was true for both the PDP-10 and PDP-11 machines. Of the + desirable properties listed above, CRC schemes satisfy only P3 + (It has an inverse.), and P6 (It is systematic.). Placement of + the checksum field in the packet is critical and the CRC cannot + be incrementally modified. + + Although the bulk of coding theory deals with binary codes, most + of the theory works if the alphabet contains q symbols, where + q is a power of a prime number. For instance q taken as 2**16 + should make a great deal of the theory useful on a word-by-word + basis. + + 5. Outboard Processing + + When a function such as computing an involved checksum requires + extensive processing, one solution is to put that processing into + an outboard processor. In this way "encode message" and "decode + message" become single instructions which do not tax the main + host processor. The Digital Equipment Corporation VAX/780 + computer is equipped with special hardware for generating and + checking CRCs [13]. In general this is not a very good solution + since such a processor must be constructed for every different + host machine which uses TCP messages. + + It is conceivable that the gateway functions for a large host may + be performed entirely in an "Internet Frontend Machine". This + machine would be responsible for forwarding packets received + + - 9 - + + + + + +Braden, Borman, & Partridge [Page 21] + +RFC 1071 Computing the Internet Checksum September 1988 + + + Internet Experiment Note 45 5 June 1978 + TCP Checksum Function Design William W. Plummer + + either from the network(s) or from the Internet protocol modules + in the connected host, and for reassembling Internet fragments + into segments and passing these to the host. Another capability + of this machine would be to check the checksum so that the + segments given to the host are known to be valid at the time they + leave the frontend. Since computer cycles are assumed to be both + inexpensive and available in the frontend, this seems reasonable. + + The problem with attempting to validate checksums in the frontend + is that it destroys the end-to-end character of the checksum. If + anything, this is the most powerful feature of the TCP checksum! + There is a way to make the host-to-frontend link be covered by + the end-to-end checksum. A separate, small protocol must be + developed to cover this link. After having validated an incoming + packet from the network, the frontend would pass it to the host + saying "here is an Internet segment for you. Call it #123". The + host would save this segment, and send a copy back to the + frontend saying, "Here is what you gave me as #123. Is it OK?". + The frontend would then do a word-by-word comparison with the + first transmission, and tell the host either "Here is #123 + again", or "You did indeed receive #123 properly. Release it to + the appropriate module for further processing." + + The headers on the messages crossing the host-frontend link would + most likely be covered by a fairly strong checksum so that + information like which function is being performed and the + message reference numbers are reliable. These headers would be + quite short, maybe only sixteen bits, so the checksum could be + quite strong. The bulk of the message would not be checksumed of + course. + The reason this scheme reduces the computing burden on the host + is that all that is required in order to validate the message + using the end-to-end checksum is to send it back to the frontend + machine. In the case of the PDP-10, this requires only 0.5 + memory cycles per 16-bit byte of Internet message, and only a few + processor cycles to setup the required transfers. + + 6. Conclusions + + There is an ordering of checksum functions: first and simplest is + none at all which provides no error detection or correction. + Second, is sending a constant which is checked by the receiver. + This also is extremely weak. Third, the exclusive-OR of the data + may be sent. XOR takes the minimal amount of computer time to + generate and check, but is not a good checksum. A two's + complement sum of the data is somewhat better and takes no more + + - 10 - + + + +Braden, Borman, & Partridge [Page 22] + +RFC 1071 Computing the Internet Checksum September 1988 + + + Internet Experiment Note 45 5 June 1978 + TCP Checksum Function Design William W. Plummer + + computer time to compute. Fifth, is the one's complement sum + which is what is currently used by TCP. It is slightly more + expensive in terms of computer time. The next step is a product + code. The product code is strongly related to one's complement + sum, takes still more computer time to use, provides a bit more + protection against common hardware failures, but has some + objectionable properties. Next is a genuine CRC polynomial code, + used for checking purposes only. This is very expensive for a + program to implement. Finally, a full CRC error correcting and + detecting scheme may be used. + + For TCP and Internet applications the product code scheme is + viable. It suffers mainly in that messages must be (at least + partially) decoded by intermediate gateways in order that they + can be forwarded. Should product codes not be chosen as an + improved checksum, some slight modification to the existing + scheme might be possible. For instance the "add and rotate" + function used for paper tape by the PDP-6/10 group at the + Artificial Intelligence Laboratory at M.I.T. Project MAC [12] + could be useful if it can be proved that it is better than the + current scheme and that it can be computed efficiently on a + variety of machines. + + + + + + + + + + + + + + + + + + + + + + - 11 - + + + + + + + +Braden, Borman, & Partridge [Page 23] + +RFC 1071 Computing the Internet Checksum September 1988 + + + Internet Experiment Note 45 5 June 1978 + TCP Checksum Function Design William W. Plummer + + References + + [1] Cerf, V.G. and Kahn, Robert E., "A Protocol for Packet Network + Communications," IEEE Transactions on Communications, vol. + COM-22, No. 5, May 1974. + + [2] Kahn, Robert E., "The Organization of Computer Resources into + a Packet Radio Network", IEEE Transactions on Communications, + vol. COM-25, no. 1, pp. 169-178, January 1977. + + [3] Jacobs, Irwin, et al., "CPODA - A Demand Assignment Protocol + for SatNet", Fifth Data Communications Symposium, September + 27-9, 1977, Snowbird, Utah + + [4] Bolt Beranek and Newman, Inc. "Specifications for the + Interconnection of a Host and an IMP", Report 1822, January + 1976 edition. + + [5] Dean, Richard A., "Elements of Abstract Algebra", John Wyley + and Sons, Inc., 1966 + + [6] Peterson, W. Wesley, "Error Correcting Codes", M.I.T. Press + Cambridge MA, 4th edition, 1968. + + [7] Avizienis, Algirdas, "A Study of the Effectiveness of Fault- + Detecting Codes for Binary Arithmetic", Jet Propulsion + Laboratory Technical Report No. 32-711, September 1, 1965. + + [8] Kirstein, Peter, private communication + + [9] Cerf, V. G. and Postel, Jonathan B., "Specification of + Internetwork Transmission Control Program Version 3", + University of Southern California Information Sciences + Institute, January 1978. + + [10] Digital Equipment Corporation, "PDP-10 Reference Handbook", + 1970, pp. 114-5. + + [11] Swanson, Robert, "Understanding Cyclic Redundancy Codes", + Computer Design, November, 1975, pp. 93-99. + + [12] Clements, Robert C., private communication. + + [13] Conklin, Peter F., and Rodgers, David P., "Advanced + Minicomputer Designed by Team Evaluation of Hardware/Software + Tradeoffs", Computer Design, April 1978, pp. 136-7. + + - 12 - + + + +Braden, Borman, & Partridge [Page 24] + |