diff options
Diffstat (limited to 'doc/rfc/rfc1810.txt')
-rw-r--r-- | doc/rfc/rfc1810.txt | 395 |
1 files changed, 395 insertions, 0 deletions
diff --git a/doc/rfc/rfc1810.txt b/doc/rfc/rfc1810.txt new file mode 100644 index 0000000..77817f4 --- /dev/null +++ b/doc/rfc/rfc1810.txt @@ -0,0 +1,395 @@ + + + + + + +Network Working Group J. Touch +Request for Comments: 1810 ISI +Category: Informational June 1995 + + + Report on MD5 Performance + +Status of this Memo + + This memo provides information for the Internet community. This memo + does not specify an Internet standard of any kind. Distribution of + this memo is unlimited. + +Abstract + + MD5 is an authentication algorithm, which has been proposed as the + default authentication option in IPv6. When enabled, the MD5 + algorithm operates over the entire data packet, including header. + This RFC addresses how fast MD5 can be implemented in software and + hardware, and whether it supports currently available IP bandwidth. + MD5 can be implemented in existing hardware technology at 256 Mbps, + and in software at 87 Mbps. These rates cannot support current IP + rates, e.g., 100 Mbps TCP and 130 Mbps UDP over ATM. If MD5 cannot + support existing network bandwidth using existing technology, it will + not scale as network speeds increase in the future. This RFC is + intended to alert the IP community about the performance limitations + of MD5, and to suggest that alternatives be considered for use in + high speed IP implementations. + +Introduction + + MD5 is an authentication algorithm, which has been proposed as one + authentication option in IPv6 [1]. RFC 1321 describes the MD5 + algorithm and gives a reference implementation [3]. When enabled, + the MD5 algorithm operates over the entire data packet, including + header (with dummy values for volatile fields). This RFC addresses + how fast MD5 can be implemented in software and hardware, and whether + it supports currently available IP bandwidth. + + This RFC considers the general issue of checksumming and security at + high speed in IPv6. IPv6 has no header checksum (which IPv4 has + [5]), but proposes an authentication digest over the entire body of + the packet (including header where volatile fields are zeroed) [1]. + This RFC specifically addresses the performance of that + authentication mechanism. + + + + + + +Touch Informational [Page 1] + +RFC 1810 Report on MD5 Performance June 1995 + + +Measurements + + The performance of MD5 was measured. The code was an optimized + version of the MD5 reference implementation from the RFC [3], and is + available for anonymous FTP [7]. The following are the results of + the performance test "md5 -t", modified to prohibit on-chip caching + of the data block: + + 87 Mbps DEC Alpha (190 Mhz) + 33 Mbps HP 9000/720 + 48 Mbps IBM RS/6000 7006 (PPC 601 @80 Mhz) + 31 Mbps Intel i486/66 NetBSD + 44 Mbps Intel Pentium/90 NeXTStep + 52 Mbps SGI/IP-20 IRIX 5.2 + 37 Mbps Sun SPARC-10/51, SPARC-20/50 SunOS 4.1.3 + 57 Mbps Sun SPARC-20/71 SunOS 4.1.3 + + These rates do not keep up with currently available IP bandwidth, + e.g., 100 Mbps TCP and 130 Mbps UDP over a Fore SBA-200 ATM host + interface in a Sun SPARC-20/71. + + Values as high as 100 Mbps have been reported for the DEC Alpha (190 + Mhz). These values reflect on-chip caching of the data. It is not + clear at this time whether in-memory, off-chip cache, or on-chip + cache performance measures are more relevant to IP performance. + +Analysis of the MD5 Algorithm + + The MD5 algorithm is a block-chained hashing algorithm. The first + block is hashed with an initial seed, resulting in a hash. The hash + is summed with the seed, and that result becomes the seed for the + next block. When the last block is computed, it's "next-seed' value + becomes the hash for the entire stream. Thus, the seed for block + depends on both the hash and the seed of its preceding block. As a + result, blocks cannot be hashed in parallel. + + Each 16-word (64-byte) block is hashed via 64 basic steps, using a + 4-word intermediate hash, and collapsing the intermediate hash at the + end. The 64 steps are 16 groups of 4 steps, one step per + intermediate hash word. This RFC uses the following notation (as + from RFC-1321 [3]): + + A,B,C,D intermediate hash words + X[i] input data block + T[i] sine table lookup + << i rotate i bits + F logical functions of 3 args + + + + +Touch Informational [Page 2] + +RFC 1810 Report on MD5 Performance June 1995 + + + The subscripts to X, I, and << are fixed for each step, and are + omitted here. There are four different logical functions, also + omitted. Each 4-step group looks like: + + A = B + ((A + F(B,C,D) + X[i] + T[i]) << i) + D = A + ((D + F(A,B,C) + X[i] + T[i]) << i) + C = D + ((C + F(D,A,B) + X[i] + T[i]) << i) + B = C + ((B + F(C,D,A) + X[i] + T[i]) << i) + + Note that this has the general form shown below. Due to the + complexity of the function 'f', these equations cannot be transformed + into a less serial set. + + A = f(D); B = f(A); C = f(B); D = f(C) + + Each steps is composed of two table lookups, one rotation, a 3- + component logical operation, and 4 additions. The best + parallelization possible leaves F(x,y,z) to the last step, waiting as + long as possible for the result from the previous step. The + resulting tree is shown below. + + (t0) B* C C D X T + | | | | | | + | | | | | | + \/ \/ \ / + t1 op op A + X T + \ / \ / | | + \ / \ / | | + \/ \/ \ / + t2 op + (t0) B* C C D A + + \ / | | | | \ / + \ / \ | | / \ / + \ / \\// \/ + t3 + t1 op + + | \ / + | \ / + | \ / + t4 << B* t2 + B* + \ / \ / + \ / << / + \ / \ / + t5 + t3 + + | | + | | + | | + A** A** + + Binary operation tree Optimized hardware tree + + + +Touch Informational [Page 3] + +RFC 1810 Report on MD5 Performance June 1995 + + + This diagram assumes that each operation takes one unit time. The + tree shows the items that depend on the previous step as B*, and the + item that the next step depends on as A**. Sequences of the binary + operation tree cannot be overlapped, but the optimized hardware tree + can (by one time step). + + There are 4 steps processed per word of input, ignoring inter-block + processing. The speed of the overall algorithm depends on how fast + we can process these 4 steps, vs. the bandwidth of one word of input + being processed. + + The binary tree takes 5 time units per step of the algorithm, and + permits at best 3-way parallelism (at time t1). In software, this + means it takes 5 * 4 = 20 instructions per word input. A computer + capable of M MIPS can support a data bandwidth of M/20 * 32 Mbps, + i.e., bits per second equal to 1.6x its MIPS rate. Therefore, a 100 + MIPS machine can support a 160 Mbps stream. + + Parallel software rate in Mbps = 1.6 * MIPS rate + + This assumes that register reads and writes are overlapped with + computation entirely. Without any parallelism, there are 8 + operations per step, and 4 steps per word, so 32 operations per word, + i.e., the data rate in Mbps would be identical to the MIPS rate: + + Serial software rate in Mbps = MIPS rate + + Predictions using SpecInt92 numbers as MIPS estimators can be + compared with measured rates [2]: + + Spec- Predicted MD5 + Int92 Upper-Bound Measured Machine + ------------------------------------------------------------ + 122 122-195 87 Mbps DEC Alpha (190 Mhz) + 48 48- 77 33 Mbps HP 9000/720 + 88 88-141 48 Mbps IBM RS/6000 7006 (PPC 601 @80 Mhz) + 32 32- 51 31 Mbps Intel i486/33 NetBSD + 90 90-144 44 Mbps Intel Pentium/90 NeXTStep + 90 90-144 52 Mbps SGI/IP-20 IRIX 5.2 + 65 65-104 37 Mbps Sun SPARC-10/51 SunOS 4.1.3 + 126 126-202 57 Mbps Sun SPARC-20/71 SunOS 4.1.3 + + The hardware rate takes 3 time units per step, i.e. 3 * 4 = 12 time + units per word of input. Hardware capable of doing an operation + (e.g., 32-bit addition) in N nanoseconds can support a data bandwidth + of 32/12/N bps, i.e., 2/3N bps. + + Hardware rate in Mbps = 8/3N * 1,000 + + + +Touch Informational [Page 4] + +RFC 1810 Report on MD5 Performance June 1995 + + + For CMOS, an operation (32-bit addition, including register retrieval + and storage) costs about 5.2 ns (2.6 ns per add, 2 ns for latching) + [6]. There are 6 clocks through the most highly-parallelized + implementation, resulting in 31.2 ns per 32-bit word, or 256 Mbps + [6]. This will not keep pace with existing hardware, which is + capable of link speeds in excess of 622 Mbps (ATM). + + By comparison, IPv4 uses the Internet Checksum [5]. This checksum + can be performed in 32-bit-wide units in excess of 1 Gbps in an + existing, low-cost PLD. The checksum can also be parallelized by + computing partial sums and reducing the result. + +One Proposed Solution + + There are several ways to increase the performance of the IPv6 + authentication mechanism. One is to increase the hardware + performance of MD5 by slightly modifying the algorithm, the other is + to propose a replacement algorithm. This RFC discusses briefly the + modification of MD5 for high-speed hardware implementation. + Alternate algorithms, capable of 3.5x the speed of MD5, have been + discussed elsewhere [6]. + + MD5 uses block chaining to ensure sensitivity to block order. Block + chaining also prevents arbitrary parallelism, which can be as much a + benefit to the spoofer as to the user. MD5 can be slightly altered + to accommodate a higher bandwidth data rate. There should be a + predetermined finite number of blocks processed from independent + seeds, such that the I-th block is part of the "I mod K"-th chain. + The resulting K digests themselves form a message, which can be MD5- + encoded using a single-block algorithm. This idea was proposed + independently by the author and by Burt Kaliski of RSA. + + The goal is to support finite parallelism to provide adequate + bandwidth at current processing rates, without providing arbitrary + power for spoofing. It would require further analysis to ensure that + it provides an adequate level of security. + + For current technology and network bandwidth, a minimum of 4-way + parallel chaining would suffice, and 16-way chaining would be + preferable. This would support network bandwidth of 1 Gbps with 4- + way chaining, in CMOS hardware. The chaining parallelism should be a + multiple of 4-way, to generate a complete block of digests (4 words + per digest, 16 words per block). This modification is believed to + achieve the goals of MD5, without the penalties of implementation of + the current MD5 algorithm. + + + + + + +Touch Informational [Page 5] + +RFC 1810 Report on MD5 Performance June 1995 + + +Security Considerations + + This entire document addresses a mechanism for providing security in + IPv6. MD5 is the proposed default optional authentication mechanism + for IPv6 traffic. This RFC specifically addresses the concern that + security mechanisms such as MD5 that cannot support high bandwidth + with available hardware will compromise their deployment, and + ultimately, the security of the systems they are intended to + maintain. + + The IPv6 requirements document emphasizes that IPv6 implementations + should not compromise performance, compared to IPv4. This is + presumably despite IPv6's increased functionality. "Required + optional" components of IPv6 should be held to this same standard. + MD5 compromises performance, and so its use as a required default + option in IPv6 should be reconsidered. + + The use of MD5 as the default to the required authentication option + may compromise security in high-bandwidth systems, because enabling + the option causes performance degradation, defeating its inclusion as + an IPv6 option. As a result, the authentication option may be + disabled entirely. + + It is important to the use of authentication in high-performance + systems that an alternative mechanism be available in IPv6 from the + outset. This may require the specification of multiple "required" + authentication algorithms - one that's slower but believed strong, + and one that's faster but may inspire somewhat less confidence. + +Conclusions + + MD5 cannot be implemented in existing technology at rates in excess + of 256 Mbps in hardware, or 86 Mbps in software. MD5 is a proposed + authentication option in IPv6, a protocol that should support + existing networking technology, which is capable of 130 Mbps UDP. + + As a result, MD5 cannot be used to support IP authentication in + existing networks at existing rates. Although MD5 will support + higher bandwidth in the future due to technological advances, these + will be offset by similar advances in networking. If MD5 cannot + support existing network bandwidth using existing technology, it will + not be able to scale as network speeds increase in the future. This + RFC proposes that MD5 be modified to support a 16-way block chaining, + in order to allow existing technology (CMOS hardware) to support + existing networking rates (1 Gbps). It further proposes that + alternatives to MD5 be considered for use in high-speed networks. + + + + + +Touch Informational [Page 6] + +RFC 1810 Report on MD5 Performance June 1995 + + +Acknowledgements + + The author would like to thank Steve Kent at BBN, Burt Kaliski, + Victor Chang, and Steve Burnett at RSA, Ran Atkinson at the NRL, and + the HPCC Division at ISI for reviewing the contents of this document. + Burt independently suggested the block-parallel modification proposed + here. + +References + + [1] Atkinson, R., "IPv6 Authentication Header", Work in Progress, + Naval Research Lab, February 1995. + + [2] DiMarco, J., "Spec Benchmark table, V. 4.12", + <ftp://ftp.cfd.toronto.edu/pub/spectable>. + + [3] Rivest, R., "The MD5 Message-Digest Algorithm", RFC1321, MIT LCS + & RSA Data Security, Inc., April 1992. + + [4] Partridge, C., and F. Kastenholz, "Technical Criteria for + Choosing IP The Next Generation (IPng)", RFC 1726, BBN Systems + and Technologies, FTP Software, December 1994. + + [5] Postel, J., "Internet Protocol - DARPA Internet Program Protocol + Specification," STD 5, RFC 791, USC/Information Sciences + Institute, September 1981. + + [6] Touch, J., "Performance Analysis fo MD5," to appear in ACM + Sigcomm '95, Boston. + + [7] Touch, J., Optimized MD5 software, <ftp://ftp.isi.edu/pub/hpcc- + papers/touch/md5-opt.tar>. + +Author's Address + + Joe Touch + Information Sciences Institute + University of Southern California + 4676 Admiralty Way + Marina del Rey, CA 90292-6695 + USA + + Phone: +1 310-822-1511 x151 + Fax: +1 310-823-6714 + URL: ftp://ftp.isi.edu/pub/hpcc-papers/touch + EMail: touch@isi.edu + + + + + +Touch Informational [Page 7] + |