diff options
Diffstat (limited to 'doc/rfc/rfc4963.txt')
-rw-r--r-- | doc/rfc/rfc4963.txt | 563 |
1 files changed, 563 insertions, 0 deletions
diff --git a/doc/rfc/rfc4963.txt b/doc/rfc/rfc4963.txt new file mode 100644 index 0000000..f02e72d --- /dev/null +++ b/doc/rfc/rfc4963.txt @@ -0,0 +1,563 @@ + + + + + + +Network Working Group J. Heffner +Request for Comments: 4963 M. Mathis +Category: Informational B. Chandler + PSC + July 2007 + + + IPv4 Reassembly Errors at High Data Rates + +Status of This Memo + + This memo provides information for the Internet community. It does + not specify an Internet standard of any kind. Distribution of this + memo is unlimited. + +Copyright Notice + + Copyright (C) The IETF Trust (2007). + +Abstract + + IPv4 fragmentation is not sufficiently robust for use under some + conditions in today's Internet. At high data rates, the 16-bit IP + identification field is not large enough to prevent frequent + incorrectly assembled IP fragments, and the TCP and UDP checksums are + insufficient to prevent the resulting corrupted datagrams from being + delivered to higher protocol layers. This note describes some easily + reproduced experiments demonstrating the problem, and discusses some + of the operational implications of these observations. + + + + + + + + + + + + + + + + + + + + + + +Heffner, et al. Informational [Page 1] + +RFC 4963 IPv4 Reassembly Errors at High Data Rates July 2007 + + +1. Introduction + + The IPv4 header was designed at a time when data rates were several + orders of magnitude lower than those achievable today. This document + describes a consequent scale-related failure in the IP identification + (ID) field, where fragments may be incorrectly assembled at a rate + high enough that it is likely to invalidate assumptions about data + integrity failure rates. + + That IP fragmentation results in inefficient use of the network has + been well documented [Kent87]. This note presents a different kind + of problem, which can result not only in significant performance + degradation, but also frequent data corruption. This is especially + pertinent due to the recent proliferation of UDP bulk transport tools + that sometimes fragment every datagram. + + Additionally, there is some network equipment that ignores the Don't + Fragment (DF) bit in the IP header to work around MTU discovery + problems [RFC2923]. This equipment indirectly exposes properly + implemented protocols and applications to corrupt data. + +2. Wrapping the IP ID Field + + The Internet Protocol standard [RFC0791] specifies: + + "The choice of the Identifier for a datagram is based on the need + to provide a way to uniquely identify the fragments of a + particular datagram. The protocol module assembling fragments + judges fragments to belong to the same datagram if they have the + same source, destination, protocol, and Identifier. Thus, the + sender must choose the Identifier to be unique for this source, + destination pair and protocol for the time the datagram (or any + fragment of it) could be alive in the Internet." + + Strict conformance to this standard limits transmissions in one + direction between any address pair to no more than 65536 packets per + protocol (e.g., TCP, UDP, or ICMP) per maximum packet lifetime. + + Clearly, not all hosts follow this standard because it implies an + unreasonably low maximum data rate. For example, a host sending + 1500-byte packets with a 30-second maximum packet lifetime could send + at only about 26 Mbps before exceeding 65535 packets per packet + lifetime. Or, filling a 1 Gbps interface with 1500-byte packets + requires sending 65536 packets in less than 1 second, an unreasonably + short maximum packet lifetime, being less than the round-trip time on + some paths. This requirement is widely ignored. + + + + + +Heffner, et al. Informational [Page 2] + +RFC 4963 IPv4 Reassembly Errors at High Data Rates July 2007 + + + Additionally, it is worth noting that reusing values in the IP ID + field once per 65536 datagrams is the best case. Some + implementations randomize the IP ID to prevent leaking information + out of the kernel [Bellovin02], which causes reuse of the IP ID field + to occur probabilistically at all sending rates. + + IP receivers store fragments in a reassembly buffer until all + fragments in a datagram arrive, or until the reassembly timeout + expires (15 seconds is suggested in [RFC0791]). Fragments in a + datagram are associated with each other by their protocol number, the + value in their ID field, and by the source/destination address pair. + If a sender wraps the ID field in less than the reassembly timeout, + it becomes possible for fragments from different datagrams to be + incorrectly spliced together ("mis-associated"), and delivered to the + upper layer protocol. + + A case of particular concern is when mis-association is self- + propagating. This occurs, for example, when there is reliable + ordering of packets and the first fragment of a datagram is lost in + the network. The rest of the fragments are stored in the fragment + reassembly buffer, and when the sender wraps the ID field, the first + fragment of the new datagram will be mis-associated with the rest of + the old datagram. The new datagram will be now be incomplete (since + it is missing its first fragment), so the rest of it will be saved in + the fragment reassembly buffer, forming a cycle that repeats every + 65536 datagrams. It is possible to have a number of simultaneous + cycles, bounded by the size of the fragment reassembly buffer. + + IPv6 is considerably less vulnerable to this type of problem, since + its fragment header contains a 32-bit identification field [RFC2460]. + Mis-association will only be a problem at packet rates 65536 times + higher than for IPv4. + +3. Effects of Mis-Associated Fragments + + When the mis-associated fragments are delivered, transport-layer + checksumming should detect these datagrams as incorrect and discard + them. When the datagrams are discarded, it could create a + performance problem for loss-feedback congestion control algorithms, + particularly when a large congestion window is required, since it + will introduce a certain amount of non-congestive loss. + + Transport checksums, however, may not be designed to handle such high + error rates. The TCP/UDP checksum is only 16 bits in length. If + these checksums follow a uniform random distribution, we expect mis- + associated datagrams to be accepted by the checksum at a rate of one + per 65536. With only one mis-association cycle, we expect corrupt + data delivered to the application layer once per 2^32 datagrams. + + + +Heffner, et al. Informational [Page 3] + +RFC 4963 IPv4 Reassembly Errors at High Data Rates July 2007 + + + This number can be significantly higher with multiple concurrent + cycles. + + With non-random data, the TCP/UDP checksum may be even weaker still. + It is possible to construct datasets where mis-associated fragments + will always have the same checksum. Such a case may be considered + unlikely, but is worth considering. "Real" data may be more likely + than random data to cause checksum hot spots and increase the + probability of false checksum match [Stone98]. Also, some + applications or higher-level protocols may turn off checksumming to + increase speed, though this practice has been found to be dangerous + for other reasons when data reliability is important [Stone00]. + +4. Experimental Observations + + To test the practical impact of fragmentation on UDP, we ran a series + of experiments using a UDP bulk data transport protocol that was + designed to be used as an alternative to TCP for transporting large + data sets over specialized networks. The tool, Reliable Blast UDP + (RBUDP), part of the QUANTA networking toolkit [QUANTA], was selected + because it has a clean interface which facilitated automated + experiments. The decision to use RBUDP had little to do with the + details of the transport protocol itself. Any UDP transport protocol + that does not have additional means to detect corruption, and that + could be configured to use IP fragmentation, would have the same + results. + + In order to diagnose corruption on files transferred with the UDP + bulk transfer tool, we used a file format that included embedded + sequence numbers and MD5 checksums in each fragment of each datagram. + Thus, it was possible to distinguish random corruption from that + caused by mis-associated fragments. We used two different types of + files. One was constructed so that all the UDP checksums were + constant -- we will call this the "constant" dataset. The other was + constructed so that UDP checksums were uniformly random -- the + "random" dataset. All tests were done using 400 MB files, sent in + 1524-byte datagrams so that they were fragmented on standard Fast + Ethernet with a 1500-byte MTU. + + The UDP bulk file transport tool was used to send the datasets + between a pair of hosts at slightly less than the available data rate + (100 Mbps). Near the beginning of each flow, a brief secondary flow + was started to induce packet loss in the primary flow. Throughout + the life of the primary flow, we typically observed mis-association + rates on the order of a few hundredths of a percent. + + + + + + +Heffner, et al. Informational [Page 4] + +RFC 4963 IPv4 Reassembly Errors at High Data Rates July 2007 + + + Tests run with the "constant" dataset resulted in corruption on all + mis-associated fragments, that is, corruption on the order of a few + hundredths of a percent. In sending approximately 10 TB of "random" + datasets, we observed 8847668 UDP checksum errors and 121 corruptions + of the data due to mis-associated fragments. + +5. Preventing Mis-Association + + The most straightforward way to avoid mis-association is to avoid + fragmentation altogether by implementing Path MTU Discovery [RFC1191] + [RFC4821]. However, this is not always feasible for all + applications. Further, as a work-around for MTU discovery problems + [RFC2923], some TCP implementations and communications gear provide + mechanisms to disable path MTU discovery by clearing or ignoring the + DF bit. Doing so will expose all protocols using IPv4, even those + that participate in MTU discovery, to mis-association errors. + + If IP fragmentation is in use, it may be possible to reduce the + timeout sufficiently so that mis-association will not occur. + However, there are a number of difficulties with such an approach. + Since the sender controls the rate of packets sent and the selection + of IP ID, while the receiver controls the reassembly timeout, there + would need to be some mutual assurance between each party as to + participation in the scheme. Further, it is not generally possible + to set the timeout low enough so that a fast sender's fragments will + not be mis-associated, yet high enough so that a slow sender's + fragments will not be unconditionally discarded before it is possible + to reassemble them. Therefore, the timeout and IP ID selection would + need to be done on a per-peer basis. Also, it is likely NAT will + break any per-peer tables keyed by IP address. It is not within the + scope of this document to recommend solutions to these problems, + though we believe a per-peer adaptive timeout is likely to prevent + mis-association under circumstances where it would most commonly + occur. + + A case particularly worth noting is that of tunnels encapsulating + payload in IPv4. To deal with difficulties in MTU Discovery + [RFC4459], tunnels may rely on fragmentation between the two + endpoints, even if the payload is marked with a DF bit [RFC4301]. In + such a mode, the two tunnel endpoints behave as IP end hosts, with + all tunneled traffic having the same protocol type. Thus, the + aggregate rate of tunneled packets may not exceed 65536 per maximum + packet lifetime, or tunneled data becomes exposed to possible mis- + association. Even protocols doing MTU discovery such as TCP will be + affected. Operators of tunnels should ensure that the receiving + end's reassembly timeout is short enough that mis-association cannot + occur given the tunnel's maximum rate. + + + + +Heffner, et al. Informational [Page 5] + +RFC 4963 IPv4 Reassembly Errors at High Data Rates July 2007 + + +6. Mitigating Mis-Association + + It is difficult to concisely describe all possible situations under + which fragments might be mis-associated. Even if an end host + carefully follows the specification, ensuring unique IP IDs, the + presence of NATs or tunnels may expose applications to IP ID space + conflicts. Further, devices in the network that the end hosts cannot + see or control, such as tunnels, may cause mis-association. Even a + fragmenting application that sends at a low rate might possibly be + exposed when running simultaneously with a non-fragmenting + application that sends at a high rate. As described above, the + receiver might implement to reduce or eliminate the possibility of + conflict, but there is no mechanism in place for a sender to know + what the receiver is doing in this respect. As a consequence, there + is no general mechanism for an application that is using IPv4 + fragmentation to know if it is deterministically or statistically + protected from mis-associated fragments. + + Under circumstances when it is impossible or impractical to prevent + mis-association, its effects may be mitigated by use of stronger + integrity checking at any layer above IP. This is a natural side + effect of using cryptographic authentication. For example, IPsec AH + [RFC4302] will discard any corrupted datagrams, preventing their + deliver to upper layers. A stronger transport layer checksum such as + SCTP's, which is 32 bits in length [RFC2960], may help significantly. + At the application layer, SSH message authentication codes [RFC4251] + will prevent delivery of corrupted data, though since the TCP + connection underneath is not protected, it is considered invalid and + the session is immediately terminated. While stronger integrity + checking may prevent data corruption, it will not prevent the + potential performance impact described above of non-congestive loss + on congestion control at high congestion windows. + + It should also be noted that mis-association is not the only possible + source of data corruption above the network layer [Stone00]. Most + applications for which data integrity is critically important should + implement strong integrity checking regardless of exposure to mis- + association. + + In general, applications that rely on IPv4 fragmentation should be + written with these issues in mind, as well as those issues documented + in [Kent87]. Applications that rely on IPv4 fragmentation while + sending at high speeds (the order of 100 Mbps or higher) and devices + that deliberately introduce fragmentation to otherwise unfragmented + traffic (e.g., tunnels) should be particularly cautious, and + introduce strong mechanisms to ensure data integrity. + + + + + +Heffner, et al. Informational [Page 6] + +RFC 4963 IPv4 Reassembly Errors at High Data Rates July 2007 + + +7. Security Considerations + + If a malicious entity knows that a pair of hosts are communicating + using a fragmented stream, it may be presented with an opportunity to + corrupt the flow. By sending "high" fragments (those with offset + greater than zero) with a forged source address, the attacker can + deliberately cause corruption as described above. Exploiting this + vulnerability requires only knowledge of the source and destination + addresses of the flow, its protocol number, and fragment boundaries. + It does not require knowledge of port or sequence numbers. + + If the attacker has visibility of packets on the path, the attack + profile is similar to injecting full segments. Using this attack + makes blind disruptions easier and might possibly be used to cause + degradation of service. We believe only streams using IPv4 + fragmentation are likely vulnerable. Because of the nature of the + problems outlined in this document, the use of IPv4 fragmentation for + critical applications may not be advisable, regardless of security + concerns. + +8. Informative References + + [Kent87] Kent, C. and J. Mogul, "Fragmentation considered + harmful", Proc. SIGCOMM '87 vol. 17, No. 5, October + 1987. + + [RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery", RFC + 2923, September 2000. + + [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, + September 1981. + + [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC + 1191, November 1990. + + [Stone98] Stone, J., Greenwald, M., Partridge, C., and J. Hughes, + "Performance of Checksums and CRC's over Real Data", + IEEE/ ACM Transactions on Networking vol. 6, No. 5, + October 1998. + + [Stone00] Stone, J. and C. Partridge, "When The CRC and TCP + Checksum Disagree", Proc. SIGCOMM 2000 vol. 30, No. 4, + October 2000. + + + + + + + + +Heffner, et al. Informational [Page 7] + +RFC 4963 IPv4 Reassembly Errors at High Data Rates July 2007 + + + [QUANTA] He, E., Alimohideen, J., Eliason, J., Krishnaprasad, N., + Leigh, J., Yu, O., and T. DeFanti, "Quanta: a toolkit + for high performance data delivery over photonic + networks", Future Generation Computer Systems Vol. 19, + No. 6, August 2003. + + [Bellovin02] Bellovin, S., "A Technique for Counting NATted Hosts", + Internet Measurement Conference, Proceedings of the 2nd + ACM SIGCOMM Workshop on Internet Measurement, November + 2002. + + [RFC2460] Deering, S. and R. Hinden, "Internet Protocol, Version 6 + (IPv6) Specification", RFC 2460, December 1998. + + [RFC2960] Stewart, R., Xie, Q., Morneault, K., Sharp, C., + Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M., + Zhang, L., and V. Paxson, "Stream Control Transmission + Protocol", RFC 2960, October 2000. + + [RFC4251] Ylonen, T. and C. Lonvick, "The Secure Shell (SSH) + Protocol Architecture", RFC 4251, January 2006. + + [RFC4301] Kent, S. and K. Seo, "Security Architecture for the + Internet Protocol", RFC 4301, December 2005. + + [RFC4302] Kent, S., "IP Authentication Header", RFC 4302, December + 2005. + + [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the- + Network Tunneling", RFC 4459, April 2006. + + [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU + Discovery", RFC 4821, March 2007. + + + + + + + + + + + + + + + + + + +Heffner, et al. Informational [Page 8] + +RFC 4963 IPv4 Reassembly Errors at High Data Rates July 2007 + + +Appendix A. Acknowledgements + + This work was supported by the National Science Foundation under + Grant No. 0083285. + +Authors' Addresses + + John W. Heffner + Pittsburgh Supercomputing Center + 4400 Fifth Avenue + Pittsburgh, PA 15213 + US + + Phone: 412-268-2329 + EMail: jheffner@psc.edu + + + Matt Mathis + Pittsburgh Supercomputing Center + 4400 Fifth Avenue + Pittsburgh, PA 15213 + US + + Phone: 412-268-3319 + EMail: mathis@psc.edu + + + Ben Chandler + Pittsburgh Supercomputing Center + 4400 Fifth Avenue + Pittsburgh, PA 15213 + US + + Phone: 412-268-9783 + EMail: bchandle@gmail.com + + + + + + + + + + + + + + + + +Heffner, et al. Informational [Page 9] + +RFC 4963 IPv4 Reassembly Errors at High Data Rates July 2007 + + +Full Copyright Statement + + Copyright (C) The IETF Trust (2007). + + This document is subject to the rights, licenses and restrictions + contained in BCP 78, and except as set forth therein, the authors + retain all their rights. + + This document and the information contained herein are provided on an + "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS + OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND + THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS + OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF + THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED + WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. + +Intellectual Property + + The IETF takes no position regarding the validity or scope of any + Intellectual Property Rights or other rights that might be claimed to + pertain to the implementation or use of the technology described in + this document or the extent to which any license under such rights + might or might not be available; nor does it represent that it has + made any independent effort to identify any such rights. Information + on the procedures with respect to rights in RFC documents can be + found in BCP 78 and BCP 79. + + Copies of IPR disclosures made to the IETF Secretariat and any + assurances of licenses to be made available, or the result of an + attempt made to obtain a general license or permission for the use of + such proprietary rights by implementers or users of this + specification can be obtained from the IETF on-line IPR repository at + http://www.ietf.org/ipr. + + The IETF invites any interested party to bring to its attention any + copyrights, patents or patent applications, or other proprietary + rights that may cover technology that may be required to implement + this standard. Please address the information to the IETF at + ietf-ipr@ietf.org. + +Acknowledgement + + Funding for the RFC Editor function is currently provided by the + Internet Society. + + + + + + + +Heffner, et al. Informational [Page 10] + |