Diffstat (limited to 'doc/rfc/rfc2140.txt')
-rw-r--r-- | doc/rfc/rfc2140.txt | 619 |
1 files changed, 619 insertions, 0 deletions
diff --git a/doc/rfc/rfc2140.txt b/doc/rfc/rfc2140.txt
new file mode 100644
index 0000000..200671c
--- /dev/null
+++ b/doc/rfc/rfc2140.txt
@@ -0,0 +1,619 @@
+
+
+
+
+
+
+Network Working Group                                           J. Touch
+Request for Comments: 2140                                           ISI
+Category: Informational                                       April 1997
+
+
+                   TCP Control Block Interdependence
+
+Status of this Memo
+
+   This memo provides information for the Internet community. This memo
+   does not specify an Internet standard of any kind. Distribution of
+   this memo is unlimited.
+
+
+Abstract
+
+   This memo makes the case for interdependent TCP control blocks, where
+   part of the TCP state is shared among similar concurrent connections,
+   or across similar connection instances. TCP state includes a
+   combination of parameters, such as connection state, current round-
+   trip time estimates, congestion control information, and process
+   information. This state is currently maintained on a per-connection
+   basis in the TCP control block, but should be shared across
+   connections to the same host. The goal is to improve transient
+   transport performance, while maintaining backward-compatibility with
+   existing implementations.
+
+   This document is a product of the LSAM project at ISI.
+
+
+Introduction
+
+   TCP is a connection-oriented reliable transport protocol layered over
+   IP [9]. Each TCP connection maintains state, usually in a data
+   structure called the TCP Control Block (TCB). The TCB contains
+   information about the connection state, its associated local process,
+   and feedback parameters about the connection's transmission
+   properties. As originally specified and usually implemented, the TCB
+   is maintained on a per-connection basis. This document discusses the
+   implications of that decision, and argues for an alternate
+   implementation that shares some of this state across similar
+   connection instances and among similar simultaneous connections. The
+   resulting implementation can have better transient performance,
+   especially for numerous short-lived and simultaneous connections, as
+   often used in the World-Wide Web [1]. These changes affect only the
+   TCB initialization, and so have no effect on the long-term behavior
+   of TCP after a connection has been established.
+
+
+
+
+Touch                        Informational                      [Page 1]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+The TCP Control Block (TCB)
+
+   A TCB is associated with each connection, i.e., with each association
+   of a pair of applications across the network. The TCB can be
+   summarized as containing [9]:
+
+
+      Local process state
+
+          pointers to send and receive buffers
+          pointers to retransmission queue and current segment
+          pointers to Internet Protocol (IP) PCB
+
+      Per-connection shared state
+
+          macro-state
+
+              connection state
+              timers
+              flags
+              local and remote host numbers and ports
+
+          micro-state
+
+              send and receive window state (size*, current number)
+              round-trip time and variance
+              cong. window size*
+              cong. window size threshold*
+              max windows seen*
+              MSS#
+              round-trip time and variance#
+
+
+   The per-connection information is shown as split into macro-state and
+   micro-state, terminology borrowed from [5]. Macro-state describes the
+   finite state machine; we include the endpoint numbers and components
+   (timers, flags) used to help maintain that state. This includes the
+   protocol for establishing and maintaining shared state about the
+   connection. Micro-state describes the protocol after a connection has
+   been established, to maintain the reliability and congestion control
+   of the data transferred in the connection.
+
+   We further distinguish two other classes of shared micro-state that
+   are associated more with host-pairs than with application pairs. One
+   class is clearly host-pair dependent (#, e.g., MSS, RTT), and the
+   other is host-pair dependent in its aggregate (*, e.g., cong. window
+   info., curr. window sizes).
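The split described above, between per-connection state and state tied to the host pair, might be pictured with the following minimal sketch. It is an illustration only and not part of the memo or of any TCP implementation: the class names `HostPairState`, `TCB`, `HostPairTable` and the helper `new_tcb` are hypothetical, the table is keyed by remote host alone (the local host is assumed implicit), and most TCB fields are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class HostPairState:
    """Micro-state the memo ties to the host pair rather than the connection."""
    mss: Optional[int] = None            # '#': clearly host-pair dependent
    srtt: Optional[float] = None         # '#': round-trip time
    rttvar: Optional[float] = None       # '#': round-trip time variance
    snd_cwnd: Optional[int] = None       # '*': host-pair dependent in aggregate
    snd_ssthresh: Optional[int] = None   # '*': congestion window threshold


@dataclass
class TCB:
    """Per-connection macro-state plus a reference to the host-pair state."""
    local: Tuple[str, int]               # local host and port
    remote: Tuple[str, int]              # remote host and port
    state: str = "CLOSED"                # connection state (finite state machine)
    shared: HostPairState = field(default_factory=HostPairState)


class HostPairTable:
    """Hypothetical lookup table: one HostPairState per remote host."""

    def __init__(self) -> None:
        self._entries: Dict[str, HostPairState] = {}

    def lookup(self, remote_host: str) -> HostPairState:
        # Create the entry on first use so later connections find it.
        return self._entries.setdefault(remote_host, HostPairState())


def new_tcb(table: HostPairTable, local: Tuple[str, int],
            remote: Tuple[str, int]) -> TCB:
    """Build a TCB whose host-pair state is shared with same-host connections."""
    return TCB(local=local, remote=remote, shared=table.lookup(remote[0]))
```

In this sketch two TCBs opened to the same remote host refer to the same `HostPairState` object, so an MSS recorded through one is visible through the other, while a TCB to a different host gets its own entry. Sharing by reference is only one possible reading; copying values at initialization would fit the memo's description equally well.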
+
+
+
+
+Touch                        Informational                      [Page 2]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+TCB Interdependence
+
+   The observation that some TCB state is host-pair specific rather than
+   application-pair dependent is not new, and is a common engineering
+   decision in layered protocol implementations. A discussion of sharing
+   RTT information among protocols layered over IP, including UDP and
+   TCP, occurred in [8]. T/TCP uses caches to maintain TCB information
+   across instances, e.g., smoothed RTT, RTT variance, congestion
+   avoidance threshold, and MSS [3]. These values are in addition to
+   connection counts used by T/TCP to accelerate data delivery prior to
+   the full three-way handshake during an OPEN. The goal is to aggregate
+   TCB components where they reflect one association - that of the
+   host-pair, rather than artificially separating those components by
+   connection.
+
+   At least one current T/TCP implementation saves the MSS and
+   aggregates the RTT parameters across multiple connections, but omits
+   caching the congestion window information [4], as originally
+   specified in [2]. There may be other values that may be cached, such
+   as current window size, to permit new connections full access to
+   accumulated channel resources.
+
+   We observe that there are two cases of TCB interdependence. Temporal
+   sharing occurs when the TCB of an earlier (now CLOSED) connection to
+   a host is used to initialize some parameters of a new connection to
+   that same host. Ensemble sharing occurs when a currently active
+   connection to a host is used to initialize another (concurrent)
+   connection to that host. T/TCP documents considered the temporal
+   case; we consider both.
+
+An Example of Temporal Sharing
+
+   Temporal sharing of cached TCB data has been implemented in the SunOS
+   4.1.3 T/TCP extensions [4] and the FreeBSD port of same [7]. As
+   mentioned before, only the MSS and RTT parameters are cached, as
+   originally specified in [2]. Later discussion of T/TCP suggested
+   including congestion control parameters in this cache [3].
+
+   The cache is accessed in two ways: it is read to initialize new TCBs,
+   and written when more current per-host state is available. New TCBs
+   are initialized as follows; snd_cwnd reuse is not yet implemented,
+   although discussed in the T/TCP concepts [2]:
+
+
+
+
+
+
+
+
+
+Touch                        Informational                      [Page 3]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+               TEMPORAL SHARING - TCB Initialization
+
+             Cached TCB           New TCB
+             ----------------------------------------
+             old-MSS              old-MSS
+
+             old-RTT              old-RTT
+
+             old-RTTvar           old-RTTvar
+
+             old-snd_cwnd         old-snd_cwnd   (not yet impl.)
+
+
+   Most cached TCB values are updated when a connection closes. An
+   exception is MSS, which is updated whenever the MSS option is
+   received in a TCP header.
+
+
+                TEMPORAL SHARING - Cache Updates
+
+    Cached TCB     Current TCB     when?   New Cached TCB
+    ---------------------------------------------------------------
+    old-MSS        curr-MSS        MSSopt  curr-MSS
+
+    old-RTT        curr-RTT        CLOSE   old += (curr - old) >> 2
+
+    old-RTTvar     curr-RTTvar     CLOSE   old += (curr - old) >> 2
+
+    old-snd_cwnd   curr-snd_cwnd   CLOSE   curr-snd_cwnd (not yet impl.)
+
+   MSS caching is trivial; reported values are cached, and the most
+   recent value is used. The cache is updated when the MSS option is
+   received, so the cache always has the most recent MSS value from any
+   connection. The cache is consulted only at connection establishment,
+   and not otherwise updated, which means that MSS options do not affect
+   current connections. The default MSS is never saved; only reported
+   MSS values update the cache, so an explicit override is required to
+   reduce the MSS.
+
+   RTT values are updated by a more complicated mechanism [3], [8].
+   Dynamic RTT estimation requires a sequence of RTT measurements, even
+   though a single T/TCP transaction may not accumulate enough samples.
+   As a result, the cached RTT (and its variance) is an average of its
+   previous value with the contents of the currently active TCB for that
+   host, when a TCB is closed. RTT values are updated only when a
+   connection is closed. Further, the method for averaging the RTT
+   values is not the same as the method for computing the RTT values
+   within a connection, so that the cached value may not be appropriate.
+
+
+
+Touch                        Informational                      [Page 4]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+   For temporal sharing, the cache requires updating only when a
+   connection closes, because the cached values will not yet be used to
+   initialize a new TCB. For the ensemble sharing, this is not the case,
+   as discussed below.
+
+   Other TCB variables may also be cached between sequential instances,
+   such as the congestion control window information. Old cache values
+   can be overwritten with the current TCB estimates, or a MAX or MIN
+   function can be used to merge the results, depending on the optimism
+   or pessimism of the reused values. For example, the congestion window
+   can be reused if there are no concurrent connections.
+
+An Example of Ensemble Sharing
+
+   Sharing cached TCB data across concurrent connections requires
+   attention to the aggregate nature of some of the shared state.
+   Although MSS and RTT values can be shared by copying, it may not be
+   appropriate to copy congestion window information. At this point, we
+   present only the MSS and RTT rules:
+
+
+               ENSEMBLE SHARING - TCB Initialization
+
+             Cached TCB           New TCB
+             ----------------------------------
+             old-MSS              old-MSS
+
+             old-RTT              old-RTT
+
+             old-RTTvar           old-RTTvar
+
+
+
+                ENSEMBLE SHARING - Cache Updates
+
+    Cached TCB     Current TCB     when?   New Cached TCB
+    -----------------------------------------------------------
+    old-MSS        curr-MSS        MSSopt  curr-MSS
+
+    old-RTT        curr-RTT        update  rtt_update(old,curr)
+
+    old-RTTvar     curr-RTTvar     update  rtt_update(old,curr)
+
+
+   For ensemble sharing, TCB information should be cached as early as
+   possible, sometimes before a connection is closed. Otherwise, opening
+   multiple concurrent connections may not result in TCB data sharing if
+   no connection closes before others open. An optimistic solution would
+
+
+
+Touch                        Informational                      [Page 5]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+   be to update cached data as early as possible, rather than only when
+   a connection is closing. Some T/TCP implementations do this for MSS
+   when the TCP MSS header option is received [4], although it is not
+   addressed specifically in the concepts or functional specification
+   [2][3].
+
+   In current T/TCP, RTT values are updated only after a CLOSE, which
+   does not benefit concurrent sessions. As mentioned in the temporal
+   case, averaging values between concurrent connections requires
+   incorporating new RTT measurements. The amount of work involved in
+   updating the aggregate average should be minimized, but the resulting
+   value should be equivalent to having all values measured within a
+   single connection. The function "rtt_update" in the ensemble sharing
+   table indicates this operation, which occurs whenever the RTT would
+   have been updated in the individual TCP connection. As a result, the
+   cache contains the shared RTT variables, which no longer need to
+   reside in the TCB [8].
+
+   Congestion window size aggregation is more complicated in the
+   concurrent case. When there is an ensemble of connections, we need
+   to decide how that ensemble would have shared the congestion window,
+   in order to derive initial values for new TCBs. Because concurrent
+   connections between two hosts share network paths (usually), they
+   also share whatever capacity exists along that path. With regard to
+   congestion, the set of connections might behave as if it were
+   multiplexed prior to TCP, as if all data were part of a single
+   connection. As a result, the current window sizes would maintain a
+   constant sum, presuming sufficient offered load. This would go beyond
+   caching to truly sharing state, as in the RTT case.
+
+   We pause to note that any assumption of this sharing can be
+   incorrect, including this one. In current implementations, new
+   congestion windows are set at an initial value of one segment, so
+   that the sum of the current windows is increased for any new
+   connection. This can have detrimental consequences where several
+   connections share a highly congested link, such as in trans-Atlantic
+   Web access.
+
+   There are several ways to initialize the congestion window in a new
+   TCB among an ensemble of current connections to a host, as shown
+   below. Current TCP implementations initialize it to one segment [9],
+   and T/TCP hinted that it should be initialized to the old window size
+   [3]. In the former, the assumption is that new connections should
+   behave as conservatively as possible. In the latter, no accommodation
+   is made to concurrent aggregate behavior.
+
+   In either case, the sum of window sizes can increase, rather than
+   remain constant. Another solution is to give each pending connection
+
+
+
+Touch                        Informational                      [Page 6]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+   its "fair share" of the available congestion window, and let the
+   connections balance from there. The assumption we make here is that
+   new connections are implicit requests for an equal share of available
+   link bandwidth which should be granted at the expense of current
+   connections. This may or may not be the appropriate function; we
+   propose that it be examined further.
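The "fair share" idea can be sketched as follows, using the rule given in the "(proposed)" row of the memo's window-size options table: the new TCB receives old-snd_cwnd/(N+1), and old-snd_cwnd/(N+1)/N is subtracted from each of the N concurrent connections. This is a rough illustration under stated assumptions, not the memo's implementation: `fair_share_cwnd` and `INITIAL_CWND` are hypothetical names, windows are measured in segments as floats, old-snd_cwnd is read here as the sum of the current windows, and the fallback for an empty ensemble is an assumption.

```python
from typing import List, Tuple

INITIAL_CWND = 1.0  # segments; the conventional one-segment default


def fair_share_cwnd(current: List[float]) -> Tuple[float, List[float]]:
    """Return (new connection's window, adjusted windows of existing ones).

    `current` holds the congestion windows, in segments, of the N
    connections already open to the host.
    """
    n = len(current)
    if n == 0:
        # No ensemble to share with: fall back to the default (assumption).
        return INITIAL_CWND, []
    total = sum(current)       # read here as the memo's old-snd_cwnd
    share = total / (n + 1)    # old-snd_cwnd/(N+1) for the new TCB
    cut = share / n            # old-snd_cwnd/(N+1)/N from each concurrent
    return share, [w - cut for w in current]
```

For example, `fair_share_cwnd([6.0, 6.0])` gives the new connection 4 segments and leaves each existing connection with 4, so the sum stays at 12. The sketch applies no lower bound, so with very uneven windows a small one could be driven below a sensible minimum; a real implementation would presumably need a floor.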
+
+
+               ENSEMBLE SHARING - TCB Initialization
+               Some Options for Sharing Window-size
+
+    Cached TCB                           New TCB
+    -----------------------------------------------------------------
+    old-snd_cwnd      (current)          one segment
+
+                      (T/TCP hint)       old-snd_cwnd
+
+                      (proposed)         old-snd_cwnd/(N+1)
+                                         subtract old-snd_cwnd/(N+1)/N
+                                         from each concurrent
+
+
+                ENSEMBLE SHARING - Cache Updates
+
+    Cached TCB     Current TCB     when?   New Cached TCB
+    ----------------------------------------------------------------
+    old-snd_cwnd   curr-snd_cwnd   update  (adjust sum as appropriate)
+
+
+Compatibility Issues
+
+   Current TCP implementations do not use TCB caching, with the
+   exception of T/TCP variants [4][7]. New connections use the default
+   initial values of all non-instantiated TCB variables. As a result,
+   each connection calculates its own RTT measurements, MSS value, and
+   congestion information. Eventually these values are updated for each
+   connection.
+
+   For the congestion and current window information, the initial values
+   may not be consistent with the long-term aggregate behavior of a set
+   of concurrent connections. If a single connection has a window of 4
+   segments, new connections assume initial windows of 1 segment (the
+   minimum), although the current connection's window doesn't decrease
+   to accommodate this additional load. As a result, connections can
+   mutually interfere. One example of this has been seen on trans-
+   Atlantic links, where concurrent connections supporting Web traffic
+   can collide because their initial windows are too large, even when
+   set at one segment.
+
+
+
+Touch                        Informational                      [Page 7]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+   Because this proposal attempts to anticipate the aggregate steady-
+   state values of TCB state among a group or over time, it should avoid
+   the transient effects of new connections. In addition, because it
+   considers the ensemble and temporal properties of those aggregates,
+   it should also prevent the transients of short-lived or multiple
+   concurrent connections from adversely affecting the overall network
+   performance. We are performing analysis and experiments to validate
+   these assumptions.
+
+Performance Considerations
+
+   Here we attempt to optimize transient behavior of TCP without
+   modifying its long-term properties. The predominant expense is in
+   maintaining the cached values, or in using per-host state rather than
+   per-connection state. In cases where performance is affected,
+   however, we note that the per-host information can be kept in per-
+   connection copies (as done now), because with higher performance
+   should come less interference between concurrent connections.
+
+   Sharing TCB state can occur only at connection establishment and
+   close (to update the cache), to minimize overhead, optimize transient
+   behavior, and minimize the effect on the steady-state. It is possible
+   that sharing state during a connection, as in the RTT or window-size
+   variables, may be of benefit, provided its implementation cost is not
+   high.
+
+Implications
+
+   There are several implications to incorporating TCB interdependence
+   in TCP implementations. First, it may prevent the need for
+   application-layer multiplexing for performance enhancement [6].
+   Protocols like persistent-HTTP avoid connection reestablishment costs
+   by serializing or multiplexing a set of per-host connections across a
+   single TCP connection. This avoids TCP's per-connection OPEN
+   handshake, and also avoids recomputing MSS, RTT, and congestion
+   windows. By avoiding the so-called, "slow-start restart," performance
+   can be optimized. Our proposal provides the MSS, RTT, and OPEN
+   handshake avoidance of T/TCP, and the "slow-start restart avoidance"
+   of multiplexing, without requiring a multiplexing mechanism at the
+   application layer. This multiplexing will be complicated when
+   quality-of-service mechanisms (e.g., "integrated services
+   scheduling") are provided later.
+
+   Second, we are attempting to push some of the TCP implementation from
+   the traditional transport layer (in the ISO model [10]), to the
+   network layer. This acknowledges that some state currently maintained
+   as per-connection is in fact per-path, which we simplify as per-
+   host-pair. Transport protocols typically manage per-application-pair
+
+
+
+Touch                        Informational                      [Page 8]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+   associations (per stream), and network protocols manage per-path
+   associations (routing). Round-trip time, MSS, and congestion
+   information is more appropriately handled in a network-layer fashion,
+   aggregated among concurrent connections, and shared across connection
+   instances.
+
+   An earlier version of RTT sharing suggested implementing RTT state at
+   the IP layer, rather than at the TCP layer [8]. Our observations are
+   for sharing state among TCP connections, which avoids some of the
+   difficulties in an IP-layer solution. One such problem is determining
+   the associated prior outgoing packet for an incoming packet, to infer
+   RTT from the exchange. Because RTTs are still determined inside the
+   TCP layer, this is simpler than at the IP layer. This is a case where
+   information should be computed at the transport layer, but shared at
+   the network layer.
+
+   We also note that per-host-pair associations are not the limit of
+   these techniques. It is possible that TCBs could be similarly shared
+   between hosts on a LAN, because the predominant path can be LAN-LAN,
+   rather than host-host.
+
+   There may be other information that can be shared between concurrent
+   connections. For example, knowing that another connection has just
+   tried to expand its window size and failed, a connection may not
+   attempt to do the same for some period. The idea is that existing TCP
+   implementations infer the behavior of all competing connections,
+   including those within the same host or LAN. One possible
+   optimization is to make that implicit feedback explicit, via extended
+   information in the per-host TCP area.
+
+Security Considerations
+
+   These suggested implementation enhancements do not have additional
+   ramifications for direct attacks. These enhancements may be
+   susceptible to denial-of-service attacks if not otherwise secured.
+   For example, an application can open a connection and set its window
+   size to 0, denying service to any other subsequent connection between
+   those hosts.
+
+   TCB sharing may be susceptible to denial-of-service attacks, wherever
+   the TCB is shared, between connections in a single host, or between
+   hosts if TCB sharing is implemented on the LAN (see Implications
+   section). Some shared TCB parameters are used only to create new
+   TCBs, others are shared among the TCBs of ongoing connections. New
+   connections can join the ongoing set, e.g., to optimize send window
+   size among a set of connections to the same host.
+
+
+
+
+
+Touch                        Informational                      [Page 9]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+   Attacks on parameters used only for initialization affect only the
+   transient performance of a TCP connection. For short connections,
+   the performance ramification can approach that of a denial-of-service
+   attack. E.g., if an application changes its TCB to have a false and
+   small window size, subsequent connections would experience
+   performance degradation until their window grew appropriately.
+
+   The solution is to limit the effect of compromised TCB values. TCBs
+   are compromised when they are modified directly by an application or
+   transmitted between hosts via unauthenticated means (e.g., by using a
+   dirty flag). TCBs that are not compromised by application
+   modification do not have any unique security ramifications. Note that
+   the proposed parameters for TCB sharing are not currently modifiable
+   by an application.
+
+   All shared TCBs MUST be validated against default minimum parameters
+   before used for new connections. This validation would not impact
+   performance, because it occurs only at TCB initialization. This
+   limits the effect of attacks on new connections, to reducing the
+   benefit of TCB sharing, resulting in the current default TCP
+   performance. For ongoing connections, the effect of incoming packets
+   on shared information should be both limited and validated against
+   constraints before use. This is a beneficial precaution for existing
+   TCP implementations as well.
+
+   TCBs modified by an application SHOULD not be shared, unless the new
+   connection sharing the compromised information has been given
+   explicit permission to use such information by the connection API. No
+   mechanism for that indication currently exists, but it could be
+   supported by an augmented API. This sharing restriction SHOULD be
+   implemented in both the host and the LAN. Sharing on a LAN SHOULD
+   utilize authentication to prevent undetected tampering of shared TCB
+   parameters. These restrictions limit the security impact of modified
+   TCBs both for connection initialization and for ongoing connections.
+
+   Finally, shared values MUST be limited to performance factors only.
+   Other information, such as TCP sequence numbers, when shared, are
+   already known to compromise security.
+
+Acknowledgements
+
+   The author would like to thank the members of the High-Performance
+   Computing and Communications Division at ISI, notably Bill Manning,
+   Bob Braden, Jon Postel, Ted Faber, and Cliff Neuman for their
+   assistance in the development of this memo.
+
+
+
+
+
+
+Touch                        Informational                     [Page 10]
+
+RFC 2140           TCP Control Block Interdependence          April 1997
+
+
+References
+
+   [1] Berners-Lee, T., et al., "The World-Wide Web," Communications of
+       the ACM, V37, Aug. 1994, pp. 76-82.
+
+   [2] Braden, R., "Transaction TCP -- Concepts," RFC-1379,
+       USC/Information Sciences Institute, September 1992.
+
+   [3] Braden, R., "T/TCP -- TCP Extensions for Transactions Functional
+       Specification," RFC-1644, USC/Information Sciences Institute,
+       July 1994.
+
+   [4] Braden, B., "T/TCP -- Transaction TCP: Source Changes for Sun OS
+       4.1.3,", Release 1.0, USC/ISI, September 14, 1994.
+
+   [5] Comer, D., and Stevens, D., Internetworking with TCP/IP, V2,
+       Prentice-Hall, NJ, 1991.
+
+   [6] Fielding, R., et al., "Hypertext Transfer Protocol -- HTTP/1.1,"
+       Work in Progress.
+
+   [7] FreeBSD source code, Release 2.10, <http://www.freebsd.org/>.
+
+   [8] Jacobson, V., (mail to public list "tcp-ip", no archive found),
+       1986.
+
+   [9] Postel, Jon, "Transmission Control Protocol," Network Working
+       Group RFC-793/STD-7, ISI, Sept. 1981.
+
+   [10] Tannenbaum, A., Computer Networks, Prentice-Hall, NJ, 1988.
+
+Author's Address
+
+   Joe Touch
+   University of Southern California/Information Sciences Institute
+   4676 Admiralty Way
+   Marina del Rey, CA 90292-6695
+   USA
+   Phone: +1 310-822-1511 x151
+   Fax:   +1 310-823-6714
+   URL:   http://www.isi.edu/~touch
+   Email: touch@isi.edu
+
+
+
+
+
+
+
+
+
+Touch                        Informational                     [Page 11]
+