From 4bfd864f10b68b71482b35c818559068ef8d5797 Mon Sep 17 00:00:00 2001
From: Thomas Voss
Date: Wed, 27 Nov 2024 20:54:24 +0100
Subject: doc: Add RFC documents

---
 doc/rfc/rfc7786.txt | 1123 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1123 insertions(+)
 create mode 100644 doc/rfc/rfc7786.txt

diff --git a/doc/rfc/rfc7786.txt b/doc/rfc/rfc7786.txt
new file mode 100644
index 0000000..25f36a8
--- /dev/null
+++ b/doc/rfc/rfc7786.txt
@@ -0,0 +1,1123 @@

Internet Engineering Task Force (IETF)                M. Kuehlewind, Ed.
Request for Comments: 7786                                    ETH Zurich
Category: Experimental                                  R. Scheffenegger
ISSN: 2070-1721                                             NetApp, Inc.
                                                                May 2016


           TCP Modifications for Congestion Exposure (ConEx)

Abstract

   Congestion Exposure (ConEx) is a mechanism by which senders inform
   the network about expected congestion based on congestion feedback
   from previous packets in the same flow. This document describes the
   necessary modifications to use ConEx with the Transmission Control
   Protocol (TCP).

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for examination, experimental implementation, and
   evaluation.

   This document defines an Experimental Protocol for the Internet
   community. This document is a product of the Internet Engineering
   Task Force (IETF). It represents the consensus of the IETF
   community. It has received public review and has been approved for
   publication by the Internet Engineering Steering Group (IESG). Not
   all documents approved by the IESG are a candidate for any level of
   Internet Standard; see Section 2 of RFC 5741.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   http://www.rfc-editor.org/info/rfc7786.

Kuehlewind & Scheffenegger    Experimental                      [Page 1]

RFC 7786              TCP Modifications for ConEx               May 2016


Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements Language
   2. Sender-Side Modifications
   3. Counting Congestion
      3.1. Loss Detection
           3.1.1. Without SACK Support
      3.2. Explicit Congestion Notification (ECN)
           3.2.1. Accurate ECN Feedback
           3.2.2. Classic ECN Support
   4. Setting the ConEx Flags
      4.1. Setting the E or the L Flag
      4.2. Setting the Credit Flag
   5. Loss of ConEx Information
   6. Timeliness of the ConEx Signals
   7. Open Areas for Experimentation
   8. Security Considerations
   9. References
      9.1. Normative References
      9.2. Informative References
   Acknowledgements
   Authors' Addresses

1. Introduction

   Congestion Exposure (ConEx) is a mechanism by which senders inform
   the network about expected congestion based on congestion feedback
   from previous packets in the same flow. ConEx concepts and use cases
   are further explained in [RFC6789]. The abstract ConEx mechanism is
   explained in [RFC7713]. This document describes the necessary
   modifications to use ConEx with the Transmission Control Protocol
   (TCP).

   The markings for ConEx signaling are defined in the ConEx Destination
   Option (CDO) for IPv6 [RFC7837]. Specifically, the use of four flags
   is defined: X (ConEx-capable), L (loss experienced), E (ECN
   experienced), and C (credit).

   ConEx signaling is based on the use of either loss or Explicit
   Congestion Notification (ECN) marks [RFC3168] as congestion
   indication. The sender collects this congestion information based on
   existing TCP feedback mechanisms from the receiver to the sender. No
   changes are needed at the receiver side to implement ConEx signaling.
   Therefore, no additional negotiation is needed to implement and use
   ConEx at the sender side. This document specifies the sender's
   actions that are needed to provide meaningful ConEx information to
   the network.

   Section 2 provides an overview of the modifications needed for TCP
   senders to implement ConEx. First, congestion information has to be
   extracted from TCP's loss or ECN feedback as described in Section 3.
   Section 4 details how to set the CDO marking based on this congestion
   information.
   Section 5 discusses the loss of packets carrying ConEx
   information. Section 6 discusses the timeliness of the ConEx
   feedback signal, given that congestion is a temporary state.

   This document describes congestion accounting for TCP with and
   without the Selective Acknowledgement (SACK) extension [RFC2018] (in
   Section 3.1). However, ConEx benefits from the more accurate
   information that SACK provides about the number of bytes dropped in
   the network, and it is therefore preferable to use the SACK extension
   when using TCP with ConEx. The detailed mechanism to set the L flag
   in response to the loss-based congestion feedback signal is given in
   Section 4.1.

   While loss has to be minimized, ECN can provide more fine-grained
   feedback information. ConEx-based traffic measurement or management
   mechanisms could benefit from this. Unfortunately, the current ECN
   feedback mechanism does not reflect multiple congestion markings if
   they occur within the same Round-Trip Time (RTT). A more accurate
   feedback extension to ECN (AccECN) is proposed in a separate document
   [ACCURATE], as this is also useful for other mechanisms.

   Congestion accounting for both classic ECN feedback and AccECN
   feedback is explained in detail in Section 3.2. Setting the E flag
   in response to ECN-based congestion feedback is again detailed in
   Section 4.1.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2. Sender-Side Modifications

   This section gives an overview of actions that need to be taken by a
   TCP sender modified to use ConEx signaling.

   In the TCP handshake, a ConEx sender MUST negotiate for SACK and ECN,
   preferably with AccECN feedback.
   Therefore, a ConEx sender MUST also implement SACK and ECN. Depending
   on the capability of the receiver, the following operation modes
   exist:

   o  SACK-accECN-ConEx (SACK and accurate ECN feedback)

   o  SACK-ECN-ConEx (SACK and classic instead of accurate ECN)

   o  accECN-ConEx (no SACK but accurate ECN feedback)

   o  ECN-ConEx (no SACK and no accurate ECN feedback, but classic ECN)

   o  SACK-ConEx (SACK but no ECN at all)

   o  Basic-ConEx (neither SACK nor ECN)

   A ConEx sender MUST expose all congestion information to the network
   according to the congestion information received by ECN or based on
   loss information provided by the TCP feedback loop. A TCP sender
   SHOULD count congestion byte-wise (rather than packet-wise; see next
   paragraph). After any congestion notification, a sender MUST mark
   subsequent packets with the appropriate ConEx flag in the IP header.
   Furthermore, a ConEx sender must send enough credit to cover all
   experienced congestion for the connection so far, as well as the risk
   of congestion for the current transmission (see Section 4.2).

   With SACK, the number of lost payload bytes is known, but not the
   number of packets carrying these bytes. With classic ECN, only an
   indication is given that a marking occurred, but not the exact number
   of payload bytes or packets. As network congestion is usually
   byte-congestion [RFC7141], the byte-size of a packet marked with a
   CDO flag is defined to represent that number of bytes of congestion
   signaling [RFC7837]. Therefore, the exact number of bytes should be
   taken into account, if available, to make the ConEx Signal as
   accurate as possible.

   Detailed mechanisms for congestion counting in each operation mode
   are described in the next section.

3. Counting Congestion

   A ConEx TCP sender maintains two counters: one that counts congestion
   based on the information retrieved by loss detection, and a second
   that accounts for ECN-based congestion feedback. These counters hold
   the number of outstanding bytes that should be ConEx-Marked with,
   respectively, the L flag or the E flag in subsequent packets.

   The outstanding bytes for congestion indications based on loss are
   maintained in the Loss Exposure Gauge (LEG), as explained in
   Section 3.1.

   The outstanding bytes counted based on ECN feedback information are
   maintained in the Congestion Exposure Gauge (CEG), as explained in
   Section 3.2.

   When the sender sends a ConEx-capable packet with the E or L flag
   set, it reduces the respective counter by the byte-size of the
   packet. This is explained for both counters in Section 4.1.

   Note that all bytes of an IP packet must be counted in the LEG or CEG
   to capture the right number of bytes that should be marked.
   Therefore, the sender SHOULD take the payload and headers into
   account, up to and including the IP header. However, in TCP the
   information regarding how large the headers of a lost or marked
   packet were is usually not available, as only payload data will be
   acknowledged.

   If equal-sized packets, or at least equally distributed packet sizes,
   can be assumed, the sender MAY add and subtract only TCP payload
   bytes. In this case, there should be about the same number of
   ConEx-Marked packets as the original packets that were causing the
   congestion. Thus, both contain about the same number of header bytes,
   so they will cancel out. This case is assumed for simplicity in the
   following sections.
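
   The gauge bookkeeping described above, together with the marking
   rule it feeds in Section 4.1, can be sketched as follows. This is a
   minimal, non-normative illustration. It assumes equal-sized packets,
   so only TCP payload bytes are counted. The class and method names
   (ExposureGauges, on_loss, on_ecn, mark_packet) are hypothetical and
   are not part of any ConEx specification or API.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class ExposureGauges:
    """Hypothetical ConEx sender gauges; both are signed byte counters."""

    leg: int = 0  # Loss Exposure Gauge: bytes still to be sent with L
    ceg: int = 0  # Congestion Exposure Gauge: bytes still to be sent with E

    def on_loss(self, payload_bytes: int) -> None:
        """Loss-based congestion feedback increases the LEG."""
        self.leg += payload_bytes

    def on_ecn(self, marked_bytes: int) -> None:
        """ECN-based congestion feedback increases the CEG."""
        self.ceg += marked_bytes

    def mark_packet(self, payload_bytes: int) -> Set[str]:
        """Return the ConEx flags for an outgoing packet carrying payload.

        X is always set on payload packets. L and/or E are set whenever
        the respective gauge is positive, however small, and that gauge
        is then reduced by the payload carried, so it may go negative.
        """
        flags = {"X"}
        if self.leg > 0:
            flags.add("L")
            self.leg -= payload_bytes
        if self.ceg > 0:
            flags.add("E")
            self.ceg -= payload_bytes
        return flags


if __name__ == "__main__":
    g = ExposureGauges()
    g.on_loss(1000)   # 1000 payload bytes reported lost
    g.on_ecn(500)     # 500 payload bytes reported CE-marked
    print(sorted(g.mark_packet(1460)))  # ['E', 'L', 'X']
    print(g.leg, g.ceg)                 # -460 -960
    print(sorted(g.mark_packet(1460)))  # ['X']
```

   Marking is never deferred. A gauge that is positive by even one byte
   causes a whole packet to be marked, which is why the gauges typically
   end up slightly negative.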

   Otherwise, if a sender sends differently sized packets (with
   unequally distributed packet sizes), the sender needs to memorize or
   estimate the number of lost or ECN-marked packets. If the sender has
   sufficient memory available, the most accurate way to reconstruct the
   number of lost or marked packets is to remember the sequence numbers
   of all sent but not yet acknowledged packets. In this case, a sender
   is able to reconstruct the number of packets, and thus the header
   bytes, that were sent during the last RTT. Otherwise (e.g., if not
   enough memory is available), the sender would need to estimate the
   packet size. The average packet size can be estimated if the
   distribution pattern of packet sizes in the last RTT is known;
   alternatively, the minimum packet size seen in the last RTT can be
   used as the most conservative estimate, because it yields the largest
   estimated number of packets and thus of header bytes.

   If the number of newly sent-out packets with the ConEx L or E flag
   set is smaller (or larger) than this estimated number of
   lost/ECN-marked packets, the additional header bytes should be added
   to (or can be subtracted from) the respective gauge.

3.1. Loss Detection

   This section applies whether or not SACK support is available. The
   following subsection (Section 3.1.1) handles the case when SACK is
   not available.

   A TCP sender detects losses and subsequently retransmits the lost
   data. Therefore, the ConEx sender can simply set the ConEx L flag on
   all retransmissions in order to at least cover the number of bytes
   lost. If this approach is taken, no LEG is needed.

   However, any retransmission may be spurious. In this case, more
   bytes have been marked than necessary. To compensate for this
   effect, a ConEx sender can maintain a local signed counter (the LEG)
   that indicates the number of outstanding bytes to be sent with the
   ConEx L flag and that can also become negative.
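
   The difference between the two options above can be illustrated by
   counting the bytes sent with the L flag for a short, hypothetical
   trace. The trace has a retransmission, a later indication that it
   was spurious, and a second, genuine retransmission. The trace format
   and the function name l_marked_bytes are assumptions for
   illustration. The detection heuristic itself is not modeled.

```python
def l_marked_bytes(events, use_leg):
    """Total bytes sent with the L flag for a hypothetical event trace.

    events: sequence of (kind, n) tuples, where ("retx", n) is a
    retransmission of n payload bytes (the retransmitted packet is the
    next packet sent) and ("spurious", n) means a detection heuristic
    later flagged a retransmission of n bytes as spurious.
    use_leg: False marks every retransmission with L and keeps no gauge;
    True keeps a signed LEG as described in Section 3.1.
    """
    leg = 0
    marked = 0
    for kind, n in events:
        if kind == "retx":
            if not use_leg:
                marked += n        # L flag on every retransmission
                continue
            leg += n               # for each retransmission: LEG += payload
            if leg > 0:            # gauge positive: this packet carries L
                marked += n
                leg -= n           # marking reduces the gauge
        elif kind == "spurious" and use_leg:
            leg -= n               # spurious: LEG -= payload (may go below 0)
    return marked


trace = [("retx", 1000), ("spurious", 1000), ("retx", 1000)]
print(l_marked_bytes(trace, use_leg=False))  # 2000
print(l_marked_bytes(trace, use_leg=True))   # 1000
```

   With the LEG, the spurious retransmission leaves a negative gauge.
   That negative gauge absorbs the next, genuine retransmission. As a
   result, 1000 rather than 2000 bytes are L-marked for 1000 bytes of
   actual loss.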

   Using the LEG, when a TCP sender decides that a data segment needs to
   be retransmitted, it will increase the LEG by the number of TCP
   payload bytes in the retransmission (assuming equal-sized segments,
   such that the retransmitted packet will have the same number of
   header bytes as the original one):

      For each retransmission:

         LEG += payload

   The LEG is reduced when the ConEx L marking is set, as described in
   Section 4.

   Further, to accommodate spurious retransmissions, a ConEx sender
   SHOULD make use of heuristics to detect such spurious retransmissions
   (e.g., F-RTO [RFC5682], DSACK [RFC3708], and Eifel [RFC3522],
   [RFC4015]), if already available in a given implementation. If no
   mechanism for detecting spurious retransmissions is available, the
   ConEx sender MAY choose to implement one of the mechanisms stated
   above. However, given the inaccuracy that ConEx may have anyway and
   the timeliness of ConEx information, a ConEx sender MAY also choose
   not to compensate for spurious retransmissions. In this case, if
   spurious retransmissions occur, the ConEx sender has simply sent too
   many ConEx Signals, which would, e.g., decrease the congestion
   allowance in a ConEx policer unnecessarily.

   If a heuristic method is used to detect spurious retransmissions and
   has determined that a certain number of packets were retransmitted
   erroneously, the ConEx sender subtracts the payload size of these TCP
   packets from the LEG:

      If a spurious retransmission is detected:

         LEG -= payload

   Note that the LEG can become negative if too many L markings have
   already been sent. This case is further discussed in Section 6.

3.1.1. Without SACK Support

   If multiple losses occur within one RTT and SACK is not used, it may
   take several RTTs until all lost data is retransmitted.
   With the scheme described above, the ConEx information will be
   delayed considerably, but timeliness is important for ConEx. For
   ConEx, it is important to know how much data was lost; it is not
   important to know what data was lost. During the first RTT after the
   initial loss detection, the amount of received data, and thus also
   the amount of lost data, can be estimated based on the number of
   received ACKs.

   Therefore, a ConEx sender can use the following algorithm to
   estimate the number of lost bytes with an additional delay of one
   RTT, using an additional Loss Estimation Counter (LEC):

      flight_bytes: current flight size in bytes
      retransmit_bytes: payload size of the retransmission

      At the first retransmission in a congestion event, LEC is set:

         LEC = flight_bytes - 3*SMSS

         (At this point in the transmission, in the worst case, all
         packets in flight except the three that triggered the dupACKs
         could have been lost.)

      Then, during the first RTT of the congestion event:

         For each retransmission:
            LEG += retransmit_bytes
            LEC -= retransmit_bytes

         For each ACK:
            LEC -= SMSS

      After one RTT:

         LEG += LEC

         (The LEC now estimates the number of outstanding bytes
         that should be ConEx L-marked.)

      After the first RTT, for each following retransmission:

         if (LEC > 0): LEC -= retransmit_bytes
         else if (LEC == 0): LEG += retransmit_bytes

         if (LEC < 0): LEG += -LEC; LEC = 0

         (The LEG is not increased for those bytes that were
         already counted. Resetting LEC to zero ensures that the
         overshoot is added to the LEG only once and that later
         retransmissions are counted in full.)

3.2. Explicit Congestion Notification (ECN)

   ECN [RFC3168] is an IP/TCP mechanism that allows network nodes to
   mark packets with the Congestion Experienced (CE) mark instead of
   dropping them when congestion occurs.

   A receiver might support classic ECN, the more accurate ECN feedback
   scheme (AccECN), or neither.
   If ECN is not supported for a connection, no ECN marks will occur;
   thus, the sender will never set the E flag. Otherwise, a ConEx sender
   needs to maintain a signed counter, the Congestion Exposure Gauge
   (CEG), for the number of outstanding bytes that have to be
   ConEx-Marked with the E flag.

   The CEG is increased when ECN information is received from an
   ECN-capable receiver supporting the classic ECN scheme or the
   accurate ECN feedback scheme. When the ConEx sender receives an ACK
   indicating that one or more segments were received with a CE mark,
   the CEG is increased by the appropriate number of bytes, as described
   further below.

   Unfortunately, in the case of duplicate acknowledgements, the number
   of newly acknowledged bytes will be zero even though (CE-marked) data
   has been received. Therefore, we increase the CEG by DeliveredData,
   as defined below:

      DeliveredData = acked_bytes + SACK_diff + (is_dup)*1SMSS -
                      (is_after_dup)*num_dup*1SMSS

   DeliveredData covers the number of bytes that have been newly
   delivered to the receiver. Therefore, on each arrival of an ACK,
   DeliveredData will be increased by the newly acknowledged bytes
   (acked_bytes) as indicated by the current ACK, relative to all past
   ACKs. The formula depends on whether SACK is available: if SACK is
   not available, SACK_diff is always zero, whereas if SACK information
   is available, is_dup and is_after_dup are always zero.

   With SACK, DeliveredData is increased by the number of bytes provided
   by (new) SACK information (SACK_diff). Note that if fewer
   unacknowledged bytes are announced in the new SACK information than
   in the previous ACK, SACK_diff can be negative. In this case, data
   is newly acknowledged (in acked_bytes) that was previously
   accumulated into DeliveredData, based on SACK information.
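
   The DeliveredData formula above can be sketched as follows, together
   with the way Sections 3.2.1 and 3.2.2 turn it into a CEG increase.
   This is a minimal illustration. It assumes that the caller already
   extracts acked_bytes, SACK_diff, and the duplicate-ACK state from
   each ACK. The function names are hypothetical.

```python
def delivered_data(acked_bytes, smss, sack_diff=0, is_dup=False,
                   is_after_dup=False, num_dup=0):
    """DeliveredData for a single ACK, following the formula in Section 3.2.

    With SACK, pass sack_diff (which may be negative) and leave the
    duplicate-ACK arguments at their defaults; without SACK, leave
    sack_diff at zero and describe the duplicate-ACK state instead.
    """
    return (acked_bytes + sack_diff
            + (smss if is_dup else 0)
            - (num_dup * smss if is_after_dup else 0))


def ceg_increase_accecn(marked_packets, delivered, smss):
    """Section 3.2.1: AccECN reports D marked packets (not bytes)."""
    return min(smss * marked_packets, delivered)


def ceg_increase_classic(ece_flag, delivered):
    """Section 3.2.2: conservative rule, count every ECE-flagged ACK."""
    return delivered if ece_flag else 0


SMSS = 1000
# Without SACK: three duplicate ACKs, then a partial ACK that cumulatively
# acknowledges the retransmitted segment plus the three already-counted ones.
per_ack = [delivered_data(0, SMSS, is_dup=True) for _ in range(3)]
per_ack.append(delivered_data(4 * SMSS, SMSS, is_after_dup=True, num_dup=3))
print(per_ack, sum(per_ack))               # [1000, 1000, 1000, 1000] 4000
print(ceg_increase_accecn(2, 1500, SMSS))  # 1500
```

   In the example, the three duplicate ACKs and the following partial
   ACK together account for exactly four segments. The bytes already
   counted on the duplicate ACKs are therefore not counted a second
   time.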

   Otherwise, without SACK, DeliveredData is increased by 1 Sender
   Maximum Segment Size (SMSS) on duplicate acknowledgements, because
   duplicate acknowledgements do not acknowledge any new data (and
   acked_bytes will be zero). For the subsequent partial or full ACK,
   acked_bytes covers all newly acknowledged bytes, including those
   already accounted for with the receipt of any duplicate
   acknowledgement. Therefore, DeliveredData is reduced by one SMSS for
   each preceding duplicate ACK. Consequently, is_dup is one if the
   current ACK is a duplicate ACK without SACK, and zero otherwise.
   is_after_dup is one only for the next full or partial ACK after a
   number of duplicate ACKs without SACK, and num_dup counts the number
   of duplicate ACKs in a row (which is usually 3 or more).

   With classic ECN, one congestion-marked packet causes continuous
   congestion feedback for a whole round trip, thus hiding the arrival
   of any further congestion-marked packets during that round trip. A
   more accurate ECN feedback scheme (AccECN) is needed to ensure that
   feedback properly reflects the extent of congestion marking. The two
   cases, with and without a receiver capable of AccECN, are discussed
   in the following sections.

3.2.1. Accurate ECN Feedback

   With a more accurate ECN feedback scheme (AccECN) that is supported
   by the receiver, either the number of marked packets or the number of
   marked bytes will be fed back from the receiver to the sender and is
   therefore known at the sender side. In the latter case, the CEG can
   be increased directly by the number of marked bytes. Otherwise, if D
   is the number of reported marks, the gauge (CEG) is conservatively
   increased by one SMSS for each marking, but at most by the number of
   newly delivered bytes:

      CEG += min(SMSS*D, DeliveredData)

3.2.2. Classic ECN Support

   With classic ECN, as soon as a CE mark is seen at the receiver side,
   the receiver will feed this information back to the sender by setting
   the Echo Congestion Experienced (ECE) flag in the TCP header of
   subsequent ACKs. Once the sender receives the first ECE of a
   congestion notification, it sets the Congestion Window Reduced (CWR)
   flag in the TCP header once. When this packet with the CWR flag in
   the TCP header arrives at the receiver side, acknowledging its first
   ECE feedback, the receiver stops setting the ECE flag.

   If the ConEx sender fully conforms to the semantics of ECN signaling
   as defined by [RFC3168], it will receive one full RTT of ACKs with
   the ECE flag set whenever at least one CE mark was received by the
   receiver. As the sender cannot estimate how many packets have
   actually been CE-marked during this RTT, the most conservative
   assumption MAY be taken, namely assuming that all packets were
   marked. This can be achieved by increasing the CEG by DeliveredData
   for each ACK with the ECE flag:

      CEG += DeliveredData

   Optionally, a ConEx sender could implement the following technique
   (that does not conform to [RFC3168]), called "advanced compatibility
   mode", to considerably improve its estimate of the number of
   ECN-marked packets:

   To extract more than one ECE indication per RTT, a ConEx sender could
   set the CWR flag continuously to force the receiver to signal only
   one ECE per CE mark. Unfortunately, the use of delayed ACKs
   [RFC5681] (which is common) will prevent feedback of every CE mark;
   if a CWR confirmation is received before the ECE can be sent out on
   the next ACK, ECN feedback information could get lost (depending on
   the actual receiver implementation). Thus, a sender SHOULD set CWR
   only on those data segments that will presumably trigger a (delayed)
   ACK.
   The sender would need an additional control loop to estimate
   which data segments will trigger an ACK in order to extract more
   timely congestion notifications. Still, the CEG SHOULD be increased
   by DeliveredData, as one or more CE-marked packets could be
   acknowledged by one delayed ACK.

4. Setting the ConEx Flags

   By setting the X flag, a packet is marked as ConEx-capable. All
   packets carrying payload MUST be marked with the X flag set,
   including retransmissions. The X flag SHOULD be zero only if no
   congestion feedback information is (currently) available (e.g., for
   control packets on a connection that has not sent any user data for
   some time and therefore is sending only pure ACKs that are not
   carrying any payload).

4.1. Setting the E or the L Flag

   As described in Sections 3.1 and 3.2, the sender needs to maintain a
   CEG counter and might also maintain a LEG counter. If no LEG is used,
   all retransmissions will be marked with the L flag.

   Further, as long as the LEG or CEG counter is positive, the sender
   marks each ConEx-capable packet with L or E, respectively, and
   decreases the LEG or CEG counter by the TCP payload bytes carried in
   the marked packet (assuming headers are not being counted because
   packet sizes are regular). No matter how small the value of LEG or
   CEG, if the value is positive, the sender MUST NOT defer packet
   marking; this ensures that ConEx Signals are timely. Therefore, the
   value of LEG and CEG will commonly be negative, because the last
   marked packet usually carries more bytes than the small positive
   remainder it covers.

   If both the LEG and CEG are positive, the sender MUST mark each
   ConEx-capable packet with both L and E. If a credit signal is also
   pending (see the next section), the C flag can be set as well.

4.2. Setting the Credit Flag

   The ConEx abstract mechanism [RFC7713] requires that sufficient
   credit MUST be signaled in advance to cover the expected congestion
   during the feedback delay of one RTT.

   To monitor the credit state at the audit, a ConEx sender needs to
   maintain a Credit State Counter (CSC) in bytes. If congestion
   occurs, credits will be consumed and the CSC is reduced by the number
   of bytes that were lost or estimated to be ECN-marked. If the risk
   of congestion was estimated wrongly, and thus too few credits were
   sent, the CSC becomes zero but cannot go negative.

   To be sure that the credit state in the audit never reaches zero, the
   number of credits should always equal the number of bytes in flight,
   as all packets could potentially get lost or congestion-marked. In
   this case, a ConEx sender also monitors the number of bytes in flight
   F. If F ever becomes larger than the CSC, the ConEx sender sets the
   C flag on each ConEx-capable packet and increases the CSC by the
   payload size of each marked packet until the CSC is no less than F
   again. However, a ConEx sender might also be less conservative and
   send fewer credits if it, e.g., assumes that the congestion will be
   low on a certain path based on previous experience.

   Recall that the CSC will be decreased whenever congestion occurs;
   therefore, the CSC will need to be replenished as soon as the CSC
   drops below F. Also recall that the sender can set the C flag on a
   ConEx-capable packet whether or not the E or L flags are also set.

   In TCP Slow Start, the congestion window might grow much larger than
   during the rest of the transmission. A sender might therefore
   consider sending fewer than F credits, at the risk of being penalized
   by an audit function. However, the credits should at least cover the
   increase in sending rate. Given the exponential increase implemented
   in the TCP Slow Start algorithm, which means that the sending rate
   doubles every RTT, a ConEx sender should cover at least half the
   number of packets in flight with credits.
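
   The credit rules above can be sketched as a small Credit State
   Counter (CSC) tracker. This is a hypothetical, non-normative
   illustration. Outside Slow Start it uses the conservative target of
   F bytes. In Slow Start it uses half of F (rounded up), following the
   suggestion above. The class and method names are assumptions, and
   the caller is assumed to supply the current bytes in flight F.

```python
class CreditState:
    """Hypothetical Credit State Counter (CSC) tracker, in bytes."""

    def __init__(self) -> None:
        self.csc = 0

    def on_congestion(self, congested_bytes: int) -> None:
        """Lost or ECN-marked bytes consume credit; the CSC never goes below 0."""
        self.csc = max(0, self.csc - congested_bytes)

    def target(self, flight_bytes: int, slow_start: bool) -> int:
        """Conservative target F; half of F (rounded up) in Slow Start."""
        return (flight_bytes + 1) // 2 if slow_start else flight_bytes

    def c_flag(self, payload: int, flight_bytes: int,
               slow_start: bool = False) -> bool:
        """Decide whether this ConEx-capable packet carries the C flag.

        flight_bytes is the caller's current bytes in flight F. While the
        CSC is below the target, the packet is C-marked and its payload is
        added to the CSC.
        """
        if self.csc < self.target(flight_bytes, slow_start):
            self.csc += payload
            return True
        return False


credit = CreditState()
print([credit.c_flag(1000, 3000) for _ in range(4)])  # [True, True, True, False]
credit.on_congestion(1000)        # one lost segment consumes 1000 bytes
print(credit.c_flag(1000, 3000))  # True: the CSC is replenished
```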

   Note that the number of losses or markings within one RTT does not
   depend solely on the sender's actions. In general, the behavior of
   the cross traffic, whether Active Queue Management (AQM) is used, and
   how it is parameterized influence how many packets might be dropped
   or marked. As long as any AQM encountered is not overly aggressive
   with ECN marking, sending half the flight size as credits should be
   sufficient whether congestion is signaled by loss or ECN.

   To maintain half of the packets in flight as credits, half of the
   packets of the initial window (rounded up) must also be C-marked.
   During Slow Start, C-marking every fourth packet thereafter then
   introduces the correct amount of credit, as can be seen in Figure 1.

                                          in_flight  credits
      RTT1  |------XC------>|                  1         1
            |------X------->|                  2         1
            |------XC------>|                  3         2
            |               |
      RTT2  |------X------->|                  3         2
            |------X------->|                  4         2
            |------X------->|                  4         2
            |------XC------>|                  5         3
            |------X------->|                  5         3
            |------X------->|                  6         3
            |               |
      RTT3  |------X------->|                  6         3
            |------XC------>|                  7         4
            |------X------->|                  7         4
            |------X------->|                  8         4
            |------X------->|                  8         4
            |------XC------>|                  9         5
            |------X------->|                  9         5
            |------X------->|                 10         5
            |------X------->|                 10         5
            |------XC------>|                 11         6
            |------X------->|                 11         6
            |------X------->|                 12         6
            |       .       |
            |       :       |

        Figure 1: Credits in Slow Start (with an initial window of 3)

   It is possible that a TCP flow will encounter an audit function
   without relevant flow state due to, e.g., rerouting or memory
   limitations. Therefore, the sender needs to detect this case and
   resend credits. A ConEx sender might reset the credit counter CSC to
   zero if losses occur in subsequent RTTs (assuming that the sending
   rate was correctly reduced based on the received congestion signal
   and using a conservatively large RTT estimation).
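
   The columns of Figure 1 can be reproduced with a short simulation.
   The sketch makes three assumptions. Every segment is acknowledged
   individually (no delayed ACKs). Each ACK in Slow Start releases two
   new segments. C is set on packets 1 and 3 and then on every fourth
   packet (7, 11, 15, ...), which is the pattern read off Figure 1 for
   an initial window of 3. The function name is hypothetical.

```python
def slow_start_credit_trace(packets=21):
    """Rebuild the in_flight and credits columns of Figure 1 (IW = 3).

    Assumptions: every segment is ACKed individually (no delayed ACKs),
    each ACK in Slow Start releases two new segments, and C is set on
    packets 1 and 3 and then on every fourth packet (7, 11, 15, ...).
    Returns (packet_number, in_flight, credits, c_marked) tuples.
    """
    initial_window = 3
    rows = []
    in_flight = 0
    credits = 0
    for k in range(1, packets + 1):
        # The first segment of each pair released by an ACK coincides with
        # that ACK, so one segment leaves the flight before k is sent.
        if k > initial_window and (k - initial_window) % 2 == 1:
            in_flight -= 1
        in_flight += 1
        c_marked = k == 1 or (k >= 3 and (k - 3) % 4 == 0)
        if c_marked:
            credits += 1
        rows.append((k, in_flight, credits, c_marked))
    return rows


for k, in_flight, credits, c_marked in slow_start_credit_trace():
    flag = "XC" if c_marked else "X"
    print(f"{k:2d} {flag:<2} {in_flight:3d} {credits:3d}")
```

   For this marking pattern, twice the number of credits is never
   smaller than the number of packets in flight. In other words, the
   credits always cover at least half of the flight.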

   This section proposes a concrete algorithm for determining how much
   credit to signal (with a separate approach used for Slow Start).
   However, experimentation in credit-setting algorithms is expected and
   encouraged. The wider goal of ConEx is to reflect the "cost" of the
   risk of causing congestion on those that contribute most to it.
   Thus, experimentation is encouraged to improve or maintain
   performance while reducing the risk of causing congestion and
   therefore potentially reducing the need to signal so much credit.

5. Loss of ConEx Information

   Packets carrying ConEx Signals could themselves be discarded. This
   will be a second-order problem (e.g., if the loss probability is
   0.1%, the probability of losing a ConEx L signal will be 0.1% of 0.1%
   = 0.0001%). Further, the penalty an audit induces should be
   proportional to the mismatch between expected ConEx marks and
   observed congestion; therefore, the audit might only slightly
   increase the loss level of this flow. Consequently, an implementer
   MAY choose to ignore this problem, accepting instead the risk that an
   audit function might wrongly penalize a flow.

   Nonetheless, a ConEx sender is responsible for always signaling
   sufficient congestion feedback and therefore SHOULD remember which
   packets were marked with the L, the E, or the C flag. If one of
   these packets is detected as lost, the sender SHOULD increase the
   respective gauge(s), LEG or CEG, by the number of lost payload bytes,
   in addition to increasing the LEG for the loss itself.

6. Timeliness of the ConEx Signals

   ConEx Signals will only be useful to a network node within a time
   delay of about one RTT after the congestion occurred. To avoid
   further delays, a ConEx sender SHOULD send the ConEx signaling on the
   next available packet.
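
   The bookkeeping recommended in Section 5 can be sketched as a
   per-packet record of the ConEx flags that were sent. This is a
   hypothetical illustration keyed by sequence number. It handles only
   the L and E flags, because the text above does not say how a lost
   C-marked packet should be compensated. The class and method names
   are assumptions.

```python
from typing import Dict, Set


class MarkedPacketLog:
    """Hypothetical per-packet record of sent L/E flags (Section 5)."""

    def __init__(self) -> None:
        self.flags_by_seq: Dict[int, Set[str]] = {}
        self.leg = 0
        self.ceg = 0

    def on_send(self, seq: int, flags: Set[str]) -> None:
        """Remember packets that carried L and/or E."""
        exposed = flags & {"L", "E"}
        if exposed:
            self.flags_by_seq[seq] = exposed

    def on_ack(self, seq: int) -> None:
        """Delivered packets no longer need to be remembered."""
        self.flags_by_seq.pop(seq, None)

    def on_loss(self, seq: int, payload: int) -> None:
        """Account for a lost packet, re-exposing any lost ConEx signal."""
        self.leg += payload                 # every loss raises the LEG
        lost_flags = self.flags_by_seq.pop(seq, set())
        if "L" in lost_flags:
            self.leg += payload             # the lost L signal itself
        if "E" in lost_flags:
            self.ceg += payload             # the lost E signal


log = MarkedPacketLog()
log.on_send(1, {"X", "L"})
log.on_send(2, {"X", "E"})
log.on_send(3, {"X"})
log.on_ack(3)
log.on_loss(1, 1000)
log.on_loss(2, 1000)
print(log.leg, log.ceg)  # 3000 1000
```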

   Any or all of the ConEx flags can be used in the same packet, which
   allows delays to be minimized when multiple signals are pending. The
   need to set multiple ConEx flags at the same time can occur if, e.g.,
   an ACK is received by the sender that simultaneously indicates that
   at least one ECN mark was received and that one or more segments
   were lost. This may happen during excessive congestion, if the
   queues overflow even though ECN was used, and all currently forwarded
   packets are marked while others have to be dropped. Another case
   when this might happen is when ACKs are lost, so that a subsequent
   ACK carries summary information not previously available to the
   sender.

   If a flow becomes application-limited, there could be insufficient
   bytes to send to reduce the gauges to zero or below. In such cases,
   the sender cannot help but delay ConEx Signals. Nonetheless, as long
   as the sender is marking all outgoing packets, an audit function is
   unlikely to penalize ConEx-Marked packets. Therefore, no matter how
   long a gauge has been positive, a sender MUST NOT reduce the gauge by
   more than the ConEx-Marked bytes it has sent.

   If the CEG or LEG counter is negative, the respective counter MAY be
   reset to zero within one RTT after it was decreased the last time, or
   one RTT after recovery if no further congestion occurred.

7. Open Areas for Experimentation

   All proposed mechanisms in this document are experimental, and
   therefore further large-scale experimentation on the Internet is
   required to evaluate whether the signaling provided by these
   mechanisms is accurate and timely enough to produce value for
   ConEx-based (traffic management or other) mechanisms.
+ + The current ConEx specifications assume that congestion is counted in + bytes (including the IP header that directly + encapsulates the CDO and everything that the IP header encapsulates) + [RFC7837]. This decision was taken because most network devices + today experience byte-congestion, where the memory is filled by exactly + the number of bytes a packet carries [RFC7141]. However, there + are also devices that may allocate a fixed amount of memory per + packet, no matter how large a packet is. These devices become congested + based on the number of packets in their memory; in this + case, congestion is determined by the number of packets that have + been lost or marked. Furthermore, a transport-layer endpoint, such as + a TCP sender or receiver, might not know the exact number of bytes + that a lower layer was carrying. Therefore, a TCP endpoint may only + be able to estimate the number of congested bytes (assuming + that all lower-layer headers have the same length). Whether this + estimation is sufficient needs to be + further evaluated in tests on the Internet together with different + auditor implementations. + + Further, the proposed marking schemes in this document are designed + under the assumption that all TCP packets of a ConEx-capable flow are + of equal size or that flows have a constant mean packet size over a + rather small time frame, like one RTT or less. Most + implementations are likely to make this assumption as well, and it + probably holds for most traffic flows. If this proposed scheme + is used, it is necessary to evaluate how much accuracy degrades if + this precondition is not met. Evaluating with real traffic from + different applications is especially important in deciding + whether the proposed schemes are sufficient or whether a + more complex scheme is needed.
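   The difference between byte-based and packet-based counting, and the effect of the equal-size assumption, can be illustrated with a small Python sketch. The header overhead is an assumption made for illustration only: a 40-byte IPv6 header, an 8-byte Destination Options header carrying the CDO, and a 20-byte TCP header without options, with no other extension headers. A real sender would have to estimate this value. The function names are hypothetical.

```python
# Hypothetical sketch: ways a TCP sender might count congestion.
# Header sizes are illustrative assumptions, not values mandated by ConEx.
IPV6_HEADER = 40     # fixed IPv6 header
CDO_DEST_OPTS = 8    # assumed Destination Options header carrying the CDO
TCP_HEADER = 20      # assumed TCP header without options
OVERHEAD = IPV6_HEADER + CDO_DEST_OPTS + TCP_HEADER


def congested_bytes(congested_payloads, overhead=OVERHEAD):
    """Byte-based count: IP header plus everything it encapsulates,
    assuming every packet has the same (estimated) header length."""
    return sum(p + overhead for p in congested_payloads)


def congested_packets(congested_payloads):
    """Packet-based count, matching devices that allocate memory per packet."""
    return len(congested_payloads)


def bytes_from_mean_size(flow_payloads, n_congested, overhead=OVERHEAD):
    """Estimate that assumes a constant mean packet size for the flow."""
    mean_payload = sum(flow_payloads) / len(flow_payloads)
    return n_congested * (mean_payload + overhead)
```

   For a flow of nine 1440-byte segments and one 100-byte segment, losing only the small segment gives 168 congested bytes when counted exactly, but 1374 bytes when estimated from the flow's mean packet size; with equal-size packets the two estimates coincide.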
+ + In this context, the proposed scheme to set credit markings in Slow + Start runs the risk of providing an insufficient number of markings, + which can cause an audit function to penalize this flow. Both the + proposed credit scheme for Slow Start and the scheme for + Congestion Avoidance must be evaluated together with one or more + specific implementations of a ConEx auditor to ensure that both + algorithms, in the sender and in the auditor, work properly together + with a low risk of false positives (which would lead to penalization + of an honest sender). Whenever a sender is nonetheless wrongly + suspected of cheating, the audit's penalty should be proportionate and + should allow an honest sender using a commonly deployed congestion + control scheme to recover quickly. + + Another open issue is the accuracy of the ECN feedback signal. At + the time of this document's publication, no AccECN mechanism + has been specified yet, and AccECN will also take some time to be + widely deployed. This document proposes an advanced compatibility + mode for classic ECN. The proposed mechanism can provide more + accurate feedback by exploiting the way classic ECN is specified, but + it has a higher risk of losing information. To determine how high this + risk is in a real deployment scenario, further experimental + evaluation is needed. The following argument is intended to show + that suppressing repetitions of ECE is nonetheless safe against + possible congestion collapse due to lost congestion feedback; this + should be further confirmed by experimentation: + + Repetition of ECE in classic ECN is intended to ensure reliable + delivery of congestion feedback. However, with advanced + compatibility mode, it is possible to miss congestion notifications. + This can happen in some implementations if delayed acknowledgements + are used.
Further, an ACK containing ECE can simply get lost. If + only a few CE marks are received within one congestion event (e.g., + only one), the loss of one acknowledgement due to (heavy) congestion + on the reverse path can prevent any congestion notification from + reaching the sender. + + However, if loss of feedback exacerbates congestion on the forward + path, more forward packets will be CE-marked, increasing the + likelihood that feedback from at least one CE mark will get through per + RTT. As long as one ECE reaches the sender per RTT, the sender's + congestion response will be the same as if CWR were not continuous. + The only way that heavy congestion on the forward path could be + completely hidden would be if all ACKs on the reverse path were lost. + If total ACK loss persisted, the sender would time out and perform a + congestion response anyway. Therefore, the problem seems confined to + potential suppression of a congestion response during light + congestion. + + Furthermore, even if loss of all ECN feedback leads to no congestion + response, the worst that could happen would be loss instead of + ECN-signaled congestion on the forward path. Given that compatibility + mode does not affect loss feedback, there would be no risk of + congestion collapse. + +8. Security Considerations + + General ConEx security considerations are covered extensively in the + ConEx abstract mechanism [RFC7713]. This section covers TCP-specific + concerns that may arise from the addition of ConEx to TCP (while not + discussing generally well-known attacks against TCP). It is assumed + that any alteration of ConEx information can be detected by protection + mechanisms in the IP layer; this is therefore discussed + in [RFC7837] rather than here.
Further, [RFC7837] describes how ConEx can be used with + preferential drop to mitigate flooding attacks, in which case + ConEx can even increase security. + + The ConEx modifications to TCP provide no mechanism for a receiver to + force a sender not to use ConEx. A receiver can degrade the accuracy + of ConEx by claiming that it does not support SACK, AccECN, or ECN, + but the sender will never have to turn ConEx off. Nor can the + receiver force the sender to mark ConEx more + conservatively to cover the risk of any inaccuracy. + Instead, it is always the sender's choice either to mark very + conservatively, which ensures that the audit always sees enough + markings not to penalize the flow, or to estimate the needed number of + markings more tightly. This second case can lead to inaccurate + marking and therefore increases the likelihood of loss at an audit + function, which will only harm the receiver itself. + + Assuming the sender is limited in some way by a congestion allowance + or quota, a receiver could spoof more loss or ECN congestion feedback + than it actually experiences, in an attempt to make the sender draw + down its allowance faster than necessary. However, over-declaring + congestion simply makes the sender slow down. If the receiver is + interested in the content, it will not want to harm its own + performance. + + However, if the receiver is solely interested in making the sender + draw down its allowance, the net effect will depend on the sender's + congestion control algorithm, as permanently adding more and more + congestion would cause the sender to reduce its sending rate further + and further. Therefore, a receiver can only maintain a + congestion level that corresponds to a certain sending rate. + With NewReno [RFC6582], whose sending rate scales roughly with + 1/sqrt(p) for a congestion level p, doubling congestion feedback causes the + sender to reduce its sending rate such that it would only consume + sqrt(2) (about 1.4) times more congestion allowance.
However, to improve + scaling, congestion control algorithms are trending towards less + responsive algorithms like Cubic or Compound TCP, and ultimately to + linear algorithms like Data Center TCP (DCTCP) [DCTCP], which aim to + maintain the same congestion level independent of the current sending + rate and always reduce their sending window if the signaled congestion + feedback is higher. In each case, if the receiver doubles congestion + feedback, it causes the sender to consume more allowance by a factor + of 1.2, 1.15, or 1, respectively, where 1 implies that the attack has + become completely ineffective: no further congestion allowance is + consumed, but the flow decreases its sending rate to a minimum instead. + +9. References + +9.1. Normative References + + [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP + Selective Acknowledgment Options", RFC 2018, + DOI 10.17487/RFC2018, October 1996, + . + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + . + + [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition + of Explicit Congestion Notification (ECN) to IP", + RFC 3168, DOI 10.17487/RFC3168, September 2001, + . + + [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion + Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, + . + + [RFC7713] Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx) + Concepts, Abstract Mechanism, and Requirements", RFC 7713, + DOI 10.17487/RFC7713, December 2015, + . + + [RFC7837] Krishnan, S., Kuehlewind, M., Briscoe, B., and C. Ralli, + "IPv6 Destination Option for Congestion Exposure (ConEx)", + RFC 7837, DOI 10.17487/RFC7837, May 2016, + . + +9.2.
Informative References + + [ACCURATE] Briscoe, B., Kuehlewind, M., and R. Scheffenegger, "More + Accurate ECN Feedback in TCP", Work in Progress, + draft-ietf-tcpm-accurate-ecn-00, December 2015. + + [DCTCP] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel, + P., Prabhakar, B., Sengupta, S., and M. Sridharan, "Data + Center TCP (DCTCP)", ACM SIGCOMM Computer Communication + Review, Volume 40, Issue 4, pages 63-74, + DOI 10.1145/1851182.1851192, October 2010. + + [ECNTCP] Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, + "Re-ECN: Adding Accountability for Causing Congestion to + TCP/IP", Work in Progress, + draft-briscoe-conex-re-ecn-tcp-04, July 2014. + + [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm + for TCP", RFC 3522, DOI 10.17487/RFC3522, April 2003, + . + + [RFC3708] Blanton, E. and M. Allman, "Using TCP Duplicate Selective + Acknowledgement (DSACKs) and Stream Control Transmission + Protocol (SCTP) Duplicate Transmission Sequence Numbers + (TSNs) to Detect Spurious Retransmissions", RFC 3708, + DOI 10.17487/RFC3708, February 2004, + . + + [RFC4015] Ludwig, R. and A. Gurtov, "The Eifel Response Algorithm + for TCP", RFC 4015, DOI 10.17487/RFC4015, February 2005, + . + + [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata, + "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting + Spurious Retransmission Timeouts with TCP", RFC 5682, + DOI 10.17487/RFC5682, September 2009, + . + + [RFC6582] Henderson, T., Floyd, S., Gurtov, A., and Y. Nishida, "The + NewReno Modification to TCP's Fast Recovery Algorithm", + RFC 6582, DOI 10.17487/RFC6582, April 2012, + . + + [RFC6789] Briscoe, B., Ed., Woundy, R., Ed., and A. Cooper, Ed., + "Congestion Exposure (ConEx) Concepts and Use Cases", + RFC 6789, DOI 10.17487/RFC6789, December 2012, + . + + [RFC7141] Briscoe, B. and J.
Manner, "Byte and Packet Congestion + Notification", BCP 41, RFC 7141, DOI 10.17487/RFC7141, + February 2014, . + +Acknowledgements + + The authors would like to thank Bob Briscoe, who contributed + the initial ideas [ECNTCP] and valuable feedback. Thanks also + to Jana Iyengar for valuable feedback. + +Authors' Addresses + + Mirja Kuehlewind (editor) + ETH Zurich + Switzerland + + Email: mirja.kuehlewind@tik.ee.ethz.ch + + + Richard Scheffenegger + NetApp, Inc. + Am Euro Platz 2 + Vienna 1120 + Austria + + Email: rs.ietf@gmx.at