diff options
Diffstat (limited to 'doc/rfc/rfc9347.txt')
-rw-r--r-- | doc/rfc/rfc9347.txt | 1614 |
1 files changed, 1614 insertions, 0 deletions
diff --git a/doc/rfc/rfc9347.txt b/doc/rfc/rfc9347.txt new file mode 100644 index 0000000..51e98eb --- /dev/null +++ b/doc/rfc/rfc9347.txt @@ -0,0 +1,1614 @@ + + + + +Internet Engineering Task Force (IETF) C. Hopps +Request for Comments: 9347 LabN Consulting, L.L.C. +Category: Standards Track January 2023 +ISSN: 2070-1721 + + + Aggregation and Fragmentation Mode for Encapsulating Security Payload + (ESP) and Its Use for IP Traffic Flow Security (IP-TFS) + +Abstract + + This document describes a mechanism for aggregation and fragmentation + of IP packets when they are being encapsulated in Encapsulating + Security Payload (ESP). This new payload type can be used for + various purposes, such as decreasing encapsulation overhead for small + IP packets; however, the focus in this document is to enhance IP + Traffic Flow Security (IP-TFS) by adding Traffic Flow Confidentiality + (TFC) to encrypted IP-encapsulated traffic. TFC is provided by + obscuring the size and frequency of IP traffic using a fixed-size, + constant-send-rate IPsec tunnel. The solution allows for congestion + control, as well as nonconstant send-rate usage. + +Status of This Memo + + This is an Internet Standards Track document. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Further information on + Internet Standards is available in Section 2 of RFC 7841. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + https://www.rfc-editor.org/info/rfc9347. + +Copyright Notice + + Copyright (c) 2023 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Revised BSD License text as described in Section 4.e of the + Trust Legal Provisions and are provided without warranty as described + in the Revised BSD License. + +Table of Contents + + 1. Introduction + 1.1. Terminology & Concepts + 2. The AGGFRAG Tunnel + 2.1. Tunnel Content + 2.2. Payload Content + 2.2.1. DataBlocks + 2.2.2. End Padding + 2.2.3. Fragmentation, Sequence Numbers, and All-Pad Payloads + 2.2.4. Empty Payload + 2.2.5. IP Header Value Mapping + 2.2.6. IPv4 Time To Live (TTL), IPv6 Hop Limit, and ICMP + Messages + 2.2.7. Effective MTU of the Tunnel + 2.3. Exclusive SA Use + 2.4. Modes of Operation + 2.4.1. Non-Congestion-Controlled Mode + 2.4.2. Congestion-Controlled Mode + 2.5. Summary of Receiver Processing + 3. Congestion Information + 3.1. ECN Support + 4. Configuration of AGGFRAG Tunnels for IP-TFS + 4.1. Bandwidth + 4.2. Fixed Packet Size + 4.3. Congestion Control + 5. IKEv2 + 5.1. USE_AGGFRAG Notification Message + 6. Packet and Data Formats + 6.1. AGGFRAG_PAYLOAD Payload + 6.1.1. Non-Congestion-Control AGGFRAG_PAYLOAD Payload Format + 6.1.2. Congestion Control AGGFRAG_PAYLOAD Payload Format + 6.1.3. Data Blocks + 6.1.4. IKEv2 USE_AGGFRAG Notification Message + 7. IANA Considerations + 7.1. ESP Next Header Value + 7.2. AGGFRAG_PAYLOAD Sub-Types + 7.3. USE_AGGFRAG Notify Message Status Type + 8. Security Considerations + 9. References + 9.1. Normative References + 9.2. Informative References + Appendix A. Example of an Encapsulated IP Packet Flow + Appendix B. A Send and Loss Event Rate Calculation + Appendix C. Comparisons of IP-TFS + C.1. Comparing Overhead + C.1.1. IP-TFS Overhead + C.1.2. ESP with Padding Overhead + C.2. Overhead Comparison + C.3. Comparing Available Bandwidth + C.3.1. Ethernet + Acknowledgements + Contributors + Author's Address + +1. Introduction + + Traffic analysis [RFC4301] [AppCrypt] is the act of extracting + information about data being sent through a network. While directly + obscuring the data with encryption [RFC4303], the patterns in the + message traffic may expose information due to variations in its shape + and timing [RFC8546] [AppCrypt]. Hiding the size and frequency of + traffic is referred to as Traffic Flow Confidentiality (TFC), per + [RFC4303]. + + [RFC4303] provides for TFC by allowing padding to be added to + encrypted IP packets and allowing for transmission of all-pad packets + (indicated using protocol 59). This method has the major limitation + that it can significantly underutilize the available bandwidth. + + This document defines an aggregation and fragmentation (AGGFRAG) mode + for ESP, as well as ESP's use for IP Traffic Flow Security (IP-TFS). + This solution provides for full TFC without the aforementioned + bandwidth limitation. This is accomplished by using a constant-send- + rate IPsec [RFC4303] tunnel with fixed-size encapsulating packets; + however, these fixed-size packets can contain partial, whole, or + multiple IP packets to maximize the bandwidth of the tunnel. A + nonconstant send rate is allowed, but the confidentiality properties + of its use are outside the scope of this document. + + For a comparison of the overhead of IP-TFS with the TFC solution + prescribed in [RFC4303], see Appendix C. + + Additionally, IP-TFS provides for operating fairly within congested + networks [RFC2914]. This is important for when the IP-TFS user is + not in full control of the domain through which the IP-TFS tunnel + path flows. + + The mechanisms, such as the AGGFRAG mode, defined in this document + are generic with the intent of allowing for non-TFS uses, but such + uses are outside the scope of this document. + +1.1. Terminology & Concepts + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and + "OPTIONAL" in this document are to be interpreted as described in + BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all + capitals, as shown here. + + This document assumes familiarity with IP security concepts, + including TFC, as described in [RFC4301]. + +2. The AGGFRAG Tunnel + + As mentioned in Section 1, the AGGFRAG mode utilizes an IPsec + [RFC4303] tunnel as its transport. For the purpose of IP-TFS, fixed- + size encapsulating packets are sent at a constant rate on the AGGFRAG + tunnel. + + The primary input to the tunnel algorithm is the requested bandwidth + to be used by the tunnel. Two values are then required to provide + for this bandwidth use: the fixed size of the encapsulating packets + and the rate at which to send them. + + The fixed packet size MAY either be specified manually or be + determined through other methods, such as the Packetization Layer MTU + Discovery (PLMTUD) [RFC4821] [RFC8899] or Path MTU Discovery (PMTUD) + [RFC1191] [RFC8201]. PMTUD is known to have issues, so PLMTUD is + considered the more robust option. For PLMTUD, congestion control + payloads can be used as in-band probes (see Section 6.1.2 and + [RFC8899]). + + Given the encapsulating packet size and the requested bandwidth to be + used, the corresponding packet send rate can be calculated. The + packet send rate is the requested bandwidth to be used, which is then + divided by the size of the encapsulating packet. + + The egress (receiving) side of the AGGFRAG tunnel MUST allow for and + expect the ingress (sending) side of the AGGFRAG tunnel to vary the + size and rate of sent encapsulating packets, unless constrained by + other policy. + +2.1. Tunnel Content + + As previously mentioned, one issue with the TFC padding solution in + [RFC4303] is the large amount of wasted bandwidth, as only one IP + packet can be sent per encapsulating packet. In order to maximize + bandwidth, IP-TFS breaks this one-to-one association by introducing + an AGGFRAG mode for ESP. + + The AGGFRAG mode aggregates and fragments the inner IP traffic flow + into encapsulating IPsec tunnel packets. For IP-TFS, the IPsec + encapsulating tunnel packets are a fixed size. Padding is only added + to the tunnel packets if there is no data available to be sent at the + time of tunnel packet transmission or if fragmentation has been + disabled by the receiver. + + This is accomplished using a new Encapsulating Security Payload (ESP) + [RFC4303] Next Header field value AGGFRAG_PAYLOAD (Section 6.1). + + Other non-IP-TFS uses of this AGGFRAG mode have been suggested, such + as increased performance through packet aggregation, as well as + handling MTU issues using fragmentation. These uses are not defined + here but are also not restricted by this document. + +2.2. Payload Content + + The AGGFRAG_PAYLOAD payload content defined in this document consists + of a 4- or 24-octet header, followed by either a partial data block, + a full data block, or multiple partial or full data blocks. The + following diagram illustrates this payload within the ESP packet. + See Section 6.1 for the exact formats of the AGGFRAG_PAYLOAD payload. + + . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + . Outer Encapsulating Header ... . + . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + . ESP Header... . + +---------------------------------------------------------------+ + | [AGGFRAG sub-type/flags] : BlockOffset | + +---------------------------------------------------------------+ + : [Optional Congestion Info] : + +---------------------------------------------------------------+ + | DataBlocks ... ~ + ~ ~ + ~ | + +---------------------------------------------------------------| + . ESP Trailer... . + . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + + Figure 1: Layout of an AGGFRAG Mode IPsec Packet + + The BlockOffset value is either zero or some offset into or past the + end of the DataBlocks data. + + If the BlockOffset value is zero, it means that the DataBlocks data + begins with a new data block. + + Conversely, if the BlockOffset value is non-zero, it points to the + start of the new data block, and the initial DataBlocks data belongs + to the data block that is still being reassembled. + + If the BlockOffset points past the end of the DataBlocks data, then + the next data block occurs in a subsequent encapsulating packet. + + Having the BlockOffset always point at the next available data block + allows for recovering the next inner packet in the presence of outer + encapsulating packet loss. + + An example AGGFRAG mode packet flow can be found in Appendix A. + +2.2.1. DataBlocks + + +---------------------------------------------------------------+ + | Type | rest of IPv4, IPv6, or pad... + +-------- + + Figure 2: Layout of a Data Block + + A data block is defined by a 4-bit type code, followed by the data + block data. The type values have been carefully chosen to coincide + with the IPv4/IPv6 version field values so that no per-data block + type overhead is required to encapsulate an IP packet. Likewise, the + length of the data block is extracted from the encapsulated IPv4's + Total Length or IPv6's Payload Length fields. + +2.2.2. End Padding + + Since a data block's type is identified in its first 4 bits, the only + time padding is required is when there is no data to encapsulate. + For this end padding, a Pad Data Block is used. + +2.2.3. Fragmentation, Sequence Numbers, and All-Pad Payloads + + In order for a receiver to reassemble fragmented inner packets, the + sender MUST send the inner packet fragments back to back in the + logical outer packet stream (i.e., using consecutive ESP sequence + numbers). However, the sender is allowed to insert "all-pad" + payloads (i.e., payloads with a BlockOffset of zero and a single pad + data block ) in between the packets carrying the inner packet + fragment payloads. This interleaving of all-pad payloads allows the + sender to always send a tunnel packet, regardless of the + encapsulation computational requirements. + + When a receiver is reassembling an inner packet, and it receives an + "all-pad" payload, it increments the expected sequence number that + the next inner packet fragment is expected to arrive in. + + Given the above, the receiver will need to handle out-of-order + arrival of outer ESP packets prior to reassembly processing. ESP + already provides for optionally detecting replay attacks. Detecting + replay attacks normally utilizes a window method. A similar + sequence-number-based sliding window can be used to correct + reordering of the outer packet stream. Receiving a larger (newer) + sequence number packet advances the window, and if any older ESP + packets whose sequence numbers the window has passed by are received, + then the packets are dropped. A good choice for the size of this + window depends on the amount of misordering the user is experiencing; + however, a value of 3 has been suggested as a default when no more + informed choice exists. + + As the amount of misordering that may be present is hard to predict, + the window size SHOULD be configurable by the user. Implementations + MAY also dynamically adjust the reordering window based on actual + misordering seen in arriving packets. + + Please note, when IP-TFS sends a continuous stream of packets, there + is no requirement for an explicit lost packet timer; however, using a + lost packet timer is RECOMMENDED. If an implementation does not use + a lost packet timer and only considers an outer packet lost when the + reorder window moves by it, the inner traffic can be delayed by up to + the reorder window size times the per-packet send rate. This delay + could be significant for slower send rates or when larger reorder + window sizes are in use. As the lost packet timer affects the delay + of inner packet delivery, an implementation or user could choose to + set it proportionate to the tunnel rate. + + While ESP guarantees an increasing sequence number with subsequently + sent packets, it does not actually require the sequence numbers to be + generated consecutively (e.g., sending only even-numbered sequence + numbers would be allowed, as long as they are always increasing). + Gaps in the sequence numbers will not work for this document, so the + sequence number stream MUST increase monotonically by 1 for each + subsequent packet. + + When using the AGGFRAG_PAYLOAD in conjunction with replay detection, + the window size for both MAY be reduced to the smaller of the two + window sizes. This is because packets outside of the smaller window + but inside the larger window would still be dropped by the mechanism + with the smaller window size. However, there is also no requirement + to make these values the same. Indeed, in some cases, such as slow + tunnels where a very small or zero reorder window size is + appropriate, the user may still want a large replay detection window + to log replayed packets. Additionally, large replay windows can be + implemented with very little overhead, compared to large reorder + windows. + + Finally, as sequence numbers are reset when switching Security + Associations (SAs) (e.g., when rekeying a Child SA), senders MUST NOT + send initial fragments of an inner packet using one SA and subsequent + fragments in a different SA. + + | A note on BlockOffset values: Senders MUST encode the + | BlockOffset consistently with the immediately preceding non- + | all-pad payload packet. Specifically, if the immediately + | preceding non-all-pad payload packet ended with a Pad Data + | Block, this BlockOffset MUST be zero, as Pad Data Blocks are + | never fragmented. The BlockOffset MUST be consistent with the + | remaining size implied by the length field from the fragmented + | inner packet. + +2.2.3.1. Optional Extra Padding + + When the tunnel bandwidth is not being fully utilized, a sender MAY + pad out the current encapsulating packet in order to deliver an inner + packet unfragmented in the following outer packet. The benefit would + be to avoid inner packet fragmentation in the presence of a bursty + offered load (non-bursty traffic will naturally not fragment). + Senders MAY also choose to allow for a minimum fragment size to be + configured (e.g., as a percentage of the AGGFRAG_PAYLOAD payload + size) to avoid fragmentation at the cost of tunnel bandwidth. The + costs with these methods are complexity and an added delay of inner + traffic. The main advantage to avoiding fragmentation is to minimize + inner packet loss in the presence of outer packet loss. When this is + worthwhile (e.g., how much loss and what type of loss is required, + given different inner traffic shapes and utilization, for this to + make sense) and what values to use for the allowable/added delay may + be worth researching but is outside the scope of this document. + + While use of padding to avoid fragmentation does not impact + interoperability, if padding is used inappropriately, it can reduce + the effective throughput of a tunnel. Senders implementing either of + the above approaches will need to take care to not reduce the + effective capacity, and overall utility, of the tunnel through the + overuse of padding. + +2.2.4. Empty Payload + + To support reporting of congestion control information (described + later) using a non-AGGFRAG_PAYLOAD-enabled SA, it is allowed to send + an AGGFRAG_PAYLOAD payload with no data blocks (i.e., the ESP payload + length is equal to the AGGFRAG_PAYLOAD header length). This special + payload is called an empty payload. + + Currently, this situation is only applicable in use cases without + Internet Key Exchange Protocol Version 2 (IKEv2). + +2.2.5. IP Header Value Mapping + + [RFC4301] provides some direction on when and how to map various + values from an inner IP header to the outer encapsulating header, + namely the Don't Fragment (DF) bit [RFC0791], the Differentiated + Services (DS) field [RFC2474], and the Explicit Congestion + Notification (ECN) field [RFC3168]. Unlike in [RFC4301], the AGGFRAG + mode may, and often will, be encapsulating more than one IP packet + per ESP packet. To deal with this, these mappings are restricted + further. + +2.2.5.1. DF Bit + + The AGGFRAG mode never maps the inner DF bit, as it is unrelated to + the AGGFRAG tunnel functionality; the AGGFRAG mode never needs to IP + fragment the inner packets, and the inner packets will not affect the + fragmentation of the outer encapsulation packets. + +2.2.5.2. ECN Value + + The ECN value need not be mapped, as any congestion related to the + constant-send-rate IP-TFS tunnel is unrelated (by design) to the + inner traffic flow. The sender MAY still set the ECN value of inner + packets based on the normal ECN specification [RFC3168] [RFC4301] + [RFC6040]. + +2.2.5.3. DS Field + + By default, the DS field SHOULD NOT be copied, although a sender MAY + choose to allow for configuration to override this behavior. A + sender SHOULD also allow the DS value to be set by configuration. + +2.2.6. IPv4 Time To Live (TTL), IPv6 Hop Limit, and ICMP Messages + + How to modify the inner packet IPv4 TTL [RFC0791] or IPv6 Hop Limit + [RFC8200] is specified in [RFC4301]. + + [RFC4301] specifies how to apply policy to authenticated and + unauthenticated ICMP error packets (e.g., Destination Unreachable) + arriving at or being forwarded through the endpoint, in particular, + whether to process, ignore, or forward said packets. With the one + exception that this document does not change the handling of these + packets, they should be handled as specified in [RFC4301]. + + The one way in which an AGGFRAG tunnel differs in ICMP error packet + mechanics is with PMTU. When fragmentation is enabled on the AGGFRAG + tunnel, then no ICMP "Too Big" errors need to be generated for + arriving ingress traffic, as the arriving inner packets will be + naturally fragmented by the AGGFRAG encapsulation. + + Otherwise, when fragmentation has been disabled on the AGGFRAG + tunnel, then the treatment of arriving inner traffic exactly maps to + that of a non-AGGFRAG ESP tunnel. Explicitly, IPv4 with DF set and + IPv6 packets that cannot fit in its own outer packet payload will + generate the appropriate ICMP "Too Big" error, as described in + [RFC4301], and IPv4 packets without DF set will be IP fragmented, as + described in [RFC4301]. + + Packets egressing the tunnel continue to be handled as specified in + [RFC4301]. + + All other aspects of PMTU and the handling of ICMP "Too Big" messages + (i.e., with regards to the outer AGGFRAG/ESP tunnel packet size) also + remain unchanged from [RFC4301]. + +2.2.7. Effective MTU of the Tunnel + + Unlike in [RFC4301], there is normally no effective MTU (EMTU) on an + AGGFRAG tunnel, as all IP packet sizes are properly transmitted + without requiring IP fragmentation prior to tunnel ingress. That + said, a sender MAY allow for explicitly configuring an MTU for the + tunnel. + + If fragmentation has been disabled on the AGGFRAG tunnel, then the + tunnel's EMTU and behaviors are the same as normal IPsec tunnels + [RFC4301]. + +2.3. Exclusive SA Use + + This document does not specify mixed use of an AGGFRAG_PAYLOAD- + enabled SA. A sender MUST only send AGGFRAG_PAYLOAD payloads over an + SA configured for AGGFRAG mode. + +2.4. Modes of Operation + + Just as with normal IPsec/ESP SAs, AGGFRAG SAs are unidirectional. + Bidirectional IP-TFS functionality is achieved by setting up 2 + AGGFRAG SAs, one in either direction. + + An AGGFRAG tunnel used for IP-TFS can operate in 2 modes, a non- + congestion-controlled mode and congestion-controlled mode. + +2.4.1. Non-Congestion-Controlled Mode + + In the non-congestion-controlled mode, IP-TFS sends fixed-size + packets over an AGGFRAG tunnel at a constant rate. The packet send + rate is constant and is not automatically adjusted, regardless of any + network congestion (e.g., packet loss). + + For similar reasons as given in [RFC7510], the non-congestion- + controlled mode MUST only be used where the user has full + administrative control over any path the tunnel will take and MUST + NOT be used if this is not the case. This is required so the user + can guarantee the bandwidth and also be sure as to not be negatively + affecting network congestion [RFC2914]. In this case, packet loss + should be reported to the administrator (e.g., via syslog, YANG + notification, SNMP traps, etc.) so that any failures due to a lack of + bandwidth can be corrected. The use of circuit breakers is also + RECOMMENDED (Section 2.4.2.1). + + Users that choose the non-congestion-controlled mode need to + understand that this mode will send packets at a constant rate, + utilizing a constant, fixed bandwidth, and will not adjust based on + congestion. Thus, if they do not guarantee the bandwidth required by + the tunnel, the tunnel's operation, as well as the rest of their + network, may be negatively impacted. + + One expected use case for the non-congestion-controlled mode is to + guarantee the full tunnel bandwidth is available and preferred over + other non-tunnel traffic. In fact, a typical site-to-site use case + might have all of the user traffic utilizing the IP-TFS tunnel. + + The non-congestion-controlled mode is also appropriate if ESP over + TCP is in use [RFC9329]. However, the use of TCP is considered a + fallback-only solution for IPsec; it is highly not preferred. This + is also one of the reasons that TCP was not chosen as the + encapsulation for IP-TFS instead of AGGFRAG. + +2.4.2. Congestion-Controlled Mode + + With the congestion-controlled mode, IP-TFS adapts to network + congestion by lowering the packet send rate to accommodate the + congestion, as well as raising the rate when congestion subsides. + Since overhead is per packet, by allowing for maximal fixed-size + packets and varying the send rate, transport overhead is minimized. + + The output of the congestion control algorithm will adjust the rate + at which the ingress sends packets. While this document does not + require a specific congestion control algorithm, best current + practice RECOMMENDS that the algorithm conform to [RFC5348]. + Congestion control principles are documented in [RFC2914] as well. + There is an example in [RFC4342] of the algorithm in [RFC5348], which + matches the requirements of IP-TFS (i.e., designed for fixed-size + packets and send rate varied based on congestion). + + The required inputs for the TCP-friendly rate control algorithm + described in [RFC5348] are the receiver's loss event rate and the + sender's estimated round-trip time (RTT). These values are provided + by IP-TFS using the congestion information header fields described in + Section 3. In particular, these values are sufficient to implement + the algorithm described in [RFC5348]. + + At a minimum, the congestion information MUST be sent, from the + receiver and from the sender, at least once per RTT. Prior to + establishing an RTT, the information SHOULD be sent constantly from + the sender and the receiver so that an RTT estimate can be + established. Not receiving this information over multiple + consecutive RTT intervals should be considered a congestion event + that causes the sender to adjust its sending rate lower. For + example, this is called the "no feedback timeout" in [RFC4342], and + it is equal to 4 RTT intervals. When a "no feedback timeout" has + occurred, the sending rate is halved, as per [RFC4342]. + + An implementation MAY choose to always include the congestion + information in its AGGFRAG payload header if it is sending it on an + IP-TFS-enabled SA. Since IP-TFS normally will operate with a large + packet size, the congestion information should represent a small + portion of the available tunnel bandwidth. An implementation + choosing to always send the data MAY also choose to only update the + LossEventRate and RTT header field values it sends every RTT through. + + When choosing a congestion control algorithm (or a selection of + algorithms), note that IP-TFS is not providing for reliable delivery + of IP traffic, and so per-packet acknowledgements (ACKs) are not + required and are not provided. + + It is worth noting that the variable send rate of a congestion- + controlled AGGFRAG tunnel is not private; however, this send rate is + being driven by network congestion, and as long as the encapsulated + (inner) traffic flow shape and timing are not directly affecting the + (outer) network congestion, the variations in the tunnel rate will + not weaken the provided inner traffic flow confidentiality. + +2.4.2.1. Circuit Breakers + + In addition to congestion control, implementations that support the + non-congestion-control mode SHOULD implement circuit breakers + [RFC8084] as a recovery method of last resort. When circuit breakers + are enabled, an implementation SHOULD also enable congestion control + reports so that circuit breakers have information to act on. + + The pseudowire congestion considerations [RFC7893] are equally + applicable to the mechanisms defined in this document, notably the + text on inelastic traffic. + + One example of a simple, slow-trip circuit breaker that an + implementation may provide would utilize 2 values: the amount of + persistent loss rate required to trip the circuit breaker and the + required length of time this persistent loss rate must be seen to + trip the circuit breaker. These 2 value are required configurations + from the user. When the circuit breaker is tripped, the tunnel + traffic is disabled and an appropriate log message or other + management type alarm is triggered, indicating operation intervention + is required. + +2.5. Summary of Receiver Processing + + An AGGFRAG-enabled SA receiver has a few tasks to perform. + + The receiver MAY process incoming AGGFRAG_PAYLOAD payloads as soon as + they arrive, as much as it can, i.e., if the incoming AGGFRAG_PAYLOAD + packet contains complete inner packet(s), the receiver should extract + and transmit them immediately. For partial packets, the receiver + needs to keep the partial packets in the memory until they fall out + from the reordering window or until the missing parts of the packets + are received, in which case, it will reassemble and transmit them. + If the AGGFRAG_PAYLOAD payload contains multiple packets, they SHOULD + be sent out in the order they are in the AGGFRAG_PAYLOAD (i.e., keep + the original order they were received on the other end). The cost of + using this method is that an amplification of out-of-order delivery + of inner packets can occur due to inner packet aggregation. + + Instead of the method described in the previous paragraph, the + receiver MAY reorder out-of-order AGGFRAG_PAYLOAD payloads received + into in-sequence-order AGGFRAG_PAYLOAD payloads (Section 2.2.3), and + only after it has an in-order AGGFRAG_PAYLOAD payload stream would + the receiver transmit the inner packets. Using this method will + ensure the inner packets are sent in order. The cost of this method + is that a lost packet will cause a delay of up to the lost packet + timer interval (or the full reorder window if no lost packet timer is + used). Additionally, there can be extra burstiness in the output + stream. This burstiness can happen when a lost packet is dropped + from the reorder window, and the remaining outer packets in the + reorder window are immediately processed and sent out back to back. + + Additionally, if congestion control is enabled, the receiver sends + congestion control data (Section 6.1.2) back to the sender, as + described in Sections 2.4.2 and 3. + + Finally, a note on receiving incorrect BlockOffset values: To account + for misbehaving senders, a receiver SHOULD gracefully handle the case + where the BlockOffset of consecutive packets, and/or the inner packet + they share, do not agree. It MAY drop the inner packet or one or + both of the outer packets. + +3. Congestion Information + + In order to support the congestion-controlled mode, the sender needs + to know the loss event rate and to approximate the RTT [RFC5348]. In + order to obtain these values, the receiver sends congestion control + information on its SA back to the sender. Thus, to support + congestion control, the receiver MUST have a paired SA back to the + sender (this is always the case when the tunnel was created using + IKEv2). If the SA back to the sender is a non-AGGFRAG_PAYLOAD- + enabled SA, then an AGGFRAG_PAYLOAD empty payload (i.e., header only) + is used to convey the information. + + In order to calculate a loss event rate compatible with [RFC5348], + the receiver needs to have an RTT estimate. Thus, the sender + communicates this estimate in the RTT header field. On startup, this + value will be zero, as no RTT estimate is yet known. + + In order for the sender to estimate its RTT value, the sender places + a timestamp value in the TVal header field. On first receipt of this + TVal, the receiver records the new TVal value, along with the time it + arrived locally. Subsequent receipt of the same TVal MUST NOT update + the recorded time. + + When the receiver sends its congestion control header, it places this + latest recorded TVal in the TEcho header field, along with 2 delay + values: Echo Delay and Transmit Delay. The Echo Delay value is the + time delta from the recorded arrival time of TVal and the current + clock in microseconds. The second value, Transmit Delay, is the + receiver's current transmission delay on the tunnel (i.e., the + average time between sending packets on its half of the AGGFRAG + tunnel). + + When the sender receives back its TVal in the TEcho header field, it + calculates 2 RTT estimates. The first is the actual delay found by + subtracting the TEcho value from its current clock and then + subtracting the Echo Delay as well. The second RTT estimate is found + by adding the received Transmit Delay header value to the sender's + own transmission delay (i.e., the average time between sending + packets on its half of the AGGFRAG tunnel). The larger of these 2 + RTT estimates SHOULD be used as the RTT value. + + The two RTT estimates are required to handle different combinations + of faster or slower tunnel packet paths with faster or slower fixed + tunnel rates. Choosing the larger of the two values guarantees that + the RTT is never considered faster than the aggregate transmission + delay based on the IP-TFS send rate (the second estimate), as well as + never being considered faster than the actual RTT along the tunnel + packet path (the first estimate). + + The receiver also calculates, and communicates in the LossEventRate + header field, the loss event rate for use by the sender. This is + slightly different from [RFC4342], which periodically sends all the + loss interval data back to the sender so that it can do the + calculation. See Appendix B for a suggested way to calculate the + loss event rate value. Initially, this value will be zero + (indicating no loss) until enough data has been collected by the + receiver to update it. + +3.1. ECN Support + + In addition to normal packet loss information, the AGGFRAG mode + supports use of the ECN bits in the encapsulating IP header [RFC3168] + for identifying congestion. If ECN use is enabled and a packet + arrives at the egress (receiving) side with the Congestion + Experienced (CE) value set, then the receiver considers that packet + as being dropped, although it does not drop it. The receiver MUST + set the E bit in any AGGFRAG_PAYLOAD payload header containing a + LossEventRate value derived from a CE value being considered. + + In [RFC6040], which updates [RFC3168] and [RFC4301], behaviors for + marking the outer ECN field value based on the ECN field of the inner + packet are defined. As the AGGFRAG mode may have multiple inner + packets present in a single outer packet, and there is no obvious + correct way to map these multiple values to the single outer packet + ECN field value, the tunnel ingress endpoint SHOULD operate in the + "compatibility" mode, rather than the "default" mode from [RFC6040]. + In particular, this means that the ingress (sending) endpoint of the + tunnel always sets the newly constructed outer encapsulating packet + header ECN field to Not-ECT [RFC6040]. + +4. Configuration of AGGFRAG Tunnels for IP-TFS + + IP-TFS is meant to be deployable with a minimal amount of + configuration. All IP-TFS-specific configuration should be specified + at the unidirectional tunnel ingress (sending) side. It is intended + that non-IKEv2 operation is supported, at least, with local static + configuration. + + YANG and MIB documents have been defined for IP-TFS in [RFC9348] and + [RFC9349]. + +4.1. Bandwidth + + Bandwidth is a local configuration option. For the non-congestion- + controlled mode, the bandwidth SHOULD be configured. For the + congestion-controlled mode, the bandwidth can be configured or the + congestion control algorithm discovers and uses the maximum bandwidth + available. No standardized configuration method is required. + +4.2. Fixed Packet Size + + The fixed packet size to be used for the tunnel encapsulation packets + MAY be configured manually or can be automatically determined using + other methods, such as PLMTUD [RFC4821] [RFC8899] or PMTUD [RFC1191] + [RFC8201]. As PMTUD is known to have issues, PLMTUD is considered + the more robust option. No standardized configuration method is + required. + +4.3. Congestion Control + + Congestion control is a local configuration option. No standardized + configuration method is required. + +5. IKEv2 + +5.1. USE_AGGFRAG Notification Message + + As mentioned previously, AGGFRAG tunnels utilize ESP payloads of type + AGGFRAG_PAYLOAD. + + When using IKEv2, a new "USE_AGGFRAG" notification message enables + the AGGFRAG_PAYLOAD payload on a Child SA pair. The method used is + similar to how USE_TRANSPORT_MODE is negotiated, as described in + [RFC7296]. + + To request use of the AGGFRAG_PAYLOAD payload on the Child SA pair, + the initiator includes the USE_AGGFRAG notification in an SA payload + requesting a new Child SA (either during the initial IKE_AUTH or + during CREATE_CHILD_SA exchanges). If the request is accepted, then + the response MUST also include a notification of type USE_AGGFRAG. + If the responder declines the request, the Child SA will be + established without AGGFRAG_PAYLOAD payload use enabled. If this is + unacceptable to the initiator, the initiator MUST delete the Child + SA. + + As the use of the AGGFRAG_PAYLOAD payload is currently only defined + for non-transport-mode tunnels, the USE_AGGFRAG notification MUST NOT + be combined with the USE_TRANSPORT notification. + + The USE_AGGFRAG notification contains a 1-octet payload of flags that + specify requirements from the sender of the notification. If any + requirement flags are not understood or cannot be supported by the + receiver, then the receiver SHOULD NOT enable use of AGGFRAG_PAYLOAD + (either by not responding with the USE_AGGFRAG notification or, in + the case of the initiator, by deleting the Child SA if the now- + established non-AGGFRAG_PAYLOAD using SA is unacceptable). + + The notification type and payload flag values are defined in + Section 6.1.4. + +6. Packet and Data Formats + + The packet and data formats defined below are generic with the intent + of allowing for non-IP-TFS uses, but such uses are outside the scope + of this document. + +6.1. AGGFRAG_PAYLOAD Payload + + ESP Next Header value: 144 + + An AGGFRAG payload is identified by the ESP Next Header value + AGGFRAG_PAYLOAD, which has the value 144, which has been reserved in + the IP protocol numbers space. The first octet of the payload + indicates the format of the remaining payload data. + + 0 1 2 3 4 5 6 7 + +-+-+-+-+-+-+-+-+-+-+- + | Sub-type | ... + +-+-+-+-+-+-+-+-+-+-+- + + Figure 3: AGGFRAG_PAYLOAD Payload Format + + Sub-type: + An 8-bit value indicating the payload format. + + This document defines 2 payload sub-types. These payload formats are + defined in the following sections. + +6.1.1. Non-Congestion-Control AGGFRAG_PAYLOAD Payload Format + + The non-congestion-control AGGFRAG_PAYLOAD payload consists of a + 4-octet header, followed by a variable amount of DataBlocks data, as + shown below. + + 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Sub-Type (0) | Reserved | BlockOffset | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DataBlocks ... + +-+-+-+-+-+-+-+-+-+-+- + + Figure 4: Non-Congestion-Control Payload Format + + Sub-type: + An octet indicating the payload format. For this non-congestion- + control format, the value is 0. + + Reserved: + An octet set to 0 on generation and ignored on receipt. + + BlockOffset: + A 16-bit unsigned integer counting the number of octets of + DataBlocks data before the start of a new data block. If the + start of a new data block occurs in a subsequent payload, the + BlockOffset will point past the end of the DataBlocks data. In + this case, all the DataBlocks data belongs to the current data + block being assembled. When the BlockOffset extends into + subsequent payloads, it continues to only count DataBlocks data + (i.e., it does not count subsequent packets of the non-DataBlocks + data, such as header octets). + + DataBlocks: + Variable number of octets that begins with the start of a data + block or the continuation of a previous data block, followed by + zero or more additional data blocks. + +6.1.2. Congestion Control AGGFRAG_PAYLOAD Payload Format + + The congestion control AGGFRAG_PAYLOAD payload consists of a 24-octet + header, followed by a variable amount of DataBlocks data, as shown + below. + + 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Sub-type (1) | Reserved |P|E| BlockOffset | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | LossEventRate | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | RTT | Echo Delay ... + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + ... Echo Delay | Transmit Delay | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | TVal | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | TEcho | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | DataBlocks ... + +-+-+-+-+-+-+-+-+-+-+- + + Figure 5: Congestion Control Payload Format + + Sub-type: + An octet indicating the payload format. For this congestion + control format, the value is 1. + + Reserved: + A 6-bit field set to 0 on generation and ignored on receipt. + + P: + A 1-bit value that, if set, indicates that PLMTUD probing is in + progress. This information can be used to avoid treating missing + packets as loss events by the congestion control algorithm when + running the PLMTUD probe algorithm. + + E: + A 1-bit value that, if set, indicates that Congestion Experienced + (CE) ECN bits were received and used in deriving the reported + LossEventRate. + + BlockOffset: + The same value as the non-congestion-controlled payload format + value. + + LossEventRate: + A 32-bit value specifying the inverse of the current loss event + rate, as calculated by the receiver. A value of zero indicates no + loss. Otherwise, the loss event rate is 1/LossEventRate. + + RTT: + A 22-bit value specifying the sender's current RTT estimate in + microseconds. The value MAY be zero prior to the sender having + calculated an RTT estimate. The value SHOULD be set to zero on + non-AGGFRAG_PAYLOAD-enabled SAs. If the RTT is equal to or larger + than 0x3FFFFF, the value MUST be set to 0x3FFFFF. + + Echo Delay: + A 21-bit value specifying the delay in microseconds incurred + between the receiver first receiving the TVal value, which it is + sending back in TEcho. If the delay is equal to or larger than + 0x1FFFFF, the value MUST be set to 0x1FFFFF. + + Transmit Delay: + A 21-bit value specifying the transmission delay in microseconds. + This is the fixed (or average) delay on the receiver between it + sending packets on the IP-TFS tunnel. If the delay is equal to or + larger than 0x1FFFFF, the value MUST be set to 0x1FFFFF. + + TVal: + An opaque, 32-bit value that will be echoed back by the receiver + in later packets in the TEcho field, along with an Echo Delay + value of how long that echo took. + + TEcho: + The opaque, 32-bit value from a received packet's TVal field. The + received TVal is placed in TEcho, along with an Echo Delay value + indicating how long it has been since receiving the TVal value. + + DataBlocks: + Variable number of octets that begins with the start of a data + block or the continuation of a previous data block, followed by + zero or more additional data blocks. For the special case of + sending congestion control information on a non-IP-TFS-enabled SA, + this field MUST be empty (i.e., be zero octets long). + +6.1.3. Data Blocks + + 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Type | IPv4, IPv6, or pad... + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+- + + Figure 6: Data Block Format + + Type: + A 4-bit field where 0x0 identifies a Pad Data Block, 0x4 indicates + an IPv4 data block, and 0x6 indicates an IPv6 data block. + +6.1.3.1. IPv4 Data Block + + 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | 0x4 | IHL | TypeOfService | TotalLength | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Rest of the inner packet ... + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+- + + Figure 7: IPv4 Data Block Format + + These values are the actual values within the encapsulated IPv4 + header. In other words, the start of this data block is the start of + the encapsulated IP packet. + + Type: + A 4-bit value of 0x4 indicating IPv4 (i.e., first nibble of the + IPv4 packet). + + TotalLength: + The 16-bit unsigned integer "Total Length" field of the IPv4 inner + packet. + +6.1.3.2. IPv6 Data Block + + 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | 0x6 | TrafficClass | FlowLabel | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | PayloadLength | Rest of the inner packet ... + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+- + + Figure 8: IPv6 Data Block Format + + These values are the actual values within the encapsulated IPv6 + header. In other words, the start of this data block is the start of + the encapsulated IP packet. + + Type: + A 4-bit value of 0x6 indicating IPv6 (i.e., first nibble of the + IPv6 packet). + + PayloadLength: + The 16-bit unsigned integer "Payload Length" field of the inner + IPv6 inner packet. + +6.1.3.3. Pad Data Block + + 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | 0x0 | Padding ... + +-+-+-+-+-+-+-+-+-+-+- + + Figure 9: Pad Data Block Format + + Type: + A 4-bit value of 0x0 indicating a padding data block. + + Padding: + Extends to end of the encapsulating packet. + +6.1.4. IKEv2 USE_AGGFRAG Notification Message + + As discussed in Section 5.1, a notification message USE_AGGFRAG is + used to negotiate use of the ESP AGGFRAG_PAYLOAD Next Header value. + + The USE_AGGFRAG Notification Message State Type is 16442. + + The notification payload contains 1 octet of requirement flags. + There are currently 2 requirement flags defined. This may be revised + by later specifications. + + +-+-+-+-+-+-+-+-+ + |0|0|0|0|0|0|C|D| + +-+-+-+-+-+-+-+-+ + + Figure 10: USE_AGGFRAG Requirement Flags + + 0: + 6 bits - Reserved MUST be zero on send, unless defined by later + specifications. + + C: + Congestion Control bit. If set, then the sender is requiring that + congestion control information MUST be returned to it + periodically, as defined in Section 3. + + D: + Don't Fragment bit. If set, it indicates the sender of the notify + message does not support receiving packet fragments (i.e., inner + packets MUST be sent using a single Data Block). This value only + applies to what the sender is capable of receiving; the sender MAY + still send packet fragments unless similarly restricted by the + receiver in its USE_AGGFRAG notification. + +7. IANA Considerations + +7.1. ESP Next Header Value + + IANA has allocated an IP protocol number from the "Protocol Numbers - + Assigned Internet Protocol Numbers" registry as follows. + + Decimal: 144 + Keyword: AGGFRAG + Protocol: AGGFRAG encapsulation payload for ESP + Reference: RFC 9347 + +7.2. AGGFRAG_PAYLOAD Sub-Types + + IANA has created a registry called "AGGFRAG_PAYLOAD Sub-Types" under + a new category named "ESP AGGFRAG_PAYLOAD". The registration policy + for this registry is "Expert Review" [RFC8126] [RFC7120]. + + Name: AGGFRAG_PAYLOAD Sub-Types + Description: AGGFRAG_PAYLOAD Payload Formats + Reference: RFC 9347 + + This initial content for this registry is as follows: + + +==========+===============================+===========+ + | Sub-Type | Name | Reference | + +==========+===============================+===========+ + | 0 | Non-Congestion-Control Format | RFC 9347 | + +----------+-------------------------------+-----------+ + | 1 | Congestion Control Format | RFC 9347 | + +----------+-------------------------------+-----------+ + | 3-255 | Reserved | | + +----------+-------------------------------+-----------+ + + Table 1: AGGFRAG_PAYLOAD Sub-Types + +7.3. USE_AGGFRAG Notify Message Status Type + + IANA has allocated a status type USE_AGGFRAG from the "IKEv2 Notify + Message Types - Status Types" registry. + + Decimal: 16442 + Name: USE_AGGFRAG + Reference: RFC 9347 + +8. Security Considerations + + This document describes an aggregation and fragmentation mechanism to + efficiently implement TFC for IP traffic. This approach is expected + to reduce the efficacy of traffic analysis on IPsec communication. + Other than the additional security afforded by using this mechanism, + IP-TFS utilizes the security protocols [RFC4303] and [RFC7296], and + so their security considerations apply to IP-TFS as well. + + As noted in Section 3.1, the ECN bits are not protected by IPsec and + thus may constitute a covert channel. For this reason, ECN use + SHOULD NOT be enabled by default. + + As noted previously in Section 2.4.2, for TFC to be maintained, the + encapsulated traffic flow should not be affecting network congestion + in a predictable way, and if it would be, then non-congestion- + controlled mode use should be considered instead. + +9. References + +9.1. Normative References + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + <https://www.rfc-editor.org/info/rfc2119>. + + [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", + RFC 4303, DOI 10.17487/RFC4303, December 2005, + <https://www.rfc-editor.org/info/rfc4303>. + + [RFC7296] Kaufman, C., Hoffman, P., Nir, Y., Eronen, P., and T. + Kivinen, "Internet Key Exchange Protocol Version 2 + (IKEv2)", STD 79, RFC 7296, DOI 10.17487/RFC7296, October + 2014, <https://www.rfc-editor.org/info/rfc7296>. + + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC + 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, + May 2017, <https://www.rfc-editor.org/info/rfc8174>. + +9.2. Informative References + + [AppCrypt] Schneier, B., "Applied Cryptography: Protocols, + Algorithms, and Source Code in C", 1996. + + [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, + DOI 10.17487/RFC0791, September 1981, + <https://www.rfc-editor.org/info/rfc791>. + + [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, + DOI 10.17487/RFC1191, November 1990, + <https://www.rfc-editor.org/info/rfc1191>. + + [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, + "Definition of the Differentiated Services Field (DS + Field) in the IPv4 and IPv6 Headers", RFC 2474, + DOI 10.17487/RFC2474, December 1998, + <https://www.rfc-editor.org/info/rfc2474>. + + [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, + RFC 2914, DOI 10.17487/RFC2914, September 2000, + <https://www.rfc-editor.org/info/rfc2914>. + + [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition + of Explicit Congestion Notification (ECN) to IP", + RFC 3168, DOI 10.17487/RFC3168, September 2001, + <https://www.rfc-editor.org/info/rfc3168>. + + [RFC4301] Kent, S. and K. Seo, "Security Architecture for the + Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, + December 2005, <https://www.rfc-editor.org/info/rfc4301>. + + [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for + Datagram Congestion Control Protocol (DCCP) Congestion + Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, + DOI 10.17487/RFC4342, March 2006, + <https://www.rfc-editor.org/info/rfc4342>. + + [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU + Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, + <https://www.rfc-editor.org/info/rfc4821>. + + [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP + Friendly Rate Control (TFRC): Protocol Specification", + RFC 5348, DOI 10.17487/RFC5348, September 2008, + <https://www.rfc-editor.org/info/rfc5348>. + + [RFC6040] Briscoe, B., "Tunnelling of Explicit Congestion + Notification", RFC 6040, DOI 10.17487/RFC6040, November + 2010, <https://www.rfc-editor.org/info/rfc6040>. + + [RFC7120] Cotton, M., "Early IANA Allocation of Standards Track Code + Points", BCP 100, RFC 7120, DOI 10.17487/RFC7120, January + 2014, <https://www.rfc-editor.org/info/rfc7120>. + + [RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, + "Encapsulating MPLS in UDP", RFC 7510, + DOI 10.17487/RFC7510, April 2015, + <https://www.rfc-editor.org/info/rfc7510>. + + [RFC7893] Stein, Y(J)., Black, D., and B. Briscoe, "Pseudowire + Congestion Considerations", RFC 7893, + DOI 10.17487/RFC7893, June 2016, + <https://www.rfc-editor.org/info/rfc7893>. + + [RFC8084] Fairhurst, G., "Network Transport Circuit Breakers", + BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017, + <https://www.rfc-editor.org/info/rfc8084>. + + [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for + Writing an IANA Considerations Section in RFCs", BCP 26, + RFC 8126, DOI 10.17487/RFC8126, June 2017, + <https://www.rfc-editor.org/info/rfc8126>. + + [RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6 + (IPv6) Specification", STD 86, RFC 8200, + DOI 10.17487/RFC8200, July 2017, + <https://www.rfc-editor.org/info/rfc8200>. + + [RFC8201] McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., + "Path MTU Discovery for IP version 6", STD 87, RFC 8201, + DOI 10.17487/RFC8201, July 2017, + <https://www.rfc-editor.org/info/rfc8201>. + + [RFC8546] Trammell, B. and M. Kuehlewind, "The Wire Image of a + Network Protocol", RFC 8546, DOI 10.17487/RFC8546, April + 2019, <https://www.rfc-editor.org/info/rfc8546>. + + [RFC8899] Fairhurst, G., Jones, T., Tüxen, M., Rüngeler, I., and T. + Völker, "Packetization Layer Path MTU Discovery for + Datagram Transports", RFC 8899, DOI 10.17487/RFC8899, + September 2020, <https://www.rfc-editor.org/info/rfc8899>. + + [RFC9329] Pauly, T. and V. Smyslov, "TCP Encapsulation of Internet + Key Exchange Protocol (IKE) and IPsec Packets", RFC 9329, + DOI 10.17487/RFC9329, November 2022, + <https://www.rfc-editor.org/info/rfc9329>. + + [RFC9348] Fedyk, D. and C. Hopps, "A YANG Data Model for IP Traffic + Flow Security", RFC 9348, DOI 10.17487/RFC9348, January + 2023, <https://www.rfc-editor.org/info/rfc9348>. + + [RFC9349] Fedyk, D. and E. Kinzie, "Definitions of Managed Objects + for IP Traffic Flow Security", RFC 9349, + DOI 10.17487/RFC9349, January 2023, + <https://www.rfc-editor.org/info/rfc9349>. + +Appendix A. Example of an Encapsulated IP Packet Flow + + Below, an example inner IP packet flow within the encapsulating + tunnel packet stream is shown. Notice how encapsulated IP packets + can start and end anywhere, and more than one or less than one may + occur in a single encapsulating packet. + + Offset: 0 Offset: 100 Offset: 2000 Offset: 600 + [ ESP1 (1404) ][ ESP2 (1404) ][ ESP3 (1404) ][ ESP4 (1404) ] + [--750--][--750--][60][-240-][--3000----------------------][pad] + + Figure 11: Inner and Outer Packet Flow + + Each outer encapsulating ESP space is a fixed size of 1404 octets, + the first 4 octets of which contain the AGGFRAG header. The + encapsulated IP packet flow (lengths include the IP header and + payload) is as follows: a 750-octet packet, a 750-octet packet, a + 60-octet packet, a 240-octet packet, and a 3000-octet packet. + + The BlockOffset values in the 4 AGGFRAG payload headers for this + packet flow would thus be: 0, 100, 2000, and 600, respectively. The + first encapsulating packet (ESP1) has a zero BlockOffset, which + points at the IP data block immediately following the AGGFRAG header. + The following packet's (ESP2) BlockOffset points inward 100 octets to + the start of the 60-octet data block. The third encapsulating packet + (ESP3) contains the middle portion of the 3000-octet data block, so + the offset points past its end and into the fourth encapsulating + packet. The fourth packet's (ESP4) offset is 600, pointing at the + padding that follows the completion of the continued 3000-octet + packet. + +Appendix B. A Send and Loss Event Rate Calculation + + The current best practice indicates that congestion control SHOULD be + done in a TCP-friendly way. A TCP-friendly congestion control + algorithm is described in [RFC5348]. For this IP-TFS use case (as + with [RFC4342]), the (fixed) packet size is used as the segment size + for the algorithm. The main formula in the algorithm for the send + rate is then as follows: + + 1 + X = ----------------------------------------------- + R * (sqrt(2*p/3) + 12*sqrt(3*p/8)*p*(1+32*p^2)) + + X is the send rate in packets per second, R is the RTT estimate, and + p is the loss event rate (the inverse of which is provided by the + receiver). + + In addition, the algorithm in [RFC5348] also uses an X_recv value + (the receiver's receive rate). For IP-TFS, one MAY set this value + according to the sender's current tunnel send rate (X). + + The IP-TFS receiver, having the RTT estimate from the sender, can use + the same method as described in [RFC5348] and [RFC4342] to collect + the loss intervals and calculate the loss event rate value using the + weighted average as indicated. The receiver communicates the inverse + of this value back to the sender in the AGGFRAG_PAYLOAD payload + header field LossEventRate. + + The IP-TFS sender now has both the R and p values and can calculate + the correct sending rate. If following [RFC5348], the sender should + also use the slow start mechanism described therein when the IP-TFS + SA is first established. + +Appendix C. Comparisons of IP-TFS + +C.1. Comparing Overhead + + For comparing overhead, the overhead of ESP for both normal and + AGGFRAG tunnel packets must be calculated, and so an algorithm for + encryption and authentication must be chosen. For the data below, + AES-GCM-256 was selected. This leads to an IP+ESP overhead of 54. + + 54 = 20 (IP) + 8 (ESPH) + 2 (ESPF) + 8 (IV) + 16 (ICV) + + Additionally, for IP-TFS, non-congestion-control AGGFRAG_PAYLOAD + headers were chosen, which adds 4 octets, for a total overhead of 58. + +C.1.1. IP-TFS Overhead + + For comparison, the overhead of an AGGFRAG payload is 58 octets per + outer packet. Therefore, the octet overhead per inner packet is 58 + divided by the number of outer packets required (fractions allowed). + The overhead as a percentage of inner packet size is a constant based + on the Outer MTU size. + + OH = 58 / Outer Payload Size / Inner Packet Size + OH % of Inner Packet Size = 100 * OH / Inner Packet Size + OH % of Inner Packet Size = 5800 / Outer Payload Size + + +=======+========+========+========+ + | Type | IP-TFS | IP-TFS | IP-TFS | + +=======+========+========+========+ + | MTU | 576 | 1500 | 9000 | + +=======+========+========+========+ + | PSize | 518 | 1442 | 8942 | + +=======+========+========+========+ + | 40 | 11.20% | 4.02% | 0.65% | + +-------+--------+--------+--------+ + | 576 | 11.20% | 4.02% | 0.65% | + +-------+--------+--------+--------+ + | 1500 | 11.20% | 4.02% | 0.65% | + +-------+--------+--------+--------+ + | 9000 | 11.20% | 4.02% | 0.65% | + +-------+--------+--------+--------+ + + Table 2: IP-TFS Overhead as + Percentage of Inner Packet Size + +C.1.2. ESP with Padding Overhead + + The overhead per inner packet for constant-send-rate-padded ESP + (i.e., original IPsec TFC) is 36 octets plus any padding, unless + fragmentation is required. + + When fragmentation of the inner packet is required to fit in the + outer IPsec packet, overhead is the number of outer packets required + to carry the fragmented inner packet times both the inner IP Overhead + (20) and the outer packet overhead (54) minus the initial inner IP + Overhead plus any required tail padding in the last encapsulation + packet. The required tail padding is the number of required packets + times the difference of the Outer Payload Size and the IP Overhead + minus the Inner Payload Size. So: + + Inner Payload Size = IP Packet Size - IP Overhead + Outer Payload Size = MTU - IPsec Overhead + + Inner Payload Size + NF0 = ---------------------------------- + Outer Payload Size - IP Overhead + + NF = CEILING(NF0) + + OH = NF * (IP Overhead + IPsec Overhead) + - IP Overhead + + NF * (Outer Payload Size - IP Overhead) + - Inner Payload Size + + OH = NF * (IPsec Overhead + Outer Payload Size) + - (IP Overhead + Inner Payload Size) + + OH = NF * (IPsec Overhead + Outer Payload Size) + - Inner Packet Size + +C.2. Overhead Comparison + + The following tables collect the overhead values for some common L3 + MTU sizes in order to compare them. The first table is the number of + octets of overhead for a given L3 MTU-sized packet. The second table + is the percentage of overhead in the same MTU-sized packet. + + +========+=========+=========+=========+========+========+========+ + | Type | ESP+Pad | ESP+Pad | ESP+Pad | IP-TFS | IP-TFS | IP-TFS | + +========+=========+=========+=========+========+========+========+ + | L3 MTU | 576 | 1500 | 9000 | 576 | 1500 | 9000 | + +========+=========+=========+=========+========+========+========+ + | PSize | 522 | 1446 | 8946 | 518 | 1442 | 8942 | + +========+=========+=========+=========+========+========+========+ + | 40 | 482 | 1406 | 8906 | 4.5 | 1.6 | 0.3 | + +--------+---------+---------+---------+--------+--------+--------+ + | 128 | 394 | 1318 | 8818 | 14.3 | 5.1 | 0.8 | + +--------+---------+---------+---------+--------+--------+--------+ + | 256 | 266 | 1190 | 8690 | 28.7 | 10.3 | 1.7 | + +--------+---------+---------+---------+--------+--------+--------+ + | 518 | 4 | 928 | 8428 | 58.0 | 20.8 | 3.4 | + +--------+---------+---------+---------+--------+--------+--------+ + | 576 | 576 | 870 | 8370 | 64.5 | 23.2 | 3.7 | + +--------+---------+---------+---------+--------+--------+--------+ + | 1442 | 286 | 4 | 7504 | 161.5 | 58.0 | 9.4 | + +--------+---------+---------+---------+--------+--------+--------+ + | 1500 | 228 | 1500 | 7446 | 168.0 | 60.3 | 9.7 | + +--------+---------+---------+---------+--------+--------+--------+ + | 8942 | 1426 | 1558 | 4 | 1001.2 | 359.7 | 58.0 | + +--------+---------+---------+---------+--------+--------+--------+ + | 9000 | 1368 | 1500 | 9000 | 1007.7 | 362.0 | 58.4 | + +--------+---------+---------+---------+--------+--------+--------+ + + Table 3: Overhead Comparison in Octets + + +=======+=========+=========+==========+========+========+========+ + | Type | ESP+Pad | ESP+Pad | ESP+Pad | IP-TFS | IP-TFS | IP-TFS | + +=======+=========+=========+==========+========+========+========+ + | MTU | 576 | 1500 | 9000 | 576 | 1500 | 9000 | + +=======+=========+=========+==========+========+========+========+ + | PSize | 522 | 1446 | 8946 | 518 | 1442 | 8942 | + +=======+=========+=========+==========+========+========+========+ + | 40 | 1205.0% | 3515.0% | 22265.0% | 11.20% | 4.02% | 0.65% | + +-------+---------+---------+----------+--------+--------+--------+ + | 128 | 307.8% | 1029.7% | 6889.1% | 11.20% | 4.02% | 0.65% | + +-------+---------+---------+----------+--------+--------+--------+ + | 256 | 103.9% | 464.8% | 3394.5% | 11.20% | 4.02% | 0.65% | + +-------+---------+---------+----------+--------+--------+--------+ + | 518 | 0.8% | 179.2% | 1627.0% | 11.20% | 4.02% | 0.65% | + +-------+---------+---------+----------+--------+--------+--------+ + | 576 | 100.0% | 151.0% | 1453.1% | 11.20% | 4.02% | 0.65% | + +-------+---------+---------+----------+--------+--------+--------+ + | 1442 | 19.8% | 0.3% | 520.4% | 11.20% | 4.02% | 0.65% | + +-------+---------+---------+----------+--------+--------+--------+ + | 1500 | 15.2% | 100.0% | 496.4% | 11.20% | 4.02% | 0.65% | + +-------+---------+---------+----------+--------+--------+--------+ + | 8942 | 15.9% | 17.4% | 0.0% | 11.20% | 4.02% | 0.65% | + +-------+---------+---------+----------+--------+--------+--------+ + | 9000 | 15.2% | 16.7% | 100.0% | 11.20% | 4.02% | 0.65% | + +-------+---------+---------+----------+--------+--------+--------+ + + Table 4: Overhead as Percentage of Inner Packet Size + +C.3. Comparing Available Bandwidth + + Another way to compare the two solutions is to look at the amount of + available bandwidth each solution provides. The following sections + consider and compare the percentage of available bandwidth. For the + sake of providing a well-understood baseline, normal (unencrypted) + Ethernet and normal ESP values are included. + +C.3.1. Ethernet + + In order to calculate the available bandwidth, the per-packet + overhead is calculated first. The total overhead of Ethernet is 14+4 + octets of header and Cyclic Redundancy Check (CRC) plus an additional + 20 octets of framing (preamble, start, and inter-packet gap), for a + total of 38 octets. Additionally, the minimum payload is 46 octets. + + +====+=======+=======+=======+=======+=======+=======+======+======+ + |Size| E + P | E + P | E + P | IPTFS | IPTFS | IPTFS | Enet | ESP | + +====+=======+=======+=======+=======+=======+=======+======+======+ + |MTU | 590 | 1514 | 9014 | 590 | 1514 | 9014 | any | any | + +====+=======+=======+=======+=======+=======+=======+======+======+ + |OH | 92 | 92 | 92 | 96 | 96 | 96 | 38 | 74 | + +====+=======+=======+=======+=======+=======+=======+======+======+ + |40 | 614 | 1538 | 9038 | 47 | 42 | 40 | 84 | 114 | + +----+-------+-------+-------+-------+-------+-------+------+------+ + |128 | 614 | 1538 | 9038 | 151 | 136 | 129 | 166 | 202 | + +----+-------+-------+-------+-------+-------+-------+------+------+ + |256 | 614 | 1538 | 9038 | 303 | 273 | 258 | 294 | 330 | + +----+-------+-------+-------+-------+-------+-------+------+------+ + |518 | 614 | 1538 | 9038 | 614 | 552 | 523 | 574 | 610 | + +----+-------+-------+-------+-------+-------+-------+------+------+ + |576 | 1228 | 1538 | 9038 | 682 | 614 | 582 | 614 | 650 | + +----+-------+-------+-------+-------+-------+-------+------+------+ + |1442| 1842 | 1538 | 9038 | 1709 | 1538 | 1457 | 1498 | 1534 | + +----+-------+-------+-------+-------+-------+-------+------+------+ + |1500| 1842 | 3076 | 9038 | 1777 | 1599 | 1516 | 1538 | 1574 | + +----+-------+-------+-------+-------+-------+-------+------+------+ + |8942| 11052 | 10766 | 9038 | 10599 | 9537 | 9038 | 8998 | 9034 | + +----+-------+-------+-------+-------+-------+-------+------+------+ + |9000| 11052 | 10766 | 18076 | 10667 | 9599 | 9096 | 9038 | 9074 | + +----+-------+-------+-------+-------+-------+-------+------+------+ + + Table 5: L2 Octets Per Packet + + +====+=======+=======+======+=======+=======+=======+=======+=======+ + |Size| E + P | E + | E + | IPTFS | IPTFS | IPTFS | Enet | ESP | + | | | P | P | | | | | | + +====+=======+=======+======+=======+=======+=======+=======+=======+ + |MTU | 590 | 1514 | 9014 | 590 | 1514 | 9014 | any | any | + +====+=======+=======+======+=======+=======+=======+=======+=======+ + |OH | 92 | 92 | 92 | 96 | 96 | 96 | 38 | 74 | + +====+=======+=======+======+=======+=======+=======+=======+=======+ + |40 | 2.0M | 0.8M | 0.1M | 26.4M | 29.3M | 30.9M | 14.9M | 11.0M | + +----+-------+-------+------+-------+-------+-------+-------+-------+ + |128 | 2.0M | 0.8M | 0.1M | 8.2M | 9.2M | 9.7M | 7.5M | 6.2M | + +----+-------+-------+------+-------+-------+-------+-------+-------+ + |256 | 2.0M | 0.8M | 0.1M | 4.1M | 4.6M | 4.8M | 4.3M | 3.8M | + +----+-------+-------+------+-------+-------+-------+-------+-------+ + |518 | 2.0M | 0.8M | 0.1M | 2.0M | 2.3M | 2.4M | 2.2M | 2.1M | + +----+-------+-------+------+-------+-------+-------+-------+-------+ + |576 | 1.0M | 0.8M | 0.1M | 1.8M | 2.0M | 2.1M | 2.0M | 1.9M | + +----+-------+-------+------+-------+-------+-------+-------+-------+ + |1442| 678K | 812K | 138K | 731K | 812K | 857K | 844K | 824K | + +----+-------+-------+------+-------+-------+-------+-------+-------+ + |1500| 678K | 406K | 138K | 703K | 781K | 824K | 812K | 794K | + +----+-------+-------+------+-------+-------+-------+-------+-------+ + |8942| 113K | 116K | 138K | 117K | 131K | 138K | 139K | 138K | + +----+-------+-------+------+-------+-------+-------+-------+-------+ + |9000| 113K | 116K | 69K | 117K | 130K | 137K | 138K | 137K | + +----+-------+-------+------+-------+-------+-------+-------+-------+ + + Table 6: Packets Per Second on 10G Ethernet + + +====+======+======+======+======+======+========+========+========+ + |Size|E + P |E + P |E + P |IP-TFS|IP-TFS| IP-TFS | Enet | ESP | + +====+======+======+======+======+======+========+========+========+ + |MTU |590 |1514 |9014 |590 |1514 | 9014 | any | any | + +====+======+======+======+======+======+========+========+========+ + |OH |92 |92 |92 |96 |96 | 96 | 38 | 74 | + +====+======+======+======+======+======+========+========+========+ + |40 |6.51% |2.60% |0.44% |84.36%|93.76%| 98.94% | 47.62% | 35.09% | + +----+------+------+------+------+------+--------+--------+--------+ + |128 |20.85%|8.32% |1.42% |84.36%|93.76%| 98.94% | 77.11% | 63.37% | + +----+------+------+------+------+------+--------+--------+--------+ + |256 |41.69%|16.64%|2.83% |84.36%|93.76%| 98.94% | 87.07% | 77.58% | + +----+------+------+------+------+------+--------+--------+--------+ + |518 |84.36%|33.68%|5.73% |84.36%|93.76%| 98.94% | 93.17% | 87.50% | + +----+------+------+------+------+------+--------+--------+--------+ + |576 |46.91%|37.45%|6.37% |84.36%|93.76%| 98.94% | 93.81% | 88.62% | + +----+------+------+------+------+------+--------+--------+--------+ + |1442|78.28%|93.76%|15.95%|84.36%|93.76%| 98.94% | 97.43% | 95.12% | + +----+------+------+------+------+------+--------+--------+--------+ + |1500|81.43%|48.76%|16.60%|84.36%|93.76%| 98.94% | 97.53% | 95.30% | + +----+------+------+------+------+------+--------+--------+--------+ + |8942|80.91%|83.06%|98.94%|84.36%|93.76%| 98.94% | 99.58% | 99.18% | + +----+------+------+------+------+------+--------+--------+--------+ + |9000|81.43%|83.60%|49.79%|84.36%|93.76%| 98.94% | 99.58% | 99.18% | + +----+------+------+------+------+------+--------+--------+--------+ + + Table 7: Percentage of Bandwidth on 10G Ethernet + + A sometimes unexpected result of using an AGGFRAG tunnel (or any + packet aggregating tunnel) is that, for small- to medium-sized + packets, the available bandwidth is actually greater than plain + Ethernet. This is due to the reduction in Ethernet framing overhead. + This increased bandwidth is paid for with an increase in latency. + This latency is the time to send the unrelated octets in the outer + tunnel frame. The following table illustrates the latency for some + common values on a 10G Ethernet link. The table also includes + latency introduced by padding if using ESP with padding. + + +======+=========+=========+=========+=========+ + | Size | ESP+Pad | ESP+Pad | IP-TFS | IP-TFS | + +======+=========+=========+=========+=========+ + | MTU | 1500 | 9000 | 1500 | 9000 | + +======+=========+=========+=========+=========+ + | 40 | 1.12 us | 7.12 us | 1.17 us | 7.17 us | + +------+---------+---------+---------+---------+ + | 128 | 1.05 us | 7.05 us | 1.10 us | 7.10 us | + +------+---------+---------+---------+---------+ + | 256 | 0.95 us | 6.95 us | 1.00 us | 7.00 us | + +------+---------+---------+---------+---------+ + | 518 | 0.74 us | 6.74 us | 0.79 us | 6.79 us | + +------+---------+---------+---------+---------+ + | 576 | 0.70 us | 6.70 us | 0.74 us | 6.74 us | + +------+---------+---------+---------+---------+ + | 1442 | 0.00 us | 6.00 us | 0.05 us | 6.05 us | + +------+---------+---------+---------+---------+ + | 1500 | 1.20 us | 5.96 us | 0.00 us | 6.00 us | + +------+---------+---------+---------+---------+ + + Table 8: Added Latency + + Notice that the latency values are very similar between the two + solutions; however, whereas IP-TFS provides for constant high + bandwidth, in some cases even exceeding plain Ethernet, ESP with + padding often greatly reduces available bandwidth. + +Acknowledgements + + We would like to thank Don Fedyk for help in reviewing and editing + this work. We would also like to thank Michael Richardson, Sean + Turner, Valery Smyslov, and Tero Kivinen for reviews and many + suggestions for improvements, as well as Joseph Touch for the + transport area review and suggested improvements. + +Contributors + + The following person made significant contributions to this document. + + Lou Berger + LabN Consulting, L.L.C. + Email: lberger@labn.net + + +Author's Address + + Christian Hopps + LabN Consulting, L.L.C. + Email: chopps@chopps.org |