Internet Engineering Task Force (IETF)                   P. Thubert, Ed.
Request for Comments: 8931                                 Cisco Systems
Updates: 4944                                              November 2020
Category: Standards Track
ISSN: 2070-1721


       IPv6 over Low-Power Wireless Personal Area Network (6LoWPAN) Selective
                           Fragment Recovery

Abstract

   This document updates RFC 4944 with a protocol that forwards
   individual fragments across a route-over mesh and recovers them end
   to end, with congestion control capabilities to protect the network.

Status of This Memo

   This is an Internet Standards Track document.

   This document is a product of the Internet Engineering Task Force
   (IETF). It represents the consensus of the IETF community. It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG). Further information on
   Internet Standards is available in Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc8931.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
     2.1.  Requirements Language
     2.2.  Background
     2.3.  Other Terms
   3.  Updating RFC 4944
   4.  Extending RFC 8930
     4.1.  Slack in the First Fragment
     4.2.  Gap between Frames
     4.3.  Congestion Control
     4.4.  Modifying the First Fragment
   5.  New Dispatch Types and Headers
     5.1.  Recoverable Fragment Dispatch Type and Header
     5.2.  RFRAG Acknowledgment Dispatch Type and Header
   6.  Fragment Recovery
     6.1.  Forwarding Fragments
       6.1.1.  Receiving the First Fragment
       6.1.2.  Receiving the Next Fragments
     6.2.  Receiving RFRAG Acknowledgments
     6.3.  Aborting the Transmission of a Fragmented Packet
     6.4.  Applying Recoverable Fragmentation along a Diverse Path
   7.  Management Considerations
     7.1.  Protocol Parameters
     7.2.  Observing the Network
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Rationale
   Appendix B.  Requirements
   Appendix C.  Considerations on Congestion Control
   Acknowledgments
   Author's Address

1.  Introduction

   In most Low-Power and Lossy Network (LLN) applications, the bulk of
   the traffic consists of small chunks of data (on the order of a few
   bytes to a few tens of bytes) at a time. Given that an IEEE Std
   802.15.4 [IEEE.802.15.4] frame can carry a payload of 74 bytes or
   more, fragmentation is usually not required. However, although this
   happens only occasionally, a number of mission-critical applications
   do require the capability to transfer larger chunks of data, for
   instance, to support the firmware upgrade of the LLN nodes or the
   extraction of logs from LLN nodes.

   In the former case, the large chunk of data is transferred to the LLN
   node, whereas in the latter case, the large chunk flows away from the
   LLN node. In both cases, the size can be on the order of 10 KB or
   more, and an end-to-end reliable transport is required.

   "Transmission of IPv6 Packets over IEEE 802.15.4 Networks" [RFC4944]
   defines the original IPv6 over Low-Power Wireless Personal Area
   Network (6LoWPAN) datagram fragmentation mechanism for LLNs. One
   critical issue with this original design is that routing an IPv6
   [RFC8200] packet across a route-over mesh requires the reassembly of
   the packet at each hop. "An Architecture for IPv6 over the TSCH mode
   of IEEE 802.15.4" [6TiSCH] indicates that this may cause latency
   along a path and impact critical resources such as memory and
   battery; to alleviate those undesirable effects, it recommends using
   a 6LoWPAN Fragment Forwarding (6LFF) technique.

   "On Forwarding 6LoWPAN Fragments over a Multihop IPv6 Network"
   [RFC8930] specifies the generic behavior that all 6LFF techniques,
   including this specification, follow, and it presents the associated
   caveats. In particular, the routing information is fully indicated
   in the first fragment, which is always forwarded first. With this
   specification, the first fragment is identified by a Sequence of 0 as
   opposed to a dispatch type in [RFC4944]. A state is formed and used
   to forward all the next fragments along the same path. The
   Datagram_Tag is locally significant to the Layer 2 source of the
   packet and is swapped at each hop; see Section 6. This specification
   encodes the Datagram_Tag in 1 byte, which will saturate if more than
   256 datagrams transit in fragmented form over a single hop at the
   same time. This is not realistic at the time of this writing.
   Should this happen in a new 6LoWPAN technology, a node will need to
   use several link-layer addresses to increase its indexing capacity.

   "Virtual reassembly buffers in 6LoWPAN" [LWIG-FRAG] proposes a 6LFF
   technique that is compatible with [RFC4944] without the need to
   define a new protocol.
   However, adding that capability alone to the local implementation of
   the original 6LoWPAN fragmentation would not address the inherent
   fragility of fragmentation (see [RFC8900]), in particular, the issues
   of resources locked on the reassembling endpoint and the wasted
   transmissions due to the loss of a single fragment in a whole
   datagram. [Kent] compares the unreliable delivery of fragments with
   a mechanism it calls "selective acknowledgments" that recovers the
   loss of a fragment individually. The paper illustrates the benefits
   that can be derived from such a method; see Figures 1, 2, and 3 in
   Section 2.3 of [Kent]. [RFC4944] has no selective recovery, and the
   whole datagram fails when one fragment is not delivered to the
   reassembling endpoint. Constrained memory resources are blocked on
   the reassembling endpoint until it times out, possibly causing the
   loss of subsequent packets that cannot be received for lack of
   buffers.

   That problem is exacerbated when forwarding fragments over multiple
   hops since a loss at an intermediate hop will not be discovered by
   either the fragmenting or the reassembling endpoint. Should this
   happen, the source will keep on sending fragments, wasting even more
   resources in the network since the datagram cannot arrive in its
   entirety, which possibly contributes to the condition that caused the
   loss. [RFC4944] lacks a congestion control mechanism to avoid
   participating in a saturation that may have caused the loss of the
   fragment. It also has no signaling to abort a multi-fragment
   transmission at any time and from either end and, if the capability
   to forward fragments is implemented, to clean up the related state
   in the network.

   This specification provides a method to forward fragments over,
   typically, a few hops in a route-over 6LoWPAN mesh and a selective
   acknowledgment to recover individual fragments between 6LoWPAN
   endpoints.
   The method can help limit congestion loss in the network, and it
   addresses the requirements in Appendix B. Flow control is out of
   scope since the endpoints are expected to be able to store the full
   datagram. Deployments are expected to be managed and homogeneous,
   and an incremental transition requires a flag day.

2.  Terminology

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.2.  Background

   This document uses 6LoWPAN terms and concepts that are presented in
   "IPv6 over Low-Power Wireless Personal Area Networks (6LoWPANs):
   Overview, Assumptions, Problem Statement, and Goals" [RFC4919];
   "Transmission of IPv6 Packets over IEEE 802.15.4 Networks" [RFC4944];
   and "Problem Statement and Requirements for IPv6 over Low-Power
   Wireless Personal Area Network (6LoWPAN) Routing" [RFC6606].

   [RFC8930] discusses the generic concept of a Virtual Reassembly
   Buffer (VRB) and specifies behaviors and caveats that are common to a
   large family of 6LFF techniques, including the mechanism specified by
   this document, which inherits them in full. It also defines terms
   used in this document: Compressed Form, Datagram_Tag, Datagram_Size,
   Fragment_Offset, and 6LoWPAN Fragment Forwarding endpoint (commonly
   abbreviated as only "endpoint").

   Past experience with fragmentation has shown that misassociated or
   lost fragments can lead to poor network behavior and, occasionally,
   trouble at the application layer. The reader is encouraged to read
   "IPv4 Reassembly Errors at High Data Rates" [RFC4963] and follow the
   references for more information.
   That experience led to the definition of the "Path MTU Discovery for
   IP version 6" [RFC8201] protocol that limits fragmentation over the
   Internet. Specifically, in the case of UDP, valuable additional
   information can be found in "UDP Usage Guidelines" [RFC8085].

   "The Benefits of Using Explicit Congestion Notification (ECN)"
   [RFC8087] provides useful information on the potential benefits and
   pitfalls of using ECN.

   Quoting "Multiprotocol Label Switching Architecture" [RFC3031]:

   |  With MPLS, "packets are "labeled" before they are forwarded [along
   |  a Label Switched Path (LSP)]. At subsequent hops, there is no
   |  further analysis of the packet's network layer header. Rather,
   |  the label is used as an index into a table which specifies the
   |  next hop, and a new label".

   [RFC8930] leverages MPLS to forward fragments that actually do not
   have a network-layer header, since the fragmentation occurs below IP,
   and this specification makes it reversible so the reverse path can be
   followed as well.

2.3.  Other Terms

   This specification uses the following terms:

   RFRAG:  Recoverable Fragment

   RFRAG-ACK:  Recoverable Fragment Acknowledgment

   RFRAG Acknowledgment Request:  An RFRAG with the Acknowledgment
      Request flag ("X" flag) set.

   NULL bitmap:  Refers to a bitmap with all bits set to zero.

   FULL bitmap:  Refers to a bitmap with all bits set to one.

   Reassembling endpoint:  The receiving endpoint.

   Fragmenting endpoint:  The sending endpoint.

   Forward direction:  The direction of a path that is followed by the
      RFRAG.

   Reverse direction:  The opposite direction of a path, which is taken
      by the RFRAG-ACK.

3.  Updating RFC 4944

   This specification updates the fragmentation mechanism that is
   specified in [RFC4944] for use in route-over LLNs by providing a
   model where fragments can be forwarded end to end across a 6LoWPAN
   LLN and where fragments that are lost on the way can be recovered
   individually. A new format for fragments is introduced, and new
   dispatch types are defined in Section 5.

   [RFC8138] allows modifying the size of a packet en route by removing
   the consumed hops in a compressed Routing Header. This requires that
   Fragment_Offset and Datagram_Size (defined in Section 5.1) also be
   modified en route, which is difficult to do in the uncompressed form.
   This specification expresses those fields in the compressed form and
   allows modifying them en route easily (see Section 4.4).

   To be consistent with Section 2 of [RFC6282], for the fragmentation
   mechanism described in Section 5.3 of [RFC4944], any header that
   cannot fit within the first fragment MUST NOT be compressed when
   using the fragmentation mechanism described in this specification.

4.  Extending RFC 8930

   This specification implements the generic 6LFF technique defined in
   [RFC8930] and provides end-to-end fragment recovery and congestion
   control mechanisms.

4.1.  Slack in the First Fragment

   [RFC8930] allows for a refragmentation operation in intermediate
   nodes, whereby the trailing bytes from a given fragment may be left
   in the VRB to be added as the leading bytes in the next fragment.
   This solves the case when the outgoing fragment needs more space than
   the incoming fragment; that case may arise when the 6LoWPAN header
   compression is not as efficient on the outgoing link or if the Link
   MTU is reduced.

   This specification cannot allow that refragmentation operation since
   the fragments are recovered end to end based on a sequence number.
   The Fragment_Size MUST be tailored to fit the minimal MTU along the
   path, and the first fragment that contains a 6LoWPAN compressed
   header MUST have enough slack to enable a less-efficient compression
   in the next hops to still fit within the Link MTU.

   For instance, if the fragmenting endpoint is also the 6LoWPAN
   compression endpoint, it will elide the Interface ID (IID) of the
   source IPv6 address when it matches the link-layer address [RFC6282].
   In that case, it MUST leave slack in the first fragment as if the MTU
   on the first hop were 8 bytes less, so the next hop can expand the
   IID within the same fragment without exceeding the MTU.

4.2.  Gap between Frames

   [RFC8930] requires that a configurable interval of time be inserted
   between transmissions to the same next hop and, in particular,
   between fragments of the same datagram. In the case of half-duplex
   interfaces, this inter-frame gap ensures that the next hop is done
   forwarding the previous frame and is capable of receiving the next
   one.

   In the case of a mesh operating at a single frequency with
   omnidirectional antennas, a larger inter-frame gap is required to
   protect the frame against hidden terminal collisions with the
   previous frame of the same flow that is still progressing along a
   common path.

   The inter-frame gap is useful even for unfragmented datagrams, but it
   becomes a necessity for fragments that are typically generated in a
   fast sequence and are all sent over the exact same path.

4.3.  Congestion Control

   The inter-frame gap is the only protection that [RFC8930] imposes by
   default. This document enables grouping fragments in windows and
   requesting intermediate acknowledgments, so the number of in-flight
   fragments can be bounded. This document also adds an ECN mechanism
   that can be used to protect the network by adapting the size of the
   window, the size of the fragments, and/or the inter-frame gap.
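
   A minimal Python sketch of that windowing follows. The helper name
   and the fixed window are assumptions for illustration and are not
   defined by this specification. Bounding the in-flight fragments
   also assumes that the sender waits for the RFRAG-ACK that closes a
   window before it sends the next one.

```python
from typing import List


def ack_request_flags(num_fragments: int, window_size: int) -> List[bool]:
    """Return, for each fragment Sequence, whether "X" is set.

    Hypothetical helper: fragments are grouped into consecutive windows of
    `window_size` fragments, and the last fragment of each window requests
    an RFRAG-ACK. The last fragment of the datagram always requests one so
    that the ARQ process covers the tail of the datagram.
    """
    if not 1 <= num_fragments <= 32:
        # The 5-bit Sequence field numbers fragments 0..31.
        raise ValueError("a datagram has between 1 and 32 fragments")
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    return [
        (seq + 1) % window_size == 0 or seq == num_fragments - 1
        for seq in range(num_fragments)
    ]


# 10 fragments with a window of 4: "X" is set on Sequences 3, 7, and 9.
flags = ack_request_flags(10, 4)
print([seq for seq, x in enumerate(flags) if x])  # prints [3, 7, 9]
```

   A window of 1 requests an acknowledgment for every fragment. That is
   the most conservative setting a sender could fall back to after a
   congestion signal.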

   This specification enables the fragmenting endpoint to apply a
   congestion control mechanism to tune those parameters, but the
   mechanism itself is out of scope. The expectation is that most
   datagrams will require only a few fragments and that only the last
   fragment will be acknowledged. A basic implementation of the
   fragmenting endpoint is NOT REQUIRED to vary the size of the window,
   the duration of the inter-frame gap, or the size of a fragment in the
   middle of the transmission of a datagram, and it MAY ignore the ECN
   signal or simply reset the window to 1 (see Appendix C) until the end
   of this datagram upon detecting congestion.

   An intermediate node that experiences congestion MAY set the ECN bit
   in a fragment, and the reassembling endpoint echoes the ECN bit at
   most once at the next opportunity to acknowledge back.

   The size of the fragments is typically computed from the Link MTU to
   maximize the size of the resulting frames. The size of the window
   and the duration of the inter-frame gap SHOULD be configurable, to
   reduce the chances of congestion and to follow the general
   recommendations in [RFC8930], respectively.

4.4.  Modifying the First Fragment

   The compression of the hop limit, of the source and destination
   addresses in the IPv6 header, and of the Routing Header, which are
   all in the first fragment, may change en route in a route-over mesh
   LLN. If the size of the first fragment is modified, then the
   intermediate node MUST adapt the Datagram_Size, encoded in the
   Fragment_Offset field, to reflect that difference.

   The intermediate node MUST also save the difference in Datagram_Size
   of the first fragment in the VRB and add it to the Fragment_Offset of
   all the subsequent fragments that it forwards for that datagram.
   In the case of a Source Routing Header 6LoWPAN Routing Header
   (SRH-6LoRH) [RFC8138] being consumed and thus reduced, that
   difference is negative, meaning that the Fragment_Offset is
   decremented by the number of bytes that were consumed.

5.  New Dispatch Types and Headers

   This document specifies an alternative to the 6LoWPAN fragmentation
   sub-layer [RFC4944] to emulate a Link MTU up to 2048 bytes for the
   upper layer, which can be the 6LoWPAN header compression sub-layer
   that is defined in "Compression Format for IPv6 Datagrams over IEEE
   802.15.4-Based Networks" [RFC6282]. This specification also provides
   reliable transmission of the fragments over a multi-hop 6LoWPAN
   route-over mesh network and minimal congestion control to reduce the
   chances of congestion loss.

   A 6LoWPAN Fragment Forwarding [RFC8930] technique derived from MPLS
   enables the forwarding of individual fragments across a 6LoWPAN
   route-over mesh without reassembly at each hop. The Datagram_Tag is
   used as a label; it is locally unique to the node that owns the
   source link-layer address of the fragment, so together the link-layer
   address and the label can identify the fragment globally within the
   lifetime of the datagram. A node may build the Datagram_Tag in its
   own locally significant way, as long as the chosen Datagram_Tag stays
   unique to the particular datagram for its lifetime. The result is
   that the label does not need to be globally unique, but it must be
   swapped at each hop as the source link-layer address changes.

   In the following sections, a Datagram_Tag extends the semantics
   defined in "Fragmentation Type and Header" (see Section 5.3 of
   [RFC4944]). The Datagram_Tag is a locally unique identifier for the
   datagram from the perspective of the sender. This means that the
   Datagram_Tag identifies a datagram uniquely in the network when
   associated with the source of the datagram.
   As the datagram gets forwarded, the source changes, and the
   Datagram_Tag must be swapped as detailed in [RFC8930].

   This specification extends [RFC4944] with two new dispatch types: one
   for the RFRAG and one for the RFRAG-ACK that is received back. The
   new 6LoWPAN dispatch types are taken from [RFC8025], as indicated in
   Table 1 of Section 9.

5.1.  Recoverable Fragment Dispatch Type and Header

   In this specification, if the packet is compressed, the size and
   offset of the fragments are expressed with respect to the compressed
   form of the packet, as opposed to the uncompressed (native) form.

   The format of the fragment header is shown in Figure 1. It is the
   same for all fragments even though the Fragment_Offset is overloaded.
   The format has a length and an offset, as well as a Sequence field.
   This would be redundant if the offset were computed as the product of
   the Sequence and the length, but this is not the case. The position
   of a fragment in the reassembly buffer is correlated with neither the
   value of the Sequence field nor the order in which the fragments are
   received. This enables splitting fragments to cope with an MTU
   reduction; see the example of fragment Sequence 5 that is retried end
   to end as smaller fragment Sequences 13 and 14 in Section 6.2.

   The first fragment is recognized by a Sequence of 0; it carries its
   Fragment_Size and the Datagram_Size of the compressed packet before
   it is fragmented, whereas the other fragments carry their
   Fragment_Size and Fragment_Offset. The last fragment for a datagram
   is recognized when its Fragment_Offset and its Fragment_Size add up
   to the stored Datagram_Size of the packet identified by the sender
   link-layer address and the Datagram_Tag.
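
   The layout rules above can be sketched in Python as follows. The
   helper names, the per-frame payload budget (max_payload), and the
   first-fragment slack parameter (first_slack, see Section 4.1) are
   hypothetical. The sketch assumes that Fragment_Size is counted in
   bytes and that fragments are cut in increasing offset order.

```python
from typing import List, NamedTuple

MAX_FRAGMENT_SIZE = 1023  # 10-bit Fragment_Size, unit assumed to be the byte
MAX_SEQUENCE = 31         # 5-bit Sequence field


class RfragPlan(NamedTuple):
    sequence: int
    fragment_size: int
    offset_field: int  # Datagram_Size if sequence == 0, else Fragment_Offset


def plan_fragments(datagram_size: int, max_payload: int,
                   first_slack: int = 0) -> List[RfragPlan]:
    """Split a compressed datagram into RFRAG fragments (hypothetical helper).

    `max_payload` is an assumed per-frame budget for fragment payload, and
    `first_slack` is kept free in the first fragment so that a less
    efficient compression on a later hop still fits (e.g., 8 bytes).
    """
    if not 0 < datagram_size <= 0xFFFF:
        raise ValueError("Datagram_Size must fit the 16-bit field")
    first_size = min(max_payload - first_slack, MAX_FRAGMENT_SIZE,
                     datagram_size)
    next_size = min(max_payload, MAX_FRAGMENT_SIZE)
    if first_size <= 0 or next_size <= 0:
        raise ValueError("no room for fragment payload")
    plan = [RfragPlan(0, first_size, datagram_size)]
    offset = first_size
    while offset < datagram_size:
        size = min(next_size, datagram_size - offset)
        plan.append(RfragPlan(len(plan), size, offset))
        offset += size
    if len(plan) - 1 > MAX_SEQUENCE:
        raise ValueError("datagram needs more than 32 fragments")
    return plan


def is_last_fragment(frag: RfragPlan, datagram_size: int) -> bool:
    """Offset plus size reaches the stored Datagram_Size."""
    offset = 0 if frag.sequence == 0 else frag.offset_field
    return offset + frag.fragment_size == datagram_size


# A 200-byte compressed datagram, 80 bytes per frame, 8 bytes of slack:
# Sequences 0, 1, 2 carry 72, 80, and 48 bytes; only Sequence 2 is last.
for frag in plan_fragments(200, 80, first_slack=8):
    print(frag, is_last_fragment(frag, 200))
```

   Every non-first fragment starts at an offset of at least the first
   fragment's size. Its Fragment_Offset is therefore never 0, a value
   that would otherwise signal an abort.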

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1 1 1 0 1 0 0|E|  Datagram_Tag |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |X| Sequence|   Fragment_Size   |       Fragment_Offset         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        X set == Ack-Request

               Figure 1: RFRAG Dispatch Type and Header

   X:  1 bit; Ack-Request. When set, the fragmenting endpoint requires
      an RFRAG Acknowledgment from the reassembling endpoint.

   E:  1 bit; Explicit Congestion Notification. The "E" flag is cleared
      by the source of the fragment and set by intermediate routers to
      signal that this fragment experienced congestion along its path.

   Fragment_Size:  10-bit unsigned integer. The size of this fragment
      in a unit that depends on the link-layer technology. Unless
      overridden by a more specific specification, that unit is the
      byte, which allows fragments up to 1023 bytes.

   Datagram_Tag:  8 bits. An identifier of the datagram that is locally
      unique to the link-layer sender.

   Sequence:  5-bit unsigned integer. The sequence number of the
      fragment in the acknowledgment bitmap. Fragments are numbered as
      [0..N], where N is in [0..31]. A Sequence of 0 indicates the
      first fragment in a datagram, but non-zero values are not
      indicative of the position in the reassembly buffer.

   Fragment_Offset:  16-bit unsigned integer.

      When the Fragment_Offset is set to a non-zero value, its semantics
      depend on the value of the Sequence field as follows:

      *  For a first fragment (i.e., with a Sequence of 0), this field
         indicates the Datagram_Size of the compressed datagram, to help
         the reassembling endpoint allocate an appropriately sized
         buffer for the reception and reassembly operations. The
         fragment may be stored for local reassembly. Alternatively, it
         may be routed based on the destination IPv6 address.
         In that case, a VRB state must be installed as described in
         Section 6.1.1.

      *  When the Sequence is not 0, this field indicates the offset of
         the fragment in the compressed form of the datagram. The
         fragment may be added to a local reassembly buffer or forwarded
         based on an existing VRB as described in Section 6.1.2.

      A Fragment_Offset that is set to a value of 0 indicates an abort
      condition, and all state regarding the datagram should be cleaned
      up once the processing of the fragment is complete; the processing
      of the fragment depends on whether a VRB is already established
      for this datagram and whether the next hop is still reachable:

      *  if a VRB already exists and the next hop is still reachable,
         the fragment is to be forwarded along the associated LSP as
         described in Section 6.1.2, without checking the value of the
         Sequence field.

      *  else, if the Sequence is 0, then the fragment is to be routed
         as described in Section 6.1.1, but no state is retained
         afterwards. In that case, the session, if it exists, is
         aborted, and the packet is also forwarded in an attempt to
         clean up the next hops along the path indicated by the IPv6
         header (possibly including a Routing Header).

      *  else (the Sequence is non-zero and either no VRB exists or the
         next hop is unavailable), the fragment cannot be forwarded or
         routed; the fragment is discarded and an abort RFRAG-ACK is
         sent back to the source as described in Section 6.1.2.

   Recoverable Fragments are sequenced, and a bitmap is used in the
   RFRAG Acknowledgment to indicate the received fragments by setting
   the individual bits that correspond to their sequence.

   There is no requirement on the reassembling endpoint to check that
   the received fragments are consecutive and non-overlapping.
   This flexibility is useful, in particular, when the MTU changes and
   a fragment Sequence is retried with a smaller Fragment_Size, with the
   remainder of the original fragment being retried with new Sequence
   values. The fragmenting endpoint knows that the datagram is fully
   received when the acknowledged fragments cover the whole datagram,
   which is implied by a FULL bitmap.

5.2.  RFRAG Acknowledgment Dispatch Type and Header

   This specification also defines a 4-byte RFRAG Acknowledgment Bitmap
   that is used by the reassembling endpoint to selectively confirm the
   reception of individual fragments. A given offset in the bitmap maps
   one to one to a given sequence number and indicates which fragment
   is acknowledged as follows:

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  RFRAG Acknowledgment Bitmap                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ^                 ^
    |                 |   bitmap indicating whether:
    |                 +----- Fragment with Sequence 9 was received
    +----------------------- Fragment with Sequence 0 was received

             Figure 2: RFRAG Acknowledgment Bitmap Encoding

   Figure 3 shows an example RFRAG Acknowledgment Bitmap that indicates
   that all fragments from Sequence 0 to 20 were received, except for
   fragments 1, 2, and 16, which were lost and must be retried.

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|0|0|1|1|1|1|1|1|1|1|1|1|1|1|1|0|1|1|1|1|0|0|0|0|0|0|0|0|0|0|0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 3: Example RFRAG Acknowledgment Bitmap

   The RFRAG Acknowledgment Bitmap is included in an RFRAG
   Acknowledgment header, as follows:

                        1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1 1 1 0 1 0 1|E|  Datagram_Tag |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             RFRAG Acknowledgment Bitmap (32 bits)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

         Figure 4: RFRAG Acknowledgment Dispatch Type and Header

   E:  1 bit; Explicit Congestion Notification Echo.

      When set, the reassembling endpoint indicates that at least one of
      the acknowledged fragments was received with an Explicit
      Congestion Notification, indicating that the path followed by the
      fragments is subject to congestion. See Appendix C for more
      details.

   Datagram_Tag:  8 bits; an identifier of the datagram that is locally
      unique to the link-layer recipient.

   RFRAG Acknowledgment Bitmap:  An RFRAG Acknowledgment Bitmap, whereby
      setting the bit at offset x indicates that fragment x was
      received, as shown in Figure 2. A NULL bitmap indicates that the
      fragmentation process is aborted. A FULL bitmap indicates that
      the fragmentation process is complete; all fragments were received
      at the reassembling endpoint.

6.  Fragment Recovery

   The RFRAG header is used to transport a fragment and optionally
   request an RFRAG-ACK that confirms the reception of one or more
   fragments. An RFRAG-ACK is carried as a standalone fragment header
   (i.e., with no 6LoWPAN payload) in a message that is propagated back
   to the fragmenting endpoint.
   To achieve this, each hop that performed an MPLS-like operation on
   fragments reverses that operation for the RFRAG-ACK by sending a
   frame from the next hop to the previous hop as known by its
   link-layer address in the VRB. The Datagram_Tag in the RFRAG-ACK is
   unique to the reassembling endpoint and is enough information for an
   intermediate hop to locate the VRB that contains the Datagram_Tag
   used by the previous hop and the Layer 2 information associated with
   it (interface and link-layer address).

   The fragmenting endpoint (i.e., the node that fragments the packets
   at the 6LoWPAN level) also controls the number of acknowledgments by
   setting the Ack-Request flag in the RFRAG packets.

   The fragmenting endpoint may set the Ack-Request flag on any fragment
   to perform congestion control by limiting the number of outstanding
   fragments, which are the fragments that have been sent but for which
   reception or loss was not positively confirmed by the reassembling
   endpoint. The maximum number of outstanding fragments is controlled
   by the Window-Size. It is configurable and may vary upon ECN
   notification. When the endpoint that reassembles the packets at the
   6LoWPAN level receives a fragment with the Ack-Request flag set, it
   MUST send an RFRAG-ACK back to the originator to confirm reception of
   all the fragments it has received so far.

   An Ack-Request ("X") flag set in an RFRAG marks the end of a window.
   This flag MUST be set on the last fragment if the fragmenting
   endpoint wishes to perform an automatic repeat request (ARQ) process
   for the datagram, and it MAY be set in any intermediate fragment for
   the purpose of congestion control.

   This ARQ process MUST be protected by a Retransmission Timeout (RTO)
   timer, and the fragment that carries the "X" flag MAY be retried upon
   a timeout for a configurable number of times (see Section 7.1) with
   an exponential backoff.
   Upon exhaustion of the retries, the fragmenting endpoint may either
   abort the transmission of the datagram or resend the first fragment
   with an "X" flag set in order to establish a new path for the
   datagram and obtain the list of fragments that were received over the
   old path in the acknowledgment bitmap. When the fragmenting endpoint
   knows that an underlying link-layer mechanism protects the fragments,
   it may refrain from using the RFRAG Acknowledgment mechanism and
   never set the Ack-Request flag.

   The reassembling endpoint MAY issue unsolicited acknowledgments. An
   unsolicited acknowledgment signals to the fragmenting endpoint that
   it can resume sending in case it has reached its maximum number of
   outstanding fragments. Another use is to inform the fragmenting
   endpoint that the reassembling endpoint aborted the processing of an
   individual datagram.

   The RFRAG Acknowledgment carries an ECN indication for congestion
   control (see Appendix C). The reassembling endpoint of a fragment
   with the "E" (ECN) flag set MUST echo that information at most once
   by setting the "E" (ECN) flag in the next RFRAG-ACK.

   In order to protect the datagram, the fragmenting endpoint transfers
   a controlled number of fragments and flags the last fragment of a
   window with an RFRAG Acknowledgment Request. The reassembling
   endpoint MUST acknowledge a fragment with the acknowledgment request
   bit set. If any fragment immediately preceding an acknowledgment
   request is still missing, the reassembling endpoint MAY intentionally
   delay its acknowledgment to allow in-transit fragments to arrive.
   Because it might defeat the round-trip time computation, delaying the
   acknowledgment should be configurable and not enabled by default.
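
   The acknowledgment duties of the reassembling endpoint described
   above might be modeled as in the following Python sketch. The class
   and method names are hypothetical, and the bit order (Sequence 0 as
   the most significant bit) follows Figure 2. The ECN mark is kept
   pending until the next RFRAG-ACK and is then cleared. This is one
   way to echo it at most once.

```python
from typing import Optional, Tuple


def seq_bit(sequence: int) -> int:
    """Bitmap bit for a Sequence; Sequence 0 is the most significant bit."""
    if not 0 <= sequence <= 31:
        raise ValueError("Sequence is a 5-bit field")
    return 1 << (31 - sequence)


class ReassemblyState:
    """Hypothetical per-datagram bookkeeping at the reassembling endpoint.

    Only the acknowledgment logic is modeled: which Sequences arrived and
    whether a congestion mark still has to be echoed. Payload storage,
    timers, and delayed or unsolicited acknowledgments are left out.
    """

    def __init__(self) -> None:
        self.bitmap = 0
        self.ecn_pending = False

    def on_fragment(self, sequence: int, ack_request: bool,
                    ecn: bool) -> Optional[Tuple[bool, int]]:
        """Record a fragment; return (E flag, bitmap) if an RFRAG-ACK is due."""
        self.bitmap |= seq_bit(sequence)
        if ecn:
            self.ecn_pending = True
        if not ack_request:
            return None
        # Echo a pending ECN mark in this RFRAG-ACK only, then clear it.
        echo, self.ecn_pending = self.ecn_pending, False
        return echo, self.bitmap


state = ReassemblyState()
state.on_fragment(0, ack_request=False, ecn=True)         # congested, no ACK yet
print(state.on_fragment(1, ack_request=True, ecn=False))  # E echoed once
print(state.on_fragment(2, ack_request=True, ecn=False))  # E no longer set
```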

   When enough fragments are received to cover the whole datagram, the
   reassembling endpoint reconstructs the packet, passes it to the upper
   layer, sends an RFRAG-ACK on the reverse path with a FULL bitmap, and
   arms a short timer, e.g., on the order of an average round-trip time
   in the network. The FULL bitmap is used, rather than a bitmap that
   acknowledges only the received fragments, to let the intermediate
   nodes know that the datagram is fully received. As the timer runs,
   the reassembling endpoint absorbs the fragments that were still in
   flight for that datagram without creating new state, acknowledging
   the ones that bear an Ack-Request with an RFRAG Acknowledgment and
   the FULL bitmap. The reassembling endpoint aborts the communication
   if fragments with a matching source and Datagram_Tag continue to be
   received after the timer expires.

   Note that acknowledgments might consume precious resources, so the
   use of unsolicited acknowledgments SHOULD be configurable and not
   enabled by default.

   Streamlining the forwarding of fragments generally reduces the
   latency over the LLN mesh, providing room for retries within existing
   upper-layer reliability mechanisms. The fragmenting endpoint protects
   the transmission over the LLN mesh with a retry timer that is
   configured for a use case and may be adapted dynamically, e.g.,
   according to the method detailed in [RFC6298]. It is expected that
   the upper-layer retry mechanism obeys the recommendations in
   [RFC8085], in which case a single round of fragment recovery should
   fit within the upper-layer recovery timers.

   Fragments MUST be sent in a round-robin fashion: the sender MUST send
   all the fragments once before it retries any lost fragment; lost
   fragments MUST be retried in sequence, oldest first. This mechanism
   enables the receiver to acknowledge fragments that were delayed in
   the network before they are retried.
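
   On the fragmenting endpoint, the bitmap returned in an RFRAG-ACK
   determines what to retry. The Python sketch below uses hypothetical
   names. It treats the NULL bitmap as an abort and the FULL bitmap as
   completion, and otherwise lists the unacknowledged fragments oldest
   first. It assumes that every fragment was first sent once in
   increasing Sequence order, following the round-robin rule above.

```python
from typing import Iterable, List

NULL_BITMAP = 0x00000000  # reassembling endpoint aborted the datagram
FULL_BITMAP = 0xFFFFFFFF  # datagram fully received


class TransmissionAborted(Exception):
    """Raised when an RFRAG-ACK carries a NULL bitmap."""


def is_acked(bitmap: int, sequence: int) -> bool:
    """Sequence 0 maps to the most significant bit (see Figure 2)."""
    return ((bitmap >> (31 - sequence)) & 1) == 1


def fragments_to_retry(bitmap: int, sent: Iterable[int]) -> List[int]:
    """Return the sent Sequences that the bitmap does not acknowledge.

    Assumes every fragment was first sent in increasing Sequence order, so
    sorting by Sequence yields the required oldest-first retry order.
    """
    if bitmap == NULL_BITMAP:
        raise TransmissionAborted("RFRAG-ACK with a NULL bitmap")
    if bitmap == FULL_BITMAP:
        return []
    return sorted(seq for seq in sent if not is_acked(bitmap, seq))


# The bitmap of Figure 3: Sequences 0 to 20 sent; 1, 2, and 16 lost.
print(fragments_to_retry(0x9FFF7800, range(21)))  # prints [1, 2, 16]
```

   The FULL bitmap must be checked first. A reassembling endpoint
   signals completion with all 32 bits set, even when the datagram has
   fewer than 32 fragments.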
+ + When a single radio frequency is used by contiguous hops, the + fragmenting endpoint SHOULD insert a delay between the frames (e.g., + frames carrying fragments) that are sent to the same next hop. The + delay SHOULD cover multiple transmissions so as to let a frame + progress a few hops and avoid hidden terminal issues. This precaution + is not required on channel hopping technologies such as Time-Slotted + Channel Hopping (TSCH) [RFC7554], where nodes that communicate at + Layer 2 are scheduled to send and receive, respectively, and + different hops operate on different channels. + +6.1. Forwarding Fragments + + This specification inherits from [RFC8930] and proposes a Virtual + Reassembly Buffer technique to forward fragments with no intermediate + reconstruction of the entire datagram. + + The IPv6 header MUST be placed in the first fragment in full to + enable the routing decision. The first fragment is routed and + creates an LSP from the fragmenting endpoint to the reassembling + endpoint. The next fragments are label switched along that LSP. As + a consequence, the next fragments can only follow the path that was + set up by the first fragment; they cannot follow an alternate route. + The Datagram_Tag is used to carry the label, which is swapped at each + hop. + + If the first fragment is too large for the path MTU, it will + repeatedly fail and never establish an LSP. In that case, the + fragmenting endpoint MAY retry the same datagram with a smaller + Fragment_Size, in which case it MUST abort the original attempt and + use a new Datagram_Tag for the new attempt. + +6.1.1. Receiving the First Fragment + + In route-over mode, the source and destination link-layer addresses + in a frame change at each hop. The label that is formed and placed + in the Datagram_Tag by the sender is associated with the source link- + layer address and only valid (and temporarily unique) for that source + link-layer address.
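
   The label switching described in this section can be sketched as a
   small table per router. This Python sketch is illustrative only: the
   class and method names are hypothetical, the outgoing interface,
   source link-layer address, and next hop are assumed to come from an
   IPv6 route lookup performed by the caller, and the swapped
   Datagram_Tag is chosen as the lowest value not in use for that
   (interface, source link-layer address) pair, consistent with a label
   being valid only for its source link-layer address.

```python
from __future__ import annotations


class VrbTable:
    """Hypothetical per-router LSP state for label-switched fragments."""

    def __init__(self) -> None:
        # (in iface, previous-hop LL, tag) -> (out iface, own LL, next-hop LL, swapped tag)
        self.forward: dict[tuple, tuple] = {}
        # (out iface, own LL, swapped tag) -> (in iface, previous-hop LL, tag)
        self.reverse: dict[tuple, tuple] = {}
        # (out iface, own LL) -> swapped tags currently in use
        self.tags_in_use: dict[tuple, set[int]] = {}

    def _allocate_tag(self, out_iface: str, src_ll: str) -> int | None:
        # A label is only valid for one source link-layer address, so it
        # only needs to be unique per (outgoing interface, source LL).
        in_use = self.tags_in_use.setdefault((out_iface, src_ll), set())
        for tag in range(256):  # Datagram_Tag is 8 bits wide
            if tag not in in_use:
                in_use.add(tag)
                return tag
        return None  # every label is busy; the caller must abort

    def first_fragment(self, in_iface, prev_hop, tag, out_iface, src_ll, next_hop):
        """Sequence 0: create forward and reverse LSP state, return the new tag."""
        new_tag = self._allocate_tag(out_iface, src_ll)
        if new_tag is not None:
            self.forward[(in_iface, prev_hop, tag)] = (out_iface, src_ll, next_hop, new_tag)
            self.reverse[(out_iface, src_ll, new_tag)] = (in_iface, prev_hop, tag)
        return new_tag

    def next_fragment(self, in_iface, prev_hop, tag):
        """Non-zero Sequence: None means answer with an aborting RFRAG-ACK."""
        return self.forward.get((in_iface, prev_hop, tag))

    def rfrag_ack(self, iface, dst_ll, tag):
        """RFRAG-ACK: None means drop it silently."""
        return self.reverse.get((iface, dst_ll, tag))

    def destroy(self, in_iface, prev_hop, tag):
        """Remove all state for one datagram, e.g., when its timer expires."""
        out_iface, src_ll, _next_hop, new_tag = self.forward.pop((in_iface, prev_hop, tag))
        del self.reverse[(out_iface, src_ll, new_tag)]
        self.tags_in_use[(out_iface, src_ll)].discard(new_tag)


# Placeholder names: router R receives from A on if0 and forwards to B on if1.
vrb = VrbTable()
swapped = vrb.first_fragment("if0", "A", 7, "if1", "R", "B")
print(vrb.next_fragment("if0", "A", 7))  # ('if1', 'R', 'B', 0)
print(vrb.rfrag_ack("if1", "R", swapped))  # ('if0', 'A', 7)
```

   The forward entry serves the next fragments and the reverse entry
   serves RFRAG-ACKs; a None result corresponds to the abort and
   silent-drop cases described in Sections 6.1.2 and 6.2.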
+ + Upon receiving the first fragment (i.e., with a Sequence of 0), an + intermediate router creates a VRB and the associated LSP state + indexed by the incoming interface, the previous-hop link-layer + address, and the Datagram_Tag, and forwards the fragment along the + IPv6 route that matches the destination IPv6 address in the IPv6 + header toward the reassembling endpoint, as prescribed by + [RFC8930]. The LSP state enables matching the next incoming + fragments of a datagram to the abstract forwarding information of the + next interface, source and next-hop link-layer addresses, and the + swapped Datagram_Tag. + + In addition, the router forms a reverse LSP state indexed by the + interface to the next hop, the link-layer address the router uses as + source for that datagram, and the swapped Datagram_Tag. This reverse + LSP state enables matching the tuple (interface, destination link- + layer address, Datagram_Tag) found in an RFRAG-ACK to the abstract + forwarding information (previous interface, previous link-layer + address, Datagram_Tag) used to forward the RFRAG-ACK back to the + fragmenting endpoint. + +6.1.2. Receiving the Next Fragments + + Upon receiving the next fragment (i.e., with a non-zero Sequence), an + intermediate router looks up an LSP indexed by the tuple (incoming + interface, previous-hop link-layer address, Datagram_Tag) found in + the fragment. If it is found, the router forwards the fragment using + the associated VRB as prescribed by [RFC8930]. + + If the VRB for the tuple is not found, the router builds an RFRAG-ACK + to abort the transmission of the packet.
The resulting message has + the following information: + + * The source and destination link-layer addresses are swapped from + those found in the fragment, and the same interface is used + + * The Datagram_Tag is set to the Datagram_Tag found in the fragment + + * A NULL bitmap is used to signal the abort condition + + At this point, the router is all set and can send the RFRAG-ACK back + to the previous router. The RFRAG-ACK should normally be forwarded + all the way to the source using the reverse LSP state in the VRBs in + the intermediate routers as described in the next section. + + [RFC8930] indicates that the reassembling endpoint stores "the actual + packet data from the fragments received so far, in a form that makes + it possible to detect when the whole packet has been received and can + be processed or forwarded". How this is computed is implementation + specific, but it relies on receiving all the bytes up to the + Datagram_Size indicated in the first fragment. An implementation may + receive overlapping fragments as the result of retries after an MTU + change. + +6.2. Receiving RFRAG Acknowledgments + + Upon receipt of an RFRAG-ACK, the router looks up a reverse LSP + indexed by the interface and destination link-layer address of the + received frame and the received Datagram_Tag in the RFRAG-ACK. If it + is found, the router forwards the RFRAG-ACK using the associated VRB + as prescribed by [RFC8930], but along the reverse LSP so that the + RFRAG-ACK flows back to the fragmenting endpoint. + + If the reverse LSP is not found, the router MUST silently drop the + RFRAG-ACK message. + + Either way, if the RFRAG-ACK indicates that the datagram was entirely + received (FULL bitmap), the router arms a short timer, and upon + timeout, the VRB and all the associated states are destroyed. Until + the timer elapses, fragments of that datagram may still be received, + e.g., if the RFRAG-ACK was lost on the path back, and the source + retried the last fragment.
In that case, the router generates an RFRAG-ACK with + a FULL bitmap back to the fragmenting endpoint if an acknowledgment + was requested; else, it silently drops the fragment. + + This specification does not provide a method to discover the number + of hops or the minimal value of the MTU along those hops. In a typical + case, the MTU is constant and is the same across the network. But + should the minimal MTU along the path decrease, it is possible to + retry a long fragment (say, with a Sequence of 5) with several shorter + fragments with a Sequence that was not used before (e.g., 13 and 14). + Fragment 5 is marked as abandoned and will not be retried anymore. + Note that when this mechanism is in place, it is hard to predict the + total number of fragments that will be needed or the final shape of + the bitmap that would cover the whole packet. This is why the FULL + bitmap is used when the reassembling endpoint gets the whole datagram + regardless of which fragments were actually used to do so. + Intermediate nodes will know unambiguously that the process is + complete. Note that Path MTU Discovery is out of scope for this + document. + +6.3. Aborting the Transmission of a Fragmented Packet + + A reset is signaled on the forward path with a pseudo fragment that + has the Fragment_Offset set to 0. The sender of a reset SHOULD also + set the Sequence and Fragment_Size fields to 0. + + When the fragmenting endpoint or a router on the path decides that a + packet should be dropped and the fragmentation process aborted, it + generates a reset pseudo fragment and forwards it down the fragment + path. + + Each router along the path forwards the pseudo fragment in turn based + on the VRB state. If an acknowledgment is not requested, the VRB and + all associated states are destroyed. + + Upon reception of the pseudo fragment, the reassembling endpoint + cleans up all resources for the packet associated with the + Datagram_Tag.
If an acknowledgment is requested, the reassembling + endpoint responds with a NULL bitmap. + + On the other hand, the reassembling endpoint might need to abort the + processing of a fragmented packet for internal reasons, for instance, + if it is out of reassembly buffers, already uses all 256 possible + values of the Datagram_Tag, or keeps receiving fragments beyond a + reasonable time while it considers that this packet is already fully + reassembled and was passed to the upper layer. In that case, the + reassembling endpoint SHOULD indicate so to the fragmenting endpoint + with a NULL bitmap in an RFRAG-ACK. + + The RFRAG-ACK is forwarded all the way back to the source of the + packet and cleans up all resources on the path. Upon an + acknowledgment with a NULL bitmap, the fragmenting endpoint MUST + abort the transmission of the fragmented datagram with one exception: + in the particular case of the first fragment, it MAY decide to retry + via an alternate next hop instead. + +6.4. Applying Recoverable Fragmentation along a Diverse Path + + The text above can be read with the assumption of a serial path + between a source and a destination. The IPv6 over the TSCH mode of + IEEE 802.15.4e (6TiSCH) architecture (see Section 4.5.3 of [6TiSCH]) + defines the concept of a Track that can be a complex path between a + source and a destination with Packet ARQ, Replication, Elimination, + and Overhearing (PAREO) along the Track. This specification can be + used along any subset of the complex Track where the first fragment + is flooded. The last RFRAG Acknowledgment is flooded on that same + subset in the reverse direction. Intermediate RFRAG Acknowledgments + can be flooded on any sub-subset of that reverse subset that reaches + back to the source. + +7. Management Considerations + + This specification extends [RFC8930] and requires the same parameters + in the reassembling endpoint and on intermediate nodes. There is no + new parameter as echoing ECN is always on. 
These parameters + typically include the reassembly timeout at the reassembling + endpoint, an inactivity cleanup timer on the intermediate nodes, and + the number of messages that can be processed in parallel in all + nodes. + + The configuration settings introduced by this specification only + apply to the fragmenting endpoint, which is in full control of the + transmission. LLNs vary a lot in size (there can be thousands of + nodes in a mesh), in speed (from 10 Kbps to several Mbps at the PHY + layer), in traffic density, and in optimizations that are desired + (e.g., the selection of a Routing Protocol for LLNs (RPL) [RFC6550] + Objective Function [RFC6552] impacts the shape of the routing graph). + + For that reason, only very generic guidance can be given on the + settings of the fragmenting endpoint and on whether complex + algorithms are needed to perform congestion control or to estimate + the round-trip time. To cover the most complex use cases, this + specification enables the fragmenting endpoint to vary the fragment + size, the window size, and the inter-frame gap based on the number of + losses, the observed variations of the round-trip time, and the + setting of the ECN bit. + +7.1. Protocol Parameters + + The management system SHOULD be capable of providing the parameters + listed in this section, and an implementation MUST abide by those + parameters and, in particular, never exceed the minimum and maximum + configured boundaries. + + An implementation should consider the generic recommendations from + the IETF in the matter of congestion control and rate management for + IP datagrams in [RFC8085]. An implementation may perform congestion + control by using a dynamic value of the window size (Window_Size), + adapting the fragment size (Fragment_Size), and potentially reducing + the load by inserting an inter-frame gap that is longer than + necessary. 
In a large network where nodes contend for the bandwidth, + a larger Fragment_Size consumes less bandwidth but also reduces + fluidity and incurs higher chances of loss in transmission. + + This is controlled by the following parameters: + + inter-frame gap: The inter-frame gap indicates the minimum amount of + time between transmissions. The inter-frame gap controls the rate + at which fragments are sent, the ratio of air time, and the amount + of memory in intermediate nodes that a particular datagram will + use. It can be used as a flow control, a congestion control, and/or + a collision control measure. It MUST be set at a minimum to a + value that protects the propagation of one transmission against + collision with the next [RFC8930]. In a wireless network that uses + the same frequency along a path, this may represent the time for a + frame to progress over multiple hops (see more in Section 4.2). + It SHOULD be augmented beyond this as necessary to protect the + network against congestion. + + MinFragmentSize: The MinFragmentSize is the minimum value for the + Fragment_Size. It MUST be lower than the minimum value of the + smallest 1-hop MTU that can be encountered along the path. + + OptFragmentSize: The OptFragmentSize is the value for the + Fragment_Size that the fragmenting endpoint should use to start + with. It is greater than or equal to MinFragmentSize. It is less + than or equal to MaxFragmentSize. For the first fragment, it must + account for the expansion of the IPv6 addresses and of the Hop + Limit field within the MTU. For all fragments, it is a balance + between the expected fluidity and the overhead of link-layer and + 6LoWPAN headers. For a small MTU, the idea is to keep it close to + the maximum, whereas for larger MTUs, it might make sense to keep + it short enough so that the duty cycle of the transmitter is + bounded, e.g., to transmit at least 10 frames per second.
+ + MaxFragmentSize: The MaxFragmentSize is the maximum value for the + Fragment_Size. It MUST be lower than the maximum value of the + smallest 1-hop MTU that can be encountered along the path. A + large value increases the chances of buffer bloat and transmission + loss. The value MUST be less than 1024 if the unit that is defined + for the PHY layer is the byte. + + Window_Size: The Window_Size MUST be at least 1 and less than 33. + + * If the round-trip time is known, the Window_Size SHOULD be set + to the round-trip time divided by the time per fragment; that + is, the time to transmit a fragment plus the inter-frame gap. + + Otherwise: + + * A Window_Size of 32 indicates that only the last fragment is to + be acknowledged in each round. This is the RECOMMENDED value + in a half-duplex LLN where the fragment acknowledgment consumes + roughly the same bandwidth on the same links as the fragments + themselves. + + * If it is set to a smaller value, more acknowledgments are + generated. In a full-duplex network, the load on the forward + path will be lower, and a small value of 3 SHOULD be configured. + + An implementation may perform its estimate of the RTO or use a + configured one. The ARQ process is controlled by the following + parameters: + + MinARQTimeOut: The minimum amount of time a node should wait for an + RFRAG Acknowledgment before it takes the next action. It MUST be + more than the maximum expected round-trip time in the respective + network. + + OptARQTimeOut: The initial value of the RTO, which is the amount of + time that a fragmenting endpoint should wait for an RFRAG + Acknowledgment before it takes the next action. It is greater + than or equal to MinARQTimeOut. It is less than or equal to + MaxARQTimeOut. See Appendix C for recommendations on computing + the round-trip time. By default, a value of 3 times the maximum + expected round-trip time in the respective network is RECOMMENDED.
+ + MaxARQTimeOut: The maximum amount of time a node should wait for the + RFRAG Acknowledgment before it takes the next action. It must + cover the longest expected round-trip time and be several times + less than the timeout that covers the reassembly buffer at the + reassembling endpoint, which is typically on the order of a + minute. An upper bound can be estimated to ensure that the + datagram is either fully transmitted or dropped before an upper + layer decides to retry it. + + MaxFragRetries: The maximum number of retries for a particular + fragment. A default value of 3 is RECOMMENDED. An upper bound + can be estimated to ensure that the datagram is either fully + transmitted or dropped before an upper layer decides to retry it. + + MaxDatagramRetries: The maximum number of retries from scratch for a + particular datagram. A default value of 1 is RECOMMENDED. An + upper bound can be estimated to ensure that the datagram is either + fully transmitted or dropped before an upper layer decides to + retry it. + + An implementation may be capable of performing congestion control + based on ECN; see Appendix C. This is controlled by the following + parameter: + + UseECN: Indicates whether the fragmenting endpoint should react to + ECN. The fragmenting endpoint may react to ECN by varying the + Window_Size between MinWindowSize and MaxWindowSize, varying the + Fragment_Size between MinFragmentSize and MaxFragmentSize, and/or + increasing or reducing the inter-frame gap. With this + specification, if UseECN is set and a fragmenting endpoint detects + congestion, it may apply a congestion control method until the + end of the datagram, whereas if UseECN is reset, the endpoint does + not react to congestion. Future specifications may provide + additional parameters and capabilities. + +7.2.
Observing the Network + + The management system should monitor the number of retries and ECN + settings that can be observed from the perspective of the fragmenting + endpoint with respect to the reassembling endpoint and reciprocally. + It may then tune the optimal values of Fragment_Size and Window_Size + (OptFragmentSize and OptWindowSize, respectively) at the fragmenting + endpoint towards a particular reassembling endpoint, which apply to + the next datagrams. It will preferably tune the inter-frame gap to + increase the spacing between fragments of the same datagram and + reduce the buffer bloat in the intermediate node that holds one or + more fragments of that datagram. + +8. Security Considerations + + This document specifies an instantiation of a 6LFF technique and + inherits from the generic description in [RFC8930]. The + considerations in the Security Considerations section of [RFC8930] + equally apply to this document. + + In addition to the threats detailed therein, an attacker that is on + path can prematurely end the transmission of a datagram by sending an + RFRAG Acknowledgment to the fragmenting endpoint. It can also cause + extra transmissions of fragments by resetting bits in the RFRAG + Acknowledgment Bitmap, and extra RFRAG Acknowledgments by setting the + Ack-Request bit in fragments that it forwards. + + As indicated in [RFC8930], secure joining and link-layer security are + REQUIRED to protect against those attacks, as the fragmentation + protocol does not include any native security mechanisms. + + This specification does not recommend a particular algorithm for the + estimation of the duration of the RTO that covers the detection of + the loss of a fragment with the "X" flag set; regardless, an attacker + on the path may slow down or discard packets, which in turn can + affect the throughput of fragmented packets.
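
   Since the RTO estimator is left open, the following Python sketch
   shows one hedged possibility: the [RFC6298] method cited earlier for
   adapting the retry timer, started at OptARQTimeOut and bounded by
   MinARQTimeOut and MaxARQTimeOut. The class name is hypothetical, and
   using MinARQTimeOut in place of the 1-second floor of [RFC6298] is an
   assumption of this sketch.

```python
class ArqTimer:
    """Hypothetical RFC 6298-style RTO estimator bounded by the
    MinARQTimeOut and MaxARQTimeOut parameters of this document."""

    ALPHA = 1 / 8  # RFC 6298 gain for SRTT
    BETA = 1 / 4   # RFC 6298 gain for RTTVAR
    K = 4          # RFC 6298 variance multiplier

    def __init__(self, min_timeout, opt_timeout, max_timeout, granularity=0.0):
        if not min_timeout <= opt_timeout <= max_timeout:
            raise ValueError("need MinARQTimeOut <= OptARQTimeOut <= MaxARQTimeOut")
        self.min_timeout = min_timeout
        self.max_timeout = max_timeout
        self.granularity = granularity  # clock granularity G
        self.srtt = None
        self.rttvar = None
        self.rto = opt_timeout  # OptARQTimeOut until a sample is available

    def _bound(self, value):
        return min(max(value, self.min_timeout), self.max_timeout)

    def on_rtt_sample(self, rtt):
        """Feed a round-trip time measured on a fragment that was not retried."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = self._bound(self.srtt + max(self.granularity, self.K * self.rttvar))

    def on_timeout(self):
        """Exponential back-off after an RFRAG-ACK fails to arrive in time."""
        self.rto = self._bound(self.rto * 2)


timer = ArqTimer(min_timeout=1.0, opt_timeout=3.0, max_timeout=60.0)
timer.on_rtt_sample(2.0)
print(timer.rto)  # 6.0
```

   Following [RFC6298], samples should not be taken on fragments that
   were retried, because an RFRAG-ACK cannot be matched to a particular
   transmission of that fragment.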
+ + Compared to [RFC4944], this specification reduces the Datagram_Tag to + 8 bits, and the tag wraps faster than with [RFC4944]. But for a + constrained network where a node is expected to be able to hold only + one or a few large packets in memory, 256 is still a large number. + Also, the acknowledgment mechanism allows cleaning up the state + rapidly once the packet is fully transmitted or aborted. + + The abstract Virtual Reassembly Buffer from [RFC8930] may be used to + perform a Denial-of-Service (DoS) attack against the intermediate + routers since the routers need to maintain a state per flow. The + particular VRB implementation technique described in [LWIG-FRAG] + allows realigning which data goes in which fragment; this causes the + intermediate node to store a portion of the data, which adds an + attack vector that is not present with this specification. With this + specification, the data that is transported in each fragment is + conserved, and the state to keep does not include any data that would + not fit in the previous fragment. + +9. IANA Considerations + + This document allocates two patterns for a total of four dispatch + values for Recoverable Fragments from the "Dispatch Type Field" + registry that was created by [RFC4944] and reformatted by "IPv6 over + Low-Power Wireless Personal Area Network (6LoWPAN) Paging Dispatch" + [RFC8025].
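
   As a sketch of how a receiver might recognize the two Page 0
   patterns listed in Table 1, "11 10100x" (RFRAG) and "11 10101x"
   (RFRAG-ACK), the following hypothetical Python helper masks the
   trailing "x" bit and compares the rest. How the current page is
   tracked per [RFC8025] is assumed to be handled by the caller.

```python
from __future__ import annotations

RFRAG_PATTERN = 0b1110_1000      # "11 10100x" in Page 0
RFRAG_ACK_PATTERN = 0b1110_1010  # "11 10101x" in Page 0


def classify_dispatch(dispatch: int, page: int = 0) -> str | None:
    """Hypothetical helper: match a dispatch byte against Table 1.

    The low-order "x" bit is masked out, since each pattern covers two
    dispatch values. Pages 1 to 14 are unassigned and Page 15 is reserved
    for experimental use, so only Page 0 yields a match here.
    """
    if page != 0:
        return None
    masked = dispatch & 0b1111_1110
    if masked == RFRAG_PATTERN:
        return "RFRAG"
    if masked == RFRAG_ACK_PATTERN:
        return "RFRAG-ACK"
    return None


print([classify_dispatch(b) for b in (0xE8, 0xE9, 0xEA, 0xEB, 0xEC)])
# ['RFRAG', 'RFRAG', 'RFRAG-ACK', 'RFRAG-ACK', None]
```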
+ + +-------------+------+----------------------------------+-----------+ + | Bit Pattern | Page | Header Type | Reference | + +-------------+------+----------------------------------+-----------+ + | 11 10100x | 0 | RFRAG - Recoverable Fragment | RFC 8931 | + +-------------+------+----------------------------------+-----------+ + | 11 10100x | 1-14 | Unassigned | | + +-------------+------+----------------------------------+-----------+ + | 11 10100x | 15 | Reserved for Experimental Use | RFC 8025 | + +-------------+------+----------------------------------+-----------+ + | 11 10101x | 0 | RFRAG-ACK - RFRAG | RFC 8931 | + | | | Acknowledgment | | + +-------------+------+----------------------------------+-----------+ + | 11 10101x | 1-14 | Unassigned | | + +-------------+------+----------------------------------+-----------+ + | 11 10101x | 15 | Reserved for Experimental Use | RFC 8025 | + +-------------+------+----------------------------------+-----------+ + + Table 1: Additional Dispatch Value Bit Patterns + +10. References + +10.1. Normative References + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + <https://www.rfc-editor.org/info/rfc2119>. + + [RFC4919] Kushalnagar, N., Montenegro, G., and C. Schumacher, "IPv6 + over Low-Power Wireless Personal Area Networks (6LoWPANs): + Overview, Assumptions, Problem Statement, and Goals", + RFC 4919, DOI 10.17487/RFC4919, August 2007, + <https://www.rfc-editor.org/info/rfc4919>. + + [RFC4944] Montenegro, G., Kushalnagar, N., Hui, J., and D. Culler, + "Transmission of IPv6 Packets over IEEE 802.15.4 + Networks", RFC 4944, DOI 10.17487/RFC4944, September 2007, + <https://www.rfc-editor.org/info/rfc4944>. + + [RFC6282] Hui, J., Ed. and P. Thubert, "Compression Format for IPv6 + Datagrams over IEEE 802.15.4-Based Networks", RFC 6282, + DOI 10.17487/RFC6282, September 2011, + <https://www.rfc-editor.org/info/rfc6282>. 
+ + [RFC6298] Paxson, V., Allman, M., Chu, J., and M. Sargent, + "Computing TCP's Retransmission Timer", RFC 6298, + DOI 10.17487/RFC6298, June 2011, + <https://www.rfc-editor.org/info/rfc6298>. + + [RFC6606] Kim, E., Kaspar, D., Gomez, C., and C. Bormann, "Problem + Statement and Requirements for IPv6 over Low-Power + Wireless Personal Area Network (6LoWPAN) Routing", + RFC 6606, DOI 10.17487/RFC6606, May 2012, + <https://www.rfc-editor.org/info/rfc6606>. + + [RFC8025] Thubert, P., Ed. and R. Cragie, "IPv6 over Low-Power + Wireless Personal Area Network (6LoWPAN) Paging Dispatch", + RFC 8025, DOI 10.17487/RFC8025, November 2016, + <https://www.rfc-editor.org/info/rfc8025>. + + [RFC8138] Thubert, P., Ed., Bormann, C., Toutain, L., and R. Cragie, + "IPv6 over Low-Power Wireless Personal Area Network + (6LoWPAN) Routing Header", RFC 8138, DOI 10.17487/RFC8138, + April 2017, <https://www.rfc-editor.org/info/rfc8138>. + + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC + 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, + May 2017, <https://www.rfc-editor.org/info/rfc8174>. + + [RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6 + (IPv6) Specification", STD 86, RFC 8200, + DOI 10.17487/RFC8200, July 2017, + <https://www.rfc-editor.org/info/rfc8200>. + + [RFC8930] Watteyne, T., Ed., Thubert, P., Ed., and C. Bormann, "On + Forwarding 6LoWPAN (IPv6 over Low-Power Wireless Personal + Area Network) Fragments over a Multi-Hop IPv6 Network", + RFC 8930, DOI 10.17487/RFC8930, November 2020, + <https://www.rfc-editor.org/info/rfc8930>. + +10.2. Informative References + + [6TiSCH] Thubert, P., "An Architecture for IPv6 over the TSCH mode + of IEEE 802.15.4", Work in Progress, Internet-Draft, + draft-ietf-6tisch-architecture-29, 27 August 2020, + <https://tools.ietf.org/html/draft-ietf-6tisch- + architecture-29>. 
+ + [IEEE.802.15.4] + IEEE, "IEEE Standard for Low-Rate Wireless Networks", + IEEE Standard 802.15.4-2015, + DOI 10.1109/IEEESTD.2016.7460875, April 2016, + <http://ieeexplore.ieee.org/document/7460875/>. + + [Kent] Kent, C. and J. Mogul, "Fragmentation Considered Harmful", + SIGCOMM '87: Proceedings of the ACM workshop on Frontiers + in computer communications technology, pp. 390-401, + DOI 10.1145/55483.55524, August 1987, + <http://www.hpl.hp.com/techreports/Compaq-DEC/WRL- + 87-3.pdf>. + + [LWIG-FRAG] + Bormann, C. and T. Watteyne, "Virtual reassembly buffers + in 6LoWPAN", Work in Progress, Internet-Draft, draft-ietf- + lwig-6lowpan-virtual-reassembly-02, 9 March 2020, + <https://tools.ietf.org/html/draft-ietf-lwig-6lowpan- + virtual-reassembly-02>. + + [RFC2914] Floyd, S., "Congestion Control Principles", BCP 41, + RFC 2914, DOI 10.17487/RFC2914, September 2000, + <https://www.rfc-editor.org/info/rfc2914>. + + [RFC3031] Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol + Label Switching Architecture", RFC 3031, + DOI 10.17487/RFC3031, January 2001, + <https://www.rfc-editor.org/info/rfc3031>. + + [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition + of Explicit Congestion Notification (ECN) to IP", + RFC 3168, DOI 10.17487/RFC3168, September 2001, + <https://www.rfc-editor.org/info/rfc3168>. + + [RFC4963] Heffner, J., Mathis, M., and B. Chandler, "IPv4 Reassembly + Errors at High Data Rates", RFC 4963, + DOI 10.17487/RFC4963, July 2007, + <https://www.rfc-editor.org/info/rfc4963>. + + [RFC5033] Floyd, S. and M. Allman, "Specifying New Congestion + Control Algorithms", BCP 133, RFC 5033, + DOI 10.17487/RFC5033, August 2007, + <https://www.rfc-editor.org/info/rfc5033>. + + [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion + Control", RFC 5681, DOI 10.17487/RFC5681, September 2009, + <https://www.rfc-editor.org/info/rfc5681>. 
+ + [RFC6550] Winter, T., Ed., Thubert, P., Ed., Brandt, A., Hui, J., + Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur, + JP., and R. Alexander, "RPL: IPv6 Routing Protocol for + Low-Power and Lossy Networks", RFC 6550, + DOI 10.17487/RFC6550, March 2012, + <https://www.rfc-editor.org/info/rfc6550>. + + [RFC6552] Thubert, P., Ed., "Objective Function Zero for the Routing + Protocol for Low-Power and Lossy Networks (RPL)", + RFC 6552, DOI 10.17487/RFC6552, March 2012, + <https://www.rfc-editor.org/info/rfc6552>. + + [RFC6554] Hui, J., Vasseur, JP., Culler, D., and V. Manral, "An IPv6 + Routing Header for Source Routes with the Routing Protocol + for Low-Power and Lossy Networks (RPL)", RFC 6554, + DOI 10.17487/RFC6554, March 2012, + <https://www.rfc-editor.org/info/rfc6554>. + + [RFC7554] Watteyne, T., Ed., Palattella, M., and L. Grieco, "Using + IEEE 802.15.4e Time-Slotted Channel Hopping (TSCH) in the + Internet of Things (IoT): Problem Statement", RFC 7554, + DOI 10.17487/RFC7554, May 2015, + <https://www.rfc-editor.org/info/rfc7554>. + + [RFC7567] Baker, F., Ed. and G. Fairhurst, Ed., "IETF + Recommendations Regarding Active Queue Management", + BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015, + <https://www.rfc-editor.org/info/rfc7567>. + + [RFC8085] Eggert, L., Fairhurst, G., and G. Shepherd, "UDP Usage + Guidelines", BCP 145, RFC 8085, DOI 10.17487/RFC8085, + March 2017, <https://www.rfc-editor.org/info/rfc8085>. + + [RFC8087] Fairhurst, G. and M. Welzl, "The Benefits of Using + Explicit Congestion Notification (ECN)", RFC 8087, + DOI 10.17487/RFC8087, March 2017, + <https://www.rfc-editor.org/info/rfc8087>. + + [RFC8201] McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., + "Path MTU Discovery for IP version 6", STD 87, RFC 8201, + DOI 10.17487/RFC8201, July 2017, + <https://www.rfc-editor.org/info/rfc8201>. + + [RFC8900] Bonica, R., Baker, F., Huston, G., Hinden, R., Troan, O., + and F. 
Gont, "IP Fragmentation Considered Fragile", + BCP 230, RFC 8900, DOI 10.17487/RFC8900, September 2020, + <https://www.rfc-editor.org/info/rfc8900>. + +Appendix A. Rationale + + There are a number of uses for large packets in Wireless Sensor + Networks. Such usages may not be the most typical or represent the + largest amount of traffic over the LLN; however, the associated + functionality can be critical enough to justify extra care for + ensuring effective transport of large packets across the LLN. + + The list of those usages includes: + + Towards the LLN node: + + Firmware update: For example, a new version of the LLN node + software is downloaded from a system manager over unicast or + multicast services. Such a reflashing operation typically + involves updating a large number of similar LLN nodes over a + relatively short period of time. + + Packages of commands: A number of commands or a full + configuration can be packaged as a single message to ensure + consistency and enable atomic execution or complete rollback. + Until such commands are fully received and interpreted, the + intended operation will not take effect. + + From the LLN node: + + Waveform captures: A number of consecutive samples are measured + at a high rate for a short time and then are transferred from a + sensor to a gateway or an edge server as a single large report. + + Data logs: LLN nodes may generate large logs of sampled data for + later extraction. LLN nodes may also generate system logs to + assist in diagnosing problems on the node or network. + + Large data packets: Rich data types might require more than one + fragment. + + Uncontrolled firmware download or waveform upload can easily result + in a massive increase of the traffic and saturate the network. 
+ + When a fragment is lost in transmission, the lack of recovery in the + original fragmentation system of RFC 4944 implies that all fragments + would need to be resent, further contributing to the congestion that + caused the initial loss and potentially leading to congestion + collapse. + + This saturation may lead to excessive radio interference or random + early discard (leaky bucket) in relaying nodes. Additional queuing + and memory congestion may result while waiting for a low-power next + hop to emerge from its sleep state. + + Considering that RFC 4944 defines an MTU as 1280 bytes, and that in + most incarnations (except 802.15.4g) an IEEE Std 802.15.4 frame can + limit the link-layer payload to as few as 74 bytes, a packet might be + fragmented into at least 18 fragments at the 6LoWPAN shim layer. + Taking into account the worst-case header overhead for 6LoWPAN + Fragmentation and Mesh Addressing headers will increase the number of + required fragments to around 32. This level of fragmentation is much + higher than that traditionally experienced over the Internet with + IPv4 fragments. At the same time, the use of radios increases the + probability of transmission loss, and mesh-under techniques compound + that risk over multiple hops. + + Mechanisms such as TCP or application-layer segmentation could be + used to support end-to-end reliable transport. One option to support + bulk data transfer over a frame-size-constrained LLN is to set the + Maximum Segment Size to fit within the link maximum frame size. + However, doing so can add significant header overhead to each + 802.15.4 frame and cause extraneous acknowledgments across the LLN + compared to the method in this specification. + +Appendix B. Requirements + + For one-hop communications, a number of LLN link layers propose a + local acknowledgment mechanism that is enough to detect and recover + the loss of fragments. 
In a multi-hop environment, an end-to-end + fragment recovery mechanism might be a good complement to a hop-by- + hop Medium Access Control (MAC) recovery. This document introduces a + simple protocol to recover individual fragments between 6LFF + endpoints that may be multiple hops away. + + The method addresses the following requirements of an LLN: + + Number of fragments: The recovery mechanism must support highly + fragmented packets, with a maximum of 32 fragments per packet. + + Minimum acknowledgment overhead: Because the radio is half duplex, + and because of silent time spent in the various medium access + mechanisms, an acknowledgment consumes roughly as many resources + as a data fragment. + + The new end-to-end fragment recovery mechanism should be able to + acknowledge multiple fragments in a single message and not require + an acknowledgment at all if fragments are already protected at a + lower layer. + + Controlled latency: The recovery mechanism must succeed or give up + within the time boundary imposed by the recovery process of the + upper-layer protocols. + + Optional congestion control: The aggregation of multiple concurrent + flows may lead to the saturation of the radio network and + congestion collapse. + + The recovery mechanism should provide means for controlling the + number of fragments in transit over the LLN. + +Appendix C. Considerations on Congestion Control + + Considering that a multi-hop LLN can be a very sensitive environment + due to the limited queuing capabilities of a large population of its + nodes, this document recommends a simple and conservative approach to + congestion control, based on TCP congestion avoidance. + + Congestion on the forward path is assumed in case of packet loss, and + packet loss is assumed upon timeout. 
This document provides a + way to control the number of outstanding fragments, i.e., fragments + that have been transmitted but not yet acknowledged and that are still + covered by the ARQ timer. + + Congestion on the forward path can also be indicated by an ECN + mechanism. Though whether and how ECN [RFC3168] is carried out over + the LoWPAN is out of scope, this document provides a way for the + destination endpoint to echo an ECN indication back to the + fragmenting endpoint in an acknowledgment message, as represented in + Figure 4 in Section 5.2. + + While echoing the ECN at the reassembling endpoint is mandatory, this + specification only provides a minimal behavior on the fragmenting + endpoint. If an "E" flag is received, the window SHOULD be reduced + by at least 1, but not below 1. Halving the window for each "E" flag + received could be a good compromise, but it needs further + experimentation. A very simple implementation may just reset the + window to 1, so the fragments are sent and acknowledged one by one. + + Note that any action performed upon detection of congestion applies + only to the transmission of one datagram; the next datagram starts + with the configured Window_Size again. + + The exact use of the Acknowledgment Request flag and of the window is + left to the implementation. An optimistic implementation could send + all the fragments up to Window_Size, setting the Acknowledgment + Request "X" flag only on the last fragment; wait for the bitmap, + which means a gap of half a round-trip time; and resend the losses. + A pessimistic implementation could set the "X" flag on the first + fragment to check that the path works and open the window only upon + receiving the RFRAG-ACK. It could then set an "X" flag again on the + second fragment and use the window as a credit to send up to + Window_Size before it is blocked.
In that case, if the RFRAG-ACK + comes back before the window starves, the gating factor is the inter- + frame gap. If the RFRAG-ACK does not arrive in time, the Window_Size + is the gating factor, and the transmission of the datagram is + delayed. + + Note that even though the inter-frame gap can be used as a flow + control or a congestion control measure, it also plays a critical + role in wireless collision avoidance. In particular, when a mesh + operates on the same channel over multiple hops, the forwarding of a + fragment over a given hop may collide with the forwarding of the + following fragment over a previous hop that is in the same + interference domain. To prevent this, the fragmenting endpoint is + required to pace individual fragments within a transmit window with + an inter-frame gap. This is needed to ensure that a given fragment is + sent only when the previous fragment has had a chance to progress + beyond the interference domain of this hop. In the case of 6TiSCH + [6TiSCH], which operates over the Time-Slotted Channel Hopping (TSCH) + mode of operation of IEEE 802.15.4 [RFC7554], a fragment is forwarded + over a different channel at a different time, so it makes sense to + transmit the next fragment as soon as the previous fragment has had + its chance to be forwarded at the next hop. + + Depending on the setting of the Window_Size and the inter-frame gap, + how the window is used, and the number of hops, the Window_Size may + or may not become the gating factor that blocks the transmission. If + the sender uses the Window_Size as a credit: + + * a conservative Window_Size of, say, 3 will be the gating factor + that limits the transmission rate of the sender (and causes + transmission gaps longer than the inter-frame gap) as soon as the + number of hops exceeds 3 in a TSCH network and 5-9 in a + single-frequency mesh. The more hops there are, the more latency + the starving window adds to the transmission.
+ + * The recommendation to align the Window_Size to the round-trip time + divided by the time per fragment matches the Window_Size to the + time it takes to get the RFRAG-ACK back before the window starves. + A Window_Size larger than that increases the chance of congestion + but does not improve the forward throughput. Considering that the + RFRAG-ACK takes the same path as the fragment and assuming that it + travels at roughly the same speed, an inter-frame gap that separates + fragments by 2 hops leads to a Window_Size that is roughly the + number of hops. + + * Setting the Window_Size to 32 minimizes the cost of the + acknowledgment in a constrained network and frees bandwidth for + the fragments in a half-duplex network. Doing so increases the + risk of congestion if a bottleneck forms, but it optimizes the use + of resources under normal conditions. In that case, the only + protection for the network is the inter-frame gap, which must be + chosen carefully to prevent the formation of a bottleneck. + + From the standpoint of a source 6LoWPAN endpoint, an outstanding + fragment is a fragment that was sent but for which no explicit + acknowledgment has yet been received. This means that the fragment + might be on the path, or received but not yet acknowledged, or the + acknowledgment might be on the path back. It is also possible that + either the fragment or the acknowledgment was lost on the way. + + From the fragmenting endpoint's standpoint, all outstanding fragments + might still be in the network and contribute to its congestion. + There is an assumption, though, that after a certain amount of time, + a frame is either received or lost, so it no longer causes + congestion. This amount of time can be estimated based on the round- + trip time between the 6LoWPAN endpoints. Lacking a better-adapted + technique, the method detailed in "Computing TCP's Retransmission + Timer" [RFC6298] may be used for that computation.
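As a non-normative illustration, the following Python sketch shows one way a fragmenting endpoint might apply the RFC 6298 estimator to decide how long an outstanding fragment is still assumed to be in the network, together with the window rules discussed in this appendix: a Window_Size of roughly the round-trip time divided by the time per fragment, capped at 32 fragments, and halving the window on each "E" flag without going below 1. The constants ALPHA, BETA, and K, the 1-second initial and minimum RTO, the exponential back-off, and the optional 60-second cap come from RFC 6298. The names RtoEstimator, suggested_window, and on_ecn_echo, as well as the 10 ms clock granularity, are hypothetical and are not defined by this specification. Whether RFC 6298's 1-second floor suits a given LLN is an assumption, so it is left as a parameter. Halving is only one of the reactions this appendix allows.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Constants from RFC 6298, Section 2.
ALPHA = 1 / 8        # gain applied to the smoothed RTT (SRTT)
BETA = 1 / 4         # gain applied to the RTT variation (RTTVAR)
K = 4                # multiplier of RTTVAR in the RTO
MAX_FRAGMENTS = 32   # at most 32 fragments per datagram (Appendix B)


@dataclass
class RtoEstimator:
    """Hypothetical per-destination retransmission timer, per RFC 6298."""
    clock_granularity: float = 0.01  # G in seconds (assumed value)
    min_rto: float = 1.0             # RFC 6298 floor; an LLN might tune it
    max_rto: float = 60.0            # RFC 6298 allows a cap of at least 60 s
    rto: float = 1.0                 # initial RTO before any RTT sample
    srtt: Optional[float] = None
    rttvar: Optional[float] = None

    def on_rtt_sample(self, r: float) -> float:
        """Update SRTT, RTTVAR, and RTO with a new RTT measurement r."""
        if self.srtt is None or self.rttvar is None:
            # First measurement (RFC 6298, Section 2.2).
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR uses the old SRTT, so it is updated first (Section 2.3).
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        rto = self.srtt + max(self.clock_granularity, K * self.rttvar)
        # Clamp to the configured floor and cap (Sections 2.4 and 2.5).
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto

    def on_timeout(self) -> float:
        """Back off the timer after an expiry (RFC 6298, Section 5.5)."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto


def suggested_window(srtt: float, time_per_fragment: float) -> int:
    """Window_Size of about RTT / time per fragment, clamped to [1, 32]."""
    window = math.ceil(srtt / time_per_fragment)
    return max(1, min(window, MAX_FRAGMENTS))


def on_ecn_echo(window: int) -> int:
    """Halve the window on an "E" flag, never going below 1."""
    return max(1, window // 2)


if __name__ == "__main__":
    est = RtoEstimator()
    print(est.on_rtt_sample(2.0))         # 6.0  (2 + 4 * 1)
    print(est.on_rtt_sample(2.0))         # 5.0  (2 + 4 * 0.75)
    print(est.on_timeout())               # 10.0 (backed off)
    print(suggested_window(0.5, 0.125))   # 4
    print(on_ecn_echo(4))                 # 2
```

With a first RTT sample of 2 seconds, the estimator yields an RTO of 6 seconds; a fragmenting endpoint could then stop counting a fragment as outstanding once that long has elapsed without an RFRAG-ACK covering it. Any window reduction applies only to the current datagram, and the next datagram starts again from the configured Window_Size.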
+ + This specification provides the necessary tools for the fragmenting + endpoint to take congestion control actions and protect the network, + but it leaves the implementation free to select the action to be + taken. The intention is to use it to build experience and to specify + the congestion control actions more precisely in one or more future + specifications. "Congestion Control Principles" [RFC2914] and + "Specifying New Congestion Control Algorithms" [RFC5033] provide + guidance that should help in this process. + + [RFC7567] and [RFC5681] provide deeper information on why congestion + control is needed and how TCP handles it. Basically, the goal here + is to manage the number of fragments present in the network; this is + achieved by reducing the number of outstanding fragments over a + congested path by throttling the sources. + +Acknowledgments + + The author wishes to thank Michel Veillette, Dario Tedeschi, Laurent + Toutain, Carles Gomez Montenegro, Thomas Watteyne, and Michael + Richardson for their in-depth reviews and comments. Also, many + thanks to Roman Danyliw, Peter Yee, Colin Perkins, Tirumaleswar + Reddy.K, Éric Vyncke, Warren Kumari, Magnus Westerlund, Erik + Nordmark, and especially Benjamin Kaduk and Mirja Kühlewind for their + careful reviews and help during the IETF Last Call and IESG review + process. Thanks to Jonathan Hui, Jay Werb, Christos Polyzois, + Soumitri Kolavennu, Pat Kinney, Margaret Wasserman, Richard Kelsey, + Carsten Bormann, and Harry Courtice for their various contributions + in the long process that led to this document. + +Author's Address + + Pascal Thubert (editor) + Cisco Systems, Inc. + Building D + 45 Allee des Ormes - BP1200 + 06254 MOUGINS - Sophia Antipolis + France + + Phone: +33 497 23 26 34 + Email: pthubert@cisco.com