Diffstat (limited to 'doc/rfc/rfc9638.txt')
-rw-r--r-- | doc/rfc/rfc9638.txt | 1175 |
1 files changed, 1175 insertions, 0 deletions
diff --git a/doc/rfc/rfc9638.txt b/doc/rfc/rfc9638.txt
new file mode 100644
index 0000000..14ccb9b
--- /dev/null
+++ b/doc/rfc/rfc9638.txt
@@ -0,0 +1,1175 @@

Internet Engineering Task Force (IETF)                   S. Boutros, Ed.
Request for Comments: 9638                             Ciena Corporation
Category: Informational                             D. Eastlake 3rd, Ed.
ISSN: 2070-1721                                              Independent
                                                          September 2024


Network Virtualization over Layer 3 (NVO3) Encapsulation Considerations

Abstract

   The IETF Network Virtualization Overlays (NVO3) Working Group
   developed considerations for a common encapsulation that addresses
   various network virtualization overlay technical concerns. This
   document provides a record, for the benefit of the IETF community, of
   the considerations arrived at by the NVO3 Working Group starting from
   the output of the NVO3 encapsulation Design Team. These
   considerations may be helpful with future deliberations by working
   groups over the choice of encapsulation formats.

   There are implications of having different encapsulations in real
   environments consisting of both software and hardware implementations
   and within and spanning multiple data centers. For example,
   Operations, Administration, and Maintenance (OAM) functions such as
   path MTU discovery become challenging with multiple encapsulations
   along the data path.

   Based on these considerations, the NVO3 Working Group determined that
   Generic Network Virtualization Encapsulation (Geneve) with a few
   modifications is the common encapsulation. This document provides
   more details, particularly in Section 7.

Status of This Memo

   This document is not an Internet Standards Track specification; it is
   published for informational purposes.

   This document is a product of the Internet Engineering Task Force
   (IETF). It represents the consensus of the IETF community. It has
   received public review and has been approved for publication by the
   Internet Engineering Steering Group (IESG).
   Not all documents approved by the IESG are candidates for any level
   of Internet Standard; see Section 2 of RFC 7841.

   Information about the current status of this document, any errata,
   and how to provide feedback on it may be obtained at
   https://www.rfc-editor.org/info/rfc9638.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Revised BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as described
   in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Design Team and Working Group Process
   3.  Terminology
   4.  Abbreviations, Acronyms, and Definitions
   5.  Encapsulation Issues and Background
     5.1.  Geneve
     5.2.  Generic UDP Encapsulation (GUE)
     5.3.  Generic Protocol Extension (GPE) for VXLAN
   6.  Common Encapsulation Considerations
     6.1.  Current Encapsulations
     6.2.  Useful Extensions Use Cases
       6.2.1.  Telemetry Extensions
       6.2.2.  Security/Integrity Extensions
       6.2.3.  Group-Based Policy
     6.3.  Hardware Considerations
     6.4.  Extension Size
     6.5.  Ordering of Extension Headers
     6.6.  TLV versus Bit Fields
     6.7.  Control Plane Considerations
     6.8.  Split NVE
     6.9.  Larger VNI Considerations
   7.  Recommendations
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Encapsulation Comparison
     A.1.  Overview
     A.2.  Extensibility
       A.2.1.  Innate Extensibility Support
       A.2.2.  Extension Parsing
       A.2.3.  Critical Extensions
       A.2.4.  Maximal Header Length
     A.3.  Encapsulation Header
       A.3.1.  Virtual Network Identifier (VNI)
       A.3.2.  Next Protocol
       A.3.3.  Other Header Fields
     A.4.  Comparison Summary
   Acknowledgements
   Contributors
   Authors' Addresses

1.  Introduction

   The NVO3 Working Group is chartered to gather requirements and
   develop solutions for network virtualization data planes based on
   encapsulation of virtual network traffic over an IP-based underlay
   data plane. Requirements include due consideration for OAM and
   security. Based on these requirements, the WG was to select, extend,
   and/or develop one or more data plane encapsulation formats.

   This led to WG Internet-Drafts and an RFC describing three
   encapsulations as follows:

   *  "Geneve: Generic Network Virtualization Encapsulation" [RFC8926]

   *  "Generic UDP Encapsulation" [GUE]

   *  "Generic Protocol Extension for VXLAN (VXLAN-GPE)" [VXLAN-GPE]

   Discussion on the list and in face-to-face meetings identified a
   number of technical problems with each of these encapsulations.
   Furthermore, there was a clear consensus at the 96th IETF meeting in
   Berlin that the working group should progress only one data plane
   encapsulation, to maximize interoperability. In order to overcome a
   deadlock on the encapsulation decision, the WG consensus was to form
   a Design Team [RFC2418] to resolve this issue and provide initial
   considerations.

2.  Design Team and Working Group Process

   The Design Team was to select one of the proposed encapsulations and
   enhance it to address the technical concerns. The goals were simple
   evolution of deployed networks as well as applicability to all
   locations in the NVO3 architecture. The Design Team was to
   specifically select a design that allows for future extensibility but
   is not burdensome on hardware implementations.
   The selected design also needed to operate well with the Internet
   Control Message Protocol (ICMP) and in Equal-Cost Multipath (ECMP)
   environments. If further extensibility is required, then it should
   be done in such a manner that it does not require the consent of an
   entity outside of the IETF.

   The output of the Design Team was then processed through the working
   group, resulting in a working group consensus for this document.

3.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

4.  Abbreviations, Acronyms, and Definitions

   The following abbreviations and acronyms are used in this document:

   ACL:  Access Control List

   ECMP:  Equal-Cost Multipath

   EVPN:  Ethernet VPN [RFC8365]

   Geneve:  Generic Network Virtualization Encapsulation [RFC8926]

   GPE:  Generic Protocol Extension

   GUE:  Generic UDP Encapsulation [GUE]

   HMAC:  Hash-Based Message Authentication Code [RFC2104]

   IEEE:  Institute of Electrical and Electronics Engineers
      (<https://www.ieee.org/>)

   NIC:  Network Interface Card (refers to network interface hardware
      that is not necessarily a discrete "card")

   NSH:  Network Service Header [RFC8300]

   NVA:  Network Virtualization Authority

   NVE:  Network Virtualization Edge (refers to an NVE device)

   NVO3:  Network Virtualization over Layer 3

   OAM:  Operations, Administration, and Maintenance [RFC6291]

   PWE3:  Pseudowire Emulation Edge-to-Edge

   TCAM:  Ternary Content-Addressable Memory

   TLV:  Type-Length-Value

   Transit device:  Refers to underlay network devices between NVEs.

   UUID:  Universally Unique Identifier

   VNI:  Virtual Network Identifier

   VXLAN:  Virtual eXtensible Local Area Network [RFC7348]

5.  Encapsulation Issues and Background

   The following subsections describe issues with current encapsulations
   as discussed by the NVO3 WG. Numerous extensions and options have
   been designed for GUE and Geneve that may help resolve some of these
   issues, but these have not yet been validated by the WG.

   Also included are diagrams and information on the candidate
   encapsulations. These are mostly copied from other documents. Since
   each protocol is assumed to be sent over UDP, an initial UDP header
   is shown that would be preceded by an IPv4 or IPv6 header.

5.1.  Geneve

   The Geneve packet format, taken from [RFC8926], is shown in Figure 1
   below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Source Port         |       Dest Port = 6081 Geneve |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           UDP Length          |        UDP Checksum           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Geneve Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Virtual Network Identifier (VNI)       |    Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                    Variable-Length Options                    ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        Figure 1: Geneve Header

   The type of payload being carried is indicated by an Ethertype
   [RFC9542] in the Protocol Type field in the Geneve header; Ethernet
   itself is represented by Ethertype 0x6558. See [RFC8926] for details
   concerning UDP header fields. The O bit indicates an OAM packet.
   The Geneve C bit is the "Critical" bit, which means that the options
   must be processed or the packet discarded.
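   The following Python sketch, which is not part of the Geneve
   specification or of this document's recommendations, packs and
   parses the fixed 8-byte Geneve base header shown in Figure 1. It
   assumes the RFC 8926 field layout: a 2-bit Ver, a 6-bit Opt Len
   counted in 4-byte words, the O and C bits followed by 6 reserved
   bits, a 16-bit Protocol Type, and a 24-bit VNI followed by 8 reserved
   bits. The names GeneveHeader, pack_geneve, parse_geneve, and
   inner_payload are hypothetical. Because Opt Len is only 6 bits, the
   largest possible Geneve header is 8 + 63 x 4 = 260 bytes, and a
   receiver can locate the inner payload from Opt Len alone without
   parsing any option.

```python
import struct
from dataclasses import dataclass

ETHERTYPE_TEB = 0x6558     # Protocol Type value for an Ethernet payload
BASE_HEADER_LEN = 8        # fixed part of the Geneve header, in bytes


@dataclass
class GeneveHeader:
    version: int
    opt_len_words: int     # Opt Len: length of the options in 4-byte words
    oam: bool              # O bit
    critical: bool         # C bit
    protocol_type: int     # Ethertype of the inner payload
    vni: int               # 24-bit Virtual Network Identifier

    @property
    def total_len(self) -> int:
        """Bytes from the start of the Geneve header to the inner payload."""
        return BASE_HEADER_LEN + 4 * self.opt_len_words


def pack_geneve(h: GeneveHeader) -> bytes:
    if not 0 <= h.vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= h.opt_len_words < 1 << 6:
        raise ValueError("Opt Len must fit in 6 bits")
    byte0 = (h.version & 0x3) << 6 | h.opt_len_words
    byte1 = h.oam << 7 | h.critical << 6          # low 6 bits are Rsvd.
    # The VNI occupies the top 24 bits of the last word; Reserved is zero.
    return struct.pack("!BBHI", byte0, byte1, h.protocol_type, h.vni << 8)


def parse_geneve(buf: bytes) -> GeneveHeader:
    if len(buf) < BASE_HEADER_LEN:
        raise ValueError("truncated Geneve header")
    byte0, byte1, proto, word = struct.unpack_from("!BBHI", buf)
    return GeneveHeader(
        version=byte0 >> 6,
        opt_len_words=byte0 & 0x3F,
        oam=bool(byte1 & 0x80),
        critical=bool(byte1 & 0x40),
        protocol_type=proto,
        vni=word >> 8,                            # drop the Reserved byte
    )


def inner_payload(buf: bytes) -> bytes:
    """Skip the base header and every option using only Opt Len."""
    return buf[parse_geneve(buf).total_len:]


# Usage: a header announcing 8 bytes (2 words) of options.
hdr = GeneveHeader(0, 2, False, True, ETHERTYPE_TEB, 0x123456)
pkt = pack_geneve(hdr) + bytes(8) + b"inner frame"
print(parse_geneve(pkt) == hdr, inner_payload(pkt))   # True b'inner frame'
print(BASE_HEADER_LEN + 4 * 0x3F)                      # 260
```

   In this example the inner payload starts 16 bytes after the start of
   the Geneve header, whatever the 8 bytes of options contain.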
   Issues with Geneve [RFC8926] are as follows:

   *  Geneve can't be implemented cost-effectively in all use cases
      because the variable-length header and order of the TLVs make it
      costly (in terms of number of gates) to implement in hardware.

   *  A maximum-length header (260 bytes, of which up to 252 bytes are
      options) doesn't fit into the largest commonly available parse
      buffer (256 bytes in a NIC). Thus, doubling the buffer size can't
      be justified unless it is mandatory for hardware to process
      additional option fields.

   The selection of Geneve despite these issues may be the result of the
   Geneve design effort, assuming that the Geneve header would typically
   be delivered to a server and parsed in software.

5.2.  Generic UDP Encapsulation (GUE)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Source Port         |      Dest Port = 6080 GUE     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           UDP Length          |        UDP Checksum           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   GUE Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0 |C|   Hlen  |  Proto/ctype  |             Flags             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                  Extensions Fields (optional)                 ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          Figure 2: GUE Header

   The type of payload being carried is indicated by an IANA protocol
   number in the Proto/ctype field. The GUE C bit (Control bit)
   indicates a control packet.

   Issues with GUE [GUE] are as follows:

   *  There were a significant number of objections to GUE related to
      the complexity of its implementation in hardware, similar to those
      noted for Geneve above, such as the variable length and possible
      high maximum length of the header.

5.3.  Generic Protocol Extension (GPE) for VXLAN

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1

   Outer UDP Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Source Port         |      Dest Port = 4790 GPE     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           UDP Length          |        UDP Checksum           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   VXLAN-GPE Header:
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|R|Ver|I|P|B|O|           Reserved            | Next Protocol |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Virtual Network Identifier (VNI)       |    Reserved   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          Figure 3: GPE Header

   The type of payload being carried is indicated by the Next Protocol
   field using a registry specific to VXLAN-GPE. The I bit indicates
   that the VNI is valid. The P bit indicates that the Next Protocol
   field is valid. The B bit indicates that the packet is an ingress
   replicated Broadcast, Unknown Unicast, or Multicast packet. The O
   bit indicates an OAM packet.

   Issues with VXLAN-GPE [VXLAN-GPE] are as follows:

   *  GPE is not backward compatible with VXLAN [RFC7348] from day one.
      Although the frame format is similar, it uses a different UDP
      port, so it would require changes to existing implementations even
      if the rest of the GPE frame were the same.

   *  GPE is insufficiently extensible. It adds a Next Protocol field
      and some flag bits to the VXLAN header but is not otherwise
      extensible.

   *  As discussed in Section 6.2.2, security (e.g., of the VNI) has not
      been addressed by GPE. Although a shim header could be added for
      security and to support other extensions, this has not been
      defined yet. More study would be needed to understand the
      implication of such a shim on offloading in NICs.

6.  Common Encapsulation Considerations

6.1.  Current Encapsulations

   Appendix A includes a detailed comparison between the three proposed
   encapsulations. The comparison indicates several common properties
   but also three major differences among the encapsulations:

   *  Extensibility: Geneve and GUE were defined with built-in
      extensibility, while VXLAN-GPE is not inherently extensible. Note
      that any of the three encapsulations can be extended using the
      Network Service Header (NSH) [RFC8300].

   *  Extension method: Geneve is extensible using Type-Length-Value
      (TLV) fields, while GUE uses a small set of possible extensions
      and a set of flags that indicate which extensions are present.

   *  Length field: Geneve and GUE include a Length field, indicating
      the length of the encapsulation header, while VXLAN-GPE does not
      include such a field. Thus, it may be harder to skip the
      encapsulation header with VXLAN-GPE.

6.2.  Useful Extensions Use Cases

   Extensions that are not vendor-specific, such as TLVs, MUST follow
   the standardization process. The following use cases for extensions
   show that there is a strong requirement to support variable-length
   extensions with possibly different subtypes.

6.2.1.  Telemetry Extensions

   In several scenarios, it is beneficial to make information available
   to the operator about the path a packet took through the network or
   through a network device as well as information about associated
   telemetry.

   This includes not only tasks like debugging, troubleshooting, and
   network planning and optimization but also policy or service level
   agreement compliance checks.

   Packet scheduling algorithms, especially for balancing traffic across
   equal-cost paths or links, often leverage information contained
   within the packet, such as protocol number, IP address, or Media
   Access Control (MAC) address.
   Thus, probe packets would need to either be sent between the exact
   same endpoints with the exact same parameters or be artificially
   constructed as "fake" packets and inserted along the path. Both
   approaches are often operationally impractical because access to the
   end system is not available or the diversity of parameters and
   associated probe packets to be created is simply too large. An
   extension providing an in-band telemetry mechanism [RFC9197] is an
   alternative in those cases.

6.2.2.  Security/Integrity Extensions

   Since the currently proposed NVO3 encapsulations do not protect their
   headers, a single-bit corruption in the VNI field could deliver a
   packet to the wrong tenant. Extension headers are needed to use any
   sophisticated security.

   The possibility of VNI spoofing with an NVO3 protocol is exacerbated
   by using UDP. Systems typically have no restrictions on applications
   being able to send to any UDP port, so an unprivileged application
   can trivially spoof VXLAN [RFC7348] packets, using arbitrary VNIs,
   for instance.

   One can envision support of an HMAC-like Message Authentication Code
   (MAC) [RFC2104] in an NVO3 extension to authenticate the header and
   the outer IP addresses, thereby preventing attackers from injecting
   packets with spoofed VNIs.

   Another aspect of security is payload security. This results in
   packets that look like the following:

      IP|UDP|NVO3 Encap|DTLS/IPsec-ESP Extension|payload

   This is desirable because:

   *  we still have the UDP header for ECMP,

   *  the NVO3 header is in plain text so it can be read by network
      elements, and

   *  different security or other payload transforms can be supported on
      a single UDP port (we don't need a separate UDP port for
      DTLS/IPsec; see [RFC9147] and [RFC6071], respectively).

6.2.3.  Group-Based Policy

   Another use case would be to carry the Group-Based Policy (GBP)
   source group information within an NVO3 header extension in a similar
   manner as has been implemented for VXLAN [VXLAN-GROUP]. This allows
   various forms of policy such as access control and QoS to be applied
   between abstract groups rather than coupled to specific endpoint
   addresses.

6.3.  Hardware Considerations

   Hardware restrictions should be taken into consideration along with
   future hardware enhancements that may provide more flexible metadata
   (MD) processing. However, the set of options implemented in hardware
   will be a subset of those implemented in software, because software
   NVEs are likely to add features, and hence option support, at a more
   rapid rate.

   It is hard to predict which options will be implemented in which
   piece of hardware and when. That depends on whether the hardware
   will be in the form of:

   *  a NIC providing increasing offload capabilities to software NVEs,
      or

   *  a switch chip being used as an NVE gateway towards non-NVO3 parts
      of the network, or even

   *  a transit device that participates in the NVO3 data plane, e.g.,
      for OAM purposes.

   As a result, it does not appear useful to prescribe an order for the
   options that puts those likely to be implemented in hardware first.
   We can't decide such an order when we define the options; however, a
   control plane can enforce such an order for some hardware
   implementations.

   We do know that hardware initially needs to be able to efficiently
   skip over the NVO3 header to find the inner payload. That is needed
   both for NICs implementing various TCP offload mechanisms and for
   transit devices and NVEs applying policy or ACLs to the inner
   payload.

6.4.  Extension Size

   Extension header length has a significant impact on hardware and
   software implementations.
   A maximum total header length that is too small will unnecessarily
   constrain software flexibility. A maximum total header length that is
   too large will place a nontrivial cost on hardware implementations.
   Thus, the Design Team recommends that both a minimum and a maximum
   total available extension header length be specified. The maximum
   total header length is determined by the size of the bit field
   allocated for the total extension header length field. The risk with
   this approach is that it may be difficult to extend the total header
   size in the future. The minimum total header length is determined by
   a requirement in the specifications that all implementations must
   meet. The risk with this approach is that implementations will
   support only the minimum total header length, which would then become
   the de facto maximum total header length.

   The recommended minimum total available header length is 64 bytes.

   The size of an extension header should always be a multiple of 4
   bytes (4-byte aligned).

   The maximum length of a single option should be large enough to meet
   the different extension use case requirements, e.g., for in-band
   telemetry and future use.

6.5.  Ordering of Extension Headers

   To support hardware nodes at the target NVE or at a transit device
   that can process one or a few extension headers in TCAM, a control
   plane in such a deployment could signal a capability to ensure that a
   specific extension header will always appear in a specific position,
   for example, always first in the packet.

   The order of the extension headers should be hardware-friendly for
   both the sender and the receiver and possibly some transit devices as
   well. This may require that the extension headers and their order be
   determined dynamically based on the hardware of those devices.
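   To make the size and ordering constraints above concrete, the Python
   sketch below encodes and walks a Geneve options area. It assumes the
   option TLV layout of RFC 8926: a 16-bit Option Class, an 8-bit Type
   whose high bit marks the option as critical, 3 reserved bits, and a
   5-bit Length counted in 4-byte words that excludes the 4-byte option
   header. Under that layout every option is a multiple of 4 bytes, and
   a single option carries at most 31 x 4 = 124 bytes of data, well
   short of the 256-byte telemetry use case mentioned in Section 7. The
   helper first_option_is() is a hypothetical illustration of a
   TCAM-style receiver that relies on a control-plane guarantee that one
   specific option always comes first; it is not a defined API.

```python
import struct
from typing import List, NamedTuple

OPT_HDR_LEN = 4            # Option Class (16) + Type (8) + R (3) + Length (5)
MAX_OPT_DATA = 31 * 4      # 5-bit Length in 4-byte words -> 124 bytes


class GeneveOption(NamedTuple):
    opt_class: int
    opt_type: int          # high bit of Type marks a critical option
    data: bytes

    @property
    def critical(self) -> bool:
        return bool(self.opt_type & 0x80)


def encode_options(opts: List[GeneveOption]) -> bytes:
    """Serialize options; each must be 4-byte aligned and fit in 5 bits."""
    out = bytearray()
    for o in opts:
        if len(o.data) % 4 or len(o.data) > MAX_OPT_DATA:
            raise ValueError("option data must be a multiple of 4 bytes, <= 124")
        out += struct.pack("!HBB", o.opt_class, o.opt_type, len(o.data) // 4)
        out += o.data
    return bytes(out)


def walk_options(area: bytes) -> List[GeneveOption]:
    """Parse an options area whose size is 4 * Opt Len bytes."""
    opts: List[GeneveOption] = []
    off = 0
    while off < len(area):
        if off + OPT_HDR_LEN > len(area):
            raise ValueError("truncated option header")
        cls, typ, rlen = struct.unpack_from("!HBB", area, off)
        start = off + OPT_HDR_LEN
        end = start + 4 * (rlen & 0x1F)       # ignore the 3 reserved bits
        if end > len(area):
            raise ValueError("option overruns the options area")
        opts.append(GeneveOption(cls, typ, area[start:end]))
        off = end
    return opts


def first_option_is(opts: List[GeneveOption],
                    opt_class: int, opt_type: int) -> bool:
    """Hypothetical check for a receiver that only inspects the first option."""
    return bool(opts) and (opts[0].opt_class, opts[0].opt_type) == (opt_class, opt_type)


# Usage: one critical 8-byte option followed by an empty non-critical one.
area = encode_options([GeneveOption(0x0102, 0x81, bytes(8)),
                       GeneveOption(0xFFFF, 0x01, b"")])
opts = walk_options(area)
print(len(area), [o.critical for o in opts])   # 16 [True, False]
print(first_option_is(opts, 0x0102, 0x81))     # True
```

   Attempting to encode a 256-byte option with this sketch raises
   ValueError, which is the limitation the Section 7 recommendation on
   single-option length addresses.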
   Transit devices don't participate in control plane communication
   between the endpoints and are not required to process the extension
   headers; however, if they do, they may need to process only a small
   subset of the extension headers that will be consumed by target NVEs.

6.6.  TLV versus Bit Fields

   If there is a well-known initial set of options that is likely to be
   implemented in software and in hardware, it can be efficient to use
   the bit fields approach to indicate the presence of extensions as in
   GUE. However, as described in Section 6.3, if options are added over
   time and different subsets of options are likely to be implemented in
   different pieces of hardware, then it would be hard for the IETF to
   specify which options should get the early bit fields. TLVs are much
   more flexible, which avoids the need to determine the relative
   importance of different options. However, general TLVs of arbitrary
   order, size, and repetition are difficult to implement in hardware.
   A middle ground is to use TLVs with restrictions on their size and
   alignment, observing that individual TLVs can have a fixed length,
   and to support via the control plane a method such that an NVE will
   only receive options that it needs and implements. The control plane
   approach can potentially be used to control the order of the TLVs
   sent to a particular NVE. Note that transit devices are not likely
   to participate in the control plane; hence, to the extent that they
   need to participate in option processing, some other method must be
   used. Transit devices would also have difficulty with GUE bit fields
   defined in the future for new options.

   A benefit of TLVs from a hardware perspective is that they are
   self-describing, i.e., all the information is in the TLV.
   In a bit field approach, the hardware needs to look up the bit to
   determine the length of the data associated with the bit through some
   separate table, which would add hardware complexity.

   There are use cases where multiple software modules run on an NVE.
   These can be modules such as a diagnostic module by one vendor that
   does packet sampling and another module from a different vendor that
   implements a firewall. Using a TLV format, it is easier to have
   different software modules process different TLVs without
   conflicting with each other. Such TLVs could be standard extensions
   or vendor-specific extensions. This can help with hardware
   modularity as well. There are some implementations with options that
   allow different software modules, like MAC learning and security, to
   process different options.

6.7.  Control Plane Considerations

   Given that we want to allow considerable flexibility and
   extensibility (e.g., for software NVEs), yet want to be able to
   support important extensions in less flexible contexts such as
   hardware NVEs, it is useful to consider the control plane. In this
   section, "control plane" means protocols, such as EVPN [RFC8365] and
   others, as well as deployment-specific configurations.

   If each NVE can express in the control plane that it only supports
   certain extensions (which could be a single extension, or a few), and
   the source NVEs only include supported extensions in the NVO3
   packets, then the target NVE can use a simpler parser (e.g., a TCAM
   might be usable to look for a single NVO3 extension) and the depth of
   the inner payload in the NVO3 packet will be minimized. Furthermore,
   if the target NVE cares about a few extensions and can express in the
   control plane the desired order of those extensions in the NVO3
   packets, then the deployment can provide useful functionality with
   simplified hardware requirements for the target NVE.
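   The control plane approach described above can be sketched as a
   per-destination filter at the source NVE. In this hypothetical Python
   example, each target NVE has advertised (through a control plane or
   through per-NVE configuration) an ordered list of the options it
   supports, identified here as (Option Class, Type) pairs, together
   with a limit on how many options it can process. The source NVE sends
   only advertised options, in the advertised order, up to that limit.
   The function name, the option identifiers, and the limit are
   assumptions for illustration; no existing control plane message
   format is implied.

```python
from typing import Dict, List, Sequence, Tuple

# (Option Class, Type) pair identifying one option TLV.
OptionKey = Tuple[int, int]


def select_options(
    available: Dict[OptionKey, bytes],
    advertised: Sequence[OptionKey],
    max_options: int,
) -> List[Tuple[OptionKey, bytes]]:
    """Choose which options to send to one target NVE.

    available   -- options the source NVE could attach to this packet
    advertised  -- options the target NVE supports, in its preferred order
    max_options -- how many options the target NVE can process
    """
    chosen: List[Tuple[OptionKey, bytes]] = []
    for key in advertised:
        if len(chosen) >= max_options:
            break                                  # respect the target's limit
        if key in available:
            chosen.append((key, available[key]))   # target's order, not ours
    return chosen


# Hypothetical option identifiers used only for this example.
TELEMETRY, HMAC, GBP, TRACE = (0x0100, 1), (0x0100, 2), (0x0100, 3), (0x0100, 4)
available = {TELEMETRY: bytes(8), HMAC: bytes(16), GBP: bytes(4), TRACE: bytes(16)}

hw_nve = select_options(available, [HMAC, GBP], max_options=2)
sw_nve = select_options(available, [TELEMETRY, HMAC, GBP, TRACE], max_options=8)
print([k for k, _ in hw_nve])   # [(256, 2), (256, 3)]
print(len(sw_nve))              # 4
```

   With this approach, a hardware NVE that advertised only the HMAC and
   GBP options never sees the telemetry or trace options, while a
   software NVE that advertised all four receives the full set. Because
   unsupported options are never sent, the receiver never has to drop a
   packet for carrying an unknown mandatory option.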
   Transit devices that are not aware of the NVO3 extensions also
   benefit somewhat from such an approach, since the inner payload is
   less deep in the packet if no extraneous extension headers are
   included in the packet. In general, a transit device is not likely
   to participate in the NVO3 control plane. However, configuration
   mechanisms can take into account limitations of the transit devices
   used in particular deployments.

   Note that with this approach, different NVEs could desire different
   extensions or sets of extensions, which means that the source NVE
   needs to be able to place different sets of extensions in different
   NVO3 packets, and perhaps in a different order. This approach also
   assumes that underlay multicast or replication servers are not used
   together with NVO3 extension headers.

   There is a need to consider mandatory extensions versus optional
   extensions. Mandatory extensions require the receiver to drop the
   packet if the extension is unknown. A control plane mechanism can
   prevent the need for dropping unknown extensions, since such
   extensions would not be sent to target NVEs that do not support them.

   Control planes defined today would need to add the ability to
   describe the different encapsulations. Thus, EVPN [RFC8365], and any
   other control plane protocol that the IETF defines, should perhaps
   have a way to indicate the supported NVO3 extensions and their order
   for each of the encapsulations supported.

   Developing a separate document on guidance for option processing and
   control plane participation should be considered. This should
   provide examples and guidance on the range of usage models and
   deployment scenarios for specific options. It should also provide
   examples of option ordering that are relevant to specific
   deployments. This includes endpoints and middleboxes that are using
   the options.
   Having the control plane negotiate the constraints is the most
   appropriate and flexible way to address these requirements.

6.8.  Split NVE

   If there is a need for hosts to send and receive options in a split
   NVE case [RFC8394], this is possible using any of the existing
   extensible encapsulations (GPE with NSH, GUE, or Geneve) by defining
   a way to carry those over other transports. An NSH can already be
   used over different transports.

   If this is needed with other encapsulations, it can be done by
   defining an Ethertype so that it can be carried over Ethernet and
   IEEE Std 802.1Q [IEEE802.1Q].

   If there is a need to carry other encapsulations over MPLS, it would
   require an EVPN control plane to signal that other encapsulation
   headers and options will be present in front of the Layer 2 (L2)
   packet. The VNI can be ignored in the header, and the MPLS label
   will be the one used to identify the EVPN L2 instance.

6.9.  Larger VNI Considerations

   Whether we should make the VNI 32 bits or larger was one of the
   topics considered. The benefit of keeping a 24-bit VNI is that it
   avoids unnecessary changes to existing proposals and implementations,
   almost all of which, if not all, use a 24-bit VNI. If we need a
   larger VNI, perhaps for a telemetry case, an extension can be used to
   support that.

7.  Recommendations

   The Design Team reported that Geneve was most suitable as a starting
   point for a proposed standard for network virtualization, for the
   following reasons. This conclusion was supported by the NVO3 Working
   Group.

   1.  On whether the VNI should be in the base header or in an
       extension header and whether it should be a 24-bit or 32-bit
       field (see Section 6.9), it was agreed that the VNI is critical
       information for network virtualization and MUST be present in all
       packets.
       It was also agreed that a 24-bit VNI, which is supported by
       Geneve, matches the existing widely used encapsulation formats,
       i.e., VXLAN [RFC7348] and Network Virtualization Using Generic
       Routing Encapsulation (NVGRE) [RFC7637], and hence is more
       suitable to use going forward.

   2.  The Geneve header carries the total options length, which allows
       NIC offload operations and transit devices to skip over the
       options and view flow information in the inner payload.

   3.  The option of using an NSH [RFC8300] with VXLAN-GPE was
       considered, but given that an NSH is targeted at service chaining
       and contains service chaining information, it is less suitable
       for the network virtualization use case. Another downside of
       VXLAN-GPE was its lack of a header length field, which makes
       skipping over the headers to process inner payloads more
       difficult. A total options length is present in Geneve. It is
       not possible to skip any options in the middle with VXLAN-GPE.
       In principle, a split between a base header and a header with
       options is interesting (whether that options header is an NSH or
       some new header without ties to a service path). Whether it
       would make sense to either use an NSH for this or define a new
       NVO3 options header was explored. However, this makes it
       slightly harder to find the inner payload since the Length field
       is not in the NVO3 header itself. Thus, one more field would
       have to be extracted to compute the start of the inner payload.
       Also, if the experience with IPv6 extension headers is a guide,
       there would be a risk that key pieces of hardware might not
       implement the options header, resulting in future calls to
       deprecate its use. Making the options part of the base NVO3
       header has fewer of those issues. Even though the implementation
       of any particular option can't be predicted ahead of time, the
       option mechanism and ability to skip the options is likely to be
       broadly implemented.

   4.  The TLV style and bit field style of extension mechanisms were
       compared. It was deemed that parsing either TLVs or bit fields
       is expensive, and while bit fields may be simpler to parse, they
       are also more restrictive and require guessing which extensions
       will be widely implemented in order to get early bit assignments.
       Given that half the bits are already assigned in GUE, a widely
       deployed extension may appear in a flag extension, and this will
       require extra processing to extract the flag from the flag
       extension and then look for the extension itself. Also, bit
       fields are not flexible enough to address the requirements from
       OAM, telemetry, and security extensions for variable-length
       options and different subtypes of the same option. While TLVs
       are more flexible, a control plane can restrict the number of
       option TLVs as well as the order and size of the TLVs to limit
       this flexibility and make the TLVs simpler for a data plane
       implementation to handle.

   5.  The multi-vendor NVE case was briefly discussed, as was the need
       to allow vendors to put their own extensions in the NVE header.
       This is possible with TLVs.

   6.  It was agreed that the C bit (Critical bit) in Geneve is helpful.
       This bit indicates that the header includes options that must be
       parsed, or else the packet must be discarded. The bit allows a
       receiver NVE to easily decide whether or not to process options
       (such as a UUID-based packet trace) and decide how an optional
       extension can be ignored. Thus, a Critical bit makes it easy for
       the NVE to skip over the options not marked with such a bit.
       Therefore, the C bit should remain as defined in Geneve.

   7.  There are already some extensions of varying sizes that are being
       discussed (see Section 6.2). By using Geneve options, it is
       possible to get in-band parameters like switch ID, ingress port,
       egress port, internal delay, and queue size using TLV extensions
       for telemetry purposes from switches.
It is also possible to add + security extension TLVs like HMAC [RFC2104] and DTLS/IPsec (see + [RFC9147] and [RFC6071], respectively) to authenticate the Geneve + packet header and secure the Geneve packet payload by software or + hardware tunnel endpoints. A Group-Based Policy extension TLV + can be carried as well. + + 8. There are already implementations of Geneve options deployed in + production networks. There is new hardware supporting Geneve TLV + parsing as well. In addition, an In-band Telemetry (INT) + specification [INT] is being developed by P4.org that illustrates + the option of INT metadata carried over Geneve. Open Virtual + Network (OVN) and Open vSwitch (OVS) [OVN] have also defined one + or more option TLVs for Geneve. + + 9. Usage requirements (see Section 6) have been addressed while also + considering requirements and implementations in general + (including those for software and hardware). + + There seems to be interest in standardizing some well-known secure + option TLVs to secure the header and payload to guarantee + encapsulation header integrity and tenant data privacy. The working + group should consider standardizing such option(s). + + The following enhancements to Geneve are recommended to make it more + suitable to hardware and yet provide flexibility for software: + + * The following sort of text is recommended in Geneve documents: + while TLVs are more flexible, a control plane can restrict the + number of option TLVs as well as the order and size of the TLVs to + make it simpler for a data plane implementation in software or + hardware to handle. For example, there may be some critical + information such as a secure hash that must be processed in a + certain order at lowest latency. + + * A control plane can negotiate a subset of option TLVs and certain + TLV ordering, as well as limiting the total number of option TLVs + present in the packet, for example, to allow for hardware capable + of processing fewer options. 
Hence, the control plane needs to + have the ability to describe the supported TLV subset and its + order. + + * The Geneve documents should specify that the subset and order of + option TLVs SHOULD be configurable for each remote NVE in the + absence of a protocol control plane. + + * Geneve should follow fragmentation recommendations in overlay + services like PWE3 and the L2/L3 VPN recommendations to guarantee + MTUs large enough to accommodate the tunnel overhead ([RFC3985], + Section 5.3). + + * The Geneve documents should provide a recommendation for C bit + (Critical bit) processing. This text could specify how critical + bits can be used with control planes and specify the critical + options. + + * Given that there is a telemetry option use case for a length of + 256 bytes, it is recommended that Geneve increase the single TLV + option length to 256 bytes. + + * Requirements for Geneve to address OAM considerations, such as + alternate marking and performance measurements that need a 2-bit + field in the header, should be considered, and the need for the + current OAM bit in the Geneve header should be clarified. + + * The WG should work on security options for Geneve. + +8. Security Considerations + + This document does not introduce any additional security constraints; + however, Section 6.2.2 discusses security/integrity extensions and + this document suggests, in Section 7, that the NVO3 WG work on + security options for Geneve. + +9. IANA Considerations + + This document has no IANA actions. + +10. References + +10.1. Normative References + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + <https://www.rfc-editor.org/info/rfc2119>. + + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC + 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, + May 2017, <https://www.rfc-editor.org/info/rfc8174>. + +10.2. Informative References + + [GUE] Herbert, T., Yong, L., and O.
Zia, "Generic UDP + Encapsulation", Work in Progress, Internet-Draft, draft- + ietf-intarea-gue-09, 26 October 2019, + <https://datatracker.ietf.org/doc/html/draft-ietf-intarea- + gue-09>. + + [GUE-ENCAPSULATION] + Yong, L., Herbert, T., and O. Zia, "Generic UDP + Encapsulation (GUE) for Network Virtualization Overlay", + Work in Progress, Internet-Draft, draft-hy-nvo3-gue-4-nvo- + 04, 28 October 2016, + <https://datatracker.ietf.org/doc/html/draft-hy-nvo3-gue- + 4-nvo-04>. + + [GUE-EXTENSIONS] + Herbert, T., Yong, L., and F. Templin, "Extensions for + Generic UDP Encapsulation", Work in Progress, Internet- + Draft, draft-ietf-intarea-gue-extensions-06, 8 March 2019, + <https://datatracker.ietf.org/doc/html/draft-ietf-intarea- + gue-extensions-06>. + + [IEEE802.1Q] + IEEE, "IEEE Standard for Local and Metropolitan Area + Networks--Bridges and Bridged Networks", IEEE Std 802.1Q- + 2022, DOI 10.1109/IEEESTD.2022.10004498, December 2022, + <https://doi.org/10.1109/IEEESTD.2022.10004498>. + + [INT] P4.org Applications Working Group, "In-band Network + Telemetry (INT) Dataplane Specification", November 2020, + <https://p4.org/p4-spec/docs/INT_v2_1.pdf>. + + [OVN] Linux Foundation, "Open vSwitch", + <https://www.openvswitch.org/>. + + [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- + Hashing for Message Authentication", RFC 2104, + DOI 10.17487/RFC2104, February 1997, + <https://www.rfc-editor.org/info/rfc2104>. + + [RFC2418] Bradner, S., "IETF Working Group Guidelines and + Procedures", BCP 25, RFC 2418, DOI 10.17487/RFC2418, + September 1998, <https://www.rfc-editor.org/info/rfc2418>. + + [RFC3985] Bryant, S., Ed. and P. Pate, Ed., "Pseudo Wire Emulation + Edge-to-Edge (PWE3) Architecture", RFC 3985, + DOI 10.17487/RFC3985, March 2005, + <https://www.rfc-editor.org/info/rfc3985>. + + [RFC6071] Frankel, S. and S. 
Krishnan, "IP Security (IPsec) and + Internet Key Exchange (IKE) Document Roadmap", RFC 6071, + DOI 10.17487/RFC6071, February 2011, + <https://www.rfc-editor.org/info/rfc6071>. + + [RFC6291] Andersson, L., van Helvoort, H., Bonica, R., Romascanu, + D., and S. Mansfield, "Guidelines for the Use of the "OAM" + Acronym in the IETF", BCP 161, RFC 6291, + DOI 10.17487/RFC6291, June 2011, + <https://www.rfc-editor.org/info/rfc6291>. + + [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, + L., Sridhar, T., Bursell, M., and C. Wright, "Virtual + eXtensible Local Area Network (VXLAN): A Framework for + Overlaying Virtualized Layer 2 Networks over Layer 3 + Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, + <https://www.rfc-editor.org/info/rfc7348>. + + [RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network + Virtualization Using Generic Routing Encapsulation", + RFC 7637, DOI 10.17487/RFC7637, September 2015, + <https://www.rfc-editor.org/info/rfc7637>. + + [RFC8300] Quinn, P., Ed., Elzur, U., Ed., and C. Pignataro, Ed., + "Network Service Header (NSH)", RFC 8300, + DOI 10.17487/RFC8300, January 2018, + <https://www.rfc-editor.org/info/rfc8300>. + + [RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., + Uttaro, J., and W. Henderickx, "A Network Virtualization + Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, + DOI 10.17487/RFC8365, March 2018, + <https://www.rfc-editor.org/info/rfc8365>. + + [RFC8394] Li, Y., Eastlake 3rd, D., Kreeger, L., Narten, T., and D. + Black, "Split Network Virtualization Edge (Split-NVE) + Control-Plane Requirements", RFC 8394, + DOI 10.17487/RFC8394, May 2018, + <https://www.rfc-editor.org/info/rfc8394>. + + [RFC8926] Gross, J., Ed., Ganga, I., Ed., and T. Sridhar, Ed., + "Geneve: Generic Network Virtualization Encapsulation", + RFC 8926, DOI 10.17487/RFC8926, November 2020, + <https://www.rfc-editor.org/info/rfc8926>. + + [RFC9147] Rescorla, E., Tschofenig, H., and N. 
Modadugu, "The + Datagram Transport Layer Security (DTLS) Protocol Version + 1.3", RFC 9147, DOI 10.17487/RFC9147, April 2022, + <https://www.rfc-editor.org/info/rfc9147>. + + [RFC9197] Brockners, F., Ed., Bhandari, S., Ed., and T. Mizrahi, + Ed., "Data Fields for In Situ Operations, Administration, + and Maintenance (IOAM)", RFC 9197, DOI 10.17487/RFC9197, + May 2022, <https://www.rfc-editor.org/info/rfc9197>. + + [RFC9542] Eastlake 3rd, D., Abley, J., and Y. Li, "IANA + Considerations and IETF Protocol and Documentation Usage + for IEEE 802 Parameters", BCP 141, RFC 9542, + DOI 10.17487/RFC9542, April 2024, + <https://www.rfc-editor.org/info/rfc9542>. + + [VXLAN-GPE] + Maino, F., Ed., Kreeger, L., Ed., and U. Elzur, Ed., + "Generic Protocol Extension for VXLAN (VXLAN-GPE)", Work + in Progress, Internet-Draft, draft-ietf-nvo3-vxlan-gpe-13, + 4 November 2023, <https://datatracker.ietf.org/doc/html/ + draft-ietf-nvo3-vxlan-gpe-13>. + + [VXLAN-GROUP] + Smith, M. and L. Kreeger, "VXLAN Group Policy Option", + Work in Progress, Internet-Draft, draft-smith-vxlan-group- + policy-05, 22 October 2018, + <https://datatracker.ietf.org/doc/html/draft-smith-vxlan- + group-policy-05>. + +Appendix A. Encapsulation Comparison + +A.1. Overview + + This section presents a comparison of the three NVO3 encapsulation + proposals: Geneve [RFC8926], GUE [GUE], and VXLAN-GPE [VXLAN-GPE]. + The three encapsulations use an outer UDP/IP transport. Geneve and + VXLAN-GPE use an 8-octet header, while GUE uses a 4-octet header. In + addition to the base header, optional extensions may be included in + the encapsulation, as discussed in Appendix A.2 below. + +A.2. Extensibility + +A.2.1. Innate Extensibility Support + + The Geneve and GUE encapsulations both enable optional headers to be + incorporated at the end of the base encapsulation header. + + VXLAN-GPE does not provide innate support for header extensions. 
+ However, as discussed in [VXLAN-GPE], extensibility can be attained + to some extent if the Network Service Header (NSH) [RFC8300] is used + immediately following the VXLAN-GPE header. The NSH supports either + a fixed-size extension (MD Type 1) or a variable-size TLV-based + extension (MD Type 2). Note that NSH-over-VXLAN-GPE implies the + overhead of the 8-octet NSH in addition to the VXLAN-GPE header. + +A.2.2. Extension Parsing + + The Geneve variable-length options are defined as Type-Length-Value + (TLV) extensions. Similarly, VXLAN-GPE, when using an NSH, can + include NSH TLV-based extensions. In contrast, GUE defines a small + set of possible extension fields (proposed in [GUE-EXTENSIONS] and + [GUE-ENCAPSULATION]), and a set of flags in the GUE header that + indicate for each extension type whether it is present or not. + + TLV-based extensions, as defined in Geneve, provide the flexibility + for a large number of possible extension types. Similar behavior can + be supported in NSH-over-VXLAN-GPE when using MD Type 2. The flag- + based approach taken in GUE strives to simplify implementations by + defining a small number of possible extensions used in a fixed order. + + The Geneve and GUE headers both include a Length field that defines + the total length of the encapsulation, including the optional + extensions. This Length field simplifies parsing for transit + devices that skip the encapsulation header without parsing its + extensions. + +A.2.3. Critical Extensions + + The Geneve encapsulation header includes the C bit, which indicates + whether the current Geneve header includes critical options, that is + to say, options that must be parsed by the target NVE. If the + endpoint is not able to process a critical option, the packet is + discarded. + +A.2.4. Maximal Header Length + + The maximal header length in Geneve, including options, is 260 + octets. GUE defines the maximal header to be 128 octets.
VXLAN-GPE + uses a fixed-length header of 8 octets, unless NSH-over-VXLAN-GPE is + used, yielding an encapsulation header of up to 264 octets. + +A.3. Encapsulation Header + +A.3.1. Virtual Network Identifier (VNI) + + The Geneve and VXLAN-GPE headers both include a 24-bit VNI field. + GUE, on the other hand, enables the use of a 32-bit field called + VNID; this field is not included in the GUE header but was defined as + an optional extension in [GUE-ENCAPSULATION]. + + The VXLAN-GPE header includes the I bit, indicating that the VNI + field is valid in the current header. A similar indicator is defined + as a flag in the GUE header [GUE-EXTENSIONS]. + +A.3.2. Next Protocol + + All three encapsulation headers include a field that specifies the + type of the next protocol header, which resides after the NVO3 + encapsulation header. The Geneve header includes a 16-bit field that + uses the IEEE Ethertype convention. GUE uses an 8-bit field, which + uses the IANA protocol numbering. The VXLAN-GPE header incorporates + an 8-bit Next Protocol field, using a registry specific to VXLAN-GPE, + defined in [VXLAN-GPE]. + + The VXLAN-GPE header also includes the P bit, which explicitly + indicates whether the Next Protocol field is present in the current + header. + +A.3.3. Other Header Fields + + The OAM bit, which is defined in Geneve and in VXLAN-GPE, indicates + whether the current packet is an OAM packet. The GUE header includes + a similar field but uses different terminology; the GUE C bit + (Control bit) specifies whether the current packet is a control + packet. Note that the GUE C bit can potentially be used in a large + set of protocols that are not OAM protocols. However, the control + packet examples discussed in [GUE] are related to OAM. + + Each of the three NVO3 encapsulation headers includes a 2-bit Version + field, which is currently defined to be zero. 
+ + The Geneve and VXLAN-GPE headers include reserved fields; 14 bits in + the Geneve header and 27 bits in the VXLAN-GPE header are reserved. + +A.4. Comparison Summary + + The following table summarizes the comparison between the three NVO3 + encapsulations. In some cases, a plus sign ("+") or minus sign ("-") + is used to indicate that the header is stronger or weaker in an area, + respectively. + + +===============+=================+=============+===================+ + | | Geneve | GUE | VXLAN-GPE | + +===============+=================+=============+===================+ + | Outer | UDP/IP 6081 | UDP/IP 6080 | UDP/IP 4790 | + | transport UDP | | | | + | Port Number | | | | + +---------------+-----------------+-------------+-------------------+ + | Base header | 8 octets | 4 octets | 8 octets (16 | + | length | | | octets using | + | | | | an NSH) | + +---------------+-----------------+-------------+-------------------+ + | Extensibility | Variable-length | Extension | No innate | + | | options | fields | extensibility. | + | | | | Might use an | + | | | | NSH. 
| + +---------------+-----------------+-------------+-------------------+ + | Extension | TLV-based | Flag-based | TLV-based | + | parsing | | | (using an NSH | + | method | | | with MD Type | + | | | | 2) | + +---------------+-----------------+-------------+-------------------+ + | Extension | Variable | Fixed | Variable | + | order | | | (using an NSH) | + +---------------+-----------------+-------------+-------------------+ + | Length field | + | + | - | + +---------------+-----------------+-------------+-------------------+ + | Max header | 260 octets | 128 octets | 8 octets (264 | + | length | | | using an NSH) | + +---------------+-----------------+-------------+-------------------+ + | Critical | + | - | - | + | extension bit | | | | + +---------------+-----------------+-------------+-------------------+ + | VNI field | 24 bits | 32 bits | 24 bits | + | size | | (extension) | | + +---------------+-----------------+-------------+-------------------+ + | Next Protocol | 16 bits | 8 bits | 8 bits New | + | field | Ethertype | Internet | registry | + | | registry | protocol | | + | | | registry | | + +---------------+-----------------+-------------+-------------------+ + | Next protocol | - | - | + | + | indicator | | | | + +---------------+-----------------+-------------+-------------------+ + | OAM / Control | OAM bit | Control bit | OAM bit | + | field | | | | + +---------------+-----------------+-------------+-------------------+ + | Version field | 2 bits | 2 bits | 2 bits | + +---------------+-----------------+-------------+-------------------+ + | Reserved bits | 14 bits | none | 27 bits | + +---------------+-----------------+-------------+-------------------+ + + Table 1: Encapsulations Comparison + +Acknowledgements + + The authors would like to thank Tom Herbert for providing the + motivation for the security/integrity extension and for his valuable + comments; T. 
Sridhar for his valuable comments and feedback; Anoop + Ghanwani for his extensive comments; and Ignas Bagdonas. + +Contributors + + The following coauthors have contributed to this document: + + Ilango Ganga + Intel + Email: ilango.s.ganga@intel.com + + + Pankaj Garg + Microsoft + Email: pankajg@microsoft.com + + + Rajeev Manur + Broadcom + Email: rajeev.manur@broadcom.com + + + Tal Mizrahi + Huawei + Email: tal.mizrahi.phd@gmail.com + + + David Mozes + Email: mosesster@gmail.com + + + Erik Nordmark + ZEDEDA + Email: nordmark@sonic.net + + + Michael Smith + Cisco + Email: michsmit@cisco.com + + + Sam Aldrin + Google + Email: aldrin.ietf@gmail.com + + +Authors' Addresses + + Sami Boutros (editor) + Ciena Corporation + United States of America + Email: sboutros@ciena.com + + + Donald E. Eastlake 3rd (editor) + Independent + 2386 Panoramic Circle + Apopka, FL 32703 + United States of America + Phone: +1-508-333-2270 + Email: d3e3e3@gmail.com