Diffstat (limited to 'doc/rfc/rfc9611.txt')
-rw-r--r-- doc/rfc/rfc9611.txt 462
1 files changed, 462 insertions, 0 deletions
diff --git a/doc/rfc/rfc9611.txt b/doc/rfc/rfc9611.txt
new file mode 100644
index 0000000..e8a6c1d
--- /dev/null
+++ b/doc/rfc/rfc9611.txt
@@ -0,0 +1,462 @@
+
+
+
+
+Internet Engineering Task Force (IETF) A. Antony
+Request for Comments: 9611 secunet
+Category: Standards Track T. Brunner
+ISSN: 2070-1721 codelabs
+ S. Klassert
+ secunet
+ P. Wouters
+ Aiven
+ July 2024
+
+
+ Internet Key Exchange Protocol Version 2 (IKEv2) Support for
+ Per-Resource Child Security Associations (SAs)
+
+Abstract
+
+ In order to increase the bandwidth of IPsec traffic between peers,
+ this document defines one Notify Message Status Type and one Notify
+ Message Error Type payload for the Internet Key Exchange Protocol
+ Version 2 (IKEv2) to support the negotiation of multiple Child
+ Security Associations (SAs) with the same Traffic Selectors used on
+ different resources, such as CPUs.
+
+ The SA_RESOURCE_INFO notification is used to convey information that
+ the negotiated Child SA and subsequent new Child SAs with the same
+ Traffic Selectors are a logical group of Child SAs where most or all
+ of the Child SAs are bound to a specific resource, such as a specific
+ CPU. The TS_MAX_QUEUE notification conveys that the peer is unwilling
+ to create any further Child SAs for this particular negotiated
+ Traffic Selector combination.
+
+ Using multiple Child SAs with the same Traffic Selectors has the
+ benefit that each resource holding the Child SA has its own Sequence
+ Number Counter, ensuring that CPUs don't have to synchronize their
+ cryptographic state or disable their packet replay protection.
+
+Status of This Memo
+
+ This is an Internet Standards Track document.
+
+ This document is a product of the Internet Engineering Task Force
+ (IETF). It represents the consensus of the IETF community. It has
+ received public review and has been approved for publication by the
+ Internet Engineering Steering Group (IESG). Further information on
+ Internet Standards is available in Section 2 of RFC 7841.
+
+ Information about the current status of this document, any errata,
+ and how to provide feedback on it may be obtained at
+ https://www.rfc-editor.org/info/rfc9611.
+
+Copyright Notice
+
+ Copyright (c) 2024 IETF Trust and the persons identified as the
+ document authors. All rights reserved.
+
+ This document is subject to BCP 78 and the IETF Trust's Legal
+ Provisions Relating to IETF Documents
+ (https://trustee.ietf.org/license-info) in effect on the date of
+ publication of this document. Please review these documents
+ carefully, as they describe your rights and restrictions with respect
+ to this document. Code Components extracted from this document must
+ include Revised BSD License text as described in Section 4.e of the
+ Trust Legal Provisions and are provided without warranty as described
+ in the Revised BSD License.
+
+Table of Contents
+
+ 1. Introduction
+ 1.1. Requirements Language
+ 1.2. Terminology
+ 2. Performance Bottlenecks
+ 3. Negotiation of Resource-Specific Child SAs
+ 4. Implementation Considerations
+ 5. Payload Format
+ 5.1. SA_RESOURCE_INFO Notify Message Status Type Payload
+ 5.2. TS_MAX_QUEUE Notify Message Error Type Payload
+ 6. Operational Considerations
+ 7. Security Considerations
+ 8. IANA Considerations
+ 9. References
+ 9.1. Normative References
+ 9.2. Informative References
+ Acknowledgements
+ Authors' Addresses
+
+1. Introduction
+
+ Most IPsec implementations are currently limited to using one
+ hardware queue or a single CPU resource for a Child SA. Running
+ packet stream encryption in parallel can be done, but different parts
+ of the hardware then contend for locks or wait to have a sequence
+ number assigned to the packet being encrypted, which is a bottleneck.
+ The result is that a machine with many such resources is limited to
+ using only one of these resources per Child SA. This severely limits
+ the throughput that can be attained. For example, at the time of
+ writing, an unencrypted link of 10 Gbps or more is commonly reduced
+ to 2-5 Gbps when IPsec is used to encrypt the link using AES-GCM. By
+ using the implementation specified in this document, aggregate
+ throughput increased from 5 Gbps using 1 CPU to 40-60 Gbps using 25-30
+ CPUs.
+
+ While this could be (partially) mitigated by setting up multiple
+ narrowed Child SAs (for example, using Populate From Packet (PFP) as
+ specified in IPsec architecture [RFC4301]), this IPsec feature would
+ cause too many Child SAs (one per network flow) or too few Child SAs
+ (one network flow used on multiple CPUs). PFP is also not widely
+ implemented.
+
+ To make better use of multiple network queues and CPUs, it can be
+ beneficial to negotiate and install multiple Child SAs with identical
+ Traffic Selectors. IKEv2 [RFC7296] already allows installing
+ multiple Child SAs with identical Traffic Selectors, but it offers no
+ method to indicate that the additional Child SA is being requested
+ for performance increase reasons and is restricted to some resource
+ (queue or CPU).
+
+ When an IKEv2 peer receives requests for more additional Child SAs
+ for a single set of Traffic Selectors than it is willing to create,
+ it can return an error notify of TS_MAX_QUEUE.
+
+1.1. Requirements Language
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in
+ BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
+ capitals, as shown here.
+
+1.2. Terminology
+
+ This document uses the following terms defined in IKEv2 [RFC7296]:
+ Notification Data, Traffic Selector (TS), Traffic Selector initiator
+ (TSi), Traffic Selector responder (TSr), Child SA, Configuration
+ Payload (CP), IKE SA, CREATE_CHILD_SA, and NO_ADDITIONAL_SAS.
+
+ This document also uses the following terms defined in [RFC4301]:
+ Security Policy Database (SPD), SA.
+
+2. Performance Bottlenecks
+
+ There are several pragmatic reasons why most implementations must
+ restrict a Child Security Association (SA) to a single specific
+ hardware resource. A primary limitation arises from the challenges
+ associated with sharing cryptographic states, counters, and sequence
+ numbers among multiple CPUs. When these CPUs attempt to
+ simultaneously utilize shared states, it becomes impractical to do so
+ without incurring a significant performance penalty. It is necessary
+ to negotiate and establish multiple Child SAs with identical Traffic
+ Selector initiator (TSi) and Traffic Selector responder (TSr) on a
+ per-resource basis.
+
+3. Negotiation of Resource-Specific Child SAs
+
+ An initial IKEv2 exchange is used to set up an IKE SA and the initial
+ Child SA. If multiple Child SAs with the same Traffic Selectors that
+ are bound to a single resource are desired, the initiator will add
+ the SA_RESOURCE_INFO notify payload to the Exchange negotiating the
+ Child SA (e.g., IKE_AUTH or CREATE_CHILD_SA). If this initial Child
+ SA will be tied to a specific resource, it MAY indicate this by
+ including an identifier in the Notification Data. A responder that
+ is willing to have multiple Child SAs for the same Traffic Selectors
+ will respond by also adding the SA_RESOURCE_INFO notify payload in
+ which it MAY add a non-zero Notification Data.
+
+ Additional resource-specific Child SAs are negotiated as regular
+ Child SAs using the CREATE_CHILD_SA exchange and are similarly
+ identified by an accompanying SA_RESOURCE_INFO notification.
+
+ Upon installation, each resource-specific Child SA is associated with
+ an additional local selector, such as the CPU. These resource-
+ specific Child SAs MUST be negotiated with identical Child SA
+ properties that were negotiated for the initial Child SA. This
+ includes cryptographic algorithms, Traffic Selectors, Mode (e.g.,
+ transport mode), compression usage, etc. However, each Child SA does
+ have its own keying material that is individually derived according
+ to the regular IKEv2 process. The SA_RESOURCE_INFO notify payload
+ MAY be empty or MAY contain some identifying data. This identifying
+ data SHOULD be a unique identifier within all the Child SAs with the
+ same TS payloads, and the peer MUST only use it for debugging
+ purposes.
+
+ Additional Child SAs can be started on demand or can be started all
+ at once. Peers may also delete specific per-resource Child SAs if
+ they deem the associated resource to be idle.
+
+ During the CREATE_CHILD_SA rekey for the Child SA, the
+ SA_RESOURCE_INFO notification MAY be included, but regardless of
+ whether or not it is included, the rekeyed Child SA should be bound
+ to the same resource(s) as the Child SA that is being rekeyed.
+
+4. Implementation Considerations
+
+ There are various considerations that an implementation can use to
+ determine the best procedure to install multiple Child SAs.
+
+ A simple procedure could be to install one additional Child SA on
+ each CPU. An implementation can ensure that one Child SA can be used
+ by all CPUs, so that while negotiating a new per-CPU Child SA, which
+ typically takes 1 RTT delay, the CPU with no CPU-specific Child SA
+ can still encrypt its packets using the Child SA that is available
+ for all CPUs. Alternatively, if an implementation finds it needs to
+ encrypt a packet but the current CPU does not have the resources to
+ encrypt this packet, it can relay that packet to a specific CPU that
+ does have the capability to encrypt the packet, although this will
+ come with a performance penalty.
+
+ Performing per-CPU Child SA negotiations can result in both peers
+ initiating additional Child SAs simultaneously. This is especially
+ likely if per-CPU Child SAs are triggered by individual SADB_ACQUIRE
+ messages [RFC2367]. Responders should install the additional Child
+ SA on the CPU with the fewest additional Child SAs for this
+ TSi/TSr pair.
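The responder-side placement rule above can be sketched as follows. This is only an illustration, not part of the RFC: the function name `pick_cpu` and the idea of tracking installed SAs as a flat list of CPU ids are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List


def pick_cpu(cpus: Iterable[int], installed: List[int]) -> int:
    """Pick the CPU holding the fewest additional Child SAs.

    `cpus` lists the local CPU ids (hypothetical representation).
    `installed` holds one CPU id per additional Child SA already
    installed for this TSi/TSr pair.
    """
    load = Counter(installed)
    # Ties are broken by the lowest CPU id so the choice is deterministic.
    return min(cpus, key=lambda cpu: (load[cpu], cpu))
```

For example, with four CPUs and additional Child SAs already installed on CPUs 0, 1, 1, and 3, `pick_cpu(range(4), [0, 1, 1, 3])` selects CPU 2.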
+
+ When the number of queue or CPU resources differs between the peers,
+ the peer with fewer resources may decide not to install a second
+ outbound Child SA for the same resource, as it will
+ never use it to send traffic. However, it must install all inbound
+ Child SAs because it has committed to receiving traffic on these
+ negotiated Child SAs.
+
+ If per-CPU packet trigger (e.g., SADB_ACQUIRE) messages are
+ implemented (see Section 6), the Traffic Selector (TSi) entry
+ containing the information of the trigger packet should be included
+ in the TS set similarly to regular Child SAs as specified in IKEv2
+ [RFC7296], Section 2.9. Based on the trigger TSi entry, an
+ implementation can select the most suitable target CPU to install the
+ additional Child SA on. For example, if the trigger packet was for a
+ TCP destination to port 25 (SMTP), it might be able to install the
+ Child SA on the CPU that is also running the mail server process.
+ Trigger packet Traffic Selectors are documented in IKEv2 [RFC7296],
+ Section 2.9.
+
+ As per IKEv2, rekeying a Child SA SHOULD use the same (or wider)
+ Traffic Selectors to ensure that the new Child SA covers everything
+ that the rekeyed Child SA covers. This includes Traffic Selectors
+ negotiated via Configuration Payloads such as INTERNAL_IP4_ADDRESS,
+ which may use the original wide TS set or use the narrowed TS set.
+
+5. Payload Format
+
+ The Notify Payload format is defined in IKEv2 [RFC7296],
+ Section 3.10, and is copied here for convenience.
+
+ All multi-octet fields representing integers are laid out in big
+ endian order (also known as "most significant byte first", or
+ "network byte order").
+
+5.1. SA_RESOURCE_INFO Notify Message Status Type Payload
+
+                      1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-------------------------------+-------------------------------+
+ | Next Payload  |C|  RESERVED   |         Payload Length        |
+ +---------------+---------------+-------------------------------+
+ |  Protocol ID  |   SPI Size    |      Notify Message Type      |
+ +---------------+---------------+-------------------------------+
+ |                                                               |
+ ~                Resource Identifier (optional)                 ~
+ |                                                               |
+ +-------------------------------+-------------------------------+
+
+ (C)ritical flag - MUST be 0.
+
+ Protocol ID (1 octet) - MUST be 0. MUST be ignored if not 0.
+
+ SPI Size (1 octet) - MUST be 0. MUST be ignored if not 0.
+
+ Notify Message Status Type (2 octets) - set to 16444.
+
+ Resource Identifier (optional) - This opaque data may be set to
+ convey the local identity of the resource.
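As a minimal sketch of the layout above, the payload can be serialized and parsed with Python's `struct` module. The function names and the choice to return only the Resource Identifier are assumptions for illustration; a real IKEv2 stack would handle the Next Payload chain and the Critical flag in its generic payload code.

```python
import struct

SA_RESOURCE_INFO = 16444  # Notify Message Status Type from Section 5.1

# Next Payload, C+RESERVED, Payload Length, Protocol ID, SPI Size,
# Notify Message Type; "!" selects network byte order (8 octets total).
_HEADER = struct.Struct("!BBHBBH")


def build_sa_resource_info(resource_id: bytes = b"", next_payload: int = 0) -> bytes:
    """Serialize the payload; Critical flag, Protocol ID, and SPI Size are 0."""
    length = _HEADER.size + len(resource_id)
    return _HEADER.pack(next_payload, 0, length, 0, 0, SA_RESOURCE_INFO) + resource_id


def parse_sa_resource_info(payload: bytes) -> bytes:
    """Return the (possibly empty) opaque Resource Identifier."""
    if len(payload) < _HEADER.size:
        raise ValueError("truncated Notify payload")
    _next, _flags, length, _proto, _spi_size, ntype = _HEADER.unpack_from(payload)
    if length != len(payload):
        raise ValueError("Payload Length does not match the data received")
    if ntype != SA_RESOURCE_INFO:
        raise ValueError("not an SA_RESOURCE_INFO notification")
    # Protocol ID and SPI Size are deliberately not checked: the text
    # above says they are ignored if not 0.
    return payload[_HEADER.size:]
```

An empty Resource Identifier yields an 8-octet payload; a 2-octet identifier yields a Payload Length of 10.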
+
+5.2. TS_MAX_QUEUE Notify Message Error Type Payload
+
+                      1                   2                   3
+  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +---------------+---------------+-------------------------------+
+ | Next Payload  |C|  RESERVED   |         Payload Length        |
+ +---------------+---------------+-------------------------------+
+ |  Protocol ID  |   SPI Size    |      Notify Message Type      |
+ +---------------+---------------+-------------------------------+
+
+ (C)ritical flag - MUST be 0.
+
+ Protocol ID (1 octet) - MUST be 0. MUST be ignored if not 0.
+
+ SPI Size (1 octet) - MUST be 0. MUST be ignored if not 0.
+
+ Notify Message Error Type (2 octets) - set to 48.
+
+ There is no data associated with this Notify type.
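Because the payload carries no data, it is always 8 octets long. The sketch below serializes it and shows how a receiver might tell error types from status types using the range split defined in IKEv2 [RFC7296] (error types occupy 0-16383). The function names are hypothetical.

```python
import struct

TS_MAX_QUEUE = 48  # Notify Message Error Type from Section 5.2


def build_ts_max_queue(next_payload: int = 0) -> bytes:
    """Serialize the data-less TS_MAX_QUEUE Notify payload (8 octets)."""
    # Critical flag, Protocol ID, and SPI Size are all 0; the Payload
    # Length covers only the two 4-octet header words.
    return struct.pack("!BBHBBH", next_payload, 0, 8, 0, 0, TS_MAX_QUEUE)


def is_error_notify(notify_type: int) -> bool:
    """Error types are 0-16383; status types are 16384-65535."""
    return 0 <= notify_type < 16384
```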
+
+6. Operational Considerations
+
+ Implementations supporting per-CPU SAs SHOULD extend their local SPD
+ selector, and the mechanism of on-demand negotiation that is
+ triggered by traffic to include a CPU (or queue) identifier in their
+ packet trigger (e.g., SADB_ACQUIRE) message from the SPD to the IKE
+ daemon. An implementation that does not support receiving per-CPU
+ packet trigger messages MAY initiate all its Child SAs immediately
+ upon receiving the (only) packet trigger message it will receive from
+ the IPsec stack. Such an implementation also needs to be careful
+ when receiving a Delete Notify request for a per-CPU Child SA, as it
+ has no method to detect when it should bring up such a per-CPU Child
+ SA again later. Also, bringing the deleted per-CPU Child SA up again
+ immediately after receiving the Delete Notify might cause an infinite
+ loop between the peers. Another issue with not bringing up all its
+ per-CPU Child SAs is that if the peer acts similarly, the two peers
+ might end up with only the first Child SA without ever activating any
+ per-CPU Child SAs. It is therefore RECOMMENDED to implement per-CPU
+ packet trigger messages.
+
+ Peers SHOULD be flexible with the maximum number of Child SAs they
+ allow for a given TSi/TSr combination in order to account for corner
+ cases. For example, during Child SA rekeying, there might be a large
+ number of additional Child SAs created before the old Child SAs are
+ torn down. Similarly, when using on-demand Child SAs, both ends
+ could trigger multiple Child SA requests as the initial packet
+ causing the Child SA negotiation might have been transported to the
+ peer via the first Child SA, where its reply packet might also
+ trigger an on-demand Child SA negotiation to start. As additional
+ Child SAs consume few additional resources, allowing at the very
+ least double the number of available CPUs is RECOMMENDED. An
+ implementation MAY allow unlimited additional Child SAs and only
+ limit this number based on its generic resource protection strategies
+ that are used to require COOKIES or refuse new IKE or Child SA
+ negotiations. Note, however, that a very large number (e.g., hundreds
+ or thousands) of SAs may slow down per-packet SAD lookup.
+
+ Implementations might support dynamically moving a per-CPU Child SA
+ from one CPU to another CPU. If this method is supported,
+ implementations must be careful to move both the inbound and outbound
+ SAs. If the IPsec endpoint is a gateway, it can move the inbound SA
+ and outbound SA independently of each other. It is likely that for a
+ gateway, IPsec traffic would be asymmetric. If the IPsec endpoint is
+ the same host responsible for generating the traffic, the inbound and
+ outbound SAs SHOULD remain as a pair on the same CPU. If a host
+ previously skipped installing an outbound SA because it would be an
+ unused duplicate outbound SA, it will have to create and add the
+ previously skipped outbound SA to the SAD with the new CPU ID. The
+ inbound SA may not have a CPU ID in the SAD. Adding the outbound SA
+ to the SAD requires access to the key material, whereas updating the
+ CPU selector on an existing outbound SA might not require access to
+ key material. To support this, the IKE software might have to hold
+ on to the key material longer than it normally would, as it might
+ actively attempt to destroy key material from memory that the IKE
+ daemon no longer needs access to.
+
+ An implementation that does not accept any further resource-specific
+ Child SAs MUST NOT return the NO_ADDITIONAL_SAS error because it
+ could be misinterpreted by the peer to mean that no other Child SA
+ with a different TSi and/or TSr is allowed either. Instead, it MUST
+ return TS_MAX_QUEUE.
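One possible admission check combining this rule with the sizing advice earlier in this section is sketched below. The function name, the per-TSi/TSr counter, and the default `factor` of 2 (the minimum allowance recommended above) are assumptions for the example, not requirements of the RFC.

```python
from typing import Optional

TS_MAX_QUEUE = 48  # Notify Message Error Type from Section 5.2


def check_additional_child_sa(existing: int, num_cpus: int, factor: int = 2) -> Optional[int]:
    """Decide whether another Child SA for the same TSi/TSr is accepted.

    `existing` is the number of additional Child SAs already installed
    for this TSi/TSr pair. Returns None to accept the request, or the
    error notify type to send when the local limit is reached.
    """
    limit = factor * num_cpus
    if existing >= limit:
        # TS_MAX_QUEUE, never NO_ADDITIONAL_SAS: the refusal applies
        # only to this Traffic Selector combination.
        return TS_MAX_QUEUE
    return None
```

With 4 CPUs, the eighth additional Child SA is still accepted (`existing` is 7), and the next request is answered with TS_MAX_QUEUE.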
+
+7. Security Considerations
+
+ Similar to how an implementation should limit the number of half-open
+ SAs to limit the impact of a denial-of-service attack, it is
+ RECOMMENDED that an implementation limits the maximum number of
+ additional Child SAs allowed per unique TSi/TSr.
+
+ Using multiple resource-specific Child SAs makes sense for high-
+ volume IPsec connections on IPsec gateway machines where the
+ administrator has a trust relationship with the peer's administrator
+ and abuse is unlikely and easily escalated to resolve.
+
+ This trust relationship is usually not present for the deployments of
+ remote access VPNs, and allowing per-CPU Child SAs is NOT RECOMMENDED
+ in these scenarios. Therefore, it is also NOT RECOMMENDED to allow
+ per-CPU Child SAs by default.
+
+ The SA_RESOURCE_INFO notify contains an optional data payload that
+ can be used by the peer to identify the Child SA belonging to a
+ specific resource. Notification data SHOULD NOT be an identifier
+ that can be used to gain information about the hardware. For
+ example, using the CPU number itself as the identifier might give an
+ attacker knowledge of which packets are handled by which CPU, which
+ could help the attacker optimize a brute-force attack on the system.
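One way to follow this advice, shown purely as a sketch, is to hand out random opaque tokens and keep the token-to-CPU mapping local. The class name `ResourceIds` and the 4-octet token length are arbitrary choices for the example; the RFC does not prescribe how identifiers are generated.

```python
import secrets
from typing import Dict


class ResourceIds:
    """Map local CPU ids to random opaque identifiers (hypothetical helper)."""

    def __init__(self, size: int = 4) -> None:
        self._size = size
        self._by_cpu: Dict[int, bytes] = {}

    def id_for(self, cpu: int) -> bytes:
        """Return a stable identifier for `cpu` that does not reveal its number."""
        if cpu not in self._by_cpu:
            token = secrets.token_bytes(self._size)
            # Redraw on the unlikely collision so identifiers stay unique
            # among the Child SAs that share the same TS payloads.
            while token in self._by_cpu.values():
                token = secrets.token_bytes(self._size)
            self._by_cpu[cpu] = token
        return self._by_cpu[cpu]
```

The peer can still use the value for debugging and log correlation, while the CPU number itself never appears on the wire.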
+
+8. IANA Considerations
+
+ IANA has registered one new value in the "IKEv2 Notify Message Status
+ Types" registry.
+
+ +=======+============================+===========+
+ | Value | Notify Message Status Type | Reference |
+ +=======+============================+===========+
+ | 16444 | SA_RESOURCE_INFO | RFC 9611 |
+ +-------+----------------------------+-----------+
+
+ Table 1
+
+ IANA has registered one new value in the "IKEv2 Notify Message Error
+ Types" registry.
+
+ +=======+===========================+===========+
+ | Value | Notify Message Error Type | Reference |
+ +=======+===========================+===========+
+ | 48 | TS_MAX_QUEUE | RFC 9611 |
+ +-------+---------------------------+-----------+
+
+ Table 2
+
+9. References
+
+9.1. Normative References
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119,
+ DOI 10.17487/RFC2119, March 1997,
+ <https://www.rfc-editor.org/info/rfc2119>.
+
+ [RFC7296] Kaufman, C., Hoffman, P., Nir, Y., Eronen, P., and T.
+ Kivinen, "Internet Key Exchange Protocol Version 2
+ (IKEv2)", STD 79, RFC 7296, DOI 10.17487/RFC7296, October
+ 2014, <https://www.rfc-editor.org/info/rfc7296>.
+
+ [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
+ 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
+ May 2017, <https://www.rfc-editor.org/info/rfc8174>.
+
+9.2. Informative References
+
+ [RFC2367] McDonald, D., Metz, C., and B. Phan, "PF_KEY Key
+ Management API, Version 2", RFC 2367,
+ DOI 10.17487/RFC2367, July 1998,
+ <https://www.rfc-editor.org/info/rfc2367>.
+
+ [RFC4301] Kent, S. and K. Seo, "Security Architecture for the
+ Internet Protocol", RFC 4301, DOI 10.17487/RFC4301,
+ December 2005, <https://www.rfc-editor.org/info/rfc4301>.
+
+Acknowledgements
+
+ The following people provided reviews and valuable feedback: Roman
+ Danyliw, Warren Kumari, Tero Kivinen, Murray Kucherawy, John Scudder,
+ Valery Smyslov, Gunter van de Velde, and Éric Vyncke.
+
+Authors' Addresses
+
+ Antony Antony
+ secunet Security Networks AG
+ Email: antony.antony@secunet.com
+
+
+ Tobias Brunner
+ codelabs GmbH
+ Email: tobias@codelabs.ch
+
+
+ Steffen Klassert
+ secunet Security Networks AG
+ Email: steffen.klassert@secunet.com
+
+
+ Paul Wouters
+ Aiven
+ Email: paul.wouters@aiven.io