diff options
Diffstat (limited to 'doc/rfc/rfc6290.txt')
-rw-r--r-- | doc/rfc/rfc6290.txt | 1235 |
1 files changed, 1235 insertions, 0 deletions
diff --git a/doc/rfc/rfc6290.txt b/doc/rfc/rfc6290.txt new file mode 100644 index 0000000..fe85e6a --- /dev/null +++ b/doc/rfc/rfc6290.txt @@ -0,0 +1,1235 @@ + + + + + + +Internet Engineering Task Force (IETF) Y. Nir, Ed. +Request for Comments: 6290 Check Point +Category: Standards Track D. Wierbowski +ISSN: 2070-1721 IBM + F. Detienne + P. Sethi + Cisco + June 2011 + + + A Quick Crash Detection Method for the + Internet Key Exchange Protocol (IKE) + +Abstract + + This document describes an extension to the Internet Key Exchange + Protocol version 2 (IKEv2) that allows for faster detection of + Security Association (SA) desynchronization using a saved token. + + When an IPsec tunnel between two IKEv2 peers is disconnected due to a + restart of one peer, it can take as much as several minutes for the + other peer to discover that the reboot has occurred, thus delaying + recovery. In this text, we propose an extension to the protocol that + allows for recovery immediately following the restart. + +Status of This Memo + + This is an Internet Standards Track document. + + This document is a product of the Internet Engineering Task Force + (IETF). It represents the consensus of the IETF community. It has + received public review and has been approved for publication by the + Internet Engineering Steering Group (IESG). Further information on + Internet Standards is available in Section 2 of RFC 5741. + + Information about the current status of this document, any errata, + and how to provide feedback on it may be obtained at + http://www.rfc-editor.org/info/rfc6290. + +Copyright Notice + + Copyright (c) 2011 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (http://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + + + +Nir, et al. Standards Track [Page 1] + +RFC 6290 Quick Crash Detection June 2011 + + + carefully, as they describe your rights and restrictions with respect + to this document. Code Components extracted from this document must + include Simplified BSD License text as described in Section 4.e of + the Trust Legal Provisions and are provided without warranty as + described in the Simplified BSD License. + +Table of Contents + + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 + 1.1. Conventions Used in This Document . . . . . . . . . . . . 3 + 2. RFC 5996 Crash Recovery . . . . . . . . . . . . . . . . . . . 4 + 3. Protocol Outline . . . . . . . . . . . . . . . . . . . . . . . 5 + 4. Formats and Exchanges . . . . . . . . . . . . . . . . . . . . 6 + 4.1. Notification Format . . . . . . . . . . . . . . . . . . . 6 + 4.2. Passing a Token in the AUTH Exchange . . . . . . . . . . . 7 + 4.3. Replacing Tokens after Rekey or Resumption . . . . . . . . 8 + 4.4. Replacing the Token for an Existing SA . . . . . . . . . . 9 + 4.5. Presenting the Token in an Unprotected Message . . . . . . 9 + 5. Token Generation and Verification . . . . . . . . . . . . . . 10 + 5.1. A Stateless Method of Token Generation . . . . . . . . . . 11 + 5.2. A Stateless Method with IP Addresses . . . . . . . . . . . 11 + 5.3. Token Lifetime . . . . . . . . . . . . . . . . . . . . . . 12 + 6. Backup Gateways . . . . . . . . . . . . . . . . . . . . . . . 12 + 7. Interaction with Session Resumption . . . . . . . . . . . . . 13 + 8. Operational Considerations . . . . . . . . . . . . . . . . . . 14 + 8.1. Who Should Implement This Specification . . . . . . . . . 14 + 8.2. Response to Unknown Child SPI . . . . . . . . . . . . . . 15 + 9. Security Considerations . . . . . . . . . . . . . . . . . . . 16 + 9.1. QCD Token Generation and Handling . . . . . . . . . . . . 16 + 9.2. QCD Token Transmission . . . . . . . . . . . . . . . . . . 17 + 9.3. QCD Token Enumeration . . . . . . . . . . . . . . . . . . 18 + 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18 + 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18 + 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 19 + 12.1. Normative References . . . . . . . . . . . . . . . . . . . 19 + 12.2. Informative References . . . . . . . . . . . . . . . . . . 19 + Appendix A. The Path Not Taken . . . . . . . . . . . . . . . . . 20 + A.1. Initiating a New IKE SA . . . . . . . . . . . . . . . . . 20 + A.2. SIR . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 + A.3. Birth Certificates . . . . . . . . . . . . . . . . . . . . 20 + A.4. Reducing Liveness Check Length . . . . . . . . . . . . . . 21 + + + + + + + + + + +Nir, et al. Standards Track [Page 2] + +RFC 6290 Quick Crash Detection June 2011 + + +1. Introduction + + IKEv2, as described in [RFC5996] and its predecessor RFC 4306, has a + method for recovering from a reboot of one peer. As long as traffic + flows in both directions, the rebooted peer should re-establish the + tunnels immediately. However, in many cases, the rebooted peer is a + VPN gateway that protects only servers, so all traffic is inbound. + In other cases, the non-rebooted peer has a dynamic IP address, so + the rebooted peer cannot initiate IKE because its current IP address + is unknown. In such cases, the rebooted peer will not be able to + re-establish the tunnels. Section 2 describes how recovery works + under RFC 5996, and explains why it may take several minutes. + + The method proposed here is to send an octet string, called a "QCD + token", in the IKE_AUTH exchange that establishes the tunnel. That + token can be stored on the peer as part of the IKE SA. After a + reboot, the rebooted implementation can re-generate the token and + send it to the peer, so as to delete the IKE SA. Deleting the IKE SA + results in a quick establishment of new IPsec tunnels. This is + described in Section 3. + +1.1. Conventions Used in This Document + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this + document are to be interpreted as described in [RFC2119]. + + The term "token" refers to an octet string that an implementation can + generate using only the properties of a protected IKE message (such + as IKE Security Parameter Indexes (SPIs)) as input. A conforming + implementation MUST be able to generate the same token from the same + input even after rebooting. + + The term "token maker" refers to an implementation that generates a + token and sends it to the peer as specified in this document. + + The term "token taker" refers to an implementation that stores such a + token or a digest thereof, in order to verify that a new token it + receives is identical to the old token it has stored. + + The term "non-volatile storage" in this document refers to a data + storage module that persists across restarts of the token maker. + Examples of such a storage module include an internal disk, an + internal flash memory module, an external disk, and an external + database. A small non-volatile storage module is required for a + token maker, but a larger one can be used to enhance performance, as + described in Section 8.2. + + + + +Nir, et al. Standards Track [Page 3] + +RFC 6290 Quick Crash Detection June 2011 + + +2. RFC 5996 Crash Recovery + + When one peer loses state or reboots, the other peer does not get any + notification, so unidirectional IPsec traffic can still flow. The + rebooted peer will not be able to decrypt it, however, and the only + remedy is to send an unprotected INVALID_SPI notification as + described in Section 3.10.1 of [RFC5996]. That section also + describes the processing of such a notification: + + If this Informational Message is sent outside the context of an + IKE_SA, it should be used by the recipient only as a "hint" that + something might be wrong (because it could easily be forged). + + Since the INVALID_SPI can only be used as a hint, the non-rebooted + peer has to determine whether the IPsec SA and indeed the parent IKE + SA are still valid. The method of doing this is described in Section + 2.4 of [RFC5996]. This method, called "liveness check", involves + sending a protected empty INFORMATIONAL message, and awaiting a + response. This procedure is sometimes referred to as "Dead Peer + Detection" or DPD. + + Section 2.4 does not mandate how many times the liveness check + message should be retransmitted, or for how long, but does recommend + the following: + + It is suggested that messages be retransmitted at least a dozen + times over a period of at least several minutes before giving up + on an SA... + + Those "at least several minutes" are a time during part of which both + peers are active, but IPsec cannot be used. + + Especially in the case of a reboot (rather than fail-over or + administrative clearing of state), the peer does not recover + immediately. Reboot, depending on the system, may take from a few + seconds to a few minutes. This means that at first the peer just + goes silent, i.e., does not send or respond to any messages. IKEv2 + implementations can detect this situation and follow the rules given + in Section 2.4: + + If there has only been outgoing traffic on all of the SAs + associated with an IKE SA, it is essential to confirm liveness of + the other endpoint to avoid black holes. If no cryptographically + protected messages have been received on an IKE SA or any of its + Child SAs recently, the system needs to perform a liveness check + in order to prevent sending messages to a dead peer. + + + + + +Nir, et al. Standards Track [Page 4] + +RFC 6290 Quick Crash Detection June 2011 + + + [RFC5996] does not mandate any time limits, but it is possible that + the peer will start liveness checks even before the other end is + sending INVALID_SPI notification, as it detected that the other end + is not sending any packets anymore while it is still rebooting or + recovering from the situation. + + This means that the several minutes recovery period is overlapping + the actual recover time of the other peer; i.e., if the security + gateway requires several minutes to boot up from the crash, then the + other peers have already finished their liveness checks before the + crashing peer even has a chance to send INVALID_SPI notifications. + + There are cases where the peer loses state and is able to recover + immediately; in those cases it might take several minutes to recreate + the IPsec SAs. + + Note that the IKEv2 specification specifically gives no guidance for + the number of retries or the length of timeouts, as these do not + affect interoperability. This means that implementations are allowed + to use the hints provided by the INVALID_SPI messages to shorten + those timeouts (i.e., a different environment and situation requiring + different rules). + + Some existing IKEv2 implementations already do that (i.e., shorten + timeouts or limit number of retries) based on these kinds of hints + and also start liveness checks quickly after the other end goes + silent. However, see Appendix A.4 for a discussion of why this may + not be enough. + +3. Protocol Outline + + Supporting implementations will send a notification, called a "QCD + token", as described in Section 4.1 in the first IKE_AUTH exchange + messages. These are the first IKE_AUTH request and final IKE_AUTH + response that contain the AUTH payloads. The generation of these + tokens is a local matter for implementations, but considerations are + described in Section 5. Implementations that send such a token will + be called "token makers". + + A supporting implementation receiving such a token MUST store it (or + a digest thereof) along with the IKE SA. Implementations that + support this part of the protocol will be called "token takers". + Section 8.1 has considerations for which implementations need to be + token takers, and which should be token makers. Implementations that + are not token takers will silently ignore QCD tokens. + + + + + + +Nir, et al. Standards Track [Page 5] + +RFC 6290 Quick Crash Detection June 2011 + + + When a token maker receives a protected IKE request message with + unknown IKE SPIs, it SHOULD generate a new token that is identical to + the previous token, and send it to the requesting peer in an + unprotected IKE message as described in Section 4.5. + + When a token taker receives the QCD token in an unprotected + notification, it MUST verify that the TOKEN_SECRET_DATA matches the + token stored with the matching IKE SA. If the verification fails, or + if the IKE SPIs in the message do not match any existing IKE SA, it + SHOULD log the event. If it succeeds, it MUST silently delete the + IKE SA associated with the IKE_SPI fields and all dependent child + SAs. This event MAY also be logged. The token taker MUST accept + such tokens from any IP address and port combination, so as to allow + different kinds of high-availability configurations of the token + maker. + + A supporting token taker MAY immediately create new SAs using an + Initial exchange, or it may wait for subsequent traffic to trigger + the creation of new SAs. + + See Section 7 for a short discussion about this extension's + interaction with IKEv2 Session Resumption ([RFC5723]). + +4. Formats and Exchanges + +4.1. Notification Format + + The notification payload called "QCD token" is formatted as follows: + + 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + ! Next Payload !C! RESERVED ! Payload Length ! + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + ! Protocol ID ! SPI Size ! QCD Token Notify Message Type ! + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + ! ! + ~ TOKEN_SECRET_DATA ~ + ! ! + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + + o Protocol ID (1 octet) MUST be 1, as this message is related to an + IKE SA. + + o SPI Size (1 octet) MUST be zero, in conformance with Section 3.10 + of [RFC5996]. + + + + + +Nir, et al. Standards Track [Page 6] + +RFC 6290 Quick Crash Detection June 2011 + + + o QCD Token Notify Message Type (2 octets) - MUST be 16419, the + value assigned for QCD token notifications. + + o TOKEN_SECRET_DATA (variable) contains a generated token as + described in Section 5. + +4.2. Passing a Token in the AUTH Exchange + + For brevity, only the Extensible Authentication Protocol (EAP) + version of an AUTH exchange will be presented here. The non-EAP + version is very similar. The figures below are based on Appendix C.3 + of [RFC5996]. + + first request --> IDi, + [N(INITIAL_CONTACT)], + [[N(HTTP_CERT_LOOKUP_SUPPORTED)], CERTREQ+], + [IDr], + [N(QCD_TOKEN)] + [CP(CFG_REQUEST)], + [N(IPCOMP_SUPPORTED)+], + [N(USE_TRANSPORT_MODE)], + [N(ESP_TFC_PADDING_NOT_SUPPORTED)], + [N(NON_FIRST_FRAGMENTS_ALSO)], + SA, TSi, TSr, + [V+] + + first response <-- IDr, [CERT+], AUTH, + EAP, + [V+] + + / --> EAP + repeat 1..N times | + \ <-- EAP + + last request --> AUTH + + last response <-- AUTH, + [N(QCD_TOKEN)] + [CP(CFG_REPLY)], + [N(IPCOMP_SUPPORTED)], + [N(USE_TRANSPORT_MODE)], + [N(ESP_TFC_PADDING_NOT_SUPPORTED)], + [N(NON_FIRST_FRAGMENTS_ALSO)], + SA, TSi, TSr, + [N(ADDITIONAL_TS_POSSIBLE)], + [V+] + + + + + +Nir, et al. Standards Track [Page 7] + +RFC 6290 Quick Crash Detection June 2011 + + + Note that the QCD_TOKEN notification is marked as optional because it + is not required by this specification that every implementation be + both token maker and token taker. If only one peer sends the QCD + token, then a reboot of the other peer will not be recoverable by + this method. This may be acceptable if traffic typically originates + from the other peer. + + In any case, the lack of a QCD_TOKEN notification MUST NOT be taken + as an indication that the peer does not support this standard. + Conversely, if a peer does not understand this notification, it will + simply ignore it. Therefore, a peer may send this notification + freely, even if it does not know whether the other side supports it. + + The QCD_TOKEN notification is related to the IKE SA and should follow + the AUTH payload and precede the Configuration payload and all + payloads related to the child SA. + +4.3. Replacing Tokens after Rekey or Resumption + + After rekeying an IKE SA, the IKE SPIs are replaced, so the new SA + also needs to have a token. If only the responder in the rekey + exchange is the token maker, this can be done within the + CREATE_CHILD_SA exchange. If the initiator is a token maker, then we + need an extra informational exchange. + + The following figure shows the CREATE_CHILD_SA exchange for rekeying + the IKE SA. Only the responder sends a QCD token. + + request --> SA, Ni, [KEi] + + response <-- SA, Nr, [KEr], N(QCD_TOKEN) + + If the initiator is also a token maker, it SHOULD initiate an + INFORMATIONAL exchange immediately after the CREATE_CHILD_SA exchange + as follows: + + request --> N(QCD_TOKEN) + + response <-- + + For session resumption, as specified in [RFC5723], the situation is + similar. The responder, which is necessarily the peer that has + crashed, SHOULD send a new ticket within the protected payload of the + IKE_SESSION_RESUME exchange. If the Initiator is also a token maker, + it needs to send a QCD_TOKEN in a separate INFORMATIONAL exchange. + + + + + + +Nir, et al. Standards Track [Page 8] + +RFC 6290 Quick Crash Detection June 2011 + + + The INFORMATIONAL exchange described in this section can also be used + if QCD tokens need to be replaced due to a key rollover. However, + since token takers are required to verify at least 4 QCD tokens, this + is only necessary if secret QCD keys are rolled over more than four + times as often as IKE SAs are rekeyed. See Section 5.1 for an + example method that uses secret keys that may require rollover. + +4.4. Replacing the Token for an Existing SA + + With some token generation methods, such as that described in + Section 5.2, a QCD token may sometimes become invalid, although the + IKE SA is still perfectly valid. + + In such a case, the token maker MUST send the new token in a + protected message under that IKE SA. That exchange could be a simple + INFORMATIONAL, such as in the last figure in the previous section, or + else it can be part of a MOBIKE INFORMATIONAL exchange such as in the + following figure taken from Section 2.2 of [RFC4555] and modified by + adding a QCD_TOKEN notification: + + (IP_I2:4500 -> IP_R1:4500) + HDR, SK { N(UPDATE_SA_ADDRESSES), + N(NAT_DETECTION_SOURCE_IP), + N(NAT_DETECTION_DESTINATION_IP) } --> + + <-- (IP_R1:4500 -> IP_I2:4500) + HDR, SK { N(NAT_DETECTION_SOURCE_IP), + N(NAT_DETECTION_DESTINATION_IP) } + + <-- (IP_R1:4500 -> IP_I2:4500) + HDR, SK { N(COOKIE2), [N(QCD_TOKEN)] } + + (IP_I2:4500 -> IP_R1:4500) + HDR, SK { N(COOKIE2), [N(QCD_TOKEN)] } --> + + A token taker MUST accept such gratuitous QCD_TOKEN notifications as + long as they are carried in protected exchanges. A token maker + SHOULD NOT generate them unless it is no longer able to generate the + old QCD_TOKEN. + +4.5. Presenting the Token in an Unprotected Message + + This QCD_TOKEN notification is unprotected, and is sent as a response + to a protected IKE request, which uses an IKE SA that is unknown. + + message --> N(INVALID_IKE_SPI), N(QCD_TOKEN)+ + + + + + +Nir, et al. Standards Track [Page 9] + +RFC 6290 Quick Crash Detection June 2011 + + + If child SPIs are persistently mapped to IKE SPIs as described in + Section 8.2, a token taker may get the following unprotected message + in response to an Encapsulating Security Payload (ESP) or + Authentication Header (AH) packet. + + message --> N(INVALID_SPI), N(QCD_TOKEN)+ + + The QCD_TOKEN and INVALID_IKE_SPI notifications are sent together to + support both implementations that conform to this specification and + implementations that don't. Similar to the description in Section + 2.21 of [RFC5996], the IKE SPI and message ID fields in the packet + headers are taken from the protected IKE request. + + To support a periodic rollover of the secret used for token + generation, the token taker MUST support at least four QCD_TOKEN + notifications in a single packet. The token is considered verified + if any of the QCD_TOKEN notifications matches. The token maker MAY + generate up to four QCD_TOKEN notifications, based on several + generations of keys. + + If the QCD_TOKEN verifies OK, the receiver MUST silently discard the + IKE SA and all associated child SAs. If the QCD_TOKEN cannot be + validated, a response MUST NOT be sent, and the event may be logged. + Section 5 defines token verification. + +5. Token Generation and Verification + + No token generation method is mandated by this document. Two methods + are documented in the following sub-sections, but they only serve as + examples. + + The following lists the requirements for a token generation + mechanism: + + o Tokens MUST be at least 16 octets long, and no more than 128 + octets long, to facilitate storage and transmission. Tokens + SHOULD be indistinguishable from random data. + + o It should not be possible for an external attacker to guess the + QCD token generated by an implementation. Cryptographic + mechanisms such as a pseudo-random number generator (PRNG) and + hash functions are RECOMMENDED. + + o The token maker MUST be able to re-generate or retrieve the token + based on the IKE SPIs even after it reboots. + + + + + + +Nir, et al. Standards Track [Page 10] + +RFC 6290 Quick Crash Detection June 2011 + + + o The method of token generation MUST be such that a collision of + QCD tokens between different pairs of IKE SPI will be highly + unlikely. + + For verification, the token taker makes a bitwise comparison of the + token stored along with the IKE SA with the token sent in the + unprotected message. Multihomed takers might flip back-and-forth + between several addresses, and have their tokens replaced as + described in Section 4.4. To help avoid the case where the latest + stored token does not match the address used after the maker lost + state, the token taker MAY store several earlier tokens associated + with the IKE SA, and silently discard the SA if any of them matches. + +5.1. A Stateless Method of Token Generation + + The following describes a stateless method of generating a token. In + this case, 'stateless' means not maintaining any per-tunnel state, + although there is a small amount of non-volatile storage required. + + o At installation or immediately after the first boot of the token + maker, 32 random octets are generated using a secure random number + generator or a PRNG. + + o Those 32 bytes, called the "QCD_SECRET", are stored in non- + volatile storage on the machine, and kept indefinitely. + + o If key rollover is required by policy, the implementation MAY + periodically generate a new QCD_SECRET and keep up to 3 previous + generations. When sending an unprotected QCD_TOKEN, as many as 4 + notification payloads may be sent, each from a different + QCD_SECRET. + + o The TOKEN_SECRET_DATA is calculated as follows: + + TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R) + +5.2. A Stateless Method with IP Addresses + + This method is similar to the one in the previous section, except + that the IP address of the token taker is also added to the block + being hashed. This has the disadvantage that the token needs to be + replaced (as described in Section 4.4) whenever the token taker + changes its address. + + + + + + + + +Nir, et al. Standards Track [Page 11] + +RFC 6290 Quick Crash Detection June 2011 + + + See Section 9.2 for a discussion of a use-case for this method. When + using this method, the TOKEN_SECRET_DATA field is calculated as + follows: + + TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R | IPaddr-T) + + The IPaddr-T field specifies the IP address of the token taker. + Secret rollover considerations are similar to those in the previous + section. + + Note that with a multihomed token taker, the QCD token matches just + one of the token taker IP addresses. Usually this is not a problem, + as packets sent to the token maker come out the same IP address. If + for some reason this changes, then the token maker can replace the + token as described in Section 4.4. If IKEv2 Mobility and Multihoming + (MOBIKE) is used, replacing the tokens SHOULD be piggybacked on the + INFORMATIONAL exchange with the UPDATE_SA_ADDRESSES notifications. + + There is a corner case where the token taker begins using a new IP + address (because of multihoming, roaming, or normal network + operations) and the token maker loses state before replacing the + token. In that case, it will send a correct QCD token, but the token + taker will still have the old token. In that case, the extension + will not work, and the peers will revert to RFC 5996 recovery. + +5.3. Token Lifetime + + The token is associated with a single IKE SA and SHOULD be deleted by + the token taker when the SA is deleted or expires. More formally, + the token is associated with the pair (SPI-I, SPI-R). + +6. Backup Gateways + + Making crash detection and recovery quick is a worthy goal, but since + rebooting a gateway takes a non-zero amount of time, many + implementations choose to have a standby gateway ready to take over + as soon as the primary gateway fails for any reason. [RFC6027] + describes considerations for such clusters of gateways with + synchronized state, but the rest of this section is relevant even + when there is no synchronized state. + + If such a configuration is available, it is RECOMMENDED that the + standby gateway be able to generate the same token as the active + gateway. If the method described in Section 5.1 is used, this means + that the QCD_SECRET field is identical in both gateways. This has + the effect of having the crash recovery available immediately. + + + + + +Nir, et al. Standards Track [Page 12] + +RFC 6290 Quick Crash Detection June 2011 + + + Note that this refers to "high-availability" configurations, where + only one gateway is active at any given moment. This is different + from "load sharing" configurations where more than one gateway is + active at the same time. For load sharing configurations, please see + Section 9.2 for security considerations. + +7. Interaction with Session Resumption + + Session resumption, specified in [RFC5723], allows the setting up of + a new IKE SA to consume less computing resources. This is + particularly useful in the case of a remote access gateway that has + many tunnels. A failure of such a gateway requires all these many + remote access clients to establish an IKE SA either with the rebooted + gateway or with a backup. This tunnel re-establishment occurs within + a short period of time, creating a burden on the remote access + gateway. Session resumption addresses this problem by having the + clients store an encrypted derivative of the IKE SA for quick + re-establishment. + + What Session Resumption does not help is the problem of detecting + that the peer gateway has failed. A failed gateway may go undetected + for an arbitrarily long time, because IPsec does not have packet + acknowledgement, and applications cannot signal the IPsec layer that + the tunnel "does not work". Section 2.4 of RFC 5996 does not specify + how long an implementation needs to wait before beginning a liveness + check, and only says "not recently" (see full quote in Section 2). + In practice, some mobile devices wait a very long time before + beginning a liveness check, in order to extend battery life by + allowing parts of the device to remain in low-power modes. + + QCD tokens provide a way to detect the failure of the peer in the + case where a liveness check has not yet ended (or begun). + + A remote access client conforming to both specifications will store + QCD tokens, as well as the Session Resumption ticket, if provided by + the gateway. A remote access gateway conforming to both + specifications will generate a QCD token for the client. When the + gateway reboots, the client will discover this in either of two ways: + + 1. The client does regular liveness checks, or else the time for + some other IKE exchange has come. Since the gateway is still + down, the IKE exchange times out after several minutes. In this + case, QCD does not help. + + + + + + + + +Nir, et al. Standards Track [Page 13] + +RFC 6290 Quick Crash Detection June 2011 + + + 2. Either the primary gateway or a backup gateway (see Section 6) is + ready and sends a QCD token to the client. In that case, the + client will quickly re-establish the IPsec tunnel, either with + the rebooted primary gateway or the backup gateway as described + in this document. + + The full combined protocol looks like this: + + Initiator Responder + ----------- ----------- + HDR, SAi1, KEi, Ni --> + + <-- HDR, SAr1, KEr, Nr, [CERTREQ] + + HDR, SK {IDi, [CERT,] + [CERTREQ,] [IDr,] + AUTH, N(QCD_TOKEN) + SAi2, TSi, TSr, + N(TICKET_REQUEST)} --> + <-- HDR, SK {IDr, [CERT,] AUTH, + N(QCD_TOKEN), SAr2, TSi, TSr, + N(TICKET_LT_OPAQUE) } + + ---- Reboot ----- + + HDR, {} --> + <-- HDR, N(QCD_TOKEN) + + HDR, [N(COOKIE),] + Ni, N(TICKET_OPAQUE) + [,N+] --> + <-- HDR, Nr [,N+] + +8. Operational Considerations + +8.1. Who Should Implement This Specification + + Throughout this document, we have referred to reboot time + alternatingly as the time that the implementation crashes and the + time when it is ready to process IPsec packets and IKE exchanges. + Depending on the hardware and software platforms and the cause of the + reboot, rebooting may take anywhere from a few seconds to several + minutes. If the implementation is down for a long time, the benefit + of this protocol extension is reduced. For this reason, critical + systems should implement backup gateways as described in Section 6. + + + + + + +Nir, et al. Standards Track [Page 14] + +RFC 6290 Quick Crash Detection June 2011 + + + Implementing the "token maker" side of QCD makes sense for IKE + implementation where protected connections originate from the peer, + such as inter-domain VPNs and remote access gateways. Implementing + the "token taker" side of QCD makes sense for IKE implementations + where protected connections originate, such as inter-domain VPNs and + remote access clients. + + To clarify this discussion: + + o For remote-access clients it makes sense to implement the token + taker role. + + o For remote-access gateways it makes sense to implement the token + maker role. + + o For inter-domain VPN gateways it makes sense to implement both + roles, because it can't be known in advance where the traffic + originates. + + o It is perfectly valid to implement both roles in any case, for + example, when using a single library or a single gateway to + perform several roles. + + In order to limit the effects of Denial-of-Service (DoS) attacks, a + token taker SHOULD limit the rate of QCD_TOKENs verified from a + particular source. + + If excessive amounts of IKE requests protected with unknown IKE SPIs + arrive at a token maker, the IKE module SHOULD revert to the behavior + described in Section 2.21 of [RFC5996] and either send an + INVALID_IKE_SPI notification or ignore it entirely. + + Section 9.2 requires that token makers never send a QCD token in the + clear for a valid IKE SA and describes some configurations where this + could occur. Implementations that may be installed in such + configurations SHOULD automatically detect this and disable this + extension in unsafe configurations and MUST allow the user to control + whether the extension is enabled or disabled. + +8.2. Response to Unknown Child SPI + + After a reboot, it is more likely that an implementation will receive + IPsec packets than IKE packets. In that case, the rebooted + implementation will send an INVALID_SPI notification, triggering a + liveness check. The token will only be sent in a response to the + liveness check, thus requiring an extra round trip. + + + + + +Nir, et al. Standards Track [Page 15] + +RFC 6290 Quick Crash Detection June 2011 + + + To avoid this, an implementation that has access to enough non- + volatile storage MAY store a mapping of child SPIs to owning IKE + SPIs, or to generated tokens. If such a mapping is available and + persistent across reboots, the rebooted implementation SHOULD respond + to the IPsec packet with an INVALID_SPI notification, along with the + appropriate QCD_TOKEN notifications. A token taker SHOULD verify the + QCD token that arrives with an INVALID_SPI notification the same as + if it arrived with the IKE SPIs of the parent IKE SA. + + However, a persistent storage module might not be updated in a timely + manner and could be populated with tokens relating to IKE SPIs that + have already been rekeyed. A token taker MUST NOT take an invalid + QCD token sent along with an INVALID_SPI notification as evidence + that the peer is either malfunctioning or attacking, but it SHOULD + limit the rate at which such notifications are processed. + +9. Security Considerations + + The extension described in this document must not reduce the security + of IKEv2 or IPsec. Specifically, an eavesdropper must not learn any + non-public information about the peers. + + The proposed mechanism should be secure against attacks by a passive + man in the middle (MITM) (eavesdropper). Such an attacker must not + be able to disrupt an existing IKE session, either by resetting the + session or by introducing significant delays. This requirement is + especially significant, because this document introduces a new way to + reset an IKE SA. + + The mechanism need not be similarly secure against an active MITM, + since this type of attacker is already able to disrupt IKE sessions. + +9.1. QCD Token Generation and Handling + + Tokens MUST be hard to guess. This is critical, because if an + attacker can guess the token associated with an IKE SA, they can tear + down the IKE SA and associated tunnels at will. When the token is + delivered in the IKE_AUTH exchange, it is encrypted. When it is sent + again in an unprotected notification, it is not, but that is the last + time this token is ever used. + + An aggregation of some tokens generated by one maker together with + the related IKE SPIs MUST NOT give an attacker the ability to guess + other tokens. Specifically, if one taker does not properly secure + the QCD tokens and an attacker gains access to them, this attacker + MUST NOT be able to guess other tokens generated by the same maker. + This is the reason that the QCD_SECRET in Section 5.1 needs to be + sufficiently long. + + + +Nir, et al. Standards Track [Page 16] + +RFC 6290 Quick Crash Detection June 2011 + + + The token taker MUST store the token in a secure manner. No attacker + should be able to gain access to a stored token. + + The QCD_SECRET MUST be protected from access by other parties. + Anyone gaining access to this value will be able to delete all the + IKE SAs for this token maker. + + The QCD token is sent by the rebooted peer in an unprotected message. + A message like that is subject to modification, deletion, and replay + by an attacker. However, these attacks will not compromise the + security of either side. Modification is meaningless because a + modified token is simply an invalid token. Deletion will only cause + the protocol not to work, resulting in a delay in tunnel + re-establishment as described in Section 2. Replay is also + meaningless, because the IKE SA has been deleted after the first + transmission. + +9.2. QCD Token Transmission + + A token maker MUST NOT send a valid QCD token in an unprotected + message for an existing IKE SA. + + This requirement is obvious and easy in the case of a single gateway. + However, some implementations use a load balancer to divide the load + between several physical gateways. It MUST NOT be possible even in + such a configuration to trick one gateway into sending a valid QCD + token for an IKE SA that is valid on another gateway. This is true + whether the attempt to trick the gateway uses the token taker's IP + address or a different IP address. + + IPsec failure detection is not applicable to deployments where the + QCD secret is shared by multiple gateways and the gateways cannot + assess whether the token can be legitimately sent in the clear while + another gateway may actually still own the SA's. Load balancing + configurations typically fall in this category. In order for a load + balancing configuration of IPsec gateways to support this + specification, all members MUST be able to tell whether a particular + IKE SA is active anywhere in the cluster. One way to do this is to + synchronize a list of active IKE SPIs among all the cluster members. + + Because it includes the token taker's IP address in the token + generation, the method in Section 5.2 can (under certain conditions) + prevent revealing the QCD token for an existing pair of IKE SPIs to + an attacker who is using a different IP address, even in a load- + sharing cluster without state synchronization. That method does not + prevent revealing the QCD token to an active attacker who is spoofing + the token taker's IP address. Such an attacker may attempt to direct + messages to a cluster member other than the member responsible for + + + +Nir, et al. Standards Track [Page 17] + +RFC 6290 Quick Crash Detection June 2011 + + + the IKE SA in an attempt to trick that gateway into sending a QCD + token for a valid IKE SA. That method should not be used unless the + load balancer guarantees that IKE packets from the same source IP + address always go to the same cluster member. + +9.3. QCD Token Enumeration + + An attacker may try to attack QCD if the generation algorithm + described in Section 5.1 is used. The attacker will send several + fake IKE requests to the gateway under attack, receiving and + recording the QCD tokens in the responses. This will allow the + attacker to create a dictionary of IKE SPIs to QCD tokens, which can + later be used to tear down any IKE SA. + + Three factors mitigate this threat: + + o The space of all possible IKE SPI pairs is huge: 2^128, so making + such a dictionary is impractical. Even if we assume that one + implementation always generates predictable IKE SPIs, the space is + still at least 2^64 entries, so making the dictionary is extremely + hard. To ensure this, token makers MUST generate unpredictable + IKE SPIs by using a cryptographically strong pseudo-random number + generator. + + o Throttling the amount of QCD_TOKEN notifications sent out, as + discussed in Section 8.1, especially when not soon after a crash + will limit the attacker's ability to construct a dictionary. + + o The methods in Section 5.1 and Section 5.2 allow for a periodic + change of the QCD_SECRET. Any such change invalidates the entire + dictionary. + +10. IANA Considerations + + IANA has assigned a notify message type (16419) from the status types + range (16406-40959) of the "IKEv2 Notify Message Types" registry with + the name "QUICK_CRASH_DETECTION". + +11. Acknowledgements + + We would like to thank Hannes Tschofenig and Yaron Sheffer for their + comments about Session Resumption. + + Others who have contributed valuable comments are, in alphabetical + order, Lakshminath Dondeti, Paul Hoffman, Tero Kivinen, Scott C + Moonen, Magnus Nystrom, and Keith Welter. + + + + + +Nir, et al. Standards Track [Page 18] + +RFC 6290 Quick Crash Detection June 2011 + + +12. References + +12.1. Normative References + + [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate + Requirement Levels", BCP 14, RFC 2119, March 1997. + + [RFC4555] Eronen, P., "IKEv2 Mobility and Multihoming Protocol + (MOBIKE)", RFC 4555, June 2006. + + [RFC5996] Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen, + "Internet Key Exchange Protocol Version 2 (IKEv2)", + RFC 5996, September 2010. + +12.2. Informative References + + [RFC5723] Sheffer, Y. and H. Tschofenig, "Internet Key Exchange + Protocol Version 2 (IKEv2) Session Resumption", RFC 5723, + January 2010. + + [RFC6027] Nir, Y., "IPsec Cluster Problem Statement", RFC 6027, + October 2010. + + [recovery] Detienne, F., Sethi, P., and Y. Nir, "Safe IKE Recovery", + Work in Progress, July 2009. + + + + + + + + + + + + + + + + + + + + + + + + + + +Nir, et al. Standards Track [Page 19] + +RFC 6290 Quick Crash Detection June 2011 + + +Appendix A. The Path Not Taken + +A.1. Initiating a New IKE SA + + Instead of sending a QCD token, we could have the rebooted + implementation start an Initial exchange with the peer, including the + INITIAL_CONTACT notification. This would have the same effect, + instructing the peer to erase the old IKE SA, as well as establishing + a new IKE SA with fewer rounds. + + The disadvantage here is that in IKEv2, an authentication exchange + MUST have a piggybacked Child SA set up. Since our use-case is such + that the rebooted implementation does not have traffic flowing to the + peer, there are no good selectors for such a Child SA. + + Additionally, when authentication is asymmetric, such as when EAP is + used, it is not possible for the rebooted implementation to initiate + IKE. + +A.2. SIR + + Another proposal that was considered for this work item is the SIR + extension, which is described in [recovery]. Under that proposal, + the non-rebooted peer sends a non-protected query to the possibly + rebooted peer, asking whether the IKE SA exists. The peer replies + with either a positive or negative response, and the absence of a + positive response, along with the existence of a negative response, + is taken as proof that the IKE SA has really been lost. + + The working group preferred the QCD proposal to this one. + +A.3. Birth Certificates + + Birth Certificates is a method of crash detection that has never been + formally defined. Bill Sommerfeld suggested this idea in a mail to + the IPsec mailing list on August 7, 2000, in a thread discussing + methods of crash detection: + + If we have the system sign a "birth certificate" when it + reboots (including a reboot time or boot sequence number), + we could include that with a "bad spi" ICMP error and in + the negotiation of the IKE SA. + + We believe that this method would have some problems. First, it + requires Alice to store the certificate, so as to be able to compare + the public keys. That requires more storage than does a QCD token. + Additionally, the public key operations needed to verify the self- + signed certificates are more expensive for Alice. + + + +Nir, et al. Standards Track [Page 20] + +RFC 6290 Quick Crash Detection June 2011 + + + We believe that a symmetric-key operation such as proposed here is + more light-weight and simple than that implied by the Birth + Certificate idea. + +A.4. Reducing Liveness Check Length + + Some implementations require fewer retransmissions over a shorter + period of time for cases of liveness check started because of an + INVALID_SPI or INVALID_IKE_SPI notification. + + We believe that the default retransmission policy should represent a + good balance between the need for a timely discovery of a dead peer, + and a low probability of false detection. We expect the policy to be + set to take the shortest time such that this probability achieves a + certain target. Therefore, we believe that reducing the elapsed time + and retransmission count may create an unacceptably high probability + of false detection, and this can be triggered by a single + INVALID_IKE_SPI notification. + + Additionally, even if the retransmission policy is reduced to, say, + one minute, it is still a very noticeable delay from a human + perspective, from the time that the gateway has come up (i.e., is + able to respond with an INVALID_SPI or INVALID_IKE_SPI notification) + and until the tunnels are active, or from the time the backup gateway + has taken over until the tunnels are active. The use of QCD tokens + can reduce this delay. + + + + + + + + + + + + + + + + + + + + + + + + + +Nir, et al. Standards Track [Page 21] + +RFC 6290 Quick Crash Detection June 2011 + + +Authors' Addresses + + Yoav Nir (editor) + Check Point Software Technologies, Ltd. + 5 Hasolelim st. + Tel Aviv 67897 + Israel + + EMail: ynir@checkpoint.com + + + David Wierbowski + International Business Machines + 1701 North Street + Endicott, New York 13760 + United States + + EMail: wierbows@us.ibm.com + + + Frederic Detienne + Cisco Systems, Inc. + De Kleetlaan, 7 + Diegem B-1831 + Belgium + + Phone: +32 2 704 5681 + EMail: fd@cisco.com + + + Pratima Sethi + Cisco Systems, Inc. + O'Shaugnessy Road, 11 + Bangalore, Karnataka 560027 + India + + Phone: +91 80 4154 1654 + EMail: psethi@cisco.com + + + + + + + + + + + + + +Nir, et al. Standards Track [Page 22] + |