1 files changed, 675 insertions, 0 deletions
diff --git a/doc/rfc/rfc6027.txt b/doc/rfc/rfc6027.txt
new file mode 100644
index 0000000..811462a
--- /dev/null
+++ b/doc/rfc/rfc6027.txt
@@ -0,0 +1,675 @@
+
+
+
+
+
+
+Internet Engineering Task Force (IETF)                            Y. Nir
+Request for Comments: 6027                                   Check Point
+Category: Informational                                     October 2010
+ISSN: 2070-1721
+
+
+                    IPsec Cluster Problem Statement
+
+Abstract
+
+   This document defines the terminology, problem statement, and
+   requirements for implementing Internet Key Exchange (IKE) and IPsec
+   on clusters.  It also describes gaps in existing standards and their
+   implementation that need to be filled in order to allow peers to
+   interoperate with clusters from different vendors.  Agreed-upon
+   terminology, problem statement, and requirements will allow IETF
+   working groups to consider development of IPsec/IKEv2 mechanisms to
+   simplify cluster implementations.
+
+Status of This Memo
+
+   This document is not an Internet Standards Track specification; it is
+   published for informational purposes.
+
+   This document is a product of the Internet Engineering Task Force
+   (IETF).  It represents the consensus of the IETF community.  It has
+   received public review and has been approved for publication by the
+   Internet Engineering Steering Group (IESG).  Not all documents
+   approved by the IESG are a candidate for any level of Internet
+   Standard; see Section 2 of RFC 5741.
+
+   Information about the current status of this document, any errata,
+   and how to provide feedback on it may be obtained at
+   http://www.rfc-editor.org/info/rfc6027.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Nir                           Informational                     [Page 1]
+
+RFC 6027             IPsec Cluster Problem Statement        October 2010
+
+
+Copyright Notice
+
+   Copyright (c) 2010 IETF Trust and the persons identified as the
+   document authors.  All rights reserved.
+
+   This document is subject to BCP 78 and the IETF Trust's Legal
+   Provisions Relating to IETF Documents
+   (http://trustee.ietf.org/license-info) in effect on the date of
+   publication of this document.  Please review these documents
+   carefully, as they describe your rights and restrictions with respect
+   to this document.  Code Components extracted from this document must
+   include Simplified BSD License text as described in Section 4.e of
+   the Trust Legal Provisions and are provided without warranty as
+   described in the Simplified BSD License.
+
+Table of Contents
+
+   1. Introduction ....................................................3
+      1.1. Conventions Used in This Document ..........................3
+   2. Terminology .....................................................3
+   3. The Problem Statement ...........................................5
+      3.1. Scope ......................................................5
+      3.2. A Lot of Long-Lived State ..................................6
+      3.3. IKE Counters ...............................................6
+      3.4. Outbound SA Counters .......................................6
+      3.5. Inbound SA Counters ........................................7
+      3.6. Missing Synch Messages .....................................8
+      3.7. Simultaneous Use of IKE and IPsec SAs by Different
+           Members ....................................................8
+           3.7.1. Outbound SAs Using Counter Modes ....................9
+      3.8. Different IP Addresses for IKE and IPsec ..................10
+      3.9. Allocation of SPIs ........................................10
+   4. Security Considerations ........................................10
+   5. Acknowledgements ...............................................11
+   6. References .....................................................11
+      6.1. Normative References ......................................11
+      6.2. Informative References ....................................11
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Nir                           Informational                     [Page 2]
+
+RFC 6027             IPsec Cluster Problem Statement        October 2010
+
+
+1.  Introduction
+
+   IKEv2, as described in [RFC5996], and IPsec, as described in
+   [RFC4301] and others, allow deployment of VPNs between different
+   sites as well as from VPN clients to protected networks.
+
+   As VPNs become increasingly important to the organizations deploying
+   them, there is a demand to make IPsec solutions more scalable and
+   less prone to down time, by using more than one physical gateway to
+   either share the load or back each other up, forming a "cluster" (see
+   Section 2).  Similar demands have been made in the past for other
+   critical pieces of an organization's infrastructure, such as DHCP and
+   DNS servers, Web servers, databases, and others.
+
+   IKE and IPsec are, in particular, less friendly to clustering than
+   these other protocols, because they store more state, and that state
+   is more volatile.  Section 2 defines terminology for use in this
+   document and in the envisioned solution documents.
+
+   In general, deploying IKE and IPsec in a cluster requires such a
+   large amount of information to be synchronized among the members of
+   the cluster that it becomes impractical.  Alternatively, if less
+   information is synchronized, failover would mean a prolonged and
+   intensive recovery phase, which negates the scalability and
+   availability promises of using clusters.  In Section 3, we will
+   describe this in more detail.
+
+1.1.  Conventions Used in This Document
+
+   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+   document are to be interpreted as described in [RFC2119].
+
+2.  Terminology
+
+   "Single Gateway" is an implementation of IKE and IPsec enforcing a
+   certain policy, as described in [RFC4301].
+
+   "Cluster" is a set of two or more gateways, implementing the same
+   security policy, and protecting the same domain.  Clusters exist to
+   provide both high availability through redundancy and scalability
+   through load sharing.
+
+   "Member" is one gateway in a cluster.
+
+   "Availability" is a measure of a system's ability to perform the
+   service for which it was designed.  It is measured as the percentage
+   of time a service is available from the time it is supposed to be
+
+
+
+Nir                           Informational                     [Page 3]
+
+RFC 6027             IPsec Cluster Problem Statement        October 2010
+
+
+   available.  Colloquially, availability is sometimes expressed in
+   "nines" rather than percentage, with 3 "nines" meaning 99.9%
+   availability, 4 "nines" meaning 99.99% availability, etc.
+
+   "High Availability" is a property of a system, not a configuration
+   type.  A system is said to have high availability if its expected
+   down time is low.  High availability can be achieved in various ways,
+   one of which is clustering.  All the clusters described in this
+   document achieve high availability.  What "high" means depends on the
+   application, but usually is 4 to 6 "nines" (at most 0.5-50 minutes of
+   down time per year in a system that is supposed to be available all
+   the time).
+
+   "Fault Tolerance" is a property related to high availability, where a
+   system maintains service availability, even when a specified set of
+   fault conditions occur.  In clusters, we expect the system to
+   maintain service availability, when one or more of the cluster
+   members fails.
+
+   "Completely Transparent Cluster" is a cluster where the occurrence of
+   a fault is never visible to the peers.
+
+   "Partially Transparent Cluster" is a cluster where the occurrence of
+   a fault may be visible to the peers.
+
+   "Hot Standby Cluster", or "HS Cluster" is a cluster where only one of
+   the members is active at any one time.  This member is also referred
+   to as the "active" member, whereas the other(s) are referred to as
+   "standbys".  The Virtual Router Redundancy Protocol (VRRP)
+   ([RFC5798]) is one method of building such a cluster.
+
+   "Load Sharing Cluster", or "LS Cluster" is a cluster where more than
+   one of the members may be active at the same time.  The term "load
+   balancing" is also common, but it implies that the load is actually
+   balanced between the members, and this is not a requirement.
+
+   "Failover" is the event where one member takes over some load from
+   some other member.  In a hot standby cluster, this happens when a
+   standby member becomes active due to a failure of the former active
+   member, or because of an administrator command.  In a load sharing
+   cluster, this usually happens because of a failure of one of the
+   members, but certain load-balancing technologies may allow a
+   particular load (such as all the flows associated with a particular
+   child Security Association (SA)) to move from one member to another
+   to even out the load, even without any failures.
+
+
+
+
+
+
+Nir                           Informational                     [Page 4]
+
+RFC 6027             IPsec Cluster Problem Statement        October 2010
+
+
+   "Tight Cluster" is a cluster where all the members share an IP
+   address.  This could be accomplished using configured interfaces with
+   specialized protocols or hardware, such as VRRP, or through the use
+   of multicast addresses, but in any case, peers need only be
+   configured with one IP address in the Peer Authentication Database.
+
+   "Loose Cluster" is a cluster where each member has a different IP
+   address.  Peers find the correct member using some method such as DNS
+   queries or the IKEv2 redirect mechanism ([RFC5685]).  In some cases,
+   a member's IP address(es) may be allocated to another member at
+   failover.
+
+   "Synch Channel" is a communications channel among the cluster
+   members, which is used to transfer state information.  The synch
+   channel may or may not be IP based, may or may not be encrypted, and
+   may work over short or long distances.  The security and physical
+   characteristics of this channel are out of scope for this document,
+   but it is a requirement that its use be minimized for scalability.
+
+3.  The Problem Statement
+
+   This section starts by scoping the problem, and goes on to list each
+   of the issues encountered while setting up a cluster of IPsec VPN
+   gateways.
+
+3.1.  Scope
+
+   This document will make no attempt to describe the problems in
+   setting up a generic cluster.  It describes only problems related to
+   the IKE/IPsec protocols.
+
+   The problem of synchronizing the policy between cluster members is
+   out of scope, as this is an administrative issue that is not
+   particular to either clusters or to IPsec.
+
+   The interesting scenario here is VPN, whether inter-domain or remote
+   access.  Host-to-host transport mode is not expected to benefit from
+   this work.
+
+   We do not describe in full the problems of the communication channel
+   between cluster members (the Synch Channel), nor do we intend to
+   specify anything in this space later.  Specifically, mixed-vendor
+   clusters are out of scope.
+
+   The problem statement anticipates possible protocol-level solutions
+   between IKE/IPsec peers in order to improve the availability and/or
+   performance of VPN clusters.  One vendor's IPsec endpoint should be
+   able to work, optimally, with another vendor's cluster.
+
+
+
+Nir                           Informational                     [Page 5]
+
+RFC 6027             IPsec Cluster Problem Statement        October 2010
+
+
+3.2.  A Lot of Long-Lived State
+
+   IKE and IPsec have a lot of long-lived state:
+
+   o  IKE SAs last for minutes, hours, or days, and carry keys and other
+      information.  Some gateways may carry thousands to hundreds of
+      thousands of IKE SAs.
+
+   o  IPsec SAs last for minutes or hours, and carry keys, selectors,
+      and other information.  Some gateways may carry hundreds of
+      thousands of such IPsec SAs.
+
+   o  SPD (Security Policy Database) cache entries.  While the SPD is
+      unchanging, the SPD cache changes on the fly due to narrowing.
+      Entries last at least as long as the SAD (Security Association
+      Database) entries, but tend to last even longer than that.
+
+   A naive implementation of a cluster would have no synchronized state,
+   and a failover would produce an effect similar to that of a rebooted
+   gateway.  [RFC5723] describes how new IKE and IPsec SAs can be
+   recreated in such a case.
+
+3.3.  IKE Counters
+
+   We can overcome the first problem described in Section 3.2, by
+   synchronizing states -- whenever an SA is created, we can synch this
+   new state to all other members.  However, those states are not only
+   long lived, they are also ever changing.
+
+   IKE has message counters.  A peer MUST NOT process message n until
+   after it has processed message n-1.  Skipping message IDs is not
+   allowed.  So a newly active member needs to know the last message IDs
+   both received and transmitted.
+
+   One possible solution is to synchronize information about the IKE
+   message counters after every IKE exchange.  This way, the newly
+   active member knows what messages it is allowed to process, and what
+   message IDs to use on IKE requests, so that peers process them.  This
+   solution may be appropriate in some cases, but may be too onerous in
+   systems with a lot of SAs.  It also has the drawback that it never
+   recovers from the missing synch message problem, which is described
+   in Section 3.6.
+
+3.4.  Outbound SA Counters
+
+   The Encapsulating Security Payload (ESP) and Authentication Header
+   (AH) have an optional anti-replay feature, where every protected
+   packet carries a counter number.  Repeating counter numbers is
+
+
+
+Nir                           Informational                     [Page 6]
+
+RFC 6027             IPsec Cluster Problem Statement        October 2010
+
+
+   considered an attack, so the newly active member MUST NOT use a
+   replay counter number that has already been used.  The peer will drop
+   those packets as duplicates and/or warn of an attack.
+
+   Though it may be feasible to synchronize the IKE message counters, it
+   is almost never feasible to synchronize the IPsec packet counters for
+   every IPsec packet transmitted.  So we have to assume that at least
+   for IPsec, the replay counter will not be up to date on the newly
+   active member, and the newly active member may repeat a counter.
+
+   A possible solution is to synch replay counter information, not for
+   each packet emitted, but only at regular intervals, say, every 10,000
+   packets or every 0.5 seconds.  After a failover, the newly active
+   member advances the counters for outbound IPsec SAs by 10,000
+   packets.  To the peer, this looks like up to 10,000 packets were
+   lost, but this should be acceptable, as neither ESP nor AH guarantees
+   reliable delivery.
+
+3.5.  Inbound SA Counters
+
+   An even tougher issue is the synchronization of packet counters for
+   inbound IPsec SAs.  If a packet arrives at a newly active member,
+   there is no way to determine whether or not this packet is a replay.
+   The periodic synch does not solve this problem at all.  Suppose
+   we synchronize every 10,000 packets, and the last synch
+   before the failover had the counter at 170,000.  It is probable,
+   though not certain, that packet number 180,000 has not yet been
+   processed, but if packet 175,000 arrives at the newly active member,
+   it has no way of determining whether or not that packet has already
+   been processed.  The synchronization does prevent the processing of
+   really old packets, such as those with counter number 165,000.
+   Ignoring all counters below 180,000 won't work either, because that's
+   up to 10,000 dropped packets, which may be very noticeable.
+
+   The easiest solution is to learn the replay counter from the incoming
+   traffic.  This is allowed by the standards, because replay counter
+   verification is an optional feature (see Section 3.2 in [RFC4301]).
+   The case can even be made that it is relatively secure, because non-
+   attack traffic will reset the counters to what they should be, so an
+   attacker faces the dual challenge of a very narrow window for attack,
+   and the need to time the attack to a failover event.  Unless the
+   attacker can actually cause the failover, this would be very
+   difficult.  It should be noted, though, that although this solution
+   is acceptable as far as RFC 4301 goes, it is a matter of policy
+   whether this is acceptable.
+
+
+
+
+
+
+Nir                           Informational                     [Page 7]
+
+RFC 6027             IPsec Cluster Problem Statement        October 2010
+
+
+   Another possible solution to the inbound IPsec SA problem is to rekey
+   all child SAs following a failover.  This may or may not be feasible
+   depending on the implementation and the configuration.
+
+3.6.  Missing Synch Messages
+
+   The synch channel is very likely not to be infallible.  Before
+   failover is detected, some synchronization messages may have been
+   missed.  For example, the active member may have created a new child
+   SA using message n.  The new information (entry in the SAD and update
+   to counters of the IKE SA) is sent on the synch channel.  Still, with
+   every possible technology, the update may be missed before the
+   failover.
+
+   This is a bad situation, because the IKE SA is doomed.  The newly
+   active member has two problems:
+
+   o  It does not have the new IPsec SA pair.  It will drop all incoming
+      packets protected with such an SA.  This could be fixed by sending
+      some DELETEs and INVALID_SPI notifications, if it weren't for the
+      other problem.
+
+   o  The counters for the IKE SA show that only request n-1 has been
+      sent.  The next request will get the message ID n, but that will
+      be rejected by the peer.  After a sufficient number of
+      retransmissions and rejections, the whole IKE SA with all
+      associated IPsec SAs will get dropped.
+
+   The above scenario may be rare enough that it is acceptable that on a
+   configuration with thousands of IKE SAs, a few will need to be
+   recreated from scratch or using session resumption techniques.
+   However, detecting this may take a long time (several minutes) and
+   this negates the goal of creating a cluster in the first place.
+
+3.7.  Simultaneous Use of IKE and IPsec SAs by Different Members
+
+   For load sharing clusters, all active members may need to use the
+   same SAs, both IKE and IPsec.  This is an even greater problem than
+   in the case of hot standby clusters, because consecutive packets may
+   need to be sent by different members to the same peer gateway.
+
+   The solution to the IKE SA issue is up to the implementation.  It's
+   possible to create some locking mechanism over the synch channel, or
+   else have one member "own" the IKE SA and manage the child SAs for
+   all other members.  For IPsec, solutions fall into two broad
+   categories.
+
+
+
+
+
+Nir                           Informational                     [Page 8]
+
+RFC 6027             IPsec Cluster Problem Statement        October 2010
+
+
+   The first is the "sticky" category, where all communications with a
+   single peer, or all communications involving a certain SPD cache
+   entry go through a single member.  In this case, all packets that
+   match any particular SA go through the same member, so no
+   synchronization of the replay counter needs to be done.  Inbound
+   processing is a "sticky" issue (no pun intended), because the packets
+   have to be processed by the correct member based on peer and the
+   Security Parameter Index (SPI), and most load balancers will not be
+   able to match the SPIs to the correct member, unless stickiness
+   extends to all traffic with a particular peer.  Another disadvantage
+   of sticky solutions is that the load tends to not distribute evenly,
+   especially if one SA covers a significant portion of IPsec traffic.
+
+   The second is the "duplicate" category, where the child SA is
+   duplicated for each pair of IPsec SAs for each active member.
+   Different packets for the same peer go through different members, and
+   get protected using different SAs with the same selectors and
+   matching the same entries in the SPD cache.  This has some
+   shortcomings:
+
+   o  It requires multiple parallel SAs, for which the peer has no use.
+      Section 2.8 of [RFC5996] specifically allows this, but some
+      implementations might have a policy against long-term maintenance
+      of redundant SAs.
+
+   o  Different packets that belong to the same flow may be protected by
+      different SAs, which may seem "weird" to the peer gateway,
+      especially if it is integrated with some deep-inspection
+      middleware such as a firewall.  It is not known whether this will
+      cause problems with current gateways.  It is also impossible to
+      mandate against this, because the definition of "flow" varies from
+      one implementation to another.
+
+   o  Reply packets may arrive with an IPsec SA that is not "matched" to
+      the one used for the outgoing packets.  Also, they might arrive at
+      a different member.  This problem is beyond the scope of this
+      document and should be solved by the application, perhaps by
+      forwarding misdirected packets to the correct gateway for deep
+      inspection.
+
+3.7.1.  Outbound SAs Using Counter Modes
+
+   For SAs involving counter mode ciphers such as Counter Mode (CTR)
+   ([RFC3686]) or Galois/Counter Mode (GCM) ([RFC4106]) there is yet
+   another complication.  The initial vector for such modes MUST NOT be
+   repeated, and senders use methods such as counters or linear feedback
+   shift registers (LFSRs) to ensure this.  For an SA shared between
+   more than one active member, or even failing over from one member to
+
+
+
+Nir                           Informational                     [Page 9]
+
+RFC 6027             IPsec Cluster Problem Statement        October 2010
+
+
+   another, the cluster members need to make sure that they do not
+   generate the same initial vector.  See [COUNTER_MODES] for a
+   discussion of this problem in another context.
+
+3.8.  Different IP Addresses for IKE and IPsec
+
+   In many implementations there are separate IP addresses for the
+   cluster, and for each member.  While the packets protected by tunnel
+   mode child SAs are encapsulated in IP headers with the cluster IP
+   address, the IKE packets originate from a specific member, and carry
+   that member's IP address.  This may be done so that IPsec traffic
+   bypasses the load balancer for greater scalability.  For the peer,
+   this looks weird, as the usual thing is for the IPsec packets to come
+   from the same IP address as the IKE packets.  Unmodified peers may
+   drop such packets.
+
+   One obvious solution is to use some fancy capability of the IKE host
+   to change things so that IKE packets also come out of the cluster IP
+   address.  This can be achieved through NAT or through assigning
+   multiple addresses to interfaces.  This is not, however, possible for
+   all implementations, and will not reduce load on the balancer.
+
+   [ARORA] discusses this problem in greater depth, and proposes another
+   solution, that does involve protocol changes.
+
+3.9.  Allocation of SPIs
+
+   The SPI associated with each child SA, and with each IKE SA, MUST be
+   unique relative to the peer of the SA.  Thus, in the context of a
+   cluster, each cluster member MUST generate SPIs in a fashion that
+   avoids collisions (with other cluster members) for these SPI values.
+   The means by which cluster members achieve this requirement is a
+   local matter, outside the scope of this document.
+
+4.  Security Considerations
+
+   Implementations running on clusters MUST be as secure as
+   implementations running on single gateways.  In other words, no
+   extension or interpretation used to allow operation in a cluster may
+   facilitate attacks that are not possible for single gateways.
+
+   Moreover, thought must be given to the synching requirements of any
+   protocol extension to make sure that it does not create an
+   opportunity for denial-of-service attacks on the cluster.
+
+
+
+
+
+
+
+Nir                           Informational                    [Page 10]
+
+RFC 6027             IPsec Cluster Problem Statement        October 2010
+
+
+   As mentioned in Section 3.5, allowing an inbound child SA to failover
+   to another member has the effect of disabling replay counter
+   protection for a short time.  Though the threat is arguably low, it
+   is a policy decision whether this is acceptable.
+
+   Section 3.7 describes the problem of the two directions of a flow
+   being protected by two SAs that are not part of a matched pair or
+   that are not even being processed by the same cluster member.  This
+   is not a security problem as far as IPsec is concerned because IPsec
+   has policy at the IP, protocol and port level only.  However, many
+   IPsec implementations are integrated with stateful firewalls, which
+   need to see both sides of a flow.  Such implementations may have to
+   forward packets to other members for the firewall to properly inspect
+   the traffic.
+
+5.  Acknowledgements
+
+   This document is a collective work and includes contributions from
+   many people who participate in the IPsecME working group.
+
+   The editor would particularly like to acknowledge the extensive
+   contribution of the following people (in alphabetical order):
+   Jitender Arora, Jean-Michel Combes, Dan Harkins, David Harrington,
+   Steve Kent, Tero Kivinen, Alexey Melnikov, Yaron Sheffer, Melinda
+   Shore, and Rodney Van Meter.
+
+6.  References
+
+6.1.  Normative References
+
+   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
+              Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
+              Internet Protocol", RFC 4301, December 2005.
+
+   [RFC5996]  Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen,
+              "Internet Key Exchange Protocol Version 2 (IKEv2)",
+              RFC 5996, September 2010.
+
+6.2.  Informative References
+
+   [ARORA]    Arora, J. and P. Kumar, "Alternate Tunnel Addresses for
+              IKEv2", Work in Progress, April 2010.
+
+
+
+
+
+
+
+Nir                           Informational                    [Page 11]
+
+RFC 6027             IPsec Cluster Problem Statement        October 2010
+
+
+   [COUNTER_MODES]
+              McGrew, D. and B. Weis, "Using Counter Modes with
+              Encapsulating Security Payload (ESP) and Authentication
+              Header (AH) to Protect Group Traffic", Work in Progress,
+              March 2010.
+
+   [RFC3686]  Housley, R., "Using Advanced Encryption Standard (AES)
+              Counter Mode", RFC 3686, January 2004.
+
+   [RFC4106]  Viega, J. and D. McGrew, "The Use of Galois/Counter Mode
+              (GCM) in IPsec Encapsulating Security Payload (ESP)",
+              RFC 4106, June 2005.
+
+   [RFC5685]  Devarapalli, V. and K. Weniger, "Redirect Mechanism for
+              IKEv2", RFC 5685, November 2009.
+
+   [RFC5723]  Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption",
+              RFC 5723, January 2010.
+
+   [RFC5798]  Nadas, S., "Virtual Router Redundancy Protocol (VRRP)",
+              RFC 5798, March 2010.
+
+Author's Address
+
+   Yoav Nir
+   Check Point Software Technologies Ltd.
+   5 Hasolelim st.
+   Tel Aviv  67897
+   Israel
+
+   EMail: ynir@checkpoint.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Nir                           Informational                    [Page 12]
+