summaryrefslogtreecommitdiff
path: root/doc/rfc/rfc4271.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/rfc/rfc4271.txt')
-rw-r--r--doc/rfc/rfc4271.txt5827
1 files changed, 5827 insertions, 0 deletions
diff --git a/doc/rfc/rfc4271.txt b/doc/rfc/rfc4271.txt
new file mode 100644
index 0000000..73f4298
--- /dev/null
+++ b/doc/rfc/rfc4271.txt
@@ -0,0 +1,5827 @@
+
+
+
+
+
+
+Network Working Group Y. Rekhter, Ed.
+Request for Comments: 4271 T. Li, Ed.
+Obsoletes: 1771 S. Hares, Ed.
+Category: Standards Track January 2006
+
+
+ A Border Gateway Protocol 4 (BGP-4)
+
+Status of This Memo
+
+ This document specifies an Internet standards track protocol for the
+ Internet community, and requests discussion and suggestions for
+ improvements. Please refer to the current edition of the "Internet
+ Official Protocol Standards" (STD 1) for the standardization state
+ and status of this protocol. Distribution of this memo is unlimited.
+
+Copyright Notice
+
+ Copyright (C) The Internet Society (2006).
+
+Abstract
+
+ This document discusses the Border Gateway Protocol (BGP), which is
+ an inter-Autonomous System routing protocol.
+
+ The primary function of a BGP speaking system is to exchange network
+ reachability information with other BGP systems. This network
+ reachability information includes information on the list of
+ Autonomous Systems (ASes) that reachability information traverses.
+ This information is sufficient for constructing a graph of AS
+ connectivity for this reachability from which routing loops may be
+ pruned, and, at the AS level, some policy decisions may be enforced.
+
+ BGP-4 provides a set of mechanisms for supporting Classless Inter-
+ Domain Routing (CIDR). These mechanisms include support for
+ advertising a set of destinations as an IP prefix, and eliminating
+ the concept of network "class" within BGP. BGP-4 also introduces
+ mechanisms that allow aggregation of routes, including aggregation of
+ AS paths.
+
+ This document obsoletes RFC 1771.
+
+
+
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 1]
+
+RFC 4271 BGP-4 January 2006
+
+
+Table of Contents
+
+ 1. Introduction ....................................................4
+ 1.1. Definition of Commonly Used Terms ..........................4
+ 1.2. Specification of Requirements ..............................6
+ 2. Acknowledgements ................................................6
+ 3. Summary of Operation ............................................7
+ 3.1. Routes: Advertisement and Storage ..........................9
+ 3.2. Routing Information Base ..................................10
+ 4. Message Formats ................................................11
+ 4.1. Message Header Format .....................................12
+ 4.2. OPEN Message Format .......................................13
+ 4.3. UPDATE Message Format .....................................14
+ 4.4. KEEPALIVE Message Format ..................................21
+ 4.5. NOTIFICATION Message Format ...............................21
+ 5. Path Attributes ................................................23
+ 5.1. Path Attribute Usage ......................................25
+ 5.1.1. ORIGIN .............................................25
+ 5.1.2. AS_PATH ............................................25
+ 5.1.3. NEXT_HOP ...........................................26
+ 5.1.4. MULTI_EXIT_DISC ....................................28
+ 5.1.5. LOCAL_PREF .........................................29
+ 5.1.6. ATOMIC_AGGREGATE ...................................29
+ 5.1.7. AGGREGATOR .........................................30
+ 6. BGP Error Handling. ............................................30
+ 6.1. Message Header Error Handling .............................31
+ 6.2. OPEN Message Error Handling ...............................31
+ 6.3. UPDATE Message Error Handling .............................32
+ 6.4. NOTIFICATION Message Error Handling .......................34
+ 6.5. Hold Timer Expired Error Handling .........................34
+ 6.6. Finite State Machine Error Handling .......................35
+ 6.7. Cease .....................................................35
+ 6.8. BGP Connection Collision Detection ........................35
+ 7. BGP Version Negotiation ........................................36
+ 8. BGP Finite State Machine (FSM) .................................37
+ 8.1. Events for the BGP FSM ....................................38
+ 8.1.1. Optional Events Linked to Optional Session
+ Attributes .........................................38
+ 8.1.2. Administrative Events ..............................42
+ 8.1.3. Timer Events .......................................46
+ 8.1.4. TCP Connection-Based Events ........................47
+ 8.1.5. BGP Message-Based Events ...........................49
+ 8.2. Description of FSM ........................................51
+ 8.2.1. FSM Definition .....................................51
+ 8.2.1.1. Terms "active" and "passive" ..............52
+ 8.2.1.2. FSM and Collision Detection ...............52
+ 8.2.1.3. FSM and Optional Session Attributes .......52
+ 8.2.1.4. FSM Event Numbers .........................53
+
+
+
+Rekhter, et al. Standards Track [Page 2]
+
+RFC 4271 BGP-4 January 2006
+
+
+ 8.2.1.5. FSM Actions that are Implementation
+ Dependent .................................53
+ 8.2.2. Finite State Machine ...............................53
+ 9. UPDATE Message Handling ........................................75
+ 9.1. Decision Process ..........................................76
+ 9.1.1. Phase 1: Calculation of Degree of Preference .......77
+ 9.1.2. Phase 2: Route Selection ...........................77
+ 9.1.2.1. Route Resolvability Condition .............79
+ 9.1.2.2. Breaking Ties (Phase 2) ...................80
+ 9.1.3. Phase 3: Route Dissemination .......................82
+ 9.1.4. Overlapping Routes .................................83
+ 9.2. Update-Send Process .......................................84
+ 9.2.1. Controlling Routing Traffic Overhead ...............85
+ 9.2.1.1. Frequency of Route Advertisement ..........85
+ 9.2.1.2. Frequency of Route Origination ............85
+ 9.2.2. Efficient Organization of Routing Information ......86
+ 9.2.2.1. Information Reduction .....................86
+ 9.2.2.2. Aggregating Routing Information ...........87
+ 9.3. Route Selection Criteria ..................................89
+ 9.4. Originating BGP routes ....................................89
+ 10. BGP Timers ....................................................90
+ Appendix A. Comparison with RFC 1771 .............................92
+ Appendix B. Comparison with RFC 1267 .............................93
+ Appendix C. Comparison with RFC 1163 .............................93
+ Appendix D. Comparison with RFC 1105 .............................94
+ Appendix E. TCP Options that May Be Used with BGP ................94
+ Appendix F. Implementation Recommendations .......................95
+ Appendix F.1. Multiple Networks Per Message .........95
+ Appendix F.2. Reducing Route Flapping ...............96
+ Appendix F.3. Path Attribute Ordering ...............96
+ Appendix F.4. AS_SET Sorting ........................96
+ Appendix F.5. Control Over Version Negotiation ......96
+ Appendix F.6. Complex AS_PATH Aggregation ...........96
+ Security Considerations ...........................................97
+ IANA Considerations ...............................................99
+ Normative References .............................................101
+ Informative References ...........................................101
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 3]
+
+RFC 4271 BGP-4 January 2006
+
+
+1. Introduction
+
+ The Border Gateway Protocol (BGP) is an inter-Autonomous System
+ routing protocol.
+
+ The primary function of a BGP speaking system is to exchange network
+ reachability information with other BGP systems. This network
+ reachability information includes information on the list of
+ Autonomous Systems (ASes) that reachability information traverses.
+ This information is sufficient for constructing a graph of AS
+ connectivity for this reachability, from which routing loops may be
+ pruned and, at the AS level, some policy decisions may be enforced.
+
+ BGP-4 provides a set of mechanisms for supporting Classless Inter-
+ Domain Routing (CIDR) [RFC1518, RFC1519]. These mechanisms include
+ support for advertising a set of destinations as an IP prefix and
+ eliminating the concept of network "class" within BGP. BGP-4 also
+ introduces mechanisms that allow aggregation of routes, including
+ aggregation of AS paths.
+
+ Routing information exchanged via BGP supports only the destination-
+ based forwarding paradigm, which assumes that a router forwards a
+ packet based solely on the destination address carried in the IP
+ header of the packet. This, in turn, reflects the set of policy
+ decisions that can (and cannot) be enforced using BGP. BGP can
+ support only those policies conforming to the destination-based
+ forwarding paradigm.
+
+1.1. Definition of Commonly Used Terms
+
+ This section provides definitions for terms that have a specific
+ meaning to the BGP protocol and that are used throughout the text.
+
+ Adj-RIB-In
+ The Adj-RIBs-In contains unprocessed routing information that has
+ been advertised to the local BGP speaker by its peers.
+
+ Adj-RIB-Out
+ The Adj-RIBs-Out contains the routes for advertisement to specific
+ peers by means of the local speaker's UPDATE messages.
+
+ Autonomous System (AS)
+ The classic definition of an Autonomous System is a set of routers
+ under a single technical administration, using an interior gateway
+ protocol (IGP) and common metrics to determine how to route
+ packets within the AS, and using an inter-AS routing protocol to
+ determine how to route packets to other ASes. Since this classic
+ definition was developed, it has become common for a single AS to
+
+
+
+Rekhter, et al. Standards Track [Page 4]
+
+RFC 4271 BGP-4 January 2006
+
+
+ use several IGPs and, sometimes, several sets of metrics within an
+ AS. The use of the term Autonomous System stresses the fact that,
+ even when multiple IGPs and metrics are used, the administration
+ of an AS appears to other ASes to have a single coherent interior
+ routing plan, and presents a consistent picture of the
+ destinations that are reachable through it.
+
+ BGP Identifier
+ A 4-octet unsigned integer that indicates the BGP Identifier of
+ the sender of BGP messages. A given BGP speaker sets the value of
+ its BGP Identifier to an IP address assigned to that BGP speaker.
+ The value of the BGP Identifier is determined upon startup and is
+ the same for every local interface and BGP peer.
+
+ BGP speaker
+ A router that implements BGP.
+
+ EBGP
+ External BGP (BGP connection between external peers).
+
+ External peer
+ Peer that is in a different Autonomous System than the local
+ system.
+
+ Feasible route
+ An advertised route that is available for use by the recipient.
+
+ IBGP
+ Internal BGP (BGP connection between internal peers).
+
+ Internal peer
+ Peer that is in the same Autonomous System as the local system.
+
+ IGP
+ Interior Gateway Protocol - a routing protocol used to exchange
+ routing information among routers within a single Autonomous
+ System.
+
+ Loc-RIB
+ The Loc-RIB contains the routes that have been selected by the
+ local BGP speaker's Decision Process.
+
+ NLRI
+ Network Layer Reachability Information.
+
+ Route
+ A unit of information that pairs a set of destinations with the
+ attributes of a path to those destinations. The set of
+
+
+
+Rekhter, et al. Standards Track [Page 5]
+
+RFC 4271 BGP-4 January 2006
+
+
+ destinations are systems whose IP addresses are contained in one
+ IP address prefix carried in the Network Layer Reachability
+ Information (NLRI) field of an UPDATE message. The path is the
+ information reported in the path attributes field of the same
+ UPDATE message.
+
+ RIB
+ Routing Information Base.
+
+ Unfeasible route
+ A previously advertised feasible route that is no longer available
+ for use.
+
+1.2. Specification of Requirements
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+ "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+ document are to be interpreted as described in RFC 2119 [RFC2119].
+
+2. Acknowledgements
+
+ This document was originally published as [RFC1267] in October 1991,
+ jointly authored by Kirk Lougheed and Yakov Rekhter.
+
+ We would like to express our thanks to Guy Almes, Len Bosack, and
+ Jeffrey C. Honig for their contributions to the earlier version
+ (BGP-1) of this document.
+
+ We would like to specially acknowledge numerous contributions by
+ Dennis Ferguson to the earlier version of this document.
+
+ We would like to explicitly thank Bob Braden for the review of the
+ earlier version (BGP-2) of this document, and for his constructive
+ and valuable comments.
+
+ We would also like to thank Bob Hinden, Director for Routing of the
+ Internet Engineering Steering Group, and the team of reviewers he
+ assembled to review the earlier version (BGP-2) of this document.
+ This team, consisting of Deborah Estrin, Milo Medin, John Moy, Radia
+ Perlman, Martha Steenstrup, Mike St. Johns, and Paul Tsuchiya, acted
+ with a strong combination of toughness, professionalism, and
+ courtesy.
+
+ Certain sections of the document borrowed heavily from IDRP
+ [IS10747], which is the OSI counterpart of BGP. For this, credit
+ should be given to the ANSI X3S3.3 group chaired by Lyman Chapin and
+ to Charles Kunzinger, who was the IDRP editor within that group.
+
+
+
+
+Rekhter, et al. Standards Track [Page 6]
+
+RFC 4271 BGP-4 January 2006
+
+
+ We would also like to thank Benjamin Abarbanel, Enke Chen, Edward
+ Crabbe, Mike Craren, Vincent Gillet, Eric Gray, Jeffrey Haas, Dimitry
+ Haskin, Stephen Kent, John Krawczyk, David LeRoy, Dan Massey,
+ Jonathan Natale, Dan Pei, Mathew Richardson, John Scudder, John
+ Stewart III, Dave Thaler, Paul Traina, Russ White, Curtis Villamizar,
+ and Alex Zinin for their comments.
+
+ We would like to specially acknowledge Andrew Lange for his help in
+ preparing the final version of this document.
+
+ Finally, we would like to thank all the members of the IDR Working
+ Group for their ideas and the support they have given to this
+ document.
+
+3. Summary of Operation
+
+ The Border Gateway Protocol (BGP) is an inter-Autonomous System
+ routing protocol. It is built on experience gained with EGP (as
+ defined in [RFC904]) and EGP usage in the NSFNET Backbone (as
+ described in [RFC1092] and [RFC1093]). For more BGP-related
+ information, see [RFC1772], [RFC1930], [RFC1997], and [RFC2858].
+
+ The primary function of a BGP speaking system is to exchange network
+ reachability information with other BGP systems. This network
+ reachability information includes information on the list of
+ Autonomous Systems (ASes) that reachability information traverses.
+ This information is sufficient for constructing a graph of AS
+ connectivity, from which routing loops may be pruned, and, at the AS
+ level, some policy decisions may be enforced.
+
+ In the context of this document, we assume that a BGP speaker
+ advertises to its peers only those routes that it uses itself (in
+ this context, a BGP speaker is said to "use" a BGP route if it is the
+ most preferred BGP route and is used in forwarding). All other cases
+ are outside the scope of this document.
+
+ In the context of this document, the term "IP address" refers to an
+ IP Version 4 address [RFC791].
+
+ Routing information exchanged via BGP supports only the destination-
+ based forwarding paradigm, which assumes that a router forwards a
+ packet based solely on the destination address carried in the IP
+ header of the packet. This, in turn, reflects the set of policy
+ decisions that can (and cannot) be enforced using BGP. Note that
+ some policies cannot be supported by the destination-based forwarding
+ paradigm, and thus require techniques such as source routing (aka
+ explicit routing) to be enforced. Such policies cannot be enforced
+ using BGP either. For example, BGP does not enable one AS to send
+
+
+
+Rekhter, et al. Standards Track [Page 7]
+
+RFC 4271 BGP-4 January 2006
+
+
+ traffic to a neighboring AS for forwarding to some destination
+ (reachable through but) beyond that neighboring AS, intending that
+ the traffic take a different route to that taken by the traffic
+ originating in the neighboring AS (for that same destination). On
+ the other hand, BGP can support any policy conforming to the
+ destination-based forwarding paradigm.
+
+ BGP-4 provides a new set of mechanisms for supporting Classless
+ Inter-Domain Routing (CIDR) [RFC1518, RFC1519]. These mechanisms
+ include support for advertising a set of destinations as an IP prefix
+ and eliminating the concept of a network "class" within BGP. BGP-4
+ also introduces mechanisms that allow aggregation of routes,
+ including aggregation of AS paths.
+
+ This document uses the term `Autonomous System' (AS) throughout. The
+ classic definition of an Autonomous System is a set of routers under
+ a single technical administration, using an interior gateway protocol
+ (IGP) and common metrics to determine how to route packets within the
+ AS, and using an inter-AS routing protocol to determine how to route
+ packets to other ASes. Since this classic definition was developed,
+ it has become common for a single AS to use several IGPs and,
+ sometimes, several sets of metrics within an AS. The use of the term
+ Autonomous System stresses the fact that, even when multiple IGPs and
+ metrics are used, the administration of an AS appears to other ASes
+ to have a single coherent interior routing plan and presents a
+ consistent picture of the destinations that are reachable through it.
+
+ BGP uses TCP [RFC793] as its transport protocol. This eliminates the
+ need to implement explicit update fragmentation, retransmission,
+ acknowledgement, and sequencing. BGP listens on TCP port 179. The
+ error notification mechanism used in BGP assumes that TCP supports a
+ "graceful" close (i.e., that all outstanding data will be delivered
+ before the connection is closed).
+
+ A TCP connection is formed between two systems. They exchange
+ messages to open and confirm the connection parameters.
+
+ The initial data flow is the portion of the BGP routing table that is
+ allowed by the export policy, called the Adj-Ribs-Out (see 3.2).
+ Incremental updates are sent as the routing tables change. BGP does
+ not require a periodic refresh of the routing table. To allow local
+ policy changes to have the correct effect without resetting any BGP
+ connections, a BGP speaker SHOULD either (a) retain the current
+ version of the routes advertised to it by all of its peers for the
+ duration of the connection, or (b) make use of the Route Refresh
+ extension [RFC2918].
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 8]
+
+RFC 4271 BGP-4 January 2006
+
+
+ KEEPALIVE messages may be sent periodically to ensure that the
+ connection is live. NOTIFICATION messages are sent in response to
+ errors or special conditions. If a connection encounters an error
+ condition, a NOTIFICATION message is sent and the connection is
+ closed.
+
+ A peer in a different AS is referred to as an external peer, while a
+ peer in the same AS is referred to as an internal peer. Internal BGP
+ and external BGP are commonly abbreviated as IBGP and EBGP.
+
+ If a particular AS has multiple BGP speakers and is providing transit
+ service for other ASes, then care must be taken to ensure a
+ consistent view of routing within the AS. A consistent view of the
+ interior routes of the AS is provided by the IGP used within the AS.
+ For the purpose of this document, it is assumed that a consistent
+ view of the routes exterior to the AS is provided by having all BGP
+ speakers within the AS maintain IBGP with each other.
+
+ This document specifies the base behavior of the BGP protocol. This
+ behavior can be, and is, modified by extension specifications. When
+ the protocol is extended, the new behavior is fully documented in the
+ extension specifications.
+
+3.1. Routes: Advertisement and Storage
+
+ For the purpose of this protocol, a route is defined as a unit of
+ information that pairs a set of destinations with the attributes of a
+ path to those destinations. The set of destinations are systems
+ whose IP addresses are contained in one IP address prefix that is
+ carried in the Network Layer Reachability Information (NLRI) field of
+ an UPDATE message, and the path is the information reported in the
+ path attributes field of the same UPDATE message.
+
+ Routes are advertised between BGP speakers in UPDATE messages.
+ Multiple routes that have the same path attributes can be advertised
+ in a single UPDATE message by including multiple prefixes in the NLRI
+ field of the UPDATE message.
+
+ Routes are stored in the Routing Information Bases (RIBs): namely,
+ the Adj-RIBs-In, the Loc-RIB, and the Adj-RIBs-Out, as described in
+ Section 3.2.
+
+ If a BGP speaker chooses to advertise a previously received route, it
+ MAY add to, or modify, the path attributes of the route before
+ advertising it to a peer.
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 9]
+
+RFC 4271 BGP-4 January 2006
+
+
+ BGP provides mechanisms by which a BGP speaker can inform its peers
+ that a previously advertised route is no longer available for use.
+ There are three methods by which a given BGP speaker can indicate
+ that a route has been withdrawn from service:
+
+ a) the IP prefix that expresses the destination for a previously
+ advertised route can be advertised in the WITHDRAWN ROUTES
+ field in the UPDATE message, thus marking the associated route
+ as being no longer available for use,
+
+ b) a replacement route with the same NLRI can be advertised, or
+
+ c) the BGP speaker connection can be closed, which implicitly
+ removes all routes the pair of speakers had advertised to each
+ other from service.
+
+ Changing the attribute(s) of a route is accomplished by advertising a
+ replacement route. The replacement route carries new (changed)
+ attributes and has the same address prefix as the original route.
+
+3.2. Routing Information Base
+
+ The Routing Information Base (RIB) within a BGP speaker consists of
+ three distinct parts:
+
+ a) Adj-RIBs-In: The Adj-RIBs-In stores routing information learned
+ from inbound UPDATE messages that were received from other BGP
+ speakers. Their contents represent routes that are available
+ as input to the Decision Process.
+
+ b) Loc-RIB: The Loc-RIB contains the local routing information the
+ BGP speaker selected by applying its local policies to the
+ routing information contained in its Adj-RIBs-In. These are
+ the routes that will be used by the local BGP speaker. The
+ next hop for each of these routes MUST be resolvable via the
+ local BGP speaker's Routing Table.
+
+ c) Adj-RIBs-Out: The Adj-RIBs-Out stores information the local BGP
+ speaker selected for advertisement to its peers. The routing
+ information stored in the Adj-RIBs-Out will be carried in the
+ local BGP speaker's UPDATE messages and advertised to its
+ peers.
+
+ In summary, the Adj-RIBs-In contains unprocessed routing information
+ that has been advertised to the local BGP speaker by its peers; the
+ Loc-RIB contains the routes that have been selected by the local BGP
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 10]
+
+RFC 4271 BGP-4 January 2006
+
+
+ speaker's Decision Process; and the Adj-RIBs-Out organizes the routes
+ for advertisement to specific peers (by means of the local speaker's
+ UPDATE messages).
+
+ Although the conceptual model distinguishes between Adj-RIBs-In,
+ Loc-RIB, and Adj-RIBs-Out, this neither implies nor requires that an
+ implementation must maintain three separate copies of the routing
+ information. The choice of implementation (for example, 3 copies of
+ the information vs 1 copy with pointers) is not constrained by the
+ protocol.
+
+ Routing information that the BGP speaker uses to forward packets (or
+ to construct the forwarding table used for packet forwarding) is
+ maintained in the Routing Table. The Routing Table accumulates
+ routes to directly connected networks, static routes, routes learned
+ from the IGP protocols, and routes learned from BGP. Whether a
+ specific BGP route should be installed in the Routing Table, and
+ whether a BGP route should override a route to the same destination
+ installed by another source, is a local policy decision, and is not
+ specified in this document. In addition to actual packet forwarding,
+ the Routing Table is used for resolution of the next-hop addresses
+ specified in BGP updates (see Section 5.1.3).
+
+4. Message Formats
+
+ This section describes message formats used by BGP.
+
+ BGP messages are sent over TCP connections. A message is processed
+ only after it is entirely received. The maximum message size is 4096
+ octets. All implementations are required to support this maximum
+ message size. The smallest message that may be sent consists of a
+ BGP header without a data portion (19 octets).
+
+ All multi-octet fields are in network byte order.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 11]
+
+RFC 4271 BGP-4 January 2006
+
+
+4.1. Message Header Format
+
+ Each message has a fixed-size header. There may or may not be a data
+ portion following the header, depending on the message type. The
+ layout of these fields is shown below:
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ + +
+ | |
+ + +
+ | Marker |
+ + +
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Length | Type |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Marker:
+
+ This 16-octet field is included for compatibility; it MUST be
+ set to all ones.
+
+ Length:
+
+ This 2-octet unsigned integer indicates the total length of the
+ message, including the header in octets. Thus, it allows one
+ to locate the (Marker field of the) next message in the TCP
+ stream. The value of the Length field MUST always be at least
+ 19 and no greater than 4096, and MAY be further constrained,
+ depending on the message type. "padding" of extra data after
+ the message is not allowed. Therefore, the Length field MUST
+ have the smallest value required, given the rest of the
+ message.
+
+ Type:
+
+ This 1-octet unsigned integer indicates the type code of the
+ message. This document defines the following type codes:
+
+ 1 - OPEN
+ 2 - UPDATE
+ 3 - NOTIFICATION
+ 4 - KEEPALIVE
+
+ [RFC2918] defines one more type code.
+
+
+
+Rekhter, et al. Standards Track [Page 12]
+
+RFC 4271 BGP-4 January 2006
+
+
+4.2. OPEN Message Format
+
+ After a TCP connection is established, the first message sent by each
+ side is an OPEN message. If the OPEN message is acceptable, a
+ KEEPALIVE message confirming the OPEN is sent back.
+
+ In addition to the fixed-size BGP header, the OPEN message contains
+ the following fields:
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+
+ | Version |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | My Autonomous System |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Hold Time |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | BGP Identifier |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Opt Parm Len |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |
+ | Optional Parameters (variable) |
+ | |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Version:
+
+ This 1-octet unsigned integer indicates the protocol version
+ number of the message. The current BGP version number is 4.
+
+ My Autonomous System:
+
+ This 2-octet unsigned integer indicates the Autonomous System
+ number of the sender.
+
+ Hold Time:
+
+ This 2-octet unsigned integer indicates the number of seconds
+ the sender proposes for the value of the Hold Timer. Upon
+ receipt of an OPEN message, a BGP speaker MUST calculate the
+ value of the Hold Timer by using the smaller of its configured
+ Hold Time and the Hold Time received in the OPEN message. The
+ Hold Time MUST be either zero or at least three seconds. An
+ implementation MAY reject connections on the basis of the Hold
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 13]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Time. The calculated value indicates the maximum number of
+ seconds that may elapse between the receipt of successive
+ KEEPALIVE and/or UPDATE messages from the sender.
+
+ BGP Identifier:
+
+ This 4-octet unsigned integer indicates the BGP Identifier of
+ the sender. A given BGP speaker sets the value of its BGP
+ Identifier to an IP address that is assigned to that BGP
+ speaker. The value of the BGP Identifier is determined upon
+ startup and is the same for every local interface and BGP peer.
+
+ Optional Parameters Length:
+
+ This 1-octet unsigned integer indicates the total length of the
+ Optional Parameters field in octets. If the value of this
+ field is zero, no Optional Parameters are present.
+
+ Optional Parameters:
+
+ This field contains a list of optional parameters, in which
+ each parameter is encoded as a <Parameter Type, Parameter
+ Length, Parameter Value> triplet.
+
+ 0 1
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
+ | Parm. Type | Parm. Length | Parameter Value (variable)
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-...
+
+ Parameter Type is a one octet field that unambiguously
+ identifies individual parameters. Parameter Length is a one
+ octet field that contains the length of the Parameter Value
+ field in octets. Parameter Value is a variable length field
+ that is interpreted according to the value of the Parameter
+ Type field.
+
+ [RFC3392] defines the Capabilities Optional Parameter.
+
+ The minimum length of the OPEN message is 29 octets (including the
+ message header).
+
+4.3. UPDATE Message Format
+
+ UPDATE messages are used to transfer routing information between BGP
+ peers. The information in the UPDATE message can be used to
+ construct a graph that describes the relationships of the various
+ Autonomous Systems. By applying rules to be discussed, routing
+
+
+
+Rekhter, et al. Standards Track [Page 14]
+
+RFC 4271 BGP-4 January 2006
+
+
+ information loops and some other anomalies may be detected and
+ removed from inter-AS routing.
+
+ An UPDATE message is used to advertise feasible routes that share
+ common path attributes to a peer, or to withdraw multiple unfeasible
+ routes from service (see 3.1). An UPDATE message MAY simultaneously
+ advertise a feasible route and withdraw multiple unfeasible routes
+ from service. The UPDATE message always includes the fixed-size BGP
+ header, and also includes the other fields, as shown below (note,
+ some of the shown fields may not be present in every UPDATE message):
+
+ +-----------------------------------------------------+
+ | Withdrawn Routes Length (2 octets) |
+ +-----------------------------------------------------+
+ | Withdrawn Routes (variable) |
+ +-----------------------------------------------------+
+ | Total Path Attribute Length (2 octets) |
+ +-----------------------------------------------------+
+ | Path Attributes (variable) |
+ +-----------------------------------------------------+
+ | Network Layer Reachability Information (variable) |
+ +-----------------------------------------------------+
+
+ Withdrawn Routes Length:
+
+ This 2-octets unsigned integer indicates the total length of
+ the Withdrawn Routes field in octets. Its value allows the
+ length of the Network Layer Reachability Information field to
+ be determined, as specified below.
+
+ A value of 0 indicates that no routes are being withdrawn from
+ service, and that the WITHDRAWN ROUTES field is not present in
+ this UPDATE message.
+
+ Withdrawn Routes:
+
+ This is a variable-length field that contains a list of IP
+ address prefixes for the routes that are being withdrawn from
+ service. Each IP address prefix is encoded as a 2-tuple of the
+ form <length, prefix>, whose fields are described below:
+
+ +---------------------------+
+ | Length (1 octet) |
+ +---------------------------+
+ | Prefix (variable) |
+ +---------------------------+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 15]
+
+RFC 4271 BGP-4 January 2006
+
+
+ The use and the meaning of these fields are as follows:
+
+ a) Length:
+
+ The Length field indicates the length in bits of the IP
+ address prefix. A length of zero indicates a prefix that
+ matches all IP addresses (with prefix, itself, of zero
+ octets).
+
+ b) Prefix:
+
+ The Prefix field contains an IP address prefix, followed by
+ the minimum number of trailing bits needed to make the end
+ of the field fall on an octet boundary. Note that the value
+ of trailing bits is irrelevant.
+
+ Total Path Attribute Length:
+
+ This 2-octet unsigned integer indicates the total length of the
+ Path Attributes field in octets. Its value allows the length
+ of the Network Layer Reachability field to be determined as
+ specified below.
+
+ A value of 0 indicates that neither the Network Layer
+ Reachability Information field nor the Path Attribute field is
+ present in this UPDATE message.
+
+ Path Attributes:
+
+ A variable-length sequence of path attributes is present in
+ every UPDATE message, except for an UPDATE message that carries
+ only the withdrawn routes. Each path attribute is a triple
+ <attribute type, attribute length, attribute value> of variable
+ length.
+
+ Attribute Type is a two-octet field that consists of the
+ Attribute Flags octet, followed by the Attribute Type Code
+ octet.
+
+ 0 1
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Attr. Flags |Attr. Type Code|
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ The high-order bit (bit 0) of the Attribute Flags octet is the
+ Optional bit. It defines whether the attribute is optional (if
+ set to 1) or well-known (if set to 0).
+
+
+
+Rekhter, et al. Standards Track [Page 16]
+
+RFC 4271 BGP-4 January 2006
+
+
+ The second high-order bit (bit 1) of the Attribute Flags octet
+ is the Transitive bit. It defines whether an optional
+ attribute is transitive (if set to 1) or non-transitive (if set
+ to 0).
+
+ For well-known attributes, the Transitive bit MUST be set to 1.
+ (See Section 5 for a discussion of transitive attributes.)
+
+ The third high-order bit (bit 2) of the Attribute Flags octet
+ is the Partial bit. It defines whether the information
+ contained in the optional transitive attribute is partial (if
+ set to 1) or complete (if set to 0). For well-known attributes
+ and for optional non-transitive attributes, the Partial bit
+ MUST be set to 0.
+
+ The fourth high-order bit (bit 3) of the Attribute Flags octet
+ is the Extended Length bit. It defines whether the Attribute
+ Length is one octet (if set to 0) or two octets (if set to 1).
+
+ The lower-order four bits of the Attribute Flags octet are
+ unused. They MUST be zero when sent and MUST be ignored when
+ received.
+
+ The Attribute Type Code octet contains the Attribute Type Code.
+ Currently defined Attribute Type Codes are discussed in Section
+ 5.
+
+ If the Extended Length bit of the Attribute Flags octet is set
+ to 0, the third octet of the Path Attribute contains the length
+ of the attribute data in octets.
+
+ If the Extended Length bit of the Attribute Flags octet is set
+ to 1, the third and fourth octets of the path attribute contain
+ the length of the attribute data in octets.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 17]
+
+RFC 4271 BGP-4 January 2006
+
+
+ The remaining octets of the Path Attribute represent the
+ attribute value and are interpreted according to the Attribute
+ Flags and the Attribute Type Code. The supported Attribute
+ Type Codes, and their attribute values and uses are as follows:
+
+ a) ORIGIN (Type Code 1):
+
+ ORIGIN is a well-known mandatory attribute that defines the
+ origin of the path information. The data octet can assume
+ the following values:
+
+ Value Meaning
+
+ 0 IGP - Network Layer Reachability Information
+ is interior to the originating AS
+
+ 1 EGP - Network Layer Reachability Information
+ learned via the EGP protocol [RFC904]
+
+ 2 INCOMPLETE - Network Layer Reachability
+ Information learned by some other means
+
+ Usage of this attribute is defined in 5.1.1.
+
+ b) AS_PATH (Type Code 2):
+
+ AS_PATH is a well-known mandatory attribute that is composed
+ of a sequence of AS path segments. Each AS path segment is
+ represented by a triple <path segment type, path segment
+ length, path segment value>.
+
+ The path segment type is a 1-octet length field with the
+ following values defined:
+
+ Value Segment Type
+
+ 1 AS_SET: unordered set of ASes a route in the
+ UPDATE message has traversed
+
+ 2 AS_SEQUENCE: ordered set of ASes a route in
+ the UPDATE message has traversed
+
+ The path segment length is a 1-octet length field,
+ containing the number of ASes (not the number of octets) in
+ the path segment value field.
+
+ The path segment value field contains one or more AS
+ numbers, each encoded as a 2-octet length field.
+
+
+
+Rekhter, et al. Standards Track [Page 18]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Usage of this attribute is defined in 5.1.2.
+
+ c) NEXT_HOP (Type Code 3):
+
+ This is a well-known mandatory attribute that defines the
+ (unicast) IP address of the router that SHOULD be used as
+ the next hop to the destinations listed in the Network Layer
+ Reachability Information field of the UPDATE message.
+
+ Usage of this attribute is defined in 5.1.3.
+
+ d) MULTI_EXIT_DISC (Type Code 4):
+
+ This is an optional non-transitive attribute that is a
+ four-octet unsigned integer. The value of this attribute
+ MAY be used by a BGP speaker's Decision Process to
+ discriminate among multiple entry points to a neighboring
+ autonomous system.
+
+ Usage of this attribute is defined in 5.1.4.
+
+ e) LOCAL_PREF (Type Code 5):
+
+ LOCAL_PREF is a well-known attribute that is a four-octet
+ unsigned integer. A BGP speaker uses it to inform its other
+ internal peers of the advertising speaker's degree of
+ preference for an advertised route.
+
+ Usage of this attribute is defined in 5.1.5.
+
+ f) ATOMIC_AGGREGATE (Type Code 6)
+
+ ATOMIC_AGGREGATE is a well-known discretionary attribute of
+ length 0.
+
+ Usage of this attribute is defined in 5.1.6.
+
+ g) AGGREGATOR (Type Code 7)
+
+ AGGREGATOR is an optional transitive attribute of length 6.
+ The attribute contains the last AS number that formed the
+ aggregate route (encoded as 2 octets), followed by the IP
+ address of the BGP speaker that formed the aggregate route
+ (encoded as 4 octets). This SHOULD be the same address as
+ the one used for the BGP Identifier of the speaker.
+
+ Usage of this attribute is defined in 5.1.7.
+
+
+
+
+Rekhter, et al. Standards Track [Page 19]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Network Layer Reachability Information:
+
+ This variable length field contains a list of IP address
+ prefixes. The length, in octets, of the Network Layer
+ Reachability Information is not encoded explicitly, but can be
+ calculated as:
+
+ UPDATE message Length - 23 - Total Path Attributes Length
+ - Withdrawn Routes Length
+
+ where UPDATE message Length is the value encoded in the fixed-
+ size BGP header, Total Path Attribute Length, and Withdrawn
+ Routes Length are the values encoded in the variable part of
+ the UPDATE message, and 23 is a combined length of the fixed-
+ size BGP header, the Total Path Attribute Length field, and the
+ Withdrawn Routes Length field.
+
+ Reachability information is encoded as one or more 2-tuples of
+ the form <length, prefix>, whose fields are described below:
+
+ +---------------------------+
+ | Length (1 octet) |
+ +---------------------------+
+ | Prefix (variable) |
+ +---------------------------+
+
+ The use and the meaning of these fields are as follows:
+
+ a) Length:
+
+ The Length field indicates the length in bits of the IP
+ address prefix. A length of zero indicates a prefix that
+ matches all IP addresses (with prefix, itself, of zero
+ octets).
+
+ b) Prefix:
+
+ The Prefix field contains an IP address prefix, followed by
+ enough trailing bits to make the end of the field fall on an
+ octet boundary. Note that the value of the trailing bits is
+ irrelevant.
+
+ The minimum length of the UPDATE message is 23 octets -- 19 octets
+ for the fixed header + 2 octets for the Withdrawn Routes Length + 2
+ octets for the Total Path Attribute Length (the value of Withdrawn
+ Routes Length is 0 and the value of Total Path Attribute Length is
+ 0).
+
+
+
+
+Rekhter, et al. Standards Track [Page 20]
+
+RFC 4271 BGP-4 January 2006
+
+
+ An UPDATE message can advertise, at most, one set of path attributes,
+ but multiple destinations, provided that the destinations share these
+ attributes. All path attributes contained in a given UPDATE message
+ apply to all destinations carried in the NLRI field of the UPDATE
+ message.
+
+
+ An UPDATE message can list multiple routes that are to be withdrawn
+ from service. Each such route is identified by its destination
+ (expressed as an IP prefix), which unambiguously identifies the route
+ in the context of the BGP speaker - BGP speaker connection to which
+ it has been previously advertised.
+
+
+ An UPDATE message might advertise only routes that are to be
+ withdrawn from service, in which case the message will not include
+ path attributes or Network Layer Reachability Information.
+ Conversely, it may advertise only a feasible route, in which case the
+ WITHDRAWN ROUTES field need not be present.
+
+ An UPDATE message SHOULD NOT include the same address prefix in the
+ WITHDRAWN ROUTES and Network Layer Reachability Information fields.
+ However, a BGP speaker MUST be able to process UPDATE messages in
+ this form. A BGP speaker SHOULD treat an UPDATE message of this form
+ as though the WITHDRAWN ROUTES do not contain the address prefix.
+
+4.4. KEEPALIVE Message Format
+
+ BGP does not use any TCP-based, keep-alive mechanism to determine if
+ peers are reachable. Instead, KEEPALIVE messages are exchanged
+ between peers often enough not to cause the Hold Timer to expire. A
+ reasonable maximum time between KEEPALIVE messages would be one third
+ of the Hold Time interval. KEEPALIVE messages MUST NOT be sent more
+ frequently than one per second. An implementation MAY adjust the
+ rate at which it sends KEEPALIVE messages as a function of the Hold
+ Time interval.
+
+ If the negotiated Hold Time interval is zero, then periodic KEEPALIVE
+ messages MUST NOT be sent.
+
+ A KEEPALIVE message consists of only the message header and has a
+ length of 19 octets.
+
+4.5. NOTIFICATION Message Format
+
+ A NOTIFICATION message is sent when an error condition is detected.
+ The BGP connection is closed immediately after it is sent.
+
+
+
+
+Rekhter, et al. Standards Track [Page 21]
+
+RFC 4271 BGP-4 January 2006
+
+
+ In addition to the fixed-size BGP header, the NOTIFICATION message
+ contains the following fields:
+
+ 0 1 2 3
+ 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | Error code | Error subcode | Data (variable) |
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+ Error Code:
+
+ This 1-octet unsigned integer indicates the type of
+ NOTIFICATION. The following Error Codes have been defined:
+
+ Error Code Symbolic Name Reference
+
+ 1 Message Header Error Section 6.1
+
+ 2 OPEN Message Error Section 6.2
+
+ 3 UPDATE Message Error Section 6.3
+
+ 4 Hold Timer Expired Section 6.5
+
+ 5 Finite State Machine Error Section 6.6
+
+ 6 Cease Section 6.7
+
+ Error subcode:
+
+ This 1-octet unsigned integer provides more specific
+ information about the nature of the reported error. Each Error
+ Code may have one or more Error Subcodes associated with it.
+ If no appropriate Error Subcode is defined, then a zero
+ (Unspecific) value is used for the Error Subcode field.
+
+ Message Header Error subcodes:
+
+ 1 - Connection Not Synchronized.
+ 2 - Bad Message Length.
+ 3 - Bad Message Type.
+
+
+
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 22]
+
+RFC 4271 BGP-4 January 2006
+
+
+ OPEN Message Error subcodes:
+
+ 1 - Unsupported Version Number.
+ 2 - Bad Peer AS.
+ 3 - Bad BGP Identifier.
+ 4 - Unsupported Optional Parameter.
+ 5 - [Deprecated - see Appendix A].
+ 6 - Unacceptable Hold Time.
+
+ UPDATE Message Error subcodes:
+
+ 1 - Malformed Attribute List.
+ 2 - Unrecognized Well-known Attribute.
+ 3 - Missing Well-known Attribute.
+ 4 - Attribute Flags Error.
+ 5 - Attribute Length Error.
+ 6 - Invalid ORIGIN Attribute.
+ 7 - [Deprecated - see Appendix A].
+ 8 - Invalid NEXT_HOP Attribute.
+ 9 - Optional Attribute Error.
+ 10 - Invalid Network Field.
+ 11 - Malformed AS_PATH.
+
+ Data:
+
+ This variable-length field is used to diagnose the reason for
+ the NOTIFICATION. The contents of the Data field depend upon
+ the Error Code and Error Subcode. See Section 6 for more
+ details.
+
+ Note that the length of the Data field can be determined from
+ the message Length field by the formula:
+
+ Message Length = 21 + Data Length
+
+ The minimum length of the NOTIFICATION message is 21 octets
+ (including message header).
+
+5. Path Attributes
+
+ This section discusses the path attributes of the UPDATE message.
+
+ Path attributes fall into four separate categories:
+
+ 1. Well-known mandatory.
+ 2. Well-known discretionary.
+ 3. Optional transitive.
+ 4. Optional non-transitive.
+
+
+
+Rekhter, et al. Standards Track [Page 23]
+
+RFC 4271 BGP-4 January 2006
+
+
+ BGP implementations MUST recognize all well-known attributes. Some
+ of these attributes are mandatory and MUST be included in every
+ UPDATE message that contains NLRI. Others are discretionary and MAY
+ or MAY NOT be sent in a particular UPDATE message.
+
+ Once a BGP peer has updated any well-known attributes, it MUST pass
+ these attributes to its peers in any updates it transmits.
+
+ In addition to well-known attributes, each path MAY contain one or
+ more optional attributes. It is not required or expected that all
+ BGP implementations support all optional attributes. The handling of
+ an unrecognized optional attribute is determined by the setting of
+ the Transitive bit in the attribute flags octet. Paths with
+ unrecognized transitive optional attributes SHOULD be accepted. If a
+ path with an unrecognized transitive optional attribute is accepted
+ and passed to other BGP peers, then the unrecognized transitive
+ optional attribute of that path MUST be passed, along with the path,
+ to other BGP peers with the Partial bit in the Attribute Flags octet
+ set to 1. If a path with a recognized, transitive optional attribute
+ is accepted and passed along to other BGP peers and the Partial bit
+ in the Attribute Flags octet is set to 1 by some previous AS, it MUST
+ NOT be set back to 0 by the current AS. Unrecognized non-transitive
+ optional attributes MUST be quietly ignored and not passed along to
+ other BGP peers.
+
+ New, transitive optional attributes MAY be attached to the path by
+ the originator or by any other BGP speaker in the path. If they are
+ not attached by the originator, the Partial bit in the Attribute
+ Flags octet is set to 1. The rules for attaching new non-transitive
+ optional attributes will depend on the nature of the specific
+ attribute. The documentation of each new non-transitive optional
+ attribute will be expected to include such rules (the description of
+ the MULTI_EXIT_DISC attribute gives an example). All optional
+ attributes (both transitive and non-transitive), MAY be updated (if
+ appropriate) by BGP speakers in the path.
+
+ The sender of an UPDATE message SHOULD order path attributes within
+ the UPDATE message in ascending order of attribute type. The
+ receiver of an UPDATE message MUST be prepared to handle path
+ attributes within UPDATE messages that are out of order.
+
+ The same attribute (attribute with the same type) cannot appear more
+ than once within the Path Attributes field of a particular UPDATE
+ message.
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 24]
+
+RFC 4271 BGP-4 January 2006
+
+
+ The mandatory category refers to an attribute that MUST be present in
+ both IBGP and EBGP exchanges if NLRI are contained in the UPDATE
+ message. Attributes classified as optional for the purpose of the
+ protocol extension mechanism may be purely discretionary,
+ discretionary, required, or disallowed in certain contexts.
+
+ attribute EBGP IBGP
+ ORIGIN mandatory mandatory
+ AS_PATH mandatory mandatory
+ NEXT_HOP mandatory mandatory
+ MULTI_EXIT_DISC discretionary discretionary
+ LOCAL_PREF see Section 5.1.5 required
+ ATOMIC_AGGREGATE see Section 5.1.6 and 9.1.4
+ AGGREGATOR discretionary discretionary
+
+5.1. Path Attribute Usage
+
+ The usage of each BGP path attribute is described in the following
+ clauses.
+
+5.1.1. ORIGIN
+
+ ORIGIN is a well-known mandatory attribute. The ORIGIN attribute is
+ generated by the speaker that originates the associated routing
+ information. Its value SHOULD NOT be changed by any other speaker.
+
+5.1.2. AS_PATH
+
+ AS_PATH is a well-known mandatory attribute. This attribute
+ identifies the autonomous systems through which routing information
+ carried in this UPDATE message has passed. The components of this
+ list can be AS_SETs or AS_SEQUENCEs.
+
+ When a BGP speaker propagates a route it learned from another BGP
+ speaker's UPDATE message, it modifies the route's AS_PATH attribute
+ based on the location of the BGP speaker to which the route will be
+ sent:
+
+ a) When a given BGP speaker advertises the route to an internal
+ peer, the advertising speaker SHALL NOT modify the AS_PATH
+ attribute associated with the route.
+
+ b) When a given BGP speaker advertises the route to an external
+ peer, the advertising speaker updates the AS_PATH attribute as
+ follows:
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 25]
+
+RFC 4271 BGP-4 January 2006
+
+
+ 1) if the first path segment of the AS_PATH is of type
+ AS_SEQUENCE, the local system prepends its own AS number as
+ the last element of the sequence (put it in the leftmost
+ position with respect to the position of octets in the
+ protocol message). If the act of prepending will cause an
+ overflow in the AS_PATH segment (i.e., more than 255 ASes),
+ it SHOULD prepend a new segment of type AS_SEQUENCE and
+ prepend its own AS number to this new segment.
+
+ 2) if the first path segment of the AS_PATH is of type AS_SET,
+ the local system prepends a new path segment of type
+ AS_SEQUENCE to the AS_PATH, including its own AS number in
+ that segment.
+
+ 3) if the AS_PATH is empty, the local system creates a path
+ segment of type AS_SEQUENCE, places its own AS into that
+ segment, and places that segment into the AS_PATH.
+
+ When a BGP speaker originates a route then:
+
+ a) the originating speaker includes its own AS number in a path
+ segment, of type AS_SEQUENCE, in the AS_PATH attribute of all
+ UPDATE messages sent to an external peer. In this case, the AS
+ number of the originating speaker's autonomous system will be
+ the only entry the path segment, and this path segment will be
+ the only segment in the AS_PATH attribute.
+
+ b) the originating speaker includes an empty AS_PATH attribute in
+ all UPDATE messages sent to internal peers. (An empty AS_PATH
+ attribute is one whose length field contains the value zero).
+
+ Whenever the modification of the AS_PATH attribute calls for
+ including or prepending the AS number of the local system, the local
+ system MAY include/prepend more than one instance of its own AS
+ number in the AS_PATH attribute. This is controlled via local
+ configuration.
+
+5.1.3. NEXT_HOP
+
+ The NEXT_HOP is a well-known mandatory attribute that defines the IP
+ address of the router that SHOULD be used as the next hop to the
+ destinations listed in the UPDATE message. The NEXT_HOP attribute is
+ calculated as follows:
+
+ 1) When sending a message to an internal peer, if the route is not
+ locally originated, the BGP speaker SHOULD NOT modify the
+ NEXT_HOP attribute unless it has been explicitly configured to
+ announce its own IP address as the NEXT_HOP. When announcing a
+
+
+
+Rekhter, et al. Standards Track [Page 26]
+
+RFC 4271 BGP-4 January 2006
+
+
+ locally-originated route to an internal peer, the BGP speaker
+ SHOULD use the interface address of the router through which
+ the announced network is reachable for the speaker as the
+ NEXT_HOP. If the route is directly connected to the speaker,
+ or if the interface address of the router through which the
+ announced network is reachable for the speaker is the internal
+ peer's address, then the BGP speaker SHOULD use its own IP
+ address for the NEXT_HOP attribute (the address of the
+ interface that is used to reach the peer).
+
+ 2) When sending a message to an external peer, X, and the peer is
+ one IP hop away from the speaker:
+
+ - If the route being announced was learned from an internal
+ peer or is locally originated, the BGP speaker can use an
+ interface address of the internal peer router (or the
+ internal router) through which the announced network is
+ reachable for the speaker for the NEXT_HOP attribute,
+ provided that peer X shares a common subnet with this
+ address. This is a form of "third party" NEXT_HOP attribute.
+
+ - Otherwise, if the route being announced was learned from an
+ external peer, the speaker can use an IP address of any
+ adjacent router (known from the received NEXT_HOP attribute)
+ that the speaker itself uses for local route calculation in
+ the NEXT_HOP attribute, provided that peer X shares a common
+ subnet with this address. This is a second form of "third
+ party" NEXT_HOP attribute.
+
+ - Otherwise, if the external peer to which the route is being
+ advertised shares a common subnet with one of the interfaces
+ of the announcing BGP speaker, the speaker MAY use the IP
+ address associated with such an interface in the NEXT_HOP
+ attribute. This is known as a "first party" NEXT_HOP
+ attribute.
+
+ - By default (if none of the above conditions apply), the BGP
+ speaker SHOULD use the IP address of the interface that the
+ speaker uses to establish the BGP connection to peer X in the
+ NEXT_HOP attribute.
+
+ 3) When sending a message to an external peer X, and the peer is
+ multiple IP hops away from the speaker (aka "multihop EBGP"):
+
+ - The speaker MAY be configured to propagate the NEXT_HOP
+ attribute. In this case, when advertising a route that the
+ speaker learned from one of its peers, the NEXT_HOP attribute
+ of the advertised route is exactly the same as the NEXT_HOP
+
+
+
+Rekhter, et al. Standards Track [Page 27]
+
+RFC 4271 BGP-4 January 2006
+
+
+ attribute of the learned route (the speaker does not modify
+ the NEXT_HOP attribute).
+
+ - By default, the BGP speaker SHOULD use the IP address of the
+ interface that the speaker uses in the NEXT_HOP attribute to
+ establish the BGP connection to peer X.
+
+ Normally, the NEXT_HOP attribute is chosen such that the shortest
+ available path will be taken. A BGP speaker MUST be able to support
+ the disabling advertisement of third party NEXT_HOP attributes in
+ order to handle imperfectly bridged media.
+
+ A route originated by a BGP speaker SHALL NOT be advertised to a peer
+ using an address of that peer as NEXT_HOP. A BGP speaker SHALL NOT
+ install a route with itself as the next hop.
+
+ The NEXT_HOP attribute is used by the BGP speaker to determine the
+ actual outbound interface and immediate next-hop address that SHOULD
+ be used to forward transit packets to the associated destinations.
+
+ The immediate next-hop address is determined by performing a
+ recursive route lookup operation for the IP address in the NEXT_HOP
+ attribute, using the contents of the Routing Table, selecting one
+ entry if multiple entries of equal cost exist. The Routing Table
+ entry that resolves the IP address in the NEXT_HOP attribute will
+ always specify the outbound interface. If the entry specifies an
+ attached subnet, but does not specify a next-hop address, then the
+ address in the NEXT_HOP attribute SHOULD be used as the immediate
+ next-hop address. If the entry also specifies the next-hop address,
+ this address SHOULD be used as the immediate next-hop address for
+ packet forwarding.
+
+5.1.4. MULTI_EXIT_DISC
+
+ The MULTI_EXIT_DISC is an optional non-transitive attribute that is
+ intended to be used on external (inter-AS) links to discriminate
+ among multiple exit or entry points to the same neighboring AS. The
+ value of the MULTI_EXIT_DISC attribute is a four-octet unsigned
+ number, called a metric. All other factors being equal, the exit
+ point with the lower metric SHOULD be preferred. If received over
+ EBGP, the MULTI_EXIT_DISC attribute MAY be propagated over IBGP to
+ other BGP speakers within the same AS (see also 9.1.2.2). The
+ MULTI_EXIT_DISC attribute received from a neighboring AS MUST NOT be
+ propagated to other neighboring ASes.
+
+ A BGP speaker MUST implement a mechanism (based on local
+ configuration) that allows the MULTI_EXIT_DISC attribute to be
+ removed from a route. If a BGP speaker is configured to remove the
+
+
+
+Rekhter, et al. Standards Track [Page 28]
+
+RFC 4271 BGP-4 January 2006
+
+
+ MULTI_EXIT_DISC attribute from a route, then this removal MUST be
+ done prior to determining the degree of preference of the route and
+ prior to performing route selection (Decision Process phases 1 and
+ 2).
+
+ An implementation MAY also (based on local configuration) alter the
+ value of the MULTI_EXIT_DISC attribute received over EBGP. If a BGP
+ speaker is configured to alter the value of the MULTI_EXIT_DISC
+ attribute received over EBGP, then altering the value MUST be done
+ prior to determining the degree of preference of the route and prior
+ to performing route selection (Decision Process phases 1 and 2). See
+ Section 9.1.2.2 for necessary restrictions on this.
+
+5.1.5. LOCAL_PREF
+
+ LOCAL_PREF is a well-known attribute that SHALL be included in all
+ UPDATE messages that a given BGP speaker sends to other internal
+ peers. A BGP speaker SHALL calculate the degree of preference for
+ each external route based on the locally-configured policy, and
+ include the degree of preference when advertising a route to its
+ internal peers. The higher degree of preference MUST be preferred.
+ A BGP speaker uses the degree of preference learned via LOCAL_PREF in
+ its Decision Process (see Section 9.1.1).
+
+ A BGP speaker MUST NOT include this attribute in UPDATE messages it
+ sends to external peers, except in the case of BGP Confederations
+ [RFC3065]. If it is contained in an UPDATE message that is received
+ from an external peer, then this attribute MUST be ignored by the
+ receiving speaker, except in the case of BGP Confederations
+ [RFC3065].
+
+5.1.6. ATOMIC_AGGREGATE
+
+ ATOMIC_AGGREGATE is a well-known discretionary attribute.
+
+ When a BGP speaker aggregates several routes for the purpose of
+ advertisement to a particular peer, the AS_PATH of the aggregated
+ route normally includes an AS_SET formed from the set of ASes from
+ which the aggregate was formed. In many cases, the network
+ administrator can determine if the aggregate can safely be advertised
+ without the AS_SET, and without forming route loops.
+
+ If an aggregate excludes at least some of the AS numbers present in
+ the AS_PATH of the routes that are aggregated as a result of dropping
+ the AS_SET, the aggregated route, when advertised to the peer, SHOULD
+ include the ATOMIC_AGGREGATE attribute.
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 29]
+
+RFC 4271 BGP-4 January 2006
+
+
+ A BGP speaker that receives a route with the ATOMIC_AGGREGATE
+ attribute SHOULD NOT remove the attribute when propagating the route
+ to other speakers.
+
+ A BGP speaker that receives a route with the ATOMIC_AGGREGATE
+ attribute MUST NOT make any NLRI of that route more specific (as
+ defined in 9.1.4) when advertising this route to other BGP speakers.
+
+ A BGP speaker that receives a route with the ATOMIC_AGGREGATE
+ attribute needs to be aware of the fact that the actual path to
+ destinations, as specified in the NLRI of the route, while having the
+ loop-free property, may not be the path specified in the AS_PATH
+ attribute of the route.
+
+5.1.7. AGGREGATOR
+
+ AGGREGATOR is an optional transitive attribute, which MAY be included
+ in updates that are formed by aggregation (see Section 9.2.2.2). A
+ BGP speaker that performs route aggregation MAY add the AGGREGATOR
+ attribute, which SHALL contain its own AS number and IP address. The
+ IP address SHOULD be the same as the BGP Identifier of the speaker.
+
+6. BGP Error Handling.
+
+ This section describes actions to be taken when errors are detected
+ while processing BGP messages.
+
+ When any of the conditions described here are detected, a
+ NOTIFICATION message, with the indicated Error Code, Error Subcode,
+ and Data fields, is sent, and the BGP connection is closed (unless it
+ is explicitly stated that no NOTIFICATION message is to be sent and
+ the BGP connection is not to be closed). If no Error Subcode is
+ specified, then a zero MUST be used.
+
+ The phrase "the BGP connection is closed" means the TCP connection
+ has been closed, the associated Adj-RIB-In has been cleared, and all
+ resources for that BGP connection have been deallocated. Entries in
+ the Loc-RIB associated with the remote peer are marked as invalid.
+ The local system recalculates its best routes for the destinations of
+ the routes marked as invalid. Before the invalid routes are deleted
+ from the system, it advertises, to its peers, either withdraws for
+ the routes marked as invalid, or the new best routes before the
+ invalid routes are deleted from the system.
+
+ Unless specified explicitly, the Data field of the NOTIFICATION
+ message that is sent to indicate an error is empty.
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 30]
+
+RFC 4271 BGP-4 January 2006
+
+
+6.1. Message Header Error Handling
+
+ All errors detected while processing the Message Header MUST be
+ indicated by sending the NOTIFICATION message with the Error Code
+ Message Header Error. The Error Subcode elaborates on the specific
+ nature of the error.
+
+ The expected value of the Marker field of the message header is all
+ ones. If the Marker field of the message header is not as expected,
+ then a synchronization error has occurred and the Error Subcode MUST
+ be set to Connection Not Synchronized.
+
+ If at least one of the following is true:
+
+ - if the Length field of the message header is less than 19 or
+ greater than 4096, or
+
+ - if the Length field of an OPEN message is less than the minimum
+ length of the OPEN message, or
+
+ - if the Length field of an UPDATE message is less than the
+ minimum length of the UPDATE message, or
+
+ - if the Length field of a KEEPALIVE message is not equal to 19,
+ or
+
+ - if the Length field of a NOTIFICATION message is less than the
+ minimum length of the NOTIFICATION message,
+
+ then the Error Subcode MUST be set to Bad Message Length. The Data
+ field MUST contain the erroneous Length field.
+
+ If the Type field of the message header is not recognized, then the
+ Error Subcode MUST be set to Bad Message Type. The Data field MUST
+ contain the erroneous Type field.
+
+6.2. OPEN Message Error Handling
+
+ All errors detected while processing the OPEN message MUST be
+ indicated by sending the NOTIFICATION message with the Error Code
+ OPEN Message Error. The Error Subcode elaborates on the specific
+ nature of the error.
+
+ If the version number in the Version field of the received OPEN
+ message is not supported, then the Error Subcode MUST be set to
+ Unsupported Version Number. The Data field is a 2-octet unsigned
+ integer, which indicates the largest, locally-supported version
+ number less than the version the remote BGP peer bid (as indicated in
+
+
+
+Rekhter, et al. Standards Track [Page 31]
+
+RFC 4271 BGP-4 January 2006
+
+
+ the received OPEN message), or if the smallest, locally-supported
+ version number is greater than the version the remote BGP peer bid,
+ then the smallest, locally-supported version number.
+
+ If the Autonomous System field of the OPEN message is unacceptable,
+ then the Error Subcode MUST be set to Bad Peer AS. The determination
+ of acceptable Autonomous System numbers is outside the scope of this
+ protocol.
+
+ If the Hold Time field of the OPEN message is unacceptable, then the
+ Error Subcode MUST be set to Unacceptable Hold Time. An
+ implementation MUST reject Hold Time values of one or two seconds.
+ An implementation MAY reject any proposed Hold Time. An
+ implementation that accepts a Hold Time MUST use the negotiated value
+ for the Hold Time.
+
+ If the BGP Identifier field of the OPEN message is syntactically
+ incorrect, then the Error Subcode MUST be set to Bad BGP Identifier.
+ Syntactic correctness means that the BGP Identifier field represents
+ a valid unicast IP host address.
+
+ If one of the Optional Parameters in the OPEN message is not
+ recognized, then the Error Subcode MUST be set to Unsupported
+ Optional Parameters.
+
+ If one of the Optional Parameters in the OPEN message is recognized,
+ but is malformed, then the Error Subcode MUST be set to 0
+ (Unspecific).
+
+6.3. UPDATE Message Error Handling
+
+ All errors detected while processing the UPDATE message MUST be
+ indicated by sending the NOTIFICATION message with the Error Code
+ UPDATE Message Error. The error subcode elaborates on the specific
+ nature of the error.
+
+ Error checking of an UPDATE message begins by examining the path
+ attributes. If the Withdrawn Routes Length or Total Attribute Length
+ is too large (i.e., if Withdrawn Routes Length + Total Attribute
+ Length + 23 exceeds the message Length), then the Error Subcode MUST
+ be set to Malformed Attribute List.
+
+ If any recognized attribute has Attribute Flags that conflict with
+ the Attribute Type Code, then the Error Subcode MUST be set to
+ Attribute Flags Error. The Data field MUST contain the erroneous
+ attribute (type, length, and value).
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 32]
+
+RFC 4271 BGP-4 January 2006
+
+
+ If any recognized attribute has an Attribute Length that conflicts
+ with the expected length (based on the attribute type code), then the
+ Error Subcode MUST be set to Attribute Length Error. The Data field
+ MUST contain the erroneous attribute (type, length, and value).
+
+ If any of the well-known mandatory attributes are not present, then
+ the Error Subcode MUST be set to Missing Well-known Attribute. The
+ Data field MUST contain the Attribute Type Code of the missing,
+ well-known attribute.
+
+ If any of the well-known mandatory attributes are not recognized,
+ then the Error Subcode MUST be set to Unrecognized Well-known
+ Attribute. The Data field MUST contain the unrecognized attribute
+ (type, length, and value).
+
+ If the ORIGIN attribute has an undefined value, then the Error Sub-
+ code MUST be set to Invalid Origin Attribute. The Data field MUST
+ contain the unrecognized attribute (type, length, and value).
+
+ If the NEXT_HOP attribute field is syntactically incorrect, then the
+ Error Subcode MUST be set to Invalid NEXT_HOP Attribute. The Data
+ field MUST contain the incorrect attribute (type, length, and value).
+ Syntactic correctness means that the NEXT_HOP attribute represents a
+ valid IP host address.
+
+ The IP address in the NEXT_HOP MUST meet the following criteria to be
+ considered semantically correct:
+
+ a) It MUST NOT be the IP address of the receiving speaker.
+
+ b) In the case of an EBGP, where the sender and receiver are one
+ IP hop away from each other, either the IP address in the
+ NEXT_HOP MUST be the sender's IP address that is used to
+ establish the BGP connection, or the interface associated with
+ the NEXT_HOP IP address MUST share a common subnet with the
+ receiving BGP speaker.
+
+ If the NEXT_HOP attribute is semantically incorrect, the error SHOULD
+ be logged, and the route SHOULD be ignored. In this case, a
+ NOTIFICATION message SHOULD NOT be sent, and the connection SHOULD
+ NOT be closed.
+
+ The AS_PATH attribute is checked for syntactic correctness. If the
+ path is syntactically incorrect, then the Error Subcode MUST be set
+ to Malformed AS_PATH.
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 33]
+
+RFC 4271 BGP-4 January 2006
+
+
+ If the UPDATE message is received from an external peer, the local
+ system MAY check whether the leftmost (with respect to the position
+ of octets in the protocol message) AS in the AS_PATH attribute is
+ equal to the autonomous system number of the peer that sent the
+ message. If the check determines this is not the case, the Error
+ Subcode MUST be set to Malformed AS_PATH.
+
+ If an optional attribute is recognized, then the value of this
+ attribute MUST be checked. If an error is detected, the attribute
+ MUST be discarded, and the Error Subcode MUST be set to Optional
+ Attribute Error. The Data field MUST contain the attribute (type,
+ length, and value).
+
+ If any attribute appears more than once in the UPDATE message, then
+ the Error Subcode MUST be set to Malformed Attribute List.
+
+ The NLRI field in the UPDATE message is checked for syntactic
+ validity. If the field is syntactically incorrect, then the Error
+ Subcode MUST be set to Invalid Network Field.
+
+ If a prefix in the NLRI field is semantically incorrect (e.g., an
+ unexpected multicast IP address), an error SHOULD be logged locally,
+ and the prefix SHOULD be ignored.
+
+ An UPDATE message that contains correct path attributes, but no NLRI,
+ SHALL be treated as a valid UPDATE message.
+
+6.4. NOTIFICATION Message Error Handling
+
+ If a peer sends a NOTIFICATION message, and the receiver of the
+ message detects an error in that message, the receiver cannot use a
+ NOTIFICATION message to report this error back to the peer. Any such
+ error (e.g., an unrecognized Error Code or Error Subcode) SHOULD be
+ noticed, logged locally, and brought to the attention of the
+ administration of the peer. The means to do this, however, lies
+ outside the scope of this document.
+
+6.5. Hold Timer Expired Error Handling
+
+ If a system does not receive successive KEEPALIVE, UPDATE, and/or
+ NOTIFICATION messages within the period specified in the Hold Time
+ field of the OPEN message, then the NOTIFICATION message with the
+ Hold Timer Expired Error Code is sent and the BGP connection is
+ closed.
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 34]
+
+RFC 4271 BGP-4 January 2006
+
+
+6.6. Finite State Machine Error Handling
+
+ Any error detected by the BGP Finite State Machine (e.g., receipt of
+ an unexpected event) is indicated by sending the NOTIFICATION message
+ with the Error Code Finite State Machine Error.
+
+6.7. Cease
+
+ In the absence of any fatal errors (that are indicated in this
+ section), a BGP peer MAY choose, at any given time, to close its BGP
+ connection by sending the NOTIFICATION message with the Error Code
+ Cease. However, the Cease NOTIFICATION message MUST NOT be used when
+ a fatal error indicated by this section does exist.
+
+ A BGP speaker MAY support the ability to impose a locally-configured,
+ upper bound on the number of address prefixes the speaker is willing
+ to accept from a neighbor. When the upper bound is reached, the
+ speaker, under control of local configuration, either (a) discards
+ new address prefixes from the neighbor (while maintaining the BGP
+ connection with the neighbor), or (b) terminates the BGP connection
+ with the neighbor. If the BGP speaker decides to terminate its BGP
+ connection with a neighbor because the number of address prefixes
+ received from the neighbor exceeds the locally-configured, upper
+ bound, then the speaker MUST send the neighbor a NOTIFICATION message
+ with the Error Code Cease. The speaker MAY also log this locally.
+
+6.8. BGP Connection Collision Detection
+
+ If a pair of BGP speakers try to establish a BGP connection with each
+ other simultaneously, then two parallel connections well be formed.
+ If the source IP address used by one of these connections is the same
+ as the destination IP address used by the other, and the destination
+ IP address used by the first connection is the same as the source IP
+ address used by the other, connection collision has occurred. In the
+ event of connection collision, one of the connections MUST be closed.
+
+ Based on the value of the BGP Identifier, a convention is established
+ for detecting which BGP connection is to be preserved when a
+ collision occurs. The convention is to compare the BGP Identifiers
+ of the peers involved in the collision and to retain only the
+ connection initiated by the BGP speaker with the higher-valued BGP
+ Identifier.
+
+ Upon receipt of an OPEN message, the local system MUST examine all of
+ its connections that are in the OpenConfirm state. A BGP speaker MAY
+ also examine connections in an OpenSent state if it knows the BGP
+ Identifier of the peer by means outside of the protocol. If, among
+ these connections, there is a connection to a remote BGP speaker
+
+
+
+Rekhter, et al. Standards Track [Page 35]
+
+RFC 4271 BGP-4 January 2006
+
+
+ whose BGP Identifier equals the one in the OPEN message, and this
+ connection collides with the connection over which the OPEN message
+ is received, then the local system performs the following collision
+ resolution procedure:
+
+ 1) The BGP Identifier of the local system is compared to the BGP
+ Identifier of the remote system (as specified in the OPEN
+ message). Comparing BGP Identifiers is done by converting them
+ to host byte order and treating them as 4-octet unsigned
+ integers.
+
+ 2) If the value of the local BGP Identifier is less than the
+ remote one, the local system closes the BGP connection that
+ already exists (the one that is already in the OpenConfirm
+ state), and accepts the BGP connection initiated by the remote
+ system.
+
+ 3) Otherwise, the local system closes the newly created BGP
+ connection (the one associated with the newly received OPEN
+ message), and continues to use the existing one (the one that
+ is already in the OpenConfirm state).
+
+ Unless allowed via configuration, a connection collision with an
+ existing BGP connection that is in the Established state causes
+ closing of the newly created connection.
+
+ Note that a connection collision cannot be detected with connections
+ that are in Idle, Connect, or Active states.
+
+ Closing the BGP connection (that results from the collision
+ resolution procedure) is accomplished by sending the NOTIFICATION
+ message with the Error Code Cease.
+
+7. BGP Version Negotiation
+
+ BGP speakers MAY negotiate the version of the protocol by making
+ multiple attempts at opening a BGP connection, starting with the
+ highest version number each BGP speaker supports. If an open attempt
+ fails with an Error Code, OPEN Message Error, and an Error Subcode,
+ Unsupported Version Number, then the BGP speaker has available the
+ version number it tried, the version number its peer tried, the
+ version number passed by its peer in the NOTIFICATION message, and
+ the version numbers it supports. If the two peers do support one or
+ more common versions, then this will allow them to rapidly determine
+ the highest common version. In order to support BGP version
+ negotiation, future versions of BGP MUST retain the format of the
+ OPEN and NOTIFICATION messages.
+
+
+
+
+Rekhter, et al. Standards Track [Page 36]
+
+RFC 4271 BGP-4 January 2006
+
+
+8. BGP Finite State Machine (FSM)
+
+ The data structures and FSM described in this document are conceptual
+ and do not have to be implemented precisely as described here, as
+ long as the implementations support the described functionality and
+ they exhibit the same externally visible behavior.
+
+ This section specifies the BGP operation in terms of a Finite State
+ Machine (FSM). The section falls into two parts:
+
+ 1) Description of Events for the State machine (Section 8.1)
+ 2) Description of the FSM (Section 8.2)
+
+ Session attributes required (mandatory) for each connection are:
+
+ 1) State
+ 2) ConnectRetryCounter
+ 3) ConnectRetryTimer
+ 4) ConnectRetryTime
+ 5) HoldTimer
+ 6) HoldTime
+ 7) KeepaliveTimer
+ 8) KeepaliveTime
+
+ The state session attribute indicates the current state of the BGP
+ FSM. The ConnectRetryCounter indicates the number of times a BGP
+ peer has tried to establish a peer session.
+
+ The mandatory attributes related to timers are described in Section
+ 10. Each timer has a "timer" and a "time" (the initial value).
+
+ The optional Session attributes are listed below. These optional
+ attributes may be supported, either per connection or per local
+ system:
+
+ 1) AcceptConnectionsUnconfiguredPeers
+ 2) AllowAutomaticStart
+ 3) AllowAutomaticStop
+ 4) CollisionDetectEstablishedState
+ 5) DampPeerOscillations
+ 6) DelayOpen
+ 7) DelayOpenTime
+ 8) DelayOpenTimer
+ 9) IdleHoldTime
+ 10) IdleHoldTimer
+ 11) PassiveTcpEstablishment
+ 12) SendNOTIFICATIONwithoutOPEN
+ 13) TrackTcpState
+
+
+
+Rekhter, et al. Standards Track [Page 37]
+
+RFC 4271 BGP-4 January 2006
+
+
+ The optional session attributes support different features of the BGP
+ functionality that have implications for the BGP FSM state
+ transitions. Two groups of the attributes which relate to timers
+ are:
+
+ group 1: DelayOpen, DelayOpenTime, DelayOpenTimer
+ group 2: DampPeerOscillations, IdleHoldTime, IdleHoldTimer
+
+ The first parameter (DelayOpen, DampPeerOscillations) is an optional
+ attribute that indicates that the Timer function is active. The
+ "Time" value specifies the initial value for the "Timer"
+ (DelayOpenTime, IdleHoldTime). The "Timer" specifies the actual
+ timer.
+
+ Please refer to Section 8.1.1 for an explanation of the interaction
+ between these optional attributes and the events signaled to the
+ state machine. Section 8.2.1.3 also provides a short overview of the
+ different types of optional attributes (flags or timers).
+
+8.1. Events for the BGP FSM
+
+8.1.1. Optional Events Linked to Optional Session Attributes
+
+ The Inputs to the BGP FSM are events. Events can either be mandatory
+ or optional. Some optional events are linked to optional session
+ attributes. Optional session attributes enable several groups of FSM
+ functionality.
+
+ The linkage between FSM functionality, events, and the optional
+ session attributes are described below.
+
+ Group 1: Automatic Administrative Events (Start/Stop)
+
+ Optional Session Attributes: AllowAutomaticStart,
+ AllowAutomaticStop,
+ DampPeerOscillations,
+ IdleHoldTime, IdleHoldTimer
+
+ Option 1: AllowAutomaticStart
+
+ Description: A BGP peer connection can be started and stopped
+ by administrative control. This administrative
+ control can either be manual, based on operator
+ intervention, or under the control of logic that
+ is specific to a BGP implementation. The term
+ "automatic" refers to a start being issued to the
+ BGP peer connection FSM when such logic determines
+ that the BGP peer connection should be restarted.
+
+
+
+Rekhter, et al. Standards Track [Page 38]
+
+RFC 4271 BGP-4 January 2006
+
+
+ The AllowAutomaticStart attribute specifies that
+ this BGP connection supports automatic starting of
+ the BGP connection.
+
+ If the BGP implementation supports
+ AllowAutomaticStart, the peer may be repeatedly
+ restarted. Three other options control the rate
+ at which the automatic restart occurs:
+ DampPeerOscillations, IdleHoldTime, and the
+ IdleHoldTimer.
+
+ The DampPeerOscillations option specifies that the
+ implementation engages additional logic to damp
+ the oscillations of BGP peers in the face of
+ sequences of automatic start and automatic stop.
+ IdleHoldTime specifies the length of time the BGP
+ peer is held in the Idle state prior to allowing
+ the next automatic restart. The IdleHoldTimer is
+ the timer that holds the peer in Idle state.
+
+ An example of DampPeerOscillations logic is an
+ increase of the IdleHoldTime value if a BGP peer
+ oscillates connectivity (connected/disconnected)
+ repeatedly within a time period. To engage this
+ logic, a peer could connect and disconnect 10
+ times within 5 minutes. The IdleHoldTime value
+ would be reset from 0 to 120 seconds.
+
+ Values: TRUE or FALSE
+
+ Option 2: AllowAutomaticStop
+
+ Description: This BGP peer session optional attribute indicates
+ that the BGP connection allows "automatic"
+ stopping of the BGP connection. An "automatic"
+ stop is defined as a stop under the control of
+ implementation-specific logic. The
+ implementation-specific logic is outside the scope
+ of this specification.
+
+ Values: TRUE or FALSE
+
+ Option 3: DampPeerOscillations
+
+ Description: The DampPeerOscillations optional session
+ attribute indicates that the BGP connection is
+ using logic that damps BGP peer oscillations in
+ the Idle State.
+
+
+
+Rekhter, et al. Standards Track [Page 39]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Value: TRUE or FALSE
+
+ Option 4: IdleHoldTime
+
+ Description: The IdleHoldTime is the value that is set in the
+ IdleHoldTimer.
+
+ Values: Time in seconds
+
+ Option 5: IdleHoldTimer
+
+ Description: The IdleHoldTimer aids in controlling BGP peer
+ oscillation. The IdleHoldTimer is used to keep
+ the BGP peer in Idle for a particular duration.
+ The IdleHoldTimer_Expires event is described in
+ Section 8.1.3.
+
+ Values: Time in seconds
+
+ Group 2: Unconfigured Peers
+
+ Optional Session Attributes: AcceptConnectionsUnconfiguredPeers
+
+ Option 1: AcceptConnectionsUnconfiguredPeers
+
+ Description: The BGP FSM optionally allows the acceptance of
+ BGP peer connections from neighbors that are not
+ pre-configured. The
+ "AcceptConnectionsUnconfiguredPeers" optional
+ session attribute allows the FSM to support the
+ state transitions that allow the implementation to
+ accept or reject these unconfigured peers.
+
+ The AcceptConnectionsUnconfiguredPeers has
+ security implications. Please refer to the BGP
+ Vulnerabilities document [RFC4272] for details.
+
+ Value: True or False
+
+ Group 3: TCP processing
+
+ Optional Session Attributes: PassiveTcpEstablishment,
+ TrackTcpState
+
+ Option 1: PassiveTcpEstablishment
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 40]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Description: This option indicates that the BGP FSM will
+ passively wait for the remote BGP peer to
+ establish the BGP TCP connection.
+
+ value: TRUE or FALSE
+
+ Option 2: TrackTcpState
+
+ Description: The BGP FSM normally tracks the end result of a
+ TCP connection attempt rather than individual TCP
+ messages. Optionally, the BGP FSM can support
+ additional interaction with the TCP connection
+ negotiation. The interaction with the TCP events
+ may increase the amount of logging the BGP peer
+ connection requires and the number of BGP FSM
+ changes.
+
+ Value: TRUE or FALSE
+
+ Group 4: BGP Message Processing
+
+ Optional Session Attributes: DelayOpen, DelayOpenTime,
+ DelayOpenTimer,
+ SendNOTIFICATIONwithoutOPEN,
+ CollisionDetectEstablishedState
+
+ Option 1: DelayOpen
+
+ Description: The DelayOpen optional session attribute allows
+ implementations to be configured to delay sending
+ an OPEN message for a specific time period
+ (DelayOpenTime). The delay allows the remote BGP
+ Peer time to send the first OPEN message.
+
+ Value: TRUE or FALSE
+
+ Option 2: DelayOpenTime
+
+ Description: The DelayOpenTime is the initial value set in the
+ DelayOpenTimer.
+
+ Value: Time in seconds
+
+ Option 3: DelayOpenTimer
+
+ Description: The DelayOpenTimer optional session attribute is
+ used to delay the sending of an OPEN message on a
+
+
+
+
+Rekhter, et al. Standards Track [Page 41]
+
+RFC 4271 BGP-4 January 2006
+
+
+ connection. The DelayOpenTimer_Expires event
+ (Event 12) is described in Section 8.1.3.
+
+ Value: Time in seconds
+
+ Option 4: SendNOTIFICATIONwithoutOPEN
+
+ Description: The SendNOTIFICATIONwithoutOPEN allows a peer to
+ send a NOTIFICATION without first sending an OPEN
+ message. Without this optional session attribute,
+ the BGP connection assumes that an OPEN message
+ must be sent by a peer prior to the peer sending a
+ NOTIFICATION message.
+
+ Value: True or False
+
+ Option 5: CollisionDetectEstablishedState
+
+ Description: Normally, a Detect Collision (see Section 6.8)
+ will be ignored in the Established state. This
+ optional session attribute indicates that this BGP
+ connection processes collisions in the Established
+ state.
+
+ Value: True or False
+
+ Note: The optional session attributes clarify the BGP FSM
+ description for existing features of BGP implementations.
+ The optional session attributes may be pre-defined for an
+ implementation and not readable via management interfaces
+ for existing correct implementations. As newer BGP MIBs
+ (version 2 and beyond) are supported, these fields will be
+ accessible via a management interface.
+
+8.1.2. Administrative Events
+
+ An administrative event is an event in which the operator interface
+ and BGP Policy engine signal the BGP-finite state machine to start or
+ stop the BGP state machine. The basic start and stop indications are
+ augmented by optional connection attributes that signal a certain
+ type of start or stop mechanism to the BGP FSM. An example of this
+ combination is Event 5, AutomaticStart_with_PassiveTcpEstablishment.
+ With this event, the BGP implementation signals to the BGP FSM that
+ the implementation is using an Automatic Start with the option to use
+ a Passive TCP Establishment. The Passive TCP establishment signals
+ that this BGP FSM will wait for the remote side to start the TCP
+ establishment.
+
+
+
+
+Rekhter, et al. Standards Track [Page 42]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Note that only Event 1 (ManualStart) and Event 2 (ManualStop) are
+ mandatory administrative events. All other administrative events are
+ optional (Events 3-8). Each event below has a name, definition,
+ status (mandatory or optional), and the optional session attributes
+ that SHOULD be set at each stage. When generating Event 1 through
+ Event 8 for the BGP FSM, the conditions specified in the "Optional
+ Attribute Status" section are verified. If any of these conditions
+ are not satisfied, then the local system should log an FSM error.
+
+ The settings of optional session attributes may be implicit in some
+ implementations, and therefore may not be set explicitly by an
+ external operator action. Section 8.2.1.5 describes these implicit
+ settings of the optional session attributes. The administrative
+ states described below may also be implicit in some implementations
+ and not directly configurable by an external operator.
+
+ Event 1: ManualStart
+
+ Definition: Local system administrator manually starts the peer
+ connection.
+
+ Status: Mandatory
+
+ Optional
+ Attribute
+ Status: The PassiveTcpEstablishment attribute SHOULD be set
+ to FALSE.
+
+ Event 2: ManualStop
+
+ Definition: Local system administrator manually stops the peer
+ connection.
+
+ Status: Mandatory
+
+ Optional
+ Attribute
+ Status: No interaction with any optional attributes.
+
+ Event 3: AutomaticStart
+
+ Definition: Local system automatically starts the BGP
+ connection.
+
+ Status: Optional, depending on local system
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 43]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Optional
+ Attribute
+ Status: 1) The AllowAutomaticStart attribute SHOULD be set
+ to TRUE if this event occurs.
+ 2) If the PassiveTcpEstablishment optional session
+ attribute is supported, it SHOULD be set to
+ FALSE.
+ 3) If the DampPeerOscillations is supported, it
+ SHOULD be set to FALSE when this event occurs.
+
+ Event 4: ManualStart_with_PassiveTcpEstablishment
+
+ Definition: Local system administrator manually starts the peer
+ connection, but has PassiveTcpEstablishment
+ enabled. The PassiveTcpEstablishment optional
+ attribute indicates that the peer will listen prior
+ to establishing the connection.
+
+ Status: Optional, depending on local system
+
+ Optional
+ Attribute
+ Status: 1) The PassiveTcpEstablishment attribute SHOULD be
+ set to TRUE if this event occurs.
+ 2) The DampPeerOscillations attribute SHOULD be set
+ to FALSE when this event occurs.
+
+ Event 5: AutomaticStart_with_PassiveTcpEstablishment
+
+ Definition: Local system automatically starts the BGP
+ connection with the PassiveTcpEstablishment
+ enabled. The PassiveTcpEstablishment optional
+ attribute indicates that the peer will listen prior
+ to establishing a connection.
+
+ Status: Optional, depending on local system
+
+ Optional
+ Attribute
+ Status: 1) The AllowAutomaticStart attribute SHOULD be set
+ to TRUE.
+ 2) The PassiveTcpEstablishment attribute SHOULD be
+ set to TRUE.
+ 3) If the DampPeerOscillations attribute is
+ supported, the DampPeerOscillations SHOULD be
+ set to FALSE.
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 44]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Event 6: AutomaticStart_with_DampPeerOscillations
+
+ Definition: Local system automatically starts the BGP peer
+ connection with peer oscillation damping enabled.
+ The exact method of damping persistent peer
+ oscillations is determined by the implementation
+ and is outside the scope of this document.
+
+ Status: Optional, depending on local system.
+
+ Optional
+ Attribute
+ Status: 1) The AllowAutomaticStart attribute SHOULD be set
+ to TRUE.
+ 2) The DampPeerOscillations attribute SHOULD be set
+ to TRUE.
+ 3) The PassiveTcpEstablishment attribute SHOULD be
+ set to FALSE.
+
+ Event 7: AutomaticStart_with_DampPeerOscillations_and_
+ PassiveTcpEstablishment
+
+ Definition: Local system automatically starts the BGP peer
+ connection with peer oscillation damping enabled
+ and PassiveTcpEstablishment enabled. The exact
+ method of damping persistent peer oscillations is
+ determined by the implementation and is outside the
+ scope of this document.
+
+ Status: Optional, depending on local system
+
+ Optional
+ Attributes
+ Status: 1) The AllowAutomaticStart attribute SHOULD be set
+ to TRUE.
+ 2) The DampPeerOscillations attribute SHOULD be set
+ to TRUE.
+ 3) The PassiveTcpEstablishment attribute SHOULD be
+ set to TRUE.
+
+ Event 8: AutomaticStop
+
+ Definition: Local system automatically stops the BGP
+ connection.
+
+ An example of an automatic stop event is exceeding
+ the number of prefixes for a given peer and the
+ local system automatically disconnecting the peer.
+
+
+
+Rekhter, et al. Standards Track [Page 45]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Status: Optional, depending on local system
+
+ Optional
+ Attribute
+ Status: 1) The AllowAutomaticStop attribute SHOULD be TRUE.
+
+8.1.3. Timer Events
+
+ Event 9: ConnectRetryTimer_Expires
+
+ Definition: An event generated when the ConnectRetryTimer
+ expires.
+
+ Status: Mandatory
+
+ Event 10: HoldTimer_Expires
+
+ Definition: An event generated when the HoldTimer expires.
+
+ Status: Mandatory
+
+ Event 11: KeepaliveTimer_Expires
+
+ Definition: An event generated when the KeepaliveTimer expires.
+
+ Status: Mandatory
+
+ Event 12: DelayOpenTimer_Expires
+
+ Definition: An event generated when the DelayOpenTimer expires.
+
+ Status: Optional
+
+ Optional
+ Attribute
+ Status: If this event occurs,
+ 1) DelayOpen attribute SHOULD be set to TRUE,
+ 2) DelayOpenTime attribute SHOULD be supported,
+ 3) DelayOpenTimer SHOULD be supported.
+
+ Event 13: IdleHoldTimer_Expires
+
+ Definition: An event generated when the IdleHoldTimer expires,
+ indicating that the BGP connection has completed
+ waiting for the back-off period to prevent BGP peer
+ oscillation.
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 46]
+
+RFC 4271 BGP-4 January 2006
+
+
+ The IdleHoldTimer is only used when the persistent
+ peer oscillation damping function is enabled by
+ setting the DampPeerOscillations optional attribute
+ to TRUE.
+
+ Implementations not implementing the persistent
+ peer oscillation damping function may not have the
+ IdleHoldTimer.
+
+ Status: Optional
+
+ Optional
+ Attribute
+ Status: If this event occurs:
+ 1) DampPeerOscillations attribute SHOULD be set to
+ TRUE.
+ 2) IdleHoldTimer SHOULD have just expired.
+
+8.1.4. TCP Connection-Based Events
+
+ Event 14: TcpConnection_Valid
+
+ Definition: Event indicating the local system reception of a
+ TCP connection request with a valid source IP
+ address, TCP port, destination IP address, and TCP
+ Port. The definition of invalid source and invalid
+ destination IP address is determined by the
+ implementation.
+
+ BGP's destination port SHOULD be port 179, as
+ defined by IANA.
+
+ TCP connection request is denoted by the local
+ system receiving a TCP SYN.
+
+ Status: Optional
+
+ Optional
+ Attribute
+ Status: 1) The TrackTcpState attribute SHOULD be set to
+ TRUE if this event occurs.
+
+ Event 15: Tcp_CR_Invalid
+
+ Definition: Event indicating the local system reception of a
+ TCP connection request with either an invalid
+ source address or port number, or an invalid
+ destination address or port number.
+
+
+
+Rekhter, et al. Standards Track [Page 47]
+
+RFC 4271 BGP-4 January 2006
+
+
+ BGP destination port number SHOULD be 179, as
+ defined by IANA.
+
+ A TCP connection request occurs when the local
+ system receives a TCP SYN.
+
+ Status: Optional
+
+ Optional
+ Attribute
+ Status: 1) The TrackTcpState attribute should be set to
+ TRUE if this event occurs.
+
+ Event 16: Tcp_CR_Acked
+
+ Definition: Event indicating the local system's request to
+ establish a TCP connection to the remote peer.
+
+ The local system's TCP connection sent a TCP SYN,
+ received a TCP SYN/ACK message, and sent a TCP ACK.
+
+ Status: Mandatory
+
+ Event 17: TcpConnectionConfirmed
+
+ Definition: Event indicating that the local system has received
+ a confirmation that the TCP connection has been
+ established by the remote site.
+
+ The remote peer's TCP engine sent a TCP SYN. The
+ local peer's TCP engine sent a SYN, ACK message and
+ now has received a final ACK.
+
+ Status: Mandatory
+
+ Event 18: TcpConnectionFails
+
+ Definition: Event indicating that the local system has received
+ a TCP connection failure notice.
+
+ The remote BGP peer's TCP machine could have sent a
+ FIN. The local peer would respond with a FIN-ACK.
+ Another possibility is that the local peer
+ indicated a timeout in the TCP connection and
+ downed the connection.
+
+ Status: Mandatory
+
+
+
+
+Rekhter, et al. Standards Track [Page 48]
+
+RFC 4271 BGP-4 January 2006
+
+
+8.1.5. BGP Message-Based Events
+
+ Event 19: BGPOpen
+
+ Definition: An event is generated when a valid OPEN message has
+ been received.
+
+ Status: Mandatory
+
+ Optional
+ Attribute
+ Status: 1) The DelayOpen optional attribute SHOULD be set
+ to FALSE.
+ 2) The DelayOpenTimer SHOULD not be running.
+
+ Event 20: BGPOpen with DelayOpenTimer running
+
+ Definition: An event is generated when a valid OPEN message has
+ been received for a peer that has a successfully
+ established transport connection and is currently
+ delaying the sending of a BGP open message.
+
+ Status: Optional
+
+ Optional
+ Attribute
+ Status: 1) The DelayOpen attribute SHOULD be set to TRUE.
+ 2) The DelayOpenTimer SHOULD be running.
+
+ Event 21: BGPHeaderErr
+
+ Definition: An event is generated when a received BGP message
+ header is not valid.
+
+ Status: Mandatory
+
+ Event 22: BGPOpenMsgErr
+
+ Definition: An event is generated when an OPEN message has been
+ received with errors.
+
+ Status: Mandatory
+
+ Event 23: OpenCollisionDump
+
+ Definition: An event generated administratively when a
+ connection collision has been detected while
+ processing an incoming OPEN message and this
+
+
+
+Rekhter, et al. Standards Track [Page 49]
+
+RFC 4271 BGP-4 January 2006
+
+
+ connection has been selected to be disconnected.
+ See Section 6.8 for more information on collision
+ detection.
+
+ Event 23 is an administrative action generated by
+ implementation logic that determines whether this
+ connection needs to be dropped per the rules in
+ Section 6.8. This event may occur if the FSM is
+ implemented as two linked state machines.
+
+ Status: Optional
+
+ Optional
+ Attribute
+ Status: If the state machine is to process this event in
+ the Established state,
+ 1) CollisionDetectEstablishedState optional
+ attribute SHOULD be set to TRUE.
+
+ Please note: The OpenCollisionDump event can occur
+ in Idle, Connect, Active, OpenSent, and OpenConfirm
+ without any optional attributes being set.
+
+ Event 24: NotifMsgVerErr
+
+ Definition: An event is generated when a NOTIFICATION message
+ with "version error" is received.
+
+ Status: Mandatory
+
+ Event 25: NotifMsg
+
+ Definition: An event is generated when a NOTIFICATION message
+ is received and the error code is anything but
+ "version error".
+
+ Status: Mandatory
+
+ Event 26: KeepAliveMsg
+
+ Definition: An event is generated when a KEEPALIVE message is
+ received.
+
+ Status: Mandatory
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 50]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Event 27: UpdateMsg
+
+ Definition: An event is generated when a valid UPDATE message
+ is received.
+
+ Status: Mandatory
+
+ Event 28: UpdateMsgErr
+
+ Definition: An event is generated when an invalid UPDATE
+ message is received.
+
+ Status: Mandatory
+
+8.2. Description of FSM
+
+8.2.1. FSM Definition
+
+ BGP MUST maintain a separate FSM for each configured peer. Each BGP
+ peer paired in a potential connection will attempt to connect to the
+ other, unless configured to remain in the idle state, or configured
+ to remain passive. For the purpose of this discussion, the active or
+ connecting side of the TCP connection (the side of a TCP connection
+ sending the first TCP SYN packet) is called outgoing. The passive or
+ listening side (the sender of the first SYN/ACK) is called an
+ incoming connection. (See Section 8.2.1.1 for information on the
+ terms active and passive used below.)
+
+ A BGP implementation MUST connect to and listen on TCP port 179 for
+ incoming connections in addition to trying to connect to peers. For
+ each incoming connection, a state machine MUST be instantiated.
+ There exists a period in which the identity of the peer on the other
+ end of an incoming connection is known, but the BGP identifier is not
+ known. During this time, both an incoming and outgoing connection
+ may exist for the same configured peering. This is referred to as a
+ connection collision (see Section 6.8).
+
+ A BGP implementation will have, at most, one FSM for each configured
+ peering, plus one FSM for each incoming TCP connection for which the
+ peer has not yet been identified. Each FSM corresponds to exactly
+ one TCP connection.
+
+ There may be more than one connection between a pair of peers if the
+ connections are configured to use a different pair of IP addresses.
+ This is referred to as multiple "configured peerings" to the same
+ peer.
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 51]
+
+RFC 4271 BGP-4 January 2006
+
+
+8.2.1.1. Terms "active" and "passive"
+
+ The terms active and passive have been in the Internet operator's
+ vocabulary for almost a decade and have proven useful. The words
+ active and passive have slightly different meanings when applied to a
+ TCP connection or a peer. There is only one active side and one
+ passive side to any one TCP connection, per the definition above and
+ the state machine below. When a BGP speaker is configured as active,
+ it may end up on either the active or passive side of the connection
+ that eventually gets established. Once the TCP connection is
+ completed, it doesn't matter which end was active and which was
+ passive. The only difference is in which side of the TCP connection
+ has port number 179.
+
+8.2.1.2. FSM and Collision Detection
+
+ There is one FSM per BGP connection. When the connection collision
+ occurs prior to determining what peer a connection is associated
+ with, there may be two connections for one peer. After the
+ connection collision is resolved (see Section 6.8), the FSM for the
+ connection that is closed SHOULD be disposed.
+
+8.2.1.3. FSM and Optional Session Attributes
+
+ Optional Session Attributes specify either attributes that act as
+ flags (TRUE or FALSE) or optional timers. For optional attributes
+ that act as flags, if the optional session attribute can be set to
+ TRUE on the system, the corresponding BGP FSM actions must be
+ supported. For example, if the following options can be set in a BGP
+ implementation: AutoStart and PassiveTcpEstablishment, then Events 3,
+ 4 and 5 must be supported. If an Optional Session attribute cannot
+ be set to TRUE, the events supporting that set of options do not have
+ to be supported.
+
+ Each of the optional timers (DelayOpenTimer and IdleHoldTimer) has a
+ group of attributes that are:
+
+ - flag indicating support,
+ - Time set in Timer
+ - Timer.
+
+ The two optional timers show this format:
+
+ DelayOpenTimer: DelayOpen, DelayOpenTime, DelayOpenTimer
+ IdleHoldTimer: DampPeerOscillations, IdleHoldTime,
+ IdleHoldTimer
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 52]
+
+RFC 4271 BGP-4 January 2006
+
+
+ If the flag indicating support for an optional timer (DelayOpen or
+ DampPeerOscillations) cannot be set to TRUE, the timers and events
+ supporting that option do not have to be supported.
+
+8.2.1.4. FSM Event Numbers
+
+ The Event numbers (1-28) utilized in this state machine description
+ aid in specifying the behavior of the BGP state machine.
+ Implementations MAY use these numbers to provide network management
+ information. The exact form of an FSM or the FSM events are specific
+ to each implementation.
+
+8.2.1.5. FSM Actions that are Implementation Dependent
+
+ At certain points, the BGP FSM specifies that BGP initialization will
+ occur or that BGP resources will be deleted. The initialization of
+ the BGP FSM and the associated resources depend on the policy portion
+ of the BGP implementation. The details of these actions are outside
+ the scope of the FSM document.
+
+8.2.2. Finite State Machine
+
+ Idle state:
+
+ Initially, the BGP peer FSM is in the Idle state. Hereafter, the
+ BGP peer FSM will be shortened to BGP FSM.
+
+ In this state, BGP FSM refuses all incoming BGP connections for
+ this peer. No resources are allocated to the peer. In response
+ to a ManualStart event (Event 1) or an AutomaticStart event (Event
+ 3), the local system:
+
+ - initializes all BGP resources for the peer connection,
+
+ - sets ConnectRetryCounter to zero,
+
+ - starts the ConnectRetryTimer with the initial value,
+
+ - initiates a TCP connection to the other BGP peer,
+
+ - listens for a connection that may be initiated by the remote
+ BGP peer, and
+
+ - changes its state to Connect.
+
+ The ManualStop event (Event 2) and AutomaticStop (Event 8) event
+ are ignored in the Idle state.
+
+
+
+
+Rekhter, et al. Standards Track [Page 53]
+
+RFC 4271 BGP-4 January 2006
+
+
+ In response to a ManualStart_with_PassiveTcpEstablishment event
+ (Event 4) or AutomaticStart_with_PassiveTcpEstablishment event
+ (Event 5), the local system:
+
+ - initializes all BGP resources,
+
+ - sets the ConnectRetryCounter to zero,
+
+ - starts the ConnectRetryTimer with the initial value,
+
+ - listens for a connection that may be initiated by the remote
+ peer, and
+
+ - changes its state to Active.
+
+ The exact value of the ConnectRetryTimer is a local matter, but it
+ SHOULD be sufficiently large to allow TCP initialization.
+
+ If the DampPeerOscillations attribute is set to TRUE, the
+ following three additional events may occur within the Idle state:
+
+ - AutomaticStart_with_DampPeerOscillations (Event 6),
+
+ - AutomaticStart_with_DampPeerOscillations_and_
+ PassiveTcpEstablishment (Event 7),
+
+ - IdleHoldTimer_Expires (Event 13).
+
+ Upon receiving these 3 events, the local system will use these
+ events to prevent peer oscillations. The method of preventing
+ persistent peer oscillation is outside the scope of this document.
+
+ Any other event (Events 9-12, 15-28) received in the Idle state
+ does not cause change in the state of the local system.
+
+ Connect State:
+
+ In this state, BGP FSM is waiting for the TCP connection to be
+ completed.
+
+ The start events (Events 1, 3-7) are ignored in the Connect state.
+
+ In response to a ManualStop event (Event 2), the local system:
+
+ - drops the TCP connection,
+
+ - releases all BGP resources,
+
+
+
+
+Rekhter, et al. Standards Track [Page 54]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - sets ConnectRetryCounter to zero,
+
+ - stops the ConnectRetryTimer and sets ConnectRetryTimer to
+ zero, and
+
+ - changes its state to Idle.
+
+ In response to the ConnectRetryTimer_Expires event (Event 9), the
+ local system:
+
+ - drops the TCP connection,
+
+ - restarts the ConnectRetryTimer,
+
+ - stops the DelayOpenTimer and resets the timer to zero,
+
+ - initiates a TCP connection to the other BGP peer,
+
+ - continues to listen for a connection that may be initiated by
+ the remote BGP peer, and
+
+ - stays in the Connect state.
+
+ If the DelayOpenTimer_Expires event (Event 12) occurs in the
+ Connect state, the local system:
+
+ - sends an OPEN message to its peer,
+
+ - sets the HoldTimer to a large value, and
+
+ - changes its state to OpenSent.
+
+ If the BGP FSM receives a TcpConnection_Valid event (Event 14),
+ the TCP connection is processed, and the connection remains in the
+ Connect state.
+
+ If the BGP FSM receives a Tcp_CR_Invalid event (Event 15), the
+ local system rejects the TCP connection, and the connection
+ remains in the Connect state.
+
+ If the TCP connection succeeds (Event 16 or Event 17), the local
+ system checks the DelayOpen attribute prior to processing. If the
+ DelayOpen attribute is set to TRUE, the local system:
+
+ - stops the ConnectRetryTimer (if running) and sets the
+ ConnectRetryTimer to zero,
+
+ - sets the DelayOpenTimer to the initial value, and
+
+
+
+Rekhter, et al. Standards Track [Page 55]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - stays in the Connect state.
+
+ If the DelayOpen attribute is set to FALSE, the local system:
+
+ - stops the ConnectRetryTimer (if running) and sets the
+ ConnectRetryTimer to zero,
+
+ - completes BGP initialization
+
+ - sends an OPEN message to its peer,
+
+ - sets the HoldTimer to a large value, and
+
+ - changes its state to OpenSent.
+
+ A HoldTimer value of 4 minutes is suggested.
+
+ If the TCP connection fails (Event 18), the local system checks
+ the DelayOpenTimer. If the DelayOpenTimer is running, the local
+ system:
+
+ - restarts the ConnectRetryTimer with the initial value,
+
+ - stops the DelayOpenTimer and resets its value to zero,
+
+ - continues to listen for a connection that may be initiated by
+ the remote BGP peer, and
+
+ - changes its state to Active.
+
+ If the DelayOpenTimer is not running, the local system:
+
+ - stops the ConnectRetryTimer to zero,
+
+ - drops the TCP connection,
+
+ - releases all BGP resources, and
+
+ - changes its state to Idle.
+
+ If an OPEN message is received while the DelayOpenTimer is running
+ (Event 20), the local system:
+
+ - stops the ConnectRetryTimer (if running) and sets the
+ ConnectRetryTimer to zero,
+
+ - completes the BGP initialization,
+
+
+
+
+Rekhter, et al. Standards Track [Page 56]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - stops and clears the DelayOpenTimer (sets the value to zero),
+
+ - sends an OPEN message,
+
+ - sends a KEEPALIVE message,
+
+ - if the HoldTimer initial value is non-zero,
+
+ - starts the KeepaliveTimer with the initial value and
+
+ - resets the HoldTimer to the negotiated value,
+
+ else, if the HoldTimer initial value is zero,
+
+ - resets the KeepaliveTimer and
+
+ - resets the HoldTimer value to zero,
+
+ - and changes its state to OpenConfirm.
+
+ If the value of the autonomous system field is the same as the
+ local Autonomous System number, set the connection status to an
+ internal connection; otherwise it will be "external".
+
+ If BGP message header checking (Event 21) or OPEN message checking
+ detects an error (Event 22) (see Section 6.2), the local system:
+
+ - (optionally) If the SendNOTIFICATIONwithoutOPEN attribute is
+ set to TRUE, then the local system first sends a NOTIFICATION
+ message with the appropriate error code, and then
+
+ - stops the ConnectRetryTimer (if running) and sets the
+ ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ If a NOTIFICATION message is received with a version error (Event
+ 24), the local system checks the DelayOpenTimer. If the
+ DelayOpenTimer is running, the local system:
+
+
+
+Rekhter, et al. Standards Track [Page 57]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - stops the ConnectRetryTimer (if running) and sets the
+ ConnectRetryTimer to zero,
+
+ - stops and resets the DelayOpenTimer (sets to zero),
+
+ - releases all BGP resources,
+
+ - drops the TCP connection, and
+
+ - changes its state to Idle.
+
+ If the DelayOpenTimer is not running, the local system:
+
+ - stops the ConnectRetryTimer and sets the ConnectRetryTimer to
+ zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - performs peer oscillation damping if the DampPeerOscillations
+ attribute is set to True, and
+
+ - changes its state to Idle.
+
+ In response to any other events (Events 8, 10-11, 13, 19, 23,
+ 25-28), the local system:
+
+ - if the ConnectRetryTimer is running, stops and resets the
+ ConnectRetryTimer (sets to zero),
+
+ - if the DelayOpenTimer is running, stops and resets the
+ DelayOpenTimer (sets to zero),
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - performs peer oscillation damping if the DampPeerOscillations
+ attribute is set to True, and
+
+ - changes its state to Idle.
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 58]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Active State:
+
+ In this state, BGP FSM is trying to acquire a peer by listening
+ for, and accepting, a TCP connection.
+
+ The start events (Events 1, 3-7) are ignored in the Active state.
+
+ In response to a ManualStop event (Event 2), the local system:
+
+ - If the DelayOpenTimer is running and the
+ SendNOTIFICATIONwithoutOPEN session attribute is set, the
+ local system sends a NOTIFICATION with a Cease,
+
+ - releases all BGP resources including stopping the
+ DelayOpenTimer
+
+ - drops the TCP connection,
+
+ - sets ConnectRetryCounter to zero,
+
+ - stops the ConnectRetryTimer and sets the ConnectRetryTimer to
+ zero, and
+
+ - changes its state to Idle.
+
+ In response to a ConnectRetryTimer_Expires event (Event 9), the
+ local system:
+
+ - restarts the ConnectRetryTimer (with initial value),
+
+ - initiates a TCP connection to the other BGP peer,
+
+ - continues to listen for a TCP connection that may be initiated
+ by a remote BGP peer, and
+
+ - changes its state to Connect.
+
+ If the local system receives a DelayOpenTimer_Expires event (Event
+ 12), the local system:
+
+ - sets the ConnectRetryTimer to zero,
+
+ - stops and clears the DelayOpenTimer (set to zero),
+
+ - completes the BGP initialization,
+
+ - sends the OPEN message to its remote peer,
+
+
+
+
+Rekhter, et al. Standards Track [Page 59]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - sets its hold timer to a large value, and
+
+ - changes its state to OpenSent.
+
+ A HoldTimer value of 4 minutes is also suggested for this state
+ transition.
+
+ If the local system receives a TcpConnection_Valid event (Event
+ 14), the local system processes the TCP connection flags and stays
+ in the Active state.
+
+ If the local system receives a Tcp_CR_Invalid event (Event 15),
+ the local system rejects the TCP connection and stays in the
+ Active State.
+
+ In response to the success of a TCP connection (Event 16 or Event
+ 17), the local system checks the DelayOpen optional attribute
+ prior to processing.
+
+ If the DelayOpen attribute is set to TRUE, the local system:
+
+ - stops the ConnectRetryTimer and sets the ConnectRetryTimer
+ to zero,
+
+ - sets the DelayOpenTimer to the initial value
+ (DelayOpenTime), and
+
+ - stays in the Active state.
+
+ If the DelayOpen attribute is set to FALSE, the local system:
+
+ - sets the ConnectRetryTimer to zero,
+
+ - completes the BGP initialization,
+
+ - sends the OPEN message to its peer,
+
+ - sets its HoldTimer to a large value, and
+
+ - changes its state to OpenSent.
+
+ A HoldTimer value of 4 minutes is suggested as a "large value" for
+ the HoldTimer.
+
+ If the local system receives a TcpConnectionFails event (Event
+ 18), the local system:
+
+ - restarts the ConnectRetryTimer (with the initial value),
+
+
+
+Rekhter, et al. Standards Track [Page 60]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - stops and clears the DelayOpenTimer (sets the value to zero),
+
+ - releases all BGP resource,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - optionally performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ If an OPEN message is received and the DelayOpenTimer is running
+ (Event 20), the local system:
+
+ - stops the ConnectRetryTimer (if running) and sets the
+ ConnectRetryTimer to zero,
+
+ - stops and clears the DelayOpenTimer (sets to zero),
+
+ - completes the BGP initialization,
+
+ - sends an OPEN message,
+
+ - sends a KEEPALIVE message,
+
+ - if the HoldTimer value is non-zero,
+
+ - starts the KeepaliveTimer to initial value,
+
+ - resets the HoldTimer to the negotiated value,
+
+ else if the HoldTimer is zero
+
+ - resets the KeepaliveTimer (set to zero),
+
+ - resets the HoldTimer to zero, and
+
+ - changes its state to OpenConfirm.
+
+ If the value of the autonomous system field is the same as the
+ local Autonomous System number, set the connection status to an
+ internal connection; otherwise it will be external.
+
+ If BGP message header checking (Event 21) or OPEN message checking
+ detects an error (Event 22) (see Section 6.2), the local system:
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 61]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - (optionally) sends a NOTIFICATION message with the appropriate
+ error code if the SendNOTIFICATIONwithoutOPEN attribute is set
+ to TRUE,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ If a NOTIFICATION message is received with a version error (Event
+ 24), the local system checks the DelayOpenTimer. If the
+ DelayOpenTimer is running, the local system:
+
+ - stops the ConnectRetryTimer (if running) and sets the
+ ConnectRetryTimer to zero,
+
+ - stops and resets the DelayOpenTimer (sets to zero),
+
+ - releases all BGP resources,
+
+ - drops the TCP connection, and
+
+ - changes its state to Idle.
+
+ If the DelayOpenTimer is not running, the local system:
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 62]
+
+RFC 4271 BGP-4 January 2006
+
+
+ In response to any other event (Events 8, 10-11, 13, 19, 23,
+ 25-28), the local system:
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by one,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ OpenSent:
+
+ In this state, BGP FSM waits for an OPEN message from its peer.
+
+ The start events (Events 1, 3-7) are ignored in the OpenSent
+ state.
+
+ If a ManualStop event (Event 2) is issued in the OpenSent state,
+ the local system:
+
+ - sends the NOTIFICATION with a Cease,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - sets the ConnectRetryCounter to zero, and
+
+ - changes its state to Idle.
+
+ If an AutomaticStop event (Event 8) is issued in the OpenSent
+ state, the local system:
+
+ - sends the NOTIFICATION with a Cease,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all the BGP resources,
+
+ - drops the TCP connection,
+
+
+
+Rekhter, et al. Standards Track [Page 63]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ If the HoldTimer_Expires (Event 10), the local system:
+
+ - sends a NOTIFICATION message with the error code Hold Timer
+ Expired,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ If a TcpConnection_Valid (Event 14), Tcp_CR_Acked (Event 16), or a
+ TcpConnectionConfirmed event (Event 17) is received, a second TCP
+ connection may be in progress. This second TCP connection is
+ tracked per Connection Collision processing (Section 6.8) until an
+ OPEN message is received.
+
+ A TCP Connection Request for an Invalid port (Tcp_CR_Invalid
+ (Event 15)) is ignored.
+
+ If a TcpConnectionFails event (Event 18) is received, the local
+ system:
+
+ - closes the BGP connection,
+
+ - restarts the ConnectRetryTimer,
+
+ - continues to listen for a connection that may be initiated by
+ the remote BGP peer, and
+
+ - changes its state to Active.
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 64]
+
+RFC 4271 BGP-4 January 2006
+
+
+ When an OPEN message is received, all fields are checked for
+ correctness. If there are no errors in the OPEN message (Event
+ 19), the local system:
+
+ - resets the DelayOpenTimer to zero,
+
+ - sets the BGP ConnectRetryTimer to zero,
+
+ - sends a KEEPALIVE message, and
+
+ - sets a KeepaliveTimer (via the text below)
+
+ - sets the HoldTimer according to the negotiated value (see
+ Section 4.2),
+
+ - changes its state to OpenConfirm.
+
+ If the negotiated hold time value is zero, then the HoldTimer and
+ KeepaliveTimer are not started. If the value of the Autonomous
+ System field is the same as the local Autonomous System number,
+ then the connection is an "internal" connection; otherwise, it is
+ an "external" connection. (This will impact UPDATE processing as
+ described below.)
+
+ If the BGP message header checking (Event 21) or OPEN message
+ checking detects an error (Event 22)(see Section 6.2), the local
+ system:
+
+ - sends a NOTIFICATION message with the appropriate error code,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is TRUE, and
+
+ - changes its state to Idle.
+
+ Collision detection mechanisms (Section 6.8) need to be applied
+ when a valid BGP OPEN message is received (Event 19 or Event 20).
+ Please refer to Section 6.8 for the details of the comparison. A
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 65]
+
+RFC 4271 BGP-4 January 2006
+
+
+ CollisionDetectDump event occurs when the BGP implementation
+ determines, by means outside the scope of this document, that a
+ connection collision has occurred.
+
+ If a connection in the OpenSent state is determined to be the
+ connection that must be closed, an OpenCollisionDump (Event 23) is
+ signaled to the state machine. If such an event is received in
+ the OpenSent state, the local system:
+
+ - sends a NOTIFICATION with a Cease,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ If a NOTIFICATION message is received with a version error (Event
+ 24), the local system:
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection, and
+
+ - changes its state to Idle.
+
+ In response to any other event (Events 9, 11-13, 20, 25-28), the
+ local system:
+
+ - sends the NOTIFICATION with the Error Code Finite State
+ Machine Error,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+
+
+Rekhter, et al. Standards Track [Page 66]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ OpenConfirm State:
+
+ In this state, BGP waits for a KEEPALIVE or NOTIFICATION message.
+
+ Any start event (Events 1, 3-7) is ignored in the OpenConfirm
+ state.
+
+ In response to a ManualStop event (Event 2) initiated by the
+ operator, the local system:
+
+ - sends the NOTIFICATION message with a Cease,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - sets the ConnectRetryCounter to zero,
+
+ - sets the ConnectRetryTimer to zero, and
+
+ - changes its state to Idle.
+
+ In response to the AutomaticStop event initiated by the system
+ (Event 8), the local system:
+
+ - sends the NOTIFICATION message with a Cease,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ If the HoldTimer_Expires event (Event 10) occurs before a
+ KEEPALIVE message is received, the local system:
+
+
+
+
+Rekhter, et al. Standards Track [Page 67]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - sends the NOTIFICATION message with the Error Code Hold Timer
+ Expired,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ If the local system receives a KeepaliveTimer_Expires event (Event
+ 11), the local system:
+
+ - sends a KEEPALIVE message,
+
+ - restarts the KeepaliveTimer, and
+
+ - remains in the OpenConfirmed state.
+
+ In the event of a TcpConnection_Valid event (Event 14), or the
+ success of a TCP connection (Event 16 or Event 17) while in
+ OpenConfirm, the local system needs to track the second
+ connection.
+
+ If a TCP connection is attempted with an invalid port (Event 15),
+ the local system will ignore the second connection attempt.
+
+ If the local system receives a TcpConnectionFails event (Event 18)
+ from the underlying TCP or a NOTIFICATION message (Event 25), the
+ local system:
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+
+
+
+Rekhter, et al. Standards Track [Page 68]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - changes its state to Idle.
+
+ If the local system receives a NOTIFICATION message with a version
+ error (NotifMsgVerErr (Event 24)), the local system:
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection, and
+
+ - changes its state to Idle.
+
+ If the local system receives a valid OPEN message (BGPOpen (Event
+ 19)), the collision detect function is processed per Section 6.8.
+ If this connection is to be dropped due to connection collision,
+ the local system:
+
+ - sends a NOTIFICATION with a Cease,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection (send TCP FIN),
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ If an OPEN message is received, all fields are checked for
+ correctness. If the BGP message header checking (BGPHeaderErr
+ (Event 21)) or OPEN message checking detects an error (see Section
+ 6.2) (BGPOpenMsgErr (Event 22)), the local system:
+
+ - sends a NOTIFICATION message with the appropriate error code,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+
+
+
+Rekhter, et al. Standards Track [Page 69]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ If, during the processing of another OPEN message, the BGP
+ implementation determines, by a means outside the scope of this
+ document, that a connection collision has occurred and this
+ connection is to be closed, the local system will issue an
+ OpenCollisionDump event (Event 23). When the local system
+ receives an OpenCollisionDump event (Event 23), the local system:
+
+ - sends a NOTIFICATION with a Cease,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ If the local system receives a KEEPALIVE message (KeepAliveMsg
+ (Event 26)), the local system:
+
+ - restarts the HoldTimer and
+
+ - changes its state to Established.
+
+ In response to any other event (Events 9, 12-13, 20, 27-28), the
+ local system:
+
+ - sends a NOTIFICATION with a code of Finite State Machine
+ Error,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+
+
+
+Rekhter, et al. Standards Track [Page 70]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ Established State:
+
+ In the Established state, the BGP FSM can exchange UPDATE,
+ NOTIFICATION, and KEEPALIVE messages with its peer.
+
+ Any Start event (Events 1, 3-7) is ignored in the Established
+ state.
+
+ In response to a ManualStop event (initiated by an operator)
+ (Event 2), the local system:
+
+ - sends the NOTIFICATION message with a Cease,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - deletes all routes associated with this connection,
+
+ - releases BGP resources,
+
+ - drops the TCP connection,
+
+ - sets the ConnectRetryCounter to zero, and
+
+ - changes its state to Idle.
+
+ In response to an AutomaticStop event (Event 8), the local system:
+
+ - sends a NOTIFICATION with a Cease,
+
+ - sets the ConnectRetryTimer to zero
+
+ - deletes all routes associated with this connection,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+
+
+Rekhter, et al. Standards Track [Page 71]
+
+RFC 4271 BGP-4 January 2006
+
+
+ One reason for an AutomaticStop event is: A BGP receives an UPDATE
+ messages with a number of prefixes for a given peer such that the
+ total prefixes received exceeds the maximum number of prefixes
+ configured. The local system automatically disconnects the peer.
+
+ If the HoldTimer_Expires event occurs (Event 10), the local
+ system:
+
+ - sends a NOTIFICATION message with the Error Code Hold Timer
+ Expired,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ If the KeepaliveTimer_Expires event occurs (Event 11), the local
+ system:
+
+ - sends a KEEPALIVE message, and
+
+ - restarts its KeepaliveTimer, unless the negotiated HoldTime
+ value is zero.
+
+ Each time the local system sends a KEEPALIVE or UPDATE message, it
+ restarts its KeepaliveTimer, unless the negotiated HoldTime value
+ is zero.
+
+ A TcpConnection_Valid (Event 14), received for a valid port, will
+ cause the second connection to be tracked.
+
+ An invalid TCP connection (Tcp_CR_Invalid event (Event 15)) will
+ be ignored.
+
+ In response to an indication that the TCP connection is
+ successfully established (Event 16 or Event 17), the second
+ connection SHALL be tracked until it sends an OPEN message.
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 72]
+
+RFC 4271 BGP-4 January 2006
+
+
+ If a valid OPEN message (BGPOpen (Event 19)) is received, and if
+ the CollisionDetectEstablishedState optional attribute is TRUE,
+ the OPEN message will be checked to see if it collides (Section
+ 6.8) with any other connection. If the BGP implementation
+ determines that this connection needs to be terminated, it will
+ process an OpenCollisionDump event (Event 23). If this connection
+ needs to be terminated, the local system:
+
+ - sends a NOTIFICATION with a Cease,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - deletes all routes associated with this connection,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations is set to TRUE, and
+
+ - changes its state to Idle.
+
+ If the local system receives a NOTIFICATION message (Event 24 or
+ Event 25) or a TcpConnectionFails (Event 18) from the underlying
+ TCP, the local system:
+
+ - sets the ConnectRetryTimer to zero,
+
+ - deletes all routes associated with this connection,
+
+ - releases all the BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - changes its state to Idle.
+
+
+
+
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 73]
+
+RFC 4271 BGP-4 January 2006
+
+
+ If the local system receives a KEEPALIVE message (Event 26), the
+ local system:
+
+ - restarts its HoldTimer, if the negotiated HoldTime value is
+ non-zero, and
+
+ - remains in the Established state.
+
+ If the local system receives an UPDATE message (Event 27), the
+ local system:
+
+ - processes the message,
+
+ - restarts its HoldTimer, if the negotiated HoldTime value is
+ non-zero, and
+
+ - remains in the Established state.
+
+ If the local system receives an UPDATE message, and the UPDATE
+ message error handling procedure (see Section 6.3) detects an
+ error (Event 28), the local system:
+
+ - sends a NOTIFICATION message with an Update error,
+
+ - sets the ConnectRetryTimer to zero,
+
+ - deletes all routes associated with this connection,
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+ In response to any other event (Events 9, 12-13, 20-22), the local
+ system:
+
+ - sends a NOTIFICATION message with the Error Code Finite State
+ Machine Error,
+
+ - deletes all routes associated with this connection,
+
+ - sets the ConnectRetryTimer to zero,
+
+
+
+Rekhter, et al. Standards Track [Page 74]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - releases all BGP resources,
+
+ - drops the TCP connection,
+
+ - increments the ConnectRetryCounter by 1,
+
+ - (optionally) performs peer oscillation damping if the
+ DampPeerOscillations attribute is set to TRUE, and
+
+ - changes its state to Idle.
+
+9. UPDATE Message Handling
+
+ An UPDATE message may be received only in the Established state.
+ Receiving an UPDATE message in any other state is an error. When an
+ UPDATE message is received, each field is checked for validity, as
+ specified in Section 6.3.
+
+ If an optional non-transitive attribute is unrecognized, it is
+ quietly ignored. If an optional transitive attribute is
+ unrecognized, the Partial bit (the third high-order bit) in the
+ attribute flags octet is set to 1, and the attribute is retained for
+ propagation to other BGP speakers.
+
+ If an optional attribute is recognized and has a valid value, then,
+ depending on the type of the optional attribute, it is processed
+ locally, retained, and updated, if necessary, for possible
+ propagation to other BGP speakers.
+
+ If the UPDATE message contains a non-empty WITHDRAWN ROUTES field,
+ the previously advertised routes, whose destinations (expressed as IP
+ prefixes) are contained in this field, SHALL be removed from the
+ Adj-RIB-In. This BGP speaker SHALL run its Decision Process because
+ the previously advertised route is no longer available for use.
+
+ If the UPDATE message contains a feasible route, the Adj-RIB-In will
+ be updated with this route as follows: if the NLRI of the new route
+ is identical to the one the route currently has stored in the Adj-
+ RIB-In, then the new route SHALL replace the older route in the Adj-
+ RIB-In, thus implicitly withdrawing the older route from service.
+ Otherwise, if the Adj-RIB-In has no route with NLRI identical to the
+ new route, the new route SHALL be placed in the Adj-RIB-In.
+
+ Once the BGP speaker updates the Adj-RIB-In, the speaker SHALL run
+ its Decision Process.
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 75]
+
+RFC 4271 BGP-4 January 2006
+
+
+9.1. Decision Process
+
+ The Decision Process selects routes for subsequent advertisement by
+ applying the policies in the local Policy Information Base (PIB) to
+ the routes stored in its Adj-RIBs-In. The output of the Decision
+ Process is the set of routes that will be advertised to peers; the
+ selected routes will be stored in the local speaker's Adj-RIBs-Out,
+ according to policy.
+
+ The BGP Decision Process described here is conceptual, and does not
+ have to be implemented precisely as described, as long as the
+ implementations support the described functionality and they exhibit
+ the same externally visible behavior.
+
+ The selection process is formalized by defining a function that takes
+ the attribute of a given route as an argument and returns either (a)
+ a non-negative integer denoting the degree of preference for the
+ route, or (b) a value denoting that this route is ineligible to be
+ installed in Loc-RIB and will be excluded from the next phase of
+ route selection.
+
+ The function that calculates the degree of preference for a given
+ route SHALL NOT use any of the following as its inputs: the existence
+ of other routes, the non-existence of other routes, or the path
+ attributes of other routes. Route selection then consists of the
+ individual application of the degree of preference function to each
+ feasible route, followed by the choice of the one with the highest
+ degree of preference.
+
+ The Decision Process operates on routes contained in the Adj-RIBs-In,
+ and is responsible for:
+
+ - selection of routes to be used locally by the speaker
+
+ - selection of routes to be advertised to other BGP peers
+
+ - route aggregation and route information reduction
+
+ The Decision Process takes place in three distinct phases, each
+ triggered by a different event:
+
+ a) Phase 1 is responsible for calculating the degree of preference
+ for each route received from a peer.
+
+ b) Phase 2 is invoked on completion of phase 1. It is responsible
+ for choosing the best route out of all those available for each
+ distinct destination, and for installing each chosen route into
+ the Loc-RIB.
+
+
+
+Rekhter, et al. Standards Track [Page 76]
+
+RFC 4271 BGP-4 January 2006
+
+
+ c) Phase 3 is invoked after the Loc-RIB has been modified. It is
+ responsible for disseminating routes in the Loc-RIB to each
+ peer, according to the policies contained in the PIB. Route
+ aggregation and information reduction can optionally be
+ performed within this phase.
+
+9.1.1. Phase 1: Calculation of Degree of Preference
+
+ The Phase 1 decision function is invoked whenever the local BGP
+ speaker receives, from a peer, an UPDATE message that advertises a
+ new route, a replacement route, or withdrawn routes.
+
+ The Phase 1 decision function is a separate process,f which completes
+ when it has no further work to do.
+
+ The Phase 1 decision function locks an Adj-RIB-In prior to operating
+ on any route contained within it, and unlocks it after operating on
+ all new or unfeasible routes contained within it.
+
+ For each newly received or replacement feasible route, the local BGP
+ speaker determines a degree of preference as follows:
+
+ If the route is learned from an internal peer, either the value of
+ the LOCAL_PREF attribute is taken as the degree of preference, or
+ the local system computes the degree of preference of the route
+ based on preconfigured policy information. Note that the latter
+ may result in formation of persistent routing loops.
+
+ If the route is learned from an external peer, then the local BGP
+ speaker computes the degree of preference based on preconfigured
+ policy information. If the return value indicates the route is
+ ineligible, the route MAY NOT serve as an input to the next phase
+ of route selection; otherwise, the return value MUST be used as
+ the LOCAL_PREF value in any IBGP readvertisement.
+
+ The exact nature of this policy information, and the computation
+ involved, is a local matter.
+
+9.1.2. Phase 2: Route Selection
+
+ The Phase 2 decision function is invoked on completion of Phase 1.
+ The Phase 2 function is a separate process, which completes when it
+ has no further work to do. The Phase 2 process considers all routes
+ that are eligible in the Adj-RIBs-In.
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 77]
+
+RFC 4271 BGP-4 January 2006
+
+
+ The Phase 2 decision function is blocked from running while the Phase
+ 3 decision function is in process. The Phase 2 function locks all
+ Adj-RIBs-In prior to commencing its function, and unlocks them on
+ completion.
+
+ If the NEXT_HOP attribute of a BGP route depicts an address that is
+ not resolvable, or if it would become unresolvable if the route was
+ installed in the routing table, the BGP route MUST be excluded from
+ the Phase 2 decision function.
+
+ If the AS_PATH attribute of a BGP route contains an AS loop, the BGP
+ route should be excluded from the Phase 2 decision function. AS loop
+ detection is done by scanning the full AS path (as specified in the
+ AS_PATH attribute), and checking that the autonomous system number of
+ the local system does not appear in the AS path. Operations of a BGP
+ speaker that is configured to accept routes with its own autonomous
+ system number in the AS path are outside the scope of this document.
+
+ It is critical that BGP speakers within an AS do not make conflicting
+ decisions regarding route selection that would cause forwarding loops
+ to occur.
+
+ For each set of destinations for which a feasible route exists in the
+ Adj-RIBs-In, the local BGP speaker identifies the route that has:
+
+ a) the highest degree of preference of any route to the same set
+ of destinations, or
+
+ b) is the only route to that destination, or
+
+ c) is selected as a result of the Phase 2 tie breaking rules
+ specified in Section 9.1.2.2.
+
+ The local speaker SHALL then install that route in the Loc-RIB,
+ replacing any route to the same destination that is currently being
+ held in the Loc-RIB. When the new BGP route is installed in the
+ Routing Table, care must be taken to ensure that existing routes to
+ the same destination that are now considered invalid are removed from
+ the Routing Table. Whether the new BGP route replaces an existing
+ non-BGP route in the Routing Table depends on the policy configured
+ on the BGP speaker.
+
+ The local speaker MUST determine the immediate next-hop address from
+ the NEXT_HOP attribute of the selected route (see Section 5.1.3). If
+ either the immediate next-hop or the IGP cost to the NEXT_HOP (where
+ the NEXT_HOP is resolved through an IGP route) changes, Phase 2 Route
+ Selection MUST be performed again.
+
+
+
+
+Rekhter, et al. Standards Track [Page 78]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Notice that even though BGP routes do not have to be installed in the
+ Routing Table with the immediate next-hop(s), implementations MUST
+ take care that, before any packets are forwarded along a BGP route,
+ its associated NEXT_HOP address is resolved to the immediate
+ (directly connected) next-hop address, and that this address (or
+ multiple addresses) is finally used for actual packet forwarding.
+
+ Unresolvable routes SHALL be removed from the Loc-RIB and the routing
+ table. However, corresponding unresolvable routes SHOULD be kept in
+ the Adj-RIBs-In (in case they become resolvable).
+
+9.1.2.1. Route Resolvability Condition
+
+ As indicated in Section 9.1.2, BGP speakers SHOULD exclude
+ unresolvable routes from the Phase 2 decision. This ensures that
+ only valid routes are installed in Loc-RIB and the Routing Table.
+
+ The route resolvability condition is defined as follows:
+
+ 1) A route Rte1, referencing only the intermediate network
+ address, is considered resolvable if the Routing Table contains
+ at least one resolvable route Rte2 that matches Rte1's
+ intermediate network address and is not recursively resolved
+ (directly or indirectly) through Rte1. If multiple matching
+ routes are available, only the longest matching route SHOULD be
+ considered.
+
+ 2) Routes referencing interfaces (with or without intermediate
+ addresses) are considered resolvable if the state of the
+ referenced interface is up and if IP processing is enabled on
+ this interface.
+
+ BGP routes do not refer to interfaces, but can be resolved through
+ the routes in the Routing Table that can be of both types (those that
+ specify interfaces or those that do not). IGP routes and routes to
+ directly connected networks are expected to specify the outbound
+ interface. Static routes can specify the outbound interface, the
+ intermediate address, or both.
+
+ Note that a BGP route is considered unresolvable in a situation where
+ the BGP speaker's Routing Table contains no route matching the BGP
+ route's NEXT_HOP. Mutually recursive routes (routes resolving each
+ other or themselves) also fail the resolvability check.
+
+ It is also important that implementations do not consider feasible
+ routes that would become unresolvable if they were installed in the
+ Routing Table, even if their NEXT_HOPs are resolvable using the
+ current contents of the Routing Table (an example of such routes
+
+
+
+Rekhter, et al. Standards Track [Page 79]
+
+RFC 4271 BGP-4 January 2006
+
+
+ would be mutually recursive routes). This check ensures that a BGP
+ speaker does not install routes in the Routing Table that will be
+ removed and not used by the speaker. Therefore, in addition to local
+ Routing Table stability, this check also improves behavior of the
+ protocol in the network.
+
+ Whenever a BGP speaker identifies a route that fails the
+ resolvability check because of mutual recursion, an error message
+ SHOULD be logged.
+
+9.1.2.2. Breaking Ties (Phase 2)
+
+ In its Adj-RIBs-In, a BGP speaker may have several routes to the same
+ destination that have the same degree of preference. The local
+ speaker can select only one of these routes for inclusion in the
+ associated Loc-RIB. The local speaker considers all routes with the
+ same degrees of preference, both those received from internal peers,
+ and those received from external peers.
+
+ The following tie-breaking procedure assumes that, for each candidate
+ route, all the BGP speakers within an autonomous system can ascertain
+ the cost of a path (interior distance) to the address depicted by the
+ NEXT_HOP attribute of the route, and follow the same route selection
+ algorithm.
+
+ The tie-breaking algorithm begins by considering all equally
+ preferable routes to the same destination, and then selects routes to
+ be removed from consideration. The algorithm terminates as soon as
+ only one route remains in consideration. The criteria MUST be
+ applied in the order specified.
+
+ Several of the criteria are described using pseudo-code. Note that
+ the pseudo-code shown was chosen for clarity, not efficiency. It is
+ not intended to specify any particular implementation. BGP
+ implementations MAY use any algorithm that produces the same results
+ as those described here.
+
+ a) Remove from consideration all routes that are not tied for
+ having the smallest number of AS numbers present in their
+ AS_PATH attributes. Note that when counting this number, an
+ AS_SET counts as 1, no matter how many ASes are in the set.
+
+ b) Remove from consideration all routes that are not tied for
+ having the lowest Origin number in their Origin attribute.
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 80]
+
+RFC 4271 BGP-4 January 2006
+
+
+ c) Remove from consideration routes with less-preferred
+ MULTI_EXIT_DISC attributes. MULTI_EXIT_DISC is only comparable
+ between routes learned from the same neighboring AS (the
+ neighboring AS is determined from the AS_PATH attribute).
+ Routes that do not have the MULTI_EXIT_DISC attribute are
+ considered to have the lowest possible MULTI_EXIT_DISC value.
+
+ This is also described in the following procedure:
+
+ for m = all routes still under consideration
+ for n = all routes still under consideration
+ if (neighborAS(m) == neighborAS(n)) and (MED(n) < MED(m))
+ remove route m from consideration
+
+ In the pseudo-code above, MED(n) is a function that returns the
+ value of route n's MULTI_EXIT_DISC attribute. If route n has
+ no MULTI_EXIT_DISC attribute, the function returns the lowest
+ possible MULTI_EXIT_DISC value (i.e., 0).
+
+ Similarly, neighborAS(n) is a function that returns the
+ neighbor AS from which the route was received. If the route is
+ learned via IBGP, and the other IBGP speaker didn't originate
+ the route, it is the neighbor AS from which the other IBGP
+ speaker learned the route. If the route is learned via IBGP,
+ and the other IBGP speaker either (a) originated the route, or
+ (b) created the route by aggregation and the AS_PATH attribute
+ of the aggregate route is either empty or begins with an
+ AS_SET, it is the local AS.
+
+ If a MULTI_EXIT_DISC attribute is removed before re-advertising
+ a route into IBGP, then comparison based on the received EBGP
+ MULTI_EXIT_DISC attribute MAY still be performed. If an
+ implementation chooses to remove MULTI_EXIT_DISC, then the
+ optional comparison on MULTI_EXIT_DISC, if performed, MUST be
+ performed only among EBGP-learned routes. The best EBGP-
+ learned route may then be compared with IBGP-learned routes
+ after the removal of the MULTI_EXIT_DISC attribute. If
+ MULTI_EXIT_DISC is removed from a subset of EBGP-learned
+ routes, and the selected "best" EBGP-learned route will not
+ have MULTI_EXIT_DISC removed, then the MULTI_EXIT_DISC must be
+ used in the comparison with IBGP-learned routes. For IBGP-
+ learned routes, the MULTI_EXIT_DISC MUST be used in route
+ comparisons that reach this step in the Decision Process.
+ Including the MULTI_EXIT_DISC of an EBGP-learned route in the
+ comparison with an IBGP-learned route, then removing the
+ MULTI_EXIT_DISC attribute, and advertising the route has been
+ proven to cause route loops.
+
+
+
+
+Rekhter, et al. Standards Track [Page 81]
+
+RFC 4271 BGP-4 January 2006
+
+
+ d) If at least one of the candidate routes was received via EBGP,
+ remove from consideration all routes that were received via
+ IBGP.
+
+ e) Remove from consideration any routes with less-preferred
+ interior cost. The interior cost of a route is determined by
+ calculating the metric to the NEXT_HOP for the route using the
+ Routing Table. If the NEXT_HOP hop for a route is reachable,
+ but no cost can be determined, then this step should be skipped
+ (equivalently, consider all routes to have equal costs).
+
+ This is also described in the following procedure.
+
+ for m = all routes still under consideration
+ for n = all routes in still under consideration
+ if (cost(n) is lower than cost(m))
+ remove m from consideration
+
+ In the pseudo-code above, cost(n) is a function that returns
+ the cost of the path (interior distance) to the address given
+ in the NEXT_HOP attribute of the route.
+
+ f) Remove from consideration all routes other than the route that
+ was advertised by the BGP speaker with the lowest BGP
+ Identifier value.
+
+ g) Prefer the route received from the lowest peer address.
+
+9.1.3. Phase 3: Route Dissemination
+
+ The Phase 3 decision function is invoked on completion of Phase 2, or
+ when any of the following events occur:
+
+ a) when routes in the Loc-RIB to local destinations have changed
+
+ b) when locally generated routes learned by means outside of BGP
+ have changed
+
+ c) when a new BGP speaker connection has been established
+
+ The Phase 3 function is a separate process that completes when it has
+ no further work to do. The Phase 3 Routing Decision function is
+ blocked from running while the Phase 2 decision function is in
+ process.
+
+ All routes in the Loc-RIB are processed into Adj-RIBs-Out according
+ to configured policy. This policy MAY exclude a route in the Loc-RIB
+ from being installed in a particular Adj-RIB-Out. A route SHALL NOT
+
+
+
+Rekhter, et al. Standards Track [Page 82]
+
+RFC 4271 BGP-4 January 2006
+
+
+ be installed in the Adj-Rib-Out unless the destination, and NEXT_HOP
+ described by this route, may be forwarded appropriately by the
+ Routing Table. If a route in Loc-RIB is excluded from a particular
+ Adj-RIB-Out, the previously advertised route in that Adj-RIB-Out MUST
+ be withdrawn from service by means of an UPDATE message (see 9.2).
+
+ Route aggregation and information reduction techniques (see Section
+ 9.2.2.1) may optionally be applied.
+
+ Any local policy that results in routes being added to an Adj-RIB-Out
+ without also being added to the local BGP speaker's forwarding table
+ is outside the scope of this document.
+
+ When the updating of the Adj-RIBs-Out and the Routing Table is
+ complete, the local BGP speaker runs the Update-Send process of 9.2.
+
+9.1.4. Overlapping Routes
+
+ A BGP speaker may transmit routes with overlapping Network Layer
+ Reachability Information (NLRI) to another BGP speaker. NLRI overlap
+ occurs when a set of destinations are identified in non-matching
+ multiple routes. Because BGP encodes NLRI using IP prefixes, overlap
+ will always exhibit subset relationships. A route describing a
+ smaller set of destinations (a longer prefix) is said to be more
+ specific than a route describing a larger set of destinations (a
+ shorter prefix); similarly, a route describing a larger set of
+ destinations is said to be less specific than a route describing a
+ smaller set of destinations.
+
+ The precedence relationship effectively decomposes less specific
+ routes into two parts:
+
+ - a set of destinations described only by the less specific route,
+ and
+
+ - a set of destinations described by the overlap of the less
+ specific and the more specific routes
+
+ The set of destinations described by the overlap represents a portion
+ of the less specific route that is feasible, but is not currently in
+ use. If a more specific route is later withdrawn, the set of
+ destinations described by the overlap will still be reachable using
+ the less specific route.
+
+ If a BGP speaker receives overlapping routes, the Decision Process
+ MUST consider both routes based on the configured acceptance policy.
+ If both a less and a more specific route are accepted, then the
+ Decision Process MUST install, in Loc-RIB, either both the less and
+
+
+
+Rekhter, et al. Standards Track [Page 83]
+
+RFC 4271 BGP-4 January 2006
+
+
+ the more specific routes or aggregate the two routes and install, in
+ Loc-RIB, the aggregated route, provided that both routes have the
+ same value of the NEXT_HOP attribute.
+
+ If a BGP speaker chooses to aggregate, then it SHOULD either include
+ all ASes used to form the aggregate in an AS_SET, or add the
+ ATOMIC_AGGREGATE attribute to the route. This attribute is now
+ primarily informational. With the elimination of IP routing
+ protocols that do not support classless routing, and the elimination
+ of router and host implementations that do not support classless
+ routing, there is no longer a need to de-aggregate. Routes SHOULD
+ NOT be de-aggregated. In particular, a route that carries the
+ ATOMIC_AGGREGATE attribute MUST NOT be de-aggregated. That is, the
+ NLRI of this route cannot be more specific. Forwarding along such a
+ route does not guarantee that IP packets will actually traverse only
+ ASes listed in the AS_PATH attribute of the route.
+
+9.2. Update-Send Process
+
+ The Update-Send process is responsible for advertising UPDATE
+ messages to all peers. For example, it distributes the routes chosen
+ by the Decision Process to other BGP speakers, which may be located
+ in either the same autonomous system or a neighboring autonomous
+ system.
+
+ When a BGP speaker receives an UPDATE message from an internal peer,
+ the receiving BGP speaker SHALL NOT re-distribute the routing
+ information contained in that UPDATE message to other internal peers
+ (unless the speaker acts as a BGP Route Reflector [RFC2796]).
+
+ As part of Phase 3 of the route selection process, the BGP speaker
+ has updated its Adj-RIBs-Out. All newly installed routes and all
+ newly unfeasible routes for which there is no replacement route SHALL
+ be advertised to its peers by means of an UPDATE message.
+
+ A BGP speaker SHOULD NOT advertise a given feasible BGP route from
+ its Adj-RIB-Out if it would produce an UPDATE message containing the
+ same BGP route as was previously advertised.
+
+ Any routes in the Loc-RIB marked as unfeasible SHALL be removed.
+ Changes to the reachable destinations within its own autonomous
+ system SHALL also be advertised in an UPDATE message.
+
+ If, due to the limits on the maximum size of an UPDATE message (see
+ Section 4), a single route doesn't fit into the message, the BGP
+ speaker MUST not advertise the route to its peers and MAY choose to
+ log an error locally.
+
+
+
+
+Rekhter, et al. Standards Track [Page 84]
+
+RFC 4271 BGP-4 January 2006
+
+
+9.2.1. Controlling Routing Traffic Overhead
+
+ The BGP protocol constrains the amount of routing traffic (that is,
+ UPDATE messages), in order to limit both the link bandwidth needed to
+ advertise UPDATE messages and the processing power needed by the
+ Decision Process to digest the information contained in the UPDATE
+ messages.
+
+9.2.1.1. Frequency of Route Advertisement
+
+ The parameter MinRouteAdvertisementIntervalTimer determines the
+ minimum amount of time that must elapse between an advertisement
+ and/or withdrawal of routes to a particular destination by a BGP
+ speaker to a peer. This rate limiting procedure applies on a per-
+ destination basis, although the value of
+ MinRouteAdvertisementIntervalTimer is set on a per BGP peer basis.
+
+ Two UPDATE messages sent by a BGP speaker to a peer that advertise
+ feasible routes and/or withdrawal of unfeasible routes to some common
+ set of destinations MUST be separated by at least
+ MinRouteAdvertisementIntervalTimer. This can only be achieved by
+ keeping a separate timer for each common set of destinations. This
+ would be unwarranted overhead. Any technique that ensures that the
+ interval between two UPDATE messages sent from a BGP speaker to a
+ peer that advertise feasible routes and/or withdrawal of unfeasible
+ routes to some common set of destinations will be at least
+ MinRouteAdvertisementIntervalTimer, and will also ensure that a
+ constant upper bound on the interval is acceptable.
+
+ Since fast convergence is needed within an autonomous system, either
+ (a) the MinRouteAdvertisementIntervalTimer used for internal peers
+ SHOULD be shorter than the MinRouteAdvertisementIntervalTimer used
+ for external peers, or (b) the procedure describe in this section
+ SHOULD NOT apply to routes sent to internal peers.
+
+ This procedure does not limit the rate of route selection, but only
+ the rate of route advertisement. If new routes are selected multiple
+ times while awaiting the expiration of
+ MinRouteAdvertisementIntervalTimer, the last route selected SHALL be
+ advertised at the end of MinRouteAdvertisementIntervalTimer.
+
+9.2.1.2. Frequency of Route Origination
+
+ The parameter MinASOriginationIntervalTimer determines the minimum
+ amount of time that must elapse between successive advertisements of
+ UPDATE messages that report changes within the advertising BGP
+ speaker's own autonomous systems.
+
+
+
+
+Rekhter, et al. Standards Track [Page 85]
+
+RFC 4271 BGP-4 January 2006
+
+
+9.2.2. Efficient Organization of Routing Information
+
+ Having selected the routing information it will advertise, a BGP
+ speaker may avail itself of several methods to organize this
+ information in an efficient manner.
+
+9.2.2.1. Information Reduction
+
+ Information reduction may imply a reduction in granularity of policy
+ control - after information is collapsed, the same policies will
+ apply to all destinations and paths in the equivalence class.
+
+ The Decision Process may optionally reduce the amount of information
+ that it will place in the Adj-RIBs-Out by any of the following
+ methods:
+
+ a) Network Layer Reachability Information (NLRI):
+
+ Destination IP addresses can be represented as IP address
+ prefixes. In cases where there is a correspondence between the
+ address structure and the systems under control of an
+ autonomous system administrator, it will be possible to reduce
+ the size of the NLRI carried in the UPDATE messages.
+
+ b) AS_PATHs:
+
+ AS path information can be represented as ordered AS_SEQUENCEs
+ or unordered AS_SETs. AS_SETs are used in the route
+ aggregation algorithm described in Section 9.2.2.2. They
+ reduce the size of the AS_PATH information by listing each AS
+ number only once, regardless of how many times it may have
+ appeared in multiple AS_PATHs that were aggregated.
+
+ An AS_SET implies that the destinations listed in the NLRI can
+ be reached through paths that traverse at least some of the
+ constituent autonomous systems. AS_SETs provide sufficient
+ information to avoid routing information looping; however,
+ their use may prune potentially feasible paths because such
+ paths are no longer listed individually in the form of
+ AS_SEQUENCEs. In practice, this is not likely to be a problem
+ because once an IP packet arrives at the edge of a group of
+ autonomous systems, the BGP speaker is likely to have more
+ detailed path information and can distinguish individual paths
+ from destinations.
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 86]
+
+RFC 4271 BGP-4 January 2006
+
+
+9.2.2.2. Aggregating Routing Information
+
+ Aggregation is the process of combining the characteristics of
+ several different routes in such a way that a single route can be
+ advertised. Aggregation can occur as part of the Decision Process to
+ reduce the amount of routing information that will be placed in the
+ Adj-RIBs-Out.
+
+ Aggregation reduces the amount of information that a BGP speaker must
+ store and exchange with other BGP speakers. Routes can be aggregated
+ by applying the following procedure, separately, to path attributes
+ of the same type and to the Network Layer Reachability Information.
+
+ Routes that have different MULTI_EXIT_DISC attributes SHALL NOT be
+ aggregated.
+
+ If the aggregated route has an AS_SET as the first element in its
+ AS_PATH attribute, then the router that originates the route SHOULD
+ NOT advertise the MULTI_EXIT_DISC attribute with this route.
+
+ Path attributes that have different type codes cannot be aggregated
+ together. Path attributes of the same type code may be aggregated,
+ according to the following rules:
+
+ NEXT_HOP:
+ When aggregating routes that have different NEXT_HOP
+ attributes, the NEXT_HOP attribute of the aggregated route
+ SHALL identify an interface on the BGP speaker that performs
+ the aggregation.
+
+ ORIGIN attribute:
+ If at least one route among routes that are aggregated has
+ ORIGIN with the value INCOMPLETE, then the aggregated route
+ MUST have the ORIGIN attribute with the value INCOMPLETE.
+ Otherwise, if at least one route among routes that are
+ aggregated has ORIGIN with the value EGP, then the aggregated
+ route MUST have the ORIGIN attribute with the value EGP. In
+ all other cases,, the value of the ORIGIN attribute of the
+ aggregated route is IGP.
+
+ AS_PATH attribute:
+ If routes to be aggregated have identical AS_PATH attributes,
+ then the aggregated route has the same AS_PATH attribute as
+ each individual route.
+
+ For the purpose of aggregating AS_PATH attributes, we model
+ each AS within the AS_PATH attribute as a tuple <type, value>,
+ where "type" identifies a type of the path segment the AS
+
+
+
+Rekhter, et al. Standards Track [Page 87]
+
+RFC 4271 BGP-4 January 2006
+
+
+ belongs to (e.g., AS_SEQUENCE, AS_SET), and "value" identifies
+ the AS number. If the routes to be aggregated have different
+ AS_PATH attributes, then the aggregated AS_PATH attribute SHALL
+ satisfy all of the following conditions:
+
+ - all tuples of type AS_SEQUENCE in the aggregated AS_PATH
+ SHALL appear in all of the AS_PATHs in the initial set of
+ routes to be aggregated.
+
+ - all tuples of type AS_SET in the aggregated AS_PATH SHALL
+ appear in at least one of the AS_PATHs in the initial set
+ (they may appear as either AS_SET or AS_SEQUENCE types).
+
+ - for any tuple X of type AS_SEQUENCE in the aggregated
+ AS_PATH, which precedes tuple Y in the aggregated AS_PATH,
+ X precedes Y in each AS_PATH in the initial set, which
+ contains Y, regardless of the type of Y.
+
+ - No tuple of type AS_SET with the same value SHALL appear
+ more than once in the aggregated AS_PATH.
+
+ - Multiple tuples of type AS_SEQUENCE with the same value may
+ appear in the aggregated AS_PATH only when adjacent to
+ another tuple of the same type and value.
+
+ An implementation may choose any algorithm that conforms to
+ these rules. At a minimum, a conformant implementation SHALL
+ be able to perform the following algorithm that meets all of
+ the above conditions:
+
+ - determine the longest leading sequence of tuples (as
+ defined above) common to all the AS_PATH attributes of the
+ routes to be aggregated. Make this sequence the leading
+ sequence of the aggregated AS_PATH attribute.
+
+ - set the type of the rest of the tuples from the AS_PATH
+ attributes of the routes to be aggregated to AS_SET, and
+ append them to the aggregated AS_PATH attribute.
+
+ - if the aggregated AS_PATH has more than one tuple with the
+ same value (regardless of tuple's type), eliminate all but
+ one such tuple by deleting tuples of the type AS_SET from
+ the aggregated AS_PATH attribute.
+
+ - for each pair of adjacent tuples in the aggregated AS_PATH,
+ if both tuples have the same type, merge them together, as
+ long as doing so will not cause a segment with a length
+ greater than 255 to be generated.
+
+
+
+Rekhter, et al. Standards Track [Page 88]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Appendix F, Section F.6 presents another algorithm that
+ satisfies the conditions and allows for more complex policy
+ configurations.
+
+ ATOMIC_AGGREGATE:
+ If at least one of the routes to be aggregated has
+ ATOMIC_AGGREGATE path attribute, then the aggregated route
+ SHALL have this attribute as well.
+
+ AGGREGATOR:
+ Any AGGREGATOR attributes from the routes to be aggregated MUST
+ NOT be included in the aggregated route. The BGP speaker
+ performing the route aggregation MAY attach a new AGGREGATOR
+ attribute (see Section 5.1.7).
+
+9.3. Route Selection Criteria
+
+ Generally, additional rules for comparing routes among several
+ alternatives are outside the scope of this document. There are two
+ exceptions:
+
+ - If the local AS appears in the AS path of the new route being
+ considered, then that new route cannot be viewed as better than
+ any other route (provided that the speaker is configured to
+ accept such routes). If such a route were ever used, a routing
+ loop could result.
+
+ - In order to achieve a successful distributed operation, only
+ routes with a likelihood of stability can be chosen. Thus, an
+ AS SHOULD avoid using unstable routes, and it SHOULD NOT make
+ rapid, spontaneous changes to its choice of route. Quantifying
+ the terms "unstable" and "rapid" (from the previous sentence)
+ will require experience, but the principle is clear. Routes
+ that are unstable can be "penalized" (e.g., by using the
+ procedures described in [RFC2439]).
+
+9.4. Originating BGP routes
+
+ A BGP speaker may originate BGP routes by injecting routing
+ information acquired by some other means (e.g., via an IGP) into BGP.
+ A BGP speaker that originates BGP routes assigns the degree of
+ preference (e.g., according to local configuration) to these routes
+ by passing them through the Decision Process (see Section 9.1).
+ These routes MAY also be distributed to other BGP speakers within the
+ local AS as part of the update process (see Section 9.2). The
+ decision of whether to distribute non-BGP acquired routes within an
+ AS via BGP depends on the environment within the AS (e.g., type of
+ IGP) and SHOULD be controlled via configuration.
+
+
+
+Rekhter, et al. Standards Track [Page 89]
+
+RFC 4271 BGP-4 January 2006
+
+
+10. BGP Timers
+
+ BGP employs five timers: ConnectRetryTimer (see Section 8), HoldTimer
+ (see Section 4.2), KeepaliveTimer (see Section 8),
+ MinASOriginationIntervalTimer (see Section 9.2.1.2), and
+ MinRouteAdvertisementIntervalTimer (see Section 9.2.1.1).
+
+ Two optional timers MAY be supported: DelayOpenTimer, IdleHoldTimer
+ by BGP (see Section 8). Section 8 describes their use. The full
+ operation of these optional timers is outside the scope of this
+ document.
+
+ ConnectRetryTime is a mandatory FSM attribute that stores the initial
+ value for the ConnectRetryTimer. The suggested default value for the
+ ConnectRetryTime is 120 seconds.
+
+ HoldTime is a mandatory FSM attribute that stores the initial value
+ for the HoldTimer. The suggested default value for the HoldTime is
+ 90 seconds.
+
+ During some portions of the state machine (see Section 8), the
+ HoldTimer is set to a large value. The suggested default for this
+ large value is 4 minutes.
+
+ The KeepaliveTime is a mandatory FSM attribute that stores the
+ initial value for the KeepaliveTimer. The suggested default value
+ for the KeepaliveTime is 1/3 of the HoldTime.
+
+ The suggested default value for the MinASOriginationIntervalTimer is
+ 15 seconds.
+
+ The suggested default value for the
+ MinRouteAdvertisementIntervalTimer on EBGP connections is 30 seconds.
+
+ The suggested default value for the
+ MinRouteAdvertisementIntervalTimer on IBGP connections is 5 seconds.
+
+ An implementation of BGP MUST allow the HoldTimer to be configurable
+ on a per-peer basis, and MAY allow the other timers to be
+ configurable.
+
+ To minimize the likelihood that the distribution of BGP messages by a
+ given BGP speaker will contain peaks, jitter SHOULD be applied to the
+ timers associated with MinASOriginationIntervalTimer, KeepaliveTimer,
+ MinRouteAdvertisementIntervalTimer, and ConnectRetryTimer. A given
+ BGP speaker MAY apply the same jitter to each of these quantities,
+ regardless of the destinations to which the updates are being sent;
+ that is, jitter need not be configured on a per-peer basis.
+
+
+
+Rekhter, et al. Standards Track [Page 90]
+
+RFC 4271 BGP-4 January 2006
+
+
+ The suggested default amount of jitter SHALL be determined by
+ multiplying the base value of the appropriate timer by a random
+ factor, which is uniformly distributed in the range from 0.75 to 1.0.
+ A new random value SHOULD be picked each time the timer is set. The
+ range of the jitter's random value MAY be configurable.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 91]
+
+RFC 4271 BGP-4 January 2006
+
+
+Appendix A. Comparison with RFC 1771
+
+ There are numerous editorial changes in comparison to [RFC1771] (too
+ many to list here).
+
+ The following list the technical changes:
+
+ Changes to reflect the usage of features such as TCP MD5
+ [RFC2385], BGP Route Reflectors [RFC2796], BGP Confederations
+ [RFC3065], and BGP Route Refresh [RFC2918].
+
+ Clarification of the use of the BGP Identifier in the AGGREGATOR
+ attribute.
+
+ Procedures for imposing an upper bound on the number of prefixes
+ that a BGP speaker would accept from a peer.
+
+ The ability of a BGP speaker to include more than one instance of
+ its own AS in the AS_PATH attribute for the purpose of inter-AS
+ traffic engineering.
+
+ Clarification of the various types of NEXT_HOPs.
+
+ Clarification of the use of the ATOMIC_AGGREGATE attribute.
+
+ The relationship between the immediate next hop, and the next hop
+ as specified in the NEXT_HOP path attribute.
+
+ Clarification of the tie-breaking procedures.
+
+ Clarification of the frequency of route advertisements.
+
+ Optional Parameter Type 1 (Authentication Information) has been
+ deprecated.
+
+ UPDATE Message Error subcode 7 (AS Routing Loop) has been
+ deprecated.
+
+ OPEN Message Error subcode 5 (Authentication Failure) has been
+ deprecated.
+
+ Use of the Marker field for authentication has been deprecated.
+
+ Implementations MUST support TCP MD5 [RFC2385] for authentication.
+
+ Clarification of BGP FSM.
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 92]
+
+RFC 4271 BGP-4 January 2006
+
+
+Appendix B. Comparison with RFC 1267
+
+ All the changes listed in Appendix A, plus the following.
+
+ BGP-4 is capable of operating in an environment where a set of
+ reachable destinations may be expressed via a single IP prefix. The
+ concept of network classes, or subnetting, is foreign to BGP-4. To
+ accommodate these capabilities, BGP-4 changes the semantics and
+ encoding associated with the AS_PATH attribute. New text has been
+ added to define semantics associated with IP prefixes. These
+ abilities allow BGP-4 to support the proposed supernetting scheme
+ [RFC1518, RFC1519].
+
+ To simplify configuration, this version introduces a new attribute,
+ LOCAL_PREF, that facilitates route selection procedures.
+
+ The INTER_AS_METRIC attribute has been renamed MULTI_EXIT_DISC.
+
+ A new attribute, ATOMIC_AGGREGATE, has been introduced to insure that
+ certain aggregates are not de-aggregated. Another new attribute,
+ AGGREGATOR, can be added to aggregate routes to advertise which AS
+ and which BGP speaker within that AS caused the aggregation.
+
+ To ensure that Hold Timers are symmetric, the Hold Timer is now
+ negotiated on a per-connection basis. Hold Timers of zero are now
+ supported.
+
+Appendix C. Comparison with RFC 1163
+
+ All of the changes listed in Appendices A and B, plus the following.
+
+ To detect and recover from BGP connection collision, a new field (BGP
+ Identifier) has been added to the OPEN message. New text (Section
+ 6.8) has been added to specify the procedure for detecting and
+ recovering from collision.
+
+ The new document no longer restricts the router that is passed in the
+ NEXT_HOP path attribute to be part of the same Autonomous System as
+ the BGP Speaker.
+
+ The new document optimizes and simplifies the exchange of information
+ about previously reachable routes.
+
+
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 93]
+
+RFC 4271 BGP-4 January 2006
+
+
+Appendix D. Comparison with RFC 1105
+
+ All of the changes listed in Appendices A, B, and C, plus the
+ following.
+
+ Minor changes to the [RFC1105] Finite State Machine were necessary to
+ accommodate the TCP user interface provided by BSD version 4.3.
+
+ The notion of Up/Down/Horizontal relations presented in RFC 1105 has
+ been removed from the protocol.
+
+ The changes in the message format from RFC 1105 are as follows:
+
+ 1. The Hold Time field has been removed from the BGP header and
+ added to the OPEN message.
+
+ 2. The version field has been removed from the BGP header and
+ added to the OPEN message.
+
+ 3. The Link Type field has been removed from the OPEN message.
+
+ 4. The OPEN CONFIRM message has been eliminated and replaced with
+ implicit confirmation, provided by the KEEPALIVE message.
+
+ 5. The format of the UPDATE message has been changed
+ significantly. New fields were added to the UPDATE message to
+ support multiple path attributes.
+
+ 6. The Marker field has been expanded and its role broadened to
+ support authentication.
+
+ Note that quite often BGP, as specified in RFC 1105, is referred to
+ as BGP-1; BGP, as specified in [RFC1163], is referred to as BGP-2;
+ BGP, as specified in RFC 1267 is referred to as BGP-3; and BGP, as
+ specified in this document is referred to as BGP-4.
+
+Appendix E. TCP Options that May Be Used with BGP
+
+ If a local system TCP user interface supports the TCP PUSH function,
+ then each BGP message SHOULD be transmitted with PUSH flag set.
+ Setting PUSH flag forces BGP messages to be transmitted to the
+ receiver promptly.
+
+ If a local system TCP user interface supports setting the DSCP field
+ [RFC2474] for TCP connections, then the TCP connection used by BGP
+ SHOULD be opened with bits 0-2 of the DSCP field set to 110 (binary).
+
+ An implementation MUST support the TCP MD5 option [RFC2385].
+
+
+
+Rekhter, et al. Standards Track [Page 94]
+
+RFC 4271 BGP-4 January 2006
+
+
+Appendix F. Implementation Recommendations
+
+ This section presents some implementation recommendations.
+
+Appendix F.1. Multiple Networks Per Message
+
+ The BGP protocol allows for multiple address prefixes with the same
+ path attributes to be specified in one message. Using this
+ capability is highly recommended. With one address prefix per
+ message there is a substantial increase in overhead in the receiver.
+ Not only does the system overhead increase due to the reception of
+ multiple messages, but the overhead of scanning the routing table for
+ updates to BGP peers and other routing protocols (and sending the
+ associated messages) is incurred multiple times as well.
+
+ One method of building messages that contain many address prefixes
+ per path attribute set from a routing table that is not organized on
+ a per path attribute set basis is to build many messages as the
+ routing table is scanned. As each address prefix is processed, a
+ message for the associated set of path attributes is allocated, if it
+ does not exist, and the new address prefix is added to it. If such a
+ message exists, the new address prefix is appended to it. If the
+ message lacks the space to hold the new address prefix, it is
+ transmitted, a new message is allocated, and the new address prefix
+ is inserted into the new message. When the entire routing table has
+ been scanned, all allocated messages are sent and their resources are
+ released. Maximum compression is achieved when all destinations
+ covered by the address prefixes share a common set of path
+ attributes, making it possible to send many address prefixes in one
+ 4096-byte message.
+
+ When peering with a BGP implementation that does not compress
+ multiple address prefixes into one message, it may be necessary to
+ take steps to reduce the overhead from the flood of data received
+ when a peer is acquired or when a significant network topology change
+ occurs. One method of doing this is to limit the rate of updates.
+ This will eliminate the redundant scanning of the routing table to
+ provide flash updates for BGP peers and other routing protocols. A
+ disadvantage of this approach is that it increases the propagation
+ latency of routing information. By choosing a minimum flash update
+ interval that is not much greater than the time it takes to process
+ the multiple messages, this latency should be minimized. A better
+ method would be to read all received messages before sending updates.
+
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 95]
+
+RFC 4271 BGP-4 January 2006
+
+
+Appendix F.2. Reducing Route Flapping
+
+ To avoid excessive route flapping, a BGP speaker that needs to
+ withdraw a destination and send an update about a more specific or
+ less specific route should combine them into the same UPDATE message.
+
+Appendix F.3. Path Attribute Ordering
+
+ Implementations that combine update messages (as described above in
+ Section 6.1) may prefer to see all path attributes presented in a
+ known order. This permits them to quickly identify sets of
+ attributes from different update messages that are semantically
+ identical. To facilitate this, it is a useful optimization to order
+ the path attributes according to type code. This optimization is
+ entirely optional.
+
+Appendix F.4. AS_SET Sorting
+
+ Another useful optimization that can be done to simplify this
+ situation is to sort the AS numbers found in an AS_SET. This
+ optimization is entirely optional.
+
+Appendix F.5. Control Over Version Negotiation
+
+ Because BGP-4 is capable of carrying aggregated routes that cannot be
+ properly represented in BGP-3, an implementation that supports BGP-4
+ and another BGP version should provide the capability to only speak
+ BGP-4 on a per-peer basis.
+
+Appendix F.6. Complex AS_PATH Aggregation
+
+ An implementation that chooses to provide a path aggregation
+ algorithm retaining significant amounts of path information may wish
+ to use the following procedure:
+
+ For the purpose of aggregating AS_PATH attributes of two routes,
+ we model each AS as a tuple <type, value>, where "type" identifies
+ a type of the path segment the AS belongs to (e.g., AS_SEQUENCE,
+ AS_SET), and "value" is the AS number. Two ASes are said to be
+ the same if their corresponding <type, value> tuples are the same.
+
+ The algorithm to aggregate two AS_PATH attributes works as
+ follows:
+
+ a) Identify the same ASes (as defined above) within each
+ AS_PATH attribute that are in the same relative order within
+ both AS_PATH attributes. Two ASes, X and Y, are said to be
+ in the same order if either:
+
+
+
+Rekhter, et al. Standards Track [Page 96]
+
+RFC 4271 BGP-4 January 2006
+
+
+ - X precedes Y in both AS_PATH attributes, or
+ - Y precedes X in both AS_PATH attributes.
+
+ b) The aggregated AS_PATH attribute consists of ASes identified
+ in (a), in exactly the same order as they appear in the
+ AS_PATH attributes to be aggregated. If two consecutive
+ ASes identified in (a) do not immediately follow each other
+ in both of the AS_PATH attributes to be aggregated, then the
+ intervening ASes (ASes that are between the two consecutive
+ ASes that are the same) in both attributes are combined into
+ an AS_SET path segment that consists of the intervening ASes
+ from both AS_PATH attributes. This segment is then placed
+ between the two consecutive ASes identified in (a) of the
+ aggregated attribute. If two consecutive ASes identified in
+ (a) immediately follow each other in one attribute, but do
+ not follow in another, then the intervening ASes of the
+ latter are combined into an AS_SET path segment. This
+ segment is then placed between the two consecutive ASes
+ identified in (a) of the aggregated attribute.
+
+ c) For each pair of adjacent tuples in the aggregated AS_PATH,
+ if both tuples have the same type, merge them together if
+ doing so will not cause a segment of a length greater than
+ 255 to be generated.
+
+ If, as a result of the above procedure, a given AS number appears
+ more than once within the aggregated AS_PATH attribute, all but
+ the last instance (rightmost occurrence) of that AS number should
+ be removed from the aggregated AS_PATH attribute.
+
+Security Considerations
+
+ A BGP implementation MUST support the authentication mechanism
+ specified in RFC 2385 [RFC2385]. The authentication provided by this
+ mechanism could be done on a per-peer basis.
+
+ BGP makes use of TCP for reliable transport of its traffic between
+ peer routers. To provide connection-oriented integrity and data
+ origin authentication on a point-to-point basis, BGP specifies use of
+ the mechanism defined in RFC 2385. These services are intended to
+ detect and reject active wiretapping attacks against the inter-router
+ TCP connections. Absent the use of mechanisms that effect these
+ security services, attackers can disrupt these TCP connections and/or
+ masquerade as a legitimate peer router. Because the mechanism
+ defined in the RFC does not provide peer-entity authentication, these
+ connections may be subject to some forms of replay attacks that will
+ not be detected at the TCP layer. Such attacks might result in
+ delivery (from TCP) of "broken" or "spoofed" BGP messages.
+
+
+
+Rekhter, et al. Standards Track [Page 97]
+
+RFC 4271 BGP-4 January 2006
+
+
+ The mechanism defined in RFC 2385 augments the normal TCP checksum
+ with a 16-byte message authentication code (MAC) that is computed
+ over the same data as the TCP checksum. This MAC is based on a one-
+ way hash function (MD5) and use of a secret key. The key is shared
+ between peer routers and is used to generate MAC values that are not
+ readily computed by an attacker who does not have access to the key.
+ A compliant implementation must support this mechanism, and must
+ allow a network administrator to activate it on a per-peer basis.
+
+ RFC 2385 does not specify a means of managing (e.g., generating,
+ distributing, and replacing) the keys used to compute the MAC. RFC
+ 3562 [RFC3562] (an informational document) provides some guidance in
+ this area, and provides rationale to support this guidance. It notes
+ that a distinct key should be used for communication with each
+ protected peer. If the same key is used for multiple peers, the
+ offered security services may be degraded, e.g., due to an increased
+ risk of compromise at one router that adversely affects other
+ routers.
+
+ The keys used for MAC computation should be changed periodically, to
+ minimize the impact of a key compromise or successful cryptanalytic
+ attack. RFC 3562 suggests a crypto period (the interval during which
+ a key is employed) of, at most, 90 days. More frequent key changes
+ reduce the likelihood that replay attacks (as described above) will
+ be feasible. However, absent a standard mechanism for effecting such
+ changes in a coordinated fashion between peers, one cannot assume
+ that BGP-4 implementations complying with this RFC will support
+ frequent key changes.
+
+ Obviously, each should key also be chosen to be difficult for an
+ attacker to guess. The techniques specified in RFC 1750 for random
+ number generation provide a guide for generation of values that could
+ be used as keys. RFC 2385 calls for implementations to support keys
+ "composed of a string of printable ASCII of 80 bytes or less." RFC
+ 3562 suggests keys used in this context be 12 to 24 bytes of random
+ (pseudo-random) bits. This is fairly consistent with suggestions for
+ analogous MAC algorithms, which typically employ keys in the range of
+ 16 to 20 bytes. To provide enough random bits at the low end of this
+ range, RFC 3562 also observes that a typical ACSII text string would
+ have to be close to the upper bound for the key length specified in
+ RFC 2385.
+
+ BGP vulnerabilities analysis is discussed in [RFC4272].
+
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 98]
+
+RFC 4271 BGP-4 January 2006
+
+
+IANA Considerations
+
+ All the BGP messages contain an 8-bit message type, for which IANA
+ has created and is maintaining a registry entitled "BGP Message
+ Types". This document defines the following message types:
+
+ Name Value Definition
+ ---- ----- ----------
+ OPEN 1 See Section 4.2
+ UPDATE 2 See Section 4.3
+ NOTIFICATION 3 See Section 4.5
+ KEEPALIVE 4 See Section 4.4
+
+ Future assignments are to be made using either the Standards Action
+ process defined in [RFC2434], or the Early IANA Allocation process
+ defined in [RFC4020]. Assignments consist of a name and the value.
+
+ The BGP UPDATE messages may carry one or more Path Attributes, where
+ each Attribute contains an 8-bit Attribute Type Code. IANA is
+ already maintaining such a registry, entitled "BGP Path Attributes".
+ This document defines the following Path Attributes Type Codes:
+
+ Name Value Definition
+ ---- ----- ----------
+ ORIGIN 1 See Section 5.1.1
+ AS_PATH 2 See Section 5.1.2
+ NEXT_HOP 3 See Section 5.1.3
+ MULTI_EXIT_DISC 4 See Section 5.1.4
+ LOCAL_PREF 5 See Section 5.1.5
+ ATOMIC_AGGREGATE 6 See Section 5.1.6
+ AGGREGATOR 7 See Section 5.1.7
+
+ Future assignments are to be made using either the Standards Action
+ process defined in [RFC2434], or the Early IANA Allocation process
+ defined in [RFC4020]. Assignments consist of a name and the value.
+
+ The BGP NOTIFICATION message carries an 8-bit Error Code, for which
+ IANA has created and is maintaining a registry entitled "BGP Error
+ Codes". This document defines the following Error Codes:
+
+ Name Value Definition
+ ------------ ----- ----------
+ Message Header Error 1 Section 6.1
+ OPEN Message Error 2 Section 6.2
+ UPDATE Message Error 3 Section 6.3
+ Hold Timer Expired 4 Section 6.5
+ Finite State Machine Error 5 Section 6.6
+ Cease 6 Section 6.7
+
+
+
+Rekhter, et al. Standards Track [Page 99]
+
+RFC 4271 BGP-4 January 2006
+
+
+ Future assignments are to be made using either the Standards Action
+ process defined in [RFC2434], or the Early IANA Allocation process
+ defined in [RFC4020]. Assignments consist of a name and the value.
+
+ The BGP NOTIFICATION message carries an 8-bit Error Subcode, where
+ each Subcode has to be defined within the context of a particular
+ Error Code, and thus has to be unique only within that context.
+
+ IANA has created and is maintaining a set of registries, "Error
+ Subcodes", with a separate registry for each BGP Error Code. Future
+ assignments are to be made using either the Standards Action process
+ defined in [RFC2434], or the Early IANA Allocation process defined in
+ [RFC4020]. Assignments consist of a name and the value.
+
+ This document defines the following Message Header Error subcodes:
+
+ Name Value Definition
+ -------------------- ----- ----------
+ Connection Not Synchronized 1 See Section 6.1
+ Bad Message Length 2 See Section 6.1
+ Bad Message Type 3 See Section 6.1
+
+ This document defines the following OPEN Message Error subcodes:
+
+ Name Value Definition
+ -------------------- ----- ----------
+ Unsupported Version Number 1 See Section 6.2
+ Bad Peer AS 2 See Section 6.2
+ Bad BGP Identifier 3 See Section 6.2
+ Unsupported Optional Parameter 4 See Section 6.2
+ [Deprecated] 5 See Appendix A
+ Unacceptable Hold Time 6 See Section 6.2
+
+ This document defines the following UPDATE Message Error subcodes:
+
+ Name Value Definition
+ -------------------- --- ----------
+ Malformed Attribute List 1 See Section 6.3
+ Unrecognized Well-known Attribute 2 See Section 6.3
+ Missing Well-known Attribute 3 See Section 6.3
+ Attribute Flags Error 4 See Section 6.3
+ Attribute Length Error 5 See Section 6.3
+ Invalid ORIGIN Attribute 6 See Section 6.3
+ [Deprecated] 7 See Appendix A
+ Invalid NEXT_HOP Attribute 8 See Section 6.3
+ Optional Attribute Error 9 See Section 6.3
+ Invalid Network Field 10 See Section 6.3
+ Malformed AS_PATH 11 See Section 6.3
+
+
+
+Rekhter, et al. Standards Track [Page 100]
+
+RFC 4271 BGP-4 January 2006
+
+
+Normative References
+
+ [RFC791] Postel, J., "Internet Protocol", STD 5, RFC 791, September
+ 1981.
+
+ [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC
+ 793, September 1981.
+
+ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
+ Requirement Levels", BCP 14, RFC 2119, March 1997.
+
+ [RFC2385] Heffernan, A., "Protection of BGP Sessions via the TCP MD5
+ Signature Option", RFC 2385, August 1998.
+
+ [RFC2434] Narten, T. and H. Alvestrand, "Guidelines for Writing an
+ IANA Considerations Section in RFCs", BCP 26, RFC 2434,
+ October 1998.
+
+Informative References
+
+ [RFC904] Mills, D., "Exterior Gateway Protocol formal
+ specification", RFC 904, April 1984.
+
+ [RFC1092] Rekhter, J., "EGP and policy based routing in the new
+ NSFNET backbone", RFC 1092, February 1989.
+
+ [RFC1093] Braun, H., "NSFNET routing architecture", RFC 1093,
+ February 1989.
+
+ [RFC1105] Lougheed, K. and Y. Rekhter, "Border Gateway Protocol
+ (BGP)", RFC 1105, June 1989.
+
+ [RFC1163] Lougheed, K. and Y. Rekhter, "Border Gateway Protocol
+ (BGP)", RFC 1163, June 1990.
+
+ [RFC1267] Lougheed, K. and Y. Rekhter, "Border Gateway Protocol 3
+ (BGP-3)", RFC 1267, October 1991.
+
+ [RFC1771] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-
+ 4)", RFC 1771, March 1995.
+
+ [RFC1772] Rekhter, Y. and P. Gross, "Application of the Border
+ Gateway Protocol in the Internet", RFC 1772, March 1995.
+
+ [RFC1518] Rekhter, Y. and T. Li, "An Architecture for IP Address
+ Allocation with CIDR", RFC 1518, September 1993.
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 101]
+
+RFC 4271 BGP-4 January 2006
+
+
+ [RFC1519] Fuller, V., Li, T., Yu, J., and K. Varadhan, "Classless
+ Inter-Domain Routing (CIDR): an Address Assignment and
+ Aggregation Strategy", RFC 1519, September 1993.
+
+ [RFC1930] Hawkinson, J. and T. Bates, "Guidelines for creation,
+ selection, and registration of an Autonomous System (AS)",
+ BCP 6, RFC 1930, March 1996.
+
+ [RFC1997] Chandra, R., Traina, P., and T. Li, "BGP Communities
+ Attribute", RFC 1997, August 1996.
+
+ [RFC2439] Villamizar, C., Chandra, R., and R. Govindan, "BGP Route
+ Flap Damping", RFC 2439, November 1998.
+
+ [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black,
+ "Definition of the Differentiated Services Field (DS Field)
+ in the IPv4 and IPv6 Headers", RFC 2474, December 1998.
+
+ [RFC2796] Bates, T., Chandra, R., and E. Chen, "BGP Route Reflection
+ - An Alternative to Full Mesh IBGP", RFC 2796, April 2000.
+
+ [RFC2858] Bates, T., Rekhter, Y., Chandra, R., and D. Katz,
+ "Multiprotocol Extensions for BGP-4", RFC 2858, June 2000.
+
+ [RFC3392] Chandra, R. and J. Scudder, "Capabilities Advertisement
+ with BGP-4", RFC 3392, November 2002.
+
+ [RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC 2918,
+ September 2000.
+
+ [RFC3065] Traina, P., McPherson, D., and J. Scudder, "Autonomous
+ System Confederations for BGP", RFC 3065, February 2001.
+
+ [RFC3562] Leech, M., "Key Management Considerations for the TCP MD5
+ Signature Option", RFC 3562, July 2003.
+
+ [IS10747] "Information Processing Systems - Telecommunications and
+ Information Exchange between Systems - Protocol for
+ Exchange of Inter-domain Routeing Information among
+ Intermediate Systems to Support Forwarding of ISO 8473
+ PDUs", ISO/IEC IS10747, 1993.
+
+ [RFC4272] Murphy, S., "BGP Security Vulnerabilities Analysis", RFC
+ 4272, January 2006
+
+ [RFC4020] Kompella, K. and A. Zinin, "Early IANA Allocation of
+ Standards Track Code Points", BCP 100, RFC 4020, February
+ 2005.
+
+
+
+Rekhter, et al. Standards Track [Page 102]
+
+RFC 4271 BGP-4 January 2006
+
+
+Editors' Addresses
+
+ Yakov Rekhter
+ Juniper Networks
+
+ EMail: yakov@juniper.net
+
+
+ Tony Li
+
+ EMail: tony.li@tony.li
+
+
+ Susan Hares
+ NextHop Technologies, Inc.
+ 825 Victors Way
+ Ann Arbor, MI 48108
+
+ Phone: (734)222-1610
+ EMail: skh@nexthop.com
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 103]
+
+RFC 4271 BGP-4 January 2006
+
+
+Full Copyright Statement
+
+ Copyright (C) The Internet Society (2006).
+
+ This document is subject to the rights, licenses and restrictions
+ contained in BCP 78, and except as set forth therein, the authors
+ retain all their rights.
+
+ This document and the information contained herein are provided on an
+ "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
+ OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
+ ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
+ INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
+ INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
+ WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+
+Intellectual Property
+
+ The IETF takes no position regarding the validity or scope of any
+ Intellectual Property Rights or other rights that might be claimed to
+ pertain to the implementation or use of the technology described in
+ this document or the extent to which any license under such rights
+ might or might not be available; nor does it represent that it has
+ made any independent effort to identify any such rights. Information
+ on the procedures with respect to rights in RFC documents can be
+ found in BCP 78 and BCP 79.
+
+ Copies of IPR disclosures made to the IETF Secretariat and any
+ assurances of licenses to be made available, or the result of an
+ attempt made to obtain a general license or permission for the use of
+ such proprietary rights by implementers or users of this
+ specification can be obtained from the IETF on-line IPR repository at
+ http://www.ietf.org/ipr.
+
+ The IETF invites any interested party to bring to its attention any
+ copyrights, patents or patent applications, or other proprietary
+ rights that may cover technology that may be required to implement
+ this standard. Please address the information to the IETF at
+ ietf-ipr@ietf.org.
+
+Acknowledgement
+
+ Funding for the RFC Editor function is provided by the IETF
+ Administrative Support Activity (IASA).
+
+
+
+
+
+
+
+Rekhter, et al. Standards Track [Page 104]
+