draft-ietf-idr-bgp4-17.txt | draft-ietf-idr-bgp4-18.txt | |||
---|---|---|---|---|
Network Working Group Y. Rekhter | Network Working Group Y. Rekhter | |||
INTERNET DRAFT Juniper Networks | INTERNET DRAFT Juniper Networks | |||
T. Li | T. Li | |||
Procket Networks, Inc. | Procket Networks, Inc. | |||
S. Hares | ||||
NextHop Technologies, Inc. | ||||
Editors | Editors | |||
A Border Gateway Protocol 4 (BGP-4) | A Border Gateway Protocol 4 (BGP-4) | |||
<draft-ietf-idr-bgp4-17.txt> | <draft-ietf-idr-bgp4-18.txt> | |||
Status of this Memo | Status of this Memo | |||
This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
Drafts. | Drafts. | |||
skipping to change at page 1, line 33 | skipping to change at page 1, line 35 | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as ``work in progress.'' | material or to cite them other than as ``work in progress.'' | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt | http://www.ietf.org/ietf/1id-abstracts.txt | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
1. Acknowledgments | Specification of Requirements | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | ||||
document are to be interpreted as described in RFC2119 [RFC2119]. | ||||
Table of Contents | ||||
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | ||||
1. Definition of commonly used terms . . . . . . . . . . . . . . 4 | ||||
2. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 6 | ||||
3. Summary of Operation . . . . . . . . . . . . . . . . . . . . . 7 | ||||
3.1 Routes: Advertisement and Storage . . . . . . . . . . . . . . 9 | ||||
3.2 Routing Information Bases . . . . . . . . . . . . . . . . . . 10 | ||||
4. Message Formats . . . . . . . . . . . . . . . . . . . . . . . 11 | ||||
4.1 Message Header Format . . . . . . . . . . . . . . . . . . . . 11 | ||||
4.2 OPEN Message Format . . . . . . . . . . . . . . . . . . . . . 12 | ||||
4.3 UPDATE Message Format . . . . . . . . . . . . . . . . . . . . 14 | ||||
4.4 KEEPALIVE Message Format . . . . . . . . . . . . . . . . . . 21 | ||||
4.5 NOTIFICATION Message Format . . . . . . . . . . . . . . . . . 21 | ||||
5. Path Attributes . . . . . . . . . . . . . . . . . . . . . . . 23 | ||||
5.1 Path Attribute Usage . . . . . . . . . . . . . . . . . . . . 25 | ||||
5.1.1 ORIGIN . . . . . . . . . . . . . . . . . . . . . . . . . . 25 | ||||
5.1.2 AS_PATH . . . . . . . . . . . . . . . . . . . . . . . . . . 25 | ||||
5.1.3 NEXT_HOP . . . . . . . . . . . . . . . . . . . . . . . . . 26 | ||||
5.1.4 MULTI_EXIT_DISC . . . . . . . . . . . . . . . . . . . . . . 28 | ||||
5.1.5 LOCAL_PREF . . . . . . . . . . . . . . . . . . . . . . . . 28 | ||||
5.1.6 ATOMIC_AGGREGATE . . . . . . . . . . . . . . . . . . . . . 29 | ||||
5.1.7 AGGREGATOR . . . . . . . . . . . . . . . . . . . . . . . . 30 | ||||
6. BGP Error Handling . . . . . . . . . . . . . . . . . . . . . . 30 | ||||
6.1 Message Header error handling . . . . . . . . . . . . . . . . 30 | ||||
6.2 OPEN message error handling . . . . . . . . . . . . . . . . . 31 | ||||
6.3 UPDATE message error handling . . . . . . . . . . . . . . . . 32 | ||||
6.4 NOTIFICATION message error handling . . . . . . . . . . . . . 34 | ||||
6.5 Hold Timer Expired error handling . . . . . . . . . . . . . . 34 | ||||
6.6 Finite State Machine error handling . . . . . . . . . . . . . 34 | ||||
6.7 Cease . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 | ||||
6.8 BGP connection collision detection . . . . . . . . . . . . . 35 | ||||
7. BGP Version Negotiation . . . . . . . . . . . . . . . . . . . 36 | ||||
8. BGP Finite State Machine . . . . . . . . . . . . . . . . . . . 36 | ||||
8.1 Events for the BGP FSM . . . . . . . . . . . . . . . . . . . 37 | ||||
8.1.1 Administrative Events . . . . . . . . . . . . . . . . . . 37 | ||||
8.1.2 Timer Events . . . . . . . . . . . . . . . . . . . . . . . 38 | ||||
8.1.3 TCP connection based Events . . . . . . . . . . . . . . . . 39 | ||||
8.1.4 BGP Messages based Events . . . . . . . . . . . . . . . . . 41 | ||||
8.2 Description of FSM . . . . . . . . . . . . . . . . . . . . . 43 | ||||
8.2.1 FSM Definition . . . . . . . . . . . . . . . . . . . . . . 43 | ||||
8.2.1.1 Terms "active" and "passive" . . . . . . . . . . . . . . 43 | ||||
8.2.1.2 FSM and collision detection . . . . . . . . . . . . . . . 44 | ||||
8.2.2 Finite State Machine . . . . . . . . . . . . . . . . . . . 44 | ||||
9. UPDATE Message Handling . . . . . . . . . . . . . . . . . . . 57 | ||||
9.1 Decision Process . . . . . . . . . . . . . . . . . . . . . . 58 | ||||
9.1.1 Phase 1: Calculation of Degree of Preference . . . . . . . 59 | ||||
9.1.2 Phase 2: Route Selection . . . . . . . . . . . . . . . . . 60 | ||||
9.1.2.1 Route Resolvability Condition . . . . . . . . . . . . . . 61 | ||||
9.1.2.2 Breaking Ties (Phase 2) . . . . . . . . . . . . . . . . . 62 | ||||
9.1.3 Phase 3: Route Dissemination . . . . . . . . . . . . . . . 64 | ||||
9.1.4 Overlapping Routes . . . . . . . . . . . . . . . . . . . . 65 | ||||
9.2 Update-Send Process . . . . . . . . . . . . . . . . . . . . . 66 | ||||
9.2.1 Controlling Routing Traffic Overhead . . . . . . . . . . . 67 | ||||
9.2.1.1 Frequency of Route Advertisement . . . . . . . . . . . . 67 | ||||
9.2.1.2 Frequency of Route Origination . . . . . . . . . . . . . 68 | ||||
9.2.2 Efficient Organization of Routing Information . . . . . . . 68 | ||||
9.2.2.1 Information Reduction . . . . . . . . . . . . . . . . . . 68 | ||||
9.2.2.2 Aggregating Routing Information . . . . . . . . . . . . . 69 | ||||
9.3 Route Selection Criteria . . . . . . . . . . . . . . . . . . 72 | ||||
9.4 Originating BGP routes . . . . . . . . . . . . . . . . . . . 72 | ||||
10. BGP Timers . . . . . . . . . . . . . . . . . . . . . . . . . 72 | ||||
Appendix A. Comparison with RFC 1771 . . . . . . . . . . . . . . 73 | ||||
Appendix B. Comparison with RFC 1267 . . . . . . . . . . . . . . 74 | ||||
Appendix C. Comparison with RFC 1163 . . . . . . . . . . . . . . 75 | ||||
Appendix D. Comparison with RFC 1105 . . . . . . . . . . . . . . 75 | ||||
Appendix E. TCP options that may be used with BGP . . . . . . . . 76 | ||||
Appendix F. Implementation Recommendations . . . . . . . . . . . 76 | ||||
Appendix F.1 Multiple Networks Per Message . . . . . . . . . . . 76 | ||||
Appendix F.2 Reducing route flapping . . . . . . . . . . . . . . 77 | ||||
Appendix F.3 Path attribute ordering . . . . . . . . . . . . . . 77 | ||||
Appendix F.4 AS_SET sorting . . . . . . . . . . . . . . . . . . . 77 | ||||
Appendix F.5 Control over version negotiation . . . . . . . . . . 78 | ||||
Appendix F.6 Complex AS_PATH aggregation . . . . . . . . . . . . 78 | ||||
Security Considerations . . . . . . . . . . . . . . . . . . . . . 79 | ||||
References . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 | ||||
Authors Information . . . . . . . . . . . . . . . . . . . . . . . 80 | ||||
Abstract | ||||
The Border Gateway Protocol (BGP) is an inter-Autonomous System rout- | ||||
ing protocol. | ||||
The primary function of a BGP speaking system is to exchange network | ||||
reachability information with other BGP systems. This network reacha- | ||||
bility information includes information on the list of Autonomous | ||||
Systems (ASs) that reachability information traverses. This informa- | ||||
tion is sufficient to construct a graph of AS connectivity from which | ||||
routing loops may be pruned and some policy decisions at the AS level | ||||
may be enforced. | ||||
BGP-4 provides a set of mechanisms for supporting Classless Inter- | ||||
Domain Routing (CIDR) [RFC1518, RFC1519]. These mechanisms include | ||||
support for advertising a set of destinations as an IP prefix and | ||||
eliminating the concept of network "class" within BGP. BGP-4 also | ||||
introduces mechanisms which allow aggregation of routes, including | ||||
aggregation of AS paths. | ||||
Routing information exchanged via BGP supports only the destination- | ||||
based forwarding paradigm, which assumes that a router forwards a | ||||
packet based solely on the destination address carried in the IP | ||||
header of the packet. This, in turn, reflects the set of policy deci- | ||||
sions that can (and can not) be enforced using BGP. BGP can support | ||||
only the policies conforming to the destination-based forwarding | ||||
paradigm. | ||||
1. Definition of commonly used terms | ||||
This section provides definitions for terms that have a specific | ||||
meaning in BGP and that are used throughout the text. | ||||
Autonomous System (AS) | ||||
The classic definition of an Autonomous System is a set of routers | ||||
under a single technical administration, using an interior gateway | ||||
protocol (IGP) and common metrics to determine how to route pack- | ||||
ets within the AS, and using an inter-AS routing protocol to | ||||
determine how to route packets to other ASs. Since this classic | ||||
definition was developed, it has become common for a single AS to | ||||
use several IGPs and sometimes several sets of metrics within an | ||||
AS. The use of the term Autonomous System here stresses the fact | ||||
that, even when multiple IGPs and metrics are used, the adminis- | ||||
tration of an AS appears to other ASs to have a single coherent | ||||
interior routing plan and presents a consistent picture of what | ||||
destinations are reachable through it. | ||||
BGP speaker | ||||
A router that implements BGP. | ||||
BGP Identifier | ||||
A 4-octet unsigned integer indicating the BGP Identifier of the | ||||
sender of BGP messages. A given BGP speaker sets the value of its | ||||
BGP Identifier to an IP address assigned to that BGP speaker. The | ||||
value of the BGP Identifier is determined on startup and is the | ||||
same for every local interface and every BGP peer. | ||||
Internal peer | ||||
Peer that is in the same Autonomous System as the local system. | ||||
IBGP | ||||
Internal BGP (BGP connection between internal peers). | ||||
External peer | ||||
Peer that is in a different Autonomous System than the local sys- | ||||
tem. | ||||
EBGP | ||||
External BGP (BGP connection between external peers). | ||||
NLRI | ||||
Network Layer Reachability Information. | ||||
Route | ||||
A unit of information that pairs a set of destinations with the | ||||
attributes of a path to those destinations. The set of destina- | ||||
tions are systems whose IP addresses are contained in one IP | ||||
address prefix carried in the Network Layer Reachability Informa- | ||||
tion (NLRI) field of an UPDATE message. The path is the informa- | ||||
tion reported in the path attributes field of the same UPDATE mes- | ||||
sage. | ||||
RIB | ||||
Routing Information Base. | ||||
Adj-RIB-In | ||||
The Adj-RIBs-In contain unprocessed routing information that has | ||||
been advertised to the local BGP speaker by its peers. | ||||
Loc-RIB | ||||
The Loc-RIB contains the routes that have been selected by the | ||||
local BGP speaker's Decision Process. | ||||
Adj-RIB-Out | ||||
The Adj-RIBs-Out contain the routes for advertisement to specific | ||||
peers by means of the local speaker's UPDATE messages. | ||||
IGP | ||||
Interior Gateway Protocol - a routing protocol used to exchange | ||||
routing information among routers within a single Autonomous Sys- | ||||
tem. | ||||
Feasible route | ||||
A route that is available for use. | ||||
Unfeasible route | ||||
A previously advertised feasible route that is no longer available | ||||
for use. | ||||
2. Acknowledgments | ||||
This document was originally published as RFC 1267 in October 1991, | This document was originally published as RFC 1267 in October 1991, | |||
jointly authored by Kirk Lougheed and Yakov Rekhter. | jointly authored by Kirk Lougheed and Yakov Rekhter. | |||
We would like to express our thanks to Guy Almes, Len Bosack, and | We would like to express our thanks to Guy Almes, Len Bosack, and | |||
Jeffrey C. Honig for their contributions to the earlier version of | Jeffrey C. Honig for their contributions to the earlier version | |||
this document. | (BGP-1) of this document. | |||
We would like to specially acknowledge numerous contributions by Den- | ||||
nis Ferguson to the earlier version of this document. | ||||
We would like to explicitly thank Bob Braden for the review of the earlier | We would like to explicitly thank Bob Braden for the review of the earlier | |||
version of this document as well as his constructive and valuable | version (BGP-2) of this document as well as his constructive and | |||
comments. | valuable comments. | |||
We would also like to thank Bob Hinden, Director for Routing of the | We would also like to thank Bob Hinden, Director for Routing of the | |||
Internet Engineering Steering Group, and the team of reviewers he | Internet Engineering Steering Group, and the team of reviewers he | |||
assembled to review the earlier version (BGP-2) of this document. | assembled to review the earlier version (BGP-2) of this document. | |||
This team, consisting of Deborah Estrin, Milo Medin, John Moy, Radia | This team, consisting of Deborah Estrin, Milo Medin, John Moy, Radia | |||
Perlman, Martha Steenstrup, Mike St. Johns, and Paul Tsuchiya, acted | Perlman, Martha Steenstrup, Mike St. Johns, and Paul Tsuchiya, acted | |||
with a strong combination of toughness, professionalism, and | with a strong combination of toughness, professionalism, and cour- | |||
courtesy. | tesy. | |||
This updated version of the document is the product of the IETF IDR | Certain sections of the document borrowed heavily from IDRP | |||
Working Group with Yakov Rekhter and Tony Li as editors. Certain | [IS10747], which is the OSI counterpart of BGP. For this credit | |||
sections of the document borrowed heavily from IDRP [7], which is the | should be given to the ANSI X3S3.3 group chaired by Lyman Chapin and | |||
OSI counterpart of BGP. For this credit should be given to the ANSI | to Charles Kunzinger who was the IDRP editor within that group. | |||
X3S3.3 group chaired by Lyman Chapin and to Charles Kunzinger who was | ||||
the IDRP editor within that group. We would also like to thank Enke | We would also like to thank Benjamin Abarbanel, Enke Chen, Edward | |||
Chen, Edward Crabbe, Mike Craren, Vincent Gillet, Eric Gray, Jeffrey | Crabbe, Mike Craren, Vincent Gillet, Eric Gray, Jeffrey Haas, Dimitry | |||
Haas, Dimitry Haskin, John Krawczyk, David LeRoy, Dan Massey, Dan | Haskin, John Krawczyk, David LeRoy, Dan Massey, Jonathan Natale, Dan | |||
Pei, Mathew Richardson, John Scudder, John Stewart III, Dave Thaler, | Pei, Mathew Richardson, John Scudder, John Stewart III, Dave Thaler, | |||
Paul Traina, Russ White, Curtis Villamizar, and Alex Zinin for their | Paul Traina, Russ White, Curtis Villamizar, and Alex Zinin for their | |||
comments. | comments. | |||
Many thanks to Sue Hares for her contributions to the document, and | We would like to specially acknowledge Andrew Lange for his help in | |||
especially for her work on the BGP Finite State Machine. | preparing the final version of this document. | |||
We would like to specially acknowledge numerous contributions by | Finally, we would like to thank all the members of the IDR Working | |||
Dennis Ferguson. | Group for the ideas and support they have given to this document. | |||
2. Introduction | 3. Summary of Operation | |||
The Border Gateway Protocol (BGP) is an inter-Autonomous System | The Border Gateway Protocol (BGP) is an inter-Autonomous System rout- | |||
routing protocol. It is built on experience gained with EGP as | ing protocol. It is built on experience gained with EGP as defined in | |||
defined in RFC 904 [1] and EGP usage in the NSFNET Backbone as | [RFC904] and EGP usage in the NSFNET Backbone as described in | |||
described in RFC 1092 [2] and RFC 1093 [3]. | [RFC1092] and [RFC1093]. | |||
The primary function of a BGP speaking system is to exchange network | The primary function of a BGP speaking system is to exchange network | |||
reachability information with other BGP systems. This network | reachability information with other BGP systems. This network reacha- | |||
reachability information includes information on the list of | bility information includes information on the list of Autonomous | |||
Autonomous Systems (ASs) that reachability information traverses. | Systems (ASs) that reachability information traverses. This informa- | |||
This information is sufficient to construct a graph of AS | tion is sufficient to construct a graph of AS connectivity from which | |||
connectivity from which routing loops may be pruned and some policy | routing loops may be pruned and some policy decisions at the AS level | |||
decisions at the AS level may be enforced. | may be enforced. | |||
BGP-4 provides a new set of mechanisms for supporting Classless | In the context of this document we assume that a BGP speaker adver- | |||
Inter-Domain Routing (CIDR) [8, 9]. These mechanisms include support | tises to its peers only those routes that it itself uses (in this | |||
for advertising an IP prefix and eliminates the concept of network | context a BGP speaker is said to "use" a BGP route if it is the most | |||
"class" within BGP. BGP-4 also introduces mechanisms which allow | preferred BGP route and is used in forwarding). All other cases are | |||
aggregation of routes, including aggregation of AS paths. | outside the scope of this document. | |||
To characterize the set of policy decisions that can be enforced | Routing information exchanged via BGP supports only the destination- | |||
using BGP, one must focus on the rule that a BGP speaker advertises | based forwarding paradigm, which assumes that a router forwards a | |||
to its peers (other BGP speakers which it communicates with) in | packet based solely on the destination address carried in the IP | |||
neighboring ASs only those routes that it itself uses. This rule | header of the packet. This, in turn, reflects the set of policy deci- | |||
reflects the "hop-by-hop" routing paradigm generally used throughout | sions that can (and can not) be enforced using BGP. Note that some | |||
the current Internet. Note that some policies cannot be supported by | policies can not be supported by the destination-based forwarding | |||
the "hop-by-hop" routing paradigm and thus require techniques such as | paradigm, and thus require techniques such as source routing (aka | |||
source routing (aka explicit routing) to enforce. For example, BGP | explicit routing) to be enforced. Such policies can not be enforced | |||
does not enable one AS to send traffic to a neighboring AS intending | using BGP either. For example, BGP does not enable one AS to send | |||
that the traffic take a different route from that taken by traffic | traffic to a neighboring AS for forwarding to some destination | |||
originating in the neighboring AS. On the other hand, BGP can support | (reachable through but) beyond that neighboring AS intending that the | |||
any policy conforming to the "hop-by-hop" routing paradigm. Since the | traffic take a different route to that taken by the traffic originat- | |||
current Internet uses only the "hop-by-hop" inter-AS routing paradigm | ing in the neighboring AS (for that same destination). On the other | |||
and since BGP can support any policy that conforms to that paradigm, | hand, BGP can support any policy conforming to the destination-based | |||
BGP is highly applicable as an inter-AS routing protocol for the | forwarding paradigm. | |||
current Internet. | ||||
A more complete discussion of what policies can and cannot be | A more complete discussion of what policies can and cannot be | |||
enforced with BGP is outside the scope of this document (but refer to | enforced with BGP is outside the scope of this document (but refer to | |||
the companion document discussing BGP usage [5]). | the companion document discussing BGP usage [RFC1772]). | |||
BGP runs over a reliable transport protocol. This eliminates the need | ||||
to implement explicit update fragmentation, retransmission, | ||||
acknowledgment, and sequencing. Any authentication scheme used by the | ||||
transport protocol (e.g., RFC2385 [10]) may be used in addition to | ||||
BGP's own authentication mechanisms. The error notification mechanism | ||||
used in BGP assumes that the transport protocol supports a "graceful" | ||||
close, i.e., that all outstanding data will be delivered before the | ||||
connection is closed. | ||||
BGP uses TCP [4] as its transport protocol. TCP meets BGP's transport | BGP-4 provides a new set of mechanisms for supporting Classless | |||
requirements and is present in virtually all commercial routers and | Inter-Domain Routing (CIDR) [RFC1518, RFC1519]. These mechanisms | |||
hosts. In the following descriptions the phrase "transport protocol | include support for advertising a set of destinations as an IP prefix | |||
connection" can be understood to refer to a TCP connection. BGP uses | and eliminating the concept of network "class" within BGP. BGP-4 | |||
TCP port 179 for establishing its connections. | also introduces mechanisms which allow aggregation of routes, includ- | |||
ing aggregation of AS paths. | ||||
This document uses the term `Autonomous System' (AS) throughout. The | This document uses the term `Autonomous System' (AS) throughout. The | |||
classic definition of an Autonomous System is a set of routers under | classic definition of an Autonomous System is a set of routers under | |||
a single technical administration, using an interior gateway protocol | a single technical administration, using an interior gateway protocol | |||
and common metrics to determine how to route packets within the AS, | (IGP) and common metrics to determine how to route packets within the | |||
and using an exterior gateway protocol to determine how to route | AS, and using an inter-AS routing protocol to determine how to route | |||
packets to other ASs. Since this classic definition was developed, it | packets to other ASs. Since this classic definition was developed, it | |||
has become common for a single AS to use several interior gateway | has become common for a single AS to use several IGPs and sometimes | |||
protocols and sometimes several sets of metrics within an AS. The use | several sets of metrics within an AS. The use of the term Autonomous | |||
of the term Autonomous System here stresses the fact that, even when | System here stresses the fact that, even when multiple IGPs and met- | |||
multiple IGPs and metrics are used, the administration of an AS | rics are used, the administration of an AS appears to other ASs to | |||
appears to other ASs to have a single coherent interior routing plan | have a single coherent interior routing plan and presents a consis- | |||
and presents a consistent picture of what destinations are reachable | tent picture of what destinations are reachable through it. | |||
through it. | ||||
The planned use of BGP in the Internet environment, including such | The planned use of BGP in the Internet environment, including such | |||
issues as topology, the interaction between BGP and IGPs, and the | issues as topology, the interaction between BGP and IGPs, and the | |||
enforcement of routing policy rules is presented in a companion | enforcement of routing policy rules is presented in a companion docu- | |||
document [5]. This document is the first of a series of documents | ment [RFC1772]. This document is the first of a series of documents | |||
planned to explore various aspects of BGP application. | planned to explore various aspects of BGP application. | |||
3. Summary of Operation | BGP uses TCP [RFC793] as its transport protocol. This eliminates the | |||
need to implement explicit update fragmentation, retransmission, | ||||
acknowledgment, and sequencing. BGP listens on TCP port 179. Any | ||||
authentication scheme used by TCP (e.g., [RFC2385]) may be | ||||
used. The error notification mechanism used in BGP assumes that TCP | ||||
supports a "graceful" close, i.e., that all outstanding data will be | ||||
delivered before the connection is closed. | ||||
Two systems form a transport protocol connection between one another. | Two systems form a TCP connection between one another. They exchange | |||
They exchange messages to open and confirm the connection parameters. | messages to open and confirm the connection parameters. | |||
The initial data flow is the portion of the BGP routing table that is | The initial data flow is the portion of the BGP routing table that is | |||
allowed by the export policy, called the Adj-RIBs-Out (see 3.2). | allowed by the export policy, called the Adj-RIBs-Out (see 3.2). | |||
Incremental updates are sent as the routing tables change. BGP does | Incremental updates are sent as the routing tables change. BGP does | |||
not require periodic refresh of the routing table. Therefore, a BGP | not require periodic refresh of the routing table. To allow local | |||
speaker must retain the current version of the routes advertised by | policy changes to have the correct effect without resetting any BGP | |||
all of its peers for the duration of the connection. If the | connections, a BGP speaker SHOULD either (a) retain the current ver- | |||
implementation decides to not store the routes that have been | sion of the routes advertised to it by all of its peers for the dura- | |||
received from a peer, but have been filtered out according to | tion of the connection, or (b) make use of the Route Refresh | |||
configured local policy, the BGP Route Refresh extension [12] may be | extension [RFC2918]. | |||
used to request the full set of routes from a peer without resetting | ||||
the BGP session when the local policy configuration changes. | ||||
KEEPALIVE messages may be sent periodically to ensure the liveness of | KEEPALIVE messages may be sent periodically to ensure the liveness of | |||
the connection. NOTIFICATION messages are sent in response to errors | the connection. NOTIFICATION messages are sent in response to errors | |||
or special conditions. If a connection encounters an error condition, | or special conditions. If a connection encounters an error condition, | |||
a NOTIFICATION message is sent and the connection is closed. | a NOTIFICATION message is sent and the connection is closed. | |||
The hosts executing the Border Gateway Protocol need not be routers. | The hosts executing BGP need not be routers. A non-routing host | |||
A non-routing host could exchange routing information with routers | could exchange routing information with routers via EGP [RFC904] or | |||
via EGP or even an interior routing protocol. That non-routing host | even an interior routing protocol. That non-routing host could then | |||
could then use BGP to exchange routing information with a border | use BGP to exchange routing information with a border router in | |||
router in another Autonomous System. The implications and | another Autonomous System. The implications and applications of this | |||
applications of this architecture are for further study. | architecture are for further study. | |||
Connections between BGP speakers of different ASs are referred to as | A peer in a different AS is referred to as an external peer, while a | |||
"external" links. BGP connections between BGP speakers within the | peer in the same AS may be described as an internal peer. Internal | |||
same AS are referred to as "internal" links. Similarly, a peer in a | BGP and external BGP are commonly abbreviated IBGP and EBGP. | |||
different AS is referred to as an external peer, while a peer in the | ||||
same AS may be described as an internal peer. Internal BGP and | ||||
external BGP are commonly abbreviated IBGP and EBGP. | ||||
If a particular AS has multiple BGP speakers and is providing transit | If a particular AS has multiple BGP speakers and is providing transit | |||
service for other ASs, then care must be taken to ensure a consistent | service for other ASs, then care must be taken to ensure a consistent | |||
view of routing within the AS. A consistent view of the interior | view of routing within the AS. A consistent view of the interior | |||
routes of the AS is provided by the interior routing protocol. A | routes of the AS is provided by the IGP used within the AS. For the | |||
consistent view of the routes exterior to the AS can be provided by | purpose of this document, it is assumed that a consistent view of the | |||
having all BGP speakers within the AS maintain direct IBGP | routes exterior to the AS is provided by having all BGP speakers | |||
connections with each other. Alternately the interior routing | within the AS maintain IBGP with each other. Care must be taken to | |||
protocol can pass BGP information among routers within an AS, taking | ensure that the interior routers have all been updated with transit | |||
care not to lose BGP attributes that will be needed by EBGP speakers | information before the BGP speakers announce to other ASs that tran- | |||
if transit connectivity is being provided. For the purpose of | sit service is being provided. | |||
discussion, it is assumed that BGP information is passed within an AS | ||||
using IBGP. Care must be taken to ensure that the interior routers | ||||
have all been updated with transit information before the EBGP | ||||
speakers announce to other ASs that transit service is being | ||||
provided. | ||||
3.1 Routes: Advertisement and Storage | 3.1 Routes: Advertisement and Storage | |||
For the purpose of this protocol, a route is defined as a unit of | For the purpose of this protocol, a route is defined as a unit of | |||
information that pairs a set of destinations with the attributes of a | information that pairs a set of destinations with the attributes of a | |||
path to those destinations. The set of destinations are the systems | path to those destinations. The set of destinations are systems whose | |||
whose IP addresses are reported in the Network Layer Reachability | IP addresses are contained in one IP address prefix carried in the | |||
Information (NLRI) field and the path is the information reported in | Network Layer Reachability Information (NLRI) field of an UPDATE mes- | |||
the path attributes field of the same UPDATE message. | sage, and the path is the information reported in the path attributes | |||
field of the same UPDATE message. | ||||
Routes are advertised between BGP speakers in UPDATE messages. | Routes are advertised between BGP speakers in UPDATE messages. Mul- | |||
tiple routes that have the same path attributes can be advertised in | ||||
a single UPDATE message by including multiple prefixes in the NLRI | ||||
field of the UPDATE message. | ||||
Routes are stored in the Routing Information Bases (RIBs): namely, | Routes are stored in the Routing Information Bases (RIBs): namely, | |||
the Adj-RIBs-In, the Loc-RIB, and the Adj-RIBs-Out. Routes that will | the Adj-RIBs-In, the Loc-RIB, and the Adj-RIBs-Out, as described in | |||
be advertised to other BGP speakers must be present in the Adj-RIB- | Section 3.2. | |||
Out. Routes that will be used by the local BGP speaker must be | ||||
present in the Loc-RIB, and the next hop for each of these routes | ||||
must be resolvable via the local BGP speaker's Routing Table. Routes | ||||
that are received from other BGP speakers are present in the Adj- | ||||
RIBs-In. | ||||
If a BGP speaker chooses to advertise the route, it may add to or | If a BGP speaker chooses to advertise the route, it may add to or | |||
modify the path attributes of the route before advertising it to a | modify the path attributes of the route before advertising it to a | |||
peer. | peer. | |||
BGP provides mechanisms by which a BGP speaker can inform its peer | BGP provides mechanisms by which a BGP speaker can inform its peer | |||
that a previously advertised route is no longer available for use. | that a previously advertised route is no longer available for use. | |||
There are three methods by which a given BGP speaker can indicate | There are three methods by which a given BGP speaker can indicate | |||
that a route has been withdrawn from service: | that a route has been withdrawn from service: | |||
a) the IP prefix that expresses the destination for a previously | a) the IP prefix that expresses the destination for a previously | |||
advertised route can be advertised in the WITHDRAWN ROUTES field | advertised route can be advertised in the WITHDRAWN ROUTES field | |||
in the UPDATE message, thus marking the associated route as being | in the UPDATE message, thus marking the associated route as being | |||
no longer available for use | no longer available for use | |||
b) a replacement route with the same NLRI can be advertised, or | b) a replacement route with the same NLRI can be advertised, or | |||
c) the BGP speaker - BGP speaker connection can be closed, which | c) the BGP speaker - BGP speaker connection can be closed, which | |||
implicitly removes from service all routes which the pair of | implicitly removes from service all routes which the pair of | |||
speakers had advertised to each other. | speakers had advertised to each other. | |||
Changing the attributes of a route is accomplished by advertising a | ||||
replacement route. The replacement route carries new (changed) | ||||
attributes and has the same NLRI as the original route. | ||||
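The three withdrawal methods above can be sketched as operations on a per-peer route store. This is a minimal illustration, not part of the protocol specification: the class name `AdjRibIn`, its method names, and the `(address, prefix length)` tuple used as a key are all assumptions made for this example.

```python
from typing import Dict, Tuple

# Hypothetical representations chosen for this sketch only.
Prefix = Tuple[str, int]      # (network address, prefix length in bits)
Attrs = Dict[str, object]     # path attributes of a route


class AdjRibIn:
    """Routes learned from one peer, keyed by destination prefix."""

    def __init__(self) -> None:
        self.routes: Dict[Prefix, Attrs] = {}

    def advertise(self, prefix: Prefix, attrs: Attrs) -> None:
        # Method (b): a route with the same NLRI replaces the earlier one.
        # This is also how a change of attributes is expressed.
        self.routes[prefix] = attrs

    def withdraw(self, prefix: Prefix) -> None:
        # Method (a): the prefix appeared in the WITHDRAWN ROUTES field.
        self.routes.pop(prefix, None)

    def session_closed(self) -> None:
        # Method (c): closing the connection removes every route
        # learned from this peer.
        self.routes.clear()
```

Keying the store by prefix makes replacement an ordinary overwrite, so no separate "change attributes" operation is needed.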
3.2 Routing Information Bases | 3.2 Routing Information Bases | |||
The Routing Information Base (RIB) within a BGP speaker consists of | The Routing Information Base (RIB) within a BGP speaker consists of | |||
three distinct parts: | three distinct parts: | |||
a) Adj-RIBs-In: The Adj-RIBs-In store routing information that has | a) Adj-RIBs-In: The Adj-RIBs-In store routing information that has | |||
been learned from inbound UPDATE messages. Their contents | been learned from inbound UPDATE messages received from other BGP | |||
represent routes that are available as an input to the Decision | speakers. Their contents represent routes that are available as an | |||
Process. | input to the Decision Process. | |||
b) Loc-RIB: The Loc-RIB contains the local routing information | b) Loc-RIB: The Loc-RIB contains the local routing information | |||
that the BGP speaker has selected by applying its local policies | that the BGP speaker has selected by applying its local policies | |||
to the routing information contained in its Adj-RIBs-In. | to the routing information contained in its Adj-RIBs-In. These are | |||
the routes that will be used by the local BGP speaker. The next | ||||
hop for each of these routes must be resolvable via the local BGP | ||||
speaker's Routing Table. | ||||
c) Adj-RIBs-Out: The Adj-RIBs-Out store the information that the | c) Adj-RIBs-Out: The Adj-RIBs-Out store the information that the | |||
local BGP speaker has selected for advertisement to its peers. The | local BGP speaker has selected for advertisement to its peers. The | |||
routing information stored in the Adj-RIBs-Out will be carried in | routing information stored in the Adj-RIBs-Out will be carried in | |||
the local BGP speaker's UPDATE messages and advertised to its | the local BGP speaker's UPDATE messages and advertised to its | |||
peers. | peers. | |||
In summary, the Adj-RIBs-In contain unprocessed routing information | In summary, the Adj-RIBs-In contain unprocessed routing information | |||
that has been advertised to the local BGP speaker by its peers; the | that has been advertised to the local BGP speaker by its peers; the | |||
Loc-RIB contains the routes that have been selected by the local BGP | Loc-RIB contains the routes that have been selected by the local BGP | |||
skipping to change at page 7, line 7 | skipping to change at page 11, line 29 | |||
Routing information that the router uses to forward packets (or to | Routing information that the router uses to forward packets (or to | |||
construct the forwarding table that is used for packet forwarding) is | construct the forwarding table that is used for packet forwarding) is | |||
maintained in the Routing Table. The Routing Table accumulates routes | maintained in the Routing Table. The Routing Table accumulates routes | |||
to directly connected networks, static routes, routes learned from | to directly connected networks, static routes, routes learned from | |||
the IGP protocols, and routes learned from BGP. Whether or not a | the IGP protocols, and routes learned from BGP. Whether or not a | |||
specific BGP route should be installed in the Routing Table, and | specific BGP route should be installed in the Routing Table, and | |||
whether a BGP route should override a route to the same destination | whether a BGP route should override a route to the same destination | |||
installed by another source is a local policy decision, not specified | installed by another source is a local policy decision, not specified | |||
in this document. Besides actual packet forwarding, the Routing Table | in this document. Besides actual packet forwarding, the Routing Table | |||
is used for resolution of the next-hop addresses specified in BGP | is used for resolution of the next-hop addresses specified in BGP | |||
updates (see Section 9.1.2). | updates (see Section 5.1.3). | |||
4. Message Formats | 4. Message Formats | |||
This section describes message formats used by BGP. | This section describes message formats used by BGP. | |||
Messages are sent over a reliable transport protocol connection. A | BGP messages are sent over a TCP connection. A message is processed | |||
message is processed only after it is entirely received. The maximum | only after it is entirely received. The maximum message size is 4096 | |||
message size is 4096 octets. All implementations are required to | octets. All implementations are required to support this maximum mes- | |||
support this maximum message size. The smallest message that may be | sage size. The smallest message that may be sent consists of a BGP | |||
sent consists of a BGP header without a data portion, or 19 octets. | header without a data portion, or 19 octets. | |||
4.1 Message Header Format | 4.1 Message Header Format | |||
Each message has a fixed-size header. There may or may not be a data | Each message has a fixed-size header. There may or may not be a data | |||
portion following the header, depending on the message type. The | portion following the header, depending on the message type. The lay- | |||
layout of these fields is shown below: | out of these fields is shown below: | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | | | | | |||
+ + | + + | |||
| | | | | | |||
+ + | + + | |||
| Marker | | | Marker | | |||
+ + | + + | |||
| | | | | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Length | Type | | | Length | Type | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Marker: | Marker: | |||
This 16-octet field contains a value that the receiver of the | This 16-octet field is included for compatibility; it MUST be | |||
message can predict. If the Type of the message is OPEN, or if | set to all ones. | |||
the OPEN message carries no Authentication Information (as an | ||||
Optional Parameter), then the Marker must be all ones. | ||||
Otherwise, the value of the marker can be predicted by some | ||||
computation specified as part of the authentication mechanism | ||||
(which is specified as part of the Authentication Information) | ||||
used. The Marker can be used to detect loss of synchronization | ||||
between a pair of BGP peers, and to authenticate incoming BGP | ||||
messages. | ||||
Length: | Length: | |||
This 2-octet unsigned integer indicates the total length of the | This 2-octet unsigned integer indicates the total length of the | |||
message, including the header, in octets. Thus, e.g., it allows | message, including the header, in octets. Thus, e.g., it allows | |||
one to locate in the transport-level stream the (Marker field | one to locate in the TCP stream the (Marker field of the) next | |||
of the) next message. The value of the Length field must always | message. The value of the Length field must always be at least | |||
be at least 19 and no greater than 4096, and may be further | 19 and no greater than 4096, and may be further constrained, | |||
constrained, depending on the message type. No "padding" of | depending on the message type. No "padding" of extra data after | |||
extra data after the message is allowed, so the Length field | the message is allowed, so the Length field must have the | |||
must have the smallest value required given the rest of the | smallest value required given the rest of the message. | |||
message. | ||||
Type: | Type: | |||
This 1-octet unsigned integer indicates the type code of the | This 1-octet unsigned integer indicates the type code of the | |||
message. The following type codes are defined: | message. This document defines the following type codes: | |||
1 - OPEN | 1 - OPEN | |||
2 - UPDATE | 2 - UPDATE | |||
3 - NOTIFICATION | 3 - NOTIFICATION | |||
4 - KEEPALIVE | 4 - KEEPALIVE | |||
[RFC2918] defines one more type code. | ||||
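The fixed header described above (16-octet Marker, 2-octet Length, 1-octet Type) can be decoded as in the following sketch. It assumes the -18 rule that the Marker is always all ones; the function name `parse_header` and the constant names are illustrative only. Type codes not listed in this document are returned unchanged rather than rejected.

```python
import struct

HEADER_LEN = 19            # 16 (Marker) + 2 (Length) + 1 (Type)
MAX_MESSAGE_LEN = 4096
ALL_ONES_MARKER = b"\xff" * 16
TYPE_NAMES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def parse_header(data: bytes):
    """Return (length, type code) from the first 19 octets of a message."""
    if len(data) < HEADER_LEN:
        raise ValueError("a BGP header is 19 octets long")
    # "!" selects network byte order with no padding between fields.
    marker, length, msg_type = struct.unpack("!16sHB", data[:HEADER_LEN])
    if marker != ALL_ONES_MARKER:
        raise ValueError("Marker is not all ones")
    if not HEADER_LEN <= length <= MAX_MESSAGE_LEN:
        raise ValueError("Length must be between 19 and 4096")
    return length, msg_type
```

Because Length covers the whole message including the header, a receiver can use it to find where the next message starts in the byte stream.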
4.2 OPEN Message Format | 4.2 OPEN Message Format | |||
After a transport protocol connection is established, the first | After a TCP connection is established, the first message sent by each side is an | |||
message sent by each side is an OPEN message. If the OPEN message is | OPEN message. If the OPEN message is acceptable, a KEEPALIVE message | |||
acceptable, a KEEPALIVE message confirming the OPEN is sent back. | confirming the OPEN is sent back. Once the OPEN is confirmed, UPDATE, | |||
Once the OPEN is confirmed, UPDATE, KEEPALIVE, and NOTIFICATION | KEEPALIVE, and NOTIFICATION messages may be exchanged. | |||
messages may be exchanged. | ||||
In addition to the fixed-size BGP header, the OPEN message contains | In addition to the fixed-size BGP header, the OPEN message contains | |||
the following fields: | the following fields: | |||
0 1 2 3 | 0 1 2 3 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
| Version | | | Version | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| My Autonomous System | | | My Autonomous System | | |||
skipping to change at page 9, line 36 | skipping to change at page 14, line 8 | |||
Hold Time and the Hold Time received in the OPEN message. The | Hold Time and the Hold Time received in the OPEN message. The | |||
Hold Time MUST be either zero or at least three seconds. An | Hold Time MUST be either zero or at least three seconds. An | |||
implementation may reject connections on the basis of the Hold | implementation may reject connections on the basis of the Hold | |||
Time. The calculated value indicates the maximum number of | Time. The calculated value indicates the maximum number of | |||
seconds that may elapse between the receipt of successive | seconds that may elapse between the receipt of successive | |||
KEEPALIVE and/or UPDATE messages by the sender. | KEEPALIVE and/or UPDATE messages by the sender. | |||
BGP Identifier: | BGP Identifier: | |||
This 4-octet unsigned integer indicates the BGP Identifier of | This 4-octet unsigned integer indicates the BGP Identifier of | |||
the sender. A given BGP speaker sets the value of its BGP | the sender. A given BGP speaker sets the value of its BGP Iden- | |||
Identifier to an IP address assigned to that BGP speaker. The | tifier to an IP address assigned to that BGP speaker. The | |||
value of the BGP Identifier is determined on startup and is the | value of the BGP Identifier is determined on startup and is the | |||
same for every local interface and every BGP peer. | same for every local interface and every BGP peer. | |||
Optional Parameters Length: | Optional Parameters Length: | |||
This 1-octet unsigned integer indicates the total length of the | This 1-octet unsigned integer indicates the total length of the | |||
Optional Parameters field in octets. If the value of this field | Optional Parameters field in octets. If the value of this field | |||
is zero, no Optional Parameters are present. | is zero, no Optional Parameters are present. | |||
Optional Parameters: | Optional Parameters: | |||
skipping to change at page 10, line 12 | skipping to change at page 14, line 31 | |||
This field may contain a list of optional parameters, where | This field may contain a list of optional parameters, where | |||
each parameter is encoded as a <Parameter Type, Parameter | each parameter is encoded as a <Parameter Type, Parameter | |||
Length, Parameter Value> triplet. | Length, Parameter Value> triplet. | |||
0 1 | 0 1 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-... | |||
| Parm. Type | Parm. Length | Parameter Value (variable) | | Parm. Type | Parm. Length | Parameter Value (variable) | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-... | |||
Parameter Type is a one octet field that unambiguously | Parameter Type is a one octet field that unambiguously identi- | |||
identifies individual parameters. Parameter Length is a one | fies individual parameters. Parameter Length is a one octet | |||
octet field that contains the length of the Parameter Value | field that contains the length of the Parameter Value field in | |||
field in octets. Parameter Value is a variable length field | octets. Parameter Value is a variable length field that is | |||
that is interpreted according to the value of the Parameter | interpreted according to the value of the Parameter Type field. | |||
Type field. | ||||
This document defines the following Optional Parameters: | ||||
a) Authentication Information (Parameter Type 1): | ||||
This optional parameter may be used to authenticate a BGP | ||||
peer. The Parameter Value field contains a 1-octet | ||||
Authentication Code followed by a variable length | ||||
Authentication Data. | ||||
0 1 2 3 4 5 6 7 8 | ||||
+-+-+-+-+-+-+-+-+ | ||||
| Auth. Code | | ||||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | | ||||
| Authentication Data | | ||||
| | | ||||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
Authentication Code: | ||||
This 1-octet unsigned integer indicates the | ||||
authentication mechanism being used. Whenever an | ||||
authentication mechanism is specified for use within | ||||
BGP, three things must be included in the | ||||
specification: | ||||
- the value of the Authentication Code which indicates | ||||
use of the mechanism, | ||||
- the form and meaning of the Authentication Data, and | ||||
- the algorithm for computing values of Marker fields. | ||||
Note that a separate authentication mechanism may be | ||||
used in establishing the transport level connection. | ||||
Authentication Data: | ||||
Authentication Data is a variable length field that is | [RFC2842] defines the Capabilities Optional Parameter. | |||
interpreted according to the value of the | ||||
Authentication Code field. | ||||
The minimum length of the OPEN message is 29 octets (including | The minimum length of the OPEN message is 29 octets (including mes- | |||
message header). | sage header). | |||
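The <Parameter Type, Parameter Length, Parameter Value> encoding of the Optional Parameters field can be walked as sketched below. The function name `parse_optional_parameters` is hypothetical, and the sketch only splits the field into triplets; interpreting each Parameter Value is left to whatever document defines that Parameter Type.

```python
def parse_optional_parameters(field: bytes):
    """Split the Optional Parameters field into (type, value) pairs."""
    params = []
    offset = 0
    while offset < len(field):
        if offset + 2 > len(field):
            raise ValueError("truncated parameter header")
        param_type = field[offset]        # one octet
        param_len = field[offset + 1]     # one octet, length of the value
        value = field[offset + 2: offset + 2 + param_len]
        if len(value) != param_len:
            raise ValueError("parameter value runs past end of field")
        params.append((param_type, value))
        offset += 2 + param_len
    return params
```

An empty field, which corresponds to an Optional Parameters Length of zero, yields an empty list.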
4.3 UPDATE Message Format | 4.3 UPDATE Message Format | |||
UPDATE messages are used to transfer routing information between BGP | UPDATE messages are used to transfer routing information between BGP | |||
peers. The information in the UPDATE packet can be used to construct | peers. The information in the UPDATE message can be used to construct | |||
a graph describing the relationships of the various Autonomous | a graph describing the relationships of the various Autonomous Sys- | |||
Systems. By applying rules to be discussed, routing information loops | tems. By applying rules to be discussed, routing information loops | |||
and some other anomalies may be detected and removed from inter-AS | and some other anomalies may be detected and removed from inter-AS | |||
routing. | routing. | |||
An UPDATE message is used to advertise feasible routes sharing common | An UPDATE message is used to advertise feasible routes sharing common | |||
path attributes to a peer, or to withdraw multiple unfeasible routes | path attributes to a peer, or to withdraw multiple unfeasible routes | |||
from service (see 3.1). An UPDATE message may simultaneously | from service (see 3.1). An UPDATE message may simultaneously adver- | |||
advertise a feasible route and withdraw multiple unfeasible routes | tise a feasible route and withdraw multiple unfeasible routes from | |||
from service. The UPDATE message always includes the fixed-size BGP | service. The UPDATE message always includes the fixed-size BGP | |||
header, and also includes the other fields as shown below (note, some | header, and also includes the other fields as shown below (note, some | |||
of the shown fields may not be present in every UPDATE message): | of the shown fields may not be present in every UPDATE message): | |||
+-----------------------------------------------------+ | +-----------------------------------------------------+ | |||
| Withdrawn Routes Length (2 octets) | | | Withdrawn Routes Length (2 octets) | | |||
+-----------------------------------------------------+ | +-----------------------------------------------------+ | |||
| Withdrawn Routes (variable) | | | Withdrawn Routes (variable) | | |||
+-----------------------------------------------------+ | +-----------------------------------------------------+ | |||
| Total Path Attribute Length (2 octets) | | | Total Path Attribute Length (2 octets) | | |||
+-----------------------------------------------------+ | +-----------------------------------------------------+ | |||
skipping to change at page 13, line 9 | skipping to change at page 16, line 32 | |||
Path Attributes field in octets. Its value must allow the | Path Attributes field in octets. Its value must allow the | |||
length of the Network Layer Reachability field to be determined | length of the Network Layer Reachability field to be determined | |||
as specified below. | as specified below. | |||
A value of 0 indicates that no Network Layer Reachability | A value of 0 indicates that no Network Layer Reachability | |||
Information field is present in this UPDATE message. | Information field is present in this UPDATE message. | |||
Path Attributes: | Path Attributes: | |||
A variable length sequence of path attributes is present in | A variable length sequence of path attributes is present in | |||
every UPDATE. Each path attribute is a triple <attribute type, | every UPDATE message, except for an UPDATE message that carries | |||
attribute length, attribute value> of variable length. | only the withdrawn routes. Each path attribute is a triple | |||
<attribute type, attribute length, attribute value> of variable | ||||
length. | ||||
Attribute Type is a two-octet field that consists of the | Attribute Type is a two-octet field that consists of the | |||
Attribute Flags octet followed by the Attribute Type Code | Attribute Flags octet followed by the Attribute Type Code | |||
octet. | octet. | |||
0 1 | 0 1 | |||
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Attr. Flags |Attr. Type Code| | | Attr. Flags |Attr. Type Code| | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
skipping to change at page 13, line 33 | skipping to change at page 17, line 9 | |||
Optional bit. It defines whether the attribute is optional (if | Optional bit. It defines whether the attribute is optional (if | |||
set to 1) or well-known (if set to 0). | set to 1) or well-known (if set to 0). | |||
The second high-order bit (bit 1) of the Attribute Flags octet | The second high-order bit (bit 1) of the Attribute Flags octet | |||
is the Transitive bit. It defines whether an optional attribute | is the Transitive bit. It defines whether an optional attribute | |||
is transitive (if set to 1) or non-transitive (if set to 0). | is transitive (if set to 1) or non-transitive (if set to 0). | |||
For well-known attributes, the Transitive bit must be set to 1. | For well-known attributes, the Transitive bit must be set to 1. | |||
(See Section 5 for a discussion of transitive attributes.) | (See Section 5 for a discussion of transitive attributes.) | |||
The third high-order bit (bit 2) of the Attribute Flags octet | The third high-order bit (bit 2) of the Attribute Flags octet | |||
is the Partial bit. It defines whether the information | is the Partial bit. It defines whether the information con- | |||
contained in the optional transitive attribute is partial (if | tained in the optional transitive attribute is partial (if set | |||
set to 1) or complete (if set to 0). For well-known attributes | to 1) or complete (if set to 0). For well-known attributes and | |||
and for optional non-transitive attributes the Partial bit must | for optional non-transitive attributes the Partial bit must be | |||
be set to 0. | set to 0. | |||
The fourth high-order bit (bit 3) of the Attribute Flags octet | The fourth high-order bit (bit 3) of the Attribute Flags octet | |||
is the Extended Length bit. It defines whether the Attribute | is the Extended Length bit. It defines whether the Attribute | |||
Length is one octet (if set to 0) or two octets (if set to 1). | Length is one octet (if set to 0) or two octets (if set to 1). | |||
The lower-order four bits of the Attribute Flags octet are | The lower-order four bits of the Attribute Flags octet are | |||
unused. They must be zero when sent and must be ignored when | unused. They must be zero when sent and must be ignored when | |||
received. | received. | |||
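The bit layout of the Attribute Flags octet can be illustrated with the sketch below. Bit 0 is the high-order bit, so it corresponds to mask 0x80. The names `AttrFlags`, `decode_attr_flags`, and `flags_consistent` are assumptions made for this example; the consistency check encodes only the constraints stated above.

```python
from typing import NamedTuple


class AttrFlags(NamedTuple):
    optional: bool          # bit 0: 1 = optional, 0 = well-known
    transitive: bool        # bit 1
    partial: bool           # bit 2
    extended_length: bool   # bit 3: 1 = two-octet Attribute Length


def decode_attr_flags(octet: int) -> AttrFlags:
    # The low-order four bits are unused and ignored on receipt.
    return AttrFlags(
        optional=bool(octet & 0x80),
        transitive=bool(octet & 0x40),
        partial=bool(octet & 0x20),
        extended_length=bool(octet & 0x10),
    )


def flags_consistent(flags: AttrFlags) -> bool:
    if not flags.optional:
        # Well-known: Transitive must be 1 and Partial must be 0.
        return flags.transitive and not flags.partial
    if not flags.transitive:
        # Optional non-transitive: Partial must be 0.
        return not flags.partial
    return True
```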
The Attribute Type Code octet contains the Attribute Type Code. | The Attribute Type Code octet contains the Attribute Type Code. | |||
skipping to change at page 14, line 33 | skipping to change at page 18, line 4 | |||
ORIGIN is a well-known mandatory attribute that defines the | ORIGIN is a well-known mandatory attribute that defines the | |||
origin of the path information. The data octet can assume | origin of the path information. The data octet can assume | |||
the following values: | the following values: | |||
Value Meaning | Value Meaning | |||
0 IGP - Network Layer Reachability Information | 0 IGP - Network Layer Reachability Information | |||
is interior to the originating AS | is interior to the originating AS | |||
1 EGP - Network Layer Reachability Information | 1 EGP - Network Layer Reachability Information | |||
learned via the EGP protocol | learned via the EGP protocol [RFC904] | |||
2 INCOMPLETE - Network Layer Reachability | 2 INCOMPLETE - Network Layer Reachability | |||
Information learned by some other means | Information learned by some other means | |||
Its usage is defined in 5.1.1 | Usage of this attribute is defined in 5.1.1. | |||
b) AS_PATH (Type Code 2): | b) AS_PATH (Type Code 2): | |||
AS_PATH is a well-known mandatory attribute that is composed | AS_PATH is a well-known mandatory attribute that is composed | |||
of a sequence of AS path segments. Each AS path segment is | of a sequence of AS path segments. Each AS path segment is | |||
represented by a triple <path segment type, path segment | represented by a triple <path segment type, path segment | |||
length, path segment value>. | length, path segment value>. | |||
The path segment type is a 1-octet long field with the | The path segment type is a 1-octet long field with the fol- | |||
following values defined: | lowing values defined: | |||
Value Segment Type | Value Segment Type | |||
1 AS_SET: unordered set of ASs a route in the | 1 AS_SET: unordered set of ASs a route in the | |||
UPDATE message has traversed | UPDATE message has traversed | |||
2 AS_SEQUENCE: ordered set of ASs a route in | 2 AS_SEQUENCE: ordered set of ASs a route in | |||
the UPDATE message has traversed | the UPDATE message has traversed | |||
The path segment length is a 1-octet long field containing | The path segment length is a 1-octet long field containing | |||
the number of ASs in the path segment value field. | the number of ASs (not the number of octets) in the path | |||
segment value field. | ||||
The path segment value field contains one or more AS | The path segment value field contains one or more AS | |||
numbers, each encoded as a 2-octet long field. | numbers, each encoded as a 2-octet long field. | |||
Usage of this attribute is defined in 5.1.2. | Usage of this attribute is defined in 5.1.2. | |||
c) NEXT_HOP (Type Code 3): | c) NEXT_HOP (Type Code 3): | |||
This is a well-known mandatory attribute that defines the IP | This is a well-known mandatory attribute that defines the IP | |||
address of the border router that should be used as the next | address of the border router that should be used as the next | |||
hop to the destinations listed in the Network Layer | hop to the destinations listed in the Network Layer Reacha- | |||
Reachability Information field of the UPDATE message. | bility Information field of the UPDATE message. | |||
Usage of this attribute is defined in 5.1.3. | Usage of this attribute is defined in 5.1.3. | |||
d) MULTI_EXIT_DISC (Type Code 4): | d) MULTI_EXIT_DISC (Type Code 4): | |||
This is an optional non-transitive attribute that is a four | This is an optional non-transitive attribute that is a four | |||
octet non-negative integer. The value of this attribute may | octet non-negative integer. The value of this attribute may | |||
be used by a BGP speaker's decision process to discriminate | be used by a BGP speaker's decision process to discriminate | |||
among multiple entry points to a neighboring autonomous | among multiple entry points to a neighboring autonomous sys- | |||
system. | tem. | |||
Its usage is defined in 5.1.4. | Usage of this attribute is defined in 5.1.4. | |||
e) LOCAL_PREF (Type Code 5): | e) LOCAL_PREF (Type Code 5): | |||
LOCAL_PREF is a well-known attribute that is a four octet | LOCAL_PREF is a well-known attribute that is a four octet | |||
non-negative integer. A BGP speaker uses it to inform other | non-negative integer. A BGP speaker uses it to inform other | |||
internal peers of the advertising speaker's degree of | internal peers of the advertising speaker's degree of pref- | |||
preference for an advertised route. Usage of this attribute | erence for an advertised route. | |||
is described in 5.1.5. | ||||
Usage of this attribute is defined in 5.1.5. | ||||
f) ATOMIC_AGGREGATE (Type Code 6): | f) ATOMIC_AGGREGATE (Type Code 6): | |||
ATOMIC_AGGREGATE is a well-known discretionary attribute of | ATOMIC_AGGREGATE is a well-known discretionary attribute of | |||
length 0. Usage of this attribute is described in 5.1.6. | length 0. | |||
Usage of this attribute is defined in 5.1.6. | ||||
g) AGGREGATOR (Type Code 7): | g) AGGREGATOR (Type Code 7): | |||
AGGREGATOR is an optional transitive attribute of length 6. | AGGREGATOR is an optional transitive attribute of length 6. | |||
The attribute contains the last AS number that formed the | The attribute contains the last AS number that formed the | |||
aggregate route (encoded as 2 octets), followed by the IP | aggregate route (encoded as 2 octets), followed by the IP | |||
address of the BGP speaker that formed the aggregate route | address of the BGP speaker that formed the aggregate route | |||
(encoded as 4 octets). This should be the same address as | (encoded as 4 octets). This should be the same address as | |||
the one used for the BGP Identifier of the speaker. Usage | the one used for the BGP Identifier of the speaker. | |||
of this attribute is described in 5.1.7. | ||||
Usage of this attribute is defined in 5.1.7. | ||||
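As an illustration of one of the attribute encodings listed above, the AS_PATH value can be decoded as a series of <path segment type, path segment length, path segment value> triples. In this sketch the function name `parse_as_path` is hypothetical; note that the segment length counts AS numbers, so the value occupies twice that many octets.

```python
import struct

AS_SET = 1
AS_SEQUENCE = 2


def parse_as_path(value: bytes):
    """Return a list of (segment type, [AS numbers]) pairs."""
    segments = []
    offset = 0
    while offset < len(value):
        if offset + 2 > len(value):
            raise ValueError("truncated path segment header")
        seg_type = value[offset]
        as_count = value[offset + 1]      # number of ASs, not octets
        offset += 2
        end = offset + 2 * as_count       # each AS number is 2 octets
        if end > len(value):
            raise ValueError("path segment runs past end of attribute")
        as_numbers = list(struct.unpack(f"!{as_count}H", value[offset:end]))
        segments.append((seg_type, as_numbers))
        offset = end
    return segments
```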
Network Layer Reachability Information: | Network Layer Reachability Information: | |||
This variable length field contains a list of IP address | This variable length field contains a list of IP address pre- | |||
prefixes. The length in octets of the Network Layer | fixes. The length in octets of the Network Layer Reachability | |||
Reachability Information is not encoded explicitly, but can be | Information is not encoded explicitly, but can be calculated | |||
calculated as: | as: | |||
UPDATE message Length - 23 - Total Path Attributes Length - | UPDATE message Length - 23 - Total Path Attributes Length - | |||
Withdrawn Routes Length | Withdrawn Routes Length | |||
where UPDATE message Length is the value encoded in the fixed- | where UPDATE message Length is the value encoded in the fixed- | |||
size BGP header, Total Path Attribute Length and Withdrawn | size BGP header, Total Path Attribute Length and Withdrawn | |||
Routes Length are the values encoded in the variable part of | Routes Length are the values encoded in the variable part of | |||
the UPDATE message, and 23 is a combined length of the fixed- | the UPDATE message, and 23 is a combined length of the fixed- | |||
size BGP header, the Total Path Attribute Length field and the | size BGP header, the Total Path Attribute Length field and the | |||
Withdrawn Routes Length field. | Withdrawn Routes Length field. | |||
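The length calculation above amounts to a single subtraction. The sketch below uses a hypothetical helper name, `nlri_length`, and adds a check for a negative result, which would indicate inconsistent length fields.

```python
def nlri_length(message_length: int,
                withdrawn_routes_length: int,
                total_path_attr_length: int) -> int:
    """Octets occupied by the NLRI field of an UPDATE message."""
    # 23 = 19 (fixed header) + 2 (Withdrawn Routes Length field)
    #      + 2 (Total Path Attribute Length field)
    length = (message_length - 23
              - total_path_attr_length
              - withdrawn_routes_length)
    if length < 0:
        raise ValueError("length fields exceed the message length")
    return length
```

For example, a 52-octet UPDATE with no withdrawn routes and 25 octets of path attributes leaves 52 - 23 - 25 - 0 = 4 octets of NLRI.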
skipping to change at page 16, line 49 | skipping to change at page 20, line 26 | |||
a) Length: | a) Length: | |||
The Length field indicates the length in bits of the IP | The Length field indicates the length in bits of the IP | |||
address prefix. A length of zero indicates a prefix that | address prefix. A length of zero indicates a prefix that | |||
matches all IP addresses (with prefix, itself, of zero | matches all IP addresses (with prefix, itself, of zero | |||
octets). | octets). | |||
b) Prefix: | b) Prefix: | |||
The Prefix field contains IP address prefixes followed by | The Prefix field contains an IP address prefix followed by | |||
enough trailing bits to make the end of the field fall on an | enough trailing bits to make the end of the field fall on an | |||
octet boundary. Note that the value of the trailing bits is | octet boundary. Note that the value of the trailing bits is | |||
irrelevant. | irrelevant. | |||
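A decoder for the <length, prefix> encoding described above might look like the sketch below. It assumes IPv4 prefixes (so a Length above 32 is rejected) and clears the trailing bits, since their value is irrelevant; the function name and the error handling are illustrative assumptions.

```python
from typing import List, Tuple


def parse_prefixes(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a field of <length, prefix> tuples into (bit_length, prefix_octets)."""
    prefixes = []
    i = 0
    while i < len(data):
        bit_len = data[i]
        i += 1
        if bit_len > 32:  # assumption: IPv4 prefixes only
            raise ValueError("prefix length exceeds 32 bits")
        n_octets = (bit_len + 7) // 8  # round up to an octet boundary
        if i + n_octets > len(data):
            raise ValueError("truncated prefix")
        raw = bytearray(data[i:i + n_octets])
        spare = n_octets * 8 - bit_len  # number of trailing pad bits
        if spare:
            raw[-1] &= (0xFF << spare) & 0xFF  # clear the irrelevant bits
        prefixes.append((bit_len, bytes(raw)))
        i += n_octets
    return prefixes
```

A Length of zero yields an empty prefix, which corresponds to the prefix that matches all IP addresses.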
The minimum length of the UPDATE message is 23 octets -- 19 octets | The minimum length of the UPDATE message is 23 octets -- 19 octets | |||
for the fixed header + 2 octets for the Withdrawn Routes Length + 2 | for the fixed header + 2 octets for the Withdrawn Routes Length + 2 | |||
octets for the Total Path Attribute Length (the value of Withdrawn | octets for the Total Path Attribute Length (the value of Withdrawn | |||
Routes Length is 0 and the value of Total Path Attribute Length is | Routes Length is 0 and the value of Total Path Attribute Length is | |||
0). | 0). | |||
An UPDATE message can advertise at most one set of path attributes, | An UPDATE message can advertise at most one set of path attributes, | |||
but multiple destinations, provided that the destinations share these | but multiple destinations, provided that the destinations share these | |||
attributes. All path attributes contained in a given UPDATE message | attributes. All path attributes contained in a given UPDATE message | |||
apply to all destinations carried in the NLRI field of the UPDATE | apply to all destinations carried in the NLRI field of the UPDATE | |||
message. | message. | |||
An UPDATE message can list multiple routes to be withdrawn from | An UPDATE message can list multiple routes to be withdrawn from ser- | |||
service. Each such route is identified by its destination (expressed | vice. Each such route is identified by its destination (expressed as | |||
as an IP prefix), which unambiguously identifies the route in the | an IP prefix), which unambiguously identifies the route in the con- | |||
context of the BGP speaker - BGP speaker connection to which it has | text of the BGP speaker - BGP speaker connection to which it has been | |||
been previously advertised. | previously advertised. | |||
An UPDATE message might advertise only routes to be withdrawn from | An UPDATE message might advertise only routes to be withdrawn from | |||
service, in which case it will not include path attributes or Network | service, in which case it will not include path attributes or Network | |||
Layer Reachability Information. Conversely, it may advertise only a | Layer Reachability Information. Conversely, it may advertise only a | |||
feasible route, in which case the WITHDRAWN ROUTES field need not be | feasible route, in which case the WITHDRAWN ROUTES field need not be | |||
present. | present. | |||
An UPDATE message should not include the same address prefix in the | An UPDATE message should not include the same address prefix in the | |||
WITHDRAWN ROUTES and Network Layer Reachability Information fields, | WITHDRAWN ROUTES and Network Layer Reachability Information fields, | |||
however a BGP speaker MUST be able to process UPDATE messages in this | however a BGP speaker MUST be able to process UPDATE messages in this | |||
form. A BGP speaker should treat an UPDATE message of this form as if | form. A BGP speaker should treat an UPDATE message of this form as if | |||
the WITHDRAWN ROUTES doesn't contain the address prefix. | the WITHDRAWN ROUTES doesn't contain the address prefix. | |||
4.4 KEEPALIVE Message Format | 4.4 KEEPALIVE Message Format | |||
BGP does not use any transport protocol-based keep-alive mechanism to | BGP does not use any TCP-based keep-alive mechanism to determine if | |||
determine if peers are reachable. Instead, KEEPALIVE messages are | peers are reachable. Instead, KEEPALIVE messages are exchanged | |||
exchanged between peers often enough as not to cause the Hold Timer | between peers often enough as not to cause the Hold Timer to expire. | |||
to expire. A reasonable maximum time between KEEPALIVE messages would | A reasonable maximum time between KEEPALIVE messages would be one | |||
be one third of the Hold Time interval. KEEPALIVE messages MUST NOT | third of the Hold Time interval. KEEPALIVE messages MUST NOT be sent | |||
be sent more frequently than one per second. An implementation MAY | more frequently than one per second. An implementation MAY adjust the | |||
adjust the rate at which it sends KEEPALIVE messages as a function of | rate at which it sends KEEPALIVE messages as a function of the Hold | |||
the Hold Time interval. | Time interval. | |||
If the negotiated Hold Time interval is zero, then periodic KEEPALIVE | If the negotiated Hold Time interval is zero, then periodic KEEPALIVE | |||
messages MUST NOT be sent. | messages MUST NOT be sent. | |||
A KEEPALIVE message consists of only the message header and has a length of | A KEEPALIVE message consists of only the message header and has a length of | |||
19 octets. | 19 octets. | |||
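The KEEPALIVE timing rules in this section can be sketched as below. The one-third figure is only the "reasonable maximum" suggested in the text, so an implementation may choose a shorter interval; the function name is hypothetical.

```python
from typing import Optional


def keepalive_interval(hold_time: int) -> Optional[float]:
    """Return a KEEPALIVE interval in seconds, or None if none are to be sent."""
    if hold_time == 0:
        # A negotiated Hold Time of zero means no periodic KEEPALIVEs.
        return None
    # One third of the Hold Time, but never more often than once per second.
    return max(1.0, hold_time / 3)
```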
4.5 NOTIFICATION Message Format | 4.5 NOTIFICATION Message Format | |||
A NOTIFICATION message is sent when an error condition is detected. | A NOTIFICATION message is sent when an error condition is detected. | |||
skipping to change at page 18, line 44 | skipping to change at page 22, line 22 | |||
3 UPDATE Message Error Section 6.3 | 3 UPDATE Message Error Section 6.3 | |||
4 Hold Timer Expired Section 6.5 | 4 Hold Timer Expired Section 6.5 | |||
5 Finite State Machine Error Section 6.6 | 5 Finite State Machine Error Section 6.6 | |||
6 Cease Section 6.7 | 6 Cease Section 6.7 | |||
Error subcode: | Error subcode: | |||
This 1-octet unsigned integer provides more specific | This 1-octet unsigned integer provides more specific informa- | |||
information about the nature of the reported error. Each Error | tion about the nature of the reported error. Each Error Code | |||
Code may have one or more Error Subcodes associated with it. If | may have one or more Error Subcodes associated with it. If no | |||
no appropriate Error Subcode is defined, then a zero | appropriate Error Subcode is defined, then a zero (Unspecific) | |||
(Unspecific) value is used for the Error Subcode field. | value is used for the Error Subcode field. | |||
Message Header Error subcodes: | Message Header Error subcodes: | |||
1 - Connection Not Synchronized. | 1 - Connection Not Synchronized. | |||
2 - Bad Message Length. | 2 - Bad Message Length. | |||
3 - Bad Message Type. | 3 - Bad Message Type. | |||
OPEN Message Error subcodes: | OPEN Message Error subcodes: | |||
1 - Unsupported Version Number. | 1 - Unsupported Version Number. | |||
skipping to change at page 19, line 48 | skipping to change at page 23, line 21 | |||
This variable-length field is used to diagnose the reason for | This variable-length field is used to diagnose the reason for | |||
the NOTIFICATION. The contents of the Data field depend upon | the NOTIFICATION. The contents of the Data field depend upon | |||
the Error Code and Error Subcode. See Section 6 below for more | the Error Code and Error Subcode. See Section 6 below for more | |||
details. | details. | |||
Note that the length of the Data field can be determined from | Note that the length of the Data field can be determined from | |||
the message Length field by the formula: | the message Length field by the formula: | |||
Message Length = 21 + Data Length | Message Length = 21 + Data Length | |||
The minimum length of the NOTIFICATION message is 21 octets | The minimum length of the NOTIFICATION message is 21 octets (includ- | |||
(including message header). | ing message header). | |||
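As an illustration of the length formula above, the sketch below splits a NOTIFICATION message into its Error Code, Error Subcode and Data. It assumes the fixed header layout defined elsewhere in the draft (16-octet Marker, 2-octet Length, 1-octet Type); the function name is hypothetical and the Type field is not checked here.

```python
from typing import Tuple

NOTIFICATION_MIN = 21  # 19-octet header + Error Code + Error Subcode


def parse_notification(msg: bytes) -> Tuple[int, int, bytes]:
    """Return (error_code, error_subcode, data) from a whole NOTIFICATION message."""
    if len(msg) < NOTIFICATION_MIN:
        raise ValueError("shorter than the 21-octet minimum")
    length = int.from_bytes(msg[16:18], "big")  # Length field of the header
    if length != len(msg):
        raise ValueError("Length field does not match the message size")
    code, subcode = msg[19], msg[20]
    data = msg[21:length]  # Data Length = Message Length - 21
    return code, subcode, data
```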
5. Path Attributes | 5. Path Attributes | |||
This section discusses the path attributes of the UPDATE message. | This section discusses the path attributes of the UPDATE message. | |||
Path attributes fall into four separate categories: | Path attributes fall into four separate categories: | |||
1. Well-known mandatory. | 1. Well-known mandatory. | |||
2. Well-known discretionary. | 2. Well-known discretionary. | |||
3. Optional transitive. | 3. Optional transitive. | |||
4. Optional non-transitive. | 4. Optional non-transitive. | |||
Well-known attributes must be recognized by all BGP implementations. | Well-known attributes must be recognized by all BGP implementations. | |||
Some of these attributes are mandatory and must be included in every | Some of these attributes are mandatory and must be included in every | |||
UPDATE message that contains NLRI. Others are discretionary and may | UPDATE message that contains NLRI. Others are discretionary and may | |||
or may not be sent in a particular UPDATE message. | or may not be sent in a particular UPDATE message. | |||
All well-known attributes must be passed along (after proper | All well-known attributes must be passed along (after proper updat- | |||
updating, if necessary) to other BGP peers. | ing, if necessary) to other BGP peers. | |||
In addition to well-known attributes, each path may contain one or | In addition to well-known attributes, each path may contain one or | |||
more optional attributes. It is not required or expected that all BGP | more optional attributes. It is not required or expected that all BGP | |||
implementations support all optional attributes. The handling of an | implementations support all optional attributes. The handling of an | |||
unrecognized optional attribute is determined by the setting of the | unrecognized optional attribute is determined by the setting of the | |||
Transitive bit in the attribute flags octet. Paths with unrecognized | Transitive bit in the attribute flags octet. Paths with unrecognized | |||
transitive optional attributes should be accepted. If a path with | transitive optional attributes should be accepted. If a path with | |||
unrecognized transitive optional attribute is accepted and passed | unrecognized transitive optional attribute is accepted and passed | |||
along to other BGP peers, then the unrecognized transitive optional | along to other BGP peers, then the unrecognized transitive optional | |||
attribute of that path must be passed along with the path to other | attribute of that path must be passed along with the path to other | |||
skipping to change at page 21, line 16 | skipping to change at page 24, line 37 | |||
the UPDATE message in ascending order of attribute type. The receiver | the UPDATE message in ascending order of attribute type. The receiver | |||
of an UPDATE message must be prepared to handle path attributes | of an UPDATE message must be prepared to handle path attributes | |||
within the UPDATE message that are out of order. | within the UPDATE message that are out of order. | |||
The same attribute cannot appear more than once within the Path | The same attribute cannot appear more than once within the Path | |||
Attributes field of a particular UPDATE message. | Attributes field of a particular UPDATE message. | |||
The mandatory category refers to an attribute which must be present | The mandatory category refers to an attribute which must be present | |||
in both IBGP and EBGP exchanges if NLRI are contained in the UPDATE | in both IBGP and EBGP exchanges if NLRI are contained in the UPDATE | |||
message. Attributes classified as optional for the purpose of the | message. Attributes classified as optional for the purpose of the | |||
protocol extension mechanism may be purely discretionary, or | protocol extension mechanism may be purely discretionary, or discre- | |||
discretionary, required, or disallowed in certain contexts. | tionary, required, or disallowed in certain contexts. | |||
attribute EBGP IBGP | attribute EBGP IBGP | |||
ORIGIN mandatory mandatory | ORIGIN mandatory mandatory | |||
AS_PATH mandatory mandatory | AS_PATH mandatory mandatory | |||
NEXT_HOP mandatory mandatory | NEXT_HOP mandatory mandatory | |||
MULTI_EXIT_DISC discretionary discretionary | MULTI_EXIT_DISC discretionary discretionary | |||
LOCAL_PREF disallowed required | LOCAL_PREF see section 5.1.5 required | |||
ATOMIC_AGGREGATE see section 5.1.6 and 9.1.4 | ATOMIC_AGGREGATE see section 5.1.6 and 9.1.4 | |||
AGGREGATOR discretionary discretionary | AGGREGATOR discretionary discretionary | |||
5.1 Path Attribute Usage | 5.1 Path Attribute Usage | |||
The usage of each BGP path attribute is described in the following | The usage of each BGP path attribute is described in the following | |||
clauses. | clauses. | |||
5.1.1 ORIGIN | 5.1.1 ORIGIN | |||
ORIGIN is a well-known mandatory attribute. The ORIGIN attribute | ORIGIN is a well-known mandatory attribute. The ORIGIN attribute | |||
shall be generated by the autonomous system that originates the | shall be generated by the speaker that originates the associated | |||
associated routing information. It shall be included in the UPDATE | routing information. Its value SHOULD NOT be changed by any other | |||
messages of all BGP speakers that choose to propagate this | speaker. | |||
information to other BGP speakers. | ||||
5.1.2 AS_PATH | 5.1.2 AS_PATH | |||
AS_PATH is a well-known mandatory attribute. This attribute | AS_PATH is a well-known mandatory attribute. This attribute identi- | |||
identifies the autonomous systems through which routing information | fies the autonomous systems through which routing information carried | |||
carried in this UPDATE message has passed. The components of this | in this UPDATE message has passed. The components of this list can be | |||
list can be AS_SETs or AS_SEQUENCEs. | AS_SETs or AS_SEQUENCEs. | |||
When a BGP speaker propagates a route which it has learned from | When a BGP speaker propagates a route which it has learned from | |||
another BGP speaker's UPDATE message, it shall modify the route's | another BGP speaker's UPDATE message, it shall modify the route's | |||
AS_PATH attribute based on the location of the BGP speaker to which | AS_PATH attribute based on the location of the BGP speaker to which | |||
the route will be sent: | the route will be sent: | |||
a) When a given BGP speaker advertises the route to an internal | a) When a given BGP speaker advertises the route to an internal | |||
peer, the advertising speaker shall not modify the AS_PATH | peer, the advertising speaker shall not modify the AS_PATH | |||
attribute associated with the route. | attribute associated with the route. | |||
b) When a given BGP speaker advertises the route to an external | b) When a given BGP speaker advertises the route to an external | |||
peer, then the advertising speaker shall update the AS_PATH | peer, then the advertising speaker shall update the AS_PATH | |||
attribute as follows: | attribute as follows: | |||
1) if the first path segment of the AS_PATH is of type | 1) if the first path segment of the AS_PATH is of type | |||
AS_SEQUENCE, the local system shall prepend its own AS number | AS_SEQUENCE, the local system shall prepend its own AS number | |||
as the last element of the sequence (put it in the leftmost | as the last element of the sequence (put it in the leftmost | |||
position). If the act of prepending will cause an overflow in | position). If the act of prepending will cause an overflow in | |||
the AS_PATH segment, i.e. more than 255 elements, it shall be | the AS_PATH segment, i.e. more than 255 ASs, it shall be legal | |||
legal to prepend a new segment of type AS_SEQUENCE and prepend | to prepend a new segment of type AS_SEQUENCE and prepend its | |||
its own AS number to this new segment. | own AS number to this new segment. | |||
2) if the first path segment of the AS_PATH is of type AS_SET, | 2) if the first path segment of the AS_PATH is of type AS_SET, | |||
the local system shall prepend a new path segment of type | the local system shall prepend a new path segment of type | |||
AS_SEQUENCE to the AS_PATH, including its own AS number in that | AS_SEQUENCE to the AS_PATH, including its own AS number in that | |||
segment. | segment. | |||
When a BGP speaker originates a route then: | When a BGP speaker originates a route then: | |||
a) the originating speaker shall include its own AS number in a | a) the originating speaker shall include its own AS number in a | |||
path segment of type AS_SEQUENCE in the AS_PATH attribute of all | path segment of type AS_SEQUENCE in the AS_PATH attribute of all | |||
UPDATE messages sent to an external peer. (In this case, the AS | UPDATE messages sent to an external peer. (In this case, the AS | |||
number of the originating speaker's autonomous system will be the | number of the originating speaker's autonomous system will be the | |||
only entry in the path segment, and this path segment will be the | only entry in the path segment, and this path segment will be the | |||
only segment in the AS_PATH attribute). | only segment in the AS_PATH attribute). | |||
b) the originating speaker shall include an empty AS_PATH | b) the originating speaker shall include an empty AS_PATH | |||
attribute in all UPDATE messages sent to internal peers. (An | attribute in all UPDATE messages sent to internal peers. (An | |||
empty AS_PATH attribute is one whose length field contains the | empty AS_PATH attribute is one whose length field contains the | |||
value zero). | value zero). | |||
Whenever the modification of the AS_PATH attribute calls for | Whenever the modification of the AS_PATH attribute calls for includ- | |||
including or prepending the AS number of the local system, the local | ing or prepending the AS number of the local system, the local system | |||
system may include/prepend more than one instance of its own AS | may include/prepend more than one instance of its own AS number in | |||
number in the AS_PATH attribute. This is controlled via local | the AS_PATH attribute. This is controlled via local configuration. | |||
configuration. | ||||
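The AS_PATH handling for advertisements to an external peer can be sketched as follows. The in-memory representation (a list of (segment type, AS list) pairs), the function name, and the count parameter standing in for "local configuration" are all assumptions made for illustration.

```python
from typing import List, Tuple

AS_SET = "AS_SET"
AS_SEQUENCE = "AS_SEQUENCE"
MAX_SEGMENT_LEN = 255  # a segment cannot hold more than 255 AS numbers

Segment = Tuple[str, List[int]]


def prepend_for_external_peer(as_path: List[Segment], local_as: int,
                              count: int = 1) -> List[Segment]:
    """Return a copy of as_path with local_as prepended count times."""
    path = [(seg_type, list(asns)) for seg_type, asns in as_path]
    for _ in range(count):
        first = path[0] if path else None
        if first and first[0] == AS_SEQUENCE and len(first[1]) < MAX_SEGMENT_LEN:
            # Case 1: put the local AS in the leftmost position.
            first[1].insert(0, local_as)
        else:
            # Case 2 (first segment is an AS_SET), a full first segment,
            # or an empty path (locally originated route): start a new
            # AS_SEQUENCE segment in front of the existing ones.
            path.insert(0, (AS_SEQUENCE, [local_as]))
    return path
```

For advertisements to internal peers the AS_PATH is left unchanged, so no corresponding helper is needed.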
5.1.3 NEXT_HOP | 5.1.3 NEXT_HOP | |||
The NEXT_HOP path attribute defines the IP address of the border | The NEXT_HOP is a well-known mandatory attribute that defines the IP | |||
router that should be used as the next hop to the destinations listed | address of the border router that should be used as the next hop to | |||
in the UPDATE message. The NEXT_HOP attribute is calculated as | the destinations listed in the UPDATE message. The NEXT_HOP attribute | |||
follows. | is calculated as follows. | |||
1) When sending a message to an internal peer, the BGP speaker | 1) When sending a message to an internal peer, if the route is not | |||
should not modify the NEXT_HOP attribute, unless it has been | locally originated the BGP speaker should not modify the NEXT_HOP | |||
explicitly configured to announce its own IP address as the | attribute, unless it has been explicitly configured to announce | |||
NEXT_HOP. | its own IP address as the NEXT_HOP. When announcing a locally | |||
originated route to an internal peer, the BGP speaker should use | ||||
as the NEXT_HOP the interface address of the router through which | ||||
the announced network is reachable for the speaker; if the route | ||||
is directly connected to the speaker, or the interface address of | ||||
the router through which the announced network is reachable for | ||||
the speaker is the internal peer's address, then the BGP speaker | ||||
should use for the NEXT_HOP attribute its own IP address (the | ||||
address of the interface that is used to reach the peer). | ||||
2) When sending a message to an external peer X, and the peer is | 2) When sending a message to an external peer X, and the peer is | |||
one IP hop away from the speaker: | one IP hop away from the speaker: | |||
- If the route being announced was learned from an internal | - If the route being announced was learned from an internal | |||
peer or is locally originated, the BGP speaker can use for the | peer or is locally originated, the BGP speaker can use for the | |||
NEXT_HOP attribute an interface address of the internal peer | NEXT_HOP attribute an interface address of the internal peer | |||
router (or the internal router) through which the announced | router (or the internal router) through which the announced | |||
network is reachable for the speaker, provided that peer X | network is reachable for the speaker, provided that peer X | |||
shares a common subnet with this address. This is a form of | shares a common subnet with this address. This is a form of | |||
"third party" NEXT_HOP attribute. | "third party" NEXT_HOP attribute. | |||
- If the route being announced was learned from an external | - Otherwise, if the route being announced was learned from an | |||
peer, the speaker can use in the NEXT_HOP attribute an IP | external peer, the speaker can use in the NEXT_HOP attribute an | |||
address of any adjacent router (known from the received | IP address of any adjacent router (known from the received | |||
NEXT_HOP attribute) that the speaker itself uses for local | NEXT_HOP attribute) that the speaker itself uses for local | |||
route calculation, provided that peer X shares a common subnet | route calculation, provided that peer X shares a common subnet | |||
with this address. This is a second form of "third party" | with this address. This is a second form of "third party" | |||
NEXT_HOP attribute. | NEXT_HOP attribute. | |||
- If the external peer to which the route is being advertised | - Otherwise, if the external peer to which the route is being | |||
shares a common subnet with one of the announcing router's own | advertised shares a common subnet with one of the announcing | |||
interfaces, the router may use the IP address associated with | router's own interfaces, the router may use the IP address | |||
such an interface in the NEXT_HOP attribute. This is known as a | associated with such an interface in the NEXT_HOP attribute. | |||
"first party" NEXT_HOP attribute. | This is known as a "first party" NEXT_HOP attribute. | |||
- By default (if none of the above conditions apply), the BGP | - By default (if none of the above conditions apply), the BGP | |||
speaker should use in the NEXT_HOP attribute the IP address of | speaker should use in the NEXT_HOP attribute the IP address of | |||
the interface that the speaker uses to establish the BGP | the interface that the speaker uses to establish the BGP con- | |||
session to peer X. | nection to peer X. | |||
3) When sending a message to an external peer X, and the peer is | 3) When sending a message to an external peer X, and the peer is | |||
multiple IP hops away from the speaker (aka "multihop EBGP"): | multiple IP hops away from the speaker (aka "multihop EBGP"): | |||
- The speaker may be configured to propagate the NEXT_HOP | - The speaker may be configured to propagate the NEXT_HOP | |||
attribute. In this case when advertising a route that the | attribute. In this case when advertising a route that the | |||
speaker learned from one of its peers, the NEXT_HOP attribute | speaker learned from one of its peers, the NEXT_HOP attribute | |||
of the advertised route is exactly the same as the NEXT_HOP | of the advertised route is exactly the same as the NEXT_HOP | |||
attribute of the learned route (the speaker just doesn't modify | attribute of the learned route (the speaker just doesn't modify | |||
the NEXT_HOP attribute). | the NEXT_HOP attribute). | |||
- By default, the BGP speaker should use in the NEXT_HOP | - By default, the BGP speaker should use in the NEXT_HOP | |||
attribute the IP address of the interface that the speaker uses | attribute the IP address of the interface that the speaker uses | |||
to establish the BGP session to peer X. | to establish the BGP connection to peer X. | |||
Normally the NEXT_HOP attribute is chosen such that the shortest | Normally the NEXT_HOP attribute is chosen such that the shortest | |||
available path will be taken. A BGP speaker must be able to support | available path will be taken. A BGP speaker must be able to support | |||
disabling advertisement of third party NEXT_HOP attributes to handle | disabling advertisement of third party NEXT_HOP attributes to handle | |||
imperfectly bridged media. | imperfectly bridged media. | |||
A BGP speaker must never advertise an address of a peer to that peer | A BGP speaker must never advertise an address of a peer to that peer | |||
as a NEXT_HOP, for a route that the speaker is originating. A BGP | as a NEXT_HOP, for a route that the speaker is originating. A BGP | |||
speaker must never install a route with itself as the next hop. | speaker must never install a route with itself as the next hop. | |||
The NEXT_HOP attribute is used by the BGP speaker to determine the | The NEXT_HOP attribute is used by the BGP speaker to determine the | |||
actual outbound interface and immediate next-hop address that should | actual outbound interface and immediate next-hop address that should | |||
be used to forward transit packets to the associated destinations. | be used to forward transit packets to the associated destinations. | |||
The immediate next-hop address is determined by performing a | ||||
recursive route lookup operation for the IP address in the NEXT_HOP | The immediate next-hop address is determined by performing a recur- | |||
attribute using the contents of the Routing Table (see Section | sive route lookup operation for the IP address in the NEXT_HOP | |||
9.1.2.2). The resolving route will always specify the outbound | attribute using the contents of the Routing Table, selecting one | |||
interface. If the resolving route specifies the next-hop address, | entry if multiple entries of equal cost exist. The Routing Table | |||
this address should be used as the immediate address for packet | entry which resolves the IP address in the NEXT_HOP attribute will | |||
forwarding. If the address in the NEXT_HOP attribute is directly | always specify the outbound interface. If the entry specifies an | |||
resolved through a route to an attached subnet (such a route will not | attached subnet, but does not specify a next-hop address, then the | |||
specify the next-hop address), the outbound interface should be taken | address in the NEXT_HOP attribute should be used as the immediate | |||
from the resolving route and the address in the NEXT_HOP attribute | next-hop address. If the entry also specifies the next-hop address, | |||
should be used as the immediate next-hop address. | this address should be used as the immediate next-hop address for | |||
packet forwarding. | ||||
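The resolution of the NEXT_HOP address into an outbound interface and immediate next-hop address can be sketched as below. The Route type, the longest-prefix-match rule used to pick the resolving entry, and the interface names are assumptions for illustration; the draft only requires that the resolving entry supply the outbound interface and, optionally, a next-hop address.

```python
import ipaddress
from typing import List, NamedTuple, Optional, Tuple


class Route(NamedTuple):
    prefix: ipaddress.IPv4Network
    interface: str
    gateway: Optional[ipaddress.IPv4Address]  # None for an attached subnet


def resolve_next_hop(next_hop: str, table: List[Route]
                     ) -> Optional[Tuple[str, ipaddress.IPv4Address]]:
    """Return (outbound interface, immediate next-hop address), or None."""
    addr = ipaddress.IPv4Address(next_hop)
    matches = [r for r in table if addr in r.prefix]
    if not matches:
        return None  # the NEXT_HOP address cannot be resolved
    # Assumption: the most specific matching entry resolves the address.
    best = max(matches, key=lambda r: r.prefix.prefixlen)
    # Attached subnet: forward directly to the NEXT_HOP address itself.
    immediate = best.gateway if best.gateway is not None else addr
    return best.interface, immediate
```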
5.1.4 MULTI_EXIT_DISC | 5.1.4 MULTI_EXIT_DISC | |||
The MULTI_EXIT_DISC attribute may be used on external (inter-AS) | The MULTI_EXIT_DISC is an optional non-transitive attribute which may | |||
links to discriminate among multiple exit or entry points to the same | be used on external (inter-AS) links to discriminate among multiple | |||
neighboring AS. The value of the MULTI_EXIT_DISC attribute is a four | exit or entry points to the same neighboring AS. The value of the | |||
octet unsigned number which is called a metric. All other factors | MULTI_EXIT_DISC attribute is a four octet unsigned number which is | |||
being equal, the exit point with lower metric should be preferred. If | called a metric. All other factors being equal, the exit point with | |||
received over external links, the MULTI_EXIT_DISC attribute MAY be | lower metric should be preferred. If received over EBGP, the | |||
propagated over internal links to other BGP speakers within the same | MULTI_EXIT_DISC attribute MAY be propagated over IBGP to other BGP | |||
AS. The MULTI_EXIT_DISC attribute received from a neighboring AS MUST | speakers within the same AS. The MULTI_EXIT_DISC attribute received | |||
NOT be propagated to other neighboring ASs. | from a neighboring AS MUST NOT be propagated to other neighboring | |||
ASs. | ||||
A BGP speaker MUST IMPLEMENT a mechanism based on local configuration | A BGP speaker MUST IMPLEMENT a mechanism based on local configuration | |||
which allows the MULTI_EXIT_DISC attribute to be removed from a | which allows the MULTI_EXIT_DISC attribute to be removed from a | |||
route. This MAY be done prior to determining the degree of preference | route. This MAY be done prior to determining the degree of preference | |||
of the route and performing route selection (decision process phases | of the route and performing route selection (decision process phases | |||
1 and 2). | 1 and 2). | |||
An implementation MAY also (based on local configuration) alter the | An implementation MAY also (based on local configuration) alter the | |||
value of the MULTI_EXIT_DISC attribute received over an external | value of the MULTI_EXIT_DISC attribute received over EBGP. This MAY | |||
link. If it does so, it shall do so prior to determining the degree | be done prior to determining the degree of preference of the route | |||
of preference of the route and performing route selection (decision | and performing route selection (decision process phases 1 and 2). See | |||
process phases 1 and 2). | section 9.1.2.2 for necessary restrictions on this. | |||
5.1.5 LOCAL_PREF | 5.1.5 LOCAL_PREF | |||
LOCAL_PREF is a well-known attribute that SHALL be included in all | LOCAL_PREF is a well-known attribute that SHALL be included in all | |||
UPDATE messages that a given BGP speaker sends to the other internal | UPDATE messages that a given BGP speaker sends to the other internal | |||
peers. A BGP speaker SHALL calculate the degree of preference for | peers. A BGP speaker SHALL calculate the degree of preference for | |||
each external route based on the locally configured policy, and | each external route based on the locally configured policy, and | |||
include the degree of preference when advertising a route to its | include the degree of preference when advertising a route to its | |||
internal peers. The higher degree of preference MUST be preferred. A | internal peers. The higher degree of preference MUST be preferred. A | |||
BGP speaker shall use the degree of preference learned via LOCAL_PREF | BGP speaker shall use the degree of preference learned via LOCAL_PREF | |||
in its decision process (see section 9.1.1). | in its decision process (see section 9.1.1). | |||
A BGP speaker MUST NOT include this attribute in UPDATE messages that | A BGP speaker MUST NOT include this attribute in UPDATE messages that | |||
it sends to external peers, except for the case of BGP Confederations | it sends to external peers, except for the case of BGP Confederations | |||
[13]. If it is contained in an UPDATE message that is received from | [RFC3065]. If it is contained in an UPDATE message that is received | |||
an external peer, then this attribute MUST be ignored by the | from an external peer, then this attribute MUST be ignored by the | |||
receiving speaker, except for the case of BGP Confederations [13]. | receiving speaker, except for the case of BGP Confederations | |||
[RFC3065]. | ||||
5.1.6 ATOMIC_AGGREGATE | 5.1.6 ATOMIC_AGGREGATE | |||
ATOMIC_AGGREGATE is a well-known discretionary attribute. | ATOMIC_AGGREGATE is a well-known discretionary attribute. | |||
When a router aggregates several routes for the purpose of | When a router aggregates several routes for the purpose of advertise- | |||
advertisement to a particular peer, and the AS_PATH of the aggregated | ment to a particular peer, the AS_PATH of the aggregated route nor- | |||
route excludes at least some of the AS numbers present in the AS_PATH | mally includes an AS_SET formed from the set of AS from which the | |||
of the routes that are aggregated, the aggregated route, when | aggregate was formed. In many cases the network administrator can | |||
advertised to the peer, MUST include the ATOMIC_AGGREGATE attribute. | determine that the aggregate can safely be advertised without the | |||
AS_SET and not form route loops. | ||||
If an aggregate excludes at least some of the AS numbers present in | ||||
the AS_PATH of the routes that are aggregated as a result of dropping | ||||
the AS_SET, the aggregated route, when advertised to the peer, SHOULD | ||||
include the ATOMIC_AGGREGATE attribute. | ||||
A BGP speaker that receives a route with the ATOMIC_AGGREGATE | A BGP speaker that receives a route with the ATOMIC_AGGREGATE | |||
attribute MUST NOT remove the attribute from the route when | attribute SHOULD NOT remove the attribute from the route when propa- | |||
propagating it to other speakers. | gating it to other speakers. | |||
A BGP speaker that receives a route with the ATOMIC_AGGREGATE | A BGP speaker that receives a route with the ATOMIC_AGGREGATE | |||
attribute MUST NOT make any NLRI of that route more specific (as | attribute MUST NOT make any NLRI of that route more specific (as | |||
defined in 9.1.4) when advertising this route to other BGP speakers. | defined in 9.1.4) when advertising this route to other BGP speakers. | |||
A BGP speaker that receives a route with the ATOMIC_AGGREGATE | A BGP speaker that receives a route with the ATOMIC_AGGREGATE | |||
attribute needs to be cognizant of the fact that the actual path to | attribute needs to be cognizant of the fact that the actual path to | |||
destinations, as specified in the NLRI of the route, while having the | destinations, as specified in the NLRI of the route, while having the | |||
loop-free property, may not be the path specified in the AS_PATH | loop-free property, may not be the path specified in the AS_PATH | |||
attribute of the route. | attribute of the route. | |||
skipping to change at page 26, line 29 | skipping to change at page 30, line 18 | |||
in updates which are formed by aggregation (see Section 9.2.2.2). A | in updates which are formed by aggregation (see Section 9.2.2.2). A | |||
BGP speaker which performs route aggregation may add the AGGREGATOR | BGP speaker which performs route aggregation may add the AGGREGATOR | |||
attribute which shall contain its own AS number and IP address. The | attribute which shall contain its own AS number and IP address. The | |||
IP address should be the same as the BGP Identifier of the speaker. | IP address should be the same as the BGP Identifier of the speaker. | |||
6. BGP Error Handling. | 6. BGP Error Handling. | |||
This section describes actions to be taken when errors are detected | This section describes actions to be taken when errors are detected | |||
while processing BGP messages. | while processing BGP messages. | |||
When any of the conditions described here are detected, a | When any of the conditions described here are detected, a NOTIFICA- | |||
NOTIFICATION message with the indicated Error Code, Error Subcode, | TION message with the indicated Error Code, Error Subcode, and Data | |||
and Data fields is sent, and the BGP connection is closed. If no | fields is sent, and the BGP connection is closed, unless it is | |||
Error Subcode is specified, then a zero must be used. | explicitly stated that no NOTIFICATION message is to be sent and the | |||
BGP connection is not to be closed. If no Error Subcode is specified, | ||||
then a zero must be used. | ||||
The phrase "the BGP connection is closed" means that the transport | The phrase "the BGP connection is closed" means that the TCP connec- | |||
protocol connection has been closed, the associated Adj-RIB-In has | tion has been closed, the associated Adj-RIB-In has been cleared, and | |||
been cleared, and that all resources for that BGP connection have | that all resources for that BGP connection have been deallocated. | |||
been deallocated. Entries in the Loc-RIB associated with the remote | Entries in the Loc-RIB associated with the remote peer are marked as | |||
peer are marked as invalid. The fact that the routes have become | invalid. The fact that the routes have become invalid is passed to | |||
invalid is passed to other BGP peers before the routes are deleted | other BGP peers before the routes are deleted from the system. | |||
from the system. | ||||
Unless specified explicitly, the Data field of the NOTIFICATION | Unless specified explicitly, the Data field of the NOTIFICATION mes- | |||
message that is sent to indicate an error is empty. | sage that is sent to indicate an error is empty. | |||
6.1 Message Header error handling. | 6.1 Message Header error handling. | |||
All errors detected while processing the Message Header are indicated | All errors detected while processing the Message Header are indicated | |||
by sending the NOTIFICATION message with Error Code Message Header | by sending the NOTIFICATION message with Error Code Message Header | |||
Error. The Error Subcode elaborates on the specific nature of the | Error. The Error Subcode elaborates on the specific nature of the | |||
error. | error. | |||
The expected value of the Marker field of the message header is all | The expected value of the Marker field of the message header is all | |||
ones if the message type is OPEN. The expected value of the Marker | ones. If the Marker field of the message header is not as expected, | |||
field for all other types of BGP messages is determined based on the | then a synchronization error has occurred and the Error Subcode is | |||
presence of the Authentication Information Optional Parameter in the | set to Connection Not Synchronized. | |||
BGP OPEN message and the actual authentication mechanism (if the | ||||
Authentication Information in the BGP OPEN message is present). The | ||||
Marker field should be all ones if the OPEN message carried no | ||||
authentication information. If the Marker field of the message header | ||||
is not the expected one, then a synchronization error has occurred | ||||
and the Error Subcode is set to Connection Not Synchronized. | ||||
If the Length field of the message header is less than 19 or greater | If the Length field of the message header is less than 19 or greater | |||
than 4096, or if the Length field of an OPEN message is less than the | than 4096, or if the Length field of an OPEN message is less than the | |||
minimum length of the OPEN message, or if the Length field of an | minimum length of the OPEN message, or if the Length field of an | |||
UPDATE message is less than the minimum length of the UPDATE message, | UPDATE message is less than the minimum length of the UPDATE message, | |||
or if the Length field of a KEEPALIVE message is not equal to 19, or | or if the Length field of a KEEPALIVE message is not equal to 19, or | |||
if the Length field of a NOTIFICATION message is less than the | if the Length field of a NOTIFICATION message is less than the mini- | |||
minimum length of the NOTIFICATION message, then the Error Subcode is | mum length of the NOTIFICATION message, then the Error Subcode is set | |||
set to Bad Message Length. The Data field contains the erroneous | to Bad Message Length. The Data field contains the erroneous Length | |||
Length field. | field. | |||
If the Type field of the message header is not recognized, then the | If the Type field of the message header is not recognized, then the | |||
Error Subcode is set to Bad Message Type. The Data field contains the | Error Subcode is set to Bad Message Type. The Data field contains the | |||
erroneous Type field. | erroneous Type field. | |||
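The header checks in section 6.1 can be sketched as follows. This is a minimal illustration and not a reference implementation. It follows the right-hand (-18) text, where the Marker is always expected to be all ones. The numeric type codes, subcodes, and minimum lengths are taken from the message format definitions elsewhere in the specification rather than from this section, the order of the checks is an assumption, and the name `check_header` is hypothetical.

```python
import struct

MARKER = b"\xff" * 16
HEADER_LEN = 19
MAX_LEN = 4096

# Message type codes and their minimum lengths (from the message format
# definitions of the specification, not from section 6.1 itself).
OPEN, UPDATE, NOTIFICATION, KEEPALIVE = 1, 2, 3, 4
MIN_LENGTH = {OPEN: 29, UPDATE: 23, NOTIFICATION: 21, KEEPALIVE: 19}

# Message Header Error subcodes.
CONNECTION_NOT_SYNCHRONIZED = 1
BAD_MESSAGE_LENGTH = 2
BAD_MESSAGE_TYPE = 3


def check_header(header: bytes):
    """Return None if the 19-octet header is acceptable, otherwise a
    (subcode, data) pair for the NOTIFICATION message."""
    if len(header) != HEADER_LEN:
        raise ValueError("expected exactly 19 octets of header")
    marker, length, msg_type = struct.unpack("!16sHB", header)

    if marker != MARKER:
        return (CONNECTION_NOT_SYNCHRONIZED, b"")
    if length < HEADER_LEN or length > MAX_LEN:
        return (BAD_MESSAGE_LENGTH, struct.pack("!H", length))
    if msg_type not in MIN_LENGTH:
        return (BAD_MESSAGE_TYPE, bytes([msg_type]))

    # KEEPALIVE must be exactly 19 octets; other types have a minimum.
    if msg_type == KEEPALIVE:
        bad_length = length != HEADER_LEN
    else:
        bad_length = length < MIN_LENGTH[msg_type]
    if bad_length:
        return (BAD_MESSAGE_LENGTH, struct.pack("!H", length))
    return None
```

The Data value returned for a length error is the erroneous Length field and for a type error the erroneous Type field, as the text requires.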
6.2 OPEN message error handling. | 6.2 OPEN message error handling. | |||
All errors detected while processing the OPEN message are indicated | All errors detected while processing the OPEN message are indicated | |||
by sending the NOTIFICATION message with Error Code OPEN Message | by sending the NOTIFICATION message with Error Code OPEN Message | |||
Error. The Error Subcode elaborates on the specific nature of the | Error. The Error Subcode elaborates on the specific nature of the | |||
skipping to change at page 28, line 13 | skipping to change at page 31, line 39 | |||
received OPEN message), or if the smallest locally supported version | received OPEN message), or if the smallest locally supported version | |||
number is greater than the version the remote BGP peer bid, then the | number is greater than the version the remote BGP peer bid, then the | |||
smallest locally supported version number. | smallest locally supported version number. | |||
If the Autonomous System field of the OPEN message is unacceptable, | If the Autonomous System field of the OPEN message is unacceptable, | |||
then the Error Subcode is set to Bad Peer AS. The determination of | then the Error Subcode is set to Bad Peer AS. The determination of | |||
acceptable Autonomous System numbers is outside the scope of this | acceptable Autonomous System numbers is outside the scope of this | |||
protocol. | protocol. | |||
If the Hold Time field of the OPEN message is unacceptable, then the | If the Hold Time field of the OPEN message is unacceptable, then the | |||
Error Subcode MUST be set to Unacceptable Hold Time. An | Error Subcode MUST be set to Unacceptable Hold Time. An implementa- | |||
implementation MUST reject Hold Time values of one or two seconds. | tion MUST reject Hold Time values of one or two seconds. An imple- | |||
An implementation MAY reject any proposed Hold Time. An | mentation MAY reject any proposed Hold Time. An implementation which | |||
implementation which accepts a Hold Time MUST use the negotiated | accepts a Hold Time MUST use the negotiated value for the Hold Time. | |||
value for the Hold Time. | ||||
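A small sketch of the Hold Time check described above. The function name and the `local_minimum` parameter are hypothetical and stand in for whatever local policy an implementation applies under the "MAY reject any proposed Hold Time" rule. The subcode value 6 comes from the NOTIFICATION subcode list elsewhere in the specification.

```python
UNACCEPTABLE_HOLD_TIME = 6  # OPEN Message Error subcode


def check_hold_time(proposed: int, local_minimum: int = 3):
    """Return None if the proposed Hold Time is acceptable, otherwise the
    Error Subcode to send. `local_minimum` is an assumed local policy knob."""
    # Values of one or two seconds MUST be rejected.
    if proposed in (1, 2):
        return UNACCEPTABLE_HOLD_TIME
    # Local policy MAY reject other values; zero is left to pass here.
    if proposed != 0 and proposed < local_minimum:
        return UNACCEPTABLE_HOLD_TIME
    return None
```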
If the BGP Identifier field of the OPEN message is syntactically | If the BGP Identifier field of the OPEN message is syntactically | |||
incorrect, then the Error Subcode is set to Bad BGP Identifier. | incorrect, then the Error Subcode is set to Bad BGP Identifier. Syn- | |||
Syntactic correctness means that the BGP Identifier field represents | tactic correctness means that the BGP Identifier field represents a | |||
a valid IP host address. | valid IP host address. | |||
If one of the Optional Parameters in the OPEN message is not | If one of the Optional Parameters in the OPEN message is not | |||
recognized, then the Error Subcode is set to Unsupported Optional | recognized, then the Error Subcode is set to Unsupported Optional | |||
Parameters. | Parameters. | |||
If one of the Optional Parameters in the OPEN message is recognized, | If one of the Optional Parameters in the OPEN message is recognized, | |||
but is malformed, then the Error Subcode is set to 0 (Unspecific). | but is malformed, then the Error Subcode is set to 0 (Unspecific). | |||
If the OPEN message carries Authentication Information (as an | ||||
Optional Parameter), then the corresponding authentication procedure | ||||
is invoked. If the authentication procedure (based on Authentication | ||||
Code and Authentication Data) fails, then the Error Subcode is set to | ||||
Authentication Failure. | ||||
6.3 UPDATE message error handling. | 6.3 UPDATE message error handling. | |||
All errors detected while processing the UPDATE message are indicated | All errors detected while processing the UPDATE message are indicated | |||
by sending the NOTIFICATION message with Error Code UPDATE Message | by sending the NOTIFICATION message with Error Code UPDATE Message | |||
Error. The Error Subcode elaborates on the specific nature of the | Error. The Error Subcode elaborates on the specific nature of the | |||
error. | error. | |||
Error checking of an UPDATE message begins by examining the path | Error checking of an UPDATE message begins by examining the path | |||
attributes. If the Withdrawn Routes Length or Total Attribute Length | attributes. If the Withdrawn Routes Length or Total Attribute Length | |||
is too large (i.e., if Withdrawn Routes Length + Total Attribute | is too large (i.e., if Withdrawn Routes Length + Total Attribute | |||
Length + 23 exceeds the message Length), then the Error Subcode is | Length + 23 exceeds the message Length), then the Error Subcode is | |||
set to Malformed Attribute List. | set to Malformed Attribute List. | |||
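The length rule above can be illustrated as follows. The constant 23 is the 19-octet header plus the two 2-octet length fields. This is a sketch only: the function name is hypothetical, and the input is assumed to be one complete UPDATE message including its header.

```python
import struct


def update_lengths_ok(message: bytes) -> bool:
    """Return False when Withdrawn Routes Length + Total Attribute Length
    + 23 exceeds the message Length (Malformed Attribute List)."""
    if len(message) < 23:
        return False
    (msg_len,) = struct.unpack_from("!H", message, 16)
    (withdrawn_len,) = struct.unpack_from("!H", message, 19)

    # The Total Attribute Length field follows the withdrawn routes.
    attr_len_offset = 21 + withdrawn_len
    if attr_len_offset + 2 > min(msg_len, len(message)):
        return False
    (total_attr_len,) = struct.unpack_from("!H", message, attr_len_offset)

    return withdrawn_len + total_attr_len + 23 <= msg_len
```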
If any recognized attribute has Attribute Flags that conflict with | If any recognized attribute has Attribute Flags that conflict with | |||
the Attribute Type Code, then the Error Subcode is set to Attribute | the Attribute Type Code, then the Error Subcode is set to Attribute | |||
Flags Error. The Data field contains the erroneous attribute (type, | Flags Error. The Data field contains the erroneous attribute (type, | |||
length and value). | length and value). | |||
If any recognized attribute has Attribute Length that conflicts with | If any recognized attribute has Attribute Length that conflicts with | |||
the expected length (based on the attribute type code), then the | the expected length (based on the attribute type code), then the | |||
Error Subcode is set to Attribute Length Error. The Data field | Error Subcode is set to Attribute Length Error. The Data field con- | |||
contains the erroneous attribute (type, length and value). | tains the erroneous attribute (type, length and value). | |||
If any of the mandatory well-known attributes are not present, then | If any of the mandatory well-known attributes are not present, then | |||
the Error Subcode is set to Missing Well-known Attribute. The Data | the Error Subcode is set to Missing Well-known Attribute. The Data | |||
field contains the Attribute Type Code of the missing well-known | field contains the Attribute Type Code of the missing well-known | |||
attribute. | attribute. | |||
If any of the mandatory well-known attributes are not recognized, | If any of the mandatory well-known attributes are not recognized, | |||
then the Error Subcode is set to Unrecognized Well-known Attribute. | then the Error Subcode is set to Unrecognized Well-known Attribute. | |||
The Data field contains the unrecognized attribute (type, length and | The Data field contains the unrecognized attribute (type, length and | |||
value). | value). | |||
If the ORIGIN attribute has an undefined value, then the Error | If the ORIGIN attribute has an undefined value, then the Error Sub- | |||
Subcode is set to Invalid Origin Attribute. The Data field contains | code is set to Invalid Origin Attribute. The Data field contains the | |||
the unrecognized attribute (type, length and value). | unrecognized attribute (type, length and value). | |||
If the NEXT_HOP attribute field is syntactically incorrect, then the | If the NEXT_HOP attribute field is syntactically incorrect, then the | |||
Error Subcode is set to Invalid NEXT_HOP Attribute. The Data field | Error Subcode is set to Invalid NEXT_HOP Attribute. The Data field | |||
contains the incorrect attribute (type, length and value). Syntactic | contains the incorrect attribute (type, length and value). Syntactic | |||
correctness means that the NEXT_HOP attribute represents a valid IP | correctness means that the NEXT_HOP attribute represents a valid IP | |||
host address. Semantic correctness applies only to the external BGP | host address. | |||
links, and only when the sender and the receiving speaker are one IP | ||||
hop away from each other. To be semantically correct, the IP address | The IP address in the NEXT_HOP must meet the following criteria to be | |||
in the NEXT_HOP must not be the IP address of the receiving speaker, | considered semantically correct: | |||
and the NEXT_HOP IP address must either be the sender's IP address | ||||
(used to establish the BGP session), or the interface associated with | a) It must not be the IP address of the receiving speaker | |||
the NEXT_HOP IP address must share a common subnet with the receiving | ||||
BGP speaker. If the NEXT_HOP attribute is semantically incorrect, the | b) In the case of an EBGP where the sender and receiver are one IP | |||
error should be logged, and the route should be ignored. In this | hop away from each other, either the IP address in the NEXT_HOP | |||
case, no NOTIFICATION message should be sent. | must be the sender's IP address (that is used to establish the BGP | |||
connection), or the interface associated with the NEXT_HOP IP | ||||
address must share a common subnet with the receiving BGP speaker. | ||||
If the NEXT_HOP attribute is semantically incorrect, the error should | ||||
be logged, and the route should be ignored. In this case, no | ||||
NOTIFICATION message should be sent, and the connection should not be closed. | ||||
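The two semantic criteria for NEXT_HOP in the right-hand (-18) text can be sketched as below. The function and parameter names are hypothetical. Treating "shares a common subnet" as "the NEXT_HOP address falls inside one of the receiver's directly connected subnets" is a simplifying assumption.

```python
import ipaddress


def next_hop_semantically_ok(next_hop, local_addrs, *, ebgp_one_hop,
                             sender_addr, local_subnets):
    """Return True if NEXT_HOP passes criteria a) and b)."""
    nh = ipaddress.ip_address(next_hop)

    # a) It must not be an IP address of the receiving speaker.
    if nh in {ipaddress.ip_address(a) for a in local_addrs}:
        return False

    # b) For EBGP peers one IP hop apart: either the sender's own address,
    #    or an address on a subnet shared with the receiver.
    if ebgp_one_hop:
        if nh == ipaddress.ip_address(sender_addr):
            return True
        return any(nh in ipaddress.ip_network(n) for n in local_subnets)
    return True
```

A False result here means the route is logged and ignored. It does not trigger a NOTIFICATION message or close the connection.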
The AS_PATH attribute is checked for syntactic correctness. If the | The AS_PATH attribute is checked for syntactic correctness. If the | |||
path is syntactically incorrect, then the Error Subcode is set to | path is syntactically incorrect, then the Error Subcode is set to | |||
Malformed AS_PATH. | Malformed AS_PATH. | |||
The information carried by the AS_PATH attribute is checked for AS | If the UPDATE message is received from an external peer, the local | |||
loops. AS loop detection is done by scanning the full AS path (as | system MAY check whether the leftmost AS in the AS_PATH attribute is | |||
specified in the AS_PATH attribute), and checking that the autonomous | equal to the autonomous system number of the peer that sent the | |||
system number of the local system does not appear in the AS path. If | message. If the check determines that this is not the case, the Error | |||
the autonomous system number appears in the AS path the route may be | Subcode is set to Malformed AS_PATH. | |||
stored in the Adj-RIB-In, but unless the router is configured to | ||||
accept routes with its own autonomous system in the AS path, the | ||||
route shall not be passed to the BGP Decision Process. Operations of | ||||
a router that is configured to accept routes with its own autonomous | ||||
system number in the AS path are outside the scope of this document. | ||||
If an optional attribute is recognized, then the value of this | If an optional attribute is recognized, then the value of this | |||
attribute is checked. If an error is detected, the attribute is | attribute is checked. If an error is detected, the attribute is dis- | |||
discarded, and the Error Subcode is set to Optional Attribute Error. | carded, and the Error Subcode is set to Optional Attribute Error. | |||
The Data field contains the attribute (type, length and value). | The Data field contains the attribute (type, length and value). | |||
If any attribute appears more than once in the UPDATE message, then | If any attribute appears more than once in the UPDATE message, then | |||
the Error Subcode is set to Malformed Attribute List. | the Error Subcode is set to Malformed Attribute List. | |||
The NLRI field in the UPDATE message is checked for syntactic | The NLRI field in the UPDATE message is checked for syntactic valid- | |||
validity. If the field is syntactically incorrect, then the Error | ity. If the field is syntactically incorrect, then the Error Subcode | |||
Subcode is set to Invalid Network Field. | is set to Invalid Network Field. | |||
If a prefix in the NLRI field is semantically incorrect (e.g., an | If a prefix in the NLRI field is semantically incorrect (e.g., an | |||
unexpected multicast IP address), an error should be logged locally, | unexpected multicast IP address), an error should be logged locally, | |||
and the prefix should be ignored. | and the prefix should be ignored. | |||
An UPDATE message that contains correct path attributes, but no NLRI, | An UPDATE message that contains correct path attributes, but no NLRI, | |||
shall be treated as a valid UPDATE message. | shall be treated as a valid UPDATE message. | |||
6.4 NOTIFICATION message error handling. | 6.4 NOTIFICATION message error handling. | |||
If a peer sends a NOTIFICATION message, and there is an error in that | If a peer sends a NOTIFICATION message, and the receiver of the mes- | |||
message, there is unfortunately no means of reporting this error via | sage detects an error in that message, the receiver can not use a | |||
a subsequent NOTIFICATION message. Any such error, such as an | NOTIFICATION message to report this error back to the peer. Any such | |||
unrecognized Error Code or Error Subcode, should be noticed, logged | error, such as an unrecognized Error Code or Error Subcode, should be | |||
locally, and brought to the attention of the administration of the | noticed, logged locally, and brought to the attention of the adminis- | |||
peer. The means to do this, however, lies outside the scope of this | tration of the peer. The means to do this, however, lies outside the | |||
document. | scope of this document. | |||
6.5 Hold Timer Expired error handling. | 6.5 Hold Timer Expired error handling. | |||
If a system does not receive successive KEEPALIVE and/or UPDATE | If a system does not receive successive KEEPALIVE and/or UPDATE | |||
and/or NOTIFICATION messages within the period specified in the Hold | and/or NOTIFICATION messages within the period specified in the Hold | |||
Time field of the OPEN message, then the NOTIFICATION message with | Time field of the OPEN message, then the NOTIFICATION message with | |||
Hold Timer Expired Error Code must be sent and the BGP connection | Hold Timer Expired Error Code must be sent and the BGP connection | |||
closed. | closed. | |||
6.6 Finite State Machine error handling. | 6.6 Finite State Machine error handling. | |||
skipping to change at page 31, line 20 | skipping to change at page 34, line 37 | |||
with Error Code Finite State Machine Error. | with Error Code Finite State Machine Error. | |||
6.7 Cease. | 6.7 Cease. | |||
In the absence of any fatal errors (that are indicated in this section), | In the absence of any fatal errors (that are indicated in this section), | |||
a BGP peer may choose at any given time to close its BGP connection | a BGP peer may choose at any given time to close its BGP connection | |||
by sending the NOTIFICATION message with Error Code Cease. However, | by sending the NOTIFICATION message with Error Code Cease. However, | |||
the Cease NOTIFICATION message must not be used when a fatal error | the Cease NOTIFICATION message must not be used when a fatal error | |||
indicated by this section does exist. | indicated by this section does exist. | |||
A BGP speaker may support the ability to impose a (locally | A BGP speaker may support the ability to impose a (locally | |||
configured) upper bound on the number of address prefixes the speaker | configured) upper bound on the number of address prefixes the speaker is | |||
is willing to accept from a neighbor. When the upper bound is | willing to accept from a neighbor. When the upper bound is reached, | |||
reached, the speaker (under control of local configuration) may | the speaker (under control of local configuration) may either (a) | |||
either (a) discard new address prefixes from the neighbor, or (b) | discard new address prefixes from the neighbor (while maintaining BGP | |||
terminate the BGP peering with the neighbor. If the BGP speaker | connection with the neighbor), or (b) terminate the BGP connection | |||
decides to terminate its peering with a neighbor because the number | with the neighbor. If the BGP speaker decides to terminate its BGP | |||
of address prefixes received from the neighbor exceeds the locally | connection with a neighbor because the number of address prefixes | |||
configured upper bound, then the speaker must send to the neighbor a | received from the neighbor exceeds the locally configured upper | |||
NOTIFICATION message with the Error Code Cease. | bound, then the speaker must send to the neighbor a NOTIFICATION mes- | |||
sage with the Error Code Cease. | ||||
6.8 Connection collision detection. | 6.8 BGP connection collision detection. | |||
If a pair of BGP speakers try simultaneously to establish a BGP | If a pair of BGP speakers try simultaneously to establish a BGP con- | |||
connection to each other, then two parallel connections between this | nection to each other, then two parallel connections between this | |||
pair of speakers might well be formed. If the source IP address used | pair of speakers might well be formed. If the source IP address used | |||
by one of these connections is the same as the destination IP address | by one of these connections is the same as the destination IP address | |||
used by the other, and the destination IP address used by the first | used by the other, and the destination IP address used by the first | |||
connection is the same as the source IP address used by the other, we | connection is the same as the source IP address used by the other, we | |||
refer to this situation as connection collision. Clearly in the | refer to this situation as connection collision. Clearly in the | |||
presence of connection collision, one of these connections must be | presence of connection collision, one of these connections must be | |||
closed. | closed. | |||
Based on the value of the BGP Identifier a convention is established | Based on the value of the BGP Identifier a convention is established | |||
for detecting which BGP connection is to be preserved when a | for detecting which BGP connection is to be preserved when a colli- | |||
collision does occur. The convention is to compare the BGP | sion does occur. The convention is to compare the BGP Identifiers of | |||
Identifiers of the peers involved in the collision and to retain only | the peers involved in the collision and to retain only the connection | |||
the connection initiated by the BGP speaker with the higher-valued | initiated by the BGP speaker with the higher-valued BGP Identifier. | |||
BGP Identifier. | ||||
Upon receipt of an OPEN message, the local system must examine all of | Upon receipt of an OPEN message, the local system must examine all of | |||
its connections that are in the OpenConfirm state. A BGP speaker may | its connections that are in the OpenConfirm state. A BGP speaker may | |||
also examine connections in an OpenSent state if it knows the BGP | also examine connections in an OpenSent state if it knows the BGP | |||
Identifier of the peer by means outside of the protocol. If among | Identifier of the peer by means outside of the protocol. If among | |||
these connections there is a connection to a remote BGP speaker whose | these connections there is a connection to a remote BGP speaker whose | |||
BGP Identifier equals the one in the OPEN message, and this | BGP Identifier equals the one in the OPEN message, and this connec- | |||
connection collides with the connection over which the OPEN message | tion collides with the connection over which the OPEN message is | |||
is received then the local system performs the following collision | received then the local system performs the following collision reso- | |||
resolution procedure: | lution procedure: | |||
1. The BGP Identifier of the local system is compared to the BGP | 1. The BGP Identifier of the local system is compared to the BGP | |||
Identifier of the remote system (as specified in the OPEN | Identifier of the remote system (as specified in the OPEN mes- | |||
message). | sage). Comparing BGP Identifiers is done by treating them as | |||
(4-octet long) unsigned integers. | ||||
2. If the value of the local BGP Identifier is less than the | 2. If the value of the local BGP Identifier is less than the | |||
remote one, the local system closes the BGP connection that already | remote one, the local system closes the BGP connection that already | |||
exists (the one that is already in the OpenConfirm state), and | exists (the one that is already in the OpenConfirm state), and | |||
accepts the BGP connection initiated by the remote system. | accepts the BGP connection initiated by the remote system. | |||
3. Otherwise, the local system closes the newly created BGP connection | 3. Otherwise, the local system closes the newly created BGP connection | |||
(the one associated with the newly received OPEN message), and | (the one associated with the newly received OPEN message), and | |||
continues to use the existing one (the one that is already in the | continues to use the existing one (the one that is already in the | |||
OpenConfirm state). | OpenConfirm state). | |||
Comparing BGP Identifiers is done by treating them as (4-octet | ||||
long) unsigned integers. | ||||
Unless allowed via configuration, a connection collision with an | Unless allowed via configuration, a connection collision with an | |||
existing BGP connection that is in Established state causes | existing BGP connection that is in Established state causes closing | |||
closing of the newly created connection. | of the newly created connection. | |||
Note that a connection collision cannot be detected with | Note that a connection collision can not be detected with connections | |||
connections that are in Idle, or Connect, or Active states. | that are in Idle, or Connect, or Active states. | |||
Closing the BGP connection (that results from the collision | Closing the BGP connection (that results from the collision resolu- | |||
resolution procedure) is accomplished by sending the NOTIFICATION | tion procedure) is accomplished by sending the NOTIFICATION message | |||
message with the Error Code Cease. | with the Error Code Cease. | |||
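The collision resolution procedure above can be sketched as follows. The function name and the string return values are hypothetical. The only rule taken from the text is that BGP Identifiers are compared as 4-octet unsigned integers, and that the speaker with the lower identifier gives up its existing (OpenConfirm) connection.

```python
import ipaddress


def connection_to_close(local_id: str, remote_id: str) -> str:
    """Decide which of two colliding connections the local system closes.

    Returns "existing" (the one already in OpenConfirm) or "new" (the one
    on which the OPEN message was just received)."""
    # Compare the identifiers as 4-octet unsigned integers.
    local = int(ipaddress.IPv4Address(local_id))
    remote = int(ipaddress.IPv4Address(remote_id))

    if local < remote:
        return "existing"
    return "new"
```

The closed connection is then torn down with a NOTIFICATION message carrying Error Code Cease. The unsigned comparison matters for identifiers whose first octet is 128 or higher, which a signed 32-bit comparison would order incorrectly.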
7. BGP Version Negotiation. | 7. BGP Version Negotiation | |||
BGP speakers may negotiate the version of the protocol by making | BGP speakers may negotiate the version of the protocol by making mul- | |||
multiple attempts to open a BGP connection, starting with the highest | tiple attempts to open a BGP connection, starting with the highest | |||
version number each supports. If an open attempt fails with an Error | version number each supports. If an open attempt fails with an Error | |||
Code OPEN Message Error, and an Error Subcode Unsupported Version | Code OPEN Message Error, and an Error Subcode Unsupported Version | |||
Number, then the BGP speaker has available the version number it | Number, then the BGP speaker has available the version number it | |||
tried, the version number its peer tried, the version number passed | tried, the version number its peer tried, the version number passed | |||
by its peer in the NOTIFICATION message, and the version numbers that | by its peer in the NOTIFICATION message, and the version numbers that | |||
it supports. If the two peers do support one or more common versions, | it supports. If the two peers do support one or more common versions, | |||
then this will allow them to rapidly determine the highest common | then this will allow them to rapidly determine the highest common | |||
version. In order to support BGP version negotiation, future versions | version. In order to support BGP version negotiation, future versions | |||
of BGP must retain the format of the OPEN and NOTIFICATION messages. | of BGP must retain the format of the OPEN and NOTIFICATION messages. | |||
8. BGP Finite State machine. | 8. BGP Finite State machine | |||
This section specifies BGP operation in terms of a Finite State | ||||
Machine (FSM). Following is a brief summary and overview of BGP | ||||
operations by state as determined by this FSM. | ||||
Initially BGP is in the Idle state. | ||||
Idle state: | ||||
A manual start event is a start event initiated by an operator. | ||||
An automatic start event is a start event generated by the | ||||
system. | ||||
In this state BGP refuses all incoming BGP connections. No | ||||
resources are allocated to the peer. In response to a Start | ||||
event (manual or automatic), the local system: | ||||
- initializes all BGP resources, | ||||
- starts the ConnectRetry timer, | ||||
- initiates a transport connection to the other BGP peer, | ||||
- listens for a connection that may be initiated by the | ||||
remote BGP peer, and | ||||
- changes its state to connect. | This section specifies the BGP operation in terms of a Finite State | |||
Machine (FSM). The section falls into 2 parts: | ||||
The exact value of the ConnectRetry timer is a local matter, | 1) Description of Events for the State machine (section 8.1) | |||
but it should be sufficiently large to allow TCP | 2) Description of the FSM (section 8.2) | |||
initialization. | ||||
Any other event received in the Idle state is ignored. | Session Attributes required for each connection are: | |||
IdleHold state: | 1) State | |||
2) Connect Retry timer | ||||
3) Hold timer | ||||
4) Hold time | ||||
5) Keepalive timer | ||||
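The five session attributes listed above could be held in one record per connection. The following is a minimal sketch; the class and field names, the choice of seconds as the unit, and the use of 0 for "not running" are assumptions, not definitions from this document.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """FSM states named in this section."""
    IDLE = "Idle"
    CONNECT = "Connect"
    ACTIVE = "Active"
    OPEN_SENT = "OpenSent"
    OPEN_CONFIRM = "OpenConfirm"
    ESTABLISHED = "Established"


@dataclass
class SessionAttributes:
    """Per-connection attributes; timer fields hold remaining seconds (assumed)."""
    state: State = State.IDLE          # 1) State
    connect_retry_timer: float = 0.0   # 2) Connect Retry timer
    hold_timer: float = 0.0            # 3) Hold timer
    hold_time: int = 0                 # 4) Hold time
    keepalive_timer: float = 0.0       # 5) Keepalive timer
```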
The IdleHold state keeps the system in "Idle" mode until a | 8.1 Events for the BGP FSM | |||
certain time period has passed or an operator intervenes to | ||||
manually restart the connection. This "IdleHold timeout" | ||||
prevents persistent flapping of a BGP peering session. | ||||
Upon entering the Idle Hold state, if the IdleHoldTimer exceeds | 8.1.1 Administrative Events | |||
the local limit the "Keep Idle" flag is set. | ||||
Upon receiving a Manual start, the local system: | Please note that only Event 1 (manual start) and Event 2 (manual | |||
stop) are mandatory administrative events. All other administrative | ||||
events are optional. | ||||
- clears the IdleHoldtimer, | Event1: Manual start | |||
- clears "keep Idle" flag | Definition: Administrator manually starts peer | |||
connection. | ||||
Status: Mandatory | ||||
- initializes all BGP resources, | Event2: Manual stop | |||
- starts the ConnectRetry timer, | Definition: Local system administrator manually | |||
stops the peer connection. | ||||
- initiates a transport connection to the other BGP peer, | Status: Mandatory | |||
- listens for a connection that may be initiated by the | Event3: Automatic start | |||
remote BGPPeer, and | ||||
- changes its state to connect. | Definition: Local system automatically starts the | |||
BGP connection. | ||||
Upon receiving an IdleHoldtimer expired event, the local system | Status: Optional depending on local system | |||
checks to see that the Keep Idle flag is set. If the Keep Idle | ||||
flag is set, the system stays in the "Idle Hold" state. | ||||
If the Keep Idle flag is not set, the local system: | Event4: Manual start with passive TCP establishment | |||
- clears the IdleHoldtimer, | Definition: Administrator manually starts the peer | |||
connection, but has the passive flag | ||||
enabled. The passive flag indicates | ||||
that the peer will listen prior to | ||||
establishing the connection. | ||||
- and transitions the state to Idle. | Status: Optional depending on local system | |||
Getting out of the IdleHoldstate requires either operator | Event5: Automatic start with passive TCP establishment | |||
intervention via a manual start or the IdleHoldtimer to expire | ||||
with the "Keep Idle" flag to be clear. | ||||
Any other event received in the IdleHold state is ignored. | Definition: Local system automatically starts the | |||
BGP connection with the passive flag | ||||
enabled. The passive flag indicates | ||||
that the peer will listen prior to | ||||
establishing a connection. | ||||
Connect State: | Status: Optional depending on local system use | |||
of a passive connection. | ||||
In this state, BGP is waiting for the transport protocol | Event6: Automatic start with bgp_stop_flap option set | |||
connection to be completed. | ||||
If the transport connection succeeds, the local system: | Definition: Local system automatically starts the | |||
BGP peer connection with persistent peer | ||||
oscillation damping enabled. The exact | ||||
method of damping persistent peer | ||||
oscillations is left up to the | ||||
implementation. These methods of | ||||
damping persistent BGP adjacency | ||||
flapping are outside the scope of this | ||||
document. | ||||
- clears the ConnectRetry timer, | Status: Optional, used only if the bgp peer has | |||
Enabled a method of damping persistent | ||||
BGP peer flapping. | ||||
- completes initialization, | Event7: Auto stop | |||
- send an Open message to its peer, | Definition: Local system automatically stops the | |||
BGP connection. | ||||
- set Hold timer to a large value, and | Status: Optional depending on local system | |||
- changes its state to Open Sent. | 8.1.2 Timer Events | |||
A hold timer value of 4 minutes is suggested. | Event8: Idle hold timer expires | |||
If the transport protocol connection fails (e.g., | Definition: Idle Hold timer expires. The Idle | |||
retransmission timeout), the local system: | Hold Timer is only used when persistent | |||
BGP oscillation damping functions are | ||||
enabled. | ||||
- restarts the ConnectRetry timer, | Status: Optional. Used when persistent | |||
BGP peer oscillation damping functions | ||||
are enabled. | ||||
- continues to listen for a connection that may be initiated | Event9: Connect retry timer expires | |||
by the remote BGP peer, and | ||||
- changes its state to Active. | Definition: An event triggered by the expiration of | |||
the ConnectRetry timer. | ||||
In response to the ConnectRetry timer expired event, the local | Status: Mandatory | |||
system: | ||||
- restarts the ConnectRetry timer, | Event10: Hold time expires | |||
- initiates a transport connection to the other BGP peer, | Definition: An event generated when the HoldTimer | |||
expires. | ||||
- continues to listen for a connection that may be initiated | Status: Mandatory | |||
by the remote BGP peer, and | ||||
- stays in Connect state. | Event11: Keepalive timer expires | |||
The start event (manual or automatic) is ignored in the Connect | Definition: A periodic event generated due to the | |||
state. | expiration of the KeepAlive Timer. | |||
In response to any other event (initiated by the system or | Status: Mandatory | |||
operator), the local system: | ||||
- IdleHoldtimer = 2**(ConnectRetryCnt)*60 | Event12: DelayBGP open timer expires | |||
- Increment ConnectRetryCnt by 1, | Definition: A timer that delays sending of the BGP | |||
Open message for n seconds after the | ||||
TCP connection has been completed. | ||||
- Set connect retry timer to zero, | Status: Optional | |||
- Drops TCP connection, | ||||
- Releases all BGP resources, and | 8.1.3 TCP Connection based Events | |||
- Goes to IdleHoldstate | Event13: TCP connection indication & valid remote peer | |||
Active State: | Definition: Event indicating a TCP connection | |||
request with a valid source IP address and TCP | ||||
port, and valid destination IP address | ||||
and TCP Port. The definition of | ||||
invalid source, and invalid destination | ||||
IP address is left to the implementation. | ||||
BGP's destination port should be port | ||||
179 as defined by IANA. | ||||
In this state BGP is trying to acquire a peer by listening for | TCP connection request is denoted by | |||
and accepting a transport protocol connection. | the local system receiving a TCP SYN. | |||
If the transport connection succeeds, the local system: | Status: Mandatory | |||
Event14: RCV TCP connection indication with invalid source or | ||||
destination | ||||
- clears the ConnectRetry timer, | Definition: TCP connection request received with either | |||
an invalid source address or port | ||||
number or an invalid destination | ||||
address or port number. BGP destination | ||||
port number should be 179 as defined | ||||
by IANA. | ||||
- completes the initialization, | Again, a TCP connection request is | |||
denoted by local system receiving a TCP | ||||
SYN with an invalid source port or | ||||
destination address or port number. | ||||
- sends the Open message to its peer, | Status: Mandatory | |||
- sets its Hold timer to a large value, | Event15: TCP connection request sent received an ACK. | |||
- and changes its state to OpenSent. | Definition: Local system's request to establish a TCP | |||
connection to the remote side received | ||||
an ACK. | ||||
A Hold timer value of 4 minutes is suggested. | The local system's TCP session sent a TCP | |||
SYN, and received a TCP SYN, ACK pair of | ||||
messages, and Sent a TCP ACK. | ||||
In response to the ConnectRetry timer expired event, the local | Status: Mandatory | |||
system: | ||||
- restarts the ConnectRetry timer, | Event16: TCP connection confirmed | |||
- initiates a transport connection to the other BGP peer, | Definition: The local system has received a confirmation that | |||
the TCP connection has been established by | ||||
the remote site. | ||||
- continues to listen for connection that may be initiated | The remote peer's TCP engine sent a TCP SYN. | |||
by remote BGP peer, | The local peer's TCP engine sent a SYN, ACK | |||
pair, and now has received a final ACK. | ||||
- and changes its state to Connect. | Status: Mandatory | |||
If the local system does not allow BGP connections with | Event17: TCP connection fails | |||
unconfigured peers, then the local system: | ||||
- rejects connections from IP addresses that are not | Definition: This BGP peer receives a TCP | |||
configured peers, | connection failure notice. | |||
- and remains in the Active state. | The remote BGP peer's TCP machine could have | |||
sent a FIN. The local peer would respond | ||||
with a FIN-ACK. Another alternative is that | ||||
the local peer indicated a timeout in the | ||||
TCP session and downed the connection. | ||||
The start events (initiated by the system or operator) are | Status: Mandatory | |||
ignored in the Active state. | ||||
In response to any other event (initiated by the system or | 8.1.4 BGP Messages based Events | |||
operator), the local system: | ||||
- IdleHoldtimer = 2**(ConnectRetryCnt)*60 | Event18: BGPOpen | |||
- Increment ConnectRetryCnt by 1, | Definition: An event indicating that a valid Open | |||
message has been received. | ||||
- Set connect retry timer to zero, and | Status: Mandatory | |||
- Drops TCP connection, | Event19: BGPOpen with BGP Delay Open Timer running | |||
- Releases all BGP resources, | Definition: An event indicating that a valid Open | |||
message has been successfully | ||||
received for a peer that is | ||||
currently delaying the sending of a | ||||
BGP Open message. | ||||
- Goes to IdleHold state. | Status: Optional | |||
Open Sent: | Event20: BGPHeaderErr | |||
In this state BGP waits for an Open Message from its peer. | Definition: BGP message header is not valid. | |||
When an OPEN message is received, all fields are checked for | ||||
correctness. If the BGP message header checking or OPEN | ||||
message check detects an error (see Section 6.2), or a | ||||
connection collision (see Section 6.8) the local system: | ||||
- sends a NOTIFICATION message | Status: Mandatory | |||
- IdleHoldtimer = 2**(ConnectRetryCnt)*60 | Event21: BGPOpenMsgErr | |||
- Increment ConnectRetryCnt by 1, | Definition: A BGP Open message has been received | |||
with errors. | ||||
- Set connect retry timer to zero, and | Status: Mandatory | |||
- Drops TCP connection, | Event22: Open collision dump | |||
- Releases all BGP resources, | Definition: An event generated administratively | |||
when a connection Collision has been | ||||
detected while processing an incoming | ||||
Open message. This connection has been | ||||
selected to be disconnected. See section | ||||
6.8 for more information on collision | ||||
detection. | ||||
- Goes to IdleHold state. | Event 22 is an administrative event that could | |||
occur if the FSM is implemented as two | ||||
linked state machines. | ||||
If there are no errors in the OPEN message, the local system: | Status: Optional | |||
- sends a KEEPALIVE message and | Event23: NotifMsgVerErr | |||
- sets a KeepAlive timer (via the text below) | Definition: An event is generated when a | |||
NOTIFICATION message with "version | ||||
error" is received. | ||||
- set the Hold timer according to the negotiated value (see | Status: Mandatory | |||
section 4.2), | ||||
- set the state to Open Confirm. | Event24: NotifMsg | |||
If the negotiated Hold time value is zero, then the Hold Time | Definition: An event is generated when a | |||
timer and KeepAlive timers are not started. If the value of | NOTIFICATION message is received and | |||
the Autonomous System field is the same as the local Autonomous | the error code is anything but | |||
System number, then the connection is an "internal" connection; | "version error". | |||
otherwise, it is an "external" connection. (This will impact | ||||
UPDATE processing as described below.) | ||||
If a disconnect NOTIFICATION is received from the underlying | Status: Mandatory | |||
transport protocol, the local system: | ||||
- closes the BGP connection, | Event25: KeepAliveMsg | |||
- restarts the Connect Retry timer, | Definition: An event is generated when a KEEPALIVE | |||
message is received. | ||||
- and continues to listen for a connection that may be | Status: Mandatory | |||
initiated by the remote BGP peer, and goes into Active | ||||
state. | ||||
If the Hold Timer expires, the local system: | Event26: UpdateMsg | |||
- send a NOTIFICATION message with error code Hold Timer | Definition: An event is generated when a valid | |||
Expired, | Update message is received. | |||
- IdleHoldtimer = 2**(ConnectRetryCnt)*60 | Status: Mandatory | |||
- Increment ConnectRetryCnt by 1, | Event27: UpdateMsgErr | |||
- Set connect retry timer to zero, and | Definition: An event is generated when an invalid | |||
Update message is received. | ||||
- Drops TCP connection, | Status: Mandatory | |||
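The event list in section 8.1 of the newer text can be captured as an enumeration. This is a sketch only: the member names are invented for illustration, while the numbers and the Mandatory/Optional split follow the Status lines given for Event1 through Event27.

```python
from enum import IntEnum


class Event(IntEnum):
    """FSM input events, numbered as in section 8.1 (names are assumed)."""
    MANUAL_START = 1
    MANUAL_STOP = 2
    AUTOMATIC_START = 3
    MANUAL_START_PASSIVE = 4
    AUTOMATIC_START_PASSIVE = 5
    AUTOMATIC_START_STOP_FLAP = 6
    AUTO_STOP = 7
    IDLE_HOLD_TIMER_EXPIRES = 8
    CONNECT_RETRY_TIMER_EXPIRES = 9
    HOLD_TIME_EXPIRES = 10
    KEEPALIVE_TIMER_EXPIRES = 11
    DELAY_OPEN_TIMER_EXPIRES = 12
    TCP_INDICATION_VALID = 13
    TCP_INDICATION_INVALID = 14
    TCP_REQUEST_ACKED = 15
    TCP_CONFIRMED = 16
    TCP_FAILS = 17
    BGP_OPEN = 18
    BGP_OPEN_DELAY_TIMER_RUNNING = 19
    BGP_HEADER_ERR = 20
    BGP_OPEN_MSG_ERR = 21
    OPEN_COLLISION_DUMP = 22
    NOTIF_MSG_VER_ERR = 23
    NOTIF_MSG = 24
    KEEPALIVE_MSG = 25
    UPDATE_MSG = 26
    UPDATE_MSG_ERR = 27


# Events whose Status line reads "Optional" in the list above.
OPTIONAL_EVENTS = frozenset({
    Event.AUTOMATIC_START,
    Event.MANUAL_START_PASSIVE,
    Event.AUTOMATIC_START_PASSIVE,
    Event.AUTOMATIC_START_STOP_FLAP,
    Event.AUTO_STOP,
    Event.IDLE_HOLD_TIMER_EXPIRES,
    Event.DELAY_OPEN_TIMER_EXPIRES,
    Event.BGP_OPEN_DELAY_TIMER_RUNNING,
    Event.OPEN_COLLISION_DUMP,
})


def is_mandatory(event: Event) -> bool:
    """True if the event's Status is Mandatory in the list above."""
    return event not in OPTIONAL_EVENTS
```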
- Releases all BGP resources, and | 8.2 Description of FSM | |||
- Goes to IdleHold state. | 8.2.1 FSM Definition | |||
The Start event (manual and automatic) is ignored in the | BGP must maintain a separate FSM for each configured peer. Each BGP | |||
OpenSent state. | peer paired in a potential connection unless configured to remain in | |||
the idle state, or configured to remain passive, will attempt to | ||||
connect to the other. For the purpose of this discussion, the active | ||||
or connect side of the TCP connection | ||||
(the side sending the first TCP SYN packet) is called outgoing. The | ||||
passive or listening side (the sender of the first SYN ACK) is called | ||||
an incoming connection. [See section on the terms active and passive | ||||
below.] | ||||
If a NOTIFICATION message is received with a version error, the | A BGP implementation must connect to and listen on TCP port 179 for | |||
local system: | incoming connections in addition to trying to connect to peers. For | |||
each incoming connection, a state machine must be instantiated. | ||||
There exists a period in which the identity of the peer on the other | ||||
end of an incoming connection is known but the BGP identifier is not | ||||
known. During this time, both an incoming and an outgoing connection | ||||
for the same configured peering may exist. This is referred to as a | ||||
connection collision (see Section x.x, was 6.8). | ||||
- Closes the transport connection | A BGP implementation will have at most one FSM for each configured | |||
peering plus one FSM for each incoming TCP connection for which the | ||||
peer has not yet been identified. Each FSM corresponds to exactly one | ||||
TCP connection. | ||||
- Releases BGP resources, | There may be more than one connection between a pair of peers if the | |||
connections are configured to use a different pair of IP addresses. | ||||
This is referred to as multiple "configured peerings" to the same | ||||
peer. | ||||
- ConnectRetryCnt = 0, | 8.2.1.1 Terms "active" and "passive" | |||
- Connect retry timer = 0, and | The terms active and passive have been in our vocabulary for almost a | |||
- transition to Idle state. | decade and have proven useful. The words active and passive have | |||
slightly different meanings applied to a TCP connection or applied to | ||||
a peer. There is only one active side and one passive side to any | ||||
one TCP connection per the definition above and the state machine | ||||
below. When a BGP speaker is configured active it may end up on | ||||
either the active or passive side of the connection that eventually | ||||
gets established. Once the TCP connection is completed, it doesn't | ||||
matter which end was active and which end was passive and the only | ||||
difference is which side of the TCP connection has port number 179. | ||||
If any other NOTIFICATION is received, the local system: | 8.2.1.2 FSM and collision detection | |||
- IdleHoldtimer = 2**(ConnectRetryCnt)*60 | There is one FSM per BGP connection. Prior to determining what peer | |||
a connection is associated with there may be two connections for a | ||||
given peer. There should be no more than one connection per peer. | ||||
The collision detection identifies the case where there is more than | ||||
one connection per peer and provides guidance for which connection to | ||||
get rid of. When this occurs, the corresponding FSM for the | ||||
connection that is closed should be disposed of. | ||||
- Increment ConnectRetryCnt by 1, | 8.2.2 Finite State Machine | |||
- Set connect retry timer to zero, and | Idle state: | |||
- Drops TCP connection, | Initially BGP is in the Idle state. | |||
- Releases all BGP resources, | In this state BGP refuses all incoming BGP connections. No | |||
resources are allocated to the peer. In response to a | ||||
manual start event(Event1) or an automatic start | ||||
event(Event3), the local system | ||||
- initializes all BGP resources, | ||||
- sets ConnectRetryCnt (the connect retry counter) to zero | ||||
- starts the connect retry timer with initial value, | ||||
- initiates a TCP connection to the other BGP peer, | ||||
- listens for a connection that may be initiated by | ||||
the remote BGP peer, and | ||||
- changes its state to connect. | ||||
- Goes to IdleHold state. | A manual stop event (Event2) is ignored in the Idle state. | |||
In response to any other event, the local system: | In response to a manual start event with the passive TCP connection | |||
flag (Event 4) or automatic start with the passive TCP connection | ||||
flag (Event 5), the local system: | ||||
- initializes all BGP resources, | ||||
- sets ConnectRetryCnt (the connect retry counter) to zero, | ||||
- start the connect retry timer with initial value, | ||||
- listens for a connection that may be initiated by | ||||
the remote peer, and | ||||
- changes its state to Active. | ||||
- sends the NOTIFICATION message with Error Code Finite State | The exact value of the ConnectRetry timer is a local | |||
Machine Error, | matter, but it should be sufficiently large to allow TCP | |||
initialization. | ||||
- IdleHoldtimer = 2**(ConnectRetryCnt)*60 | If a persistent BGP peer oscillation damping function is | |||
enabled, two additional events may occur within Idle state: | ||||
- Automatic start with bgp_stop_flap set [Event6], | ||||
- Idle Hold Timer expired [Event 8]. | ||||
- Increment ConnectRetryCnt by 1, | The method of preventing persistent BGP peer oscillation is | |||
outside the scope of this document. | ||||
- Set connect retry timer to zero, | Any other events [Events 9-27] received in the Idle state, | |||
are noted by the MIB processing as FSM Errors | ||||
and the local peer stays in the Idle State. | ||||
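The Idle-state behaviour in the newer text can be sketched as a small lookup from event number to next state. The function name and return shape are assumptions. The handling of Events 6, 7 and 8 is also an assumption: Events 6 and 8 belong to the damping function that the text places out of scope, and the Idle-state text shown here does not say what happens on Event 7, so the sketch simply stays in Idle for all three.

```python
from typing import Tuple


def idle_transition(event: int) -> Tuple[str, bool]:
    """Return (next_state, fsm_error_noted) for an event received in Idle."""
    if event in (1, 3):
        # Manual or automatic start: initiate a TCP connection and listen.
        return "Connect", False
    if event in (4, 5):
        # Start with the passive TCP flag: listen only.
        return "Active", False
    if event == 2:
        # Manual stop is ignored in the Idle state.
        return "Idle", False
    if 9 <= event <= 27:
        # Noted as an FSM error; the peer stays in Idle.
        return "Idle", True
    if event in (6, 7, 8):
        # Assumption: not specified by the Idle-state text, so stay in Idle.
        return "Idle", False
    raise ValueError(f"unknown event number: {event}")
```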
- Drops TCP connection, | Connect State: | |||
- Releases all BGP resources, and | In this state, BGP is waiting for the TCP connection to | |||
be completed. | ||||
- Goes to IdleHold state. | If the TCP connection succeeds [Event 15 or | |||
Event 16], the local system checks the "Delay Open | ||||
Flag". If the delay Open flag is set, the local system: | ||||
- clears the connect retry timer, | ||||
- set the BGP open delay timer to the initial | ||||
value. | ||||
Open Confirm State | If the Delay Open flag is not set, the local system: | |||
- clears the connect retry timer, | ||||
- completes BGP initialization | ||||
- send an Open message to its peer, | ||||
- sets hold timer to a large value, and | ||||
- Change the state to Open Sent. | ||||
In this state BGP waits for a KEEPALIVE or NOTIFICATION | A hold timer value of 4 minutes is suggested. | |||
message. | ||||
If the local system receives a KEEPALIVE message, it changes | If the Open Delay timer expires [Event 12] in the connect | |||
its state to Established. | state, the local system: | |||
- send an Open message to its peer, | ||||
- set the hold timer to a large value, and | ||||
- change the state to Open Sent. | ||||
If the Hold Timer expires before a KEEPALIVE message is | If the BGP port receives a TCP connection indication | |||
received, the local system: | [Event 13], the TCP connection is processed and | |||
the connection remains in the Connect state. | ||||
- send the NOTIFICATION message with the error code Hold | If the TCP connection receives an indication | |||
Timer Expired, | that is invalid or unconfigured [Event 14]: | |||
- the TCP connection is rejected. | ||||
- sets IdleHoldTimer = 2**(ConnectRetryCnt)*60 | If the TCP connection fails (timeout or disconnect) | |||
- Increments ConnectRetryCnt by 1, | [Event17], the local system: | |||
- restarts the connect retry timer, | ||||
- continues to listen for a connection that may be | ||||
initiated by the remote BGP peer, and | ||||
- changes its state to Active. | ||||
- Sets the connect retry timer to zero, | If an Open is received while the BGP Delay Open timer is | |||
running [Event 19], the local system: | ||||
- clears the connect retry timer (cleared to zero), | ||||
- completes the BGP initialization, | ||||
- Stops and clears the BGP Open Delay timer | ||||
- Sends an Open message | ||||
- Set the hold timer to a large value (4 minutes), and | ||||
- changes its state to Open Confirm. | ||||
- Drop the TCP connection, | The start events [Event 1, 3-6] are ignored in connect | |||
state. | ||||
- Releases all BGP resources, | In response to a manual stop event [Event2], the local system: | |||
- drops the TCP connection, | ||||
- releases all BGP resources, | ||||
- sets ConnectRetryCnt (the connect retry count) to zero | ||||
- resets the connect retry timer (sets to zero), and | ||||
- goes to Idle state. | ||||
- Goes to IdleHoldState. | In response to the connect retry timer expired event(Event | |||
9), the local system: | ||||
- Sets the MIB FSM error information with connect retry | ||||
expired, | ||||
- drops the TCP connection | ||||
- restarts the connect retry timer | ||||
- initiates a TCP connection to the other BGP | ||||
peer, | ||||
- continues to listen for a connection that may be | ||||
initiated by the remote BGP peer, and | ||||
- stays in Connect state. | ||||
If the local system receives a NOTIFICATION message or receives | In response to any other events [Events 7-8, 10-11, 18, 20- | |||
a disconnect NOTIFICATION from the underlying transport | 27] the local system: | |||
protocol, the local system: | ||||
- Sets IdleHold Timer = 2**(ConnectRetryCnt)*60 | - resets the connect retry timer (sets to zero), | |||
- drops the TCP connection, | ||||
- release all BGP resources, | ||||
- increments the ConnectRetryCnt (connect retry count) by 1, | ||||
- [optionally] performs bgp peer oscillation damping, and | ||||
- goes to Idle state. | ||||
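The older text in this comparison sets IdleHoldtimer = 2**(ConnectRetryCnt)*60 and then increments ConnectRetryCnt, whereas the newer text leaves the damping method to the implementation. The sketch below works through that older formula only. The function names are hypothetical, and the text does not state the unit of the result (seconds is a plausible reading, but that is an assumption).

```python
from typing import Tuple


def idle_hold_time(connect_retry_cnt: int) -> int:
    """IdleHoldtimer value from the formula 2**(ConnectRetryCnt)*60."""
    return (2 ** connect_retry_cnt) * 60


def on_session_failure(connect_retry_cnt: int) -> Tuple[int, int]:
    """Apply the two steps in the order listed: set the timer, then
    increment the counter. Returns (idle_hold_time, new_counter)."""
    return idle_hold_time(connect_retry_cnt), connect_retry_cnt + 1
```

Starting from a counter of zero, successive failures give 60, 120, 240 and 480, so the hold interval doubles each time the session fails.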
- Increments ConnectRetryCnt by 1, | Active State: | |||
- Sets the connect retry timer to zero, | In this state BGP is trying to acquire a peer by listening | |||
for and accepting a TCP connection. | ||||
- Drops the TCP connection, | If a TCP connection succeeds [Event 15 or Event 16], the | |||
local system processes the TCP connection flags: | ||||
- If the BGP delay open flag is set: | ||||
o clears the connect retry timer, | ||||
o completes the BGP initialization, and | ||||
o sets the BGP delay Open timer | ||||
- Releases all BGP resources, | - If the BGP delay open flag is not set: | |||
o clears the connect retry timer, | ||||
o completes the BGP initialization, | ||||
o sends the Open message to its peer, | ||||
o sets its hold timer to a large value, | ||||
and changes its state to OpenSent. | ||||
- Goes to IdleHoldstate. | A Hold timer value of 4 minutes is suggested. | |||
In response to the Stop event initiated by the system, the | If the local system receives a valid TCP Indication | |||
local system: | [Event 13], the local system processes the TCP connection flags. | |||
- sends the NOTIFICATION message with Cease, | If the local system receives a TCP indication | |||
that is invalid for this connection [Event 14]: | ||||
- the TCP connection is rejected. | ||||
- sets IdleHoldtimer = 2**(ConnectRetryCnt)*60 | If the local system receives a TCP connection | |||
failed [Event 17] (timeout or receives connection | ||||
disconnect), the local system will: | ||||
- set TCP disconnect in the MIB reason code, | ||||
- restart connect retry timer (with initial value) | ||||
- release all BGP resources | ||||
- Acknowledge the drop of TCP connection if | ||||
TCP disconnect (send a FIN ACK), | ||||
- Increment ConnectRetryCnt (connect retry count) by 1, and | ||||
- perform the BGP peer oscillation damping process [2]. | ||||
- Increments ConnectRetryCnt by 1, | If the delay open timer expires [Event | |||
12], the local system: | ||||
- clears the connect retry timer (set to zero), | ||||
- stops and clears the delay open timer (set to zero) | ||||
- completes the BGP initialization, | ||||
- sends the Open message to its remote peer, | ||||
- sets its hold timer to a large value, | ||||
- and set the state to Open Confirm. | ||||
- Sets the Connect retry timer to zero, | A hold timer value of 4 minutes is also suggested for this | |||
state transition. | ||||
- Drops the TCP connection, | If an Open is received while the BGP delay open timer is | |||
running [Event 19], the local system | ||||
- clears the connect retry timer (cleared to zero), | ||||
- stops and clears the BGP open delay timer | ||||
- completes the BGP initialization, | ||||
- sends an Open message | ||||
- set its hold timer to a large value (4 minutes), and | ||||
- changes its state to Open Confirm. | ||||
- Releases all BGP resources, | In response to the ConnectRetry timer expired event [Event9], | |||
the local system: | ||||
- restarts the connect retry timer (with initial value), | ||||
- initiates a TCP connection to the other BGP | ||||
peer, | ||||
- Continues to listen for TCP connection that may be | ||||
initiated by remote BGP peer, | ||||
- and changes its state to Connect. | ||||
- Goes to IdleHoldstate. | The start events [Event1, 3-6] are ignored in the Active | |||
state. | ||||
In response to a Stop event initiated by the operator, the | In response to a manual stop event [Event2], the local system: | |||
local system: | - Sets the administrative down in the MIB reason code, | |||
- Sends a Notification with a Cease, | ||||
- If any BGP routes exist, delete the routes | ||||
- release all BGP resources, | ||||
- drops the TCP connection, | ||||
- sets ConnectRetryCnt (connect retry count) to zero | ||||
- resets the connect retry timer (sets to zero), | ||||
- goes to Idle state. | ||||
- sends the NOTIFICATION message with Cease, | In response to any other event (Events 7-8, 10-11,18, 20- | |||
- releases all BGP resources | 27), the local system: | |||
- stores the MIB information to indicate appropriate | ||||
error [FSM for Events 7-8, 10-11, 18, 20-27] | ||||
- reset the connect retry timer (sets to zero), | ||||
- release all BGP resources, | ||||
- drops the TCP connection, | ||||
- increments the ConnectRetryCnt (connect retry count) by one, | ||||
- optionally performs BGP peer oscillation damping, | ||||
- and goes to the idle state | ||||
- sets the ConnectRetryCnt to zero | Open Sent: | |||
- sets the connect retry timer to 0 | In this state BGP waits for an Open Message from its peer. | |||
- transitions to Idle state. | When an OPEN message is received, all fields are checked | |||
for correctness. If there are no errors in the OPEN message | ||||
[Event 18] the local system: | ||||
- resets the BGP Delay timer to zero, | ||||
- reset BGP Connect Timer to zero, | ||||
- sends a KEEPALIVE message and | ||||
- sets a KeepAlive timer (via the text below) | ||||
- sets the Hold timer according to the negotiated value | ||||
(see section 4.2), and | ||||
- sets the state to Open Confirm. | ||||
The Start event is ignored in the OpenConfirm state. | If the negotiated Hold time value is zero, then the Hold | |||
and KeepAlive timers are not started. If the | ||||
value of the Autonomous System field is the same as the | ||||
local Autonomous System number, then the connection is an | ||||
"internal" connection; otherwise, it is an "external" | ||||
connection. (This will impact UPDATE processing as | ||||
described below.) | ||||
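The two decisions described in this paragraph (whether to start the Hold and KeepAlive timers, and whether the connection is internal or external) can be sketched as follows. The function and type names are hypothetical. Taking the negotiated Hold Time as the smaller of the local and received values is an assumption based on the section 4.2 reference, which is not reproduced here.

```python
from typing import NamedTuple


class OpenResult(NamedTuple):
    hold_time: int          # negotiated Hold Time
    timers_started: bool    # whether Hold and KeepAlive timers run
    connection_type: str    # "internal" or "external"


def process_valid_open(local_as: int, peer_as: int,
                       local_hold_time: int,
                       peer_hold_time: int) -> OpenResult:
    """Derive session parameters from an error-free OPEN message."""
    # Assumption: negotiation picks the smaller of the two Hold Time values.
    hold_time = min(local_hold_time, peer_hold_time)
    # A negotiated value of zero means neither timer is started.
    timers_started = hold_time != 0
    # Same Autonomous System number means an internal connection.
    connection_type = "internal" if peer_as == local_as else "external"
    return OpenResult(hold_time, timers_started, connection_type)
```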
In response to any other event, the local system: | If the BGP message header checking [Event20] or OPEN message | |||
check detects an error (see Section 6.2)[Event21], the local system: | ||||
- sends a NOTIFICATION message with appropriate error | ||||
code, | ||||
- reset the connect retry timer (sets to zero), | ||||
- if there are any routes associated with the BGP session, | ||||
delete these routes | ||||
- release all BGP resources, | ||||
- drop the TCP connection | ||||
- increments the ConnectRetryCnt (connect retry count) by 1, | ||||
- performs the BGP peer oscillation damping process, | ||||
- and goes to the Idle state. | ||||
- sends a NOTIFICATION with a code of Finite State Machine | Collision detection mechanisms (section 6.8) need to be | |||
Error, | applied when a valid BGP Open is received [Event 18 or | |||
Event 19]. Please refer to section 6.8 for the details of | ||||
the comparison. An administrative collision detect is when | ||||
a BGP implementation determines by means outside the scope of | ||||
this document that a connection collision has occurred. | ||||
- sets IdleHoldtimer = 2**(ConnectRetryCnt)*60 | If a connection in Open Sent is determined to be the | |||
connection that must be closed, an administrative collision | ||||
detect [Event 22] is signaled to the state machine. If such | ||||
an administrative collision detect dump [Event 22] is | ||||
received in Open Sent, the local system: | ||||
- sets MIB state information to | ||||
collision detect closure, | ||||
- send a NOTIFICATION with a CEASE | ||||
- resets the connect retry timer, | ||||
- release all BGP resources, | ||||
- drop the TCP connection, | ||||
- increments ConnectRetryCnt (connect retry count) by 1, | ||||
- performs any BGP peer oscillation damp process, and | ||||
- enters Idle state. | ||||
- Increments ConnectRetryCnt by 1, | If a NOTIFICATION message is received with a version | |||
error [Event23], or a NOTIFICATION message without a version error | ||||
[Event 24], the local system: | ||||
- resets the connect retry timer (sets to zero) | ||||
- drops the TCP connection, | ||||
- releases all BGP resources, | ||||
- increments the ConnectRetryCnt (connect retry count) by 1 | ||||
- process any BGP peer oscillation damping, | ||||
- and sets the state to Idle. | ||||
- Sets the Connect retry timer to zero, | The Start events [Event1, 3-6] are ignored in the OpenSent | |||
state. | ||||
If a manual stop event [Event 2] is issued in Open sent | ||||
state, the local system: | ||||
- Sets administrative down reason in MIB reason, | ||||
- sends the Notification with a cease, | ||||
- if BGP routes exists, delete the routes, | ||||
- Release all BGP resources, | ||||
- Drops the TCP connection, | - Drops the TCP connection, | |||
- set ConnectRetryCnt (connect retry count) to zero, | ||||
- resets the Connect Retry timer (set to zero), and | ||||
- transitions to the Idle state. | ||||
- Releases all BGP resources, | If an automatic stop event [Event 7] is issued in Open sent | |||
state, the local system: | ||||
- Goes to IdleHoldstate. | - Sets administrative down reason in MIB reason, | |||
- sends the Notification with a cease, | ||||
- if any routes are associated with the BGP session, | ||||
delete the routes, | ||||
- release all the BGP resources | ||||
- Drops the TCP connection, | ||||
- increments the ConnectRetryCnt (connect retry count) by 1, | ||||
- BGP peer oscillation process [2], and | ||||
- transitions to the Idle state. | ||||
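The older draft text in the left-hand column sets IdleHoldtimer = 2**(ConnectRetryCnt)*60, so the hold-down doubles with each counted retry. A minimal worked sketch of that arithmetic follows. The function name is hypothetical, and the unit (seconds) is an assumption, since the text shown here does not state one.

```python
def idle_hold_timer(connect_retry_cnt: int) -> int:
    """Return 2**ConnectRetryCnt * 60 (unit assumed to be seconds)."""
    # ** binds tighter than *, so this is (2 ** cnt) * 60, as in the draft text.
    return 2 ** connect_retry_cnt * 60


# Hold-down for the first four values of ConnectRetryCnt.
values = [idle_hold_timer(cnt) for cnt in range(4)]
print(values)  # [60, 120, 240, 480]
```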
Established State: | If the Hold Timer expires[Event 10], the local system: | |||
- set Hold timer expired in MIB Error reason code, | ||||
- send a NOTIFICATION message with error code Hold | ||||
Timer Expired, | ||||
- reset the connect retry timer (sets to zero), | ||||
- releases all BGP resources, | ||||
- drops the TCP connection, | ||||
- increments the ConnectRetryCnt (connect retry count) by 1, | ||||
and transitions to the Idle state. | ||||
In the Established state BGP can exchange UPDATE, NOTIFICATION, | If a TCP indication is received for valid connection | |||
and KEEPALIVE messages with its peer. | [Event 13] or TCP request acknowledgement [Event 15] | |||
is received, or a TCP connect confirm [Event 16] is | ||||
received a second TCP session may be in progress. This | ||||
second TCP session is tracked per the Call Collision | ||||
processing (section 6.8) until an OPEN message is received. | ||||
If the local system receives an UPDATE or KEEPALIVE message, it | A TCP connection for an invalid port [Event 14] is ignored. | |||
restarts its Hold Timer, if the negotiated Hold Time value is | ||||
non-zero. | ||||
If the local system receives a NOTIFICATION message or a | If a TCP connection failure [Event17], is received | |||
disconnect from the underlying transport protocol, it: | the local system: | |||
- closes the BGP connection, | ||||
- restarts the Connect Retry timer, | ||||
- and continues to listen for a connection that may be | ||||
initiated by the remote BGP peer, | ||||
- and goes into Active state. | ||||
- sets IdleHoldtimer = 2**(ConnectRetryCnt)*60, | In response to any other event [Events 8-9, 11-12, 19, 25-27], | |||
the local system: | ||||
- sends the NOTIFICATION with the Error Code Finite | ||||
state machine error, | ||||
- resets the connect retry timer (sets to zero), | ||||
- releases all BGP resources | ||||
- drops the TCP connection, | ||||
- increments the ConnectRetryCnt (connect retry count) by 1, | ||||
- process any bgp peer oscillation damping[2], | ||||
- and sets the state to idle. | ||||
- Increments ConnectRetryCnt by 1, | Open Confirm State: | |||
- Sets the Connect retry timer to zero, | In this state BGP waits for a KEEPALIVE or NOTIFICATION | |||
message. | ||||
- Drops the TCP connection, | If the local system receives a KEEPALIVE message[Event 25], | |||
- restarts the Hold timer, and | ||||
- changes its state to Established. | ||||
- Releases all BGP resources, and | If the local system receives a NOTIFICATION message [Event | |||
- Goes to IdleHoldstate. | 23-24] or receives a TCP Disconnect [Event 17] from the | |||
underlying TCP , the local system: | ||||
- sets the appropriate MIB information for FSM error, | ||||
- resets the connect retry timer (sets the timer to | ||||
zero), | ||||
- releases all BGP resources, | ||||
- drops the TCP connection, | ||||
- increments the ConnectRetryCnt (connect retry count) by 1, | ||||
- and sets the state to idle. | ||||
If the local system receives an UPDATE message, and the Update | Any start event [Event1, 3-6] is ignored in the OpenConfirm | |||
message error handling procedure (see Section 6.3) detects an | state. | |||
error, the local system: | ||||
- sends a NOTIFICATION message with Update error, | In response to a manual stop event[Event 2] initiated by | |||
the operator, the local system: | ||||
- set Administrative down in MIB Reason code, | ||||
- sends the NOTIFICATION message with Cease, | ||||
- if any BGP routes exist, delete the routes, | ||||
- releases all BGP resources, | ||||
- drop the TCP connection, | ||||
- sets the ConnectRetryCnt (connect retry count) to zero | ||||
- sets the connect retry timer to zero, and | ||||
- transitions to Idle state. | ||||
- sets IdleHoldtimer = 2**(ConnectRetryCnt)*60 | In response to the Automatic stop event initiated by the | |||
system[Event 7], the local system: | ||||
- sets the MIB entry for this peer to administratively | ||||
down, | ||||
- sends the NOTIFICATION message with Cease, | ||||
- connect retry timer reset (set to zero) | ||||
- If any BGP routes exist, delete the routes, | ||||
- release all BGP resources, | ||||
- drops the TCP connection, | ||||
- increments the ConnectRetryCnt (connect retry count) | ||||
by 1, and | ||||
- transitions to the Idle State. | ||||
- Increments ConnectRetryCnt by 1, | If the Hold Timer expires before a KEEPALIVE message is | |||
received [Event 10], the local system: | ||||
- set the MIB reason to Hold time expired, | ||||
- send the NOTIFICATION message with the error code | ||||
set to Hold Time Expired, | ||||
- resets the connect retry timer (sets the timer to | ||||
zero), | ||||
- releases all BGP resources, | ||||
- drops the TCP connection, | ||||
- increments the ConnectRetryCnt (connect retry count) by 1, | ||||
- and sets the state to Idle. | ||||
- Sets the Connect retry timer to zero, | If the local system receives a KEEPALIVE timer expires | |||
event [Event 11], the system: | ||||
- sends a KEEPALIVE message, | ||||
- restarts the Keepalive timer, and | ||||
- remains in Open Confirmed state. | ||||
- Drops the TCP connection, | In the event of TCP establishment [Event 13], or TCP | |||
connection succeeding [Event 15 or Event 16] while in Open | ||||
Confirm, the local system needs to track the 2nd | ||||
connection. | ||||
- Releases all BGP resources, and | If a TCP connection is attempted to an invalid port [Event | |||
14], the local system will ignore the second connection | ||||
attempt. | ||||
- Goes to IdleHoldstate. | If an OPEN message is received, all fields are checked for | |||
correctness. If the BGP message header checking [Event20] | ||||
or OPEN message check detects an error (see Section | ||||
6.2)[Event21], the local system: | ||||
- sends a NOTIFICATION message with appropriate error | ||||
code, | ||||
- resets the connect retry timer (sets the timer to | ||||
zero), | ||||
- releases all BGP resources, | ||||
- drops the TCP connection, | ||||
- increments the ConnectRetryCnt (connect retry count) by 1, | ||||
- runs the BGP peer oscillation damping process [2] | ||||
- and goes to the Idle state. | ||||
If the Hold timer expires, the local system: | If the Open message is valid [Event 18], the collision | |||
detect function is processed per section 6.8. If this | ||||
connection is to be dropped due to call collision, the | ||||
local system: | ||||
- sets the Call Collision cease in the MIB reason | ||||
code, | ||||
- sends a Notification with a Cease | ||||
- resets the Connect timer (set to zero), | ||||
- releases all BGP resources, | ||||
- Drops the TCP connection (send TCP FIN), | ||||
- increments the ConnectRetryCnt by 1 (connect retry count), and | ||||
- performs any BGP peer oscillation damping process [2]. | ||||
- sends a NOTIFICATION message with Error Code Hold Timer | If during the processing of another Open message, the BGP | |||
Expired, | implementation determines by means outside the scope of | |||
this document that a connection collision has occurred and | ||||
this connection is to be closed, the local system will | ||||
issue a call collision dump [Event 22]. When the local | ||||
system receives a call collision dump event [Event 22], the | ||||
local system: | ||||
- Sets the MIB FSM variable to indicate collision | ||||
detected and dump connection. | ||||
- send a NOTIFICATION with a CEASE | ||||
- deletes all routes associated with connection, | ||||
- resets the connect retry timer, | ||||
- releases all BGP resources | ||||
- drops all TCP connection, | ||||
- increments the ConnectRetryCnt (connect retry count) by 1, | ||||
- and performs any BGP peer oscillation damping, and | ||||
- enters Idle state. | ||||
- sets IdleHoldtimer = 2**(ConnectRetryCnt)*60 | In response to any other event [Events 8-9, 12, 19, 26-27], | |||
the local system: | ||||
- sends a NOTIFICATION with a code of Finite State | ||||
Machine Error, | ||||
- resets the connect retry timer (sets to zero) | ||||
- drops the TCP connection, | ||||
- releases all BGP resources, | ||||
- increments the ConnectRetryCnt (connect retry count) by 1, | ||||
- performs any BGP peer oscillation damping, and | ||||
- transitions to Idle state. | ||||
- Increments ConnectRetryCnt by 1, | Established State: | |||
- Sets the connect retry timer to zero, | In the Established state BGP can exchange UPDATE, | |||
NOTIFICATION, and KEEPALIVE messages with its peer. | ||||
- Drops the TCP connection, | If the local system receives an UPDATE message [Event26], | |||
the local system will: | ||||
- process the update packet | ||||
- restarts its Hold timer, if the negotiated Hold Time | ||||
value is non-zero, and | ||||
- remain in the Established state. | ||||
- Releases all BGP resources, | If the local system receives a NOTIFICATION message | |||
[Event23 or Event24] or a disconnect [Event17] from the | ||||
underlying TCP, it: | ||||
- sets the appropriate error code in MIB reason code, | ||||
- if any BGP routes exist, delete all BGP routes, | ||||
- resets the connect retry timer (sets to zero), | ||||
- releases all the BGP resources, | ||||
- drops the TCP connection, | ||||
- increments the ConnectRetryCnt (connect retry count) | ||||
by 1, and | ||||
- goes to the Idle state. | ||||
- Goes to IdleHold state. | If the local system receives a Keepalive message | |||
[Event 25], the local system will: | ||||
- restarts its Hold Timer, if the negotiated Hold Time | ||||
value is non-zero, and | ||||
- remain in the Established state. | ||||
If the KeepAlive timer expires, the local system sends a | If the local system receives an UPDATE message, and the | |||
KEEPALIVE message, it restarts its KeepAlive timer, unless the | Update message error handling procedure (see Section 6.3) | |||
negotiated Hold Time value is zero. | detects an error [Event27], the local system: | |||
- sends a NOTIFICATION message with Update error, | ||||
- resets the connect retry timer (sets to zero), | ||||
- drops the TCP connection, | ||||
- releases all BGP resources, | ||||
- increments the ConnectRetryCnt (connect retry count) | ||||
by 1, | ||||
- performs any BGP peer oscillation damping, | ||||
- and goes to Idle state. | ||||
Each time the local system sends a KEEPALIVE or UPDATE | Any start event (Event 1, 3-6) is ignored in the | |||
message, it restarts its KeepAlive timer, unless the negotiated | Established state. | |||
Hold Time value is zero. | ||||
In response to the Stop event initiated by the system | In response to a manual stop event (initiated by an | |||
(automatic), the local system: | operator)[Event2], the local system: | |||
- sets the Administrative stop in MIB reason code, | ||||
- sends the NOTIFICATION message with Cease, | ||||
- if BGP routes exist, delete the BGP routes, | ||||
- release BGP resources, | ||||
- drops TCP connection, | ||||
- sets ConnectRetryCnt (connect retry count) | ||||
to zero (0), | ||||
- resets connect retry timer to zero (0), and | ||||
- transitions to the Idle. | ||||
In response to an automatic stop event initiated by the | ||||
system (automatic) [Event7], the local system: | ||||
- sets Administrative Stop in MIB Reason code, | ||||
- sends a NOTIFICATION with Cease, | - sends a NOTIFICATION with Cease, | |||
- resets the connect retry timer (sets to zero) | ||||
- sets IdleHoldtimer = 2**(ConnectRetryCnt)*60 | - deletes all routes associated with bgp connection, | |||
- increments ConnectRetryCnt by 1, | ||||
- sets the connect retry timer to zero, | ||||
- drops the TCP connection, | ||||
- releases all BGP resources, | - releases all BGP resources, | |||
- drops the TCP connection, | ||||
- goes to IdleHold state, and | - increments the ConnectRetryCnt (connect retry count) | |||
by 1, | ||||
- deletes all routes. | - performs any BGP peer oscillation damping, and | |||
- transitions to the idle state. | ||||
An example automatic stop event is exceeding the number of | An example automatic stop event is exceeding the number of | |||
prefixes for a given peer and the local system automatically | prefixes for a given peer and the local system | |||
disconnecting the peer. | automatically disconnecting the peer. | |||
In response to a stop event initiated by an operator: | ||||
- release all resources (including deleting all routes), | ||||
- set ConnectRetryCnt to zero (0), | ||||
- set connect retry timer to zero (0), and | ||||
- transition to the Idle. | ||||
The Start event is ignored in the Established state. | If the Hold timer expires [Event10], the local system: | |||
- sends a NOTIFICATION message with Error Code Hold | ||||
Timer Expired, | ||||
- resets the connect retry timer (sets to zero), | ||||
- releases all BGP resources, | ||||
- drops the TCP connection, | ||||
- increments the ConnectRetryCnt (connect retry count) | ||||
by 1, | ||||
- performs any BGP peer oscillation damping, | ||||
- and goes to Idle state. | ||||
In response to any other event, the local system: | If the KeepAlive timer expires [Event11], the local system | |||
sends a KEEPALIVE message, it restarts its KeepAlive timer, | ||||
unless the negotiated Hold Time value is zero. | ||||
- sends a NOTIFICATION message with Error Code Finite State | Each time the local system sends a KEEPALIVE or UPDATE | |||
Machine Error, | message, it restarts its KeepAlive timer, unless the | |||
negotiated Hold Time value is zero. | ||||
- sets IdleHoldtimer = 2**(ConnectRetryCnt)*60 | A TCP connection indication [Event 13] received | |||
for a valid port will cause the 2nd connection to be | ||||
tracked. A TCP connection indication for an | ||||
invalid port [Event 14] will be ignored. | ||||
- increments ConnectRetryCnt by 1, | In response to a TCP connection succeeds [Event 15 | |||
or Event 16], the 2nd connection shall be tracked until | ||||
it sends an OPEN message. | ||||
- sets the connect retry timer to zero, | If a valid Open message [Event 18] is received, it will be | |||
checked to see if it collides (section 6.8) with any other | ||||
session. If the BGP implementation determines that this | ||||
connection needs to be terminated, it will process a Call | ||||
Collision dump event[Event 22]. If this session needs to be | ||||
terminated, the connection will be terminated by: | ||||
- send a NOTIFICATION with a CEASE | ||||
- deletes all routes associated with connection, | ||||
- resets the connect retry timer, | ||||
- if any BGP routes, delete the routes, | ||||
- release all BGP resources, | ||||
- drops the TCP connection, | - drops the TCP connection, | |||
- increments ConnectRetryCnt (connect retry count) | ||||
by 1, | ||||
- and performs any BGP peer oscillation damping, | ||||
- and enters the Idle state | ||||
- releases all BGP resources | In response to any other event [Events 8-9,12, 19-21] the | |||
local system: | ||||
- goes to IdleHoldstate, and | - sends a NOTIFICATION message with Error Code Finite | |||
- deletes all routes. | State Machine Error, | |||
- deletes all routes associated with BGP connection, | ||||
- resets the connect retry timer (sets to zero) | ||||
- releases all BGP resources, | ||||
- drops the TCP connection, | ||||
- increments the ConnectRetryCnt (connect retry count) | ||||
by 1, | ||||
- performs any BGP peer oscillation damping, and | ||||
- transitions to Idle. | ||||
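The Established-state rules in the newer (right-hand) column share one pattern: most failures increment ConnectRetryCnt and go to Idle, while a manual stop resets the counter to zero. The sketch below illustrates only that bookkeeping. It is not a full FSM: the class, field names, and default hold time are hypothetical, NOTIFICATION sending, route deletion, and timer objects are omitted, and second-connection tracking (Events 13-16, 18) is treated as a no-op.

```python
from dataclasses import dataclass

START_EVENTS = {1, 3, 4, 5, 6}                 # ignored in Established
TRACKING_EVENTS = {13, 14, 15, 16, 18}         # collision tracking, omitted here
REFRESH_EVENTS = {25, 26}                      # KEEPALIVE and UPDATE received
KEEPALIVE_TIMER_EXPIRES = 11
MANUAL_STOP = 2


@dataclass
class Session:
    state: str = "Established"
    connect_retry_cnt: int = 0
    hold_time: int = 90                        # assumed negotiated value
    hold_timer_restarts: int = 0               # stands in for a real timer


def on_event(s: Session, event: int) -> Session:
    """Apply one event to a session that is in the Established state."""
    if s.state != "Established":
        return s
    if event in START_EVENTS or event in TRACKING_EVENTS:
        return s
    if event in REFRESH_EVENTS:
        if s.hold_time != 0:                   # restart only if Hold Time is non-zero
            s.hold_timer_restarts += 1
        return s
    if event == KEEPALIVE_TIMER_EXPIRES:
        return s                               # a real speaker would send KEEPALIVE
    if event == MANUAL_STOP:
        s.connect_retry_cnt = 0                # operator stop clears the counter
    else:
        s.connect_retry_cnt += 1               # every other teardown counts a retry
    s.state = "Idle"
    return s
```

For example, a Hold Timer expiry (Event 10) on a session with ConnectRetryCnt of 2 leaves the session in Idle with a count of 3, whereas a manual stop (Event 2) leaves it in Idle with a count of 0.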
9. UPDATE Message Handling | 9. UPDATE Message Handling | |||
An UPDATE message may be received only in the Established state. | An UPDATE message may be received only in the Established state. | |||
When an UPDATE message is received, each field is checked for | When an UPDATE message is received, each field is checked for valid- | |||
validity as specified in Section 6.3. | ity as specified in Section 6.3. | |||
If an optional non-transitive attribute is unrecognized, it is | If an optional non-transitive attribute is unrecognized, it is qui- | |||
quietly ignored. If an optional transitive attribute is unrecognized, | etly ignored. If an optional transitive attribute is unrecognized, | |||
the Partial bit (the third high-order bit) in the attribute flags | the Partial bit (the third high-order bit) in the attribute flags | |||
octet is set to 1, and the attribute is retained for propagation to | octet is set to 1, and the attribute is retained for propagation to | |||
other BGP speakers. | other BGP speakers. | |||
If an optional attribute is recognized, and has a valid value, then, | If an optional attribute is recognized, and has a valid value, then, | |||
depending on the type of the optional attribute, it is processed | depending on the type of the optional attribute, it is processed | |||
locally, retained, and updated, if necessary, for possible | locally, retained, and updated, if necessary, for possible propaga- | |||
propagation to other BGP speakers. | tion to other BGP speakers. | |||
The information carried by the AS_PATH attribute is checked for AS | ||||
loops. AS loop detection is done by scanning the full AS path (as | ||||
specified in the AS_PATH attribute), and checking that the autonomous | ||||
system number of the local system does not appear in the AS path. If | ||||
the autonomous system number appears in the AS path the route may be | ||||
stored in the Adj-RIB-In, but unless the router is configured to | ||||
accept routes with its own autonomous system in the AS path, the | ||||
route shall not be passed to the BGP Decision Process. Operations of | ||||
a router that is configured to accept routes with its own autonomous | ||||
system number in the AS path are outside the scope of this document. | ||||
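The AS loop check described above can be sketched as a scan over every segment of the AS path. This is a minimal illustration, assuming the AS_PATH has already been decoded into a list of segments, each a list of AS numbers. The function names and the accept_own_as flag are hypothetical, not part of the specification.

```python
def has_as_loop(as_path, local_as):
    """Return True if local_as appears anywhere in the full AS path."""
    return any(local_as in segment for segment in as_path)


def passes_to_decision_process(as_path, local_as, accept_own_as=False):
    """Decide whether a route may be handed to the Decision Process.

    A looping route may still be stored in the Adj-RIB-In; this only
    models whether it is passed on. accept_own_as models a router
    configured to accept its own AS in the path.
    """
    return accept_own_as or not has_as_loop(as_path, local_as)
```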
If the UPDATE message contains a non-empty WITHDRAWN ROUTES field, | If the UPDATE message contains a non-empty WITHDRAWN ROUTES field, | |||
the previously advertised routes whose destinations (expressed as IP | the previously advertised routes whose destinations (expressed as IP | |||
prefixes) contained in this field shall be removed from the Adj-RIB- | prefixes) contained in this field shall be removed from the Adj-RIB- | |||
In. This BGP speaker shall run its Decision Process since the | In. This BGP speaker shall run its Decision Process since the previ- | |||
previously advertised route is no longer available for use. | ously advertised route is no longer available for use. | |||
If the UPDATE message contains a feasible route, the Adj-RIB-In will | If the UPDATE message contains a feasible route, the Adj-RIB-In will | |||
be updated with this route as follows: if the NLRI of the new route | be updated with this route as follows: if the NLRI of the new route | |||
is identical to the one of the route currently stored in the Adj-RIB- | is identical to the one of the route currently stored in the Adj-RIB- | |||
In, then the new route shall replace the older route in the Adj-RIB- | In, then the new route shall replace the older route in the Adj-RIB- | |||
In, thus implicitly withdrawing the older route from service. | In, thus implicitly withdrawing the older route from service. Other- | |||
Otherwise, if the Adj-RIB-In has no route with NLRI identical to the | wise, if the Adj-RIB-In has no route with NLRI identical to the new | |||
new route, the new route shall be placed in the Adj-RIB-In. | route, the new route shall be placed in the Adj-RIB-In. | |||
Once the BGP speaker updates the Adj-RIB-In, the speaker shall run | Once the BGP speaker updates the Adj-RIB-In, the speaker shall run | |||
its Decision Process. | its Decision Process. | |||
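The Adj-RIB-In handling above amounts to a keyed update: withdrawn prefixes are removed, and a feasible route either replaces the stored route with identical NLRI or is added. A minimal sketch follows, assuming the Adj-RIB-In is modeled as a dictionary keyed by prefix string. The function name and the attribute dictionaries are illustrative only.

```python
def apply_update(adj_rib_in, withdrawn, feasible):
    """Apply one UPDATE to an Adj-RIB-In held as {prefix: attributes}.

    Returns True when the Adj-RIB-In changed, which is the point at
    which the speaker would run its Decision Process.
    """
    changed = False
    for prefix in withdrawn:
        if prefix in adj_rib_in:
            del adj_rib_in[prefix]          # previously advertised route removed
            changed = True
    for prefix, attrs in feasible.items():
        adj_rib_in[prefix] = attrs          # implicit withdraw of any older route
        changed = True
    return changed
```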
9.1 Decision Process | 9.1 Decision Process | |||
The Decision Process selects routes for subsequent advertisement by | The Decision Process selects routes for subsequent advertisement by | |||
applying the policies in the local Policy Information Base (PIB) to | applying the policies in the local Policy Information Base (PIB) to | |||
the routes stored in its Adj-RIBs-In. The output of the Decision | the routes stored in its Adj-RIBs-In. The output of the Decision Pro- | |||
Process is the set of routes that will be advertised to all peers; | cess is the set of routes that will be advertised to all peers; the | |||
the selected routes will be stored in the local speaker's Adj-RIB- | selected routes will be stored in the local speaker's Adj-RIB-Out. | |||
Out. | ||||
The selection process is formalized by defining a function that takes | The selection process is formalized by defining a function that takes | |||
the attribute of a given route as an argument and returns either (a) | the attribute of a given route as an argument and returns either (a) | |||
a non-negative integer denoting the degree of preference for the | a non-negative integer denoting the degree of preference for the | |||
route, or (b) a value denoting that this route is ineligible to be | route, or (b) a value denoting that this route is ineligible to be | |||
installed in LocRib and will be excluded from the next phase of route | installed in LocRib and will be excluded from the next phase of route | |||
selection. | selection. | |||
The function that calculates the degree of preference for a given | The function that calculates the degree of preference for a given | |||
route shall not use as its inputs any of the following: the existence | route shall not use as its inputs any of the following: the existence | |||
of other routes, the non-existence of other routes, or the path | of other routes, the non-existence of other routes, or the path | |||
attributes of other routes. Route selection then consists of | attributes of other routes. Route selection then consists of individ- | |||
individual application of the degree of preference function to each | ual application of the degree of preference function to each feasible | |||
feasible route, followed by the choice of the one with the highest | route, followed by the choice of the one with the highest degree of | |||
degree of preference. | preference. | |||
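As a rough illustration of this selection rule, the sketch below applies a preference function to each route independently and keeps the route with the highest result. It assumes None stands for the "ineligible" return value, and it does not implement the tie-breaking rules of 9.1.2.2; on a tie it simply keeps the first route seen. All names are hypothetical.

```python
def select_best(routes, degree_of_preference):
    """Pick the eligible route with the highest degree of preference.

    degree_of_preference(route) must depend only on that route and
    return a non-negative integer, or None if the route is ineligible.
    """
    scored = [(degree_of_preference(r), r) for r in routes]
    eligible = [(pref, r) for pref, r in scored if pref is not None]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]
```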
The Decision Process operates on routes contained in the Adj-RIB-In, | The Decision Process operates on routes contained in the Adj-RIB-In, | |||
and is responsible for: | and is responsible for: | |||
- selection of routes to be used locally by the speaker | - selection of routes to be used locally by the speaker | |||
- selection of routes to be advertised to other BGP peers | - selection of routes to be advertised to other BGP peers | |||
- route aggregation and route information reduction | - route aggregation and route information reduction | |||
The Decision Process takes place in three distinct phases, each | The Decision Process takes place in three distinct phases, each trig- | |||
triggered by a different event: | gered by a different event: | |||
a) Phase 1 is responsible for calculating the degree of preference | a) Phase 1 is responsible for calculating the degree of preference | |||
for each route received from a peer. | for each route received from a peer. | |||
b) Phase 2 is invoked on completion of phase 1. It is responsible | b) Phase 2 is invoked on completion of phase 1. It is responsible | |||
for choosing the best route out of all those available for each | for choosing the best route out of all those available for each | |||
distinct destination, and for installing each chosen route into | distinct destination, and for installing each chosen route into | |||
the Loc-RIB. | the Loc-RIB. | |||
c) Phase 3 is invoked after the Loc-RIB has been modified. It is | c) Phase 3 is invoked after the Loc-RIB has been modified. It is | |||
skipping to change at page 46, line 8 | skipping to change at page 60, line 5 | |||
9.1.1 Phase 1: Calculation of Degree of Preference | 9.1.1 Phase 1: Calculation of Degree of Preference | |||
The Phase 1 decision function shall be invoked whenever the local BGP | The Phase 1 decision function shall be invoked whenever the local BGP | |||
speaker receives from a peer an UPDATE message that advertises a new | speaker receives from a peer an UPDATE message that advertises a new | |||
route, a replacement route, or withdrawn routes. | route, a replacement route, or withdrawn routes. | |||
The Phase 1 decision function is a separate process which completes | The Phase 1 decision function is a separate process which completes | |||
when it has no further work to do. | when it has no further work to do. | |||
The Phase 1 decision function shall lock an Adj-RIB-In prior to | The Phase 1 decision function shall lock an Adj-RIB-In prior to oper- | |||
operating on any route contained within it, and shall unlock it after | ating on any route contained within it, and shall unlock it after | |||
operating on all new or unfeasible routes contained within it. | operating on all new or unfeasible routes contained within it. | |||
For each newly received or replacement feasible route, the local BGP | For each newly received or replacement feasible route, the local BGP | |||
speaker shall determine a degree of preference as follows: | speaker shall determine a degree of preference as follows: | |||
If the route is learned from an internal peer, either the value of | If the route is learned from an internal peer, either the value of | |||
the LOCAL_PREF attribute shall be taken as the degree of | the LOCAL_PREF attribute shall be taken as the degree of prefer- | |||
preference, or the local system may compute the degree of | ence, or the local system may compute the degree of preference of | |||
preference of the route based on preconfigured policy information. | the route based on preconfigured policy information. Note that the | |||
Note that the latter (computing the degree of preference based on | latter (computing the degree of preference based on preconfigured | |||
preconfigured policy information) may result in formation of | policy information) may result in formation of persistent routing | |||
persistent routing loops. | loops. | |||
If the route is learned from an external peer, then the local BGP | If the route is learned from an external peer, then the local BGP | |||
speaker computes the degree of preference based on preconfigured | speaker computes the degree of preference based on preconfigured | |||
policy information. If the return value indicates that the route | policy information. If the return value indicates that the route | |||
is ineligible, the route may not serve as an input to the next | is ineligible, the route may not serve as an input to the next | |||
phase of route selection; otherwise the return value is used as | phase of route selection; otherwise the return value is used as | |||
the LOCAL_PREF value in any IBGP readvertisement. | the LOCAL_PREF value in any IBGP readvertisement. | |||
The exact nature of this policy information and the computation | The exact nature of this policy information and the computation | |||
involved is a local matter. | involved is a local matter. | |||
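The two Phase 1 cases can be sketched as a small dispatch on where the route was learned. This is only an illustration: the route is modeled as a dictionary with hypothetical keys "internal" and "local_pref", the policy is any callable supplied by the caller, and None again stands for an ineligible route.

```python
def phase1_preference(route, policy, use_local_pref=True):
    """Compute the degree of preference for one feasible route.

    Internal peer: take LOCAL_PREF, or (if use_local_pref is False)
    compute from local policy, which the text warns may cause
    persistent routing loops. External peer: always use local policy.
    """
    if route["internal"] and use_local_pref:
        return route["local_pref"]
    return policy(route)


def example_policy(route):
    """Illustrative policy: reject routes tagged 'blocked', else prefer 100."""
    return None if route.get("blocked") else 100
```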
skipping to change at page 47, line 15 | skipping to change at page 61, line 13 | |||
to occur. | to occur. | |||
For each set of destinations for which a feasible route exists in the | For each set of destinations for which a feasible route exists in the | |||
Adj-RIBs-In, the local BGP speaker shall identify the route that has: | Adj-RIBs-In, the local BGP speaker shall identify the route that has: | |||
a) the highest degree of preference of any route to the same set | a) the highest degree of preference of any route to the same set | |||
of destinations, or | of destinations, or | |||
b) is the only route to that destination, or | b) is the only route to that destination, or | |||
c) is selected as a result of the Phase 2 tie breaking rules | c) is selected as a result of the Phase 2 tie breaking rules spec- | |||
specified in 9.1.2.2. | ified in 9.1.2.2. | |||
The local speaker SHALL then install that route in the Loc-RIB, | The local speaker SHALL then install that route in the Loc-RIB, | |||
replacing any route to the same destination that is currently being | replacing any route to the same destination that is currently being | |||
held in the Loc-RIB. If the new BGP route is installed in the Routing | held in the Loc-RIB. When the new BGP route is installed in the Rout- | |||
Table (as a result of the local policy decision), care must be taken | ing Table, care must be taken to ensure that existing routes to the | |||
to ensure that invalid BGP routes to the same destination are removed | same destination that are now considered invalid are removed from the | |||
from the Routing Table. Whether or not the new route replaces an | Routing Table. Whether or not the new BGP route replaces an existing | |||
already existing non-BGP route in the routing table depends on the | non-BGP route in the Routing Table depends on the policy configured | |||
policy configured on the BGP speaker. | on the BGP speaker. | |||
The local speaker MUST determine the immediate next hop to the | The local speaker MUST determine the immediate next-hop address from | |||
address depicted by the NEXT_HOP attribute of the selected route by | the NEXT_HOP attribute of the selected route (see section 5.1.3). If | |||
performing a best matching route lookup in the Routing Table and | either the immediate next hop or the IGP cost to the NEXT_HOP (where | |||
selecting one of the possible paths (if multiple best paths to the | the NEXT_HOP is resolved through an IGP route) changes, Phase 2: | |||
same prefix are available). If the route to the address depicted by | Route Selection should be performed again. | |||
the NEXT_HOP attribute changes such that the immediate next hop or | ||||
the IGP cost to the NEXT_HOP (if the NEXT_HOP is resolved through an | ||||
IGP route) changes, route selection should be recalculated as | ||||
specified above. | ||||
Notice that even though BGP routes do not have to be installed in the | Notice that even though BGP routes do not have to be installed in the | |||
Routing Table with the immediate next hop(s), implementations must | Routing Table with the immediate next hop(s), implementations must | |||
take care that before any packets are forwarded along a BGP route, | take care that before any packets are forwarded along a BGP route, | |||
its associated NEXT_HOP address is resolved to the immediate | its associated NEXT_HOP address is resolved to the immediate | |||
(directly connected) next-hop address and this address (or multiple | (directly connected) next-hop address and this address (or multiple | |||
addresses) is finally used for actual packet forwarding. | addresses) is finally used for actual packet forwarding. | |||
Unresolvable routes SHALL be removed from the Loc-RIB and the routing | Unresolvable routes SHALL be removed from the Loc-RIB and the routing | |||
table. However, corresponding unresolvable routes SHOULD be kept in | table. However, corresponding unresolvable routes SHOULD be kept in | |||
the Adj-RIBs-In. | the Adj-RIBs-In (in case they become resolvable). | |||
9.1.2.1 Route Resolvability Condition | 9.1.2.1 Route Resolvability Condition | |||
As indicated in Section 9.1.2, BGP routers should exclude | As indicated in Section 9.1.2, BGP routers should exclude unresolv- | |||
unresolvable routes from the Phase 2 decision. This ensures that only | able routes from the Phase 2 decision. This ensures that only valid | |||
valid routes are installed in Loc-RIB and the Routing Table. | routes are installed in Loc-RIB and the Routing Table. | |||
The route resolvability condition is defined as follows. | The route resolvability condition is defined as follows. | |||
1. A route Rte1, referencing only the intermediate network | 1. A route Rte1, referencing only the intermediate network | |||
address, is considered resolvable if the Routing Table contains at | address, is considered resolvable if the Routing Table contains at | |||
least one resolvable route Rte2 that matches Rte1's intermediate | least one resolvable route Rte2 that matches Rte1's intermediate | |||
network address and is not recursively resolved (directly or | network address and is not recursively resolved (directly or indi- | |||
indirectly) through Rte1. If multiple matching routes are | rectly) through Rte1. If multiple matching routes are available, | |||
available, only the longest matching route should be considered. | only the longest matching route should be considered. | |||
2. Routes referencing interfaces (with or without intermediate | 2. Routes referencing interfaces (with or without intermediate | |||
addresses) are considered resolvable if the state of the | addresses) are considered resolvable if the state of the refer- | |||
referenced interface is up and IP processing is enabled on this | enced interface is up and IP processing is enabled on this inter- | |||
interface. | face. | |||
BGP routes do not refer to interfaces, but can be resolved through | BGP routes do not refer to interfaces, but can be resolved through | |||
the routes in the Routing Table that can be of both types. IGP routes | the routes in the Routing Table that can be of both types (those that | |||
and routes to directly connected networks are expected to specify the | specify interfaces or those that do not). IGP routes and routes to | |||
outbound interface. | directly connected networks are expected to specify the outbound | |||
interface. Static routes can specify the outbound interface, or the | ||||
intermediate address, or both. | ||||
Note that a BGP route is considered unresolvable not only in | Note that a BGP route is considered unresolvable not only in situa- | |||
situations where the router's Routing Table contains no route | tions where the router's Routing Table contains no route matching the | |||
matching the BGP route's NEXT_HOP. Mutually recursive routes (routes | BGP route's NEXT_HOP. Mutually recursive routes (routes resolving | |||
resolving each other or themselves), also fail the resolvability | each other or themselves), also fail the resolvability check. | |||
check. | ||||
It is also important that implementations do not consider feasible | It is also important that implementations do not consider feasible | |||
routes that would become unresolvable if they were installed in the | routes that would become unresolvable if they were installed in the | |||
Routing Table even if their NEXT_HOPs are resolvable using the | Routing Table even if their NEXT_HOPs are resolvable using the cur- | |||
current contents of the Routing Table (an example of such routes | rent contents of the Routing Table (an example of such routes would | |||
would be mutually recursive routes). This check ensures that a BGP | be mutually recursive routes). This check ensures that a BGP speaker | |||
speaker does not install in the Routing Table routes that will be | does not install in the Routing Table routes that will be removed and | |||
removed and not used by the speaker. Therefore, in addition to local | not used by the speaker. Therefore, in addition to local Routing | |||
Routing Table stability, this check also improves behavior of the | Table stability, this check also improves behavior of the protocol in | |||
protocol in the network. | the network. | |||
Whenever a BGP speaker identifies a route that fails the | Whenever a BGP speaker identifies a route that fails the resolvabil- | |||
resolvability check because of mutual recursion, an error message | ity check because of mutual recursion, an error message should be | |||
should be logged. | logged. | |||
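The resolvability condition above can be illustrated with a short Python sketch. It is not part of either draft: the Route class, its field names, and the helper functions are hypothetical, and the Routing Table is modeled as a plain list. The sketch covers only rules 1 and 2 and the mutual-recursion case; it does not model the check for routes that would become unresolvable once installed.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Route:
    # Hypothetical model of a Routing Table entry (names are assumptions).
    prefix: str                          # destination prefix, e.g. "10.0.0.0/8"
    next_hop: Optional[str] = None       # intermediate network address, if any
    interface_up: Optional[bool] = None  # set only if the route names an interface


def longest_match(table, address):
    """Return the longest-prefix route in table covering address, or None."""
    addr = ip_address(address)
    matches = [r for r in table if addr in ip_network(r.prefix)]
    return max(matches, key=lambda r: ip_network(r.prefix).prefixlen, default=None)


def is_resolvable(route, table, in_progress=frozenset()):
    """Apply rules 1 and 2; routes that resolve through themselves fail."""
    if route.interface_up is not None:
        return route.interface_up        # rule 2: interface must be up
    if route in in_progress or route.next_hop is None:
        return False                     # mutual recursion, or nothing to resolve
    via = longest_match(table, route.next_hop)  # only the longest match counts
    if via is None:
        return False                     # no route matches the next hop
    return is_resolvable(via, table, in_progress | {route})  # rule 1


table = [
    Route("192.0.2.0/24", interface_up=True),
    Route("10.0.0.0/8", next_hop="192.0.2.1"),
    Route("172.16.0.0/16", next_hop="172.17.0.1"),  # mutually recursive pair
    Route("172.17.0.0/16", next_hop="172.16.0.1"),
]
good = Route("203.0.113.0/24", next_hop="10.1.1.1")
looped = Route("198.51.100.0/24", next_hop="172.16.5.5")
unknown = Route("198.51.100.0/24", next_hop="8.8.8.8")
```

Here good resolves through the 10.0.0.0/8 route to an interface that is up, looped fails because its next hop is covered only by two routes that resolve through each other, and unknown fails because no route matches its next hop.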
9.1.2.2 Breaking Ties (Phase 2) | 9.1.2.2 Breaking Ties (Phase 2) | |||
In its Adj-RIBs-In a BGP speaker may have several routes to the same | In its Adj-RIBs-In a BGP speaker may have several routes to the same | |||
destination that have the same degree of preference. The local | destination that have the same degree of preference. The local | |||
speaker can select only one of these routes for inclusion in the | speaker can select only one of these routes for inclusion in the | |||
associated Loc-RIB. The local speaker considers all routes with the | associated Loc-RIB. The local speaker considers all routes with the | |||
same degrees of preference, both those received from internal peers, | same degrees of preference, both those received from internal peers, | |||
and those received from external peers. | and those received from external peers. | |||
The following tie-breaking procedure assumes that for each candidate | The following tie-breaking procedure assumes that for each candidate | |||
route all the BGP speakers within an autonomous system can ascertain | route all the BGP speakers within an autonomous system can ascertain | |||
the cost of a path (interior distance) to the address depicted by the | the cost of a path (interior distance) to the address depicted by the | |||
NEXT_HOP attribute of the route, and follow the same route selection | NEXT_HOP attribute of the route, and follow the same route selection | |||
algorithm. | algorithm. | |||
The tie-breaking algorithm begins by considering all equally | The tie-breaking algorithm begins by considering all equally prefer- | |||
preferable routes to the same destination, and then selects routes to | able routes to the same destination, and then selects routes to be | |||
be removed from consideration. The algorithm terminates as soon as | removed from consideration. The algorithm terminates as soon as only | |||
only one route remains in consideration. The criteria must be | one route remains in consideration. The criteria must be applied in | |||
applied in the order specified. | the order specified. | |||
Several of the criteria are described using pseudo-code. Note that | Several of the criteria are described using pseudo-code. Note that | |||
the pseudo-code shown was chosen for clarity, not efficiency. It is | the pseudo-code shown was chosen for clarity, not efficiency. It is | |||
not intended to specify any particular implementation. BGP | not intended to specify any particular implementation. BGP implemen- | |||
implementations MAY use any algorithm which produces the same results | tations MAY use any algorithm which produces the same results as | |||
as those described here. | those described here. | |||
a) Remove from consideration all routes which are not tied for | a) Remove from consideration all routes which are not tied for | |||
having the smallest number of AS numbers present in their AS_PATH | having the smallest number of AS numbers present in their AS_PATH | |||
attributes. Note that when counting this number, an AS_SET counts | attributes. Note that when counting this number, an AS_SET counts | |||
as 1, no matter how many ASs are in the set, and that, if the | as 1, no matter how many ASs are in the set. | |||
implementation supports [13], then AS numbers present in segments | ||||
of type AS_CONFED_SEQUENCE or AS_CONFED_SET are not included in | ||||
the count of AS numbers present in the AS_PATH. | ||||
b) Remove from consideration all routes which are not tied for | b) Remove from consideration all routes which are not tied for | |||
having the lowest Origin number in their Origin attribute. | having the lowest Origin number in their Origin attribute. | |||
c) Remove from consideration routes with less-preferred | c) Remove from consideration routes with less-preferred | |||
MULTI_EXIT_DISC attributes. MULTI_EXIT_DISC is only comparable | MULTI_EXIT_DISC attributes. MULTI_EXIT_DISC is only comparable | |||
between routes learned from the same neighboring AS. Routes which | between routes learned from the same neighboring AS. Routes which | |||
do not have the MULTI_EXIT_DISC attribute are considered to have | do not have the MULTI_EXIT_DISC attribute are considered to have | |||
the lowest possible MULTI_EXIT_DISC value. | the lowest possible MULTI_EXIT_DISC value. | |||
This is also described in the following procedure: | This is also described in the following procedure: | |||
for m = all routes still under consideration | for m = all routes still under consideration | |||
for n = all routes still under consideration | for n = all routes still under consideration | |||
if (neighborAS(m) == neighborAS(n)) and (MED(n) < MED(m)) | if (neighborAS(m) == neighborAS(n)) and (MED(n) < MED(m)) | |||
remove route m from consideration | remove route m from consideration | |||
In the pseudo-code above, MED(n) is a function which returns the | In the pseudo-code above, MED(n) is a function which returns the | |||
value of route n's MULTI_EXIT_DISC attribute. If route n has no | value of route n's MULTI_EXIT_DISC attribute. If route n has no | |||
MULTI_EXIT_DISC attribute, the function returns the lowest | MULTI_EXIT_DISC attribute, the function returns the lowest possi- | |||
possible MULTI_EXIT_DISC value, i.e. 0. | ble MULTI_EXIT_DISC value, i.e. 0. | |||
Similarly, neighborAS(n) is a function which returns the neighbor | Similarly, neighborAS(n) is a function which returns the neighbor | |||
AS from which the route was received. | AS from which the route was received. If the route is learned via | |||
IBGP, and the other IBGP speaker didn't originate the route, it is | ||||
the neighbor AS from which the other IBGP speaker learned the | ||||
route. If the route is learned via IBGP, and the other IBGP | ||||
speaker originated the route, it is the local AS. | ||||
If a MULTI_EXIT_DISC attribute is removed before re-advertising a | ||||
route into IBGP, the MULTI_EXIT_DISC attribute may only be consid- | ||||
ered in the comparison of EBGP learned routes, then removed, then | ||||
the remaining EBGP learned route may be compared to the remaining | ||||
IBGP learned routes, without considering the MULTI_EXIT_DISC | ||||
attribute for those EBGP learned routes whose MULTI_EXIT_DISC will | ||||
be dropped before advertising to IBGP. Including the | ||||
MULTI_EXIT_DISC of an EBGP learned route in the comparison with an | ||||
IBGP learned route, then dropping the MULTI_EXIT_DISC and adver- | ||||
tising the route has been proven to cause route loops. | ||||
d) If at least one of the candidate routes was received from an | d) If at least one of the candidate routes was received from an | |||
external peer in a neighboring autonomous system, remove from | external peer in a neighboring autonomous system, remove from con- | |||
consideration all routes which were received from internal peers. | sideration all routes which were received from internal peers. | |||
e) Remove from consideration any routes with less-preferred | e) Remove from consideration any routes with less-preferred inte- | |||
interior cost. The interior cost of a route is determined by | rior cost. The interior cost of a route is determined by calcu- | |||
calculating the metric to the next hop for the route using the | lating the metric to the NEXT_HOP for the route using the Routing | |||
Routing Table. If the next hop for a route is reachable, but no | Table. If the NEXT_HOP for a route is reachable, but no cost | |||
cost can be determined, then this step should be skipped | can be determined, then this step should be skipped (equivalently, | |||
(equivalently, consider all routes to have equal costs). | consider all routes to have equal costs). | |||
This is also described in the following procedure. | This is also described in the following procedure. | |||
for m = all routes still under consideration | for m = all routes still under consideration | |||
for n = all routes still under consideration | for n = all routes still under consideration | |||
if (cost(n) is better than cost(m)) | if (cost(n) is better than cost(m)) | |||
remove m from consideration | remove m from consideration | |||
In the pseudo-code above, cost(n) is a function which returns the | In the pseudo-code above, cost(n) is a function which returns the | |||
cost of the path (interior distance) to the address given in the | cost of the path (interior distance) to the address given in the | |||
skipping to change at page 51, line 4 | skipping to change at page 65, line 6 | |||
lowest value. | lowest value. | |||
g) Prefer the route received from the lowest neighbor address. | g) Prefer the route received from the lowest neighbor address. | |||
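As an illustration of criteria a) and c) above, the following Python sketch applies the AS_PATH length rule and the MULTI_EXIT_DISC rule to a list of candidates. It is a sketch under stated assumptions, not a specified implementation: the Candidate class, its fields, and the string segment-type labels are hypothetical. Confederation segments are skipped when counting, following the draft-17 wording. The MED step compares every route against the full candidate list rather than removing routes inside the loop; because the comparison is a strict "less than" within one neighbor AS, the surviving set should be the same as in the pseudo-code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    # Hypothetical candidate-route record (names are assumptions).
    name: str
    neighbor_as: int
    as_path: Tuple[Tuple[str, Tuple[int, ...]], ...]  # (segment type, AS numbers)
    med: Optional[int] = None


def as_path_length(as_path):
    """Count AS numbers; an AS_SET counts as 1, confederation segments as 0."""
    length = 0
    for seg_type, asns in as_path:
        if seg_type == "AS_SEQUENCE":
            length += len(asns)
        elif seg_type == "AS_SET":
            length += 1
    return length


def med(route):
    """Missing MULTI_EXIT_DISC is treated as the lowest possible value, 0."""
    return 0 if route.med is None else route.med


def shortest_as_path_step(candidates):
    """Criterion a): keep routes tied for the smallest AS_PATH length."""
    best = min(as_path_length(c.as_path) for c in candidates)
    return [c for c in candidates if as_path_length(c.as_path) == best]


def med_step(candidates):
    """Criterion c): drop m if a route from the same neighbor AS has lower MED."""
    return [m for m in candidates
            if not any(n.neighbor_as == m.neighbor_as and med(n) < med(m)
                       for n in candidates)]


routes = [
    Candidate("A", 65001, (("AS_SEQUENCE", (65001, 65010)),), med=10),
    Candidate("B", 65001, (("AS_SEQUENCE", (65001,)),
                           ("AS_SET", (65020, 65021, 65022))), med=5),
    Candidate("C", 65002, (("AS_SEQUENCE", (65002, 65030)),), med=50),
    Candidate("D", 65003, (("AS_SEQUENCE", (65003, 65040, 65050)),)),
]
survivors = med_step(shortest_as_path_step(routes))
```

Route D is removed by criterion a) because its path counts three AS numbers. Route A is removed by criterion c) because B was learned from the same neighbor AS with a lower MED. Route C is not compared with A or B, since its neighbor AS differs.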
9.1.3 Phase 3: Route Dissemination | 9.1.3 Phase 3: Route Dissemination | |||
The Phase 3 decision function shall be invoked on completion of Phase | The Phase 3 decision function shall be invoked on completion of Phase | |||
2, or when any of the following events occur: | 2, or when any of the following events occur: | |||
a) when routes in the Loc-RIB to local destinations have changed | a) when routes in the Loc-RIB to local destinations have changed | |||
b) when locally generated routes learned by means outside of BGP | b) when locally generated routes learned by means outside of BGP | |||
have changed | have changed | |||
c) when a new BGP speaker - BGP speaker connection has been | c) when a new BGP speaker - BGP speaker connection has been estab- | |||
established | lished | |||
The Phase 3 function is a separate process which completes when it | The Phase 3 function is a separate process which completes when it | |||
has no further work to do. The Phase 3 Routing Decision function | has no further work to do. The Phase 3 Routing Decision function | |||
shall be blocked from running while the Phase 2 decision function is | shall be blocked from running while the Phase 2 decision function is | |||
in process. | in process. | |||
All routes in the Loc-RIB shall be processed into Adj-RIBs-Out | All routes in the Loc-RIB shall be processed into Adj-RIBs-Out | |||
according to configured policy. This policy may exclude a route in | according to configured policy. This policy may exclude a route in | |||
the Loc-RIB from being installed in a particular Adj-RIB-Out. A | the Loc-RIB from being installed in a particular Adj-RIB-Out. A | |||
route shall not be installed in the Adj-Rib-Out unless the | route shall not be installed in the Adj-Rib-Out unless the destina- | |||
destination and NEXT_HOP described by this route may be forwarded | tion and NEXT_HOP described by this route may be forwarded appropri- | |||
appropriately by the Routing Table. If a route in Loc-RIB is excluded | ately by the Routing Table. If a route in Loc-RIB is excluded from a | |||
from a particular Adj-RIB-Out the previously advertised route in that | particular Adj-RIB-Out the previously advertised route in that Adj- | |||
Adj-RIB-Out must be withdrawn from service by means of an UPDATE | RIB-Out must be withdrawn from service by means of an UPDATE message | |||
message (see 9.2). | (see 9.2). | |||
Route aggregation and information reduction techniques (see 9.2.2.1) | Route aggregation and information reduction techniques (see 9.2.2.1) | |||
may optionally be applied. | may optionally be applied. | |||
When the updating of the Adj-RIBs-Out and the Routing Table is | Any local policy which results in routes being added to an Adj-RIB- | |||
complete, the local BGP speaker shall run the Update-Send process of | Out without also being added to the local BGP speaker's forwarding | |||
table, is outside the scope of this document. | ||||
When the updating of the Adj-RIBs-Out and the Routing Table is com- | ||||
plete, the local BGP speaker shall run the Update-Send process of | ||||
9.2. | 9.2. | |||
9.1.4 Overlapping Routes | 9.1.4 Overlapping Routes | |||
A BGP speaker may transmit routes with overlapping Network Layer | A BGP speaker may transmit routes with overlapping Network Layer | |||
Reachability Information (NLRI) to another BGP speaker. NLRI overlap | Reachability Information (NLRI) to another BGP speaker. NLRI overlap | |||
occurs when a set of destinations are identified in non-matching | occurs when a set of destinations are identified in non-matching mul- | |||
multiple routes. Since BGP encodes NLRI using IP prefixes, overlap | tiple routes. Since BGP encodes NLRI using IP prefixes, overlap will | |||
will always exhibit subset relationships. A route describing a | always exhibit subset relationships. A route describing a smaller | |||
smaller set of destinations (a longer prefix) is said to be more | set of destinations (a longer prefix) is said to be more specific | |||
specific than a route describing a larger set of destinations (a | than a route describing a larger set of destinations (a shorter | |||
shorter prefix); similarly, a route describing a larger set of | prefix); similarly, a route describing a larger set of destinations is | |||
destinations (a shorter prefix) is said to be less specific than a | said to be less specific than a route describing a smaller set of | |||
route describing a smaller set of destinations (a longer prefix). | destinations. | |||
The precedence relationship effectively decomposes less specific | The precedence relationship effectively decomposes less specific | |||
routes into two parts: | routes into two parts: | |||
- a set of destinations described only by the less specific route, | - a set of destinations described only by the less specific route, | |||
and | and | |||
- a set of destinations described by the overlap of the less | ||||
specific and the more specific routes | - a set of destinations described by the overlap of the less spe- | |||
cific and the more specific routes | ||||
When overlapping routes are present in the same Adj-RIB-In, the more | When overlapping routes are present in the same Adj-RIB-In, the more | |||
specific route shall take precedence, in order from more specific to | specific route shall take precedence, in order from more specific to | |||
least specific. | least specific. | |||
The set of destinations described by the overlap represents a portion | The set of destinations described by the overlap represents a portion | |||
of the less specific route that is feasible, but is not currently in | of the less specific route that is feasible, but is not currently in | |||
use. If a more specific route is later withdrawn, the set of | use. If a more specific route is later withdrawn, the set of desti- | |||
destinations described by the overlap will still be reachable using | nations described by the overlap will still be reachable using the | |||
the less specific route. | less specific route. | |||
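The precedence of more specific routes, and the fallback to the less specific route after a withdrawal, can be sketched in Python with the standard ipaddress module. The function name select_route and the example prefixes are assumptions made for illustration only.

```python
from ipaddress import ip_address, ip_network


def select_route(prefixes, destination):
    """Return the most specific (longest) prefix covering destination, or None."""
    addr = ip_address(destination)
    covering = [ip_network(p) for p in prefixes if addr in ip_network(p)]
    if not covering:
        return None
    return str(max(covering, key=lambda net: net.prefixlen))


overlapping = ["10.0.0.0/8", "10.1.0.0/16"]

# Destinations in the overlap use the more specific route.
in_overlap = select_route(overlapping, "10.1.2.3")

# Destinations described only by the less specific route use that route.
outside_overlap = select_route(overlapping, "10.2.0.1")

# After the more specific route is withdrawn, the overlap stays reachable.
after_withdrawal = select_route(["10.0.0.0/8"], "10.1.2.3")
```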
If a BGP speaker receives overlapping routes, the Decision Process | If a BGP speaker receives overlapping routes, the Decision Process | |||
MUST consider both routes based on the configured acceptance policy. | MUST consider both routes based on the configured acceptance policy. | |||
If both a less and a more specific route are accepted, then the | If both a less and a more specific route are accepted, then the Deci- | |||
Decision Process MUST either install both the less and the more | sion Process MUST either install both the less and the more specific | |||
specific routes or it MUST aggregate the two routes and install the | routes or it MUST aggregate the two routes and install the aggregated | |||
aggregated route, provided that both routes have the same value of | route, provided that both routes have the same value of the NEXT_HOP | |||
the NEXT_HOP attribute. | attribute. | |||
If a BGP speaker chooses to aggregate, then it MUST add | If a BGP speaker chooses to aggregate, then it SHOULD either include | |||
ATOMIC_AGGREGATE attribute to the route. A route that carries | all AS used to form the aggregate in an AS_SET or add the | |||
ATOMIC_AGGREGATE attribute can not be de-aggregated. That is, the | ATOMIC_AGGREGATE attribute to the route. This attribute is now pri- | |||
NLRI of this route can not be made more specific. Forwarding along | marily informational. With the elimination of IP routing protocols | |||
such a route does not guarantee that IP packets will actually | that do not support classless routing and the elimination of router | |||
traverse only ASs listed in the AS_PATH attribute of the route. | and host implementations that do not support classless routing, there | |||
is no longer a need to deaggregate. Routes SHOULD NOT be de-aggre- | ||||
gated. A route that carries ATOMIC_AGGREGATE attribute in particular | ||||
MUST NOT be de-aggregated. That is, the NLRI of this route can not be | ||||
made more specific. Forwarding along such a route does not guarantee | ||||
that IP packets will actually traverse only ASs listed in the AS_PATH | ||||
attribute of the route. | ||||
9.2 Update-Send Process | 9.2 Update-Send Process | |||
The Update-Send process is responsible for advertising UPDATE | The Update-Send process is responsible for advertising UPDATE mes- | |||
messages to all peers. For example, it distributes the routes chosen | sages to all peers. For example, it distributes the routes chosen by | |||
by the Decision Process to other BGP speakers which may be located in | the Decision Process to other BGP speakers which may be located in | |||
either the same autonomous system or a neighboring autonomous system. | either the same autonomous system or a neighboring autonomous system. | |||
When a BGP speaker receives an UPDATE message from an internal peer, | When a BGP speaker receives an UPDATE message from an internal peer, | |||
the receiving BGP speaker shall not re-distribute the routing | the receiving BGP speaker shall not re-distribute the routing infor- | |||
information contained in that UPDATE message to other internal peers, | mation contained in that UPDATE message to other internal peers, | |||
unless the speaker acts as a BGP Route Reflector [11]. | unless the speaker acts as a BGP Route Reflector [RFC2796]. | |||
As part of Phase 3 of the route selection process, the BGP speaker | As part of Phase 3 of the route selection process, the BGP speaker | |||
has updated its Adj-RIBs-Out. All newly installed routes and all | has updated its Adj-RIBs-Out. All newly installed routes and all | |||
newly unfeasible routes for which there is no replacement route shall | newly unfeasible routes for which there is no replacement route shall | |||
be advertised to its peers by means of an UPDATE message. | be advertised to its peers by means of an UPDATE message. | |||
A BGP speaker should not advertise a given feasible BGP route from | A BGP speaker should not advertise a given feasible BGP route from | |||
its Adj-RIB-Out if it would produce an UPDATE message containing the | its Adj-RIB-Out if it would produce an UPDATE message containing the | |||
same BGP route as was previously advertised. | same BGP route as was previously advertised. | |||
Any routes in the Loc-RIB marked as unfeasible shall be removed. | Any routes in the Loc-RIB marked as unfeasible shall be removed. | |||
Changes to the reachable destinations within its own autonomous | Changes to the reachable destinations within its own autonomous sys- | |||
system shall also be advertised in an UPDATE message. | tem shall also be advertised in an UPDATE message. | |||
If due to the limits on the maximum size of an UPDATE message (see | ||||
Section 4) a single route doesn't fit into the message, the BGP | ||||
speaker MUST NOT advertise the route to its peers and MAY choose to | ||||
log an error locally. | ||||
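One way to picture the Update-Send rules above is as a comparison between the previously advertised contents of an Adj-RIB-Out and its contents after Phase 3. The Python sketch below is illustrative only: the function name, the use of dictionaries keyed by prefix, and the string route values are assumptions. It also ignores message size limits and timers.

```python
def diff_adj_rib_out(previous, current):
    """Compare two Adj-RIB-Out snapshots (dicts mapping prefix to route).

    Returns (announce, withdraw): routes that are new or changed, and
    prefixes advertised before that no longer have any route.
    """
    # A route identical to what was previously advertised is not sent again.
    announce = {p: r for p, r in current.items() if previous.get(p) != r}
    # A prefix with a replacement route is covered by the announcement,
    # so only prefixes with no replacement need an explicit withdrawal.
    withdraw = sorted(p for p in previous if p not in current)
    return announce, withdraw


previous = {"10.0.0.0/8": "path-A", "192.0.2.0/24": "path-B"}
current = {"10.0.0.0/8": "path-A", "198.51.100.0/24": "path-C"}
announce, withdraw = diff_adj_rib_out(previous, current)
```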
9.2.1 Controlling Routing Traffic Overhead | 9.2.1 Controlling Routing Traffic Overhead | |||
The BGP protocol constrains the amount of routing traffic (that is, | The BGP protocol constrains the amount of routing traffic (that is, | |||
UPDATE messages) in order to limit both the link bandwidth needed to | UPDATE messages) in order to limit both the link bandwidth needed to | |||
advertise UPDATE messages and the processing power needed by the | advertise UPDATE messages and the processing power needed by the | |||
Decision Process to digest the information contained in the UPDATE | Decision Process to digest the information contained in the UPDATE | |||
messages. | messages. | |||
9.2.1.1 Frequency of Route Advertisement | 9.2.1.1 Frequency of Route Advertisement | |||
The parameter MinRouteAdvertisementInterval determines the minimum | The parameter MinRouteAdvertisementInterval determines the minimum | |||
amount of time that must elapse between advertisement of routes to a | amount of time that must elapse between advertisement and/or with- | |||
particular destination from a single BGP speaker. This rate limiting | drawal of routes to a particular destination by a BGP speaker to a | |||
procedure applies on a per-destination basis, although the value of | peer. This rate limiting procedure applies on a per-destination | |||
MinRouteAdvertisementInterval is set on a per BGP peer basis. | basis, although the value of MinRouteAdvertisementInterval is set on | |||
a per BGP peer basis. | ||||
Two UPDATE messages sent from a single BGP speaker that advertise | Two UPDATE messages sent by a BGP speaker to a peer that advertise | |||
feasible routes to some common set of destinations received from | feasible routes and/or withdrawal of unfeasible routes to some common | |||
external peers must be separated by at least | set of destinations MUST be separated by at least MinRouteAdvertise- | |||
MinRouteAdvertisementInterval. Clearly, this can only be achieved | mentInterval. Clearly, this can only be achieved precisely by keeping | |||
precisely by keeping a separate timer for each common set of | a separate timer for each common set of destinations. This would be | |||
destinations. This would be unwarranted overhead. Any technique which | unwarranted overhead. Any technique which ensures that the interval | |||
ensures that the interval between two UPDATE messages sent from a | between two UPDATE messages sent from a BGP speaker to a peer that | |||
single BGP speaker that advertise feasible routes to some common set | advertise feasible routes and/or withdrawal of unfeasible routes to | |||
of destinations received from external peers will be at least | some common set of destinations will be at least MinRouteAdvertise- | |||
MinRouteAdvertisementInterval, and will also ensure a constant upper | mentInterval, and will also ensure a constant upper bound on the | |||
bound on the interval is acceptable. | interval is acceptable. | |||
Since fast convergence is needed within an autonomous system, this | Since fast convergence is needed within an autonomous system, either | |||
procedure does not apply for routes received from other internal | (a) the MinRouteAdvertisementInterval used for internal peers SHOULD | |||
peers. To avoid long-lived black holes, the procedure does not apply | be shorter than the MinRouteAdvertisementInterval used for external | |||
to the explicit withdrawal of unfeasible routes (that is, routes | peers, or (b) the procedure described in this section SHOULD NOT apply | |||
whose destinations (expressed as IP prefixes) are listed in the | for routes sent to internal peers. | |||
WITHDRAWN ROUTES field of an UPDATE message). | ||||
This procedure does not limit the rate of route selection, but only | This procedure does not limit the rate of route selection, but only | |||
the rate of route advertisement. If new routes are selected multiple | the rate of route advertisement. If new routes are selected multiple | |||
times while awaiting the expiration of MinRouteAdvertisementInterval, | times while awaiting the expiration of MinRouteAdvertisementInterval, | |||
the last route selected shall be advertised at the end of | the last route selected SHALL be advertised at the end of MinRouteAd- | |||
MinRouteAdvertisementInterval. | vertisementInterval. | |||
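The rate limiting described in this section can be sketched as follows. This is an illustration under assumptions, not a recommended design: the class and method names are hypothetical, time is passed in by the caller as a number of seconds, the interval of 30 is an arbitrary example value, and one limiter instance is assumed per peer. The sketch tracks the last advertisement time per destination, which is the precise but costly approach the text mentions; it is used here only because it is the simplest to read.

```python
class AdvertisementLimiter:
    """Hypothetical per-peer limiter for MinRouteAdvertisementInterval."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_sent = {}  # destination -> time of last advertisement
        self.pending = {}    # destination -> most recently selected route

    def route_selected(self, destination, route, now):
        """Record a newly selected route; selection itself is never delayed."""
        self.pending[destination] = route  # only the last selection is kept
        return self.flush(now)

    def flush(self, now):
        """Return the advertisements that may be sent at time now."""
        sent = {}
        for dest in list(self.pending):
            last = self.last_sent.get(dest)
            if last is None or now - last >= self.min_interval:
                sent[dest] = self.pending.pop(dest)
                self.last_sent[dest] = now
        return sent


limiter = AdvertisementLimiter(min_interval=30)
first = limiter.route_selected("10.0.0.0/8", "r1", now=0)   # sent at once
second = limiter.route_selected("10.0.0.0/8", "r2", now=5)  # held back
third = limiter.route_selected("10.0.0.0/8", "r3", now=10)  # replaces r2
early = limiter.flush(now=29)
late = limiter.flush(now=30)
```

Routes r2 and r3 are both selected while the interval is running, and only r3, the last route selected, is advertised when the interval expires.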
9.2.1.2 Frequency of Route Origination | 9.2.1.2 Frequency of Route Origination | |||
The parameter MinASOriginationInterval determines the minimum amount | The parameter MinASOriginationInterval determines the minimum amount | |||
of time that must elapse between successive advertisements of UPDATE | of time that must elapse between successive advertisements of UPDATE | |||
messages that report changes within the advertising BGP speaker's own | messages that report changes within the advertising BGP speaker's own | |||
autonomous system. | autonomous system. | |||
9.2.1.3 Jitter | ||||
To minimize the likelihood that the distribution of BGP messages by a | ||||
given BGP speaker will contain peaks, jitter should be applied to the | ||||
timers associated with MinASOriginationInterval, Keepalive, and | ||||
MinRouteAdvertisementInterval. A given BGP speaker shall apply the | ||||
same jitter to each of these quantities regardless of the | ||||
destinations to which the updates are being sent; that is, jitter | ||||
will not be applied on a "per peer" basis. | ||||
The amount of jitter to be introduced shall be determined by | ||||
multiplying the base value of the appropriate timer by a random | ||||
factor which is uniformly distributed in the range from 0.75 to 1.0. | ||||
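The jitter computation can be expressed in a few lines of Python. The function name and the rng parameter are assumptions added for illustration; the factor range of 0.75 to 1.0 is taken from the text above.

```python
import random


def jittered(base_seconds, rng=random):
    """Scale a timer's base value by a random factor drawn from [0.75, 1.0]."""
    return base_seconds * rng.uniform(0.75, 1.0)


# Example: a base value of 30 seconds yields a timer between 22.5 and 30.
sample = jittered(30, rng=random.Random(1))
```

Passing a seeded random.Random instance, as in the example, makes the result repeatable for testing; a deployment would normally rely on the default generator.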
9.2.2 Efficient Organization of Routing Information | 9.2.2 Efficient Organization of Routing Information | |||
Having selected the routing information which it will advertise, a | Having selected the routing information which it will advertise, a | |||
BGP speaker may avail itself of several methods to organize this | BGP speaker may avail itself of several methods to organize this | |||
information in an efficient manner. | information in an efficient manner. | |||
9.2.2.1 Information Reduction | 9.2.2.1 Information Reduction | |||
Information reduction may imply a reduction in granularity of policy | Information reduction may imply a reduction in granularity of policy | |||
control - after information is collapsed, the same policies will | control - after information is collapsed, the same policies will | |||
apply to all destinations and paths in the equivalence class. | apply to all destinations and paths in the equivalence class. | |||
The Decision Process may optionally reduce the amount of information | The Decision Process may optionally reduce the amount of information | |||
that it will place in the Adj-RIBs-Out by any of the following | that it will place in the Adj-RIBs-Out by any of the following meth- | |||
methods: | ods: | |||
a) Network Layer Reachability Information (NLRI): | a) Network Layer Reachability Information (NLRI): | |||
Destination IP addresses can be represented as IP address | Destination IP addresses can be represented as IP address pre- | |||
prefixes. In cases where there is a correspondence between the | fixes. In cases where there is a correspondence between the | |||
address structure and the systems under control of an autonomous | address structure and the systems under control of an autonomous | |||
system administrator, it will be possible to reduce the size of | system administrator, it will be possible to reduce the size of | |||
the NLRI carried in the UPDATE messages. | the NLRI carried in the UPDATE messages. | |||
b) AS_PATHs: | b) AS_PATHs: | |||
AS path information can be represented as ordered AS_SEQUENCEs or | AS path information can be represented as ordered AS_SEQUENCEs or | |||
unordered AS_SETs. AS_SETs are used in the route aggregation | unordered AS_SETs. AS_SETs are used in the route aggregation algo- | |||
algorithm described in 9.2.2.2. They reduce the size of the | rithm described in 9.2.2.2. They reduce the size of the AS_PATH | |||
AS_PATH information by listing each AS number only once, | information by listing each AS number only once, regardless of how | |||
regardless of how many times it may have appeared in multiple | many times it may have appeared in multiple AS_PATHs that were | |||
AS_PATHs that were aggregated. | aggregated. | |||
An AS_SET implies that the destinations listed in the NLRI can be | An AS_SET implies that the destinations listed in the NLRI can be | |||
reached through paths that traverse at least some of the | reached through paths that traverse at least some of the con- | |||
constituent autonomous systems. AS_SETs provide sufficient | stituent autonomous systems. AS_SETs provide sufficient informa- | |||
information to avoid routing information looping; however their | tion to avoid routing information looping; however their use may | |||
use may prune potentially feasible paths, since such paths are no | prune potentially feasible paths, since such paths are no longer | |||
longer listed individually in the form of AS_SEQUENCEs. In | listed individually in the form of AS_SEQUENCEs. In practice | |||
practice this is not likely to be a problem, since once an IP | this is not likely to be a problem, since once an IP packet | |||
packet arrives at the edge of a group of autonomous systems, the | arrives at the edge of a group of autonomous systems, the BGP | |||
BGP speaker at that point is likely to have more detailed path | speaker at that point is likely to have more detailed path infor- | |||
information and can distinguish individual paths to destinations. | mation and can distinguish individual paths to destinations. | |||
9.2.2.2 Aggregating Routing Information | 9.2.2.2 Aggregating Routing Information | |||
Aggregation is the process of combining the characteristics of | Aggregation is the process of combining the characteristics of sev- | |||
several different routes in such a way that a single route can be | eral different routes in such a way that a single route can be adver- | |||
advertised. Aggregation can occur as part of the decision process to | tised. Aggregation can occur as part of the decision process to | |||
reduce the amount of routing information that will be placed in the | reduce the amount of routing information that will be placed in the | |||
Adj-RIBs-Out. | Adj-RIBs-Out. | |||
Aggregation reduces the amount of information that a BGP speaker must | Aggregation reduces the amount of information that a BGP speaker must | |||
store and exchange with other BGP speakers. Routes can be aggregated | store and exchange with other BGP speakers. Routes can be aggregated | |||
by applying the following procedure separately to path attributes of | by applying the following procedure separately to path attributes of | |||
like type and to the Network Layer Reachability Information. | like type and to the Network Layer Reachability Information. | |||
Routes that have the following attributes shall not be aggregated | Routes that have different MULTI_EXIT_DISC attributes SHALL NOT be | |||
unless the corresponding attributes of each route are identical: | aggregated. | |||
MULTI_EXIT_DISC, NEXT_HOP. | ||||
If the aggregation occurs as part of the update process, routes with | ||||
different NEXT_HOP values can be aggregated when announced through an | ||||
external BGP session. | ||||
Path attributes that have different type codes cannot be aggregated | Path attributes that have different type codes cannot be aggregated | |||
together. Path attributes of the same type code may be aggregated, | together. Path attributes of the same type code may be aggregated, | |||
according to the following rules: | according to the following rules: | |||
ORIGIN attribute: If at least one route among routes that are | NEXT_HOP: | |||
aggregated has ORIGIN with the value INCOMPLETE, then the | When aggregating routes that have different NEXT_HOP attributes, | |||
aggregated route must have the ORIGIN attribute with the value | the NEXT_HOP attribute of the aggregated route SHALL identify | |||
INCOMPLETE. Otherwise, if at least one route among routes that | an interface on the router that performs the aggregation. | |||
are aggregated has ORIGIN with the value EGP, then the aggregated | ||||
route must have the ORIGIN attribute with the value EGP. In all | ||||
other cases the value of the ORIGIN attribute of the aggregated | ||||
route is IGP. | ||||
AS_PATH attribute: If routes to be aggregated have identical | ORIGIN attribute: | |||
AS_PATH attributes, then the aggregated route has the same AS_PATH | If at least one route among routes that are aggregated has ORI- | |||
attribute as each individual route. | GIN with the value INCOMPLETE, then the aggregated route must | |||
have the ORIGIN attribute with the value INCOMPLETE. Other- | ||||
wise, if at least one route among routes that are aggregated | ||||
has ORIGIN with the value EGP, then the aggregated route must | ||||
have the ORIGIN attribute with the value EGP. In all other cases | ||||
the value of the ORIGIN attribute of the aggregated route is | ||||
IGP. | ||||
For the purpose of aggregating AS_PATH attributes we model each AS | AS_PATH attribute: | |||
within the AS_PATH attribute as a tuple <type, value>, where | If routes to be aggregated have identical AS_PATH attributes, | |||
then the aggregated route has the same AS_PATH attribute as | ||||
each individual route. | ||||
For the purpose of aggregating AS_PATH attributes we model each | ||||
AS within the AS_PATH attribute as a tuple <type, value>, where | ||||
"type" identifies a type of the path segment the AS belongs to | "type" identifies a type of the path segment the AS belongs to | |||
(e.g. AS_SEQUENCE, AS_SET), and "value" is the AS number. If the | (e.g. AS_SEQUENCE, AS_SET), and "value" is the AS number. If | |||
routes to be aggregated have different AS_PATH attributes, then | the routes to be aggregated have different AS_PATH attributes, | |||
the aggregated AS_PATH attribute shall satisfy all of the | then the aggregated AS_PATH attribute shall satisfy all of the | |||
following conditions: | following conditions: | |||
- all tuples of type AS_SEQUENCE in the aggregated AS_PATH | - all tuples of type AS_SEQUENCE in the aggregated AS_PATH | |||
shall appear in all of the AS_PATH in the initial set of routes | shall appear in all of the AS_PATH in the initial set of | |||
to be aggregated. | routes to be aggregated. | |||
- all tuples of type AS_SET in the aggregated AS_PATH shall | - all tuples of type AS_SET in the aggregated AS_PATH shall | |||
appear in at least one of the AS_PATH in the initial set (they | appear in at least one of the AS_PATH in the initial set | |||
may appear as either AS_SET or AS_SEQUENCE types). | (they may appear as either AS_SET or AS_SEQUENCE types). | |||
- for any tuple X of type AS_SEQUENCE in the aggregated AS_PATH | - for any tuple X of type AS_SEQUENCE in the aggregated | |||
which precedes tuple Y in the aggregated AS_PATH, X precedes Y | AS_PATH which precedes tuple Y in the aggregated AS_PATH, X | |||
in each AS_PATH in the initial set which contains Y, regardless | precedes Y in each AS_PATH in the initial set which contains | |||
of the type of Y. | Y, regardless of the type of Y. | |||
- No tuple of type AS_SET with the same value shall appear more | - No tuple of type AS_SET with the same value shall appear | |||
than once in the aggregated AS_PATH. | more than once in the aggregated AS_PATH. | |||
- Multiple tuples of type AS_SEQUENCE with the same value may | - Multiple tuples of type AS_SEQUENCE with the same value | |||
appear in the aggregated AS_PATH only when adjacent to another | may appear in the aggregated AS_PATH only when adjacent to | |||
tuple of the same type and value. | another tuple of the same type and value. | |||
An implementation may choose any algorithm which conforms to these | An implementation may choose any algorithm which conforms to | |||
rules. At a minimum a conformant implementation shall be able to | these rules. At a minimum a conformant implementation shall be | |||
perform the following algorithm that meets all of the above | able to perform the following algorithm that meets all of the | |||
conditions: | above conditions: | |||
- determine the longest leading sequence of tuples (as defined | - determine the longest leading sequence of tuples (as | |||
above) common to all the AS_PATH attributes of the routes to be | defined above) common to all the AS_PATH attributes of the | |||
aggregated. Make this sequence the leading sequence of the | routes to be aggregated. Make this sequence the leading | |||
aggregated AS_PATH attribute. | sequence of the aggregated AS_PATH attribute. | |||
- set the type of the rest of the tuples from the AS_PATH | - set the type of the rest of the tuples from the AS_PATH | |||
attributes of the routes to be aggregated to AS_SET, and append | attributes of the routes to be aggregated to AS_SET, and | |||
them to the aggregated AS_PATH attribute. | append them to the aggregated AS_PATH attribute. | |||
- if the aggregated AS_PATH has more than one tuple with the | - if the aggregated AS_PATH has more than one tuple with the | |||
same value (regardless of tuple's type), eliminate all but one | same value (regardless of tuple's type), eliminate all but | |||
such tuple by deleting tuples of the type AS_SET from the | one such tuple by deleting tuples of the type AS_SET from | |||
aggregated AS_PATH attribute. | the aggregated AS_PATH attribute. | |||
Appendix 6, section 6.8 presents another algorithm that satisfies | - for each pair of adjacent tuples in the aggregated | |||
the conditions and allows for more complex policy configurations. | AS_PATH, if both tuples have the same type, merge them | |||
together, as long as doing so will not cause a segment with | ||||
length greater than 255 to be generated. | ||||
ATOMIC_AGGREGATE: If at least one of the routes to be aggregated | Appendix F, section F.6 presents another algorithm that satis- | |||
has ATOMIC_AGGREGATE path attribute, then the aggregated route | fies the conditions and allows for more complex policy configu- | |||
rations. | ||||
ATOMIC_AGGREGATE: | ||||
If at least one of the routes to be aggregated has | ||||
ATOMIC_AGGREGATE path attribute, then the aggregated route | ||||
shall have this attribute as well. | shall have this attribute as well. | |||
AGGREGATOR: All AGGREGATOR attributes of all routes to be | AGGREGATOR: | |||
aggregated should be ignored. The BGP speaker performing the route | All AGGREGATOR attributes of all routes to be aggregated should | |||
aggregation may attach a new AGGREGATOR attribute (see Section | be ignored. The BGP speaker performing the route aggregation | |||
5.1.7). | may attach a new AGGREGATOR attribute (see Section 5.1.7). | |||
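The minimum AS_PATH aggregation algorithm listed above (longest common leading sequence, remaining tuples converted to AS_SET, duplicate values removed, adjacent tuples of the same type merged) can be sketched as follows. This is an illustrative sketch only, not normative text: the names aggregate_as_paths and to_segments and the string constants are hypothetical, and each AS_PATH is assumed to have already been flattened into a list of <type, value> tuples as described above.

```python
from typing import List, Tuple

# Hypothetical labels for the two path segment types.
AS_SEQUENCE = "AS_SEQUENCE"
AS_SET = "AS_SET"
MAX_SEGMENT_LEN = 255  # upper bound on segment length stated in the merge step

ASTuple = Tuple[str, int]  # <type, value>


def aggregate_as_paths(paths: List[List[ASTuple]]) -> List[ASTuple]:
    """Aggregate flattened AS_PATHs into one list of <type, value> tuples."""
    # Step 1: longest leading sequence of tuples common to all paths.
    lead: List[ASTuple] = []
    for column in zip(*paths):
        if all(t == column[0] for t in column):
            lead.append(column[0])
        else:
            break
    aggregated = list(lead)

    # Steps 2 and 3: retype the remaining tuples as AS_SET and append them,
    # dropping any AS_SET tuple whose value is already present.
    seen = {value for _, value in lead}
    for path in paths:
        for _, value in path[len(lead):]:
            if value not in seen:
                seen.add(value)
                aggregated.append((AS_SET, value))
    return aggregated


def to_segments(tuples: List[ASTuple]) -> List[Tuple[str, List[int]]]:
    """Step 4: merge adjacent tuples of the same type into segments."""
    segments: List[Tuple[str, List[int]]] = []
    for seg_type, value in tuples:
        last = segments[-1] if segments else None
        if last and last[0] == seg_type and len(last[1]) < MAX_SEGMENT_LEN:
            last[1].append(value)
        else:
            segments.append((seg_type, [value]))
    return segments
```

For two routes with AS_PATHs (10 20 30) and (10 20 40 30), the sketch yields an AS_SEQUENCE of 10 20 followed by an AS_SET containing 30 and 40. Repeated AS_SEQUENCE values inside the common leading sequence are kept, since only AS_SET tuples are deleted in the duplicate-elimination step.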
9.3 Route Selection Criteria | 9.3 Route Selection Criteria | |||
Generally speaking, additional rules for comparing routes among | Generally speaking, additional rules for comparing routes among sev- | |||
several alternatives are outside the scope of this document. There | eral alternatives are outside the scope of this document. There are | |||
are two exceptions: | two exceptions: | |||
- If the local AS appears in the AS path of the new route being | - If the local AS appears in the AS path of the new route being | |||
considered, then that new route cannot be viewed as better than | considered, then that new route cannot be viewed as better than | |||
any other route (provided that the speaker is configured to accept | any other route (provided that the speaker is configured to accept | |||
such routes). If such a route were ever used, a routing loop could | such routes). If such a route were ever used, a routing loop could | |||
result (see Section 6.3). | result (see Section 6.3). | |||
- In order to achieve successful distributed operation, only | - In order to achieve successful distributed operation, only | |||
routes with a likelihood of stability can be chosen. Thus, an AS | routes with a likelihood of stability can be chosen. Thus, an AS | |||
must avoid using unstable routes, and it must not make rapid | must avoid using unstable routes, and it must not make rapid spon- | |||
spontaneous changes to its choice of route. Quantifying the terms | taneous changes to its choice of route. Quantifying the terms | |||
"unstable" and "rapid" in the previous sentence will require | "unstable" and "rapid" in the previous sentence will require expe- | |||
experience, but the principle is clear. | rience, but the principle is clear. | |||
Care must be taken to ensure that BGP speakers in the same AS do | Care must be taken to ensure that BGP speakers in the same AS do not | |||
not make inconsistent decisions. | make inconsistent decisions. | |||
9.4 Originating BGP routes | 9.4 Originating BGP routes | |||
A BGP speaker may originate BGP routes by injecting routing | A BGP speaker may originate BGP routes by injecting routing informa- | |||
information acquired by some other means (e.g. via an IGP) into BGP. | tion acquired by some other means (e.g. via an IGP) into BGP. A BGP | |||
A BGP speaker that originates BGP routes shall assign the degree of | speaker that originates BGP routes shall assign the degree of prefer- | |||
preference to these routes by passing them through the Decision | ence to these routes by passing them through the Decision Process | |||
Process (see Section 9.1). These routes may also be distributed to | (see Section 9.1). These routes may also be distributed to other BGP | |||
other BGP speakers within the local AS as part of the update process | speakers within the local AS as part of the update process (see Sec- | |||
(see Section 9.2). The decision whether to distribute non-BGP | tion 9.2). The decision whether to distribute non-BGP acquired routes | |||
acquired routes within an AS via BGP or not depends on the | within an AS via BGP or not depends on the environment within the AS | |||
environment within the AS (e.g. type of IGP) and should be controlled | (e.g. type of IGP) and should be controlled via configuration. | |||
via configuration. | ||||
Appendix 1. Comparison with RFC1771 | 10 BGP Timers | |||
BGP employs five timers: ConnectRetry (see Section 8), Hold Time (see | ||||
Section 4.2), KeepAlive (see Section 8), MinASOriginationInterval | ||||
(see Section 9.2.1.2), and MinRouteAdvertisementInterval (see Section | ||||
9.2.1.1). | ||||
The suggested default value for the ConnectRetry timer is 120 sec- | ||||
onds. | ||||
The suggested default value for the Hold Time is 90 seconds. | ||||
The suggested default value for the KeepAlive timer is 1/3 of the | ||||
Hold Time. | ||||
The suggested default value for the MinASOriginationInterval is 15 | ||||
seconds. | ||||
The suggested default value for the MinRouteAdvertisementInterval is | ||||
30 seconds. | ||||
An implementation of BGP MUST allow the Hold Time timer to be config- | ||||
urable on a per peer basis, and MAY allow the other timers to be con- | ||||
figurable. | ||||
To minimize the likelihood that the distribution of BGP messages by a | ||||
given BGP speaker will contain peaks, jitter should be applied to the | ||||
timers associated with MinASOriginationInterval, KeepAlive, Min- | ||||
RouteAdvertisementInterval, and ConnectRetry. A given BGP speaker may | ||||
apply the same jitter to each of these quantities regardless of the | ||||
destinations to which the updates are being sent; that is, jitter | ||||
need not be configured on a "per peer" basis. | ||||
The suggested default amount of jitter shall be determined by multi- | ||||
plying the base value of the appropriate timer by a random factor | ||||
which is uniformly distributed in the range from 0.75 to 1.0. A new | ||||
random value should be picked each time the timer is set. The range | ||||
of the jitter random value MAY be configurable. | ||||
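The suggested defaults and the jitter rule above can be illustrated with a short sketch. It is not a definitive implementation: the table SUGGESTED_DEFAULTS, the set JITTERED, and the function timer_interval are hypothetical names, and the sketch assumes that the KeepAlive value is derived from the configured Hold Time.

```python
import random

# Suggested default values in seconds, as listed in this section.
SUGGESTED_DEFAULTS = {
    "ConnectRetry": 120.0,
    "HoldTime": 90.0,
    "MinASOriginationInterval": 15.0,
    "MinRouteAdvertisementInterval": 30.0,
}
# KeepAlive is suggested to be 1/3 of the Hold Time.
SUGGESTED_DEFAULTS["KeepAlive"] = SUGGESTED_DEFAULTS["HoldTime"] / 3

# Timers to which jitter should be applied (Hold Time is not in this list).
JITTERED = {
    "MinASOriginationInterval",
    "KeepAlive",
    "MinRouteAdvertisementInterval",
    "ConnectRetry",
}


def timer_interval(name, rng=random, low=0.75, high=1.0):
    """Return the interval to arm the named timer with.

    A fresh random factor, uniformly distributed between low and high,
    is picked on every call, i.e. each time the timer is set.
    """
    base = SUGGESTED_DEFAULTS[name]
    if name not in JITTERED:
        return base
    return base * rng.uniform(low, high)
```

With the defaults above, each call to timer_interval("KeepAlive") returns a value between 22.5 and 30 seconds, while timer_interval("HoldTime") always returns 90 seconds. The low and high parameters reflect the statement that the jitter range MAY be configurable.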
Appendix A. Comparison with RFC1771 | ||||
There are numerous editorial changes (too many to list here). | There are numerous editorial changes (too many to list here). | |||
The following lists the technical changes: | The following lists the technical changes: | |||
Changes to reflect the usages of such features as TCP MD5 [10], | Changes to reflect the usages of such features as TCP MD5 | |||
BGP Route Reflectors [11], BGP Confederations [13], and BGP Route | [RFC2385], BGP Route Reflectors [RFC2796], BGP Confederations | |||
Refresh [12]. | [RFC3065], and BGP Route Refresh [RFC2918]. | |||
Clarification on the use of the BGP Identifier in the AGGREGATOR | Clarification on the use of the BGP Identifier in the AGGREGATOR | |||
attribute. | attribute. | |||
Procedures for imposing an upper bound on the number of prefixes | Procedures for imposing an upper bound on the number of prefixes | |||
that a BGP speaker would accept from a peer. | that a BGP speaker would accept from a peer. | |||
The ability of a BGP speaker to include more than one instance of | The ability of a BGP speaker to include more than one instance of | |||
its own AS in the AS_PATH attribute for the purpose of inter-AS | its own AS in the AS_PATH attribute for the purpose of inter-AS | |||
traffic engineering. | traffic engineering. | |||
Clarifications on the various types of NEXT_HOPs. | Clarifications on the various types of NEXT_HOPs. | |||
Clarifications to the use of the ATOMIC_AGGREGATE attribute. | Clarifications to the use of the ATOMIC_AGGREGATE attribute. | |||
The relationship between the immediate next hop, and the next hop | The relationship between the immediate next hop, and the next hop | |||
as specified in the NEXT_HOP path attribute. | as specified in the NEXT_HOP path attribute. | |||
Clarifications on the tie-breaking procedures. | Clarifications on the tie-breaking procedures. | |||
Appendix 2. Comparison with RFC1267 | Clarifications on the frequency of route advertisements. | |||
All the changes listed in Appendix 1, plus the following. | Optional Parameter Type 1 (Authentication Information) has been | |||
deprecated. | ||||
BGP-4 is capable of operating in an environment where a set of | UPDATE Message Error subcode 7 (AS Routing Loop) has been depre- | |||
reachable destinations may be expressed via a single IP prefix. The | cated. | |||
concept of network classes, or subnetting is foreign to BGP-4. To | ||||
Use of the Marker field for authentication has been deprecated. | ||||
Appendix B. Comparison with RFC1267 | ||||
All the changes listed in Appendix A, plus the following. | ||||
BGP-4 is capable of operating in an environment where a set of reach- | ||||
able destinations may be expressed via a single IP prefix. The con- | ||||
cept of network classes, or subnetting is foreign to BGP-4. To | ||||
accommodate these capabilities BGP-4 changes semantics and encoding | accommodate these capabilities BGP-4 changes semantics and encoding | |||
associated with the AS_PATH attribute. New text has been added to | associated with the AS_PATH attribute. New text has been added to | |||
define semantics associated with IP prefixes. These abilities allow | define semantics associated with IP prefixes. These abilities allow | |||
BGP-4 to support the proposed supernetting scheme [9]. | BGP-4 to support the proposed supernetting scheme [9]. | |||
To simplify configuration this version introduces a new attribute, | To simplify configuration this version introduces a new attribute, | |||
LOCAL_PREF, that facilitates route selection procedures. | LOCAL_PREF, that facilitates route selection procedures. | |||
The INTER_AS_METRIC attribute has been renamed to be MULTI_EXIT_DISC. | The INTER_AS_METRIC attribute has been renamed to be MULTI_EXIT_DISC. | |||
A new attribute, ATOMIC_AGGREGATE, has been introduced to ensure that | A new attribute, ATOMIC_AGGREGATE, has been introduced to ensure that | |||
certain aggregates are not de-aggregated. Another new attribute, | certain aggregates are not de-aggregated. Another new attribute, | |||
AGGREGATOR, can be added to aggregate routes in order to advertise | AGGREGATOR, can be added to aggregate routes in order to advertise | |||
which AS and which BGP speaker within that AS caused the aggregation. | which AS and which BGP speaker within that AS caused the aggregation. | |||
To ensure that Hold Timers are symmetric, the Hold Time is now | To ensure that Hold Timers are symmetric, the Hold Time is now | |||
negotiated on a per-connection basis. Hold Times of zero are now | negotiated on a per-connection basis. Hold Times of zero are now | |||
supported. | supported. | |||
Appendix 3. Comparison with RFC 1163 | Appendix C. Comparison with RFC 1163 | |||
All of the changes listed in Appendices 1 and 2, plus the following. | All of the changes listed in Appendices A and B, plus the following. | |||
To detect and recover from BGP connection collision, a new field (BGP | To detect and recover from BGP connection collision, a new field (BGP | |||
Identifier) has been added to the OPEN message. New text (Section | Identifier) has been added to the OPEN message. New text (Section | |||
6.8) has been added to specify the procedure for detecting and | 6.8) has been added to specify the procedure for detecting and recov- | |||
recovering from collision. | ering from collision. | |||
The new document no longer restricts the border router that is passed | The new document no longer restricts the border router that is passed | |||
in the NEXT_HOP path attribute to be part of the same Autonomous | in the NEXT_HOP path attribute to be part of the same Autonomous Sys- | |||
System as the BGP Speaker. | tem as the BGP Speaker. | |||
The new document optimizes and simplifies the exchange of the information | The new document optimizes and simplifies the exchange of the information | |||
about previously reachable routes. | about previously reachable routes. | |||
Appendix 4. Comparison with RFC 1105 | Appendix D. Comparison with RFC 1105 | |||
All of the changes listed in Appendices 1, 2 and 3, plus the | All of the changes listed in Appendices A, B and C, plus the follow- | |||
following. | ing. | |||
Minor changes to the RFC1105 Finite State Machine were necessary to | Minor changes to the RFC1105 Finite State Machine were necessary to | |||
accommodate the TCP user interface provided by 4.3 BSD. | accommodate the TCP user interface provided by 4.3 BSD. | |||
The notion of Up/Down/Horizontal relations present in RFC1105 has | The notion of Up/Down/Horizontal relations present in RFC1105 has | |||
been removed from the protocol. | been removed from the protocol. | |||
The changes in the message format from RFC1105 are as follows: | The changes in the message format from RFC1105 are as follows: | |||
1. The Hold Time field has been removed from the BGP header and | 1. The Hold Time field has been removed from the BGP header and | |||
added to the OPEN message. | added to the OPEN message. | |||
2. The version field has been removed from the BGP header and | 2. The version field has been removed from the BGP header and | |||
added to the OPEN message. | added to the OPEN message. | |||
3. The Link Type field has been removed from the OPEN message. | 3. The Link Type field has been removed from the OPEN message. | |||
4. The OPEN CONFIRM message has been eliminated and replaced with | 4. The OPEN CONFIRM message has been eliminated and replaced with | |||
implicit confirmation provided by the KEEPALIVE message. | implicit confirmation provided by the KEEPALIVE message. | |||
5. The format of the UPDATE message has been changed | 5. The format of the UPDATE message has been changed signifi- | |||
significantly. New fields were added to the UPDATE message to | cantly. New fields were added to the UPDATE message to support | |||
support multiple path attributes. | multiple path attributes. | |||
6. The Marker field has been expanded and its role broadened to | 6. The Marker field has been expanded and its role broadened to | |||
support authentication. | support authentication. | |||
Note that quite often BGP, as specified in RFC 1105, is referred | Note that quite often BGP, as specified in RFC 1105, is referred | |||
to as BGP-1, BGP, as specified in RFC 1163, is referred to as | to as BGP-1, BGP, as specified in RFC 1163, is referred to as | |||
BGP-2, BGP, as specified in RFC1267 is referred to as BGP-3, and | BGP-2, BGP, as specified in RFC1267 is referred to as BGP-3, and | |||
BGP, as specified in this document is referred to as BGP-4. | BGP, as specified in this document is referred to as BGP-4. | |||
Appendix 5. TCP options that may be used with BGP | Appendix E. TCP options that may be used with BGP | |||
If a local system TCP user interface supports TCP PUSH function, then | If a local system TCP user interface supports TCP PUSH function, then | |||
each BGP message should be transmitted with PUSH flag set. Setting | each BGP message should be transmitted with PUSH flag set. Setting | |||
PUSH flag forces BGP messages to be transmitted promptly to the | PUSH flag forces BGP messages to be transmitted promptly to the | |||
receiver. | receiver. | |||
If a local system TCP user interface supports setting precedence for | If a local system TCP user interface supports setting precedence for | |||
TCP connection, then the BGP transport connection should be opened | TCP connection, then TCP connection used by BGP should be opened with | |||
with precedence set to Internetwork Control (110) value (see also | precedence set to Internetwork Control (110) value (see also | |||
[6]). | [RFC791]). | |||
A local system may protect its BGP sessions by using the TCP MD5 | A local system may protect its BGP connections by using the TCP MD5 | |||
Signature Option [10]. | Signature Option [RFC2385]. | |||
Appendix 6. Implementation Recommendations | Appendix F. Implementation Recommendations | |||
This section presents some implementation recommendations. | This section presents some implementation recommendations. | |||
6.1 Multiple Networks Per Message | Appendix F.1 Multiple Networks Per Message | |||
The BGP protocol allows for multiple address prefixes with the same | The BGP protocol allows for multiple address prefixes with the same | |||
path attributes to be specified in one message. Making use of this | path attributes to be specified in one message. Making use of this | |||
capability is highly recommended. With one address prefix per message | capability is highly recommended. With one address prefix per message | |||
there is a substantial increase in overhead in the receiver. Not only | there is a substantial increase in overhead in the receiver. Not only | |||
does the system overhead increase due to the reception of multiple | does the system overhead increase due to the reception of multiple | |||
messages, but the overhead of scanning the routing table for updates | messages, but the overhead of scanning the routing table for updates | |||
to BGP peers and other routing protocols (and sending the associated | to BGP peers and other routing protocols (and sending the associated | |||
messages) is incurred multiple times as well. | messages) is incurred multiple times as well. | |||
skipping to change at page 61, line 41 | skipping to change at page 77, line 11 | |||
per path attribute set basis is to build many messages as the routing | per path attribute set basis is to build many messages as the routing | |||
table is scanned. As each address prefix is processed, a message for | table is scanned. As each address prefix is processed, a message for | |||
the associated set of path attributes is allocated, if it does not | the associated set of path attributes is allocated, if it does not | |||
exist, and the new address prefix is added to it. If such a message | exist, and the new address prefix is added to it. If such a message | |||
exists, the new address prefix is just appended to it. If the message | exists, the new address prefix is just appended to it. If the message | |||
lacks the space to hold the new address prefix, it is transmitted, a | lacks the space to hold the new address prefix, it is transmitted, a | |||
new message is allocated, and the new address prefix is inserted into | new message is allocated, and the new address prefix is inserted into | |||
the new message. When the entire routing table has been scanned, all | the new message. When the entire routing table has been scanned, all | |||
allocated messages are sent and their resources released. Maximum | allocated messages are sent and their resources released. Maximum | |||
compression is achieved when all the destinations covered by the | compression is achieved when all the destinations covered by the | |||
address prefixes share a common set of path attributes making it | address prefixes share a common set of path attributes making it pos- | |||
possible to send many address prefixes in one 4096-byte message. | sible to send many address prefixes in one 4096-byte message. | |||
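The message-building method described in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions: pack_updates, prefix_size, and the send callback are hypothetical names; path attributes are assumed to be available as an already encoded byte string; and the fixed overhead of 23 octets assumes the 19-octet message header plus the two 2-octet length fields of an UPDATE message carrying no withdrawn routes.

```python
from typing import Callable, Dict, Iterable, List, Tuple

MAX_MESSAGE_SIZE = 4096      # maximum BGP message size in octets
FIXED_OVERHEAD = 19 + 2 + 2  # header + withdrawn length + attribute length

Prefix = Tuple[str, int]     # hypothetical form: (address, prefix length)


def prefix_size(prefix_len: int) -> int:
    """Encoded size of one prefix: a length octet plus the prefix octets."""
    return 1 + (prefix_len + 7) // 8


def pack_updates(
    routes: Iterable[Tuple[Prefix, bytes]],
    send: Callable[[bytes, List[Prefix]], None],
) -> None:
    """Scan the routes once, keeping one open message per attribute set."""
    pending: Dict[bytes, List[Prefix]] = {}
    used: Dict[bytes, int] = {}
    for prefix, attrs in routes:
        size = prefix_size(prefix[1])
        if attrs not in pending:
            # No message exists for this attribute set yet: allocate one.
            pending[attrs] = []
            used[attrs] = FIXED_OVERHEAD + len(attrs)
        elif used[attrs] + size > MAX_MESSAGE_SIZE:
            # The message lacks space: transmit it and allocate a new one.
            send(attrs, pending[attrs])
            pending[attrs] = []
            used[attrs] = FIXED_OVERHEAD + len(attrs)
        pending[attrs].append(prefix)
        used[attrs] += size
    # The whole table has been scanned: send everything still allocated.
    for attrs, prefixes in pending.items():
        if prefixes:
            send(attrs, prefixes)
```

A caller would pass a function that encodes and transmits one UPDATE message for the given attributes and prefix list. Keeping the open messages in a dictionary keyed by the encoded attributes is one possible design choice; it relies on semantically identical attribute sets having identical encodings.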
When peering with a BGP implementation that does not compress | ||||
multiple address prefixes into one message, it may be necessary to | ||||
take steps to reduce the overhead from the flood of data received | ||||
when a peer is acquired or a significant network topology change | ||||
occurs. One method of doing this is to limit the rate of updates. | ||||
This will eliminate the redundant scanning of the routing table to | ||||
provide flash updates for BGP peers and other routing protocols. A | ||||
disadvantage of this approach is that it increases the propagation | ||||
latency of routing information. By choosing a minimum flash update | ||||
interval that is not much greater than the time it takes to process | ||||
the multiple messages this latency should be minimized. A better | ||||
method would be to read all received messages before sending updates. | ||||
6.2 Processing Messages on a Stream Protocol | ||||
BGP uses TCP as a transport mechanism. Due to the stream nature of | ||||
TCP, all the data for received messages does not necessarily arrive | ||||
at the same time. This can make it difficult to process the data as | ||||
messages, especially on systems such as BSD Unix where it is not | ||||
possible to determine how much data has been received but not yet | ||||
processed. | ||||
One method that can be used in this situation is to first try to read | ||||
just the message header. For the KEEPALIVE message type, this is a | ||||
complete message; for other message types, the header should first be | ||||
verified, in particular the total length. If all checks are | ||||
successful, the specified length, minus the size of the message | ||||
header is the amount of data left to read. An implementation that | ||||
would "hang" the routing information process while trying to read | ||||
from a peer could set up a message buffer (4096 bytes) per peer and | ||||
fill it with data as available until a complete message has been | ||||
received. | ||||
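The per-peer buffering approach described above can be sketched as follows. This is a hedged illustration, not a definitive implementation: the class PeerBuffer and its feed method are hypothetical, and the sketch assumes the BGP message header layout of a 16-octet Marker, a 2-octet Length, and a 1-octet Type, with Length bounded by 19 and 4096.

```python
import struct
from typing import List, Tuple

HEADER_LEN = 19         # Marker (16) + Length (2) + Type (1)
MAX_MESSAGE_LEN = 4096  # largest BGP message


class PeerBuffer:
    """Accumulates TCP stream data for one peer and yields whole messages."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[Tuple[int, bytes]]:
        """Add newly received data; return any complete (type, body) pairs."""
        self._buf += data
        messages: List[Tuple[int, bytes]] = []
        while len(self._buf) >= HEADER_LEN:
            # Verify the header first, in particular the total length.
            (length,) = struct.unpack_from("!H", self._buf, 16)
            if not HEADER_LEN <= length <= MAX_MESSAGE_LEN:
                raise ValueError("bad message length: %d" % length)
            if len(self._buf) < length:
                break  # the rest of this message has not arrived yet
            msg_type = self._buf[18]
            # length minus the header size is the amount of body data.
            body = bytes(self._buf[HEADER_LEN:length])
            messages.append((msg_type, body))
            del self._buf[:length]
        return messages
```

Because feed never blocks, a caller can invoke it with whatever a non-blocking read returns and continue serving other peers. A KEEPALIVE message consists of the header alone, so it is returned with an empty body.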
6.3 Reducing route flapping | ||||
To avoid excessive route flapping a BGP speaker which needs to | ||||
withdraw a destination and send an update about a more specific or | ||||
less specific route SHOULD combine them into the same UPDATE message. | ||||
6.4 BGP Timers | When peering with a BGP implementation that does not compress multi- | |||
ple address prefixes into one message, it may be necessary to take | ||||
steps to reduce the overhead from the flood of data received when a | ||||
peer is acquired or a significant network topology change occurs. One | ||||
method of doing this is to limit the rate of updates. This will elim- | ||||
inate the redundant scanning of the routing table to provide flash | ||||
updates for BGP peers and other routing protocols. A disadvantage of | ||||
this approach is that it increases the propagation latency of routing | ||||
information. By choosing a minimum flash update interval that is not | ||||
much greater than the time it takes to process the multiple messages | ||||
this latency should be minimized. A better method would be to read | ||||
all received messages before sending updates. | ||||
BGP employs five timers: ConnectRetry, Hold Time, KeepAlive, | Appendix F.2 Reducing route flapping | |||
MinASOriginationInterval, and MinRouteAdvertisementInterval. The | ||||
suggested value for the ConnectRetry timer is 120 seconds. The | ||||
suggested value for the Hold Time is 90 seconds. The suggested value | ||||
for the KeepAlive timer is 1/3 of the Hold Time. The suggested value | ||||
for the MinASOriginationInterval is 15 seconds. The suggested value | ||||
for the MinRouteAdvertisementInterval is 30 seconds. | ||||
An implementation of BGP MUST allow the Hold Time timer to be | To avoid excessive route flapping a BGP speaker which needs to with- | |||
configurable, and MAY allow the other timers to be configurable. | draw a destination and send an update about a more specific or less | |||
specific route SHOULD combine them into the same UPDATE message. | ||||
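The suggested timer values from the -17 text of section 6.4 could be captured in a small configuration object such as the one below. The class and field names are hypothetical; only the numeric defaults and the 1/3 relationship between the KeepAlive timer and the Hold Time come from the text.

```python
from dataclasses import dataclass


@dataclass
class BgpTimers:
    """Hypothetical container for the five BGP timers, in seconds."""

    connect_retry: int = 120
    hold_time: int = 90   # the one timer that MUST be configurable
    min_as_origination_interval: int = 15
    min_route_advertisement_interval: int = 30

    @property
    def keepalive(self) -> float:
        # Suggested KeepAlive value is 1/3 of the Hold Time.
        return self.hold_time / 3
```

Deriving keepalive from hold_time, rather than storing it separately, keeps the two values consistent when an operator changes the Hold Time.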
6.5 Path attribute ordering | Appendix F.3 Path attribute ordering | |||
Implementations which combine update messages as described above in | Implementations which combine update messages as described above in | |||
6.1 may prefer to see all path attributes presented in a known order. | 6.1 may prefer to see all path attributes presented in a known order. | |||
This permits them to quickly identify sets of attributes from | This permits them to quickly identify sets of attributes from differ- | |||
different update messages which are semantically identical. To | ent update messages which are semantically identical. To facilitate | |||
facilitate this, it is a useful optimization to order the path | this, it is a useful optimization to order the path attributes | |||
attributes according to type code. This optimization is entirely | according to type code. This optimization is entirely optional. | |||
optional. | ||||
6.6 AS_SET sorting | Appendix F.4 AS_SET sorting | |||
Another useful optimization that can be done to simplify this | Another useful optimization that can be done to simplify this situa- | |||
situation is to sort the AS numbers found in an AS_SET. This | tion is to sort the AS numbers found in an AS_SET. This optimization | |||
optimization is entirely optional. | is entirely optional. | |||
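As a rough illustration of the two optional optimizations above, the sketch below puts path attributes and AS_SET members into a canonical order so that semantically identical sets compare equal. The function names are hypothetical, and attributes are modelled as simple (type code, value) pairs, which is an assumption made for the example rather than a wire format.

```python
def canonical_attributes(attrs):
    """Order (type_code, value) pairs by type code."""
    return tuple(sorted(attrs, key=lambda attr: attr[0]))


def canonical_as_set(as_numbers):
    """Sort the AS numbers of an AS_SET, dropping repeats."""
    return tuple(sorted(set(as_numbers)))


# Two updates carrying the same attributes in a different order.
update_a = [(3, "192.0.2.1"), (1, "IGP"), (2, (65001, 65002))]
update_b = [(1, "IGP"), (2, (65001, 65002)), (3, "192.0.2.1")]
same = canonical_attributes(update_a) == canonical_attributes(update_b)
```

Once both sides are in canonical form, a plain equality test (or a hash lookup) is enough to detect identical attribute sets.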
6.7 Control over version negotiation | Appendix F.5 Control over version negotiation | |||
Since BGP-4 is capable of carrying aggregated routes which cannot be | Since BGP-4 is capable of carrying aggregated routes which cannot be | |||
properly represented in BGP-3, an implementation which supports BGP-4 | properly represented in BGP-3, an implementation which supports BGP-4 | |||
and another BGP version should provide the capability to only speak | and another BGP version should provide the capability to only speak | |||
BGP-4 on a per-peer basis. | BGP-4 on a per-peer basis. | |||
6.8 Complex AS_PATH aggregation | Appendix F.6 Complex AS_PATH aggregation | |||
An implementation which chooses to provide a path aggregation | An implementation which chooses to provide a path aggregation algo- | |||
algorithm which retains significant amounts of path information may | rithm which retains significant amounts of path information may wish | |||
wish to use the following procedure: | to use the following procedure: | |||
For the purpose of aggregating AS_PATH attributes of two routes, | For the purpose of aggregating AS_PATH attributes of two routes, | |||
we model each AS as a tuple <type, value>, where "type" identifies | we model each AS as a tuple <type, value>, where "type" identifies | |||
a type of the path segment the AS belongs to (e.g. AS_SEQUENCE, | a type of the path segment the AS belongs to (e.g. AS_SEQUENCE, | |||
AS_SET), and "value" is the AS number. Two ASs are said to be the | AS_SET), and "value" is the AS number. Two ASs are said to be the | |||
same if their corresponding <type, value> tuples are the same. | same if their corresponding <type, value> tuples are the same. | |||
The algorithm to aggregate two AS_PATH attributes works as | The algorithm to aggregate two AS_PATH attributes works as fol- | |||
follows: | lows: | |||
a) Identify the same ASs (as defined above) within each AS_PATH | a) Identify the same ASs (as defined above) within each AS_PATH | |||
attribute that are in the same relative order within both | attribute that are in the same relative order within both | |||
AS_PATH attributes. Two ASs, X and Y, are said to be in the | AS_PATH attributes. Two ASs, X and Y, are said to be in the | |||
same order if either: | same order if either: | |||
- X precedes Y in both AS_PATH attributes, or | - X precedes Y in both AS_PATH attributes, or | |||
- Y precedes X in both AS_PATH attributes. | - Y precedes X in both AS_PATH attributes. | |||
b) The aggregated AS_PATH attribute consists of ASs identified | b) The aggregated AS_PATH attribute consists of ASs identified | |||
in (a) in exactly the same order as they appear in the AS_PATH | in (a) in exactly the same order as they appear in the AS_PATH | |||
attributes to be aggregated. If two consecutive ASs identified | attributes to be aggregated. If two consecutive ASs identified | |||
in (a) do not immediately follow each other in both of the | in (a) do not immediately follow each other in both of the | |||
AS_PATH attributes to be aggregated, then the intervening ASs | AS_PATH attributes to be aggregated, then the intervening ASs | |||
(ASs that are between the two consecutive ASs that are the | (ASs that are between the two consecutive ASs that are the | |||
same) in both attributes are combined into an AS_SET path | same) in both attributes are combined into an AS_SET path seg- | |||
segment that consists of the intervening ASs from both AS_PATH | ment that consists of the intervening ASs from both AS_PATH | |||
attributes; this segment is then placed in between the two | attributes; this segment is then placed in between the two con- | |||
consecutive ASs identified in (a) of the aggregated attribute. | secutive ASs identified in (a) of the aggregated attribute. If | |||
If two consecutive ASs identified in (a) immediately follow | two consecutive ASs identified in (a) immediately follow each | |||
each other in one attribute, but do not follow in another, then | other in one attribute, but do not follow in another, then the | |||
the intervening ASs of the latter are combined into an AS_SET | intervening ASs of the latter are combined into an AS_SET path | |||
path segment; this segment is then placed in between the two | segment; this segment is then placed in between the two consec- | |||
consecutive ASs identified in (a) of the aggregated attribute. | utive ASs identified in (a) of the aggregated attribute. | |||
c) For each pair of adjacent tuples in the aggregated AS_PATH, | ||||
if both tuples have the same type, merge them together, as long | ||||
as doing so will not cause a segment with length greater than | ||||
255 to be generated. | ||||
If as a result of the above procedure a given AS number appears | If as a result of the above procedure a given AS number appears | |||
more than once within the aggregated AS_PATH attribute, all but | more than once within the aggregated AS_PATH attribute, all but | |||
the last instance (rightmost occurrence) of that AS number should | the last instance (rightmost occurrence) of that AS number should | |||
be removed from the aggregated AS_PATH attribute. | be removed from the aggregated AS_PATH attribute. | |||
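One possible reading of the procedure above is sketched below. Several choices here are assumptions rather than anything the drafts mandate: the common ASs of step (a) are picked with Python's difflib.SequenceMatcher (any common subsequence would satisfy the text), ASs before the first and after the last common AS are treated as intervening ASs, and the result is returned as a flat list of <type, value> tuples. The merge in step (c), which appears only in the -17 text, is not performed on this flat form. The function name is hypothetical.

```python
from difflib import SequenceMatcher

SEQ, SET = "AS_SEQUENCE", "AS_SET"


def aggregate_as_paths(path1, path2):
    """Aggregate two AS_PATHs given as lists of (type, as_number) tuples."""
    matcher = SequenceMatcher(None, path1, path2, autojunk=False)
    merged = []
    i = j = 0
    # Step (a): matching blocks are identical tuples in the same relative order.
    for block in matcher.get_matching_blocks():
        # Step (b): intervening ASs from both paths go into an AS_SET.
        seen = set()
        for _, asn in path1[i:block.a] + path2[j:block.b]:
            if asn not in seen:
                seen.add(asn)
                merged.append((SET, asn))
        merged.extend(path1[block.a:block.a + block.size])
        i, j = block.a + block.size, block.b + block.size
    # Final rule: keep only the rightmost occurrence of each AS number.
    result = []
    seen = set()
    for seg_type, asn in reversed(merged):
        if asn not in seen:
            seen.add(asn)
            result.append((seg_type, asn))
    result.reverse()
    return result
```

For example, aggregating the sequences 1 2 3 and 1 4 3 keeps AS 1 and AS 3 as sequence members and places ASs 2 and 4 between them as AS_SET members.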
Security Considerations | Security Considerations | |||
BGP supports the ability to authenticate BGP messages by using BGP | BGP supports the ability to authenticate BGP messages by using BGP | |||
authentication. The authentication could be done on a per peer basis. | authentication. The authentication could be done on a per peer basis. | |||
In addition, BGP supports the ability to authenticate its data stream | In addition, BGP supports the ability to authenticate its data stream | |||
by using [10]. This authentication could be done on a per peer basis. | by using [RFC2385]. This authentication could be done on a per peer | |||
Finally, BGP could also use IPSec to authenticate its data stream. | basis. Finally, BGP could also use IPSec to authenticate its data | |||
Among the mechanisms mentioned in this paragraph, [10] is the most | stream. Among the mechanisms mentioned in this paragraph, [RFC2385] | |||
widely deployed. | is the most widely deployed. | |||
References | Normative References | |||
[1] Mills, D., "Exterior Gateway Protocol Formal Specification", | [RFC793] Postel, J., "Transmission Control Protocol - DARPA Internet | |||
RFC904, April 1984. | Program Protocol Specification", RFC793, September 1981. | |||
[2] Rekhter, Y., "EGP and Policy Based Routing in the New NSFNET | [RFC791] Postel, J., "Internet Protocol - DARPA Internet Program Pro- | |||
Backbone", RFC1092, February 1989. | tocol Specification", RFC791, September 1981. | |||
[3] Braun, H-W., "The NSFNET Routing Architecture", RFC1093, February | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
1989. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[4] Postel, J., "Transmission Control Protocol - DARPA Internet | Non-normative References | |||
Program Protocol Specification", RFC793, September 1981. | ||||
[5] Rekhter, Y., and P. Gross, "Application of the Border Gateway | [RFC904] Mills, D., "Exterior Gateway Protocol Formal Specification", | |||
Protocol in the Internet", RFC1772, March 1995. | RFC904, April 1984. | |||
[6] Postel, J., "Internet Protocol - DARPA Internet Program Protocol | [RFC1092] Rekhter, Y., "EGP and Policy Based Routing in the New | |||
Specification", RFC791, September 1981. | NSFNET Backbone", RFC1092, February 1989. | |||
[7] "Information Processing Systems - Telecommunications and | [RFC1093] Braun, H-W., "The NSFNET Routing Architecture", RFC1093, | |||
Information Exchange between Systems - Protocol for Exchange of | February 1989. | |||
Inter-domain Routeing Information among Intermediate Systems to | ||||
Support Forwarding of ISO 8473 PDUs", ISO/IEC IS10747, 1993 | ||||
[8] Fuller, V., Li, T., Yu, J., and Varadhan, K., "Classless | [RFC1772] Rekhter, Y., and P. Gross, "Application of the Border | |||
Inter-Domain Routing (CIDR): an Address Assignment and Aggregation | Gateway Protocol in the Internet", RFC1772, March 1995. | |||
[RFC1518] Rekhter, Y., Li, T., "An Architecture for IP Address Allo- | ||||
cation with CIDR", RFC 1518, September 1993. | ||||
[RFC1519] Fuller, V., Li, T., Yu, J., and Varadhan, K., "Classless | ||||
Inter-Domain Routing (CIDR): an Address Assignment and Aggregation | ||||
Strategy", RFC1519, September 1993. | Strategy", RFC1519, September 1993. | |||
[9] Rekhter, Y., Li, T., "An Architecture for IP Address Allocation | [RFC1997] R. Chandra, P. Traina, T. Li, "BGP Communities Attribute", | |||
with CIDR", RFC 1518, September 1993. | RFC 1997, August 1996. | |||
[10] Heffernan, A., "Protection of BGP Sessions via the TCP MD5 | [RFC2385] Heffernan, A., "Protection of BGP Sessions via the TCP MD5 | |||
Signature Option", RFC2385, August 1998. | Signature Option", RFC2385, August 1998. | |||
[11] Bates, T., Chandra, R., Chen, E., "BGP Route Reflection - An | [RFC2439] C. Villamizar, R. Chandra, R. Govindan, "BGP Route Flap | |||
Alternative to Full Mesh IBGP", RFC2796, April 2000. | Damping", RFC2439, November 1998. | |||
[12] Chen, E., "Route Refresh Capability for BGP-4", RFC2918, | [RFC2796] Bates, T., Chandra, R., Chen, E., "BGP Route Reflection - | |||
An Alternative to Full Mesh IBGP", RFC2796, April 2000. | ||||
[RFC2842] R. Chandra, J. Scudder, "Capabilities Advertisement with | ||||
BGP-4", RFC2842. | ||||
[RFC2858] T. Bates, R. Chandra, D. Katz, Y. Rekhter, "Multiprotocol | ||||
Extensions for BGP-4", RFC2858. | ||||
[RFC2918] Chen, E., "Route Refresh Capability for BGP-4", RFC2918, | ||||
September 2000. | September 2000. | |||
[13] Traina, P, McPherson, D., Scudder, J., "Autonomous System | [RFC3065] Traina, P, McPherson, D., Scudder, J., "Autonomous System | |||
Confederations for BGP", RFC3065, February 2001. | Confederations for BGP", RFC3065, February 2001. | |||
[IS10747] "Information Processing Systems - Telecommunications and | ||||
Information Exchange between Systems - Protocol for Exchange of | ||||
Inter-domain Routeing Information among Intermediate Systems to Sup- | ||||
port Forwarding of ISO 8473 PDUs", ISO/IEC IS10747, 1993 | ||||
Editors' Addresses | Editors' Addresses | |||
Yakov Rekhter | Yakov Rekhter | |||
Juniper Networks | Juniper Networks | |||
1194 N. Mathilda Avenue | ||||
Sunnyvale, CA 94089 | ||||
email: yakov@juniper.net | email: yakov@juniper.net | |||
Tony Li | Tony Li | |||
Procket Networks | Procket Networks, Inc. | |||
1100 Cadillac Ct. | ||||
Milpitas, CA 95035 | email: tli@procket.com | |||
Email: tli@procket.com | ||||
Susan Hares | ||||
NextHop Technologies, Inc. | ||||
email: skh@nexthop.com | ||||
End of changes. | ||||