draft-ietf-idr-bgp-analysis-00.txt | draft-ietf-idr-bgp-analysis-01.txt | |||
---|---|---|---|---|
INTERNET-DRAFT David Meyer | INTERNET-DRAFT David Meyer | |||
draft-ietf-idr-bgp-analysis-00.txt Keyur Patel | draft-ietf-idr-bgp-analysis-01.txt Keyur Patel | |||
Category Informational | Category Informational | |||
Expires: September 2003 March 2003 | Expires: October 2003 April 2003 | |||
BGP-4 Protocol Analysis | BGP-4 Protocol Analysis | |||
<draft-ietf-idr-bgp-analysis-00.txt> | <draft-ietf-idr-bgp-analysis-01.txt> | |||
Status of this Document | Status of this Document | |||
This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
Drafts. | Drafts. | |||
skipping to change at page 3, line 8 | skipping to change at page 3, line 8 | |||
This report satisfies the requirement for "the second report", as | This report satisfies the requirement for "the second report", as | |||
described in Section 6.0 of RFC 1264 [RFC1264]. In order to fulfill | described in Section 6.0 of RFC 1264 [RFC1264]. In order to fulfill | |||
the requirement, this report augments RFC 1774 [RFC1774] and | the requirement, this report augments RFC 1774 [RFC1774] and | |||
summarizes the key features of BGP protocol, and analyzes the | summarizes the key features of BGP protocol, and analyzes the | |||
protocol with respect to scaling and performance. | protocol with respect to scaling and performance. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2. Key Features and algorithms of the BGP-4 | 2. Key Features and algorithms of the BGP protocol. . . . . . . . 4 | |||
protocol. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | ||||
2.1. Key Features. . . . . . . . . . . . . . . . . . . . . . . . 4 | 2.1. Key Features. . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2.2. BGP Algorithms. . . . . . . . . . . . . . . . . . . . . . . 5 | 2.2. BGP Algorithms. . . . . . . . . . . . . . . . . . . . . . . 5 | |||
2.3. BGP Finite State Machine (FSM). . . . . . . . . . . . . . . 6 | 2.3. BGP Finite State Machine (FSM). . . . . . . . . . . . . . . 6 | |||
3. BGP Performance characteristics and Scalability. . . . . . . . 6 | 3. BGP Capabilities . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
3.1. Link bandwidth and CPU utilization. . . . . . . . . . . . . 7 | 4. BGP Persistent Peer Oscillations . . . . . . . . . . . . . . . 8 | |||
3.1.1. CPU utilization. . . . . . . . . . . . . . . . . . . . . 8 | 5. BGP Performance characteristics and Scalability. . . . . . . . 8 | |||
3.1.2. Memory requirements. . . . . . . . . . . . . . . . . . . 9 | 5.1. Link bandwidth and CPU utilization. . . . . . . . . . . . . 8 | |||
4. Applicability. . . . . . . . . . . . . . . . . . . . . . . . . 11 | 5.1.1. CPU utilization. . . . . . . . . . . . . . . . . . . . . 9 | |||
5. Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . 12 | 5.1.2. Memory requirements. . . . . . . . . . . . . . . . . . . 11 | |||
6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13 | 6. BGP Policy Expressiveness and its Implications . . . . . . . . 12 | |||
7. Author's Address . . . . . . . . . . . . . . . . . . . . . . . 14 | 6.1. Existence of Unique Stable Routings . . . . . . . . . . . . 13 | |||
8. Full Copyright Statement . . . . . . . . . . . . . . . . . . . 14 | 6.2. Existence of Stable Routings. . . . . . . . . . . . . . . . 14 | |||
7. Applicability. . . . . . . . . . . . . . . . . . . . . . . . . 15 | ||||
8. Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . 16 | ||||
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17 | ||||
10. Author's Address. . . . . . . . . . . . . . . . . . . . . . . 18 | ||||
11. Full Copyright Statement. . . . . . . . . . . . . . . . . . . 18 | ||||
1. Introduction | 1. Introduction | |||
BGP-4 is an inter-autonomous system routing protocol designed for | BGP-4 is an inter-autonomous system routing protocol designed for | |||
TCP/IP internets. Version 1 of the BGP protocol was published in RFC | TCP/IP internets. Version 1 of the BGP protocol was published in RFC | |||
1105 [RFC1105]. Since then BGP versions 2, 3, and 4 have been | 1105 [RFC1105]. Since then BGP versions 2, 3, and 4 have been | |||
developed. Version 2 was documented in RFC 1163 [RFC1163]. Version 3 | developed. Version 2 was documented in RFC 1163 [RFC1163]. Version 3 | |||
is documented in RFC 1267 [RFC1267]. Version 4 is documented in the | is documented in RFC 1267 [RFC1267]. Version 4 is documented in the | |||
[BGP4]. The changes between versions are explained in Appendix A of | [BGP4] (version 4 of BGP will hereafter be referred to as BGP). The | |||
[BGP4]. Possible applications of BGP in the Internet are documented | changes between versions are explained in Appendix A of [BGP4]. | |||
in [RFC1772]. | Possible applications of BGP in the Internet are documented in RFC | |||
1772 [RFC1772]. | ||||
2. Key Features and algorithms of the BGP-4 protocol | 2. Key Features and algorithms of the BGP protocol | |||
This section summarizes the key features and algorithms of the BGP | This section summarizes the key features and algorithms of the BGP | |||
protocol. BGP is an inter-autonomous system routing protocol; it is | protocol. BGP is an inter-autonomous system routing protocol; it is | |||
designed to be used between multiple autonomous systems. BGP assumes | designed to be used between multiple autonomous systems. BGP assumes | |||
that routing within an autonomous system is done by an intra- | that routing within an autonomous system is done by an intra- | |||
autonomous system routing protocol. BGP does not make any assumptions | autonomous system routing protocol. BGP does not make any assumptions | |||
about intra-autonomous system routing protocols deployed within the | about intra-autonomous system routing protocols deployed within the | |||
various autonomous systems. Specifically, BGP does not require all | various autonomous systems. Specifically, BGP does not require all | |||
autonomous systems to run the same intra-autonomous system routing | autonomous systems to run the same intra-autonomous system routing | |||
protocol (i.e., interior gateway protocol or IGP). | protocol (i.e., interior gateway protocol or IGP). | |||
skipping to change at page 4, line 39 | skipping to change at page 4, line 40 | |||
protocol, and as such it imposes no constraints on the underlying | protocol, and as such it imposes no constraints on the underlying | |||
Internet topology. The information exchanged via BGP is sufficient to | Internet topology. The information exchanged via BGP is sufficient to | |||
construct a graph of autonomous systems connectivity from which | construct a graph of autonomous systems connectivity from which | |||
routing loops may be pruned and many routing policy decisions at the | routing loops may be pruned and many routing policy decisions at the | |||
autonomous system level may be enforced. | autonomous system level may be enforced. | |||
2.1. Key Features | 2.1. Key Features | |||
The key features of the protocol are the notion of path attributes | The key features of the protocol are the notion of path attributes | |||
and aggregation of network layer reachability information (NLRI). | and aggregation of network layer reachability information (NLRI). | |||
Path attributes provide BGP with flexibility and expandability. Path | Path attributes provide BGP with flexibility and extensibility. Path | |||
attributes are partitioned into well-known and optional. The | attributes are partitioned into well-known and optional. The | |||
provision for optional attributes allows experimentation that may | provision for optional attributes allows experimentation that may | |||
involve a group of BGP routers without affecting the rest of the | involve a group of BGP routers without affecting the rest of the | |||
Internet. New optional attributes can be added to the protocol in | Internet. New optional attributes can be added to the protocol in | |||
much the same way that new options are added to, say, the Telnet | much the same way that new options are added to, say, the Telnet | |||
protocol [RFC854]. | protocol [RFC854]. | |||
One of the most important path attributes is the AS-PATH. As | One of the most important path attributes is the AS_PATH. AS | |||
reachability information traverses the Internet, this information is | reachability information traverses the Internet, this information is | |||
augmented by the list of autonomous systems that have been traversed | augmented by the list of autonomous systems that have been traversed | |||
thus far, forming the AS-PATH. The AS-PATH allows straightforward | thus far, forming the AS_PATH. The AS_PATH allows straightforward | |||
suppression of the looping of routing information. In addition, the | suppression of the looping of routing information. In addition, the | |||
AS-PATH serves as a powerful and versatile mechanism for policy-based | AS_PATH serves as a powerful and versatile mechanism for policy-based | |||
routing. | routing. | |||
BGP-4 enhances the AS-PATH attribute to include sets of autonomous | BGP enhances the AS_PATH attribute to include sets of autonomous | |||
systems as well as lists. This extended format allows generated | systems as well as lists via the AS_SET attribute. This extended | |||
aggregate routes to carry path information from the more specific | format allows generated aggregate routes to carry path information | |||
routes used to generate the aggregate. It should be noted however, | from the more specific routes used to generate the aggregate. It | |||
that as of this writing, AS-SETs are in rarely used in the Internet | should be noted however, that as of this writing, AS_SETs are rarely | |||
[ROUTEVIEWS]. | used in the Internet [ROUTEVIEWS]. | |||
2.2. BGP Algorithms | 2.2. BGP Algorithms | |||
BGP uses an algorithm that cannot be classified as either a pure | BGP uses an algorithm that is neither a pure distance vector | |||
distance vector, or a pure link state. Carrying a complete AS path in | algorithm or a pure link state algorithm. It is instead a modified | |||
the AS-PATH attribute allows to reconstruct large portions of the | distance vector algorithm that uses path information to avoid | |||
overall topology. That makes it similar to the link state algorithms. | traditional distance vector problems. Each route within BGP pairs | |||
Exchanging only the currently used routes between the peers makes it | destination with path information to that destination. Path | |||
similar to the distance vector algorithms. | information (also known as AS_PATH information) is stored within the | |||
AS_PATH attribute in BGP. This allows BGP to reconstruct large | ||||
portions of overall topology whenever required. | ||||
BGP-4 uses an incremental update strategy in order To conserve | BGP uses an incremental update strategy in order to conserve | |||
bandwidth and processing power. That is, after initial exchange of | bandwidth and processing power. That is, after initial exchange of | |||
complete routing information, a pair of BGP routers exchanges only | complete routing information, a pair of BGP routers exchanges only | |||
changes (deltas) to that information. Such an incremental update | changes to that information. Such an incremental update design | |||
design requires reliable transport between a pair of BGP routers to | requires reliable transport between a pair of BGP routers to function | |||
function correctly. BGP uses TCP as its reliable transport. | correctly. BGP solves this problem by using TCP for reliable | |||
transport. | ||||
In addition to incremental updates, BGP-4 has added the concept of | In addition to incremental updates, BGP has added the concept of | |||
route aggregation so that information about groups of networks may be | route aggregation so that information about groups of networks may be | |||
aggregated and sent as a single Network Layer Reachability (NLRI) | aggregated and sent as a single Network Layer Reachability (NLRI). | |||
Attribute. | ||||
Finally, note that BGP is a self-contained protocol. That is, it | Finally, note that BGP is a self-contained protocol. That is, BGP | |||
specifies how routing information is exchanged both between BGP | specifies how routing information is exchanged both between BGP | |||
speakers in different autonomous systems, and between BGP speakers | speakers in different autonomous systems, and between BGP speakers | |||
within a single autonomous system. | within a single autonomous system. | |||
2.3. BGP Finite State Machine (FSM) | 2.3. BGP Finite State Machine (FSM) | |||
The BGP FSM is a set of rules that are applied to a BGP speaker's set | The BGP FSM is a set of rules that are applied to a BGP speaker's set | |||
of configured peers for the BGP operation. A BGP implementation | of configured peers for the BGP operation. A BGP implementation | |||
requires that a BGP speaker must connect and listen to tcp port 179 | requires that a BGP speaker must connect to and listen on TCP port | |||
for accepting any new BGP connections from it's peers. The BGP FSM | 179 for accepting any new BGP connections from it's peers. The BGP | |||
must be initiated and maintained for each new incoming and outgoing | Finite State Machine, or FSM, must be initiated and maintained for | |||
peer connections. However, in steady state operation, there will be | each new incoming and outgoing peer connections. However, in steady | |||
only one BGP FSM per connection per peer. | state operation, there will be only one BGP FSM per connection per | |||
peer. | ||||
There may exist a temporary period where in a BGP peer may have | There may exist a temporary period where in a BGP peer may have | |||
separate incoming and outgoing connections resulting into two | separate incoming and outgoing connections resulting into two | |||
different BGP FSMs for a peer (instead of one). This can be resolved | different BGP FSMs for a peer (instead of one). This can be resolved | |||
following BGP connection collision rules defined in the [BGP4]. | following BGP connection collision rules defined in the [BGP4]. | |||
Following are different states of BGP FSM for its peers: | Following are different states of BGP FSM for its peers: | |||
IDLE: State when BGP peer refuses any incoming | IDLE: State when BGP peer refuses any incoming | |||
connections. | connections. | |||
skipping to change at page 6, line 41 | skipping to change at page 6, line 42 | |||
OPENSENT: BGP peer is waiting for OPEN message from its | OPENSENT: BGP peer is waiting for OPEN message from its | |||
peer. | peer. | |||
OPENCONFIRM: BGP peer is waiting for KEEPALIVE or NOTIFICATION | OPENCONFIRM: BGP peer is waiting for KEEPALIVE or NOTIFICATION | |||
message from its peer. | message from its peer. | |||
ESTABLISHED: BGP peer connection is established and exchanges | ESTABLISHED: BGP peer connection is established and exchanges | |||
UPDATE, NOTIFICATION, and KEEPALIVE messages with | UPDATE, NOTIFICATION, and KEEPALIVE messages with | |||
its peer. | its peer. | |||
3. BGP Performance characteristics and Scalability | There are different BGP events that operates on above mentioned states | |||
of BGP FSM for its peers. These BGP events are used for initiating and | ||||
terminating peer connections. They also assist BGP in identifying any | ||||
persistent peer connection oscillations and provides mechanism | ||||
for controlling it. | ||||
Following are different BGP events: | ||||
Manual Start: Manually start the peer connection. | ||||
Manual Stop: Manually stop the peer connection. | ||||
Automatic Start: Local system automatically starts the peer | ||||
connection. | ||||
Manual start with | ||||
passive TCP flag: Local system administrator manually starts the | ||||
peer connection with peer in passive mode. | ||||
Automatic start | ||||
with passive TCP flag: Local system administrator automatically starts | ||||
the peer connection with peer in passive mode. | ||||
Automatic start | ||||
with bgp_stop_flap | ||||
option set: Local system administrator automatically starts | ||||
the peer connection with peer oscillation | ||||
damping enabled | ||||
Automatic start with | ||||
bgp_stop_flap option | ||||
set and passive TCP | ||||
establishment | ||||
option set: Local system administrator automatically starts | ||||
the peer connection with peer oscillation | ||||
damping enabled and with peer in passive mode. | ||||
Automatic stop: Local system automatically stops the | ||||
BGP connection. | ||||
Both, Manual Start and Manual Stop are mandatory BGP events. All | ||||
other events are optional. | ||||
3. BGP Capabilities | ||||
The BGP Capability mechanism [RFC2842] provides easy and flexible way | ||||
to introduce new features within the protocol. In particular, the BGP | ||||
capability mechanism allows peers to negotiate various optional | ||||
features during startup. This allows the base BGP protocol to contain | ||||
only essential functionality, while at the same time providing a | ||||
flexible mechanism for signaling protocol extensions. | ||||
4. BGP Persistent Peer Oscillations | ||||
Ideally, whenever a BGP speaker detects an error in any peer | ||||
connection, it shuts down the peer and changes its FSM state to IDLE. | ||||
BGP speaker requires a Start event to re-initiate its idle peer | ||||
connection. If error remains persistent and BGP speaker generates | ||||
Start event automatically then it may result in persistent peer | ||||
flapping. However, although peer oscillation is found to be wide- | ||||
spread in BGP implementations, methods for preventing persistent peer | ||||
oscillations are outside the scope of base BGP protocol | ||||
specification. | ||||
5. BGP Performance characteristics and Scalability | ||||
In this section, we provide "order of magnitude" answers to the | In this section, we provide "order of magnitude" answers to the | |||
questions of how much link bandwidth, router memory and router CPU | questions of how much link bandwidth, router memory and router CPU | |||
cycles the BGP protocol will consume under normal conditions. In | cycles the BGP protocol will consume under normal conditions. In | |||
particular, we will address the scalability of BGP and its | particular, we will address the scalability of BGP and its | |||
limitations. | limitations. | |||
It is important to note that BGP does not require all the routers | It is important to note that BGP does not require all the routers | |||
within an autonomous system to participate in the BGP protocol. In | within an autonomous system to participate in the BGP protocol. In | |||
particular, only the border routers that provide connectivity between | particular, only the border routers that provide connectivity between | |||
the local autonomous system and their adjacent autonomous systems | the local autonomous system and their adjacent autonomous systems | |||
need participate in BGP. Constraining this set of participants is | need participate in BGP. The ability to constraint the set of BGP | |||
just one way of addressing the scaling issue. | speakers is one way to address scaling issues. | |||
3.1. Link bandwidth and CPU utilization | 5.1. Link bandwidth and CPU utilization | |||
Immediately after the initial BGP connection setup, the peers | Immediately after the initial BGP connection setup, BGP peers | |||
exchange complete set of routing information. If we denote the total | exchange complete set of routing information. If we denote the total | |||
number of routes in the Internet by N, the mean AS distance of the | number of routes in the Internet by N, the mean AS distance of the | |||
Internet by M (distance at the level of an autonomous system, | Internet by M (distance at the level of an autonomous system, | |||
expressed in terms of the number of autonomous systems), the total | expressed in terms of the number of autonomous systems), the total | |||
number of autonomous systems in the Internet by A, and assume that | number of autonomous systems in the Internet by A, and assume that | |||
the networks are uniformly distributed among the autonomous systems, | the networks are uniformly distributed among the autonomous systems, | |||
then the worst case amount of bandwidth consumed during the initial | then the worst case amount of bandwidth consumed during the initial | |||
exchange between a pair of BGP speakers is | exchange between a pair of BGP speakers is | |||
MR = O(N + M * A) | MR = O(N + M * A) | |||
skipping to change at page 7, line 41 | skipping to change at page 9, line 19 | |||
# NLRI Mean AS Distance # AS's Bandwidth (MR) | # NLRI Mean AS Distance # AS's Bandwidth (MR) | |||
---------- ---------------- ------ ---------------- | ---------- ---------------- ------ ---------------- | |||
40,000 15 400 184,000 bytes | 40,000 15 400 184,000 bytes | |||
100,000 10 10,000 800,000 bytes | 100,000 10 10,000 800,000 bytes | |||
120,000 10 15,000 1,080,000 bytes | 120,000 10 15,000 1,080,000 bytes | |||
140,000 15 20,000 1,760,000 bytes | 140,000 15 20,000 1,760,000 bytes | |||
[note that most of this bandwidth is consumed by the NLRI exchange] | [note that most of this bandwidth is consumed by the NLRI exchange] | |||
BGP-4 was created specifically to reduce the size of the set of NLRI | BGP was created specifically to reduce the size of the set of NLRI | |||
entires carried and exchanged by border routers. The aggregation | entries which have to be carried and exchanged by border routers. The | |||
scheme, defined in RFC 1519 [RFC1519], describes the provider-based | aggregation scheme, defined in RFC 1519 [RFC1519], describes the | |||
aggregation scheme in use in today's Internet. | provider-based aggregation scheme in use in today's Internet. | |||
Due to the advantages of advertising a few large aggregate blocks | Due to the advantages of advertising a few large aggregate blocks | |||
instead of many smaller class-based individual networks, it is | instead of many smaller class-based individual networks, it is | |||
difficult to estimate the actual reduction in bandwidth and | difficult to estimate the actual reduction in bandwidth and | |||
processing that BGP-4 has provided over BGP-3. If we simply | processing that BGP has provided over BGP-3. If we simply enumerate | |||
enumerate all aggregate blocks into their individual class-based | all aggregate blocks into their individual class-based networks, we | |||
networks, we would not take into account "dead" space that has been | would not take into account "dead" space that has been reserved for | |||
reserved for future expansion. The best metric for determining the | future expansion. The best metric for determining the success of | |||
success of BGP-4's aggregation is to sample the number NLRI entries | BGP's aggregation is to sample the number NLRI entries in the | |||
in the globally connected Internet today and compare it to projected | globally connected Internet today and compare it to projected growth | |||
growth rates before BGP-4 was deployed. | rates before BGP was deployed. | |||
At the time of this writing, the full set of exterior routes carried | At the time of this writing, the full set of exterior routes carried | |||
by BGP is approximately 120,000 network entries [ROUTEVIEWS]. | by BGP approximately 120,000 network entries [ROUTEVIEWS]. | |||
3.1.1. CPU utilization | 5.1.1. CPU utilization | |||
An important (and fundamental) feature of BGP is that BGP's CPU | An important and fundamental feature of BGP is that BGP's CPU | |||
utilization depends only on the stability of the Internet. If the | utilization depends only on the stability of the Internet. If the | |||
Internet is stable, then the only link bandwidth and router CPU | Internet is stable, then the only link bandwidth and router CPU | |||
cycles consumed by BGP are due to the exchange of the BGP KEEPALIVE | cycles consumed by BGP are due to the exchange of the BGP KEEPALIVE | |||
messages. The KEEPALIVE messages are exchanged only between peers. | messages. The KEEPALIVE messages are exchanged only between peers. | |||
The suggested frequency of the exchange is 30 seconds. The KEEPALIVE | The suggested frequency of the exchange is 30 seconds. The KEEPALIVE | |||
messages are quite short (19 octets), and require virtually no | messages are quite short (19 octets), and require virtually no | |||
processing. Therefore, the bandwidth consumed by the KEEPALIVE | processing. As a result, the bandwidth consumed by the KEEPALIVE | |||
messages is about 5 bits/sec. Operational experience confirms that | messages is about 5 bits/sec. Operational experience confirms that | |||
the overhead (in terms of bandwidth and CPU) associated with the | the overhead (in terms of bandwidth and CPU) associated with the | |||
KEEPALIVE messages should be viewed as negligible. | KEEPALIVE messages should be viewed as negligible. | |||
During periods of Internet instability, changes to the reachability | During periods of Internet instability, changes to the reachability | |||
information are passed between routers in UPDATE messages. If we | information are passed between routers in UPDATE messages. If we | |||
denote the number of routing changes per second by C, then in the | denote the number of routing changes per second by C, then in the | |||
worst case the amount of bandwidth consumed by the BGP can be | worst case the amount of bandwidth consumed by the BGP can be | |||
expressed as O(C * M). The greatest overhead per UPDATE message | expressed as O(C * M). The greatest overhead per UPDATE message | |||
occurs when each UPDATE message contains only a single network. It | occurs when each UPDATE message contains only a single network. It | |||
skipping to change at page 9, line 12 | skipping to change at page 10, line 37 | |||
the size of the IPv4 Internet routing table is bounded by O(2^32 * | the size of the IPv4 Internet routing table is bounded by O(2^32 * | |||
M), (where M is a slow-moving function describing the AS | M), (where M is a slow-moving function describing the AS | |||
interconnectivity of the network), no such bound can be formulated | interconnectivity of the network), no such bound can be formulated | |||
for the dynamic properties (i.e., stability) of BGP. Finally, since | for the dynamic properties (i.e., stability) of BGP. Finally, since | |||
the dynamic properties of the network cannot be quantitatively | the dynamic properties of the network cannot be quantitatively | |||
bounded, stability must be addressed via heuristics such as BGP | bounded, stability must be addressed via heuristics such as BGP | |||
Route Flap Dampening [RFC2439]. Due to the nature of BGP, such | Route Flap Dampening [RFC2439]. Due to the nature of BGP, such | |||
dampening should be viewed as a local to an autonomous system matter | dampening should be viewed as a local to an autonomous system matter | |||
(see also Appendix F.2 of [BGP4]). | (see also Appendix F.2 of [BGP4]). | |||
Growth of the Internet has made the stability issue one of the most | ||||
crucial issues for Internet operations. BGP by itself does not | ||||
introduce any instabilities into the Internet. Rather, instabilities | ||||
are largely due to the the dynamic nature of the edges of the | ||||
network, coupled with less than optimal aggregation. As a result, | ||||
stability should be addressed through improved aggregation and | ||||
isolating the core of the network from the dynamic nature of the edge | ||||
networks. | ||||
It may also be instructive to compare bandwidth and CPU requirements | It may also be instructive to compare bandwidth and CPU requirements | |||
of BGP with EGP. While with BGP the complete information is exchanged | of BGP with EGP. While with BGP the complete information is exchanged | |||
only at the connection establishment time, with EGP the complete | only at the connection establishment time, with EGP the complete | |||
information is exchanged periodically (usually every 3 minutes). Note | information is exchanged periodically (usually every 3 minutes). Note | |||
that both for BGP and for EGP the amount of information exchanged is | that both for BGP and for EGP the amount of information exchanged is | |||
roughly on the order of the networks reachable via a peer that sends | roughly on the order of the networks reachable via a peer that sends | |||
the information. Therefore, even if one assumes extreme instabilities | the information. Therefore, even if one assumes extreme instabilities | |||
of BGP, its worst case behavior will be the same as the steady state | of BGP, its worst case behavior will be the same as the steady state | |||
behavior of it's predecessor, EGP. | behavior of it's predecessor, EGP. | |||
Operational experience with BGP showed that the incremental update | Operational experience with BGP showed that the incremental update | |||
approach employed by BGP presents an enormous improvement both in the | approach employed by BGP provides qualitative improvement in both | |||
area of bandwidth and in the CPU utilization, as compared with | bandwidth and CPU utilization when compared with complete periodic | |||
complete periodic updates used by EGP (see also presentation by | updates used by EGP (see also presentation by Dennis Ferguson at the | |||
Dennis Ferguson at the Twentieth IETF, March 11-15, 1991, St.Louis). | Twentieth IETF, March 11-15, 1991, St.Louis). | |||
3.1.2. Memory requirements | 5.1.2. Memory requirements | |||
To quantify the worst case memory requirements for BGP, denote the | To quantify the worst case memory requirements for BGP, we denote the | |||
total number of networks in the Internet by N, the mean AS distance | total number of networks in the Internet by N, the mean AS distance | |||
of the Internet by M (distance at the level of an autonomous system, | of the Internet by M (distance at the level of an autonomous system, | |||
expressed in terms of the number of autonomous systems), the total | expressed in terms of the number of autonomous systems), the total | |||
number of autonomous systems in the Internet by A, and the total | number of autonomous systems in the Internet by A, and the total | |||
number of BGP speakers that a system is peering with by K (note that | number of BGP speakers that a system is peering with by K (note that | |||
K will usually be dominated by the total number of the BGP speakers | K will usually be dominated by the total number of the BGP speakers | |||
within a single autonomous system). Then the worst case memory | within a single autonomous system). Then the worst case memory | |||
requirements (MR) can be expressed as | requirements (MR) can be expressed as | |||
MR = O((N + M * A) * K) | MR = O((N + M * A) * K) | |||
It is interesting to note that prior to the introduction of BGP in | It is interesting to note that prior to the introduction of BGP in | |||
the NSFNET Backbone, memory requirements on the NSFNET Backbone | the NSFNET Backbone, memory requirements on the NSFNET Backbone | |||
routers running EGP were on the order of O(N *K). Therefore, the | routers running EGP were on the order of O(N *K). Therefore, the | |||
extra overhead in memory incurred by the modern routers running BGP | extra overhead in memory incurred by modern routers running BGP is | |||
is less than 7 percent. | less than 7 percent. | |||
Since a mean AS distance M is a slow moving function of the | Since a mean AS distance M is a slow moving function of the | |||
interconnectivity ("meshiness") of the Internet, for all practical | interconnectivity ("meshiness") of the Internet, for all practical | |||
purposes the worst case router memory requirements are on the order | purposes the worst case router memory requirements are on the order | |||
of the total number of networks in the Internet times the number of | of the total number of networks in the Internet times the number of | |||
peers the local system is peering with. We expect that the total | peers the local system is peering with. We expect that the total | |||
number of networks in the Internet will grow much faster than the | number of networks in the Internet will grow much faster than the | |||
average number of peers per router. As a result, scaling with | average number of peers per router. As a result, BGP's memory | |||
respect to the memory requirements is going to be heavily dominated | scaling properties are linearly related to the total number of | |||
by the factor that is linearly proportional to the total number of | ||||
networks in the Internet. | networks in the Internet. | |||
The following table illustrates typical memory requirements of a | The following table illustrates typical memory requirements of a | |||
router running BGP. It is assumed that each network is encoded as | router running BGP. It is assumed that each network is encoded as | |||
four bytes, each AS is encoded as two bytes, and each networks is | four bytes, each AS is encoded as two bytes, and each networks is | |||
reachable via some fraction of all of the peers (# BGP peers/per | reachable via some fraction of all of the peers (# BGP peers/per | |||
net). For purposes of estimates here, we will calculate MR = ((N*4) + | net). For purposes of the estimates here, we will calculate MR = | |||
(M*A)*2) * K. | ((N*4) + (M*A)*2) * K. | |||
# Networks Mean AS Distance # AS's # BGP peers/per net Memory Req (MR) | # Networks Mean AS Distance # AS's # BGP peers/per net Memory Req (MR) | |||
---------- ---------------- ------ ------------------- -------------- | ---------- ---------------- ------ ------------------- -------------- | |||
100,000 20 3,000 20 1,040,000 | 100,000 20 3,000 20 1,040,000 | |||
100,000 20 15,000 20 1,040,000 | 100,000 20 15,000 20 1,040,000 | |||
120,000 10 15,000 100 75,000,000 | 120,000 10 15,000 100 75,000,000 | |||
140,000 15 20,000 100 116,000,000 | 140,000 15 20,000 100 116,000,000 | |||
In analyzing BGP's memory requirements, we focus on the size of the | In analyzing BGP's memory requirements, we focus on the size of the | |||
forwarding table (ignoring implementation details). In particular, we | forwarding table (and ignoring implementation details). In | |||
derive upper bounds for the size of the forwarding table. For | particular, we derive upper bounds for the size of the forwarding | |||
example, at the time of this writing, the forwarding tables of a | table. For example, at the time of this writing, the forwarding | |||
typical backbone router carries on the order of 120,000 entries. | tables of a typical backbone router carries on the order of 120,000 | |||
Given this number, one might ask whether it would be possible to have | entries. Given this number, one might ask whether it would be | |||
a functional router with a table that will have 1,000,000 entries. | possible to have a functional router with a table that will have | |||
Clearly the answer to this question is independent of BGP. On the | 1,000,000 entries. Clearly the answer to this question is independent | |||
other hand the answer to the original questions (that was asked with | of BGP. Interestingly, in his review of the BGP protocol for the BGP | |||
respect to BGP) is directly related to the latter question. Very | review committee in March of 1990, Paul Tsuchiya noted that "BGP does | |||
interesting comments were given by Paul Tsuchiya in his review of BGP | not scale well. This is not really the fault of BGP. It is the fault | |||
in March of 1990 (as part of the BGP review committee appointed by | of the flat IP address space. Given the flat IP address space, any | |||
Bob Hinden). In the review he said that, "BGP does not scale well. | routing protocol must carry network numbers in its updates." The | |||
This is not really the fault of BGP. It is the fault of the flat IP | introduction of the provider based aggregation schemes (e.g., CIDR | |||
address space. Given the flat IP address space, any routing protocol | [RFC1519]) have sought to address this issue, to the extent possible, | |||
must carry network numbers in its updates." With the introduction of | within the context of current addressing architectures. | |||
CIDR [RFC1519] and BGP-4 [BGP4], we have attempted to reduce this | ||||
limitation. Unfortunately, we cannot erase history nor can BGP-4 | ||||
solve the problems inherent with inefficient assignment of future | ||||
address blocks. | ||||
To reiterate, BGP limits with respect to the memory requirements are | 6. BGP Policy Expressiveness and its Implications | |||
directly related to the underlying Internet Protocol (IP), and | ||||
specifically the addressing scheme employed by IP. BGP would provide | ||||
much better scaling in environments with more flexible addressing | ||||
schemes. It should be pointed out that with only very minor additions | ||||
BGP was extended to support hierarchies of autonomous system | ||||
[KUZINGER]. Such hierarchies, combined with an addressing scheme that | ||||
would allow more flexible address aggregation capabilities, can be | ||||
utilized by BGP-like protocols, thus providing practically unlimited | ||||
scaling capabilities. | ||||
4. Applicability | BGP is unique among deployed IP routing protocols in that routing is | |||
determined using semantically rich routing policies. Although | ||||
routing policies are usually the first thing that comes to a network | ||||
operator's mind concerning BGP, it is important to note that the | ||||
languages and techniques for specifying BGP routing policies are not | ||||
actually a part of the protocol specification (see RFC 2622 [RFC2622] | ||||
for an example of such a policy language). In addition, the BGP | ||||
specification contains few restrictions, either explicitly or | ||||
implicitly, on routing policy languages. These languages have | ||||
typically been developed by vendors and have evolved through | ||||
interactions with network engineers in an environment lacking vendor- | ||||
independent standards. | ||||
The complexity of typical BGP configurations, at least in provider | ||||
networks, has been steadily increasing. Router vendors typically | ||||
provided hundreds of special commands for use in the configuration of | ||||
BGP, and this command set is continually expanding. For example, BGP | ||||
communities [RFC1997] allow policy writers to selectively attach tags | ||||
to routes and use these to signal policy information to other BGP- | ||||
speaking routers. Many providers allow customers, and sometimes | ||||
peers, to send communities that determine the scope and preference of | ||||
their routes. These developments have more and more given the task of | ||||
writing BGP configurations aspects associated with open-ended | ||||
programming. This has allowed network operators to encode complex | ||||
policies in order to address many unforeseen situations, and has | ||||
opened the door for a great deal of creativity and experimentation in | ||||
routing policies. This policy flexibility is one of the main reasons | ||||
why BGP is so well suited to the commercial environment of the | ||||
current Internet. | ||||
However, this rich policy expressiveness has come with a cost that is | ||||
often not recognized. In particular, it is possible to construct | ||||
locally defined routing policies that can lead to unexpected global | ||||
routing anomalies such as (unintended) nondeterminism and to protocol | ||||
divergence. If the interacting policies causing such anomalies are | ||||
defined in different autonomous systems, then these problems can be | ||||
very difficult to debug and correct. In the following sections, we | ||||
describe two such cases relating to the existence (or lack thereof) | ||||
of stable routings. | ||||
6.1. Existence of Unique Stable Routings | ||||
One can easily construct sets of policies for which BGP can not | ||||
guarantee that stable routings are unique. This can be illustrated by | ||||
the following simple example. Consider the example of four Autonomous | ||||
Systems, AS1, AS2, AS3, and AS4. AS1 and AS2 are peers, and they | ||||
provide transit for AS3 and AS4 respectively, Suppose further that | ||||
AS3 provides transit for AS4 (in this case AS3 is a customer of AS1, | ||||
and AS4 is a multihomed customer of both AS3 and AS4). AS4 may want | ||||
to use the link to AS3 as a "backup" link, and sends AS3 a community | ||||
value that AS3 has configured to lower the preference of AS4's routes | ||||
to a level below that of its upstream provider, AS1. The intended | ||||
"backup routing" to AS4 is illustrated here: | ||||
AS1 ------> AS2 | ||||
/|\ | | ||||
| | | ||||
| | | ||||
| \|/ | ||||
AS3 ------- AS4 | ||||
That is, the AS3-AS4 link is intended to be used only when the | ||||
AS2-AS4 link is down (for outbound traffic, AS4 simply gives routes | ||||
from AS2 a higher local preference). This is a common scenario in | ||||
today's Internet. But note that this configuration has another | ||||
stable solution: | ||||
AS1 ------- AS2 | ||||
| | | ||||
| | | ||||
| | | ||||
\|/ \|/ | ||||
AS3 ------> AS4 | ||||
In this case, AS3 does not translate the "depref my route" community | ||||
received from AS4 into a "depref my route" community for AS1, and so | ||||
if AS1 hears the route to AS4 that transits AS3 it will prefer that | ||||
route (since AS3 is a customer). This state could be reached, for | ||||
example, by starting in the "correct" backup routing shown first, | ||||
bringing down the AS2-AS4 BGP session, and then bringing it back up. | ||||
In general, BGP has no way to prefer the "intended" solution over the | ||||
anomalous one, and which is picked will depend on the unpredictable | ||||
order of BGP messages. | ||||
While this example is relatively simple, many operators may fail to | ||||
recognize that the true source of the problem is that the BGP | ||||
policies of ASes can interact in unexpected ways, and that these | ||||
interactions can result in multiple stable routings. One can imagine | ||||
that the interactions could be much more complex in the real | ||||
Internet. We suspect that such anomalies will only become more common | ||||
as BGP continues to evolve with richer policy expressiveness. For | ||||
example, extended communities provide an even more flexible means of | ||||
signaling information within and between autonomous systems than is | ||||
possible with RFC 1997 communities. At the same time, applications of | ||||
communities by network operators are evolving to address complex | ||||
issues of inter-domain traffic engineering. | ||||
6.2. Existence of Stable Routings | ||||
One can also construct a set of policies for which BGP can not | ||||
guarantee that a stable routing exists (or worse, that a stable | ||||
routing will ever be found). For example, RFC 3345 [RFC3345] | ||||
documents several scenarios that lead to route oscillations | ||||
associated with the use of MEDs. Route oscillation will happen in BGP | ||||
when a set of policies has no solution. That is, when there is no | ||||
stable routing that satisfies the constraints imposed by policy, then | ||||
BGP has no choice by to keep trying. In addition, BGP configurations | ||||
can have a stable routing, yet the protocol may not be able to find | ||||
it; BGP can "get trapped" down a blind alley that has no solution. | ||||
Protocol divergence is not, however, a problem associated solely with | ||||
use of the MED attribute. This potential exists in BGP even without | ||||
the use of the MED attribute. Hence, like the unintended | ||||
nondeterminism described in the previous section, this type of | ||||
protocol divergence is an unintended consequence of the unconstrained | ||||
nature of BGP policy languages. | ||||
7. Applicability | ||||
In this section we answer the question of which environments is BGP | In this section we answer the question of which environments is BGP | |||
well suited, and for which environments it is not suitable. | well suited, and for which environments it is not suitable. This | |||
Partially this question is answered in the Section 2 of [RFC1771], | question is partially answered in Section 2 of RFC 1771 [RFC1771], | |||
where the document states the following: | which states: | |||
"To characterize the set of policy decisions that can be enforced | "To characterize the set of policy decisions that can be enforced | |||
using BGP, one must focus on the rule that an AS advertises to its | using BGP, one must focus on the rule that an AS advertises to its | |||
neighbor ASs only those routes that it itself uses. This rule | neighbor ASs only those routes that it itself uses. This rule | |||
reflects the "hop-by-hop" routing paradigm generally used | reflects the "hop-by-hop" routing paradigm generally used | |||
throughout the current Internet. Note that some policies cannot | throughout the current Internet. Note that some policies cannot | |||
be supported by the "hop-by-hop" routing paradigm and thus require | be supported by the "hop-by-hop" routing paradigm and thus require | |||
techniques such as source routing to enforce. For example, BGP | techniques such as source routing to enforce. For example, BGP | |||
does not enable one AS to send traffic to a neighbor AS intending | does not enable one AS to send traffic to a neighbor AS intending | |||
that the traffic take a different route from that taken by traffic | that the traffic take a different route from that taken by traffic | |||
originating in the neighbor AS. On the other hand, BGP can | originating in the neighbor AS. On the other hand, BGP can | |||
support any policy conforming to the "hop-by-hop" routing | support any policy conforming to the "hop-by-hop" routing | |||
paradigm. Since the current Internet uses only the "hop-by-hop" | paradigm. Since the current Internet uses only the "hop-by-hop" | |||
routing paradigm and since BGP can support any policy that | routing paradigm and since BGP can support any policy that | |||
conforms to that paradigm, BGP is highly applicable as an inter-AS | conforms to that paradigm, BGP is highly applicable as an inter-AS | |||
routing protocol for the current Internet." | routing protocol for the current Internet." | |||
Importantly, the BGP protocol contains only the functionality that is | ||||
essential, while at the same time provides flexible mechanisms within | One of the important points here is that the BGP protocol contains | |||
the protocol itself that allow to expand its functionality. For | only the functionality that is essential, while at the same time | |||
example, BGP capabilities provide an easy and flexible way to | providing a flexible mechanism within the protocol that allow us to | |||
introduce new features within the protocol. Finally, since BGP was | extend its functionality. For example, BGP capabilities provide an | |||
designed with flexibility and expandability in mind, new or evolving | easy and flexible way to introduce new features within the protocol. | |||
requirements can be addressed via existing mechanisms. The existence | Finally, since BGP was designed with flexibility and extensibility in | |||
proof of this statement may be found in the way how new features | mind, new and/or evolving requirements can be addressed via existing | |||
(like repairing a partitioned autonomous system with BGP) are already | mechanisms. | |||
introduced in the protocol. | ||||
To summarize, BGP is well suitable as an inter-autonomous system | To summarize, BGP is well suitable as an inter-autonomous system | |||
routing protocol for the IPv4 Internet that is based on IP [RFC791] | routing protocol for the IPv4 Internet that is based on IP [RFC791] | |||
as the Internet Protocol and "hop-by-hop" routing paradigm. Finally, | as the Internet Protocol and "hop-by-hop" routing paradigm. Finally, | |||
there is no reason to believe that BGP should not be equally | BGP is equally applicable to IPv6 [RFC2460] internets. | |||
applicable to IPv6 [RFC2460]. | ||||
5. Acknowledgments | 8. Acknowledgments | |||
We would like to thank Paul Traina for authoring previous versions of | We would like to thank Paul Traina for authoring previous versions of | |||
this document. | this document. Tim Griffin also provided many insightful comments on | |||
earlier versions of this document. | ||||
6. References | 9. References | |||
[BGP4] Rekhter, Y., T. Li., and Hares. S, Editors, "A | [BGP4] Rekhter, Y., T. Li., and Hares. S, Editors, "A | |||
Border Gateway Protocol 4 (BGP-4)", | Border Gateway Protocol 4 (BGP-4)", | |||
draft-ietf-idr-bgp4-19.txt. Work in progress. | draft-ietf-idr-bgp4-19.txt. Work in progress. | |||
[KUZINGER] ISO/IEC 10747, Kunzinger, C., Editor, | ||||
"Inter-Domain Routing Protocol", October 1993. | ||||
[RFC791] "INTERNET PROTOCOL", DARPA INTERNET PROGRAM | [RFC791] "INTERNET PROTOCOL", DARPA INTERNET PROGRAM | |||
PROTOCOL SPECIFICATION, RFC 791, September, | PROTOCOL SPECIFICATION, RFC 791, September, | |||
1981. | 1981. | |||
[RFC854] Postel, J. and Reynolds, J., "TELNET PROTOCOL | [RFC854] Postel, J. and Reynolds, J., "TELNET PROTOCOL | |||
SPECIFICATION", RFC 854, May, 1983. | SPECIFICATION", RFC 854, May, 1983. | |||
[RFC1105] Lougheed, K., and Rekhter, Y, "Border Gateway | [RFC1105] Lougheed, K., and Rekhter, Y, "Border Gateway | |||
Protocol BGP", RFC 1105, June 1989. | Protocol BGP", RFC 1105, June 1989. | |||
skipping to change at page 13, line 45 | skipping to change at page 17, line 42 | |||
Address Assignment and Aggregation Strategy", RFC | Address Assignment and Aggregation Strategy", RFC | |||
1519, September 1993. | 1519, September 1993. | |||
[RFC1771] Rekhter, Y., and T. Li, "A Border Gateway | [RFC1771] Rekhter, Y., and T. Li, "A Border Gateway | |||
Protocol 4 (BGP-4)", RFC 1771, March 1995. | Protocol 4 (BGP-4)", RFC 1771, March 1995. | |||
[RFC1772] Rekhter, Y., and P. Gross, Editors, "Application | [RFC1772] Rekhter, Y., and P. Gross, Editors, "Application | |||
of the Border Gateway Protocol in the Internet", | of the Border Gateway Protocol in the Internet", | |||
RFC 1772, March 1995. | RFC 1772, March 1995. | |||
[RFC1997] Chandra. R, and T. Li, "BGP Communities | ||||
Attribute", RFC 1997, August, 1996. | ||||
[RFC2439] Villamizar, C., Chandra, R., and Govindan, R., | [RFC2439] Villamizar, C., Chandra, R., and Govindan, R., | |||
"BGP Route Flap Damping", RFC 2439, November | "BGP Route Flap Damping", RFC 2439, November | |||
1998. | 1998. | |||
[RFC2460] Deering, S, and R. Hinden, "Internet Protocol, | [RFC2460] Deering, S, and R. Hinden, "Internet Protocol, | |||
Version 6 (IPv6) Specification", RFC 2460, | Version 6 (IPv6) Specification", RFC 2460, | |||
December, 1998. | December, 1998. | |||
[RFC2622] C. Alaettinoglu, et al., "Routing Policy | ||||
Specification Language (RPSL)" RFC 2622, May, | ||||
1998. | ||||
[RFC2842] Chandra, R. and J. Scudder, "Capabilities | ||||
Advertisement with BGP-4", RFC 2842, May 2000. | ||||
[RFC3345] McPherson, D., Gill, V., Walton, D., and | ||||
Retana, A, "Border Gateway Protocol (BGP) Persistent | ||||
Route Oscillation Condition", RFC 3345, | ||||
August, 2002. | ||||
[ROUTEVIEWS] Meyer, David, "The Route Views Project", | [ROUTEVIEWS] Meyer, David, "The Route Views Project", | |||
http://www.routeviews.org | http://www.routeviews.org | |||
7. Author's Address | 10. Author's Address | |||
David Meyer | David Meyer | |||
Email: dmm@maoz.com | Email: dmm@maoz.com | |||
Keyur Patel | Keyur Patel | |||
Cisco Systems | Cisco Systems | |||
Email: keyupate@cisco.com | Email: keyupate@cisco.com | |||
8. Full Copyright Statement | 11. Full Copyright Statement | |||
Copyright (C) The Internet Society (2003). All Rights Reserved. | Copyright (C) The Internet Society (2003). All Rights Reserved. | |||
This document and translations of it may be copied and furnished to | This document and translations of it may be copied and furnished to | |||
others, and derivative works that comment on or otherwise explain it | others, and derivative works that comment on or otherwise explain it | |||
or assist in its implementation may be prepared, copied, published | or assist in its implementation may be prepared, copied, published | |||
and distributed, in whole or in part, without restriction of any | and distributed, in whole or in part, without restriction of any | |||
kind, provided that the above copyright notice and this paragraph are | kind, provided that the above copyright notice and this paragraph are | |||
included on all such copies and derivative works. However, this | included on all such copies and derivative works. However, this | |||
document itself may not be modified in any way, such as by removing | document itself may not be modified in any way, such as by removing | |||
End of changes. | ||||
This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/ |