draft-ietf-idr-bgp4-experience-protocol-05.txt | rfc4277.txt | |||
---|---|---|---|---|
INTERNET-DRAFT Danny McPherson | Network Working Group D. McPherson | |||
Arbor Networks | Request for Comments: 4277 Arbor Networks | |||
Keyur Patel | Category: Informational K. Patel | |||
Cisco Systems | Cisco Systems | |||
Category Informational | January 2006 | |||
Expires: March 2005 September 2004 | ||||
Experience with the BGP-4 Protocol | Experience with the BGP-4 Protocol | |||
<draft-ietf-idr-bgp4-experience-protocol-05.txt> | ||||
Status of this Document | ||||
By submitting this Internet-Draft, I certify that any applicable | ||||
patent or other IPR claims of which I am aware have been disclosed, | ||||
and any of which I become aware will be disclosed, in accordance with | ||||
RFC 3668. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF), its areas, and its working groups. Note that | ||||
other groups may also distribute working documents as Internet- | ||||
Drafts. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | ||||
and may be updated, replaced, or obsoleted by other documents at any | ||||
time. It is inappropriate to use Internet-Drafts as reference | ||||
material or to cite them other than as "work in progress." | ||||
The list of current Internet-Drafts can be accessed at | ||||
http://www.ietf.org/ietf/1id-abstracts.txt. | ||||
The list of Internet-Draft Shadow Directories can be accessed at | Status of This Memo | |||
http://www.ietf.org/shadow.html. | ||||
This document is an individual submission. Comments are solicited and | This memo provides information for the Internet community. It does | |||
should be addressed to the author(s). | not specify an Internet standard of any kind. Distribution of this | |||
memo is unlimited. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (C) The Internet Society (2004). All Rights Reserved. | Copyright (C) The Internet Society (2006). | |||
Abstract | Abstract | |||
The purpose of this memo is to document how the requirements for | The purpose of this memo is to document how the requirements for | |||
advancing a routing protocol from Draft Standard to full Standard | publication of a routing protocol as an Internet Draft Standard have | |||
have been satisfied by Border Gateway Protocol version 4 (BGP-4). | been satisfied by Border Gateway Protocol version 4 (BGP-4). | |||
This report satisfies the requirement for "the second report", as | This report satisfies the requirement for "the second report", as | |||
described in Section 6.0 of RFC 1264. In order to fulfill the | described in Section 6.0 of RFC 1264. In order to fulfill the | |||
requirement, this report augments RFC 1773 and describes additional | requirement, this report augments RFC 1773 and describes additional | |||
knowledge and understanding gained in the time between when the | knowledge and understanding gained in the time between when the | |||
protocol was made a Draft Standard and when it was submitted for | protocol was made a Draft Standard and when it was submitted for | |||
Standard. | Standard. | |||
Table of Contents | Table of Contents | |||
1. Introduction | 1. Introduction | |||
2. BGP-4 Overview | 2. BGP-4 Overview | |||
2.1. A Border Gateway Protocol | 2.1. A Border Gateway Protocol | |||
3. Management Information Base (MIB) | 3. Management Information Base (MIB) | |||
4. Implementation Information | 4. Implementation Information | |||
5. Operational Experience | 5. Operational Experience | |||
6. TCP Awareness | 6. TCP Awareness | |||
7. Metrics | 7. Metrics | |||
7.1. MULTI_EXIT_DISC (MED) | 7.1. MULTI_EXIT_DISC (MED) | |||
7.1.1. MEDs and Potatoes | 7.1.1. MEDs and Potatoes | |||
7.1.2. Sending MEDs to BGP Peers | 7.1.2. Sending MEDs to BGP Peers | |||
7.1.3. MED of Zero Versus No MED | 7.1.3. MED of Zero Versus No MED | |||
7.1.4. MEDs and Temporal Route Selection | 7.1.4. MEDs and Temporal Route Selection | |||
8. Local Preference | 8. Local Preference | |||
9. Internal BGP In Large Autonomous Systems | 9. Internal BGP In Large Autonomous Systems | |||
10. Internet Dynamics | 10. Internet Dynamics | |||
11. BGP Routing Information Bases (RIBs) | 11. BGP Routing Information Bases (RIBs) | |||
12. Update Packing | 12. Update Packing | |||
13. Limit Rate Updates | 13. Limit Rate Updates | |||
13.1. Consideration of TCP Characteristics | 13.1. Consideration of TCP Characteristics | |||
14. Ordering of Path Attributes | 14. Ordering of Path Attributes | |||
15. AS_SET Sorting | 15. AS_SET Sorting | |||
16. Control over Version Negotiation | 16. Control Over Version Negotiation | |||
17. Security Considerations | 17. Security Considerations | |||
17.1. TCP MD5 Signature Option | 17.1. TCP MD5 Signature Option | |||
17.2. BGP Over IPSEC | 17.2. BGP Over IPsec | |||
17.3. Miscellaneous | 17.3. Miscellaneous | |||
18. PTOMAINE and GROW | 18. PTOMAINE and GROW | |||
19. Internet Routing Registries (IRRs) | 19. Internet Routing Registries (IRRs) | |||
20. Regional Internet Registries (RIRs) and IRRs, A | 20. Regional Internet Registries (RIRs) and IRRs, A Bit | |||
Bit of History | of History | |||
21. Acknowledgements | 21. Acknowledgements | |||
22. References | 22. References | |||
22.1. Normative References | 22.1. Normative References | |||
22.2. Informative References | 22.2. Informative References | |||
23. Authors' Addresses | ||||
1. Introduction | 1. Introduction | |||
The purpose of this memo is to document how the requirements for | The purpose of this memo is to document how the requirements for | |||
advancing a routing protocol from Draft Standard to full Standard | publication of a routing protocol as an Internet Draft Standard have | |||
have been satisfied by Border Gateway Protocol version 4 (BGP-4). | been satisfied by Border Gateway Protocol version 4 (BGP-4). | |||
This report satisfies the requirement for "the second report", as | This report satisfies the requirement for "the second report", as | |||
described in Section 6.0 of RFC 1264. In order to fulfill the | described in Section 6.0 of [RFC1264]. In order to fulfill the | |||
requirement, this report augments RFC 1773 and describes additional | requirement, this report augments [RFC1773] and describes additional | |||
knowledge and understanding gained in the time between when the | knowledge and understanding gained in the time between when the | |||
protocol was made a Draft Standard and when it was submitted for | protocol was made a Draft Standard and when it was submitted for | |||
Standard. | Standard. | |||
2. BGP-4 Overview | 2. BGP-4 Overview | |||
BGP is an inter-autonomous system routing protocol designed for | BGP is an inter-autonomous system routing protocol designed for | |||
TCP/IP internets. The primary function of a BGP speaking system is | TCP/IP internets. The primary function of a BGP speaking system is | |||
to exchange network reachability information with other BGP systems. | to exchange network reachability information with other BGP systems. | |||
This network reachability information includes information on the | This network reachability information includes information on the | |||
list of Autonomous Systems (ASs) that reachability information | list of Autonomous Systems (ASes) that reachability information | |||
traverses. This information is sufficient to construct a graph of AS | traverses. This information is sufficient to construct a graph of AS | |||
connectivity for this reachability from which routing loops may be | connectivity for this reachability, from which routing loops may be | |||
pruned and some policy decisions at the AS level may be enforced. | pruned and some policy decisions, at the AS level, may be enforced. | |||
The initial version of the BGP protocol was published in RFC 1105. | The initial version of the BGP protocol was published in [RFC1105]. | |||
Since then BGP Versions 2, 3, and 4 have been developed and are | Since then, BGP Versions 2, 3, and 4 have been developed and are | |||
specified in [RFC 1163], [RFC 1267], and [RFC 1771], respectively. | specified in [RFC 1163], [RFC 1267], and [RFC 1771], respectively. | |||
Changes since BGP-4 went to Draft Standard [RFC 1771] are listed in | Changes to BGP-4 after it went to Draft Standard [RFC1771] are listed | |||
Appendix N of [BGP4]. | in Appendix N of [RFC4271]. | |||
2.1. A Border Gateway Protocol | 2.1. A Border Gateway Protocol | |||
The Initial Version of BGP protocol was published in [RFC 1105]. BGP | The initial version of the BGP protocol was published in [RFC1105]. | |||
version 2 is defined in [RFC 1163]. BGP version 3 is defined in [RFC | BGP version 2 is defined in [RFC1163]. BGP version 3 is defined in | |||
1267]. BGP version 4 is defined in [RFC 1771] and [BGP4]. | [RFC1267]. BGP version 4 is defined in [RFC1771] and [RFC4271]. | |||
Appendices A, B, C, and D of [BGP4] provide summaries of the changes | Appendices A, B, C, and D of [RFC4271] provide summaries of the | |||
between each iteration of the BGP specification. | changes between each iteration of the BGP specification. | |||
3. Management Information Base (MIB) | 3. Management Information Base (MIB) | |||
The BGP-4 Management Information Base (MIB) has been published [BGP- | The BGP-4 Management Information Base (MIB) has been published | |||
MIB]. The MIB was updated from previous versions documented in [RFC | [BGP-MIB]. The MIB was updated from previous versions, which are | |||
1657] and [RFC 1269], respectively. | documented in [RFC1657] and [RFC1269], respectively. | |||
Apart from a few system variables, the BGP MIB is broken into two | Apart from a few system variables, the BGP MIB is broken into two | |||
tables: the BGP Peer Table and the BGP Received Path Attribute Table. | tables: the BGP Peer Table and the BGP Received Path Attribute Table. | |||
The Peer Table reflects information about BGP peer connections, such | The Peer Table reflects information about BGP peer connections, such | |||
as their state and current activity. The Received Path Attribute | as their state and current activity. The Received Path Attribute | |||
Table contains all attributes received from all peers before local | Table contains all attributes received from all peers before local | |||
routing policy has been applied. The actual attributes used in | routing policy has been applied. The actual attributes used in | |||
determining a route are a subset of the received attribute table. | determining a route are a subset of the received attribute table. | |||
4. Implementation Information | 4. Implementation Information | |||
There are numerous independent interoperable implementations of BGP | There are numerous independent interoperable implementations of BGP | |||
currently available. Although the previous version of this report | currently available. Although the previous version of this report | |||
provided an overview of the implementations currently used in the | provided an overview of the implementations currently used in the | |||
operational Internet, at this time it has been suggested that a | operational Internet, at that time it has been suggested that a | |||
separate BGP Implementation Report [BGP-IMPL] be generated. | separate BGP Implementation Report [RFC4276] be generated. | |||
It should be noted that implementation experience with Cisco's BGP-4 | It should be noted that implementation experience with Cisco's BGP-4 | |||
implementation was documented as part of [RFC 1656]. | implementation was documented as part of [RFC 1656]. | |||
For all additional implementation information please reference [BGP- | For all additional implementation information please reference | |||
IMPL]. | [RFC4276]. | |||
5. Operational Experience | 5. Operational Experience | |||
This section discusses operational experience with BGP and BGP-4. | This section discusses operational experience with BGP and BGP-4. | |||
BGP has been used in the production environment since 1989, BGP-4 | BGP has been used in the production environment since 1989; BGP-4 has | |||
since 1993. Production use of BGP includes utilization of all | been used since 1993. Production use of BGP includes utilization of | |||
significant features of the protocol. The present production | all significant features of the protocol. The present production | |||
environment, where BGP is used as the inter-autonomous system routing | environment, where BGP is used as the inter-autonomous system routing | |||
protocol, is highly heterogeneous. In terms of the link bandwidth it | protocol, is highly heterogeneous. In terms of link bandwidth, it | |||
varies from 56 Kbps to 10 Gbps. In terms of the actual routers that | varies from 56 Kbps to 10 Gbps. In terms of the actual routers that | |||
run BGP, it ranges from a relatively slow performance general purpose | run BGP, they range from relatively slow performance, general purpose | |||
CPUs to very high performance RISC network processors, and includes | CPUs to very high performance RISC network processors, and include | |||
both special purpose routers and the general purpose workstations | both special purpose routers and the general purpose workstations | |||
running various UNIX derivatives and other operating systems. | that run various UNIX derivatives and other operating systems. | |||
In terms of the actual topologies it varies from very sparse to quite | In terms of the actual topologies, it varies from very sparse to | |||
dense. The requirement for full-mesh IBGP topologies has been | quite dense. The requirement for full-mesh IBGP topologies has been | |||
largely remedied by BGP Route Reflection, Autonomous System | largely remedied by BGP Route Reflection, Autonomous System | |||
Confederations for BGP, and often some mix of the two. BGP Route | Confederations for BGP, and often some mix of the two. BGP Route | |||
Reflection was initially defined in [RFC 1966] and subsequently | Reflection was initially defined in [RFC1966] and was updated in | |||
updated in [RFC 2796]. Autonomous System Confederations for BGP were | [RFC2796]. Autonomous System Confederations for BGP were initially | |||
initially defined in [RFC 1965] and subsequently updated in [RFC | defined in [RFC1965] and were updated in [RFC3065]. | |||
3065]. | ||||
At the time of this writing BGP-4 is used as an inter-autonomous | At the time of this writing, BGP-4 is used as an inter-autonomous | |||
system routing protocol between all Internet-attached autonomous | system routing protocol between all Internet-attached autonomous | |||
systems, with nearly 15k active autonomous systems in the global | systems, with nearly 21k active autonomous systems in the global | |||
Internet routing table. | Internet routing table. | |||
BGP is used both for the exchange of routing information between a | BGP is used both for the exchange of routing information between a | |||
transit and a stub autonomous system, and for the exchange of routing | transit and a stub autonomous system, and for the exchange of routing | |||
information between multiple transit autonomous systems. There is no | information between multiple transit autonomous systems. There is no | |||
protocol distinction between sites historically considered | protocol distinction between sites historically considered | |||
"backbones" versus "regional" or "edge" networks. | "backbones" versus "regional" or "edge" networks. | |||
The full set of exterior routes that is carried by BGP is well over | The full set of exterior routes carried by BGP is well over 170,000 | |||
134,000 aggregate entries, representing several times that number of | aggregate entries, representing several times that number of | |||
connected networks. The number of active paths in some service | connected networks. The number of active paths in some service | |||
provider core routers exceeds 2.5 million. Native AS path lengths | provider core routers exceeds 2.5 million. Native AS path lengths | |||
are as long as 10 for some routes, and "padded" path lengths of 25 or | are as long as 10 for some routes, and "padded" path lengths of 25 or | |||
more autonomous systems exist. | more autonomous systems exist. | |||
6. TCP Awareness | 6. TCP Awareness | |||
BGP employs TCP [RFC 793] as its Transport Layer protocol. As such, | BGP employs TCP [RFC 793] as its Transport Layer protocol. As such, | |||
all characteristics inherent to TCP are inherited by BGP. | all characteristics inherent to TCP are inherited by BGP. | |||
For example, due to TCP's behavior, bandwidth capabilities may not be | For example, due to TCP's behavior, bandwidth capabilities may not be | |||
realized due to TCP's slow start algorithms, and slow-start restarts | realized because of TCP's slow start algorithms and slow-start | |||
of connections, etc.. | restarts of connections, etc. | |||
7. Metrics | 7. Metrics | |||
This section discusses different metrics used within the BGP | This section discusses different metrics used within the BGP | |||
protocol. BGP has a separate metric parameter for IBGP and EBGP. This | protocol. BGP has a separate metric parameter for IBGP and EBGP. | |||
allows policy based metrics to overwrite the distance based metrics; | This allows policy-based metrics to overwrite the distance-based | |||
allowing each autonomous systems to define their independent policies | metrics; this allows each autonomous system to define its independent | |||
in Intra-AS as well as Inter-AS. BGP Multi Exit Discriminator (MED) | policies in Intra-AS, as well as Inter-AS. BGP Multi Exit | |||
is used as a metric by EBGP peers (i.e., inter-domain) while Local | Discriminator (MED) is used as a metric by EBGP peers (i.e., inter- | |||
Preference (LOCAL_PREF) is used by IBGP peers (i.e., intra-domain). | domain), while Local Preference (LOCAL_PREF) is used by IBGP peers | |||
(i.e., intra-domain). | ||||
7.1. MULTI_EXIT_DISC (MED) | 7.1. MULTI_EXIT_DISC (MED) | |||
BGP version 4 re-defined the old INTER-AS metric as a MULTI_EXIT_ | BGP version 4 re-defined the old INTER-AS metric as a MULTI_EXIT_DISC | |||
DISC (MED). This value may be used in the tie-breaking process when | (MED). This value may be used in the tie-breaking process when | |||
selecting a preferred path to a given address space, and provides BGP | selecting a preferred path to a given address space, and provides BGP | |||
speakers with the capability to convey to a peer AS the optimal entry | speakers with the capability of conveying the optimal entry point | |||
point into the local AS. | into the local AS to a peer AS. | |||
Although the MED was meant to only be used when comparing paths | Although the MED was meant to only be used when comparing paths | |||
received from different external peers in the same AS, many | received from different external peers in the same AS, many | |||
implementations provide the capability to compare MEDs between | implementations provide the capability to compare MEDs between | |||
different autonomous systems as well. | different autonomous systems. | |||
Though this may seem a fine idea for some configurations, care must | Though this may seem a fine idea for some configurations, care must | |||
be taken when comparing MEDs between different autonomous systems. | be taken when comparing MEDs of different autonomous systems. BGP | |||
BGP speakers often derive MED values by obtaining the IGP metric | speakers often derive MED values by obtaining the IGP metric | |||
associated with reaching a given BGP NEXT_HOP within the local AS. | associated with reaching a given BGP NEXT_HOP within the local AS. | |||
This allows MEDs to reasonably reflect IGP topologies when | This allows MEDs to reasonably reflect IGP topologies when | |||
advertising routes to peers. While this is fine when comparing MEDs | advertising routes to peers. While this is fine when comparing MEDs | |||
between multiple paths learned from a single adjacent AS, it can | of multiple paths learned from a single adjacent AS, it can result in | |||
result in potentially bad decisions when comparing MEDs between | potentially bad decisions when comparing MEDs of different autonomous | |||
different automomous systems. This is most typically the case when | systems. This is most typically the case when the autonomous systems | |||
the autonomous systems use different mechanisms to derive IGP | use different mechanisms to derive IGP metrics, BGP MEDs, or perhaps | |||
metrics, BGP MEDs, or perhaps even use different IGP procotols with | even use different IGP protocols with vastly contrasting metric | |||
vastly contrasting metric spaces. | spaces. | |||
Another MED deployment consideration involves the impact of | Another MED deployment consideration involves the impact of the | |||
aggregation of BGP routing information on MEDs. Aggregates are often | aggregation of BGP routing information on MEDs. Aggregates are often | |||
generated from multiple locations in an AS in order to accommodate | generated from multiple locations in an AS to accommodate stability, | |||
stability, redundancy and other network design goals. When MEDs are | redundancy, and other network design goals. When MEDs are derived | |||
derived from IGP metrics associated with said aggregates the MED | from IGP metrics associated with said aggregates, the MED value | |||
value advertised to peers can result in very suboptimal routing. | advertised to peers can result in very suboptimal routing. | |||
The MED was purposely designed to be a "weak" metric that would only | The MED was purposely designed to be a "weak" metric that would only | |||
be used late in the best-path decision process. The BGP working | be used late in the best-path decision process. The BGP working | |||
group was concerned that any metric specified by a remote operator | group was concerned that any metric specified by a remote operator | |||
would only affect routing in a local AS if no other preference was | would only affect routing in a local AS if no other preference was | |||
specified. A paramount goal of the design of the MED was to ensure | specified. A paramount goal of the design of the MED was to ensure | |||
that peers could not "shed" or "absorb" traffic for networks that | that peers could not "shed" or "absorb" traffic for networks they | |||
they advertise. | advertise. | |||
7.1.1. MEDs and Potatoes | 7.1.1. MEDs and Potatoes | |||
In a situation where traffic flows between a pair of destinations, | Where traffic flows between a pair of destinations, each is connected | |||
each connected to two transit networks, each of the transit networks | to two transit networks, each of the transit networks has the choice | |||
has the choice of either sending the traffic to the closest peering | of sending the traffic to the peering closest to another transit | |||
to other transit provider or passing traffic to the peering which | provider or passing traffic to the peering that advertises the least | |||
advertises the least cost through the other provider. The former | cost through the other provider. The former method is called "hot | |||
method is called "hot potato routing" because like a hot potato held | potato routing" because, like a hot potato held in bare hands, | |||
in bare hands, whoever has it tries to get rid of it quickly. Hot | whoever has it tries to get rid of it quickly. Hot potato routing is | |||
potato routing is accomplished by not passing the EGBP learned MED | accomplished by not passing the EBGP-learned MED into the IBGP. This | |||
into IBGP. This minimizes transit traffic for the provider routing | minimizes transit traffic for the provider routing the traffic. Far | |||
the traffic. Far less common is "cold potato routing" where the | less common is "cold potato routing", where the transit provider uses | |||
transit provider uses their own transit capacity to get the traffic | its own transit capacity to get the traffic to the point in the | |||
to the point in the adjacent transit provider advertised as being | adjacent transit provider advertised as being closest to the | |||
closest to the destination. Cold potato routing is accomplished by | destination. Cold potato routing is accomplished by passing the | |||
passing the EBGP learned MED into IBGP. | EBGP-learned MED into IBGP. | |||
If one transit provider uses hot potato routing and another uses cold | If one transit provider uses hot potato routing and another uses cold | |||
potato, traffic between the two tends to be symetric. Depending on | potato routing, traffic between the two tends to be symmetric. | |||
the business relationships, if one provider has more capacity or a | Depending on the business relationships, if one provider has more | |||
significantly less congested transit network, then that provider may | capacity or a significantly less congested transit network, then that | |||
use cold potato routing. An example of widespread use of cold potato | provider may use cold potato routing. The NSF-funded NSFNET backbone | |||
routing was the NSF funded NSFNET backbone and NSF funded regional | and NSF-funded regional networks are examples of widespread use of | |||
networks in the mid 1990s. | cold potato routing in the mid 1990s. | |||
In some cases a provider may use hot potato routing for some | In some cases, a provider may use hot potato routing for some | |||
destinations for a given peer AS and cold potato routing for others. | destinations for a given peer AS, and cold potato routing for others. | |||
An example of this is the different treatment of commercial and | The different treatment of commercial and research traffic in the | |||
research traffic in the NSFNET in the mid 1990s. Then again, this | NSFNET in the mid 1990s is an example of this. However, this might | |||
might best be described as 'mashed potato routing', a term which | best be described as 'mashed potato routing', a term that reflects | |||
reflects the complexity of router configurations in use at the time. | the complexity of router configurations in use at the time. | |||
Seemingly more intuitive references that fall outside the vegetable | Seemingly more intuitive references, which fall outside the vegetable | |||
kingdom refer to cold potato routing as "best exit routing", and hot | kingdom, refer to cold potato routing as "best exit routing", and hot | |||
potato routing as "closest exit routing". | potato routing as "closest exit routing". | |||
7.1.2. Sending MEDs to BGP Peers | 7.1.2. Sending MEDs to BGP Peers | |||
[BGP4] allows MEDs received from any EBGP peers by a BGP speaker to | [RFC4271] allows MEDs received from any EBGP peers by a BGP speaker | |||
be passed to its IBGP peers. Although advertising MEDs to IBGP peers | to be passed to its IBGP peers. Although advertising MEDs to IBGP | |||
is not a required behavior, it is a common default. MEDs received | peers is not a required behavior, it is a common default. MEDs | |||
from EBGP peers by a BGP speaker SHOULD NOT be sent to other EBGP | received from EBGP peers by a BGP speaker SHOULD NOT be sent to other | |||
peers. | EBGP peers. | |||
Note that many implementations provide a mechanism to derive MED | Note that many implementations provide a mechanism to derive MED | |||
values from IGP metrics in order to allow BGP MED information to | values from IGP metrics to allow BGP MED information to reflect the | |||
reflect the IGP topologies and metrics of the network when | IGP topologies and metrics of the network when propagating | |||
propagating information to adjacent autonomous systems. | information to adjacent autonomous systems. | |||
7.1.3. MED of Zero Versus No MED | 7.1.3. MED of Zero Versus No MED | |||
[BGP4] requires that an implementation must provide a mechanism that | [RFC4271] requires an implementation to provide a mechanism that | |||
allows for MED to be removed. Previously, implementations did not | allows MED to be removed. Previously, implementations did not | |||
consider a missing MED value to be the same as a MED of zero. [BGP4] | consider a missing MED value the same as a MED of zero. [RFC4271] | |||
now requires that no MED value be equal to a value of zero. | now requires that no MED value be equal to zero. | |||
Note that many implementations provide a mechanism to explicitly | Note that many implementations provide a mechanism to explicitly | |||
define a missing MED value as "worst" or less preferable than zero or | define a missing MED value as "worst", or less preferable than zero | |||
larger values. | or larger values. | |||
7.1.4. MEDs and Temporal Route Selection | 7.1.4. MEDs and Temporal Route Selection | |||
Some implementations have hooks to apply temporal behavior in MED- | Some implementations have hooks to apply temporal behavior in MED- | |||
based best path selection. That is, all other things being equal up | based best path selection. That is, all things being equal up to MED | |||
to MED consideration, preference would be applied to the "oldest" | consideration, preference would be applied to the "oldest" path, | |||
path, without preferring the lower MED value. The reasoning for this | without preference for the lower MED value. The reasoning for this | |||
is that "older" paths are presumably more stable, and thus more | is that "older" paths are presumably more stable, and thus | |||
preferable. However, temporal behavior in route selection results in | preferable. However, temporal behavior in route selection results in | |||
non-deterministic behavior, and as such, may often be undesirable. | non-deterministic behavior, and as such, may often be undesirable. | |||
8. Local Preference | 8. Local Preference | |||
The LOCAL_PREF attribute was added so a network operator could easily | The LOCAL_PREF attribute was added to enable a network operator to | |||
configure a policy that overrode the standard best path determination | easily configure a policy that overrides the standard best path | |||
mechanism without independently configuring local preference policy | determination mechanism without independently configuring local | |||
on each router. | preference policy on each router. | |||
One shortcoming in the BGP-4 specification was a suggestion for a | One shortcoming in the BGP-4 specification was the suggestion that a | |||
default value of LOCAL_PREF to be assumed if none was provided. | default value of LOCAL_PREF be assumed if none was provided. | |||
Defaults of 0 or the maximum value each have range limitations, so a | Defaults of zero or the maximum value each have range limitations, so | |||
common default would aid in the interoperation of multi-vendor | a common default would aid in the interoperation of multi-vendor | |||
routers in the same AS (since LOCAL_PREF is a local administration | routers in the same AS (since LOCAL_PREF is a local administration | |||
attribute, there is no interoperability drawback across AS | attribute, there is no interoperability drawback across AS | |||
boundaries). | boundaries). | |||
[BGP4] requires that LOCAL_PREF be sent to IBGP Peers and must not be | [RFC4271] requires that LOCAL_PREF be sent to IBGP Peers and not to | |||
sent to EBGP Peers. Although no default value for LOCAL_PREF is | EBGP Peers. Although no default value for LOCAL_PREF is defined, the | |||
defined, the common default value is 100. | common default value is 100. | |||
Another area where more exploration is required is a method whereby | Another area where exploration is required is a method whereby an | |||
an originating AS may influence the best path selection process. For | originating AS may influence the best path selection process. For | |||
example, a dual-connected site may select one AS as a primary transit | example, a dual-connected site may select one AS as a primary transit | |||
service provider and have one as a backup. | service provider and have one as a backup. | |||
/---- transit B ----\ | /---- transit B ----\ | |||
end-customer transit A---- | end-customer transit A---- | |||
/---- transit C ----\ | /---- transit C ----\ | |||
In a topology where the two transit service providers connect to a | In a topology where the two transit service providers connect to a | |||
third provider, the real decision is performed by the third provider | third provider, the real decision is performed by the third provider. | |||
and there is no mechanism for indicating a preference should the | There is no mechanism to indicate a preference should the third | |||
third provider wish to respect that preference. | provider wish to respect that preference. | |||
A general purpose suggestion that has been brought up is the | A general purpose suggestion has been the possibility of carrying an | |||
possibility of carrying an optional vector corresponding to the AS_ | optional vector, corresponding to the AS_PATH, where each transit AS | |||
PATH where each transit AS may indicate a preference value for a | may indicate a preference value for a given route. Cooperating | |||
given route. Cooperating autonomous systems may then chose traffic | autonomous systems may then choose traffic based upon comparison of | |||
based upon comparison of "interesting" portions of this vector | "interesting" portions of this vector, according to routing policy. | |||
according to routing policy. | ||||
While protecting a given autonoumous systems routing policy is of | While protecting a given autonomous systems routing policy is of | |||
paramount concern, avoiding extensive hand configuration of routing | paramount concern, avoiding extensive hand configuration of routing | |||
policies needs to be examined more carefully in future BGP-like | policies needs to be examined more carefully in future BGP-like | |||
protocols. | protocols. | |||
9. Internal BGP In Large Autonomous Systems | 9. Internal BGP In Large Autonomous Systems | |||
While not strictly a protocol issue, one other concern has been | While not strictly a protocol issue, another concern has been raised | |||
raised by network operators who need to maintain autonomous systems | by network operators who need to maintain autonomous systems with a | |||
with a large number of peers. Each speaker peering with an external | large number of peers. Each speaker peering with an external router | |||
router is responsible for propagating reachability and path | is responsible for propagating reachability and path information to | |||
information to all other transit and border routers within that AS. | all other transit and border routers within that AS. This is | |||
This is typically done by establishing internal BGP connections to | typically done by establishing internal BGP connections to all | |||
all transit and border routers in the local AS. | transit and border routers in the local AS. | |||
Note that the number of BGP peers that can be fully meshed depends on | Note that the number of BGP peers that can be fully meshed depends on | |||
a number of factors, to include number of prefixes in the routing | a number of factors, including the number of prefixes in the routing | |||
system, number of unique path, stability of the system, and perhaps | system, the number of unique paths, stability of the system, and, | |||
most importantly, implementation efficiency. As a result, although | perhaps most importantly, implementation efficiency. As a result, | |||
it's difficult to define "a large number of peers", there is always | although it's difficult to define "a large number of peers", there is | |||
some practical limit. | always some practical limit. | |||
In a large AS, this leads to a full mesh of TCP connections (n * | In a large AS, this leads to a full mesh of TCP connections | |||
(n-1)) and some method of configuring and maintaining those | (n * (n-1)) and some method of configuring and maintaining those | |||
connections. BGP does not specify how this information is to be | connections. BGP does not specify how this information is to be | |||
propagated, so alternatives, such as injecting BGP routing | propagated. Therefore, alternatives, such as injecting BGP routing | |||
information into the local IGP have been attempted, though it turned | information into the local IGP, have been attempted, but turned out | |||
out to be a non-practical alternative (to say the least). | to be non-practical alternatives (to say the least). | |||
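As a rough illustration of how quickly a full mesh grows, the sketch
below counts both the distinct TCP sessions and the per-router
neighbor entries for n IBGP speakers. One reading of the n * (n-1)
figure above is that it counts each session once per endpoint; that
reading is an assumption, and the function name is hypothetical.

```python
def ibgp_full_mesh(n: int) -> tuple[int, int]:
    """Return (distinct sessions, neighbor entries) for n speakers."""
    sessions = n * (n - 1) // 2   # each pair of routers shares one session
    neighbor_entries = n * (n - 1)  # each router configures n-1 neighbors
    return sessions, neighbor_entries


if __name__ == "__main__":
    for n in (10, 100, 500):
        print(n, ibgp_full_mesh(n))
```

Either count grows quadratically with n, which is why a practical
limit always exists.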
Several alternatives to a full mesh IBGP have been defined, to | To alleviate the need for "full mesh" IBGP, several alternatives have | |||
include BGP Route Reflection [RFC 2796] and AS Confederations for BGP | been defined, including BGP Route Reflection [RFC2796] and AS | |||
[RFC 3065], in order to alleviate the the need for "full mesh" IBGP. | Confederations for BGP [RFC3065]. | |||
10. Internet Dynamics | 10. Internet Dynamics | |||
As discussed in [BGP4-ANALYSIS], the driving force in CPU and | As discussed in [RFC4274], the driving force in CPU and bandwidth | |||
bandwidth utilization is the dynamic nature of routing in the | utilization is the dynamic nature of routing in the Internet. As the | |||
Internet. As the Internet has grown, the frequency of route changes | Internet has grown, the frequency of route changes per second has | |||
per second has increased. | increased. | |||
We automatically get some level of damping when more specific NLRI is | We automatically get some level of damping when more specific NLRI is | |||
aggregated into larger blocks, however, this isn't sufficient. In | aggregated into larger blocks; however, this is not sufficient. In | |||
Appendix F of [BGP4] are descriptions of damping techniques that | Appendix F of [RFC4271], there are descriptions of damping techniques | |||
should be applied to advertisements. In future specifications of | that should be applied to advertisements. In future specifications | |||
BGP-like protocols, damping methods should be considered for | of BGP-like protocols, damping methods should be considered for | |||
mandatory inclusion in compliant implementations. | mandatory inclusion in compliant implementations. | |||
BGP Route Flap Damping is defined in [RFC 2439]. BGP Route Flap | BGP Route Flap Damping is defined in [RFC 2439]. BGP Route Flap | |||
Damping defines a mechanism to help reduce the amount of routing | Damping defines a mechanism to help reduce the amount of routing | |||
information passed between BGP peers, and subsequently, the load on | information passed between BGP peers, which reduces the load on these | |||
these peers, without adversely affecting route convergence time for | peers without adversely affecting route convergence time for | |||
relatively stable routes. | relatively stable routes. | |||
None of the current implementations of BGP Route Flap Damping store | None of the current implementations of BGP Route Flap Damping store | |||
route history by unique NLRI and AS Path although it is listed as | route history by unique NLRI or AS Path, although RFC 2439 lists this | |||
mandatory in RFC 2439. A potential result of failure to consider | as mandatory. A potential result of failure to consider each AS Path | |||
each AS Path separately is an overly aggressive suppression of | separately is an overly aggressive suppression of destinations in a | |||
destinations in a densely meshed network, with the most severe | densely meshed network, with the most severe consequence being | |||
consequence being suppression of a destination after a single | suppression of a destination after a single failure. Because the top | |||
failure. Because the top tier autonomous systems in the Internet are | tier autonomous systems in the Internet are densely meshed, these | |||
densely meshed, these adverse consequences are observed. | adverse consequences are observed. | |||
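The effect of the history key can be illustrated with a simplified
sketch. It is not the RFC 2439 algorithm in full: it only models a
penalty that decays exponentially and a suppress threshold, and it
counts every received change as one flap. The parameter values and
all names are illustrative assumptions.

```python
PENALTY_PER_FLAP = 1000.0  # illustrative value
SUPPRESS_LIMIT = 2000.0    # illustrative value
HALF_LIFE_S = 900.0        # illustrative value


class Damper:
    def __init__(self, per_path: bool):
        self.per_path = per_path
        self.state = {}  # key -> (penalty, time of last update)

    def _key(self, prefix, as_path):
        # Per-path history keeps one entry per (NLRI, AS Path) pair.
        return (prefix, tuple(as_path)) if self.per_path else prefix

    def penalty(self, key, now: float) -> float:
        """Current penalty for key after exponential decay."""
        value, then = self.state.get(key, (0.0, now))
        return value * 0.5 ** ((now - then) / HALF_LIFE_S)

    def flap(self, prefix, as_path, now: float) -> bool:
        """Record one flap; return True if the route is now suppressed."""
        key = self._key(prefix, as_path)
        value = self.penalty(key, now) + PENALTY_PER_FLAP
        self.state[key] = (value, now)
        return value > SUPPRESS_LIMIT
```

If a single failure is seen as three changes over three different AS
Paths, Damper(per_path=False) suppresses the prefix on the third
change, while Damper(per_path=True) does not suppress any path.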
Route changes are announced using BGP UPDATE messages. The greatest | Route changes are announced using BGP UPDATE messages. The greatest | |||
overhead in advertising UPDATE messages happens whenever route | overhead in advertising UPDATE messages happens whenever route | |||
changes to be announced are inefficiently packed. As discussed in a | changes to be announced are inefficiently packed. Announcing routing | |||
later section, announcing routing changes sharing common attributes | changes that share common attributes in a single BGP UPDATE message | |||
in a single BGP UPDATE message helps save considerable bandwidth and | helps save considerable bandwidth and reduces processing overhead, as | |||
lower processing overhead. | discussed in Section 12, Update Packing. | |||
Persistent BGP errors may cause BGP peers to flap persistently if | Persistent BGP errors may cause BGP peers to flap persistently if | |||
peer dampening is not implemented. This would result in significant | peer dampening is not implemented, resulting in significant CPU | |||
CPU utilization. Implementors may find it useful to implement peer | utilization. Implementors may find it useful to implement peer | |||
dampening to avoid such persistent peer flapping [BGP4]. | dampening to avoid such persistent peer flapping [RFC4271]. | |||
11. BGP Routing Information Bases (RIBs) | 11. BGP Routing Information Bases (RIBs) | |||
[BGP4] states "Any local policy which results in routes being added | [RFC4271] states "Any local policy which results in routes being | |||
to an Adj-RIB-Out without also being added to the local BGP speaker's | added to an Adj-RIB-Out without also being added to the local BGP | |||
forwarding table, is outside the scope of this document". | speaker's forwarding table, is outside the scope of this document". | |||
However, several well-known implementations do not confirm that Loc- | However, several well-known implementations do not confirm that | |||
RIB entries were used to populate the forwarding table before | Loc-RIB entries were used to populate the forwarding table before | |||
installing them in the Adj-RIB-Out. The most common occurrence of | installing them in the Adj-RIB-Out. The most common occurrence of | |||
this is when routes for a given prefix are presented by more than one | this is when routes for a given prefix are presented by more than one | |||
protocol and the preference for the BGP learned route is lower than | protocol, and the preference for the BGP-learned route is lower than | |||
that of another protocol. As such, the route learned via the other | that of another protocol. As such, the route learned via the other | |||
protocol is used to populate the forwarding table. | protocol is used to populate the forwarding table. | |||
It may be desirable for an implementation to provide a knob that | It may be desirable for an implementation to provide a knob that | |||
permits advertisement of "inactive" BGP routes. | permits advertisement of "inactive" BGP routes. | |||
It may also be desirable for an implementation to provide a knob that | It may also be desirable for an implementation to provide a knob that | |||
allows a BGP speaker to advertise BGP routes that were not selected | allows a BGP speaker to advertise BGP routes that were not selected | |||
by decision process. | in the decision process. | |||
12. Update Packing | 12. Update Packing | |||
Multiple unfeasible routes can be advertised in a single BGP Update | Multiple unfeasible routes can be advertised in a single BGP Update | |||
message. In addition, one or more feasible routes can be advertised | message. In addition, one or more feasible routes can be advertised | |||
in a single Update message so long as all prefixes share a common | in a single Update message, as long as all prefixes share a common | |||
attribute set. | attribute set. | |||
The BGP4 protocol permits advertisement of multiple prefixes with a | The BGP4 protocol permits advertisement of multiple prefixes with a | |||
common set of path attributes to be advertised in a single update | common set of path attributes in a single update message, which is | |||
message, this is commonly referred to as "update packing". When | commonly referred to as "update packing". When possible, update | |||
possible, update packing is recommended as it provides a mechanism | packing is recommended, as it provides a mechanism for more efficient | |||
for more efficient behavior in a number of areas, to include: | behavior in a number of areas, including: | |||
o Reduction in system overhead due to generation or receipt of | o Reduction in system overhead due to generation or receipt of | |||
fewer Update messages. | fewer Update messages. | |||
o Reduction in network overhead as a result of fewer packets | o Reduction in network overhead as a result of fewer packets and | |||
and lower bandwidth consumption. | lower bandwidth consumption. | |||
o Allows you to process path attributes and look for matching | o Reduction in frequency of processing path attributes and looking | |||
sets in your AS_PATH database (if you have one) less | for matching sets in the AS_PATH database (if you have one). | |||
frequently. Consistent ordering of the path attributes | Consistent ordering of the path attributes allows for ease of | |||
allows for ease of matching in the database as you don't have | matching in the database, as different representations of the | |||
different representations of the same data. | same data do not exist. | |||
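Update packing as described above can be sketched by grouping
prefixes on their attribute set. This is a minimal sketch under
stated assumptions: attributes are modeled as a dict with hashable
values, the function name is hypothetical, and the maximum UPDATE
message size is not modeled.

```python
from collections import defaultdict


def pack_updates(routes):
    """Group (prefix, attrs) pairs so each attribute set is sent once.

    attrs is a dict with hashable values, e.g. AS_PATH as a tuple.
    Returns a list of (attrs, [prefixes]) pairs, one per UPDATE.
    """
    groups = defaultdict(list)
    for prefix, attrs in routes:
        # Sorting the items gives one canonical key per attribute set.
        key = tuple(sorted(attrs.items()))
        groups[key].append(prefix)
    return [(dict(key), prefixes) for key, prefixes in groups.items()]
```

Three routes, two of which share an attribute set, yield two UPDATE
messages instead of three.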
The BGP protocol suggests that withdrawal information should be | The BGP protocol suggests that withdrawal information should be | |||
packed in the begining of Update message, followed by information | packed in the beginning of an Update message, followed by information | |||
about more or less specific reachable routes in a single UPDATE | about reachable routes in a single UPDATE message. This helps | |||
message. This helps alleviate excessive route flapping in BGP. | alleviate excessive route flapping in BGP. | |||
13. Limit Rate Updates | 13. Limit Rate Updates | |||
The BGP protocol defines different mechanisms to rate limit Update | The BGP protocol defines different mechanisms to rate limit Update | |||
advertisement. The BGP protocol defines MinRouteAdvertisementInterval | advertisement. The BGP protocol defines a | |||
parameter that determines the minimum time that must be elapse | MinRouteAdvertisementInterval parameter that determines the minimum | |||
between the advertisement of routes to a particular destination from | time that must elapse between the advertisement of routes to a | |||
a single BGP speaker. This value is set on a per BGP peer basis. | particular destination from a single BGP speaker. This value is set | |||
on a per-BGP-peer basis. | ||||
Due to the fact that BGP relies on TCP as the Transport protocol, TCP | Because BGP relies on TCP as the Transport protocol, TCP can prevent | |||
can prevent transmission of data due to empty windows. As a result, | transmission of data due to empty windows. As a result, multiple | |||
multiple Updates may be spaced closer together than orginally queued. | updates may be spaced closer together than was originally queued. | |||
Although this is not a common occurrence, implementations should be | Although it is not common, implementations should be aware of this | |||
aware of this. | occurrence. | |||
13.1. Consideration of TCP Characteristics | 13.1. Consideration of TCP Characteristics | |||
If a TCP receiver is processing input more slowly than the sender or | If either a TCP receiver is processing input more slowly than the | |||
if the TCP connection rate is the limiting factor, a form of | sender, or if the TCP connection rate is the limiting factor, a form | |||
backpressure is observed by the TCP sending application. When the | of backpressure is observed by the TCP sending application. When the | |||
TCP buffer fills, the sending application will either block on the | TCP buffer fills, the sending application will either block on the | |||
write or receive an error on the write. Common errors in either | write or receive an error on the write. In early implementations or | |||
early implementations or an occasional naive new implementation are | naive new implementations, setting options to block on the write or | |||
to either set options to block on the write or set options for non- | setting options for non-blocking writes are common errors. Such | |||
blocking writes and then treat the errors due to a full buffer as | implementations treat full buffer related errors as fatal. | |||
fatal. | ||||
Having recognized that full write buffers are to be expected | Having recognized that full write buffers are to be expected, | |||
additional implementation pitfalls exist. The application should not | additional implementation pitfalls exist. The application should not | |||
attempt to store the TCP stream within the application itself. If | attempt to store the TCP stream within the application itself. If | |||
the receiver or the TCP connection is persistently slow, then the | the receiver or the TCP connection is persistently slow, then the | |||
buffer can grow until memory is exhausted. A BGP implementation is | buffer can grow until memory is exhausted. A BGP implementation is | |||
required to send changes to all peers for which the TCP connection is | required to send changes to all peers for which the TCP connection is | |||
not blocked and is required to remember to send those changes to the | not blocked, and is required to send those changes to the remaining | |||
remaining peers when the connection becomes unblocked. | peers when the connection becomes unblocked. | |||
If the preferred route for a given NLRI changes multiple times while | If the preferred route for a given NLRI changes multiple times while | |||
writes to one or more peers is blocked, only the most recent best | writes to one or more peers are blocked, only the most recent best | |||
route needs to be sent. In this way BGP is work conserving. In | route needs to be sent. In this way, BGP is work conserving | |||
times of extremely high route change, a higher volume of route change | [RFC4274]. In cases of extremely high route change, a higher volume | |||
is sent to those peers which are able to process it more quickly and | of route change is sent to those peers that are able to process it | |||
a lower volume of route change is sent to those peers not able to | more quickly; a lower volume of route change is sent to those peers | |||
process the changes as quickly. | that are not able to process the changes as quickly. | |||
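One way to obtain the work-conserving behavior described above is to
keep, per peer, only the latest best route for each NLRI rather than
a copy of the outgoing byte stream. The sketch below assumes
hypothetical can_write and send callbacks supplied by the caller; it
is not taken from any implementation.

```python
class PeerQueue:
    """Pending changes for one peer, keyed by prefix."""

    def __init__(self):
        # prefix -> most recent best route (None could denote a withdraw)
        self.pending = {}

    def route_changed(self, prefix, best_route):
        # A later change overwrites an earlier one that was never sent.
        self.pending[prefix] = best_route

    def drain(self, can_write, send):
        """Send pending changes while the TCP connection accepts data."""
        while self.pending and can_write():
            prefix = next(iter(self.pending))
            send(prefix, self.pending.pop(prefix))
```

Memory use is bounded by the number of prefixes rather than by the
number of route changes, so a persistently slow peer does not cause
unbounded growth.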
For implentations which handle differing peer capacity to absorb | For implementations that handle differing peer capacities to absorb | |||
route change well, if the majority of route change is contributed by | route change well, if the majority of route change is contributed by | |||
a subset of unstable NLRI, the only impact on relatively stable NLRI | a subset of unstable NLRI, the only impact on relatively stable NLRI | |||
which make an isolated route change is a slower convergence for which | that makes an isolated route change is a slower convergence, for | |||
convergence time remains bounded regardless of the amount of | which convergence time remains bounded, regardless of the amount of | |||
instability. | instability. | |||
14. Ordering of Path Attributes | 14. Ordering of Path Attributes | |||
The BGP protocol suggests that BGP speakers sending multiple prefixes | The BGP protocol suggests that BGP speakers sending multiple prefixes | |||
per UPDATE message should sort and order path attributes according | per UPDATE message sort and order path attributes according to | |||
to Type Codes. This would help their peers to quickly identify sets | Type Codes. This would help their peers quickly identify sets of | |||
of attributes from different update messages which are semantically | attributes from different update messages that are semantically | |||
different. | different. | |||
Implementers may find it useful to order path attributes according to | Implementers may find it useful to order path attributes according to | |||
Type Code so that sets of attributes with identical semantics can be | Type Code, such that sets of attributes with identical semantics can | |||
more quickly identified. | be more quickly identified. | |||
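The benefit of ordering by Type Code can be sketched as follows.
Attributes are modeled as (type_code, value) pairs, which is an
assumption made for illustration; the function name is hypothetical.

```python
ORIGIN, AS_PATH, NEXT_HOP = 1, 2, 3  # attribute Type Codes


def canonical_attrs(attrs):
    """Return the attributes as a tuple ordered by Type Code."""
    return tuple(sorted(attrs, key=lambda attr: attr[0]))


received_a = [(NEXT_HOP, b"\xc0\x00\x02\x01"), (ORIGIN, b"\x00"),
              (AS_PATH, b"\x02\x01\xfd\xe9")]
received_b = [(ORIGIN, b"\x00"), (AS_PATH, b"\x02\x01\xfd\xe9"),
              (NEXT_HOP, b"\xc0\x00\x02\x01")]

# Once ordered, identical sets compare equal and hash to the same key.
same = canonical_attrs(received_a) == canonical_attrs(received_b)
```

A sender that already emits attributes in this order saves its peers
the sorting step before the comparison.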
15. AS_SET Sorting | 15. AS_SET Sorting | |||
AS_SETs are commonly used in BGP route aggregation. They reduce the | AS_SETs are commonly used in BGP route aggregation. They reduce the | |||
size of AS_PATH information by listing AS numbers only once | size of AS_PATH information by listing AS numbers only once, | |||
regardless of any number of times it might appear in process of | regardless of the number of times it might appear in the process of | |||
aggregation. AS_SETs are usually sorted in increasing order to | aggregation. AS_SETs are usually sorted in increasing order to | |||
facilitate efficient lookups of AS numbers within them. This | facilitate efficient lookups of AS numbers within them. This | |||
optimization is entirely optional. | optimization is optional. | |||
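A sorted AS_SET allows a binary search instead of a linear scan. The
sketch below is illustrative only; both function names are
hypothetical.

```python
from bisect import bisect_left


def make_as_set(as_numbers):
    """List each AS number once, in increasing order."""
    return sorted(set(as_numbers))


def as_set_contains(as_set, asn):
    """Binary search for asn in a sorted AS_SET."""
    i = bisect_left(as_set, asn)
    return i < len(as_set) and as_set[i] == asn
```

For example, make_as_set([65010, 65001, 65010, 65005]) yields
[65001, 65005, 65010], and membership tests then take logarithmic
time in the size of the set.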
16. Control over Version Negotiation | 16. Control Over Version Negotiation | |||
Because pre-BGP-4 route aggregation can't be supported by earlier | Because pre-BGP-4 route aggregation can't be supported by earlier | |||
version of BGP, an implementation that supports versions in addition | versions of BGP, an implementation that supports versions in addition | |||
to BGP-4 should provide the version support on a per-peer basis. At | to BGP-4 should provide the version support on a per-peer basis. At | |||
the time of this writing all BGP speakers on the Internet are thought | the time of this writing, all BGP speakers on the Internet are | |||
to be running BGP version 4. | thought to be running BGP version 4. | |||
17. Security Considerations | 17. Security Considerations | |||
BGP a provides flexible and extendable mechanism for authentication | BGP provides a flexible and extendable mechanism for authentication | |||
and security. The mechanism allows to support schemes with various | and security. The mechanism allows support for schemes with various | |||
degree of complexity. BGP sessions are authenticated based on the IP | degrees of complexity. BGP sessions are authenticated based on the | |||
address of a peer. In addition, all BGP sessions are authenticated | IP address of a peer. In addition, all BGP sessions are | |||
based on the autonomous system number advertised by a peer. | authenticated based on the autonomous system number advertised by a | |||
peer. | ||||
Since BGP runs over TCP and IP, BGP's authentication scheme may be | Because BGP runs over TCP and IP, BGP's authentication scheme may be | |||
augmented by any authentication or security mechanism provided by | augmented by any authentication or security mechanism provided by | |||
either TCP or IP. | either TCP or IP. | |||
17.1. TCP MD5 Signature Option | 17.1. TCP MD5 Signature Option | |||
[RFC 2385] defines a way in which the TCP MD5 signature option can be | [RFC 2385] defines a way in which the TCP MD5 signature option can be | |||
used to validate information transmitted between two peers. This | used to validate information transmitted between two peers. This | |||
method prevents any third party from injecting information (e.g., a | method prevents a third party from injecting information (e.g., a TCP | |||
TCP Reset) into the datastream, or modifying the routing information | Reset) into the datastream, or modifying the routing information | |||
carried between two BGP peers. | carried between two BGP peers. | |||
TCP MD5 is not ubiquitously deployed at the moment, especially in | At the moment, TCP MD5 is not ubiquitously deployed, especially in | |||
inter-domain scenarios, largely because of key distribution issues. | inter-domain scenarios, largely because of key distribution issues. | |||
Most key distribution mechanisms are considered to be too "heavy" at | Most key distribution mechanisms are considered to be too "heavy" at | |||
this point. | this point. | |||
It was naively assumed by many for some time that in order to inject | Many have naively assumed that an attacker must correctly guess the | |||
a data segement or reset a TCP transport connection between two BGP | exact TCP sequence number (along with the source and destination | |||
peers an attacker must correctly guess the exact TCP sequence number | ports and IP addresses) to inject a data segment or reset a TCP | |||
(of course, in addition to source and destination ports and IP | transport connection between two BGP peers. However, recent | |||
addresses). However, it has recently been observed and openly | observation and open discussion show that the malicious data only | |||
discussed that the malicous data only needs to fall within the TCP | needs to fall within the TCP receive window, which may be quite | |||
receive window, which may be quite large, thereby significantly | large, thereby significantly lowering the complexity of such an | |||
lowering the complexity of such an attack. | attack. | |||
As such, it is recommended that the MD5 TCP Signature Option be | As such, it is recommended that the MD5 TCP Signature Option be | |||
employed to protect BGP from session resets and malicious data | employed to protect BGP from session resets and malicious data | |||
injection. | injection. | |||
17.2. BGP Over IPSEC | 17.2. BGP Over IPsec | |||
BGP can run over IPSEC, either in a tunnel, or in transport mode, | BGP can run over IPsec, either in a tunnel or in transport mode, | |||
where the TCP portion of the IP packet is encrypted. This not only | where the TCP portion of the IP packet is encrypted. This not only | |||
prevents random insertion of information into the data stream between | prevents random insertion of information into the data stream between | |||
two BGP peers, it also prevents an attacker from learning the data | two BGP peers, but also prevents an attacker from learning the data | |||
which is being exchanged between the peers. | being exchanged between the peers. | |||
IPSEC does, however, offer several options for exchanging session | However, IPsec does offer several options for exchanging session | |||
keys, which may be useful on inter-domain configurations. These | keys, which may be useful on inter-domain configurations. These | |||
options are being explored in many deployments, although no | options are being explored in many deployments, although no | |||
definitive solution has been reached on the issue of key exchange for | definitive solution has been reached on the issue of key exchange for | |||
BGP in IPSEC. | BGP in IPsec. | |||
It should be noted that since BGP runs over TCP and IP, BGP is | Because BGP runs over TCP and IP, it should be noted that BGP is | |||
vulnerable to the same denial of service or authentication attacks | vulnerable to the same denial of service and authentication attacks | |||
that are present in any other TCP based protocol. | that are present in any TCP based protocol. | |||
17.3. Miscellaneous | 17.3. Miscellaneous | |||
Another issue any routing protocol faces is providing evidence of the | Another routing protocol issue is providing evidence of the validity | |||
validity and authority of the routing information carried within the | and authority of routing information carried within the routing | |||
routing system. This is currently the focus of several efforts at | system. This is currently the focus of several efforts, including | |||
the moment, including efforts to define the threats which can be used | efforts to define threats that can be used against this routing | |||
against this routing information in BGP [draft-murphy, attack tree], | information in BGP [BGPATTACK], and efforts to develop a means of | |||
and efforts at developing a means to provide validation and authority | providing validation and authority for routing information carried | |||
for routing information carried within BGP [SBGP] [soBGP]. | within BGP [SBGP] [soBGP]. | |||
In addition, the Routing Protocol Security Requirements (RPSEC) | In addition, the Routing Protocol Security Requirements (RPSEC) | |||
working group has been chartered within the Routing Area of the IETF | working group has been chartered, within the Routing Area of the | |||
in order to discuss and assist in addressing issues surrounding | IETF, to discuss and assist in addressing issues surrounding routing | |||
routing protocol security. It is the intent that this work within | protocol security. Within RPSEC, this work is intended to result in | |||
RPSEC will result in feedback to BGPv4 and future enhancements to the | feedback to BGP4 and future protocol enhancements. | |||
protocol where appropriate. | ||||
18. PTOMAINE and GROW | 18. PTOMAINE and GROW | |||
The Prefix Taxonomy (PTOMAINE) working group, recently replaced by | The Prefix Taxonomy (PTOMAINE) working group, recently replaced by | |||
the Global Routing Operations (GROW) working group, is chartered to | the Global Routing Operations (GROW) working group, is chartered to | |||
consider and measure the problem of routing table growth, the effects | consider and measure the problem of routing table growth, the effects | |||
of the interactions between interior and exterior routing protocols, | of the interactions between interior and exterior routing protocols, | |||
and the effect of address allocation policies and practices on the | and the effect of address allocation policies and practices on the | |||
global routing system. Finally, where appropriate, GROW will also | global routing system. Finally, where appropriate, GROW will also | |||
document the operational aspects of measurement, policy, security and | document the operational aspects of measurement, policy, security, | |||
VPN infrastructures. | and VPN infrastructures. | |||
One such item GROW is currently studying is the effects of route | GROW is currently studying the effects of route aggregation, and also | |||
aggregation and the inability to aggregate over multiple provider | the inability to aggregate over multiple provider boundaries due to | |||
boundaries due to inadequate provider coordination. | inadequate provider coordination. | |||
It is the intent that this work within GROW will result in feedback | Within GROW, this work is intended to result in feedback to BGPv4 and | |||
to BGPv4 and future enhancements to the protocol as necessary. | future protocol enhancements. | |||
19. Internet Routing Registries (IRRs) | 19. Internet Routing Registries (IRRs) | |||
Many organizations register their routing policy and prefix | Many organizations register their routing policy and prefix | |||
origination in the various distributed databases of the Internet | origination in the various distributed databases of the Internet | |||
Routing Registry. These databases provide access to the information | Routing Registry. These databases provide access to information | |||
using the RPSL language as defined in [RFC 2622]. While registered | using the RPSL language, as defined in [RFC2622]. While registered | |||
information may be maintained and correct for certain providers, the | information may be maintained and correct for certain providers, the | |||
lack of timely or correct data in the various IRR databases has | lack of timely or correct data in the various IRR databases has | |||
prevented wide-spread use of this resource. | prevented wide spread use of this resource. | |||
20. Regional Internet Registries (RIRs) and IRRs, A Bit of History | 20. Regional Internet Registries (RIRs) and IRRs, A Bit of History | |||
The NSFNET program used EGP and then BGP to provide external routing | The NSFNET program used EGP, and then BGP, to provide external | |||
information. It was the NSF policy of offering differing pricing and | routing information. It was the NSF policy of offering different | |||
providing a different level of support to the Research and Education | prices and providing different levels of support to the Research and | |||
(RE) networks and the Commercial (CO) networks that led to BGP's | Education (RE) and the Commercial (CO) networks that led to BGP's | |||
initial policy requirements. CO networks were not able to use the | initial policy requirements. In addition to being charged more, CO | |||
NSFNET backbone to reach other CO networks, in addition to being | networks were not able to use the NSFNET backbone to reach other CO | |||
charged more. The rationale was that commercial users of the NSFNET | networks. The rationale for higher prices was that commercial users | |||
with business with research entities should subsidize the RE | of the NSFNET within the business and research entities should | |||
community. Recognition that the Internet was evolving away from a | subsidize the RE community. Recognition that the Internet was | |||
hierarchical network to a mesh of peers led to changes from EGP and | evolving away from a hierarchical network to a mesh of peers led to | |||
BGP-1 that eliminated any assumptions of hierarchy. | changes away from EGP and BGP-1 that eliminated any assumptions of | |||
hierarchy. | ||||
Enforcement of NSF policy was accomplished through maintenance of the | Enforcement of NSF policy was accomplished through maintenance of the | |||
NSF Policy Routing Database (PRDB). The PRDB not only contained each | NSF Policy Routing Database (PRDB). The PRDB not only contained each | |||
network's designation as CO or RE, but also contained a list of the | network's designation as CO or RE, but also contained a list of the | |||
preferred exit points to the NSFNET to reach each network. This was | preferred exit points to the NSFNET to reach each network. This was | |||
the basis for setting what would later be called BGP LOCAL_PREF on | the basis for setting what would later be called BGP LOCAL_PREF on | |||
the NSFNET. Tools provided with the PRDB generated complete router | the NSFNET. Tools provided with the PRDB generated complete router | |||
configurations for the NSFNET. | configurations for the NSFNET. | |||
Use of the PRDB had the fortunate consequence of greatly improving | Use of the PRDB had the fortunate consequence of greatly improving | |||
reliability of the NSFNET relative to peer networks of the time and | reliability of the NSFNET, relative to peer networks of the time. | |||
offering more optimal routing for those networks sufficiently | PRDB offered more optimal routing for those networks that were | |||
knowledgeable and willing to keep their entries current. | sufficiently knowledgeable and willing to keep their entries current. | |||
With the decommission of the NSFNET Backbone Network Service in 1995, | With the decommission of the NSFNET Backbone Network Service in 1995, | |||
it was recognized that the PRDB should be made less single provider | it was recognized that the PRDB should be made less single provider | |||
centric and its legacy contents plus any further updates made | centric, and its legacy contents, plus any further updates, should be | |||
available to any provider willing to make use of it. The European | made available to any provider willing to make use of it. The | |||
networking community had long seen the PRDB as too US centric. | European networking community had long seen the PRDB as too US- | |||
Through Reseaux IP Europeens (RIPE) the Europeans had created an open | centric. Through Reseaux IP Europeens (RIPE), the Europeans created | |||
format in RIPE-181 and had been maintaining an open database used for | an open format in RIPE-181 and maintained an open database used for | |||
address and AS registry more than policy. The initial conversion of | address and AS registry more than policy. The initial conversion of | |||
the PRDB was to RIPE-181 format and tools were converted to make use | the PRDB was to RIPE-181 format, and tools were converted to make use | |||
of this format. The collection of databases was termed the Internet | of this format. The collection of databases was termed the Internet | |||
Routing Registry, with the RIPE database and US NSF funded Routing | Routing Registry (IRR), with the RIPE database and US NSF-funded | |||
Arbitrator (RA) being the inital components of the IRR. | Routing Arbitrator (RA) being the initial components of the IRR. | |||
A need to extend RIPE-181 was recognized and RIPE agreed to allow the | A need to extend RIPE-181 was recognized and RIPE agreed to allow the | |||
extensions to be defined within the IETF in the RPS WG. The result | extensions to be defined within the IETF in the RPS WG, resulting in | |||
was the RPSL language. Other work products of the RPS WG provided an | the RPSL language. Other work products of the RPS WG provided an | |||
authentication framework and means to widely distribute the database | authentication framework and a means to widely distribute the | |||
in a controlled manner and synchronize the many repositories. Freely | database in a controlled manner and synchronize the many | |||
available tools were provided primarily by RIPE, Merit, and ISI, the | repositories. Freely available tools were provided, primarily by | |||
most comprehensive set from ISI. The efforts of the IRR participants | RIPE, Merit, and ISI, the most comprehensive set from ISI. The | |||
have been severely hampered by providers unwilling to keep information | efforts of the IRR participants have been severely hampered by | |||
in the IRR up to date. The larger of these providers have been | providers unwilling to keep information in the IRR up to date. The | |||
vocal, claiming that the database entry, simple as it may be, are an | larger of these providers have been vocal, claiming that the database | |||
administrative burden and some acknowledge that doing so provides a | entry, simple as it may be, is an administrative burden, and some | |||
advantage to competitors that use the IRR. The result has been an | acknowledge that doing so provides an advantage to competitors that | |||
erosion of the usefulness of the IRR and an increase in vulnerability | use the IRR. The result has been an erosion of the usefulness of the | |||
of the Internet to routing based attack or accidental injection of | IRR and an increase in vulnerability of the Internet to routing based | |||
faulty routing information. | attacks or accidental injection of faulty routing information. | |||
There have been numerous cases of accidental disruption of Internet | There have been a number of cases in which accidental disruption of | |||
routing which were avoided by providers using the IRR but highly | Internet routing was avoided by providers using the IRR, but this was | |||
detrimental to non-users. As filters have had to be relaxed due to | highly detrimental to non-users. Filters have been forced to provide | |||
the erosion of the IRR to less complete coverage, these types of | less complete coverage because of the erosion of the IRR; these types | |||
disruptions have continued to occur very infrequently, but have had | of disruptions continue to occur infrequently, but have an | |||
increasingly widespread impact. | increasingly widespread impact. | |||
21. Acknowledgements | 21. Acknowledgements | |||
We would like to thank Paul Traina and Yakov Rekhter for authoring | We would like to thank Paul Traina and Yakov Rekhter for authoring | |||
previous versions of this document and providing valuable input on | previous versions of this document and providing valuable input on | |||
this update as well. We would also like to explicitly acknowledge | this update. We would also like to acknowledge Curtis Villamizar for | |||
Curtis Villamizar for providing both text and thorough reviews. | providing both text and thorough reviews. Thanks to Russ White, | |||
Thanks to Russ White, Jeffrey Haas, Sean Mentzer, Mitchell Erblich | Jeffrey Haas, Sean Mentzer, Mitchell Erblich, and Jude Ballard for | |||
and Jude Ballard for supplying their usual keen eye. | supplying their usual keen eyes. | |||
Finally, we'd like to thank the IDR WG for general and specific input | Finally, we'd like to thank the IDR WG for general and specific input | |||
that contributed to this document. | that contributed to this document. | |||
22. References | 22. References | |||
22.1. Normative References | 22.1. Normative References | |||
[RFC 1519] Fuller, V., Li. T., Yu J., and K. Varadhan, "Classless | [RFC1966] Bates, T. and R. Chandra, "BGP Route Reflection An | |||
Inter-Domain Routing (CIDR): an Address Assignment and | ||||
Aggregation Strategy", RFC 1519, September 1993. | ||||
[RFC 1966] Bates, T., Chandra, R., "BGP Route Reflection: An | ||||
alternative to full mesh IBGP", RFC 1966, June 1996. | alternative to full mesh IBGP", RFC 1966, June 1996. | |||
[RFC 2385] Heffernan, A., "Protection of BGP Sessions via the TCP | [RFC 2385] Heffernan, A., "Protection of BGP Sessions via the TCP | |||
MD5 Signature Option", RFC 2385, August 1998. | MD5 Signature Option", RFC 2385, August 1998. | |||
[RFC 2439] Villamizar, C. and Chandra, R., "BGP Route Flap Damping", | [RFC2439] Villamizar, C., Chandra, R., and R. Govindan, "BGP Route | |||
RFC 2439, November 1998. | Flap Damping", RFC 2439, November 1998. | |||
[RFC 2796] Bates, T., Chandra, R., and Chen, E, "Route Reflection - | ||||
An Alternative to Full Mesh IBGP", RFC 2796, April 2000. | ||||
[RFC 3065] Traina, P., McPherson, D., and Scudder, J, "Autonomous | ||||
System Confederations for BGP", RFC 3065, Febuary 2001. | ||||
[RFC 3345] McPherson, D., Gill, V., Walton, D., and Retana, A, "BGP | ||||
Persistent Route Oscillation Condition", RFC 3345, | ||||
August 2002. | ||||
[BGP4-ANALYSIS] "BGP-4 Protocol Analysis", Internet-Draft, Work in | [RFC2796] Bates, T., Chandra, R., and E. Chen, "BGP Route | |||
Progress. | Reflection - An Alternative to Full Mesh IBGP", RFC 2796, | |||
April 2000. | ||||
[BGP4-IMPL] "BGP 4 Implementation Report ", Internet-Draft, Work | [RFC3065] Traina, P., McPherson, D., and J. Scudder, "Autonomous | |||
in Progress. | System Confederations for BGP", RFC 3065, February 2001. | |||
[BGP4] Rekhter, Y., T. Li., and Hares. S, Editors, "A Border | [RFC4274] Meyer, D. and K. Patel, "BGP-4 Protocol Analysis", RFC | |||
Gateway Protocol 4 (BGP-4)", BGP Draft, Work in Progress. | 4274, January 2006. | |||
[RFC 1657] Willis, S., Burruss, J., Chu, J., " Definitions of | [RFC4276] Hares, S. and A. Retana, "BGP 4 Implementation Report", | |||
Managed Objects for the Fourth Version of the Border | RFC 4276, January 2006. | |||
Gateway Protocol (BGP-4) using SMIv2", RFC 1657, July | ||||
1994. | ||||
[SBGP] "Secure BGP", Internet-Draft, Work in Progress. | [RFC4271] Rekhter, Y., Li, T., and S. Hares, Eds., "A Border | |||
Gateway Protocol 4 (BGP-4)", RFC 4271, January 2006. | ||||
[soBGP] "Secure Origin BGP", Internet-Draft, Work in Progress. | [RFC1657] Willis, S., Burruss, J., Chu, J., "Definitions of Managed | |||
Objects for the Fourth Version of the Border Gateway | ||||
Protocol (BGP-4) using SMIv2", RFC 1657, July 1994. | ||||
[RFC 793] Postel, J., "Transmission Control Protocol", RFC 793, | [RFC793] Postel, J., "Transmission Control Protocol", STD 7, RFC | |||
September 1981. | 793, September 1981. | |||
22.2. Informative References | 22.2. Informative References | |||
[RFC 1105] Lougheed, K., and Rekhter, Y, "Border Gateway Protocol | [RFC1105] Lougheed, K. and Y. Rekhter, "Border Gateway Protocol | |||
BGP", RFC 1105, June 1989. | (BGP)", RFC 1105, June 1989. | |||
[RFC 1163] Lougheed, K., and Rekhter, Y, "Border Gateway Protocol | [RFC1163] Lougheed, K. and Y. Rekhter, "Border Gateway Protocol | |||
BGP", RFC 1105, June 1990. | (BGP)", RFC 1163, June 1990. | |||
[RFC 1264] Hinden, R., "Internet Routing Protocol Standardization | [RFC1264] Hinden, R., "Internet Engineering Task Force Internet | |||
Criteria", RFC 1264, October 1991. | Routing Protocol Standardization Criteria", RFC 1264, | |||
October 1991. | ||||
[RFC 1267] Lougheed, K., and Rekhter, Y, "Border Gateway Protocol 3 | [RFC1267] Lougheed, K. and Y. Rekhter, "Border Gateway Protocol 3 | |||
(BGP-3)", RFC 1105, October 1991. | (BGP-3)", RFC 1267, October 1991. | |||
[RFC 1269] Willis, S., and Burruss, J., "Definitions of Managed | [RFC1269] Willis, S. and J. Burruss, "Definitions of Managed | |||
Objects for the Border Gateway Protocol (Version 3)", | Objects for the Border Gateway Protocol: Version 3", RFC | |||
RFC 1269, October 1991. | 1269, October 1991. | |||
[RFC 1656] Traina, P., "BGP-4 Protocol Document Roadmap and | [RFC 1656] Traina, P., "BGP-4 Protocol Document Roadmap and | |||
Implementation Experience", RFC 1656, July 1994. | Implementation Experience", RFC 1656, July 1994. | |||
[RFC 1771] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 | [RFC1771] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 | |||
(BGP-4)", RFC 1771, March 1995. | (BGP-4)", RFC 1771, March 1995. | |||
[RFC 1772] Rekhter, Y., and P. Gross, Editors, "Application of the | ||||
Border Gateway Protocol in the Internet", RFC 1772, March | ||||
1995. | ||||
[RFC 1773] Traina, P., "Experience with the BGP-4 protocol", RFC | [RFC 1773] Traina, P., "Experience with the BGP-4 protocol", RFC | |||
1773, March 1995. | 1773, March 1995. | |||
[RFC 2622] C. Alaettinoglu et al., "Routing Policy Specification | [RFC1965] Traina, P., "Autonomous System Confederations for BGP", | |||
Language", RFC 2622, June 1999. | RFC 1965, June 1996. | |||
[RFC2622] Alaettinoglu, C., Villamizar, C., Gerich, E., Kessens, | ||||
D., Meyer, D., Bates, T., Karrenberg, D., and M. | ||||
Terpstra, "Routing Policy Specification Language (RPSL)", | ||||
RFC 2622, June 1999. | ||||
[BGPATTACK] Convery, C., "An Attack Tree for the Border Gateway | ||||
Protocol", Work in Progress. | ||||
[SBGP] "Secure BGP", Work in Progress. | ||||
[soBGP] "Secure Origin BGP", Work in Progress. | ||||
Authors' Addresses | ||||
23. Authors' Addresses | ||||
Danny McPherson | Danny McPherson | |||
Arbor Networks | Arbor Networks | |||
Email: danny@arbor.net | ||||
EMail: danny@arbor.net | ||||
Keyur Patel | Keyur Patel | |||
Cisco Systems | Cisco Systems | |||
Email: keyupate@cisco.com | ||||
Intellectual Property Statement | EMail: keyupate@cisco.com | |||
Full Copyright Statement | ||||
Copyright (C) The Internet Society (2006). | ||||
This document is subject to the rights, licenses and restrictions | ||||
contained in BCP 78, and except as set forth therein, the authors | ||||
retain all their rights. | ||||
This document and the information contained herein are provided on an | ||||
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | ||||
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | ||||
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | ||||
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | ||||
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | ||||
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
Intellectual Property | ||||
The IETF takes no position regarding the validity or scope of any | The IETF takes no position regarding the validity or scope of any | |||
Intellectual Property Rights or other rights that might be claimed to | Intellectual Property Rights or other rights that might be claimed to | |||
pertain to the implementation or use of the technology described in | pertain to the implementation or use of the technology described in | |||
this document or the extent to which any license under such rights | this document or the extent to which any license under such rights | |||
might or might not be available; nor does it represent that it has | might or might not be available; nor does it represent that it has | |||
made any independent effort to identify any such rights. Information | made any independent effort to identify any such rights. Information | |||
on the procedures with respect to rights in RFC documents can be | on the procedures with respect to rights in RFC documents can be | |||
found in BCP 78 and BCP 79. | found in BCP 78 and BCP 79. | |||
skipping to change at page 22, line 36 | skipping to change at page 19, line 45 | |||
such proprietary rights by implementers or users of this | such proprietary rights by implementers or users of this | |||
specification can be obtained from the IETF on-line IPR repository at | specification can be obtained from the IETF on-line IPR repository at | |||
http://www.ietf.org/ipr. | http://www.ietf.org/ipr. | |||
The IETF invites any interested party to bring to its attention any | The IETF invites any interested party to bring to its attention any | |||
copyrights, patents or patent applications, or other proprietary | copyrights, patents or patent applications, or other proprietary | |||
rights that may cover technology that may be required to implement | rights that may cover technology that may be required to implement | |||
this standard. Please address the information to the IETF at | this standard. Please address the information to the IETF at | |||
ietf-ipr@ietf.org. | ietf-ipr@ietf.org. | |||
Disclaimer of Validity | Acknowledgement | |||
This document and the information contained herein are provided on an | ||||
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | ||||
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | ||||
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | ||||
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | ||||
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | ||||
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
Copyright Statement | ||||
Copyright (C) The Internet Society (2004). This document | ||||
is subject to the rights, licenses and restrictions contained in | ||||
BCP 78, and except as set forth therein, the authors retain all | ||||
their rights. | ||||
Acknowledgment | ||||
Funding for the RFC Editor function is currently provided by the | Funding for the RFC Editor function is provided by the IETF | |||
Internet Society. | Administrative Support Activity (IASA). | |||
End of changes. 138 change blocks. | ||||
477 lines changed or deleted | 453 lines changed or added | |||