draft-ietf-idr-bgp4-experience-protocol-02.txt | draft-ietf-idr-bgp4-experience-protocol-03.txt | |||
---|---|---|---|---|
INTERNET-DRAFT Danny McPherson | INTERNET-DRAFT Danny McPherson | |||
Arbor Networks | Arbor Networks | |||
Keyur Patel | Keyur Patel | |||
Cisco Systems | Cisco Systems | |||
Category Informational | Category Informational | |||
Expires: March 2004 September 2003 | Expires: March 2004 September 2003 | |||
Experience with the BGP-4 Protocol | Experience with the BGP-4 Protocol | |||
<draft-ietf-idr-bgp4-experience-protocol-02.txt> | <draft-ietf-idr-bgp4-experience-protocol-03.txt> | |||
Status of this Document | Status of this Document | |||
This document is an Internet-Draft and is in full conformance with | This document is an Internet-Draft and is in full conformance with | |||
all provisions of Section 10 of RFC2026. | all provisions of Section 10 of RFC2026. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
Drafts. | Drafts. | |||
skipping to change at page 3, line 16 | skipping to change at page 3, line 16 | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2. BGP-4 Overview . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. BGP-4 Overview . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2.1. A Border Gateway Protocol . . . . . . . . . . . . . . . . . 4 | 2.1. A Border Gateway Protocol . . . . . . . . . . . . . . . . . 4 | |||
3. Management Information Base (MIB). . . . . . . . . . . . . . . 5 | 3. Management Information Base (MIB). . . . . . . . . . . . . . . 5 | |||
4. Implementations. . . . . . . . . . . . . . . . . . . . . . . . 5 | 4. Implementations. . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
5. Operational Experience . . . . . . . . . . . . . . . . . . . . 5 | 5. Operational Experience . . . . . . . . . . . . . . . . . . . . 5 | |||
6. TCP Awareness. . . . . . . . . . . . . . . . . . . . . . . . . 6 | 6. TCP Awareness. . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
7. Metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 7. Metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
7.1. MULTI_EXIT_DISC (MED) . . . . . . . . . . . . . . . . . . . 7 | 7.1. MULTI_EXIT_DISC (MED) . . . . . . . . . . . . . . . . . . . 7 | |||
7.1.1. Sending MEDs to BGP Peers. . . . . . . . . . . . . . . . 8 | 7.1.1. MEDs and Potatoes. . . . . . . . . . . . . . . . . . . . 8 | |||
7.1.2. MED of Zero Versus No MED. . . . . . . . . . . . . . . . 8 | 7.1.2. Sending MEDs to BGP Peers. . . . . . . . . . . . . . . . 8 | |||
7.1.3. MEDs and Temporal Route Selection. . . . . . . . . . . . 8 | 7.1.3. MED of Zero Versus No MED. . . . . . . . . . . . . . . . 9 | |||
7.1.4. MEDs and Temporal Route Selection. . . . . . . . . . . . 9 | ||||
8. LOCAL_PREF . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 8. LOCAL_PREF . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
9. Internal BGP In Large Autonomous Systems . . . . . . . . . . . 10 | 9. Internal BGP In Large Autonomous Systems . . . . . . . . . . . 10 | |||
10. Internet Dynamics . . . . . . . . . . . . . . . . . . . . . . 10 | 10. Internet Dynamics . . . . . . . . . . . . . . . . . . . . . . 11 | |||
11. BGP Routing Information Bases (RIBs). . . . . . . . . . . . . 11 | 11. BGP Routing Information Bases (RIBs). . . . . . . . . . . . . 12 | |||
12. Update Packing. . . . . . . . . . . . . . . . . . . . . . . . 11 | 12. Update Packing. . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
13. Limit Rate Updates. . . . . . . . . . . . . . . . . . . . . . 12 | 13. Limit Rate Updates. . . . . . . . . . . . . . . . . . . . . . 13 | |||
14. Ordering of Path Attributes . . . . . . . . . . . . . . . . . 13 | 13.1. Consideration of TCP Characteristics . . . . . . . . . . . 13 | |||
15. AS_SET Sorting. . . . . . . . . . . . . . . . . . . . . . . . 13 | 14. Ordering of Path Attributes . . . . . . . . . . . . . . . . . 14 | |||
16. Control over Version Negotiation. . . . . . . . . . . . . . . 13 | 15. AS_SET Sorting. . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
17. Security Considerations . . . . . . . . . . . . . . . . . . . 13 | 16. Control over Version Negotiation. . . . . . . . . . . . . . . 15 | |||
17.1. TCP MD5 Signature Option . . . . . . . . . . . . . . . . . 14 | 17. Security Considerations . . . . . . . . . . . . . . . . . . . 15 | |||
17.2. BGP Over IPSEC . . . . . . . . . . . . . . . . . . . . . . 14 | 17.1. TCP MD5 Signature Option . . . . . . . . . . . . . . . . . 15 | |||
17.3. Miscellaneous. . . . . . . . . . . . . . . . . . . . . . . 14 | 17.2. BGP Over IPSEC . . . . . . . . . . . . . . . . . . . . . . 16 | |||
17.4. PTOMAINE and GROW. . . . . . . . . . . . . . . . . . . . . 15 | 17.3. Miscellaneous. . . . . . . . . . . . . . . . . . . . . . . 16 | |||
17.5. Internet Routing Registries (IRRs) . . . . . . . . . . . . 15 | 17.4. PTOMAINE and GROW. . . . . . . . . . . . . . . . . . . . . 17 | |||
17.5. Internet Routing Registries (IRRs) . . . . . . . . . . . . 17 | ||||
17.6. Regional Internet Registries (RIRs) and IRRs, | 17.6. Regional Internet Registries (RIRs) and IRRs, | |||
A Bit of History . . . . . . . . . . . . . . . . . . . . . . . . 16 | A Bit of History . . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
17.7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 17 | 17.7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 19 | |||
18. References. . . . . . . . . . . . . . . . . . . . . . . . . . 18 | 18. References. . . . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
19. Authors' Addresses. . . . . . . . . . . . . . . . . . . . . . 19 | 19. Authors' Addresses. . . . . . . . . . . . . . . . . . . . . . 21 | |||
20. Full Copyright Statement. . . . . . . . . . . . . . . . . . . 20 | 20. Full Copyright Statement. . . . . . . . . . . . . . . . . . . 22 | |||
1. Introduction | 1. Introduction | |||
The purpose of this memo is to document how the requirements for | The purpose of this memo is to document how the requirements for | |||
advancing a routing protocol from Draft Standard to full Standard | advancing a routing protocol from Draft Standard to full Standard | |||
have been satisfied by Border Gateway Protocol version 4 (BGP-4). | have been satisfied by Border Gateway Protocol version 4 (BGP-4). | |||
This report satisfies the requirement for "the second report", as | This report satisfies the requirement for "the second report", as | |||
described in Section 6.0 of RFC 1264. In order to fulfill the | described in Section 6.0 of RFC 1264. In order to fulfill the | |||
requirement, this report augments RFC 1773 and describes additional | requirement, this report augments RFC 1773 and describes additional | |||
skipping to change at page 7, line 31 | skipping to change at page 7, line 31 | |||
implementations provide the capability to compare MEDs between | implementations provide the capability to compare MEDs between | |||
different ASs as well. | different ASs as well. | |||
Though this may seem a fine idea for some configurations, care must | Though this may seem a fine idea for some configurations, care must | |||
be taken when comparing MEDs between different autonomous systems. | be taken when comparing MEDs between different autonomous systems. | |||
BGP speakers often derive MED values by obtaining the IGP metric | BGP speakers often derive MED values by obtaining the IGP metric | |||
associated with reaching a given BGP NEXT_HOP within the local AS. | associated with reaching a given BGP NEXT_HOP within the local AS. | |||
This allows MEDs to reasonably reflect IGP topologies when | This allows MEDs to reasonably reflect IGP topologies when | |||
advertising routes to peers. While this is fine when comparing MEDs | advertising routes to peers. While this is fine when comparing MEDs | |||
between multiple paths learned from a single AS, it can result in | between multiple paths learned from a single AS, it can result in | |||
potentially bad decisions when comparing MEDs between differt | potentially bad decisions when comparing MEDs between different | |||
autonomous systems. This is most typically the case when the | autonomous systems. This is most typically the case when the | |||
autonomous systems use different mechanisms to derive IGP metrics, | autonomous systems use different mechanisms to derive IGP metrics, | |||
BGP MEDs, or perhaps even use different IGP protocols with vastly | BGP MEDs, or perhaps even use different IGP protocols with vastly | |||
contrasting metric spaces. | contrasting metric spaces. | |||
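The caution above can be illustrated with a minimal sketch. The `Path` type, the `prefer_by_med` function and the `compare_across_as` flag are hypothetical names chosen for this example, not part of any BGP implementation; the sketch only shows MED acting as a tie-breaker between paths from the same neighboring AS unless an operator opts in to cross-AS comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Path:
    neighbor_as: int  # AS the path was learned from (illustrative field)
    med: int          # MULTI_EXIT_DISC value attached to the path


def prefer_by_med(a: Path, b: Path, compare_across_as: bool = False) -> Optional[Path]:
    """Return the path preferred by MED, or None if MED does not decide.

    By default MEDs are only compared between paths learned from the same
    neighboring AS, because different ASs may derive MEDs from unrelated
    metric spaces.
    """
    if a.neighbor_as != b.neighbor_as and not compare_across_as:
        return None  # MED is not comparable; later tie-breakers apply
    if a.med == b.med:
        return None
    return a if a.med < b.med else b
```

With `compare_across_as=True` the function happily picks the numerically lower MED even when the two values come from unrelated IGPs, which is the kind of potentially bad decision the text warns about.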
Another MED deployment consideration involves the impact of | Another MED deployment consideration involves the impact of | |||
aggregation of BGP routing information on MEDs. Aggregates are often | aggregation of BGP routing information on MEDs. Aggregates are often | |||
generated from multiple locations in an AS in order to accommodate | generated from multiple locations in an AS in order to accommodate | |||
stability, redundancy and other network design goals. When MEDs are | stability, redundancy and other network design goals. When MEDs are | |||
derived from IGP metrics associated with said aggregates the MED | derived from IGP metrics associated with said aggregates the MED | |||
value advertised to peers can result in very suboptimal routing. | value advertised to peers can result in very suboptimal routing. | |||
The MED was purposely designed to be a "weak" metric that would only | The MED was purposely designed to be a "weak" metric that would only | |||
be used late in the best-path decision process. The BGP working | be used late in the best-path decision process. The BGP working | |||
group was concerned that any metric specified by a remote operator | group was concerned that any metric specified by a remote operator | |||
would only affect routing in a local AS if no other preference was | would only affect routing in a local AS if no other preference was | |||
specified. A paramount goal of the design of the MED was to ensure | specified. A paramount goal of the design of the MED was to ensure | |||
that peers could not "shed" or "absorb" traffic for networks that | that peers could not "shed" or "absorb" traffic for networks that | |||
they advertise. | they advertise. | |||
7.1.1. Sending MEDs to BGP Peers | 7.1.1. MEDs and Potatoes | |||
In a situation where traffic flows between a pair of destinations, | ||||
each connected to two transit networks, each of the transit networks | ||||
has the choice of either sending the traffic to the closest peering | ||||
to the other transit provider or passing traffic to the peering which | ||||
advertises the least cost through the other provider. The former | ||||
method is called "hot potato routing" because, like a hot potato | ||||
held in bare hands, whoever has it tries to get rid of it quickly. | ||||
Hot potato routing is accomplished by not passing the EBGP learned | ||||
MED into IBGP. This minimizes transit traffic for the provider | ||||
routing the traffic. Far less common is "cold potato routing" where | ||||
the transit provider uses their own transit capacity to get the | ||||
traffic to the point in the adjacent transit provider advertised as | ||||
being closest to the destination. Cold potato routing is | ||||
accomplished by passing the EBGP learned MED into IBGP. | ||||
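As a rough sketch of the two strategies, the example below models each peering point with a local IGP cost and the MED advertised by the neighbor there. The `ExitPoint` type and both selection functions are illustrative assumptions; real implementations reach the same effect through the route selection process rather than an explicit function like this.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ExitPoint:
    name: str
    igp_cost: int  # local IGP cost to reach this peering point
    med: int       # MED the neighbor advertised at this peering point


def hot_potato(exits: Iterable[ExitPoint]) -> ExitPoint:
    """Ignore the neighbor's MED and hand traffic off at the closest exit."""
    return min(exits, key=lambda e: e.igp_cost)


def cold_potato(exits: Iterable[ExitPoint]) -> ExitPoint:
    """Honor the neighbor's MED first, using local IGP cost only as a tie-breaker."""
    return min(exits, key=lambda e: (e.med, e.igp_cost))
```

Given a nearby exit with a high MED and a distant exit with a low MED, `hot_potato` picks the nearby one and `cold_potato` carries the traffic across the local network to the distant one.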
If one transit provider uses hot potato routing and another uses | ||||
cold potato, traffic between the two tends to be symmetric. | ||||
Depending on the business relationships, if one provider has more | ||||
capacity or a significantly less congested transit network, then that | ||||
provider may use cold potato routing. An example of widespread use | ||||
of cold potato routing was the NSF funded NSFNET backbone and NSF | ||||
funded regional networks in the mid 1990s. | ||||
In some cases a provider may use hot potato routing for some | ||||
destinations for a given peer AS and cold potato routing for others. | ||||
An example of this is the different treatment of commercial and | ||||
research traffic in the NSFNET in the mid 1990s. Then again, this | ||||
might best be described as 'mashed potato routing', a term which | ||||
reflects the complexity of router configurations in use at the time. | ||||
7.1.2. Sending MEDs to BGP Peers | ||||
[BGP4] allows MEDs received from any EBGP peers by a BGP speaker to | [BGP4] allows MEDs received from any EBGP peers by a BGP speaker to | |||
be passed to its IBGP peers. Although advertising MEDs to IBGP peers | be passed to its IBGP peers. Although advertising MEDs to IBGP peers | |||
is not a required behavior, it is a common default. MEDs received | is not a required behavior, it is a common default. MEDs received | |||
from EBGP peers by a BGP speaker MUST NOT be sent to other EBGP | from EBGP peers by a BGP speaker MUST NOT be sent to other EBGP | |||
peers. | peers. | |||
Note that many implementations provide a mechanism to derive MED | Note that many implementations provide a mechanism to derive MED | |||
values from IGP metrics in order to allow BGP MED information to | values from IGP metrics in order to allow BGP MED information to | |||
reflect the IGP topologies and metrics of the network when | reflect the IGP topologies and metrics of the network when | |||
propagating information to adjacent autonomous systems. | propagating information to adjacent autonomous systems. | |||
7.1.2. MED of Zero Versus No MED | 7.1.3. MED of Zero Versus No MED | |||
An implementation MUST provide a mechanism that allows for MED to be | An implementation MUST provide a mechanism that allows for MED to be | |||
removed. Previously, implementations did not consider a missing MED | removed. Previously, implementations did not consider a missing MED | |||
value to be the same as a MED of zero. A missing MED should now be | value to be the same as a MED of zero. A missing MED should now be | |||
treated as equal to a MED of zero. | treated as equal to a MED of zero. | |||
Note that many implementations provide a mechanism to explicitly | Note that many implementations provide a mechanism to explicitly | |||
define a missing MED value as "worst" or less preferable than zero or | define a missing MED value as "worst" or less preferable than zero or | |||
larger values. | larger values. | |||
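The two treatments of a missing MED can be sketched as follows. The function name and the `missing_as_worst` flag are hypothetical; the only protocol fact assumed is that MULTI_EXIT_DISC is a four-octet unsigned integer, so the "worst" value is 2^32 - 1.

```python
from typing import Optional

MAX_MED = 2**32 - 1  # largest value of the four-octet unsigned MED field


def effective_med(med: Optional[int], missing_as_worst: bool = False) -> int:
    """Map an optional MED to the value used in comparisons.

    A missing MED (None) is treated as zero by default. With
    missing_as_worst=True it is treated as the least preferable value.
    """
    if med is not None:
        return med
    return MAX_MED if missing_as_worst else 0
```

Under the default, a path with no MED ties with a path carrying a MED of zero; with the "worst" option enabled, the path with no MED loses to any path that carries a smaller explicit MED.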
7.1.3. MEDs and Temporal Route Selection | 7.1.4. MEDs and Temporal Route Selection | |||
Some implementations have hooks to apply temporal behavior in MED- | Some implementations have hooks to apply temporal behavior in MED- | |||
based best path selection. That is, all other things being equal up | based best path selection. That is, all other things being equal up | |||
to MED consideration, preference would be applied to the "oldest" | to MED consideration, preference would be applied to the "oldest" | |||
path, without preferring the lower MED value. The reasoning for this | path, without preferring the lower MED value. The reasoning for this | |||
is that "older" paths are presumably more stable, and thus more | is that "older" paths are presumably more stable, and thus more | |||
preferable. However, temporal behavior in route selection results in | preferable. However, temporal behavior in route selection results in | |||
non-deterministic behavior, and as such, is often undesirable. | non-deterministic behavior, and as such, is often undesirable. | |||
8. LOCAL_PREF | 8. LOCAL_PREF | |||
skipping to change at page 11, line 8 | skipping to change at page 11, line 38 | |||
should be applied to advertisements. In future specifications of | should be applied to advertisements. In future specifications of | |||
BGP-like protocols, damping methods should be considered for | BGP-like protocols, damping methods should be considered for | |||
mandatory inclusion in compliant implementations. | mandatory inclusion in compliant implementations. | |||
BGP Route Flap Damping is defined in [RFC 2439]. BGP Route Flap | BGP Route Flap Damping is defined in [RFC 2439]. BGP Route Flap | |||
Damping defines a mechanism to help reduce the amount of routing | Damping defines a mechanism to help reduce the amount of routing | |||
information passed between BGP peers, and subsequently, the load on | information passed between BGP peers, and subsequently, the load on | |||
these peers, without adversely affecting route convergence time for | these peers, without adversely affecting route convergence time for | |||
relatively stable routes. | relatively stable routes. | |||
None of the current implementations of BGP Route Flap Damping store | ||||
route history by unique NLRI and AS Path although it is listed as | ||||
mandatory in RFC 2439. A potential result of failure to consider | ||||
each AS Path separately is an overly aggressive suppression of | ||||
destinations in a densely meshed network, with the most severe | ||||
consequence being suppression of a destination after a single | ||||
failure. Because the top tier autonomous systems in the Internet are | ||||
densely meshed, these adverse consequences are observed. | ||||
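The effect of the history key can be shown with a simplified sketch of a damping table. The penalty, threshold and half-life constants below are assumed values for illustration only, and the model omits the reuse threshold and other details of RFC 2439; it only contrasts keeping history per NLRI with keeping it per (NLRI, AS Path).

```python
from typing import Dict, Hashable, Tuple

PENALTY_PER_FLAP = 1000.0    # assumed penalty added per route change
SUPPRESS_THRESHOLD = 2000.0  # assumed cutoff above which a route is suppressed
HALF_LIFE_SECONDS = 900.0    # assumed half-life of the exponential decay


class DampingTable:
    """Tracks a decaying penalty, keyed by NLRI or by (NLRI, AS Path)."""

    def __init__(self, key_by_as_path: bool) -> None:
        self.key_by_as_path = key_by_as_path
        # key -> (penalty at last update, time of last update)
        self._state: Dict[Hashable, Tuple[float, float]] = {}

    def _key(self, nlri: str, as_path: Tuple[int, ...]) -> Hashable:
        return (nlri, as_path) if self.key_by_as_path else nlri

    def penalty(self, nlri: str, as_path: Tuple[int, ...], now: float) -> float:
        value, stamp = self._state.get(self._key(nlri, as_path), (0.0, now))
        # Exponential decay: the penalty halves every HALF_LIFE_SECONDS.
        return value * 0.5 ** ((now - stamp) / HALF_LIFE_SECONDS)

    def record_flap(self, nlri: str, as_path: Tuple[int, ...], now: float) -> float:
        value = self.penalty(nlri, as_path, now) + PENALTY_PER_FLAP
        self._state[self._key(nlri, as_path)] = (value, now)
        return value

    def is_suppressed(self, nlri: str, as_path: Tuple[int, ...], now: float) -> bool:
        return self.penalty(nlri, as_path, now) > SUPPRESS_THRESHOLD
```

If a single failure makes a densely meshed neighbor announce three different AS Paths for one prefix within a few seconds, the per-NLRI table accumulates roughly three penalties and suppresses the destination, while the per-(NLRI, AS Path) table charges each path only once and keeps it usable.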
Route changes are announced using BGP UPDATE messages. The greatest | Route changes are announced using BGP UPDATE messages. The greatest | |||
overhead in advertising UPDATE messages happens whenever route | overhead in advertising UPDATE messages happens whenever route | |||
changes to be announced are inefficiently packed. As previously | changes to be announced are inefficiently packed. As previously | |||
discussed, announcing routing changes sharing common attributes in a | discussed, announcing routing changes sharing common attributes in a | |||
single BGP UPDATE message helps save considerable bandwidth and lower | single BGP UPDATE message helps save considerable bandwidth and lower | |||
processing overhead. | processing overhead. | |||
Persistent BGP errors may cause BGP peers to flap persistently if | Persistent BGP errors may cause BGP peers to flap persistently if | |||
peer dampening is not implemented. This would result in significant | peer dampening is not implemented. This would result in significant | |||
CPU utilization. Implementors may find it useful to implement peer | CPU utilization. Implementors may find it useful to implement peer | |||
skipping to change at page 13, line 4 | skipping to change at page 13, line 36 | |||
advertisement. The BGP protocol defines a MinRouteAdvertisementInterval | advertisement. The BGP protocol defines a MinRouteAdvertisementInterval | |||
parameter that determines the minimum time that must elapse | parameter that determines the minimum time that must elapse | |||
between the advertisement of routes to a particular destination from | between the advertisement of routes to a particular destination from | |||
a single BGP speaker. This value is set on a per BGP peer basis. | a single BGP speaker. This value is set on a per BGP peer basis. | |||
Due to the fact that BGP relies on TCP as the Transport protocol, TCP | Due to the fact that BGP relies on TCP as the Transport protocol, TCP | |||
can prevent transmission of data due to empty windows. As a result, | can prevent transmission of data due to empty windows. As a result, | |||
multiple Updates may be spaced closer together than originally queued. | multiple Updates may be spaced closer together than originally queued. | |||
Although this is not a common occurrence, implementations should be | Although this is not a common occurrence, implementations should be | |||
aware of this. | aware of this. | |||
13.1. Consideration of TCP Characteristics | ||||
If a TCP receiver is processing input more slowly than the sender or | ||||
if the TCP connection rate is the limiting factor, a form of | ||||
backpressure is observed by the TCP sending application. When the | ||||
TCP buffer fills, the sending application will either block on the | ||||
write or receive an error on the write. Common errors in either | ||||
early implementations or an occasional naive new implementation are | ||||
to either set options to block on the write or set options for non- | ||||
blocking writes and then treat the errors due to a full buffer as | ||||
fatal. | ||||
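A minimal sketch of the non-fatal handling described above, using Python's standard `socket` module. The helper name `try_send` is an assumption for this example; the point is only that a full send buffer on a non-blocking socket is reported as "nothing accepted right now" rather than as a session error.

```python
import socket


def try_send(sock: socket.socket, data: bytes) -> int:
    """Attempt a non-blocking write and return the number of bytes accepted.

    A full TCP send buffer surfaces as BlockingIOError on a non-blocking
    socket. That is expected backpressure, so it is reported as 0 bytes
    written instead of being treated as a fatal error. Other OSError
    subclasses (for example a reset connection) still propagate.
    """
    try:
        return sock.send(data)
    except BlockingIOError:
        return 0
```

A caller that receives 0 would typically mark the peer as blocked and retry once the socket is reported writable again, for example by an event loop built on the `selectors` module.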
Having recognized that full write buffers are to be expected, | ||||
additional implementation pitfalls exist. The application should not | ||||
attempt to store the TCP stream within the application itself. If | ||||
the receiver or the TCP connection is persistently slow, then the | ||||
buffer can grow until memory is exhausted. A BGP implementation must | ||||
send changes to all peers for which the TCP connection is not blocked | ||||
and must remember to send those changes to the remaining peers when | ||||
the connection becomes unblocked. | ||||
If the preferred route for a given NLRI changes multiple times while | ||||
writes to one or more peers are blocked, only the most recent best | ||||
route needs to be sent. In this way BGP is work conserving. In | ||||
times of extremely high route change, a higher volume of route change | ||||
is sent to those peers which are able to process it more quickly and | ||||
a lower volume of route change is sent to those peers not able to | ||||
process the changes as quickly. | ||||
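One way to picture this work-conserving behavior is a per-peer table of pending changes keyed by NLRI, so that a newer best route overwrites an unsent older one. The `PeerQueue` class below is a hypothetical sketch, with the `sent` list standing in for data actually written to the TCP connection.

```python
from typing import Dict, List, Tuple


class PeerQueue:
    """Holds at most one pending route per NLRI for a single peer."""

    def __init__(self) -> None:
        self.blocked = False
        self.pending: Dict[str, str] = {}      # NLRI -> most recent best route
        self.sent: List[Tuple[str, str]] = []  # stand-in for bytes written to TCP

    def announce(self, nlri: str, route: str) -> None:
        # A newer best route replaces any unsent older one for the same NLRI.
        self.pending[nlri] = route
        if not self.blocked:
            self.flush()

    def flush(self) -> None:
        self.sent.extend(self.pending.items())
        self.pending.clear()

    def unblock(self) -> None:
        self.blocked = False
        self.flush()
```

Because the table is bounded by the number of distinct NLRI rather than by the number of route changes, a persistently slow peer cannot make the sender's memory grow without limit, and it receives only the final state once its connection becomes writable.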
For implementations which handle differing peer capacity to absorb | ||||
route change well, if the majority of route change is contributed by | ||||
a subset of unstable NLRI, the only impact on relatively stable NLRI | ||||
which make an isolated route change is slower convergence, and the | ||||
convergence time remains bounded regardless of the amount of | ||||
instability. | ||||
14. Ordering of Path Attributes | 14. Ordering of Path Attributes | |||
The BGP protocol suggests that BGP speakers sending multiple prefixes | The BGP protocol suggests that BGP speakers sending multiple prefixes | |||
per UPDATE message should sort and order path attributes according | per UPDATE message should sort and order path attributes according | |||
to Type Codes. This would help their peers to quickly identify sets | to Type Codes. This would help their peers to quickly identify sets | |||
of attributes from different update messages which are semantically | of attributes from different update messages which are semantically | |||
different. | different. | |||
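A small sketch of why a canonical order helps: once attributes are sorted by Type Code, two attribute sets can be compared directly. The representation of an attribute as a `(type_code, value)` pair and the function names are assumptions for illustration; the Type Code constants are those assigned by the BGP-4 specification.

```python
from typing import Iterable, Tuple

# Type Codes of some well-known BGP-4 path attributes.
ORIGIN, AS_PATH, NEXT_HOP, MULTI_EXIT_DISC, LOCAL_PREF = 1, 2, 3, 4, 5

Attribute = Tuple[int, bytes]  # (type code, encoded value); illustrative shape


def canonical_attributes(attrs: Iterable[Attribute]) -> Tuple[Attribute, ...]:
    """Return the attributes ordered by Type Code."""
    return tuple(sorted(attrs, key=lambda attr: attr[0]))


def same_attribute_set(a: Iterable[Attribute], b: Iterable[Attribute]) -> bool:
    """Compare two attribute sets regardless of the order they arrived in."""
    return canonical_attributes(a) == canonical_attributes(b)
```

The canonical tuple can also serve as a dictionary key for grouping prefixes that share identical attributes.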
Implementers may find it useful to order path attributes according to | Implementers may find it useful to order path attributes according to | |||