--- 1/draft-ietf-idr-bgp4-experience-protocol-02.txt  2006-02-04 23:30:35.000000000 +0100
+++ 2/draft-ietf-idr-bgp4-experience-protocol-03.txt  2006-02-04 23:30:35.000000000 +0100
@@ -1,19 +1,19 @@
 INTERNET-DRAFT                                       Danny McPherson
                                                       Arbor Networks
                                                          Keyur Patel
                                                        Cisco Systems
 Category                                               Informational
 Expires: March 2004                                   September 2003

                  Experience with the BGP-4 Protocol
-
+
 Status of this Document

    This document is an Internet-Draft and is in full conformance with
    all provisions of Section 10 of RFC2026.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups. Note that other
    groups may also distribute working documents as Internet-Drafts.

@@ -57,44 +57,46 @@
    1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
    2. BGP-4 Overview . . . . . . . . . . . . . . . . . . . . . . . . 4
    2.1. A Border Gateway Protocol . . . . . . . . . . . . . . . . . 4
    3. Management Information Base (MIB). . . . . . . . . . . . . . . 5
    4. Implementations. . . . . . . . . . . . . . . . . . . . . . . . 5
    5. Operational Experience . . . . . . . . . . . . . . . . . . . . 5
    6. TCP Awareness. . . . . . . . . . . . . . . . . . . . . . . . . 6
    7. Metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
    7.1. MULTI_EXIT_DISC (MED) . . . . . . . . . . . . . . . . . . . 7
-   7.1.1. Sending MEDs to BGP Peers. . . . . . . . . . . . . . . . 8
-   7.1.2. MED of Zero Versus No MED. . . . . . . . . . . . . . . . 8
-   7.1.3. MEDs and Temporal Route Selection. . . . . . . . . . . . 8
+   7.1.1. MEDs and Potatoes. . . . . . . . . . . . . . . . . . . . 8
+   7.1.2. Sending MEDs to BGP Peers. . . . . . . . . . . . . . . . 8
+   7.1.3. MED of Zero Versus No MED. . . . . . . . . . . . . . . . 9
+   7.1.4. MEDs and Temporal Route Selection. . . . . . . . . . . . 9
    8. LOCAL_PREF . . . . . . . . . . . . . . . . . . . . . . . . . . 9
    9. Internal BGP In Large Autonomous Systems . . . . . . . . . . . 10
-   10. Internet Dynamics . . . . . . . . . . . . . . . . . . . . . . 10
-   11. BGP Routing Information Bases (RIBs). . . . . . . . . . . . . 11
-   12. Update Packing. . . . . . . . . . . . . . . . . . . . . . . . 11
-   13. Limit Rate Updates. . . . . . . . . . . . . . . . . . . . . . 12
-   14. Ordering of Path Attributes . . . . . . . . . . . . . . . . . 13
-   15. AS_SET Sorting. . . . . . . . . . . . . . . . . . . . . . . . 13
-   16. Control over Version Negotiation. . . . . . . . . . . . . . . 13
-   17. Security Considerations . . . . . . . . . . . . . . . . . . . 13
-   17.1. TCP MD5 Signature Option . . . . . . . . . . . . . . . . . 14
-   17.2. BGP Over IPSEC . . . . . . . . . . . . . . . . . . . . . . 14
-   17.3. Miscellaneous. . . . . . . . . . . . . . . . . . . . . . . 14
-   17.4. PTOMAINE and GROW. . . . . . . . . . . . . . . . . . . . . 15
-   17.5. Internet Routing Registries (IRRs) . . . . . . . . . . . . 15
+   10. Internet Dynamics . . . . . . . . . . . . . . . . . . . . . . 11
+   11. BGP Routing Information Bases (RIBs). . . . . . . . . . . . . 12
+   12. Update Packing. . . . . . . . . . . . . . . . . . . . . . . . 12
+   13. Limit Rate Updates. . . . . . . . . . . . . . . . . . . . . . 13
+   13.1. Consideration of TCP Characteristics . . . . . . . . . . . 13
+   14. Ordering of Path Attributes . . . . . . . . . . . . . . . . . 14
+   15. AS_SET Sorting. . . . . . . . . . . . . . . . . . . . . . . . 15
+   16. Control over Version Negotiation. . . . . . . . . . . . . . . 15
+   17. Security Considerations . . . . . . . . . . . . . . . . . . . 15
+   17.1. TCP MD5 Signature Option . . . . . . . . . . . . . . . . . 15
+   17.2. BGP Over IPSEC . . . . . . . . . . . . . . . . . . . . . . 16
+   17.3. Miscellaneous. . . . . . . . . . . . . . . . . . . . . . . 16
+   17.4. PTOMAINE and GROW. . . . . . . . . . . . . . . . . . . . . 17
+   17.5. Internet Routing Registries (IRRs) . . . . . . . . . . . . 17
    17.6. Regional Internet Registries (RIRs) and IRRs,
-   A Bit of History . . . . . . . . . . . . . . . . . . . . . . . . 16
-   17.7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 17
-   18. References. . . . . . . . . . . . . . . . . . . . . . . . . . 18
-   19. Authors' Addresses. . . . . . . . . . . . . . . . . . . . . . 19
-   20. Full Copyright Statement. . . . . . . . . . . . . . . . . . . 20
+   A Bit of History . . . . . . . . . . . . . . . . . . . . . . . . 17
+   17.7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 19
+   18. References. . . . . . . . . . . . . . . . . . . . . . . . . . 20
+   19. Authors' Addresses. . . . . . . . . . . . . . . . . . . . . . 21
+   20. Full Copyright Statement. . . . . . . . . . . . . . . . . . . 22

 1. Introduction

    The purpose of this memo is to document how the requirements for
    advancing a routing protocol from Draft Standard to full Standard
    have been satisfied by Border Gateway Protocol version 4 (BGP-4).
    This report satisfies the requirement for "the second report", as
    described in Section 6.0 of RFC 1264. In order to fulfill the
    requirement, this report augments RFC 1773 and describes additional
@@ -230,66 +232,98 @@
    implementations provide the capability to compare MEDs between
    different ASs as well. Though this may seem a fine idea for some
    configurations, care must be taken when comparing MEDs between
    different autonomous systems.

    BGP speakers often derive MED values by obtaining the IGP metric
    associated with reaching a given BGP NEXT_HOP within the local AS.
    This allows MEDs to reasonably reflect IGP topologies when
    advertising routes to peers. While this is fine when comparing MEDs
    between multiple paths learned from a single AS, it can result in
-   potentially bad decisions when comparing MEDs between differt
+   potentially bad decisions when comparing MEDs between different
    autonomous systems. This is most typically the case when the
    autonomous systems use different mechanisms to derive IGP metrics,
    BGP MEDs, or perhaps even use different IGP protocols with vastly
    contrasting metric spaces.

    Another MED deployment consideration involves the impact of
    aggregation of BGP routing information on MEDs. Aggregates are often
    generated from multiple locations in an AS in order to accommodate
    stability, redundancy and other network design goals. When MEDs are
    derived from IGP metrics associated with said aggregates, the MED
    value advertised to peers can result in very suboptimal routing.

    The MED was purposely designed to be a "weak" metric that would only
    be used late in the best-path decision process. The BGP working group
    was concerned that any metric specified by a remote operator would
    only affect routing in a local AS if no other preference was
    specified. A paramount goal of the design of the MED was to ensure
    that peers could not "shed" or "absorb" traffic for networks that
    they advertise.

-7.1.1. Sending MEDs to BGP Peers
+7.1.1. MEDs and Potatoes
+
+   In a situation where traffic flows between a pair of destinations,
+   each connected to two transit networks, each of the transit networks
+   has the choice of either sending the traffic to the closest peering
+   to the other transit provider or passing traffic to the peering which
+   advertises the least cost through the other provider. The former
+   method is called "hot potato routing" because, like a hot potato
+   held in bare hands, whoever has it tries to get rid of it quickly.
+   Hot potato routing is accomplished by not passing the EBGP-learned
+   MED into IBGP. This minimizes transit traffic for the provider
+   routing the traffic. Far less common is "cold potato routing", where
+   the transit provider uses their own transit capacity to get the
+   traffic to the point in the adjacent transit provider advertised as
+   being closest to the destination. Cold potato routing is
+   accomplished by passing the EBGP-learned MED into IBGP.
+
+   If one transit provider uses hot potato routing and another uses
+   cold potato routing, traffic between the two tends to be symmetric.
+   Depending on the business relationships, if one provider has more
+   capacity or a significantly less congested transit network, then that
+   provider may use cold potato routing. An example of widespread use
+   of cold potato routing was the NSF-funded NSFNET backbone and
+   NSF-funded regional networks in the mid-1990s.
+
+   In some cases a provider may use hot potato routing for some
+   destinations for a given peer AS and cold potato routing for others.
+   An example of this is the different treatment of commercial and
+   research traffic in the NSFNET in the mid-1990s. Then again, this
+   might best be described as 'mashed potato routing', a term which
+   reflects the complexity of router configurations in use at the time.
+
+7.1.2. Sending MEDs to BGP Peers

    [BGP4] allows MEDs received from any EBGP peers by a BGP speaker to
    be passed to its IBGP peers. Although advertising MEDs to IBGP peers
    is not a required behavior, it is a common default. MEDs received
    from EBGP peers by a BGP speaker MUST NOT be sent to other EBGP
    peers.

    Note that many implementations provide a mechanism to derive MED
    values from IGP metrics in order to allow BGP MED information to
    reflect the IGP topologies and metrics of the network when
    propagating information to adjacent autonomous systems.

-7.1.2. MED of Zero Versus No MED
+7.1.3. MED of Zero Versus No MED

    An implementation MUST provide a mechanism that allows for MED to be
    removed. Previously, implementations did not consider a missing MED
    value to be the same as a MED of zero. A missing MED value should
    now be treated as equal to a MED of zero.

    Note that many implementations provide a mechanism to explicitly
    define a missing MED value as "worst" or less preferable than zero or
    larger values.

-7.1.3. MEDs and Temporal Route Selection
+7.1.4. MEDs and Temporal Route Selection

    Some implementations have hooks to apply temporal behavior in
    MED-based best path selection.
    That is, all other things being equal up to MED consideration,
    preference would be applied to the "oldest" path, without preferring
    the lower MED value. The reasoning for this is that "older" paths are
    presumably more stable, and thus preferable. However, temporal
    behavior in route selection results in non-deterministic behavior,
    and as such, is often undesirable.

 8. LOCAL_PREF

@@ -376,20 +410,29 @@
    should be applied to advertisements. In future specifications of
    BGP-like protocols, damping methods should be considered for
    mandatory inclusion in compliant implementations.

    BGP Route Flap Damping is defined in [RFC 2439]. BGP Route Flap
    Damping defines a mechanism to help reduce the amount of routing
    information passed between BGP peers, and subsequently, the load on
    these peers, without adversely affecting route convergence time for
    relatively stable routes.

+   None of the current implementations of BGP Route Flap Damping store
+   route history by unique NLRI and AS Path, although it is listed as
+   mandatory in RFC 2439. A potential result of failure to consider
+   each AS Path separately is an overly aggressive suppression of
+   destinations in a densely meshed network, with the most severe
+   consequence being suppression of a destination after a single
+   failure. Because the top-tier autonomous systems in the Internet are
+   densely meshed, these adverse consequences are observed.
+
    Route changes are announced using BGP UPDATE messages. The greatest
    overhead in advertising UPDATE messages happens whenever route
    changes to be announced are inefficiently packed. As previously
    discussed, announcing routing changes sharing common attributes in a
    single BGP UPDATE message helps save considerable bandwidth and lower
    processing overhead.

    Persistent BGP errors may cause BGP peers to flap persistently if
    peer dampening is not implemented. This would result in significant
    CPU utilization.
    Implementors may find it useful to implement peer
@@ -453,20 +496,56 @@
    parameter that determines the minimum time that must elapse between
    the advertisement of routes to a particular destination from a single
    BGP speaker. This value is set on a per BGP peer basis.

    Due to the fact that BGP relies on TCP as the Transport protocol, TCP
    can prevent transmission of data due to empty windows. As a result,
    multiple Updates may be spaced closer together than originally
    queued. Although this is not a common occurrence, implementations
    should be aware of this.

+13.1. Consideration of TCP Characteristics
+
+   If a TCP receiver is processing input more slowly than the sender or
+   if the TCP connection rate is the limiting factor, a form of
+   backpressure is observed by the TCP sending application. When the
+   TCP buffer fills, the sending application will either block on the
+   write or receive an error on the write. Common errors in either
+   early implementations or an occasional naive new implementation are
+   to either set options to block on the write or set options for
+   non-blocking writes and then treat the errors due to a full buffer
+   as fatal.
+
+   Having recognized that full write buffers are to be expected,
+   additional implementation pitfalls exist. The application should not
+   attempt to store the TCP stream within the application itself. If
+   the receiver or the TCP connection is persistently slow, then the
+   buffer can grow until memory is exhausted. A BGP implementation must
+   send changes to all peers for which the TCP connection is not blocked
+   and must remember to send those changes to the remaining peers when
+   the connection becomes unblocked.
+
+   If the preferred route for a given NLRI changes multiple times while
+   writes to one or more peers are blocked, only the most recent best
+   route needs to be sent. In this way BGP is work conserving.
+   In times of extremely high route change, a higher volume of route
+   change is sent to those peers which are able to process it more
+   quickly and a lower volume of route change is sent to those peers
+   not able to process the changes as quickly.
+
+   For implementations which handle differing peer capacity to absorb
+   route change well, if the majority of route change is contributed by
+   a subset of unstable NLRI, the only impact on relatively stable NLRI
+   which make an isolated route change is a slower convergence, for
+   which convergence time remains bounded regardless of the amount of
+   instability.
+
 14. Ordering of Path Attributes

    The BGP protocol suggests that BGP speakers sending multiple prefixes
    per UPDATE message should sort and order path attributes according to
    Type Codes. This would help their peers to quickly identify sets of
    attributes from different update messages which are semantically
    different.

    Implementers may find it useful to order path attributes according to
    Type Code so that sets of attributes with identical semantics can be