Network Working Group                                         P. Marques
Internet-Draft
Intended status: Standards Track                             R. Fernando
Expires: October 22, 2011                                        E. Chen
                                                            P. Mohapatra
                                                           Cisco Systems
                                                              H. Gredler
                                                        Juniper Networks
                                                          April 20, 2011


             Advertisement of the best external route in BGP
                     draft-ietf-idr-best-external-04

Abstract

   The current BGP-4 protocol specification [RFC4271] states that the
   selection process chooses the best path for a given destination,
   which is added to the Loc-RIB and advertised to all peers.  Previous
   versions [RFC1771] of the specification defined a different rule for
   Internal BGP Updates.  Given that Internal paths are not
   re-advertised to Internal peers, it was specified that the best of
   the external paths, as determined by the path selection tie-breaking
   algorithm, would be advertised to Internal peers.

   This document extends that procedure to operate in environments
   where Route Reflection [RFC4456] or Confederations [RFC5065] are
   used, and explains why advertising the additional routing
   information can improve convergence time without causing routing
   loops.  Additional benefits include reduction of inter-domain churn
   and avoidance of permanent route oscillation.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 22, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Generalization
   4.  Algorithm for selection of the Adj-RIB-OUT path
   5.  Advertisement Rules
   6.  Consistency between routing and forwarding
   7.  Applications
   8.  Fast Connectivity Restoration
   9.  Inter-Domain Churn Reduction
   10. Reducing Persistent IBGP oscillation
   11. Deployment Considerations
   12. Acknowledgments
   13. IANA Considerations
   14. Security Considerations
   15. References
   Authors' Addresses

1.  Introduction

   Earlier versions of the BGP-4 protocol specification [RFC1771]
   prescribed different route advertisement rules for Internal and
   External peers.  The overall best path was advertised to External
   peers, while Internal peers were advertised the best of the
   externally received paths.  This Internal advertisement rule was
   never implemented as specified and was later dropped from the
   protocol.

   There is a trade-off between advertising the "best-external" route
   and the behavior that became the common standard, which is to not
   advertise the route when the selected best path has been received
   from an Internal peer.  Not advertising information in this case
   reduces state both in the local BGP speaker and in the network
   overall.  Early BGP implementations were very concerned with reducing
   state because they were limited to relatively small memory
   footprints (e.g., 16 MB).  There is also a possible concern about
   advertising a path different from the path that has been selected
   for forwarding.

   However, advertising the best external route, when it differs from
   the best route, injects additional information into an IBGP mesh.
   This information may be useful for several purposes, including:

   o  Faster restoration of connectivity, by providing additional paths
      that can be used to fail over in case the primary path becomes
      invalid or is withdrawn.

   o  Reduced inter-domain churn and traffic black-holing, due to the
      readily available alternate path.

   o  Reduced potential for permanent IBGP route oscillation [RFC3345].

   o  Improved selection of lower-MED routes from the same neighboring
      AS.

   This document defines procedures to select the best external route
   for each peer.  It also uses example scenarios to show how best
   external route announcement provides the above benefits.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Generalization

   The BGP-4 protocol [RFC1771] has been extended with two alternative
   mechanisms that reduce the operational complexity of route
   distribution within an AS: Route Reflection [RFC4456] and
   Confederations [RFC5065].  Route advertisement rules need to be
   expressible in the context of both of these mechanisms.

   When Route Reflection is used, Internal peers are further classified
   by the reflection cluster they belong to.  Non-client Internal peers
   form one BGP peering mesh.  Each set of RR clients with the same
   "cluster-id" configuration forms a separate mesh.
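   The mesh classification above can be pictured as a function that
   maps each peer to a mesh identifier.  The Python sketch below is a
   hypothetical illustration only.  The Peer fields and the identifier
   strings are assumptions.  The sketch covers only the Route
   Reflection case: non-client Internal peers, and one client mesh per
   cluster-id.  External peers are mapped to no internal mesh.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Peer:
    """Hypothetical view of a BGP session as seen by a route reflector."""
    name: str
    ibgp: bool                        # True for an Internal (IBGP) session
    rr_client: bool = False           # True if the peer is an RR client
    cluster_id: Optional[str] = None  # cluster-id configured for the client


NON_CLIENT_MESH = "non-client"


def mesh_of(peer: Peer) -> Optional[str]:
    """Return an identifier for the peer's BGP mesh, or None if External."""
    if not peer.ibgp:
        # External peers are not members of any internal mesh.
        return None
    if peer.rr_client:
        # Each set of RR clients sharing a cluster-id forms its own mesh.
        return f"cluster:{peer.cluster_id}"
    # All non-client Internal peers form a single mesh.
    return NON_CLIENT_MESH
```

   Two clients configured with the same cluster-id map to the same
   identifier, so paths received from either client can be excluded
   together when building the Adj-RIB-OUT for that cluster.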
   When selecting the path to add to the Adj-RIB-OUT, this document
   specifies that paths received from the same mesh as the target peer
   MAY be excluded from consideration.  This results in one Adj-RIB-OUT
   selection per mesh (the set of non-client peers, or a specific
   cluster).

   Similarly, when BGP Confederations are used, each confederation
   member AS is a BGP mesh.  As in the Route Reflection case, routes
   from the same mesh MAY be excluded when selecting the path to add to
   the Adj-RIB-OUT.

4.  Algorithm for selection of the Adj-RIB-OUT path

   The objective of this protocol extension is to improve the quality
   of the routing information known to a particular BGP mesh, at
   minimal additional cost in processing and state.  To that end, it is
   useful to define a total order over the Adj-RIB-In routes.  This
   order yields the same overall best path as the algorithm defined in
   the current BGP-4 specification [RFC4271], and it also orders the
   alternate routes.  With this total order, the path for a specific
   Adj-RIB-OUT can be selected efficiently by excluding the routes that
   have been received from the BGP mesh of the peer (or set of peers).

   To achieve this, it is helpful to introduce the concept of a path
   group.  A group is the set of paths that compare as equal through all
   the steps prior to the MED comparison step (as defined in Section
   9.1.2.2 of RFC 4271 [RFC4271]) and that have been received from the
   same neighbor AS.
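   The group-based total order and the per-mesh Adj-RIB-OUT selection
   described in this section can be sketched in Python.  This is a
   minimal illustration under stated assumptions, not a reference
   implementation.  The Path record and the helpers pre_med_key and
   after_med_key are hypothetical.  The steps before MED are reduced to
   LOCAL_PREF, and the steps after MED are reduced to the lowest
   router-id.  MED is compared only within a group, that is, between
   paths from the same neighbor AS.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Path:
    """Hypothetical Adj-RIB-In entry with only the fields this sketch uses."""
    name: str
    neighbor_as: int
    med: int
    rtr_id: int                 # stands in for all tie-breakers after MED
    local_pref: int = 100       # stands in for all steps before MED
    mesh: Optional[str] = None  # mesh the path came from (None = External)


def pre_med_key(p: Path) -> Tuple[int]:
    # Steps before MED: only LOCAL_PREF is modelled (higher is better).
    return (-p.local_pref,)


def after_med_key(p: Path) -> Tuple[int]:
    # Steps after MED: only the lowest router-id tie-breaker is modelled.
    return (p.rtr_id,)


def total_order(paths: List[Path]) -> List[Path]:
    """Order paths from most to least preferred using path groups."""
    groups: Dict[Tuple, List[Path]] = defaultdict(list)
    for p in paths:
        # A group: equal through the steps before MED, same neighbor AS.
        groups[(pre_med_key(p), p.neighbor_as)].append(p)
    for members in groups.values():
        # Within a group: MED first, then the subsequent tie-breakers.
        members.sort(key=lambda q: (q.med, after_med_key(q)))
    # Groups are ranked by comparing their best paths, skipping MED;
    # every member of a group follows its group best.
    ranked = sorted(groups.values(),
                    key=lambda m: (pre_med_key(m[0]), after_med_key(m[0])))
    return [p for members in ranked for p in members]


def select_for_mesh(paths: List[Path],
                    target_mesh: Optional[str]) -> Optional[Path]:
    """First path in the total order not received from the target mesh."""
    for p in total_order(paths):
        if target_mesh is None or p.mesh != target_mesh:
            return p
    return None
```

   Take the six paths a through f from the Path Attribute Table in this
   section, all with equal LOCAL_PREF.  For these paths, total_order
   returns b, d, e, c, a, f.  If b had been received from the
   non-client mesh, select_for_mesh(paths, "non-client") would skip b
   and return d.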
   Paths are ordered within a group by MED and the subsequent route
   selection rules.  In pseudo-code:

   /* less_than: path_1 is preferred over (precedes) path_2 */
   function compare(path_1, path_2) {
       cmp_result cmp = selection_steps_before_med(path_1, path_2);
       if (cmp != cmp_result.equal) {
           return cmp;
       }
       /* As in RFC 4271, the MED step only compares paths received
        * from the same neighbor AS. */
       if (neighbor_as(path_1) == neighbor_as(path_2)) {
           /* Same group: MED, then the subsequent selection steps */
           return selection_steps_from_med(path_1, path_2);
       }
       /* Different groups: order the groups by comparing their best
        * paths; every member of a group follows its group best. */
       return selection_steps_from_med(group_best(path_1),
                                       group_best(path_2));
   }

   As an example, consider the following set of received routes.  The
   routes are assumed to be equal in every selection step other than
   MED and router-id:

      +------+----+-----+--------+
      | Path | AS | MED | rtr_id |
      +------+----+-----+--------+
      | a    | 1  | 10  | 10     |
      | b    | 2  | 5   | 1      |
      | c    | 1  | 5   | 5      |
      | d    | 2  | 20  | 20     |
      | e    | 2  | 30  | 30     |
      | f    | 3  | 10  | 20     |
      +------+----+-----+--------+

                 Figure 1: Path Attribute Table

   These routes yield the following order (from most to least
   preferred):

      b < d < e < c < a < f

   In this example, comparing the best path of each group by router-id
   gives the sequence (b < c < f).  The remaining paths are ordered
   relative to their respective group best.

   The first path in the ordering above is the best overall path for a
   given NLRI.  When selecting a path for a particular Adj-RIB-Out (or
   set of Adj-RIB-Outs), an implementation MAY select the first path in
   the global order that was not received from the same BGP mesh (as
   defined above) as the target peer (or peers).

5.  Advertisement Rules

   1.  When advertising a route to a non-client Internal peer, a BGP
       speaker MAY select the first path in the order that did not
       originate from the same BGP mesh (i.e., the set of non-client
       Internal peers).  This applies whenever the best overall path has
       been received from this mesh and would be suppressed by the
       Internal BGP non-readvertisement rule.

   2.  When advertising a route to a Route Reflection client peer, if
       the overall best path has been received from the same cluster, a
       BGP speaker MUST be able to advertise the best overall path to
       all members of the cluster other than the originator, unless
       "client-to-client" reflection is disabled.  The implementation
       MAY advertise an alternate path to the peer that originated the
       best overall path.  It does so by excluding from consideration
       all paths with the same originator-id.

   3.  When "client-to-client" reflection is disabled and the cluster is
       operating as a mesh, a Route Reflector MAY advertise to the
       cluster the preferred path among the paths not received from the
       cluster.  This deployment mode is currently uncommon, but it can
       be a practical way to guarantee path diversity inside the
       cluster.

   4.  A confederation border router MAY advertise an alternate path
       towards its Internal BGP mesh, or towards a confederation member
       AS, following the same procedure as defined above.

6.  Consistency between routing and forwarding

   The Internal update advertisement rules in the original BGP-4
   specification [RFC1771] can lead to situations where traffic is
   forwarded through a route other than the route advertised by BGP.

   Inconsistencies between forwarding and routing are highly
   undesirable.  Service providers use BGP for two purposes: to learn
   reachability information and to express policy over network
   resources.  The latter assumes that forwarding follows routing
   information.

   Consider the Autonomous System presented in Figure 2.  Routers
   r1 ... r4 are members of a single IBGP mesh, and routes a, b, and c
   are received from external peers.

                  AS 1 (c)
                     |
                  +----+           +----+
                  | r1 |...........| r2 |
                  +----+           +----+
                    .                 .
                    .                 .
                    .                 .
                  +----+           +----+
                  | r3 |...........| r4 | --- ebgp --- AS X
                  +----+           +----+
                   /  \
                  /    \
            AS 1 (a)  AS 2 (b)

                  Figure 2: Inconsistency in Routing

      +------+----+-----+--------+
      | Path | AS | MED | rtr_id |
      +------+----+-----+--------+
      | a    | 1  | 10  | 1      |
      | b    | 2  | 5   | 10     |
      | c    | 1  | 5   | 5      |
      +------+----+-----+--------+

               Figure 3: Path Attribute Table - 2

   Following the rules specified in RFC 1771 [RFC1771], router r3
   selects path (b), received from AS 2, as its overall best and
   installs it in the Loc-RIB.  Path (b) is learned via EBGP, so r3
   prefers it to path (c), the lowest-MED route from AS 1, which r3
   learns via IBGP.  For the purposes of Internal Update route
   selection, however, r3 ignores the presence of path (c).  It elects
   (a) as its advertisement through the router-id tie-breaking rule.

   In this scenario, router r4 receives (c) from r1 and (a) from r3.  It
   picks the lowest-MED route, (c), and advertises it via EBGP to AS X.
   At this point, however, routing is inconsistent with forwarding:
   traffic received from AS X will be forwarded towards AS 2, while the
   EBGP advertisement is made for an AS 1 path.

   Routing policies are typically specified in terms of neighboring
   ASes.  In the situation above, assume that this AS provides transit
   services to AS 1, while AS 2 and AS X are peer networks.  The
   inconsistency between routing and forwarding would then lead to
   transit being inadvertently provided between AS X and AS 2.  This
   could lead to persistent forwarding loops.

   Inconsistency between routing and forwarding may occur whenever a
   BGP speaker whose overall best route is external advertises into
   IBGP a different external route.

7.  Applications

8.  Fast Connectivity Restoration

   When two exits are available to reach a particular destination and
   one is preferred over the other, the availability of an alternate
   path provides fast connectivity restoration when the primary path
   fails.
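   The failover described above can be illustrated with a small Python
   sketch.  It models a forwarding entry that already holds the
   alternate exit, assuming that the exit was learned in advance (for
   example, because it was advertised as the best external route).  The
   FibEntry class and its methods are hypothetical and do not represent
   any router's API.  The switch is triggered here by an explicit call,
   whereas a real implementation would react to failure detection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FibEntry:
    """Hypothetical FIB entry with a precomputed, preinstalled backup."""
    prefix: str
    primary: Optional[str]        # next hop of the best path
    backup: Optional[str] = None  # next hop of the alternate path, if known

    def next_hop(self) -> Optional[str]:
        return self.primary

    def on_primary_failure(self) -> None:
        # Promote the preinstalled backup locally, instead of dropping
        # traffic until BGP reconverges on a new best path.
        self.primary, self.backup = self.backup, None
```

   If no backup was known, the entry is left with no next hop.  This
   corresponds to the withdrawal and packet loss discussed under
   inter-domain churn.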
   Restoration can be quick because the alternate path is already at
   hand.  The border router can precompute the backup route and
   preinstall it in the FIB, ready to be switched to when the primary
   goes away.  Note that this requires the border router that holds the
   backup exit to also preinstall the secondary path and switch to it
   on failure.

9.  Inter-Domain Churn Reduction

   Within an AS, if no backup path is available, a border router sends
   a withdrawal upstream when the primary fails.  This causes
   inter-domain churn, and packets are lost until the network converges
   to the alternate path.  Having the alternate path available reduces
   the churn and avoids this packet loss.

10.  Reducing Persistent IBGP oscillation

   Advertising the best external route according to the algorithm
   described in this document reduces the possibility of route
   oscillation, because it introduces additional information into the
   IBGP system.

   A permanent oscillation condition requires a circular dependency
   between paths.  In such a dependency, a router selects a new best
   path in response to a received IBGP advertisement, and that selection
   causes the withdrawal of information that another router depends on
   in order to generate the original event.

   In vanilla BGP, where only the best overall route is advertised (as
   in most implementations), oscillation can occur whenever there are
   two or more clusters/sub-ASes and at least one cluster has more than
   one path that can contribute to the dependency.

11.  Deployment Considerations

   The mechanism specified in this document allows a BGP speaker to
   advertise a route that is not the best route used for forwarding.
   This is a departure from the current behavior.
   However, consistency in the path selection process across the AS is
   still guaranteed.  In steady state, the ingress routers will not
   choose the best-external route as the best route for a destination.
   The reason is the same one that led the BGP speaker announcing the
   best-external route to choose an IBGP route as best over the
   externally learnt route.  Route policies on IBGP sessions can
   override this assurance.  Such policies are therefore not
   recommended in IBGP, especially when best-external announcement is
   turned on in the network.  Such inconsistency between routing and
   forwarding is also mitigated in a tunneled network.

12.  Acknowledgments

   This document greatly benefits from the comments of Yakov Rekhter,
   John Scudder, Eric Rosen, Jenny Yuan, Jay Borkenhagen, Salkat Ray
   and Jakob Heitz.

13.  IANA Considerations

   This document has no actions for IANA.

14.  Security Considerations

   This design introduces no additional security risks.

15.  References

   [RFC1771]  Rekhter, Y. and T. Li, "A Border Gateway Protocol 4
              (BGP-4)", RFC 1771, March 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3345]  McPherson, D., Gill, V., Walton, D., and A. Retana,
              "Border Gateway Protocol (BGP) Persistent Route
              Oscillation Condition", RFC 3345, August 2002.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4456]  Bates, T., Chen, E., and R. Chandra, "BGP Route
              Reflection: An Alternative to Full Mesh Internal BGP
              (IBGP)", RFC 4456, April 2006.

   [RFC5065]  Traina, P., McPherson, D., and J. Scudder, "Autonomous
              System Confederations for BGP", RFC 5065, August 2007.

Authors' Addresses

   Pedro Marques

   Email: pedro.r.marques@gmail.com


   Rex Fernando
   Cisco Systems
   170 W. Tasman Dr.
   San Jose, CA  95134
   US

   Email: rex@cisco.com


   Enke Chen
   Cisco Systems
   170 W. Tasman Dr.
   San Jose, CA  95134
   US

   Email: enkechen@cisco.com


   Pradosh Mohapatra
   Cisco Systems
   170 W. Tasman Dr.
   San Jose, CA  95134
   US

   Email: pmohapat@cisco.com


   Hannes Gredler
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: hannes@juniper.net