draft-ietf-idr-route-damp-00.txt | draft-ietf-idr-route-damp-01.txt | |||
---|---|---|---|---|
Internet Engineering Task Force Curtis Villamizar | Internet Engineering Task Force Curtis Villamizar | |||
INTERNET-DRAFT ANS | INTERNET-DRAFT ANS | |||
draft-ietf-idr-route-damp-00 Ravi Chandra | draft-ietf-idr-route-damp-01 Ravi Chandra | |||
Cisco | Cisco | |||
Ramesh Govindan | Ramesh Govindan | |||
ISI | ISI | |||
October 30, 1997 | January 8, 1998 | |||
BGP Route Flap Damping | BGP Route Flap Damping | |||
Status of this Memo | Status of this Memo | |||
This document is an Internet-Draft. Internet-Drafts are working | This document is an Internet-Draft. Internet-Drafts are working | |||
documents of the Internet Engineering Task Force (IETF), its areas, | documents of the Internet Engineering Task Force (IETF), its areas, | |||
and its working groups. Note that other groups may also distribute | and its working groups. Note that other groups may also distribute | |||
working documents as Internet-Drafts. | working documents as Internet-Drafts. | |||
skipping to change at page 2, line ? | skipping to change at page 2, line ? | |||
well behaved routes. | well behaved routes. | |||
This must be accomplished keeping other goals of BGP in mind: | This must be accomplished keeping other goals of BGP in mind: | |||
o pack changes into a small number of updates | o pack changes into a small number of updates | |||
o preserve consistent routing | o preserve consistent routing | |||
o minimal addition space and computational overhead | o minimal addition space and computational overhead | |||
Excessive updates to reachability state has been widespread in the | An excessive rate of update to the advertised reachability of a subset | |||
Internet. This observation was made in the early 1990s by many people | of Internet prefixes has been widespread in the Internet. This | |||
involved in Internet operations and remains to case to date. These | observation was made in the early 1990s by many people involved in | |||
excessive updates are not necessarily periodic so route oscillation | Internet operations and remains the case. These excessive updates are | |||
would be a misleading term. The informal term used to describe this | not necessarily periodic so route oscillation would be a misleading | |||
effect is ``route flap''. The techniques described here are now | term. The informal term used to describe this effect is ``route | |||
widely deployed and are commonly referred to as ``route flap | flap''. The techniques described here are now widely deployed and are | |||
damping''. | commonly referred to as ``route flap damping''. | |||
1 Overview | 1 Overview | |||
It is necessary to reduce the amount of routing traffic (the number of | To maintain scalability of a routed internet, it is necessary to | |||
update message) generated by BGP in order to limit processing | reduce the amount of change in routing state propagated by BGP in | |||
requirements. The primary contributors of processing load resulting | order to limit processing requirements. The primary contributors of | |||
from BGP updates are the BGP decision process and adding and removing | processing load resulting from BGP updates are the BGP decision | |||
forwarding entries. | process and adding and removing forwarding entries. | |||
Consider the following example. A widely deployed BGP implementation | Consider the following example. A widely deployed BGP implementation | |||
may tend to fail due to high routing update volume. For example, it | may tend to fail due to high routing update volume. For example, it | |||
may be unable to maintain it's BGP or IGP sessions if sufficiently | may be unable to maintain it's BGP or IGP sessions if sufficiently | |||
loaded. The failure of one router can further contribute to the load | loaded. The failure of one router can further contribute to the load | |||
on other routers. This additional load may cause failures in other | on other routers. This additional load may cause failures in other | |||
instances of the same implementation or other implementations with a | instances of the same implementation or other implementations with a | |||
similar weakness. In the worst case, a stable oscillation could | similar weakness. In the worst case, a stable oscillation could | |||
result. Such worse cases have already been observed in practice. | result. Such worse cases have already been observed in practice. | |||
A BGP implementation must be prepared for a large volume of routing | A BGP implementation must be prepared for a large volume of routing | |||
traffic. A BGP implementation cannot rely upon the sender to | traffic. A BGP implementation cannot rely upon the sender to | |||
sufficiently shield it from route instabilities. The guidelines here | sufficiently shield it from route instabilities. The guidelines here | |||
are designed to prevent sustained oscillations, but do not eliminate | are designed to prevent sustained oscillations, but do not eliminate | |||
the need for robust and efficient implementations. The mechanisms | the need for robust and efficient implementations. The mechanisms | |||
described here allow routing instability to be contained at an AS | described here allow routing instability to be contained at an AS | |||
border router bordering the instability. | border router bordering the instability. | |||
Even where BGP implementations are highly robust, the performance of | ||||
the routing process is limited. Limiting the propagation of | ||||
unnecessary change then becomes an issue of maintaining reasonable | ||||
route change convergence time as a routing topology grows. | ||||
2 Methods of Limiting Route Advertisement | 2 Methods of Limiting Route Advertisement | |||
Two methods of controlling the frequency of route advertisement are | Two methods of controlling the frequency of route advertisement are | |||
described here. The first involves fixed timers. The fixed timer | described here. The first involves fixed timers. The fixed timer | |||
technique has no space overhead per route but has the disadvantage of | technique has no space overhead per route but has the disadvantage of | |||
slowing route convergence for the normal case where a route does not | slowing route convergence for the normal case where a route does not | |||
have a history of instability. The second method overcomes this | have a history of instability. The second method overcomes this | |||
limitation at the expense of maintaining some additional space | limitation at the expense of maintaining some additional space | |||
overhead. The additional overhead includes a small amount of state | overhead. The additional overhead includes a small amount of state | |||
per route and a very small processing overhead. | per route and a very small processing overhead. | |||
skipping to change at page 4, line 46 | skipping to change at page 4, line 52 | |||
to advertise a route that is different from the route that is being | to advertise a route that is different from the route that is being | |||
used, except for a very minimal time. It is more desirable to | used, except for a very minimal time. It is more desirable to | |||
suppress the acceptance of a route (and therefore the use of that | suppress the acceptance of a route (and therefore the use of that | |||
route in the IGP) rather than suppress only the redistribution. | route in the IGP) rather than suppress only the redistribution. | |||
It is clearly not possible to accurately predict the future stability | It is clearly not possible to accurately predict the future stability | |||
of a route. The recent history of stability is generally regarded as | of a route. The recent history of stability is generally regarded as | |||
a good basis for estimating the likelihood of future stability. The | a good basis for estimating the likelihood of future stability. The | |||
criteria that is used to distinguish well behaved from poorly behaved | criteria that is used to distinguish well behaved from poorly behaved | |||
routes is therefore based on the recent history of stability of the | routes is therefore based on the recent history of stability of the | |||
route. There is no simple direct quantitative expression of recent | route. There is no simple quantitative expression of recent stability | |||
stability so a figure of merit must be defined. Some desirable | so a figure of merit must be defined. Some desirable characteristics | |||
characteristics of this figure of merit would be that the farther in | of this figure of merit would be that the farther in the past that | |||
the past that instability occurred, the less it's affect on the figure | instability occurred, the less it's affect on the figure of merit and | |||
of merit and that the instability measure would be cumulative rather | that the instability measure would be cumulative rather than | |||
than reflecting only the most recent event. | reflecting only the most recent event. | |||
The algorithms should behave such that for routes which have a history | The algorithms should behave such that for routes which have a history | |||
of stability but make a few transitions, those transitions should be | of stability but make a few transitions, those transitions should be | |||
made quickly. If transitions continue, advertisement of the route | made quickly. If transitions continue, advertisement of the route | |||
should be suppressed. There should be some memory of prior instabil- | should be suppressed. There should be some memory of prior instabil- | |||
ity. The degree to which prior instability is considered should be | ity. The degree to which prior instability is considered should be | |||
gradually reduced as long as the route remains announced and stable. | gradually reduced as long as the route remains announced and stable. | |||
2.3 Design Choices | 2.3 Design Choices | |||
After routes have been accepted their readvertisement will be briefly | After routes have been accepted their readvertisement will be briefly | |||
suppressed to improve packing of updates. There may be a lengthy | suppressed to improve packing of updates. There may be a lengthy | |||
suppression of the acceptance of an external route. How long a route | suppression of the acceptance of an external route. How long a route | |||
will be suppressed is based on a figure of merit that is expected to | will be suppressed is based on a figure of merit that is expected to | |||
be loosely correlated to the probability of future instability of a | be correlated to the probability of future instability of a route. | |||
route. Routes with high figure of merit values will be suppressed. | Routes with high figure of merit values will be suppressed. An | |||
An exponential decay algorithm was chosen as the basis for reducing | exponential decay algorithm was chosen as the basis for reducing the | |||
the figure of merit over time. These choices should be viewed as | figure of merit over time. These choices should be viewed as | |||
suggestions for implementation. | suggestions for implementation. | |||
An exponential decay function has the property that previous | An exponential decay function has the property that previous | |||
instability can be remembered for a fairly long time. The rate at | instability can be remembered for a fairly long time. The rate at | |||
which the instability figure of merit decays slows as time goes on. | which the instability figure of merit decays slows as time goes on. | |||
Exponential decay is a transitive function. | Exponential decay has the following property. | |||
f(f(figure-of-merit, t1), t2) = f(figure-of-merit, t1+t2) | f(f(figure-of-merit, t1), t2) = f(figure-of-merit, t1+t2) | |||
This transitive property allows the decay for a long period to be | This property allows the decay for a long period to be computed in a | |||
computed in a single operation regardless of the current value | single operation regardless of the current value (figure-of-merit). | |||
(figure-of-merit). As a performance optimization, the decay can be | As a performance optimization, the decay can be applied in fixed time | |||
applied in fixed time increments. Given a desired decay half life, | increments. Given a desired decay half life, the decay for a single | |||
the decay for a single time increment can be computed ahead of time. | time increment can be computed ahead of time. The decay for multiple | |||
The decay for multiple time increments is expressed below. | time increments is expressed below. | |||
f(figure-of-merit, n * t0) = f(figure-of-merit, t0) ** n = K | f(figure-of-merit, n*t0) = f(figure-of-merit, t0)**n = K**n | |||
** n | ||||
The values of K ** n can be precomputed for a reasonable number of | The values of K ** n can be precomputed for a reasonable number of | |||
``n'' and stored in an array. The value of ``K'' is always less than | ``n'' and stored in an array. The value of ``K'' is always less than | |||
one. The array size can be bounded since the value quickly approaches | one. The array size can be bounded since the value quickly approaches | |||
zero. This makes the decay easy to compute using an array bound | zero. This makes the decay easy to compute using an array bound | |||
check, an array lookup and a single multiply regardless as to how much | check, an array lookup and a single multiply regardless as to how much | |||
time has elapsed. | time has elapsed. | |||
3 Limiting Route Advertisements using Fixed Timers | 3 Limiting Route Advertisements using Fixed Timers | |||
skipping to change at page 6, line 23 | skipping to change at page 6, line 27 | |||
equal to the bound for a reachable advertisement. | equal to the bound for a reachable advertisement. | |||
Routes that need to be readvertised can be marked in the RIB or an | Routes that need to be readvertised can be marked in the RIB or an | |||
external set of structures maintained, which references the RIB. | external set of structures maintained, which references the RIB. | |||
Periodically, a subset of the marked routes can be flushed. This is | Periodically, a subset of the marked routes can be flushed. This is | |||
fairly straightforward and accomplishes the objectives. Computation | fairly straightforward and accomplishes the objectives. Computation | |||
for too simple an implementation may be order N squared. To avoid N | for too simple an implementation may be order N squared. To avoid N | |||
squared performance, some form of data structure is needed to group | squared performance, some form of data structure is needed to group | |||
routes with common attributes. | routes with common attributes. | |||
Any implementation should packs updates efficiently, provide a minimum | An implementation should pack updates efficiently, provide a minimum | |||
readvertisement delay, provide a bounds on the maximum readvertisement | readvertisement delay, provide a bounds on the maximum readvertisement | |||
delay that would be experienced solely as a result of the algorithm | delay that would be experienced solely as a result of the algorithm | |||
used to provide a minimum delay, and must be computationally efficient | used to provide a minimum delay, and must be computationally efficient | |||
in the presence of a very large number of candidates for | in the presence of a very large number of candidates for | |||
readvertisement. | readvertisement. | |||
4 Stability Sensitive Suppression of Route Advertisement | 4 Stability Sensitive Suppression of Route Advertisement | |||
This method of limiting route advertisements uses a measure of route | This method of limiting route advertisements uses a measure of route | |||
stability applied on a per route basis. This technique is applied | stability applied on a per route basis. This technique is applied | |||
skipping to change at page 6, line 51 | skipping to change at page 7, line 9 | |||
suppressed. Each time a route is withdrawn, the figure of merit is | suppressed. Each time a route is withdrawn, the figure of merit is | |||
incremented. While the route is not changing the figure of merit | incremented. While the route is not changing the figure of merit | |||
value is decayed exponentially with separate decay rates depending on | value is decayed exponentially with separate decay rates depending on | |||
whether the route is stable and reachable or has been stable and | whether the route is stable and reachable or has been stable and | |||
unreachable. The decay rate may be slower when the route is unreach- | unreachable. The decay rate may be slower when the route is unreach- | |||
able, or the stability figure of merit could remain fixed (not decay | able, or the stability figure of merit could remain fixed (not decay | |||
at all) while the route remains unreachable. Whether to decay un- | at all) while the route remains unreachable. Whether to decay un- | |||
reachable routes at the same rate, a slower rate, or not at all is an im- | reachable routes at the same rate, a slower rate, or not at all is an im- | |||
plementation choice. Decaying at a slower rate is recommended. | plementation choice. Decaying at a slower rate is recommended. | |||
An very efficient implementation is suggested in the following | A very efficient implementation is suggested in the following | |||
sections. The implementation only requires computation for the routes | sections. The implementation only requires computation for the routes | |||
contained in an update, when an update is received or withdrawn (as | contained in an update, when an update is received or withdrawn (as | |||
opposed to the simplistic approach of periodically decaying each | opposed to the simplistic approach of periodically decaying each | |||
route). The suggested implementation involves only a small number of | route). The suggested implementation involves only a small number of | |||
simple operations, and can be implemented using scaled integers. | simple operations, and can be implemented using scaled integers. | |||
The behavior of unstable routes is believed to be fairly predictable. | The behavior of unstable routes is fairly predictable. Severely | |||
Severely flapping routes will often be advertised and withdrawn at | flapping routes will often be advertised and withdrawn at regular time | |||
regular time intervals corresponding to the timers of a particular | intervals corresponding to the timers of a particular protocol (the | |||
protocol (the IGP or exterior protocol in use where the problem | IGP or exterior protocol in use where the problem exists). Marginal | |||
exists). Marginal circuits or mild congestion can result in a long | circuits or mild congestion can result in a long term pattern of | |||
term pattern of occasional brief route withdrawal or occasional brief | occasional brief route withdrawal or occasional brief connectivity. | |||
connectivity. | ||||
4.1 Single vs. Multiple Configuration Parameter Sets | 4.1 Single vs. Multiple Configuration Parameter Sets | |||
The behavior of the algorithm is modified by a number of configurable | The behavior of the algorithm is modified by a number of configurable | |||
parameters. It is possible to configure separate sets of parameters | parameters. It is possible to configure separate sets of parameters | |||
designed to handle short term severe route flap and chronic milder | designed to handle short term severe route flap and chronic milder | |||
route flap (a pattern of occasional drops over a long time period). | route flap (a pattern of occasional drops over a long time period). | |||
The former would require a fast decay and low threshold (allowing a | The former would require a fast decay and low threshold (allowing a | |||
small number of consecutive flaps to cause a route to be suppressed, | small number of consecutive flaps to cause a route to be suppressed, | |||
but allowing it to be reused after a relatively short period of | but allowing it to be reused after a relatively short period of | |||
skipping to change at page 9, line 49 | skipping to change at page 10, line 5 | |||
This is the time value corresponding to the last reuse list. This | This is the time value corresponding to the last reuse list. This | |||
may be the maximum value of T-hold for all parameter sets of may be | may be the maximum value of T-hold for all parameter sets of may be | |||
configured. | configured. | |||
number of reuse lists (reuse-list-size) | number of reuse lists (reuse-list-size) | |||
This is the number of reuse lists. It may be determined from | This is the number of reuse lists. It may be determined from | |||
reuse-list-max or set explicitly. | reuse-list-max or set explicitly. | |||
A necessary optimization is described in Section 4.8.6 that involves | FIGURES ARE FOUND IN THE POSTSCRIPT AND HTML VERSIONS ONLY | |||
an array referred to as the ``reuse index array''. A reuse index | ||||
Figure 1: Instability figure of merit for flap at a constant rate | Figure 1: Instability figure of merit for flap at a constant rate | |||
A necessary optimization is described in Section 4.8.6 that involves | ||||
an array referred to as the ``reuse index array''. A reuse index | ||||
array is needed for each decay rate in use. The reuse index array is | array is needed for each decay rate in use. The reuse index array is | |||
used to estimate which reuse list to place a route when it is | used to estimate which reuse list to place a route when it is | |||
suppressed. Proper placement avoids the need to periodically evaluate | suppressed. Proper placement avoids the need to periodically evaluate | |||
decay to determine if a route can be reused. Using the reuse index | decay to determine if a route can be reused. Using the reuse index | |||
array avoids the need to compute a logarithm to determine placement. | array avoids the need to compute a logarithm to determine placement. | |||
One additional system wide parameter can be introduced. | One additional system wide parameter can be introduced. | |||
reuse index array size (reuse-index-array-size) | reuse index array size (reuse-index-array-size) | |||
This is the size of reuse index arrays. This size determines the | This is the size of reuse index arrays. This size determines the | |||
accuracy with which suppressed routes can be placed within the set | accuracy with which suppressed routes can be placed within the set | |||
of reuse lists when suppressed for a long time. | of reuse lists when suppressed for a long time. | |||
4.3 Guidelines for Setting Parameters | 4.3 Guidelines for Setting Parameters | |||
The decay half life should be set to a time considerably longer than | The decay half life should be set to a time considerably longer than | |||
the period of the route flap it is intended to address. If for | the period of the route flap it is intended to address. For example, | |||
example, the decay is set to ten minutes and a route is withdrawn and | if the decay is set to ten minutes and a route is withdrawn and | |||
readvertised exactly every ten minutes, the route would continue to | readvertised exactly every ten minutes, the route would continue to | |||
flap if the cutoff was set to a value of 2 or above. | flap if the cutoff was set to a value of 2 or above. | |||
The stability figure of merit itself is an accumulated time decayed | The stability figure of merit itself is an accumulated time decayed | |||
total. This must be kept in mind in setting the decay time, cutoff | total. This must be kept in mind in setting the decay time, cutoff | |||
values and reuse values. For example, if a route flaps at four times | values and reuse values. For example, if a route flaps at four times | |||
the decay rate, it will reach 3 in 4 cycles, 4 in 6 cycles, 5 in 10 | the decay rate, it will reach 3 in 4 cycles, 4 in 6 cycles, 5 in 10 | |||
cycles, and will converge at about 6.3. At twice the decay time, it | cycles, and will converge at about 6.3. At twice the decay time, it | |||
will reach 3 in 7 cycles, and converge at a value of less than 3.5. | will reach 3 in 7 cycles, and converge at a value of less than 3.5. | |||
skipping to change at page 10, line 45 | skipping to change at page 11, line 5 | |||
constant rate. The time axis is labeled in multiples of the decay | constant rate. The time axis is labeled in multiples of the decay | |||
half life. The plots represent route flap with a period of 1/2, 1/3, | half life. The plots represent route flap with a period of 1/2, 1/3, | |||
1/4, and 1/8 times the decay half life. A ceiling of 4.5 was set, | 1/4, and 1/8 times the decay half life. A ceiling of 4.5 was set, | |||
which can be seen to affect three of the plots, effectively limiting | which can be seen to affect three of the plots, effectively limiting | |||
the time it takes to readvertise the route regardless of the prior | the time it takes to readvertise the route regardless of the prior | |||
history. With the cutoff and reuse thresholds suggested by the dotted | history. With the cutoff and reuse thresholds suggested by the dotted | |||
lines, routes would be suppressed after being declared unreachable 2-3 | lines, routes would be suppressed after being declared unreachable 2-3 | |||
times and be used again after approximately 2 decay half life periods | times and be used again after approximately 2 decay half life periods | |||
of stability. | of stability. | |||
FIGURES ARE FOUND IN THE POSTSCRIPT AND HTML VERSIONS ONLY | ||||
Figure 2: Separate decay constants when unreachable | ||||
From either maximum hold time value (Tmax-ok or Tmax-ng), a ratio of | From either maximum hold time value (Tmax-ok or Tmax-ng), a ratio of | |||
the cutoff to a ceiling can be determined. An integer value for the | the cutoff to a ceiling can be determined. An integer value for the | |||
ceiling can then be chosen such that overflow will not be a problem | ceiling can then be chosen such that overflow will not be a problem | |||
and all other values can be scaled accordingly. If both cutoffs are | and all other values can be scaled accordingly. If both cutoffs are | |||
specified or if multiple parameter sets are used the highest ceiling | specified or if multiple parameter sets are used the highest ceiling | |||
Figure 2: Separate decay constants when unreachable | ||||
will be used. | will be used. | |||
Figure 2 show the effect of configuring separate decay rates to be | Figure 2 show the effect of configuring separate decay rates to be | |||
used when the route is reachable or unreachable. The decay rate is | used when the route is reachable or unreachable. The decay rate is | |||
5 times slower when the route is unreachable. In the three case | 5 times slower when the route is unreachable. In the three case | |||
shown, the period of the route flap is equal to the decay half life | shown, the period of the route flap is equal to the decay half life | |||
but the route is reachable 1/8 of the time in one, reachable 1/2 the | but the route is reachable 1/8 of the time in one, reachable 1/2 the | |||
time in one, and reachable 7/8 of the time in the other. In the last | time in one, and reachable 7/8 of the time in the other. In the last | |||
case the route is not suppressed until after the third unreachable | case the route is not suppressed until after the third unreachable | |||
(when it is above the top threshold after becoming reachable again). | (when it is above the top threshold after becoming reachable again). | |||
skipping to change at page 13, line 38 | skipping to change at page 14, line 5 | |||
Implementations may chose to make the array size shorter and multiply | Implementations may chose to make the array size shorter and multiply | |||
more than once when decaying a long time interval to reduce storage. | more than once when decaying a long time interval to reduce storage. | |||
The reuse index arrays serve a similar purpose to the decay arrays. | The reuse index arrays serve a similar purpose to the decay arrays. | |||
The amount of time until a route can be reused can be determined using | The amount of time until a route can be reused can be determined using | |||
a array lookup. The array can be built given the decay rate. The | a array lookup. The array can be built given the decay rate. The | |||
array is indexed using a scaled integer proportional to the ratio | array is indexed using a scaled integer proportional to the ratio | |||
between a current stability figure of merit value and the value needed | between a current stability figure of merit value and the value needed | |||
for the route to be reused. | for the route to be reused. | |||
4.4.3 Data Structures per Route | 4.4.3 Per Route State | |||
Information must be maintained per some tuple representing a route. | ||||
At the very minimum, the NLRI (BGP prefix and length) must be | ||||
contained in the tuple. Different BGP attributes may be included or | ||||
excluded depending on the specific situation. The AS path should also | ||||
be contained in the tuple be default. The tuple may also optionally | ||||
contain other BGP attributes such as MULTI_EXIT_DISCRIMINATOR (MED). | ||||
The tuple representing a route for the purpose of route flap damping | ||||
is: | ||||
tuple entry default options | ||||
------------------------------------------- | ||||
NLRI | ||||
prefix required | ||||
length required | ||||
AS path included option to exclude | ||||
last AS set in path excluded option to include | ||||
next hop excluded option to include | ||||
MED excluded option to include | ||||
in comparisons only | ||||
The AS path is generally included in order to identify downstream | ||||
instability which is not being damped or not being sufficiently damped | ||||
and is alternating between a stable and an unstable path. Under rare | ||||
circumstances it may be desirable to exclude AS path for all or a | ||||
subset of prefixes. If an AS path ends in an AS set, in practice the | ||||
path is always for an aggregate. Changes to the trailing AS set | ||||
should be ignored. Ideally the AS path comparison should insure that | ||||
at least one AS has remained constant in the old and new AS set, but | ||||
completely ignoring the contents of a trailing AS set is also | ||||
acceptable. | ||||
Including next hop and MED changes can help suppress the use of an AS | ||||
which is internally unstable or avoid a next hop which is closer to an | ||||
unstable IGP path in the adjacent AS. If a large number of MED values | ||||
are used, the increase in the amount of state may become a problem. | ||||
For this reason MED is disabled by default and enabled only as part of | ||||
the tuple comparison, using a single state entry regardless of MED | ||||
value. Including MED will suppress the use of the adjacent AS even | ||||
though the change need not be propagated further. Using MED is only a | ||||
safe practice if a path is known to exist through another AS or where | ||||
there are enough peering sites with the adjacent AS such that routes | ||||
heard at only a subset of the peering sites will be suppressed. | ||||
4.4.4 Data Structures per Route | ||||
The following information must be maintained per route. A route here | The following information must be maintained per route. A route here | |||
is considered to be a tuple containing at least NLRI prefix, next hop, | is considered to be a tuple usually containing NLRI, next hop, and AS | |||
and AS path. The tuple may also contain other BGP attributes such as | path as defined in Section 4.4.3. | |||
MULTI_EXIT_DISCRIMINATOR (MED). | ||||
stability figure of merit (figure-of-merit) | stability figure of merit (figure-of-merit) | |||
Each route must have a stability figure of merit per applicable | Each route must have a stability figure of merit per applicable | |||
parameter set. | parameter set. | |||
last time updated (time-update) | last time updated (time-update) | |||
The exact last time updated must be maintained to allow exponential | The exact last time updated must be maintained to allow exponential | |||
decay of the accumulated figure of merit to be deferred until the | decay of the accumulated figure of merit to be deferred until the | |||
route might reasonable be considered eligible for a change in | route might reasonable be considered eligible for a change in | |||
status (having gone from unreachable to reachable or advancing | status (having gone from unreachable to reachable or advancing | |||
within the reuse lists). | within the reuse lists). | |||
config block pointer | config block pointer | |||
Any implementation that supports multiple parameter sets must | Any implementation that supports multiple parameter sets must | |||
provide a means of quickly identifying which set of parameters | provide a means of quickly identifying which set of parameters | |||
skipping to change at page 17, line 22 | skipping to change at page 18, line 45 | |||
This example is used in later sections. The use of multiple parameter | This example is used in later sections. The use of multiple parameter | |||
sets complicates the examples somewhat. Where multiple parameter sets | sets complicates the examples somewhat. Where multiple parameter sets | |||
are allowed for a single route, the decay portion of the algorithm is | are allowed for a single route, the decay portion of the algorithm is | |||
repeated for each parameter set. If different routes are allowed to | repeated for each parameter set. If different routes are allowed to | |||
have different parameter sets, the routes must have pointers to the | have different parameter sets, the routes must have pointers to the | |||
parameter sets to keep the time to locate to a minimum, but the | parameter sets to keep the time to locate to a minimum, but the | |||
algorithms are otherwise unchanged. | algorithms are otherwise unchanged. | |||
A sample set of configuration parameters and a sample set of | A sample set of configuration parameters and a sample set of | |||
implementation parameters are provided in in the two following list. | implementation parameters are provided in in the two following lists. | |||
1. Configuration Parameters | 1. Configuration Parameters | |||
o cut = 1.25 | o cut = 1.25 | |||
o reuse = 0.5 | o reuse = 0.5 | |||
o T-hold = 15 mins | o T-hold = 15 mins | |||
o decay-ok = 5 min | o decay-ok = 5 min | |||
skipping to change at page 18, line 4 | skipping to change at page 19, line 31 | |||
o delta-t = 1 sec | o delta-t = 1 sec | |||
o delta-reuse | o delta-reuse | |||
o reuse-list-size = 256 | o reuse-list-size = 256 | |||
o reuse-index-array-size = 1,024 | o reuse-index-array-size = 1,024 | |||
Using these configuration and implementation parameters and the | Using these configuration and implementation parameters and the | |||
equations in Section 4.5, the space overhead can be computed. There | equations in Section 4.5, the space overhead can be computed. There | |||
Figure 3: Some fairly long route flap cycles, repeated for 12 | ||||
minutes, followed by a period of stability. | ||||
is a fixed space overhead that is independent of the number of routes. | is a fixed space overhead that is independent of the number of routes. | |||
There is an space requirement associated with a stable route. There | There is a space requirement associated with a stable route. There is | |||
is a larger space requirement associated with an unstable route. The | a larger space requirement associated with an unstable route. The | |||
space requirements for the parameters above are provide in the lists | space requirements for the parameters above are provide in the lists | |||
below. | below. | |||
1. fixed overhead (using parameters from previous example) | 1. fixed overhead (using parameters from previous example) | |||
o 900 * integer - decay array | o 900 * integer - decay array | |||
o 1,800 * integer - decay array | o 1,800 * integer - decay array | |||
o 120 * pointer - reuse list-heads | o 120 * pointer - reuse list-heads | |||
o 2,048 * integer - reuse index arrays | o 2,048 * integer - reuse index arrays | |||
2. overhead per stable route | 2. overhead per stable route | |||
o pointer - containing null entry | o pointer - containing null entry | |||
FIGURES ARE FOUND IN THE POSTSCRIPT AND HTML VERSIONS ONLY | ||||
Figure 3: Some fairly long route flap cycles, repeated for 12 | ||||
minutes, followed by a period of stability. | ||||
3. overhead per unstable route | 3. overhead per unstable route | |||
o pointer - to a damping structure containing the following | o pointer - to a damping structure containing the following | |||
o integer - figure of merit + bit for state | o integer - figure of merit + bit for state | |||
o integer - last time updated | o integer - last time updated | |||
o pointer (optional) to configuration parameter block | o pointer (optional) to configuration parameter block | |||
skipping to change at page 19, line 32 | skipping to change at page 21, line 14 | |||
2. A route becomes unreachable (Section 4.8.2) | 2. A route becomes unreachable (Section 4.8.2) | |||
3. A route becomes reachable again (Section 4.8.3) | 3. A route becomes reachable again (Section 4.8.3) | |||
4. A route changes (Section 4.8.4) | 4. A route changes (Section 4.8.4) | |||
5. A peer goes down (Section 4.8.5) | 5. A peer goes down (Section 4.8.5) | |||
The reuse list is used to provide a means of fast evaluation of route | The reuse list is used to provide a means of fast evaluation of route | |||
that had been suppressed, but had been stable long enough to be reused | that had been suppressed, but had been stable long enough to be reused | |||
again. The following two operations are described. | again or had been suppressed long enough that it can be treated as a | |||
new route. The following two operations are described. | ||||
1. Inserting into a reuse list (Section 4.8.6) | 1. Inserting into a reuse list (Section 4.8.6) | |||
2. Reuse list processing every delta-t seconds (Section 4.8.7) | 2. Reuse list processing every delta-t seconds (Section 4.8.7) | |||
4.8.1 Processing a New Peer or New Routes | 4.8.1 Processing a New Peer or New Routes | |||
When a peer comes up, no action is required if the routes had no | When a peer comes up, no action is required if the routes had no | |||
previous history of instability, for example if this is the first time | previous history of instability, for example if this is the first time | |||
the peer is coming up and announcing these routes. For each route, | the peer is coming up and announcing these routes. For each route, | |||
skipping to change at page 22, line 20 | skipping to change at page 23, line 47 | |||
} | } | |||
3. if ( not suppressed and figure-of-merit < cut ) { | 3. if ( not suppressed and figure-of-merit < cut ) { | |||
usethe route | usethe route | |||
}else if( suppressedand figure-of-merit< reuse) { | }else if( suppressedand figure-of-merit< reuse) { | |||
setstate tonot suppressed | setstate tonot suppressed | |||
removethe routefrom areuse listif itis on one | remove the route from a reuse list | |||
usethe route | usethe route | |||
}else { | }else { | |||
setstate tosuppressed | setstate tosuppressed | |||
don'tuse theroute | don'tuse theroute | |||
insertinto areuse list(see Section4.8.6) | insertinto areuse list(see Section4.8.6) | |||
skipping to change at page 22, line 51 | skipping to change at page 24, line 33 | |||
zeropointer todamping struct | zeropointer todamping struct | |||
} | } | |||
If the route is deemed usable, a search for the current best route | If the route is deemed usable, a search for the current best route | |||
must be made. The newly reachable route is then evaluated according | must be made. The newly reachable route is then evaluated according | |||
to the BGP protocol rules for route selection. | to the BGP protocol rules for route selection. | |||
If the new route is usable, the previous best route is examined. | If the new route is usable, the previous best route is examined. | |||
Prior to coute comparisons, the current best route may have to be | Prior to route comparisons, the current best route may have to be | |||
reevaluated if separate parameter sets are used depending on the | reevaluated if separate parameter sets are used depending on the | |||
presence or absence of an alternate route. If there had been no | presence or absence of an alternate route. If there had been no | |||
alternate the previous best route may be suppressed. | alternate the previous best route may be suppressed. | |||
If the new route is to be suppressed it is placed on a reuse list only | If the new route is to be suppressed it is placed on a reuse list only | |||
if it would have been preferred to the current best route had the new | if it would have been preferred to the current best route had the new | |||
route been accepted as stable. There is no reason to queue a route on | route been accepted as stable. There is no reason to queue a route on | |||
a reuse list if after the route becomes usable it would not be used | a reuse list if after the route becomes usable it would not be used | |||
anyway due to the existence of a more preferred route. Such a route | anyway due to the existence of a more preferred route. Such a route | |||
would not have to be reevaluated unless the preferred route became | would not have to be reevaluated unless the preferred route became | |||
skipping to change at page 23, line 27 | skipping to change at page 25, line 11 | |||
If a route is replaced by a peer router by supplying a new path, the | If a route is replaced by a peer router by supplying a new path, the | |||
route that is being replaced should be treated as if an unreachable | route that is being replaced should be treated as if an unreachable | |||
were received (see Section 4.8.2). This will occur when a peer | were received (see Section 4.8.2). This will occur when a peer | |||
somewhere back in the AS path is continuously switching between two AS | somewhere back in the AS path is continuously switching between two AS | |||
paths and that peer is not damping route flap (or applying less | paths and that peer is not damping route flap (or applying less | |||
damping). There is no way to determine if one AS path is stable and | damping). There is no way to determine if one AS path is stable and | |||
the other is flapping, or if they are both flapping. If the cycle is | the other is flapping, or if they are both flapping. If the cycle is | |||
sufficiently short compared to convergence times neither route through | sufficiently short compared to convergence times neither route through | |||
that peer will deliver packets very reliably. Since there is no way | that peer will deliver packets very reliably. Since there is no way | |||
to affect the peer such that it choses the stable of the two AS paths, | to affect the peer such that it chooses the stable of the two AS | |||
the only viable option is to penalize both routes by considering each change | paths, the only viable option is to penalize both routes by considering | |||
as an unreachable followed by a route advertisement. | each change as an unreachable followed by a route advertisement. | |||
4.8.5 Processing A Peer Router Loss | 4.8.5 Processing A Peer Router Loss | |||
When a peer routing session is broken, either all individual routes | When a peer routing session is broken, either all individual routes | |||
advertised by that peer may be marked as unstable, or the peering | advertised by that peer may be marked as unstable, or the peering | |||
session itself may be marked as unstable. Marking the peer will save | session itself may be marked as unstable. Marking the peer will save | |||
considerable memory. Since the individual routes are advertised as | considerable memory. Since the individual routes are advertised as | |||
unreachable to routers beyond the immediate problem, per route state | unreachable to routers beyond the immediate problem, per route state | |||
will be incurred beyond the peer immediately adjacent to the BGP | will be incurred beyond the peer immediately adjacent to the BGP | |||
session that went down. If the instability continues, the immediately | session that went down. If the instability continues, the immediately | |||
skipping to change at page 25, line 14 | skipping to change at page 26, line 40 | |||
4.8.7 Handling Reuse Timer Events | 4.8.7 Handling Reuse Timer Events | |||
The granularity of the reuse timer should be more course that that of | The granularity of the reuse timer should be more course that that of | |||
the decay timer. As a result, when the reuse timer fires, suppressed | the decay timer. As a result, when the reuse timer fires, suppressed | |||
routes should be decayed by multiple increments of decay time. Some | routes should be decayed by multiple increments of decay time. Some | |||
computation can be avoided by always inserting into the reuse list | computation can be avoided by always inserting into the reuse list | |||
corresponding to one time increment past reuse eligibility. In cases | corresponding to one time increment past reuse eligibility. In cases | |||
where the reuse lists have a longer ``memory'' than the ``decay | where the reuse lists have a longer ``memory'' than the ``decay | |||
memory'' (described above), all of the routes in the first queue will | memory'' (described above), all of the routes in the first queue will | |||
be available for immediate reuse. | be available for immediate reuse if reachable or the history entry | |||
could be disposed of if unreachable. | ||||
When it is time to advance the lists, the first queue on the reuse | When it is time to advance the lists, the first queue on the reuse | |||
list must be processed and the circular queue must be rotated. Using | list must be processed and the circular queue must be rotated. Using | |||
an array and an offset as a circular array (as described in | an array and an offset as a circular array (as described in | |||
Section 4.8.6), the algorithm below is repeated every t-reuse seconds. | Section 4.8.6), the algorithm below is repeated every t-reuse seconds. | |||
1. save a pointer to the current zeroth queue head and zero the list | 1. save a pointer to the current zeroth queue head and zero the list | |||
head entry | head entry | |||
2. set offset = modulo reuse-list-size ( offset + 1 ), thereby | 2. set offset = modulo reuse-list-size ( offset + 1 ), thereby | |||
skipping to change at page 25, line 50 | skipping to change at page 27, line 32 | |||
else | else | |||
re-insertinto anotherlist (seeSection 4.8.6) | re-insertinto anotherlist (seeSection 4.8.6) | |||
} | } | |||
The value of the zeroth list head would be saved and the array entry | The value of the zeroth list head would be saved and the array entry | |||
itself zeroed. The list heads would then be advanced by incrementing | itself zeroed. The list heads would then be advanced by incrementing | |||
the offset. Starting with the saved head of the old zeroth list, each | the offset. Starting with the saved head of the old zeroth list, each | |||
route would be reevaluated and used or requeued if it were not ready | route would be reevaluated and used, disposed of entirely or requeued | |||
for reuse. If a route is used, it must be treated as if it were a new | if it were not ready for reuse. If a route is used, it must be | |||
route advertisement as described in Section 4.8.3. | treated as if it were a new route advertisement as described in | |||
Section 4.8.3. | ||||
5 Implementation Experience | 5 Implementation Experience | |||
The first implementations of ``route flap damping'' were the route | The first implementations of ``route flap damping'' were the route | |||
server daemon (rsd) coding by Ramesh Govindan (ISI) and the Cisco IOS | server daemon (rsd) coding by Ramesh Govindan (ISI) and the Cisco IOS | |||
implementation by Ravi Chandra. Both implementations first became | implementation by Ravi Chandra. Both implementations first became | |||
available in 1995 and have been used extensively. The rsd | available in 1995 and have been used extensively. The rsd | |||
implementation has been in use in route servers at the NSF funded | implementation has been in use in route servers at the NSF funded | |||
Network Access Points (NAPs) and at other major Internet | Network Access Points (NAPs) and at other major Internet | |||
interconnects. The Cisco IOS version has been in use by Internet | interconnects. The Cisco IOS version has been in use by Internet | |||
Service Providers worldwide. The rsd implementation has been | Service Providers worldwide. The rsd implementation has been | |||
integrated in releases of gated (see http://www.gated.org) and is | integrated in releases of gated (see http://www.gated.org) and is | |||
available in commercial routers using gated. | available in commercial routers using gated. | |||
There are now more than 2 years of BGP route damping deployment | ||||
experience. Some problems have occurred in deployment. So far these | ||||
are solvable by careful implementation of the algorithm and by careful | ||||
deployment. In some topologies coordinated deployment can be helpful | ||||
and in all cases disclosure of the use of route damping and the param- | ||||
eters used is highly beneficial in debugging connectivity problems. | ||||
Some of the problems have occurred due to subtle implementation | ||||
errors. Route damping should never be applied on IBGP learned routes. | ||||
To do so can open the possibility for persistent route loops. | ||||
Implementations should disallow this configuration. Penalties for | ||||
flapping should only be applied when a route is removed or replaced | ||||
and not when a route is added. If damping parameters are applied | ||||
consistently, this implementation constraint will result in a stable | ||||
secondary path being preferred over an unstable primary path due to | ||||
damping of the primary path near the source. | ||||
In topologies where multiple AS paths to a given destination exist | ||||
flapping of the primary path can result in suppression of the | ||||
secondary path. This can occur if no damping is being done near the | ||||
cause of the route flap or if damping is being applied more | ||||
aggressively by a distant AS. This problem can be solved in one of two | ||||
ways. Damping can be done near the source of the route flap and the | ||||
damping parameters can be made consistent. Alternately, a distant AS | ||||
which insists on more aggressive damping parameters can disable | ||||
penalizing routes on AS path change, penalizing routes only if they | ||||
are withdrawn completely. In order to do so, the implementation must | ||||
support this option (as described in Section 4.4.3). | ||||
Route flap should be damped near the source. Single homed | ||||
destinations can be covered by static routes. Aggregation provides | ||||
another means of damping. Providers should damp their own internal | ||||
problems, however damping on IGP link state origination is not yet | ||||
implemented by router vendors. Providers which use multiple AS within | ||||
their own topology should damp between their own AS. Providers should | ||||
damp adjacent providers AS. | ||||
Damping provides a means to limit propagation excessive route change | ||||
when connectivity is highly intermittent. Once a problem is | ||||
corrected, select damping state can be manually cleared. In order to | ||||
determine where damping may have occurred after connectivity problems, | ||||
providers should publish their damping parameters. Providers should | ||||
be willing to manually clear damping on specific prefixes or AS paths | ||||
at the request of other providers when the request is accompanied by | ||||
assurance that the problem has truly been addressed. | ||||
By damping their own routing information, providers can reduce their | ||||
own need to make requests of other providers to clear damping state | ||||
after correcting a problem. Providers should be pro-active and | ||||
monitor what prefixes and paths are suppressed in addition to | ||||
monitoring link states and BGP session state. | ||||
Acknowledgements | Acknowledgements | |||
This work and this document may not have been completed without the | This work and this document may not have been completed without the | |||
advise, comments and encouragement of Yakov Rekhter (Cisco). Dennis | advise, comments and encouragement of Yakov Rekhter (Cisco). Dennis | |||
Ferguson (MCI) provided a description of the algorithms in the gated | Ferguson (MCI) provided a description of the algorithms in the gated | |||
BGP implementation and many valuable comments and insights. David | BGP implementation and many valuable comments and insights. David | |||
Bolen (ANS) and Jordan Becker (ANS) provided valuable comments, | Bolen (ANS) and Jordan Becker (ANS) provided valuable comments, | |||
particularly regarding early simulations. At the time of this writing | particularly regarding early simulations. Over four years elapsed | |||
two implementations exists. One was led by Ramesh Govindan (ISI) for | between the initial draft presented to the BGP WG (October 1993) and | |||
the NSF Routing Arbiter project. The second was led by Ravi Chandra | this iteration. At the time of this writing there is significant | |||
(Cisco). Sean Doran (Sprintlink) and Serpil Bayraktar (ANS) were | experience with two implementations, each having been deployed since | |||
among the early independent testers of the Cisco pre-beta | 1995. One was led by Ramesh Govindan (ISI) for the NSF Routing Ar- | |||
implementation. | biter project. The second was led by Ravi Chandra (Cisco). Sean Doran | |||
(Sprintlink) and Serpil Bayraktar (ANS) were among the early independent | ||||
testers of the Cisco pre-beta implementation. Valuable comments and im- | ||||
plementation feedback were shared by many individuals on the IETF IDR WG | ||||
and the RIPE Routing Work Group and in NANOG and IEPG. | ||||
References | References | |||
[1] P. Gross and Y. Rekhter. Application of the border gateway proto- | [1] P. Gross and Y. Rekhter. Application of the border gateway proto- | |||
col in | col in | |||
the internet. Request for Comments (Draft Standard) RFC 1268, In- | the internet. Request for Comments (Draft Standard) RFC 1268, In- | |||
ternet Engineering Task Force, October 1991. (Obsoletes RFC1164); | ternet Engineering Task Force, October 1991. (Obsoletes RFC1164); | |||
(Obsoleted by RFC1655). ftp://ds.internic.net/rfc/rfc1268.txt. | (Obsoleted by RFC1655). ftp://ds.internic.net/rfc/rfc1268.txt. | |||
[2] ISO/IEC. Iso/iec 10747 - information technology - telecommunica- | [2] ISO/IEC. Iso/iec 10747 - information technology - telecommunica- | |||
tions and information exchange between systems - protocol for | tions and information exchange between systems - protocol for | |||
exchange of inter-domain routeing information among intermediate | exchange of inter-domain routeing information among intermediate | |||
systems to support forwarding of iso | systems to support forwarding of iso | |||
8473 pdus. Technical report, International Organization for Stan- | 8473 pdus. Technical report, International Organization for Stan- | |||
dardization, August 1994. ftp://merit.edu/pub/iso/idrp.ps.gz. | dardization, August 1994. ftp://merit.edu/pub/iso/idrp.ps.gz. | |||
[3] K. Lougheed and Y. Rekhter. A border gateway protocol 3 (BGP-3). | [3] K. Lougheed and Y. Rekhter. A border gateway protocol 3 (BGP-3). | |||
Request for Comments (Draft Standard) RFC 1267, In- | Request for Comments (Draft Standard) RFC 1267, In- | |||
ternet Engineering Task Force, October 1991. (Obsoletes RFC1163). | ternet Engineering Task Force, October 1991. (Obsoletes RFC1163). | |||
End of changes. | ||||
This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/ |