draft-ietf-idr-restart-13.txt | rfc4724.txt | |||
---|---|---|---|---|
Network Working Group Srihari R. Sangli | Network Working Group S. Sangli | |||
Internet Draft Yakov Rekhter | Request for Comments: 4724 E. Chen | |||
Expiration Date: January 2007 Rex Fernando | Category: Standards Track Cisco Systems | |||
John G. Scudder | R. Fernando | |||
Enke Chen | J. Scudder | |||
Y. Rekhter | ||||
Juniper Networks | ||||
January 2007 | ||||
Graceful Restart Mechanism for BGP | Graceful Restart Mechanism for BGP | |||
draft-ietf-idr-restart-13.txt | Status of This Memo | |||
Status of this Memo | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF), its areas, and its working groups. Note that | ||||
other groups may also distribute working documents as Internet- | ||||
Drafts. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | ||||
and may be updated, replaced, or obsoleted by other documents at any | ||||
time. It is inappropriate to use Internet-Drafts as reference | ||||
material or to cite them other than as "work in progress". | ||||
The list of current Internet-Drafts can be accessed at | ||||
http://www.ietf.org/ietf/1id-abstracts.txt | ||||
The list of Internet-Draft Shadow Directories can be accessed at | This document specifies an Internet standards track protocol for the | |||
http://www.ietf.org/shadow.html. | Internet community, and requests discussion and suggestions for | |||
improvements. Please refer to the current edition of the "Internet | ||||
Official Protocol Standards" (STD 1) for the standardization state | ||||
and status of this protocol. Distribution of this memo is unlimited. | ||||
IPR Disclosure Acknowledgement | Copyright Notice | |||
By submitting this Internet-Draft, each author represents that any | Copyright (C) The IETF Trust (2007). | |||
applicable patent or other IPR claims of which he or she is aware | ||||
have been or will be disclosed, and any of which he or she becomes | ||||
aware will be disclosed, in accordance with Section 6 of BCP 79. | ||||
Abstract | Abstract | |||
This document describes a mechanism for BGP that would help minimize | This document describes a mechanism for BGP that would help minimize | |||
the negative effects on routing caused by BGP restart. An End-of-RIB | the negative effects on routing caused by BGP restart. An End-of-RIB | |||
marker is specified and can be used to convey routing convergence | marker is specified and can be used to convey routing convergence | |||
information. A new BGP capability, termed "Graceful Restart | information. A new BGP capability, termed "Graceful Restart | |||
Capability", is defined which would allow a BGP speaker to express | Capability", is defined that would allow a BGP speaker to express its | |||
its ability to preserve forwarding state during BGP restart. Finally, | ability to preserve forwarding state during BGP restart. Finally, | |||
procedures are outlined for temporarily retaining routing information | procedures are outlined for temporarily retaining routing information | |||
across a TCP session termination/re-establishment. | across a TCP session termination/re-establishment. | |||
The mechanisms described in this document are applicable to all | The mechanisms described in this document are applicable to all | |||
routers, both those with the ability to preserve forwarding state | routers, both those with the ability to preserve forwarding state | |||
during BGP restart and those without (although the latter need to | during BGP restart and those without (although the latter need to | |||
implement only a subset of the mechanisms described in this | implement only a subset of the mechanisms described in this | |||
document). | document). | |||
1. Specification of Requirements | Table of Contents | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | 1. Introduction ....................................................2 | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | 1.1. Specification of Requirements ..............................2 | |||
document are to be interpreted as described in RFC2119 [RFC2119]. | 2. Marker for End-of-RIB ...........................................3 | |||
3. Graceful Restart Capability .....................................3 | ||||
4. Operation .......................................................6 | ||||
4.1. Procedures for the Restarting Speaker ......................6 | ||||
4.2. Procedures for the Receiving Speaker .......................7 | ||||
5. Changes to BGP Finite State Machine .............................9 | ||||
6. Deployment Considerations ......................................11 | ||||
7. Security Considerations ........................................12 | ||||
8. Acknowledgments ................................................13 | ||||
9. IANA Considerations ............................................13 | ||||
10. References ....................................................13 | ||||
10.1. Normative References .....................................13 | ||||
10.2. Informative References ...................................13 | ||||
2. Introduction | 1. Introduction | |||
Usually when BGP on a router restarts, all the BGP peers detect that | Usually, when BGP on a router restarts, all the BGP peers detect that | |||
the session went down, and then came up. This "down/up" transition | the session went down and then came up. This "down/up" transition | |||
results in a "routing flap" and causes BGP route re-computation, | results in a "routing flap" and causes BGP route re-computation, | |||
generation of BGP routing updates and flap the forwarding tables. It | generation of BGP routing updates, and unnecessary churn to the | |||
could spread across multiple routing domains. Such routing flaps may | forwarding tables. It could spread across multiple routing domains. | |||
create transient forwarding blackholes and/or transient forwarding | Such routing flaps may create transient forwarding blackholes and/or | |||
loops. They also consume resources on the control plane of the | transient forwarding loops. They also consume resources on the | |||
routers affected by the flap. As such they are detrimental to the | control plane of the routers affected by the flap. As such, they are | |||
overall network performance. | detrimental to the overall network performance. | |||
This document describes a mechanism for BGP that would help minimize | This document describes a mechanism for BGP that would help minimize | |||
the negative effects on routing caused by BGP restart. An End-of-RIB | the negative effects on routing caused by BGP restart. An End-of-RIB | |||
marker is specified and can be used to convey routing convergence | marker is specified and can be used to convey routing convergence | |||
information. A new BGP capability, termed "Graceful Restart | information. A new BGP capability, termed "Graceful Restart | |||
Capability", is defined which would allow a BGP speaker to express | Capability", is defined that would allow a BGP speaker to express its | |||
its ability to preserve forwarding state during BGP restart. Finally, | ability to preserve forwarding state during BGP restart. Finally, | |||
procedures are outlined for temporarily retaining routing information | procedures are outlined for temporarily retaining routing information | |||
across a TCP session termination/re-establishment. | across a TCP session termination/re-establishment. | |||
3. Marker for End-of-RIB | 1.1 Specification of Requirements | |||
An UPDATE message with no reachable NLRI and empty withdrawn NLRI is | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
specified as the End-Of-RIB Marker that can be used by a BGP speaker | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
to indicate to its peer the completion of the initial routing update | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||
after the session is established. For IPv4 unicast address family, | ||||
the End-Of-RIB Marker is an UPDATE message with the minimum length | ||||
[BGP-4]. For any other address family, it is an UPDATE message that | ||||
contains only the MP_UNREACH_NLRI attribute [BGP-MP] with no | ||||
withdrawn routes for that <AFI, SAFI>. | ||||
Although the End-of-RIB Marker is specified for the purpose of BGP | 2. Marker for End-of-RIB | |||
An UPDATE message with no reachable Network Layer Reachability | ||||
Information (NLRI) and empty withdrawn NLRI is specified as the End- | ||||
of-RIB marker that can be used by a BGP speaker to indicate to its | ||||
peer the completion of the initial routing update after the session | ||||
is established. For the IPv4 unicast address family, the End-of-RIB | ||||
marker is an UPDATE message with the minimum length [BGP-4]. For any | ||||
other address family, it is an UPDATE message that contains only the | ||||
MP_UNREACH_NLRI attribute [BGP-MP] with no withdrawn routes for that | ||||
<AFI, SAFI>. | ||||
Although the End-of-RIB marker is specified for the purpose of BGP | ||||
graceful restart, it is noted that the generation of such a marker | graceful restart, it is noted that the generation of such a marker | |||
upon completion of the initial update would be useful for routing | upon completion of the initial update would be useful for routing | |||
convergence in general, and thus the practice is recommended. | convergence in general, and thus the practice is recommended. | |||
In addition, it would be beneficial for routing convergence if a BGP | In addition, it would be beneficial for routing convergence if a BGP | |||
speaker can indicate to its peer up-front that it will generate the | speaker can indicate to its peer up-front that it will generate the | |||
End-Of-RIB marker, regardless of its ability to preserve its | End-of-RIB marker, regardless of its ability to preserve its | |||
forwarding state during BGP restart. This can be accomplished using | forwarding state during BGP restart. This can be accomplished using | |||
the Graceful Restart Capability described in the next section. | the Graceful Restart Capability described in the next section. | |||
4. Graceful Restart Capability | 3. Graceful Restart Capability | |||
The Graceful Restart Capability is a new BGP capability [BGP-CAP] | The Graceful Restart Capability is a new BGP capability [BGP-CAP] | |||
that can be used by a BGP speaker to indicate its ability to preserve | that can be used by a BGP speaker to indicate its ability to preserve | |||
its forwarding state during BGP restart. It can also be used to | its forwarding state during BGP restart. It can also be used to | |||
convey to its peer its intention of generating the End-Of-RIB marker | convey to its peer its intention of generating the End-of-RIB marker | |||
upon the completion of its initial routing updates. | upon the completion of its initial routing updates. | |||
This capability is defined as follows: | This capability is defined as follows: | |||
Capability code: 64 | Capability code: 64 | |||
Capability length: variable | Capability length: variable | |||
Capability value: Consists of the "Restart Flags" field, "Restart | Capability value: Consists of the "Restart Flags" field, "Restart | |||
Time" field, and 0 to 63 of the tuples <AFI, SAFI, Flags for | Time" field, and 0 to 63 of the tuples <AFI, SAFI, Flags for | |||
skipping to change at line 153 | skipping to change at page 4, line 37 | |||
Restart Flags: | Restart Flags: | |||
This field contains bit flags related to restart. | This field contains bit flags related to restart. | |||
0 1 2 3 | 0 1 2 3 | |||
+-+-+-+-+ | +-+-+-+-+ | |||
|R|Resv.| | |R|Resv.| | |||
+-+-+-+-+ | +-+-+-+-+ | |||
The most significant bit is defined as the Restart State (R) | The most significant bit is defined as the Restart State (R) | |||
bit which can be used to avoid possible deadlock caused by | bit, which can be used to avoid possible deadlock caused by | |||
waiting for the End-of-RIB marker when multiple BGP speakers | waiting for the End-of-RIB marker when multiple BGP speakers | |||
peering with each other restart. When set (value 1), this bit | peering with each other restart. When set (value 1), this bit | |||
indicates that the BGP speaker has restarted, and its peer MUST | indicates that the BGP speaker has restarted, and its peer MUST | |||
NOT wait for the End-of-RIB marker from the speaker before | NOT wait for the End-of-RIB marker from the speaker before | |||
advertising routing information to the speaker. | advertising routing information to the speaker. | |||
The remaining bits are reserved, and MUST be set to zero by the | The remaining bits are reserved and MUST be set to zero by the | |||
sender and ignored by the receiver. | sender and ignored by the receiver. | |||
Restart Time: | Restart Time: | |||
This is the estimated time (in seconds) it will take for the | This is the estimated time (in seconds) it will take for the | |||
BGP session to be re-established after a restart. This can be | BGP session to be re-established after a restart. This can be | |||
used to speed up routing convergence by its peer in case that | used to speed up routing convergence by its peer in case that | |||
the BGP speaker does not come back after a restart. | the BGP speaker does not come back after a restart. | |||
Address Family Identifier (AFI): | Address Family Identifier (AFI), Subsequent Address Family | |||
Identifier (SAFI): | ||||
This field carries the identity of the Network Layer protocol | ||||
for which the Graceful Restart support is advertised. Presently | ||||
defined values for this field are specified in [IANA-AFI]. | ||||
Subsequent Address Family Identifier (SAFI): | ||||
This field provides additional information about the type of | The AFI and SAFI, taken in combination, indicate that Graceful | |||
the Network Layer Reachability Information carried in the | Restart is supported for routes that are advertised with the | |||
attribute. Presently defined values for this field are | same AFI and SAFI. Routes may be explicitly associated with a | |||
specified in [IANA-SAFI]. | particular AFI and SAFI using the encoding of [BGP-MP] or | |||
implicitly associated with <AFI=IPv4, SAFI=Unicast> if using | ||||
the encoding of [BGP-4]. | ||||
Flags for Address Family: | Flags for Address Family: | |||
This field contains bit flags for the <AFI, SAFI>. | This field contains bit flags relating to routes that were | |||
advertised with the given AFI and SAFI. | ||||
0 1 2 3 4 5 6 7 | 0 1 2 3 4 5 6 7 | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
|F| Reserved | | |F| Reserved | | |||
+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+ | |||
The most significant bit is defined as the Forwarding State (F) | The most significant bit is defined as the Forwarding State (F) | |||
bit which can be used to indicate if the forwarding state for | bit, which can be used to indicate whether the forwarding state | |||
the <AFI, SAFI> has indeed been preserved during the previous | for routes that were advertised with the given AFI and SAFI has | |||
BGP restart. When set (value 1), the bit indicates that the | indeed been preserved during the previous BGP restart. When | |||
forwarding state has been preserved. | set (value 1), the bit indicates that the forwarding state has | |||
been preserved. | ||||
The remaining bits are reserved, and MUST be set to zero by the | The remaining bits are reserved and MUST be set to zero by the | |||
sender and ignored by the receiver. | sender and ignored by the receiver. | |||
When a sender of this capability doesn't include any <AFI, SAFI> in | When a sender of this capability does not include any <AFI, SAFI> in | |||
the capability, it means that the sender is not capable of preserving | the capability, it means that the sender is not capable of preserving | |||
its forwarding state during BGP restart, but supports procedures for | its forwarding state during BGP restart, but supports procedures for | |||
the Receiving Speaker (as defined in Section 5.2 of this document). | the Receiving Speaker (as defined in Section 4.2 of this document). | |||
In that case the value of the "Restart Time" field advertised by the | In that case, the value of the "Restart Time" field advertised by the | |||
sender is irrelevant. | sender is irrelevant. | |||
A BGP speaker MUST NOT include more than one instance of the Graceful | A BGP speaker MUST NOT include more than one instance of the Graceful | |||
Restart Capability in the capability advertisement [BGP-CAP]. If | Restart Capability in the capability advertisement [BGP-CAP]. If | |||
more than one instance of the Graceful Restart Capability is carried | more than one instance of the Graceful Restart Capability is carried | |||
in the capability advertisement, the receiver of the advertisement | in the capability advertisement, the receiver of the advertisement | |||
MUST ignore all but the last instance of the Graceful Restart | MUST ignore all but the last instance of the Graceful Restart | |||
Capability. | Capability. | |||
Including <AFI=IPv4, SAFI=unicast> into the Graceful Restart | Including <AFI=IPv4, SAFI=unicast> in the Graceful Restart Capability | |||
Capability doesn't imply that the IPv4 unicast routing information | does not imply that the IPv4 unicast routing information should be | |||
should be carried by using the BGP Multiprotocol extensions [BGP-MP] | carried by using the BGP multiprotocol extensions [BGP-MP] -- it | |||
- it could be carried in the NLRI field of the BGP UPDATE message. | could be carried in the NLRI field of the BGP UPDATE message. | |||
5. Operation | 4. Operation | |||
A BGP speaker MAY advertise the Graceful Restart Capability for an | A BGP speaker MAY advertise the Graceful Restart Capability for an | |||
address family to its peer if it has the ability to preserve its | address family to its peer if it has the ability to preserve its | |||
forwarding state for the address family when BGP restarts. In | forwarding state for the address family when BGP restarts. In | |||
addition, even if the speaker does not have the ability to preserve | addition, even if the speaker does not have the ability to preserve | |||
its forwarding state for any address family during BGP restart, it is | its forwarding state for any address family during BGP restart, it is | |||
still recommended that the speaker advertise the Graceful Restart | still recommended that the speaker advertise the Graceful Restart | |||
Capability to its peer (as mentioned before this is done by not | Capability to its peer (as mentioned before this is done by not | |||
including any <AFI, SAFI> in the advertised capability). There are | including any <AFI, SAFI> in the advertised capability). There are | |||
two reasons for doing this. First, to indicate its intention of | two reasons for doing this. The first is to indicate its intention | |||
generating the End-of-RIB marker upon the completion of its initial | of generating the End-of-RIB marker upon the completion of its | |||
routing updates, as doing this would be useful for routing | initial routing updates, as doing this would be useful for routing | |||
convergence in general. Second, to indicate its support for a peer | convergence in general. The second is to indicate its support for a | |||
which wishes to perform a graceful restart. | peer which wishes to perform a graceful restart. | |||
The End-of-RIB marker MUST be sent by a BGP speaker to its peer once | The End-of-RIB marker MUST be sent by a BGP speaker to its peer once | |||
it completes the initial routing update (including the case when | it completes the initial routing update (including the case when | |||
there is no update to send) for an address family after the BGP | there is no update to send) for an address family after the BGP | |||
session is established. | session is established. | |||
It is noted that the normal BGP procedures MUST be followed when the | It is noted that the normal BGP procedures MUST be followed when the | |||
TCP session terminates due to the sending or receiving of a BGP | TCP session terminates due to the sending or receiving of a BGP | |||
NOTIFICATION message. | NOTIFICATION message. | |||
skipping to change at line 259 | skipping to change at page 6, line 44 | |||
whose BGP has restarted, and "Receiving Speaker" refers to a router | whose BGP has restarted, and "Receiving Speaker" refers to a router | |||
that peers with the restarting speaker. | that peers with the restarting speaker. | |||
Consider that the Graceful Restart Capability for an address family | Consider that the Graceful Restart Capability for an address family | |||
is advertised by the Restarting Speaker, and is understood by the | is advertised by the Restarting Speaker, and is understood by the | |||
Receiving Speaker, and a BGP session between them is established. | Receiving Speaker, and a BGP session between them is established. | |||
The following sections detail the procedures that MUST be followed by | The following sections detail the procedures that MUST be followed by | |||
the Restarting Speaker as well as the Receiving Speaker once the | the Restarting Speaker as well as the Receiving Speaker once the | |||
Restarting Speaker restarts. | Restarting Speaker restarts. | |||
5.1. Procedures for the Restarting Speaker | 4.1. Procedures for the Restarting Speaker | |||
When the Restarting Speaker restarts, it MUST retain, if possible, | When the Restarting Speaker restarts, it MUST retain, if possible, | |||
the forwarding state for the BGP routes in the Loc-RIB, and MUST mark | the forwarding state for the BGP routes in the Loc-RIB and MUST mark | |||
them as stale. It MUST NOT differentiate between stale and other | them as stale. It MUST NOT differentiate between stale and other | |||
information during forwarding. | information during forwarding. | |||
To re-establish the session with its peer, the Restarting Speaker | To re-establish the session with its peer, the Restarting Speaker | |||
MUST set the "Restart State" bit in the Graceful Restart Capability | MUST set the "Restart State" bit in the Graceful Restart Capability | |||
of the OPEN message. Unless allowed via configuration, the | of the OPEN message. Unless allowed via configuration, the | |||
"Forwarding State" bit for an address family in the capability can be | "Forwarding State" bit for an address family in the capability can be | |||
set only if the forwarding state has indeed been preserved for that | set only if the forwarding state has indeed been preserved for that | |||
address family during the restart. | address family during the restart. | |||
Once the session between the Restarting Speaker and the Receiving | Once the session between the Restarting Speaker and the Receiving | |||
Speaker is re-established, the Restarting Speaker will receive and | Speaker is re-established, the Restarting Speaker will receive and | |||
process BGP messages from its peers. However, it MUST defer route | process BGP messages from its peers. However, it MUST defer route | |||
selection for an address family until it either (a) receives the End- | selection for an address family until it either (a) receives the | |||
of-RIB marker from all its peers (excluding the ones with the | End-of-RIB marker from all its peers (excluding the ones with the | |||
"Restart State" bit set in the received capability and excluding the | "Restart State" bit set in the received capability and excluding the | |||
ones which do not advertise the graceful restart capability) or (b) | ones that do not advertise the graceful restart capability) or (b) | |||
the Selection_Deferral_Timer referred to below has expired. It is | the Selection_Deferral_Timer referred to below has expired. It is | |||
noted that prior to route selection, the speaker has no routes to | noted that prior to route selection, the speaker has no routes to | |||
advertise to its peers and no routes to update the forwarding state. | advertise to its peers and no routes to update the forwarding state. | |||
In situations where both IGP and BGP have restarted, it might be | In situations where both Interior Gateway Protocol (IGP) and BGP have | |||
advantageous to wait for IGP to converge before the BGP speaker | restarted, it might be advantageous to wait for IGP to converge | |||
performs route selection. | before the BGP speaker performs route selection. | |||
After the BGP speaker performs route selection, the forwarding state | After the BGP speaker performs route selection, the forwarding state | |||
of the speaker MUST be updated and any previously marked stale | of the speaker MUST be updated and any previously marked stale | |||
information MUST be removed. The Adj-RIB-Out can then be advertised | information MUST be removed. The Adj-RIB-Out can then be advertised | |||
to its peers. Once the initial update is complete for an address | to its peers. Once the initial update is complete for an address | |||
family (including the case that there is no routing update to send), | family (including the case that there is no routing update to send), | |||
the End-of-RIB marker MUST be sent. | the End-of-RIB marker MUST be sent. | |||
To put an upper bound on the amount of time a router defers its route | To put an upper bound on the amount of time a router defers its route | |||
selection, an implementation MUST support a (configurable) timer that | selection, an implementation MUST support a (configurable) timer that | |||
imposes this upper bound. This timer is referred to as the | imposes this upper bound. This timer is referred to as the | |||
"Selection_Deferral_Timer". The value of this timer should be large | "Selection_Deferral_Timer". The value of this timer should be large | |||
enough, as to provide all the peers of the Restarting Speaker with | enough, so as to provide all the peers of the Restarting Speaker with | |||
enough time to send all the routes to the Restarting Speaker. | enough time to send all the routes to the Restarting Speaker. | |||
If one wants to apply graceful restart only when the restart is | If one wants to apply graceful restart only when the restart is | |||
planned (as opposed to both planned and unplanned restart), then one | planned (as opposed to both planned and unplanned restart), then one | |||
way to accomplish this would be to set the Forwarding State bit to 1 | way to accomplish this would be to set the Forwarding State bit to 1 | |||
after a planned restart, and to 0 in all other cases. Other | after a planned restart, and to 0 in all other cases. Other | |||
approaches to accomplish this are outside the scope of this document. | approaches to accomplish this are outside the scope of this document. | |||
5.2. Procedures for the Receiving Speaker | 4.2. Procedures for the Receiving Speaker | |||
When the Restarting Speaker restarts, the Receiving Speaker may or | When the Restarting Speaker restarts, the Receiving Speaker may or | |||
may not detect the termination of the TCP session with the Restarting | may not detect the termination of the TCP session with the Restarting | |||
Speaker, depending on the underlying TCP implementation, whether or | Speaker, depending on the underlying TCP implementation, whether or | |||
not [BGP-AUTH] is in use, and the specific circumstances of the | not [BGP-AUTH] is in use, and the specific circumstances of the | |||
restart. In case it does not detect the termination of the old TCP | restart. In case it does not detect the termination of the old TCP | |||
session and still considers the BGP session as being established, it | session and still considers the BGP session as being established, it | |||
MUST treat the subsequent open connection from the peer as an | MUST treat the subsequent open connection from the peer as an | |||
indication of the termination of the old TCP session and act | indication of the termination of the old TCP session and act | |||
accordingly (when the Graceful Restart Capability has been received | accordingly (when the Graceful Restart Capability has been received | |||
from the peer). See Section 8 for a description of this behavior in | from the peer). See Section 8 for a description of this behavior in | |||
terms of the BGP finite state machine. | terms of the BGP finite state machine. | |||
"Acting accordingly" in this context means that the previous TCP | "Acting accordingly" in this context means that the previous TCP | |||
session MUST be closed, and the new one retained. Note that this | session MUST be closed, and the new one retained. Note that this | |||
behavior differs from the default behavior, as specified in [BGP-4] | behavior differs from the default behavior, as specified in [BGP-4], | |||
section 6.8. Since the previous connection is considered to be | Section 6.8. Since the previous connection is considered to be | |||
terminated, no NOTIFICATION message should be sent -- the previous | terminated, no NOTIFICATION message should be sent -- the previous | |||
TCP session is simply closed. | TCP session is simply closed. | |||
When the Receiving Speaker detects termination of the TCP session for | When the Receiving Speaker detects termination of the TCP session for | |||
a BGP session with a peer that has advertised the Graceful Restart | a BGP session with a peer that has advertised the Graceful Restart | |||
Capability, it MUST retain the routes received from the peer for all | Capability, it MUST retain the routes received from the peer for all | |||
the address families that were previously received in the Graceful | the address families that were previously received in the Graceful | |||
Restart Capability, and MUST mark them as stale routing information. | Restart Capability and MUST mark them as stale routing information. | |||
To deal with possible consecutive restarts, a route (from the peer) | To deal with possible consecutive restarts, a route (from the peer) | |||
previously marked as stale MUST be deleted. The router MUST NOT | previously marked as stale MUST be deleted. The router MUST NOT | |||
differentiate between stale and other routing information during | differentiate between stale and other routing information during | |||
forwarding. | forwarding. | |||
In re-establishing the session, the "Restart State" bit in the | In re-establishing the session, the "Restart State" bit in the | |||
Graceful Restart Capability of the OPEN message sent by the Receiving | Graceful Restart Capability of the OPEN message sent by the Receiving | |||
Speaker MUST NOT be set unless the Receiving Speaker has restarted. | Speaker MUST NOT be set unless the Receiving Speaker has restarted. | |||
The presence and the setting of the "Forwarding State" bit for an | The presence and the setting of the "Forwarding State" bit for an | |||
address family depends upon the actual forwarding state and | address family depend upon the actual forwarding state and | |||
configuration. | configuration. | |||
If the session does not get re-established within the "Restart Time" | If the session does not get re-established within the "Restart Time" | |||
that the peer advertised previously, the Receiving Speaker MUST | that the peer advertised previously, the Receiving Speaker MUST | |||
delete all the stale routes from the peer that it is retaining. | delete all the stale routes from the peer that it is retaining. | |||
A BGP speaker could have some way of determining whether its peer's | A BGP speaker could have some way of determining whether its peer's | |||
forwarding state is still viable, for example through [BFD] or | forwarding state is still viable, for example through Bidirectional | |||
through monitoring layer two information. Specifics of such | Forwarding Detection [BFD] or through monitoring layer two | |||
mechanisms are beyond the scope of this document. In the event that | information. Specifics of such mechanisms are beyond the scope of | |||
it determines that its peer's forwarding state is not viable prior to | this document. In the event that it determines that its peer's | |||
the re-establishment of the session, the speaker MAY delete all the | forwarding state is not viable prior to the re-establishment of the | |||
stale routes from the peer that it is retaining. | session, the speaker MAY delete all the stale routes from the peer | |||
that it is retaining. | ||||
Once the session is re-established, if the "Forwarding State" bit for | Once the session is re-established, if the "Forwarding State" bit for | |||
a specific address family is not set in the newly received Graceful | a specific address family is not set in the newly received Graceful | |||
Restart Capability, or if a specific address family is not included | Restart Capability, or if a specific address family is not included | |||
in the newly received Graceful Restart Capability, or if the Graceful | in the newly received Graceful Restart Capability, or if the Graceful | |||
Restart Capability isn't received in the re-established session at | Restart Capability is not received in the re-established session at | |||
all, then Receiving Speaker MUST immediately remove all the stale | all, then the Receiving Speaker MUST immediately remove all the stale | |||
routes from the peer that it is retaining for that address family. | routes from the peer that it is retaining for that address family. | |||
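The three conditions above can be read as a per-address-family test applied when the session comes back up. The following is a minimal sketch under assumed names: `families_to_flush`, `AfiSafi`, and the dictionary representation of the received capability are illustrative and are not defined by the specification.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: (AFI, SAFI), e.g. (1, 1) for IPv4 unicast.
AfiSafi = Tuple[int, int]


def families_to_flush(
    stale_families: Set[AfiSafi],
    received_gr_cap: Optional[Dict[AfiSafi, bool]],
) -> Set[AfiSafi]:
    """Return the families whose stale routes must be removed immediately.

    received_gr_cap maps each (AFI, SAFI) listed in the newly received
    Graceful Restart Capability to its "Forwarding State" bit. It is None
    when the capability was not received in the re-established session.
    """
    if received_gr_cap is None:
        # Capability absent altogether: nothing may be retained.
        return set(stale_families)
    # Flush a family if it is not listed, or if its bit is not set.
    return {
        family
        for family in stale_families
        if not received_gr_cap.get(family, False)
    }
```

Families not returned by this function keep their stale routes until they are replaced by updates or swept when the End-of-RIB marker arrives.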
The Receiving Speaker MUST send the End-of-RIB marker once it | The Receiving Speaker MUST send the End-of-RIB marker once it | |||
completes the initial update for an address family (including the | completes the initial update for an address family (including the | |||
case that it has no routes to send) to the peer. | case that it has no routes to send) to the peer. | |||
The Receiving Speaker MUST replace the stale routes by the routing | The Receiving Speaker MUST replace the stale routes by the routing | |||
updates received from the peer. Once the End-of-RIB marker for an | updates received from the peer. Once the End-of-RIB marker for an | |||
address family is received from the peer, it MUST immediately remove | address family is received from the peer, it MUST immediately remove | |||
any routes from the peer that are still marked as stale for that | any routes from the peer that are still marked as stale for that | |||
address family. | address family. | |||
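One possible way to track stale routes on the Receiving Speaker is sketched below. It assumes a simple in-memory table keyed by address family and prefix; `PeerRib`, `Route`, and the method names are hypothetical and only illustrate the marking, replacement, and End-of-RIB sweep described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

AfiSafi = Tuple[int, int]  # hypothetical (AFI, SAFI) key


@dataclass
class Route:
    next_hop: str
    stale: bool = False


@dataclass
class PeerRib:
    """Hypothetical store of routes learned from one peer."""
    routes: Dict[Tuple[AfiSafi, str], Route] = field(default_factory=dict)

    def mark_all_stale(self) -> None:
        # Consecutive restarts: a route already marked stale is deleted
        # rather than being retained for a second time.
        self.routes = {k: r for k, r in self.routes.items() if not r.stale}
        for route in self.routes.values():
            route.stale = True

    def on_update(self, family: AfiSafi, prefix: str, next_hop: str) -> None:
        # A fresh update replaces the stale entry for the same prefix.
        self.routes[(family, prefix)] = Route(next_hop)

    def on_end_of_rib(self, family: AfiSafi) -> None:
        # Remove whatever is still stale for this address family only.
        self.routes = {
            k: r for k, r in self.routes.items()
            if not (k[0] == family and r.stale)
        }
```

Forwarding lookups in this sketch would ignore the `stale` flag entirely, since the router does not differentiate between stale and other routing information during forwarding.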
To put an upper bound on the amount of time a router retains the | To put an upper bound on the amount of time a router retains the | |||
stale routes, an implementation MAY support a (configurable) timer | stale routes, an implementation MAY support a (configurable) timer | |||
that imposes this upper bound. | that imposes this upper bound. | |||
6. Changes to BGP Finite State Machine | 5. Changes to BGP Finite State Machine | |||
As mentioned under "Procedures for the Receiving Speaker" above, this | As mentioned under "Procedures for the Receiving Speaker" above, this | |||
specification modifies the BGP finite state machine. | specification modifies the BGP finite state machine. | |||
The specific state machine modifications to [BGP-4] Section 8.2.2 are | The specific state machine modifications to [BGP-4], Section 8.2.2, | |||
as follows. | are as follows. | |||
In the Idle state, make the following changes. | In the Idle state, make the following changes. | |||
Replace this text: | Replace this text: | |||
- initializes all BGP resources for the peer connection, | - initializes all BGP resources for the peer connection, | |||
with | with | |||
- initializes all BGP resources for the peer connection, other | - initializes all BGP resources for the peer connection, other | |||
than those resources required in order to retain routes according | than those resources required in order to retain routes | |||
to section "Procedures for the Receiving Speaker" of this | according to section "Procedures for the Receiving Speaker" of | |||
(Graceful Restart) specification, | this (Graceful Restart) specification, | |||
In the Established state, make the following changes. | In the Established state, make the following changes. | |||
Replace this text: | Replace this text: | |||
In response to an indication that the TCP connection is | In response to an indication that the TCP connection is | |||
successfully established (Event 16 or Event 17), the second | successfully established (Event 16 or Event 17), the second | |||
connection SHALL be tracked until it sends an OPEN message. | connection SHALL be tracked until it sends an OPEN message. | |||
with | with | |||
If the Graceful Restart Capability with one or more AFIs/SAFIs | ||||
has not been received for the session, then in response to an | ||||
indication that a TCP connection is successfully established | ||||
(Event 16 or Event 17), the second connection SHALL be tracked | ||||
until it sends an OPEN message. | ||||
If the Graceful Restart capability with one or more AFI/SAFI has | However, if the Graceful Restart Capability with one or more | |||
not been received for the session, then in response to an | AFIs/SAFIs has been received for the session, then in response | |||
indication that a TCP connection is successfully established | to Event 16 or Event 17 the local system: | |||
(Event 16 or Event 17), the second connection SHALL be tracked | ||||
until it sends an OPEN message. | ||||
However, if the Graceful Restart capability with one or more | ||||
AFI/SAFI has been received for the session, then in response to | ||||
Event 16 or Event 17 the local system: | ||||
- retains all routes associated with this connection according | - retains all routes associated with this connection according | |||
to section "Procedures for the Receiving Speaker" of this | to section "Procedures for the Receiving Speaker" of this | |||
(Graceful Restart) specification, | (Graceful Restart) specification, | |||
- releases all other BGP resources, | - releases all other BGP resources, | |||
- drops the TCP connection associated with the ESTABLISHED | - drops the TCP connection associated with the ESTABLISHED | |||
session, | session, | |||
- initializes all BGP resources for the peer connection, other | - initializes all BGP resources for the peer connection, other | |||
than those required in order to retain routes according to | than those required in order to retain routes according to | |||
section "Procedures for the Receiving Speaker" of this | section "Procedures for the Receiving Speaker" of this | |||
specification, | specification, | |||
- sets ConnectRetryCounter to zero, | - sets ConnectRetryCounter to zero, | |||
- starts the ConnectRetryTimer with the initial value, | - starts the ConnectRetryTimer with the initial value, and | |||
- changes its state to Connect. | - changes its state to Connect. | |||
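The replacement text for the Established state can be summarized as a small decision function. This is an illustrative sketch only: the function name, the action strings, and the choice to report the state as unchanged in the first branch (the text names no transition there) are assumptions, not part of the specification.

```python
from typing import List, Tuple


def on_tcp_established_in_established(
    gr_cap_received: bool,
) -> Tuple[List[str], str]:
    """Handle Event 16 or Event 17 while in the Established state.

    gr_cap_received is True when the Graceful Restart Capability with one
    or more AFIs/SAFIs has been received for the session.
    Returns the ordered actions and the resulting state name.
    """
    if not gr_cap_received:
        # Unmodified BGP-4 behavior: keep the session and watch the
        # second connection until it sends an OPEN message.
        return (["track second connection until OPEN"], "Established")
    actions = [
        "retain routes per Procedures for the Receiving Speaker",
        "release all other BGP resources",
        "drop TCP connection of the ESTABLISHED session",
        "initialize BGP resources except those needed to retain routes",
        "set ConnectRetryCounter to zero",
        "start ConnectRetryTimer with the initial value",
    ]
    return (actions, "Connect")
```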
Replace this text: | Replace this text: | |||
If the local system receives a NOTIFICATION message (Event 24 or | If the local system receives a NOTIFICATION message (Event 24 or | |||
Event 25), or a TcpConnectionFails (Event 18) from the underlying | Event 25), or a TcpConnectionFails (Event 18) from the underlying | |||
TCP, the local system: | TCP, the local system: | |||
- sets the ConnectRetryTimer to zero, | - sets the ConnectRetryTimer to zero, | |||
skipping to change at line 464 | skipping to change at page 11, line 10 | |||
- increments the ConnectRetryCounter by 1, | - increments the ConnectRetryCounter by 1, | |||
- changes its state to Idle. | - changes its state to Idle. | |||
with | with | |||
If the local system receives a NOTIFICATION message (Event 24 or | If the local system receives a NOTIFICATION message (Event 24 or | |||
Event 25), or if the local system receives a TcpConnectionFails | Event 25), or if the local system receives a TcpConnectionFails | |||
(Event 18) from the underlying TCP and the Graceful Restart | (Event 18) from the underlying TCP and the Graceful Restart | |||
capability with one or more AFI/SAFI has not been received for the | capability with one or more AFIs/SAFIs has not been received for | |||
session, the local system: | the session, the local system: | |||
- sets the ConnectRetryTimer to zero, | - sets the ConnectRetryTimer to zero, | |||
- deletes all routes associated with this connection, | - deletes all routes associated with this connection, | |||
- releases all the BGP resources, | - releases all the BGP resources, | |||
- drops the TCP connection, | - drops the TCP connection, | |||
- increments the ConnectRetryCounter by 1, | - increments the ConnectRetryCounter by 1, and | |||
- changes its state to Idle. | - changes its state to Idle. | |||
However, if the local system receives a TcpConnectionFails (Event | However, if the local system receives a TcpConnectionFails (Event | |||
18) from the underlying TCP, and the Graceful Restart capability | 18) from the underlying TCP, and the Graceful Restart Capability | |||
with one or more AFI/SAFI has been received for the session, the | with one or more AFIs/SAFIs has been received for the session, the | |||
local system: | local system: | |||
- sets the ConnectRetryTimer to zero, | - sets the ConnectRetryTimer to zero, | |||
- retains all routes associated with this connection according | - retains all routes associated with this connection according | |||
to section "Procedures for the Receiving Speaker" of this | to section "Procedures for the Receiving Speaker" of this | |||
(Graceful Restart) specification, | (Graceful Restart) specification, | |||
- releases all other BGP resources, | - releases all other BGP resources, | |||
- drops the TCP connection, | - drops the TCP connection, | |||
- increments the ConnectRetryCounter by 1, | - increments the ConnectRetryCounter by 1, and | |||
- changes its state to Idle. | - changes its state to Idle. | |||
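The two replacement cases differ only in whether routes are deleted or retained. A minimal sketch of that branch follows; the function name and action strings are hypothetical, and the event numbers are those used in the text (18 for TcpConnectionFails, 24 and 25 for NOTIFICATION).

```python
from typing import List

TCP_CONNECTION_FAILS = 18
NOTIFICATION_EVENTS = (24, 25)


def on_session_loss(event: int, gr_cap_received: bool) -> List[str]:
    """Return the ordered actions taken in the Established state.

    Routes are retained only for TcpConnectionFails (Event 18) when the
    Graceful Restart Capability with one or more AFIs/SAFIs was received.
    A NOTIFICATION message always leads to deletion of the routes.
    """
    if event != TCP_CONNECTION_FAILS and event not in NOTIFICATION_EVENTS:
        raise ValueError(f"event {event} is not covered by this sketch")
    retain = event == TCP_CONNECTION_FAILS and gr_cap_received
    if retain:
        route_actions = [
            "retain routes per Procedures for the Receiving Speaker",
            "release all other BGP resources",
        ]
    else:
        route_actions = [
            "delete all routes associated with this connection",
            "release all the BGP resources",
        ]
    return (
        ["set ConnectRetryTimer to zero"]
        + route_actions
        + [
            "drop the TCP connection",
            "increment ConnectRetryCounter by 1",
            "change state to Idle",
        ]
    )
```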
7. Deployment Considerations | 6. Deployment Considerations | |||
While the procedures described in this document would help minimize | Although the procedures described in this document would help | |||
the effect of routing flaps, it is noted, however, that when a BGP | minimize the effect of routing flaps, it is noted that when a BGP | |||
Graceful Restart capable router restarts, or if it restarts without | Graceful Restart-capable router restarts, or if it restarts without | |||
preserving its forwarding state (for example due to a power failure) | preserving its forwarding state (e.g., due to a power failure), there | |||
there is a potential for transient routing loops or blackholes in the | is a potential for transient routing loops or blackholes in the | |||
network if routing information changes before the involved routers | network if routing information changes before the involved routers | |||
complete routing updates and convergence. Also, depending on the | complete routing updates and convergence. Also, depending on the | |||
network topology, if not all IBGP speakers are Graceful Restart | network topology, if not all IBGP speakers are Graceful Restart | |||
capable, there could be an increased exposure to transient routing | capable, there could be an increased exposure to transient routing | |||
loops or blackholes when the Graceful Restart procedures are | loops or blackholes when the Graceful Restart procedures are | |||
exercised. | exercised. | |||
The Restart Time, the upper bound for retaining routes and the upper | The Restart Time, the upper bound for retaining routes, and the upper | |||
bound for deferring route selection may need to be tuned as more | bound for deferring route selection may need to be tuned as more | |||
deployment experience is gained. | deployment experience is gained. | |||
Finally, it is noted that the benefits of deploying BGP Graceful | Finally, it is noted that the benefits of deploying BGP Graceful | |||
Restart in an AS whose IGPs and BGP are tightly coupled (i.e., BGP | Restart in an Autonomous System (AS) whose IGPs and BGP are tightly | |||
and IGPs would both restart) and IGPs have no similar Graceful | coupled (i.e., BGP and IGPs would both restart) and IGPs have no | |||
Restart capability are reduced relative to the scenario where IGPs do | similar Graceful Restart Capability are reduced relative to the | |||
have similar Graceful Restart capability. | scenario where IGPs do have similar Graceful Restart Capability. | |||
8. Security Considerations | 7. Security Considerations | |||
Since with this proposal a new connection can cause an old one to be | Since with this proposal a new connection can cause an old one to be | |||
terminated, it might seem to open the door to denial of service | terminated, it might seem to open the door to denial of service | |||
attacks. However, it is noted that unauthenticated BGP is already | attacks. However, it is noted that unauthenticated BGP is already | |||
known to be vulnerable to denials of service through attacks on the | known to be vulnerable to denials of service through attacks on the | |||
TCP transport. The TCP transport is commonly protected through use | TCP transport. The TCP transport is commonly protected through use | |||
of [BGP-AUTH]. Such authentication will equally protect against | of [BGP-AUTH]. Such authentication will equally protect against | |||
denials of service through spurious new connections. | denials of service through spurious new connections. | |||
If an attacker is able to successfully open a TCP connection | If an attacker is able to successfully open a TCP connection | |||
impersonating a legitimate peer, the attacker's connection will | impersonating a legitimate peer, the attacker's connection will | |||
replace the legitimate one, potentially enabling the attacker to | replace the legitimate one, potentially enabling the attacker to | |||
advertise bogus routes. We note, however, that the window for such a | advertise bogus routes. We note, however, that the window for such a | |||
route insertion attack is small since through normal operation of the | route insertion attack is small since through normal operation of the | |||
protocol the legitimate peer would open a new connection, in turn | protocol the legitimate peer would open a new connection, in turn | |||
causing the attacker's connection to be terminated. Thus, this | causing the attacker's connection to be terminated. Thus, this | |||
attack devolves to a form of denial of service. | attack devolves to a form of denial of service. | |||
It is thus concluded that this proposal does not change the | It is thus concluded that this proposal does not change the | |||
underlying security model (and issues) of BGP-4. | underlying security model (and issues) of BGP-4. | |||
We also note that implementations may allow use of graceful restart | We also note that implementations may allow use of graceful restart | |||
to be controlled by configuration. If graceful restart is not | to be controlled by configuration. If graceful restart is not | |||
enabled, naturally the underlying security model of BGP-4 is | enabled, naturally the underlying security model of BGP-4 is | |||
unchanged. | unchanged. | |||
9. Intellectual Property Considerations | 8. Acknowledgments | |||
This section is taken from Section 5 of RFC 3668. | ||||
The IETF takes no position regarding the validity or scope of any | ||||
Intellectual Property Rights or other rights that might be claimed to | ||||
pertain to the implementation or use of the technology described in | ||||
this document or the extent to which any license under such rights | ||||
might or might not be available; nor does it represent that it has | ||||
made any independent effort to identify any such rights. Information | ||||
on the procedures with respect to rights in RFC documents can be | ||||
found in BCP 78 and BCP 79. | ||||
Copies of IPR disclosures made to the IETF Secretariat and any | ||||
assurances of licenses to be made available, or the result of an | ||||
attempt made to obtain a general license or permission for the use of | ||||
such proprietary rights by implementers or users of this | ||||
specification can be obtained from the IETF on-line IPR repository at | ||||
http://www.ietf.org/ipr. | ||||
The IETF invites any interested party to bring to its attention any | ||||
copyrights, patents or patent applications, or other proprietary | ||||
rights that may cover technology that may be required to implement | ||||
this standard. Please address the information to the IETF at ietf- | ||||
ipr@ietf.org. | ||||
10. Copyright Notice | ||||
Copyright (C) The Internet Society (2006). | ||||
This document is subject to the rights, licenses and restrictions | ||||
contained in BCP 78, and except as set forth therein, the authors | ||||
retain all their rights. | ||||
This document and the information contained herein are provided on an | The authors would like to thank Bruce Cole, Lars Eggert, Bill Fenner, | |||
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | Eric Gray, Jeffrey Haas, Sam Hartman, Alvaro Retana, Pekka Savola | |||
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | Naiming Shen, Satinder Singh, Mark Townsley, David Ward, Shane | |||
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | Wright, and Alex Zinin for their review and comments. | |||
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | ||||
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | ||||
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
11. IANA Considerations | 9. IANA Considerations | |||
This document defines a new BGP Capability - Graceful Restart | This document defines a new BGP capability - Graceful Restart | |||
Capability. The Capability Code for Graceful Restart Capability is | Capability. The Capability Code for Graceful Restart Capability is | |||
64. | 64. | |||
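For illustration, the capability could be encoded as sketched below. The field layout used here (4 bits of Restart Flags with the "Restart State" bit most significant, a 12-bit Restart Time, then AFI, SAFI, and a flags octet per address family with the "Forwarding State" bit most significant) comes from the capability definition earlier in the specification, not from this section. The function name and argument shapes are assumptions.

```python
import struct
from typing import Iterable, Tuple

GR_CAPABILITY_CODE = 64  # Capability Code assigned for Graceful Restart


def encode_gr_capability(
    restart_state: bool,
    restart_time: int,
    families: Iterable[Tuple[int, int, bool]],
) -> bytes:
    """Encode a Graceful Restart Capability as code, length, value.

    families yields (AFI, SAFI, forwarding_state) tuples.
    """
    if not 0 <= restart_time <= 0x0FFF:
        raise ValueError("Restart Time must fit in 12 bits")
    # Restart Flags occupy the top 4 bits; only the top bit is used here.
    head = (0x8000 if restart_state else 0) | restart_time
    value = struct.pack("!H", head)
    for afi, safi, forwarding_state in families:
        flags = 0x80 if forwarding_state else 0
        value += struct.pack("!HBB", afi, safi, flags)
    if len(value) > 255:
        raise ValueError("capability value exceeds one-octet length")
    return struct.pack("!BB", GR_CAPABILITY_CODE, len(value)) + value
```

For example, a restarted speaker advertising a Restart Time of 120 seconds with IPv4 unicast forwarding state preserved would call `encode_gr_capability(True, 120, [(1, 1, True)])`.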
12. Acknowledgments | 10. References | |||
The authors would like to thank Bruce Cole, Lars Eggert, Bill Fenner, | ||||
Eric Gray Jeffrey Haas, Sam Hartman Alvaro Retana, Pekka Savola | ||||
Naiming Shen, Satinder Singh, Mark Townsley, David Ward, Shane Wright | ||||
and Alex Zinin for their review and comments. | ||||
13. Normative References | 10.1. Normative References | |||
[BGP-4] Rekhter, Y., T. Li, Hares, S., "A Border Gateway Protocol 4 | [BGP-4] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway | |||
(BGP-4)", RFC4271, January 2006. | Protocol 4 (BGP-4)", RFC 4271, January 2006. | |||
[BGP-MP] Bates, T., Chandra, R., Katz, D., and Rekhter, Y., | [BGP-MP] Bates, T., Rekhter, Y., Chandra, R., and D. Katz, | |||
"Multiprotocol Extensions for BGP-4", RFC2858, June 2000. | "Multiprotocol Extensions for BGP-4", RFC 2858, June | |||
2000. | ||||
[BGP-CAP] Chandra, R., Scudder, J., "Capabilities Advertisement with | [BGP-CAP] Chandra, R. and J. Scudder, "Capabilities Advertisement | |||
BGP-4", RFC3392, November 2002. | with BGP-4", RFC 3392, November 2002. | |||
[BGP-AUTH] Heffernan A., "Protection of BGP Sessions via the TCP MD5 | [BGP-AUTH] Heffernan, A., "Protection of BGP Sessions via the TCP | |||
Signature Option", RFC 2385, August 1998. | MD5 Signature Option", RFC 2385, August 1998. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[IANA-AFI] http://www.iana.org/assignments/address-family-numbers. | [IANA-AFI] http://www.iana.org/assignments/address-family-numbers | |||
[IANA-SAFI] http://www.iana.org/assignments/safi-namespace. | [IANA-SAFI] http://www.iana.org/assignments/safi-namespace | |||
14. Non-normative References | 10.2. Informative References | |||
[BFD] Katz, D., Ward, D., "Bidirectional Forwarding Detection", | [BFD] Katz, D. and D. Ward, "Bidirectional Forwarding | |||
draft-ietf-bfd-base-03.txt, work in progress | Detection", Work in Progress. | |||
15. Author Information | Authors' Addresses | |||
Srihari R. Sangli | Srihari R. Sangli | |||
Cisco Systems, Inc. | Cisco Systems, Inc. | |||
EMail: rsrihari@cisco.com | EMail: rsrihari@cisco.com | |||
Yakov Rekhter | Yakov Rekhter | |||
Juniper Networks, Inc. | Juniper Networks, Inc. | |||
EMail: yakov@juniper.net | EMail: yakov@juniper.net | |||
Rex Fernando | Rex Fernando | |||
e-mail: rex_f@yahoo.com | Juniper Networks, Inc. | |||
John G. Scudder | EMail: rex@juniper.net | |||
Cisco Systems, Inc. | ||||
EMail: jgs@cisco.com | John G. Scudder | |||
Juniper Networks, Inc. | ||||
EMail: jgs@juniper.net | ||||
Enke Chen | Enke Chen | |||
Cisco Systems, Inc. | Cisco Systems, Inc. | |||
EMail: enkechen@cisco.com | EMail: enkechen@cisco.com | |||
Full Copyright Statement | ||||
Copyright (C) The IETF Trust (2007). | ||||
This document is subject to the rights, licenses and restrictions | ||||
contained in BCP 78, and except as set forth therein, the authors | ||||
retain all their rights. | ||||
This document and the information contained herein are provided on an | ||||
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | ||||
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND | ||||
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS | ||||
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF | ||||
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | ||||
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
Intellectual Property | ||||
The IETF takes no position regarding the validity or scope of any | ||||
Intellectual Property Rights or other rights that might be claimed to | ||||
pertain to the implementation or use of the technology described in | ||||
this document or the extent to which any license under such rights | ||||
might or might not be available; nor does it represent that it has | ||||
made any independent effort to identify any such rights. Information | ||||
on the procedures with respect to rights in RFC documents can be | ||||
found in BCP 78 and BCP 79. | ||||
Copies of IPR disclosures made to the IETF Secretariat and any | ||||
assurances of licenses to be made available, or the result of an | ||||
attempt made to obtain a general license or permission for the use of | ||||
such proprietary rights by implementers or users of this | ||||
specification can be obtained from the IETF on-line IPR repository at | ||||
http://www.ietf.org/ipr. | ||||
The IETF invites any interested party to bring to its attention any | ||||
copyrights, patents or patent applications, or other proprietary | ||||
rights that may cover technology that may be required to implement | ||||
this standard. Please address the information to the IETF at | ||||
ietf-ipr@ietf.org. | ||||
Acknowledgement | ||||
Funding for the RFC Editor function is currently provided by the | ||||
Internet Society. | ||||
End of changes. 97 change blocks. | ||||
246 lines changed or deleted | 219 lines changed or added | |||