draft-ietf-idr-restart-13.txt   rfc4724.txt 
Network Working Group Srihari R. Sangli Network Working Group S. Sangli
Internet Draft Yakov Rekhter Request for Comments: 4724 E. Chen
Expiration Date: January 2007 Rex Fernando Category: Standards Track Cisco Systems
John G. Scudder R. Fernando
Enke Chen J. Scudder
Y. Rekhter
Juniper Networks
January 2007
Graceful Restart Mechanism for BGP Graceful Restart Mechanism for BGP
draft-ietf-idr-restart-13.txt Status of This Memo
Status of this Memo
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress".
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
IPR Disclosure Acknowledgement
By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The IETF Trust (2007).
Abstract Abstract
This document describes a mechanism for BGP that would help minimize This document describes a mechanism for BGP that would help minimize
the negative effects on routing caused by BGP restart. An End-of-RIB the negative effects on routing caused by BGP restart. An End-of-RIB
marker is specified and can be used to convey routing convergence marker is specified and can be used to convey routing convergence
information. A new BGP capability, termed "Graceful Restart information. A new BGP capability, termed "Graceful Restart
Capability", is defined which would allow a BGP speaker to express Capability", is defined that would allow a BGP speaker to express its
its ability to preserve forwarding state during BGP restart. Finally, ability to preserve forwarding state during BGP restart. Finally,
procedures are outlined for temporarily retaining routing information procedures are outlined for temporarily retaining routing information
across a TCP session termination/re-establishment. across a TCP session termination/re-establishment.
The mechanisms described in this document are applicable to all The mechanisms described in this document are applicable to all
routers, both those with the ability to preserve forwarding state routers, both those with the ability to preserve forwarding state
during BGP restart and those without (although the latter need to during BGP restart and those without (although the latter need to
implement only a subset of the mechanisms described in this implement only a subset of the mechanisms described in this
document). document).
1. Specification of Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC2119 [RFC2119].
Table of Contents
1. Introduction ....................................................2
1.1. Specification of Requirements ..............................2
2. Marker for End-of-RIB ...........................................3
3. Graceful Restart Capability .....................................3
4. Operation .......................................................6
4.1. Procedures for the Restarting Speaker ......................6
4.2. Procedures for the Receiving Speaker .......................7
5. Changes to BGP Finite State Machine .............................9
6. Deployment Considerations ......................................11
7. Security Considerations ........................................12
8. Acknowledgments ................................................13
9. IANA Considerations ............................................13
10. References ....................................................13
10.1. Normative References .....................................13
10.2. Informative References ...................................13
2. Introduction 1. Introduction
Usually when BGP on a router restarts, all the BGP peers detect that Usually, when BGP on a router restarts, all the BGP peers detect that
the session went down, and then came up. This "down/up" transition the session went down and then came up. This "down/up" transition
results in a "routing flap" and causes BGP route re-computation, results in a "routing flap" and causes BGP route re-computation,
generation of BGP routing updates and flap the forwarding tables. It generation of BGP routing updates, and unnecessary churn to the
could spread across multiple routing domains. Such routing flaps may forwarding tables. It could spread across multiple routing domains.
create transient forwarding blackholes and/or transient forwarding Such routing flaps may create transient forwarding blackholes and/or
loops. They also consume resources on the control plane of the transient forwarding loops. They also consume resources on the
routers affected by the flap. As such they are detrimental to the control plane of the routers affected by the flap. As such, they are
overall network performance. detrimental to the overall network performance.
This document describes a mechanism for BGP that would help minimize This document describes a mechanism for BGP that would help minimize
the negative effects on routing caused by BGP restart. An End-of-RIB the negative effects on routing caused by BGP restart. An End-of-RIB
marker is specified and can be used to convey routing convergence marker is specified and can be used to convey routing convergence
information. A new BGP capability, termed "Graceful Restart information. A new BGP capability, termed "Graceful Restart
Capability", is defined which would allow a BGP speaker to express Capability", is defined that would allow a BGP speaker to express its
its ability to preserve forwarding state during BGP restart. Finally, ability to preserve forwarding state during BGP restart. Finally,
procedures are outlined for temporarily retaining routing information procedures are outlined for temporarily retaining routing information
across a TCP session termination/re-establishment. across a TCP session termination/re-establishment.
3. Marker for End-of-RIB
An UPDATE message with no reachable NLRI and empty withdrawn NLRI is
specified as the End-Of-RIB Marker that can be used by a BGP speaker
to indicate to its peer the completion of the initial routing update
after the session is established. For IPv4 unicast address family,
the End-Of-RIB Marker is an UPDATE message with the minimum length
[BGP-4]. For any other address family, it is an UPDATE message that
contains only the MP_UNREACH_NLRI attribute [BGP-MP] with no
withdrawn routes for that <AFI, SAFI>.
1.1 Specification of Requirements
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. Marker for End-of-RIB
An UPDATE message with no reachable Network Layer Reachability
Information (NLRI) and empty withdrawn NLRI is specified as the End-
of-RIB marker that can be used by a BGP speaker to indicate to its
peer the completion of the initial routing update after the session
is established. For the IPv4 unicast address family, the End-of-RIB
marker is an UPDATE message with the minimum length [BGP-4]. For any
other address family, it is an UPDATE message that contains only the
MP_UNREACH_NLRI attribute [BGP-MP] with no withdrawn routes for that
<AFI, SAFI>.
Although the End-of-RIB Marker is specified for the purpose of BGP Although the End-of-RIB marker is specified for the purpose of BGP
graceful restart, it is noted that the generation of such a marker graceful restart, it is noted that the generation of such a marker
upon completion of the initial update would be useful for routing upon completion of the initial update would be useful for routing
convergence in general, and thus the practice is recommended. convergence in general, and thus the practice is recommended.
In addition, it would be beneficial for routing convergence if a BGP In addition, it would be beneficial for routing convergence if a BGP
speaker can indicate to its peer up-front that it will generate the speaker can indicate to its peer up-front that it will generate the
End-Of-RIB marker, regardless of its ability to preserve its End-of-RIB marker, regardless of its ability to preserve its
forwarding state during BGP restart. This can be accomplished using forwarding state during BGP restart. This can be accomplished using
the Graceful Restart Capability described in the next section. the Graceful Restart Capability described in the next section.
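As an illustration of the two encodings described in this section, the
following Python sketch builds an End-of-RIB marker for a given
<AFI, SAFI>. The function name end_of_rib and the constant names are
illustrative and do not appear in either document. The 19-octet message
header and the UPDATE type code are taken from [BGP-4], and the
MP_UNREACH_NLRI type code and optional-attribute flag from [BGP-MP]; the
sketch assumes the one-octet attribute length form.

```python
import struct

BGP_MARKER = b"\xff" * 16     # all-ones marker field of the BGP-4 message header
MSG_UPDATE = 2                # BGP-4 message type code for UPDATE
ATTR_FLAG_OPTIONAL = 0x80     # optional, non-transitive path attribute
ATTR_MP_UNREACH_NLRI = 15     # type code from the multiprotocol extensions


def end_of_rib(afi=1, safi=1):
    """Return the octets of an End-of-RIB marker for <afi, safi>."""
    if (afi, safi) == (1, 1):
        # IPv4 unicast: minimum-length UPDATE, i.e. zero-length withdrawn
        # routes and zero-length path attributes, and no NLRI.
        body = struct.pack("!HH", 0, 0)
    else:
        # Any other family: a single MP_UNREACH_NLRI attribute that carries
        # only the AFI and SAFI (3 octets) and no withdrawn routes.
        attr = struct.pack("!BBBHB", ATTR_FLAG_OPTIONAL,
                           ATTR_MP_UNREACH_NLRI, 3, afi, safi)
        body = struct.pack("!H", 0) + struct.pack("!H", len(attr)) + attr
    length = 19 + len(body)   # 19 octets of header precede the body
    return BGP_MARKER + struct.pack("!HB", length, MSG_UPDATE) + body
```

Under these assumptions, the IPv4 unicast marker is 23 octets long and
the marker for any other family is 29 octets long.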
4. Graceful Restart Capability 3. Graceful Restart Capability
The Graceful Restart Capability is a new BGP capability [BGP-CAP] The Graceful Restart Capability is a new BGP capability [BGP-CAP]
that can be used by a BGP speaker to indicate its ability to preserve that can be used by a BGP speaker to indicate its ability to preserve
its forwarding state during BGP restart. It can also be used to its forwarding state during BGP restart. It can also be used to
convey to its peer its intention of generating the End-Of-RIB marker convey to its peer its intention of generating the End-of-RIB marker
upon the completion of its initial routing updates. upon the completion of its initial routing updates.
This capability is defined as follows: This capability is defined as follows:
Capability code: 64 Capability code: 64
Capability length: variable Capability length: variable
Capability value: Consists of the "Restart Flags" field, "Restart Capability value: Consists of the "Restart Flags" field, "Restart
Time" field, and 0 to 63 of the tuples <AFI, SAFI, Flags for Time" field, and 0 to 63 of the tuples <AFI, SAFI, Flags for
skipping to change at line 153 skipping to change at page 4, line 37
Restart Flags: Restart Flags:
This field contains bit flags related to restart. This field contains bit flags related to restart.
0 1 2 3 0 1 2 3
+-+-+-+-+ +-+-+-+-+
|R|Resv.| |R|Resv.|
+-+-+-+-+ +-+-+-+-+
The most significant bit is defined as the Restart State (R) The most significant bit is defined as the Restart State (R)
bit which can be used to avoid possible deadlock caused by bit, which can be used to avoid possible deadlock caused by
waiting for the End-of-RIB marker when multiple BGP speakers waiting for the End-of-RIB marker when multiple BGP speakers
peering with each other restart. When set (value 1), this bit peering with each other restart. When set (value 1), this bit
indicates that the BGP speaker has restarted, and its peer MUST indicates that the BGP speaker has restarted, and its peer MUST
NOT wait for the End-of-RIB marker from the speaker before NOT wait for the End-of-RIB marker from the speaker before
advertising routing information to the speaker. advertising routing information to the speaker.
The remaining bits are reserved, and MUST be set to zero by the The remaining bits are reserved and MUST be set to zero by the
sender and ignored by the receiver. sender and ignored by the receiver.
Restart Time: Restart Time:
This is the estimated time (in seconds) it will take for the This is the estimated time (in seconds) it will take for the
BGP session to be re-established after a restart. This can be BGP session to be re-established after a restart. This can be
used to speed up routing convergence by its peer in case that used to speed up routing convergence by its peer in case that
the BGP speaker does not come back after a restart. the BGP speaker does not come back after a restart.
Address Family Identifier (AFI):
This field carries the identity of the Network Layer protocol
for which the Graceful Restart support is advertised. Presently
defined values for this field are specified in [IANA-AFI].
Subsequent Address Family Identifier (SAFI):
This field provides additional information about the type of
the Network Layer Reachability Information carried in the
attribute. Presently defined values for this field are
specified in [IANA-SAFI].
Address Family Identifier (AFI), Subsequent Address Family
Identifier (SAFI):
The AFI and SAFI, taken in combination, indicate that Graceful
Restart is supported for routes that are advertised with the
same AFI and SAFI. Routes may be explicitly associated with a
particular AFI and SAFI using the encoding of [BGP-MP] or
implicitly associated with <AFI=IPv4, SAFI=Unicast> if using
the encoding of [BGP-4].
Flags for Address Family: Flags for Address Family:
This field contains bit flags for the <AFI, SAFI>. This field contains bit flags relating to routes that were
advertised with the given AFI and SAFI.
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
|F| Reserved | |F| Reserved |
+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+
The most significant bit is defined as the Forwarding State (F) The most significant bit is defined as the Forwarding State (F)
bit which can be used to indicate if the forwarding state for bit, which can be used to indicate whether the forwarding state
the <AFI, SAFI> has indeed been preserved during the previous for routes that were advertised with the given AFI and SAFI has
BGP restart. When set (value 1), the bit indicates that the indeed been preserved during the previous BGP restart. When
forwarding state has been preserved. set (value 1), the bit indicates that the forwarding state has
been preserved.
The remaining bits are reserved, and MUST be set to zero by the The remaining bits are reserved and MUST be set to zero by the
sender and ignored by the receiver. sender and ignored by the receiver.
When a sender of this capability doesn't include any <AFI, SAFI> in When a sender of this capability does not include any <AFI, SAFI> in
the capability, it means that the sender is not capable of preserving the capability, it means that the sender is not capable of preserving
its forwarding state during BGP restart, but supports procedures for its forwarding state during BGP restart, but supports procedures for
the Receiving Speaker (as defined in Section 5.2 of this document). the Receiving Speaker (as defined in Section 4.2 of this document).
In that case the value of the "Restart Time" field advertised by the In that case, the value of the "Restart Time" field advertised by the
sender is irrelevant. sender is irrelevant.
A BGP speaker MUST NOT include more than one instance of the Graceful A BGP speaker MUST NOT include more than one instance of the Graceful
Restart Capability in the capability advertisement [BGP-CAP]. If Restart Capability in the capability advertisement [BGP-CAP]. If
more than one instance of the Graceful Restart Capability is carried more than one instance of the Graceful Restart Capability is carried
in the capability advertisement, the receiver of the advertisement in the capability advertisement, the receiver of the advertisement
MUST ignore all but the last instance of the Graceful Restart MUST ignore all but the last instance of the Graceful Restart
Capability. Capability.
Including <AFI=IPv4, SAFI=unicast> into the Graceful Restart Including <AFI=IPv4, SAFI=unicast> in the Graceful Restart Capability
Capability doesn't imply that the IPv4 unicast routing information does not imply that the IPv4 unicast routing information should be
should be carried by using the BGP Multiprotocol extensions [BGP-MP] carried by using the BGP multiprotocol extensions [BGP-MP] -- it
- it could be carried in the NLRI field of the BGP UPDATE message. could be carried in the NLRI field of the BGP UPDATE message.
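The capability value described in this section can be encoded and
decoded roughly as follows. This is a minimal sketch, not a reference
implementation: the class and function names are hypothetical, and the
field widths not visible in the excerpt above (a 12-bit Restart Time
sharing two octets with the 4-bit Restart Flags, a 16-bit AFI, and an
8-bit SAFI) are assumed from the published capability layout. Reserved
bits are written as zero and masked out on receipt, and only the last
instance of the capability is used when several are present.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class GracefulRestartCapability:
    restarted: bool = False        # Restart State (R) bit
    restart_time: int = 0          # seconds; assumed to be a 12-bit field
    # (afi, safi) -> True if the Forwarding State (F) bit is set
    families: dict = field(default_factory=dict)


def encode(cap):
    """Serialize the capability value (without capability code and length)."""
    if not 0 <= cap.restart_time <= 0x0FFF:
        raise ValueError("Restart Time does not fit in 12 bits")
    if len(cap.families) > 63:
        raise ValueError("at most 63 <AFI, SAFI> tuples are allowed")
    head = (0x8000 if cap.restarted else 0) | cap.restart_time
    out = struct.pack("!H", head)
    for (afi, safi), preserved in cap.families.items():
        out += struct.pack("!HBB", afi, safi, 0x80 if preserved else 0)
    return out


def decode(value):
    """Parse a capability value, ignoring all reserved bits."""
    if len(value) < 2 or (len(value) - 2) % 4:
        raise ValueError("malformed Graceful Restart Capability value")
    (head,) = struct.unpack_from("!H", value, 0)
    cap = GracefulRestartCapability(bool(head & 0x8000), head & 0x0FFF)
    for offset in range(2, len(value), 4):
        afi, safi, flags = struct.unpack_from("!HBB", value, offset)
        cap.families[(afi, safi)] = bool(flags & 0x80)
    return cap


def effective_capability(instances):
    """Given raw values in the order received, use only the last one."""
    return decode(instances[-1]) if instances else None
```

A value with no tuples decodes to an empty families mapping, which
corresponds to a sender that only supports the procedures for the
Receiving Speaker.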
5. Operation 4. Operation
A BGP speaker MAY advertise the Graceful Restart Capability for an A BGP speaker MAY advertise the Graceful Restart Capability for an
address family to its peer if it has the ability to preserve its address family to its peer if it has the ability to preserve its
forwarding state for the address family when BGP restarts. In forwarding state for the address family when BGP restarts. In
addition, even if the speaker does not have the ability to preserve addition, even if the speaker does not have the ability to preserve
its forwarding state for any address family during BGP restart, it is its forwarding state for any address family during BGP restart, it is
still recommended that the speaker advertise the Graceful Restart still recommended that the speaker advertise the Graceful Restart
Capability to its peer (as mentioned before this is done by not Capability to its peer (as mentioned before this is done by not
including any <AFI, SAFI> in the advertised capability). There are including any <AFI, SAFI> in the advertised capability). There are
two reasons for doing this. First, to indicate its intention of two reasons for doing this. The first is to indicate its intention
generating the End-of-RIB marker upon the completion of its initial of generating the End-of-RIB marker upon the completion of its
routing updates, as doing this would be useful for routing initial routing updates, as doing this would be useful for routing
convergence in general. Second, to indicate its support for a peer convergence in general. The second is to indicate its support for a
which wishes to perform a graceful restart. peer which wishes to perform a graceful restart.
The End-of-RIB marker MUST be sent by a BGP speaker to its peer once The End-of-RIB marker MUST be sent by a BGP speaker to its peer once
it completes the initial routing update (including the case when it completes the initial routing update (including the case when
there is no update to send) for an address family after the BGP there is no update to send) for an address family after the BGP
session is established. session is established.
It is noted that the normal BGP procedures MUST be followed when the It is noted that the normal BGP procedures MUST be followed when the
TCP session terminates due to the sending or receiving of a BGP TCP session terminates due to the sending or receiving of a BGP
NOTIFICATION message. NOTIFICATION message.
skipping to change at line 259 skipping to change at page 6, line 44
whose BGP has restarted, and "Receiving Speaker" refers to a router whose BGP has restarted, and "Receiving Speaker" refers to a router
that peers with the restarting speaker. that peers with the restarting speaker.
Consider that the Graceful Restart Capability for an address family Consider that the Graceful Restart Capability for an address family
is advertised by the Restarting Speaker, and is understood by the is advertised by the Restarting Speaker, and is understood by the
Receiving Speaker, and a BGP session between them is established. Receiving Speaker, and a BGP session between them is established.
The following sections detail the procedures that MUST be followed by The following sections detail the procedures that MUST be followed by
the Restarting Speaker as well as the Receiving Speaker once the the Restarting Speaker as well as the Receiving Speaker once the
Restarting Speaker restarts. Restarting Speaker restarts.
5.1. Procedures for the Restarting Speaker 4.1. Procedures for the Restarting Speaker
When the Restarting Speaker restarts, it MUST retain, if possible, When the Restarting Speaker restarts, it MUST retain, if possible,
the forwarding state for the BGP routes in the Loc-RIB, and MUST mark the forwarding state for the BGP routes in the Loc-RIB and MUST mark
them as stale. It MUST NOT differentiate between stale and other them as stale. It MUST NOT differentiate between stale and other
information during forwarding. information during forwarding.
To re-establish the session with its peer, the Restarting Speaker To re-establish the session with its peer, the Restarting Speaker
MUST set the "Restart State" bit in the Graceful Restart Capability MUST set the "Restart State" bit in the Graceful Restart Capability
of the OPEN message. Unless allowed via configuration, the of the OPEN message. Unless allowed via configuration, the
"Forwarding State" bit for an address family in the capability can be "Forwarding State" bit for an address family in the capability can be
set only if the forwarding state has indeed been preserved for that set only if the forwarding state has indeed been preserved for that
address family during the restart. address family during the restart.
Once the session between the Restarting Speaker and the Receiving Once the session between the Restarting Speaker and the Receiving
Speaker is re-established, the Restarting Speaker will receive and Speaker is re-established, the Restarting Speaker will receive and
process BGP messages from its peers. However, it MUST defer route process BGP messages from its peers. However, it MUST defer route
selection for an address family until it either (a) receives the End- selection for an address family until it either (a) receives the
of-RIB marker from all its peers (excluding the ones with the End-of-RIB marker from all its peers (excluding the ones with the
"Restart State" bit set in the received capability and excluding the "Restart State" bit set in the received capability and excluding the
ones which do not advertise the graceful restart capability) or (b) ones that do not advertise the graceful restart capability) or (b)
the Selection_Deferral_Timer referred to below has expired. It is the Selection_Deferral_Timer referred to below has expired. It is
noted that prior to route selection, the speaker has no routes to noted that prior to route selection, the speaker has no routes to
advertise to its peers and no routes to update the forwarding state. advertise to its peers and no routes to update the forwarding state.
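The deferral rule above can be sketched as a single predicate evaluated
per address family. The function name and the shape of its arguments
are hypothetical; the sketch only illustrates which peers the
Restarting Speaker waits for, under the assumption that the caller
tracks received End-of-RIB markers and the Selection_Deferral_Timer
itself.

```python
def may_run_route_selection(peers, eor_received, timer_expired):
    """Decide whether route selection for one address family may start.

    peers:         dict mapping a peer id to a pair
                   (advertised_graceful_restart, restart_state_bit_set)
    eor_received:  set of peer ids whose End-of-RIB marker for this
                   address family has been received
    timer_expired: True once the Selection_Deferral_Timer has expired
    """
    if timer_expired:
        return True
    # Peers that did not advertise the capability, or that set the
    # Restart State bit, are excluded from the set being waited on.
    awaited = {peer for peer, (advertised, r_bit) in peers.items()
               if advertised and not r_bit}
    return awaited <= eor_received
```

Excluding peers with the Restart State bit set is what prevents two
simultaneously restarting speakers from waiting on each other.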
In situations where both IGP and BGP have restarted, it might be In situations where both Interior Gateway Protocol (IGP) and BGP have
advantageous to wait for IGP to converge before the BGP speaker restarted, it might be advantageous to wait for IGP to converge
performs route selection. before the BGP speaker performs route selection.
After the BGP speaker performs route selection, the forwarding state After the BGP speaker performs route selection, the forwarding state
of the speaker MUST be updated and any previously marked stale of the speaker MUST be updated and any previously marked stale
information MUST be removed. The Adj-RIB-Out can then be advertised information MUST be removed. The Adj-RIB-Out can then be advertised
to its peers. Once the initial update is complete for an address to its peers. Once the initial update is complete for an address
family (including the case that there is no routing update to send), family (including the case that there is no routing update to send),
the End-of-RIB marker MUST be sent. the End-of-RIB marker MUST be sent.
To put an upper bound on the amount of time a router defers its route To put an upper bound on the amount of time a router defers its route
selection, an implementation MUST support a (configurable) timer that selection, an implementation MUST support a (configurable) timer that
imposes this upper bound. This timer is referred to as the imposes this upper bound. This timer is referred to as the
"Selection_Deferral_Timer". The value of this timer should be large "Selection_Deferral_Timer". The value of this timer should be large
enough, as to provide all the peers of the Restarting Speaker with enough, so as to provide all the peers of the Restarting Speaker with
enough time to send all the routes to the Restarting Speaker. enough time to send all the routes to the Restarting Speaker.
If one wants to apply graceful restart only when the restart is If one wants to apply graceful restart only when the restart is
planned (as opposed to both planned and unplanned restart), then one planned (as opposed to both planned and unplanned restart), then one
way to accomplish this would be to set the Forwarding State bit to 1 way to accomplish this would be to set the Forwarding State bit to 1
after a planned restart, and to 0 in all other cases. Other after a planned restart, and to 0 in all other cases. Other
approaches to accomplish this are outside the scope of this document. approaches to accomplish this are outside the scope of this document.
5.2. Procedures for the Receiving Speaker 4.2. Procedures for the Receiving Speaker
When the Restarting Speaker restarts, the Receiving Speaker may or When the Restarting Speaker restarts, the Receiving Speaker may or
may not detect the termination of the TCP session with the Restarting may not detect the termination of the TCP session with the Restarting
Speaker, depending on the underlying TCP implementation, whether or Speaker, depending on the underlying TCP implementation, whether or
not [BGP-AUTH] is in use, and the specific circumstances of the not [BGP-AUTH] is in use, and the specific circumstances of the
restart. In case it does not detect the termination of the old TCP restart. In case it does not detect the termination of the old TCP
session and still considers the BGP session as being established, it session and still considers the BGP session as being established, it
MUST treat the subsequent open connection from the peer as an MUST treat the subsequent open connection from the peer as an
indication of the termination of the old TCP session and act indication of the termination of the old TCP session and act
accordingly (when the Graceful Restart Capability has been received accordingly (when the Graceful Restart Capability has been received
from the peer). See Section 6 for a description of this behavior in from the peer). See Section 5 for a description of this behavior in
terms of the BGP finite state machine. terms of the BGP finite state machine.
"Acting accordingly" in this context means that the previous TCP "Acting accordingly" in this context means that the previous TCP
session MUST be closed, and the new one retained. Note that this session MUST be closed, and the new one retained. Note that this
behavior differs from the default behavior, as specified in [BGP-4] behavior differs from the default behavior, as specified in [BGP-4],
section 6.8. Since the previous connection is considered to be Section 6.8. Since the previous connection is considered to be
terminated, no NOTIFICATION message should be sent -- the previous terminated, no NOTIFICATION message should be sent -- the previous
TCP session is simply closed. TCP session is simply closed.
When the Receiving Speaker detects termination of the TCP session for When the Receiving Speaker detects termination of the TCP session for
a BGP session with a peer that has advertised the Graceful Restart a BGP session with a peer that has advertised the Graceful Restart
Capability, it MUST retain the routes received from the peer for all Capability, it MUST retain the routes received from the peer for all
the address families that were previously received in the Graceful the address families that were previously received in the Graceful
Restart Capability, and MUST mark them as stale routing information. Restart Capability and MUST mark them as stale routing information.
To deal with possible consecutive restarts, a route (from the peer) To deal with possible consecutive restarts, a route (from the peer)
previously marked as stale MUST be deleted. The router MUST NOT previously marked as stale MUST be deleted. The router MUST NOT
differentiate between stale and other routing information during differentiate between stale and other routing information during
forwarding. forwarding.
In re-establishing the session, the "Restart State" bit in the In re-establishing the session, the "Restart State" bit in the
Graceful Restart Capability of the OPEN message sent by the Receiving Graceful Restart Capability of the OPEN message sent by the Receiving
Speaker MUST NOT be set unless the Receiving Speaker has restarted. Speaker MUST NOT be set unless the Receiving Speaker has restarted.
The presence and the setting of the "Forwarding State" bit for an The presence and the setting of the "Forwarding State" bit for an
address family depends upon the actual forwarding state and address family depend upon the actual forwarding state and
configuration. configuration.
If the session does not get re-established within the "Restart Time" If the session does not get re-established within the "Restart Time"
that the peer advertised previously, the Receiving Speaker MUST that the peer advertised previously, the Receiving Speaker MUST
delete all the stale routes from the peer that it is retaining. delete all the stale routes from the peer that it is retaining.
A BGP speaker could have some way of determining whether its peer's A BGP speaker could have some way of determining whether its peer's
forwarding state is still viable, for example through [BFD] or forwarding state is still viable, for example through Bidirectional
through monitoring layer two information. Specifics of such Forwarding Detection [BFD] or through monitoring layer two
mechanisms are beyond the scope of this document. In the event that information. Specifics of such mechanisms are beyond the scope of
it determines that its peer's forwarding state is not viable prior to this document. In the event that it determines that its peer's
the re-establishment of the session, the speaker MAY delete all the forwarding state is not viable prior to the re-establishment of the
stale routes from the peer that it is retaining. session, the speaker MAY delete all the stale routes from the peer
that it is retaining.
Once the session is re-established, if the "Forwarding State" bit for Once the session is re-established, if the "Forwarding State" bit for
a specific address family is not set in the newly received Graceful a specific address family is not set in the newly received Graceful
Restart Capability, or if a specific address family is not included Restart Capability, or if a specific address family is not included
in the newly received Graceful Restart Capability, or if the Graceful in the newly received Graceful Restart Capability, or if the Graceful
Restart Capability isn't received in the re-established session at Restart Capability is not received in the re-established session at
all, then Receiving Speaker MUST immediately remove all the stale all, then the Receiving Speaker MUST immediately remove all the stale
routes from the peer that it is retaining for that address family. routes from the peer that it is retaining for that address family.
The Receiving Speaker MUST send the End-of-RIB marker once it The Receiving Speaker MUST send the End-of-RIB marker once it
completes the initial update for an address family (including the completes the initial update for an address family (including the
case that it has no routes to send) to the peer. case that it has no routes to send) to the peer.
The Receiving Speaker MUST replace the stale routes by the routing The Receiving Speaker MUST replace the stale routes by the routing
updates received from the peer. Once the End-of-RIB marker for an updates received from the peer. Once the End-of-RIB marker for an
address family is received from the peer, it MUST immediately remove address family is received from the peer, it MUST immediately remove
any routes from the peer that are still marked as stale for that any routes from the peer that are still marked as stale for that
address family. address family.
To put an upper bound on the amount of time a router retains the To put an upper bound on the amount of time a router retains the
stale routes, an implementation MAY support a (configurable) timer stale routes, an implementation MAY support a (configurable) timer
that imposes this upper bound. that imposes this upper bound.
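
The stale-route handling described above for the Receiving Speaker can be sketched roughly as follows. This is a minimal illustration, not part of either compared document: the class name PeerRib, its methods, and the dictionary layout are assumptions made for the example, and the optional upper-bound timer is left out.

```python
from dataclasses import dataclass, field


@dataclass
class PeerRib:
    """Hypothetical per-peer route store, keyed by (AFI, SAFI) and then prefix."""
    routes: dict = field(default_factory=dict)  # {(afi, safi): {prefix: attrs}}
    stale: dict = field(default_factory=dict)   # {(afi, safi): set of stale prefixes}

    def mark_all_stale(self):
        # Called when the session goes down and routes are retained.
        for fam, table in self.routes.items():
            self.stale[fam] = set(table)

    def flush_stale(self, fam):
        # Remove every route still marked stale for this address family.
        for prefix in self.stale.pop(fam, set()):
            self.routes.get(fam, {}).pop(prefix, None)

    def on_session_reestablished(self, gr_families):
        # gr_families maps (afi, safi) -> "Forwarding State" bit taken from the
        # newly received Graceful Restart Capability, or is None when the
        # capability was not received at all.
        gr_families = gr_families or {}
        for fam in list(self.routes):
            if not gr_families.get(fam, False):
                self.flush_stale(fam)

    def on_update(self, fam, prefix, attrs):
        # A fresh update replaces the stale route for the same prefix.
        self.routes.setdefault(fam, {})[prefix] = attrs
        self.stale.get(fam, set()).discard(prefix)

    def on_end_of_rib(self, fam):
        # End-of-RIB received: anything still stale for this family goes.
        self.flush_stale(fam)
```

A family is flushed on re-establishment whenever its "Forwarding State" bit is clear, the family is missing from the capability, or the capability is absent; the single gr_families.get(fam, False) lookup is meant to cover all three cases.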
6. Changes to BGP Finite State Machine 5. Changes to BGP Finite State Machine
As mentioned under "Procedures for the Receiving Speaker" above, this As mentioned under "Procedures for the Receiving Speaker" above, this
specification modifies the BGP finite state machine. specification modifies the BGP finite state machine.
The specific state machine modifications to [BGP-4] Section 8.2.2 are The specific state machine modifications to [BGP-4], Section 8.2.2,
as follows. are as follows.
In the Idle state, make the following changes. In the Idle state, make the following changes.
Replace this text: Replace this text:
- initializes all BGP resources for the peer connection, - initializes all BGP resources for the peer connection,
with with
- initializes all BGP resources for the peer connection, other - initializes all BGP resources for the peer connection, other
than those resources required in order to retain routes according than those resources required in order to retain routes
to section "Procedures for the Receiving Speaker" of this according to section "Procedures for the Receiving Speaker" of
(Graceful Restart) specification, this (Graceful Restart) specification,
In the Established state, make the following changes. In the Established state, make the following changes.
Replace this text: Replace this text:
In response to an indication that the TCP connection is In response to an indication that the TCP connection is
successfully established (Event 16 or Event 17), the second successfully established (Event 16 or Event 17), the second
connection SHALL be tracked until it sends an OPEN message. connection SHALL be tracked until it sends an OPEN message.
with with
If the Graceful Restart Capability with one or more AFIs/SAFIs
has not been received for the session, then in response to an
indication that a TCP connection is successfully established
(Event 16 or Event 17), the second connection SHALL be tracked
until it sends an OPEN message.
If the Graceful Restart capability with one or more AFI/SAFI has However, if the Graceful Restart Capability with one or more
not been received for the session, then in response to an AFIs/SAFIs has been received for the session, then in response
indication that a TCP connection is successfully established to Event 16 or Event 17 the local system:
(Event 16 or Event 17), the second connection SHALL be tracked
until it sends an OPEN message.
However, if the Graceful Restart capability with one or more
AFI/SAFI has been received for the session, then in response to
Event 16 or Event 17 the local system:
- retains all routes associated with this connection according - retains all routes associated with this connection according
to section "Procedures for the Receiving Speaker" of this to section "Procedures for the Receiving Speaker" of this
(Graceful Restart) specification, (Graceful Restart) specification,
- releases all other BGP resources, - releases all other BGP resources,
- drops the TCP connection associated with the ESTABLISHED - drops the TCP connection associated with the ESTABLISHED
session, session,
- initializes all BGP resources for the peer connection, other - initializes all BGP resources for the peer connection, other
than those required in order to retain routes according to than those required in order to retain routes according to
section "Procedures for the Receiving Speaker" of this section "Procedures for the Receiving Speaker" of this
specification, specification,
- sets ConnectRetryCounter to zero, - sets ConnectRetryCounter to zero,
- starts the ConnectRetryTimer with the initial value, - starts the ConnectRetryTimer with the initial value, and
- changes its state to Connect. - changes its state to Connect.
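
As a rough sketch of the replacement text above for Event 16 or Event 17 in the Established state: the names EstablishedSession, on_new_tcp_connection, and CONNECT_RETRY_INITIAL are hypothetical, and the 120-second initial timer value is an assumption chosen only for illustration. Releasing and re-initializing other BGP resources is not modeled.

```python
from dataclasses import dataclass, field

# Assumed initial ConnectRetryTimer value in seconds (illustrative only).
CONNECT_RETRY_INITIAL = 120


@dataclass
class EstablishedSession:
    """Hypothetical model of the session state touched by Event 16/17."""
    gr_capability_received: bool  # GR Capability with one or more AFIs/SAFIs seen
    state: str = "Established"
    connect_retry_counter: int = 0
    connect_retry_timer: int = 0
    routes: set = field(default_factory=set)
    old_tcp_open: bool = True
    tracking_second_connection: bool = False


def on_new_tcp_connection(s: EstablishedSession) -> EstablishedSession:
    """Handle a successfully established second TCP connection."""
    if not s.gr_capability_received:
        # Unmodified behavior: track the second connection until it sends OPEN.
        s.tracking_second_connection = True
        return s
    # Graceful Restart case: routes are retained (s.routes is left untouched).
    s.old_tcp_open = False                          # drop the old TCP connection
    s.connect_retry_counter = 0                     # set ConnectRetryCounter to zero
    s.connect_retry_timer = CONNECT_RETRY_INITIAL   # start ConnectRetryTimer
    s.state = "Connect"
    return s
```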
Replace this text: Replace this text:
If the local system receives a NOTIFICATION message (Event 24 or If the local system receives a NOTIFICATION message (Event 24 or
Event 25), or a TcpConnectionFails (Event 18) from the underlying Event 25), or a TcpConnectionFails (Event 18) from the underlying
TCP, the local system: TCP, the local system:
- sets the ConnectRetryTimer to zero, - sets the ConnectRetryTimer to zero,
skipping to change at line 464 skipping to change at page 11, line 10
- increments the ConnectRetryCounter by 1, - increments the ConnectRetryCounter by 1,
- changes its state to Idle. - changes its state to Idle.
with with
If the local system receives a NOTIFICATION message (Event 24 or If the local system receives a NOTIFICATION message (Event 24 or
Event 25), or if the local system receives a TcpConnectionFails Event 25), or if the local system receives a TcpConnectionFails
(Event 18) from the underlying TCP and the Graceful Restart (Event 18) from the underlying TCP and the Graceful Restart
capability with one or more AFI/SAFI has not been received for the capability with one or more AFIs/SAFIs has not been received for
session, the local system: the session, the local system:
- sets the ConnectRetryTimer to zero, - sets the ConnectRetryTimer to zero,
- deletes all routes associated with this connection, - deletes all routes associated with this connection,
- releases all the BGP resources, - releases all the BGP resources,
- drops the TCP connection, - drops the TCP connection,
- increments the ConnectRetryCounter by 1, - increments the ConnectRetryCounter by 1, and
- changes its state to Idle. - changes its state to Idle.
However, if the local system receives a TcpConnectionFails (Event However, if the local system receives a TcpConnectionFails (Event
18) from the underlying TCP, and the Graceful Restart capability 18) from the underlying TCP, and the Graceful Restart Capability
with one or more AFI/SAFI has been received for the session, the with one or more AFIs/SAFIs has been received for the session, the
local system: local system:
- sets the ConnectRetryTimer to zero, - sets the ConnectRetryTimer to zero,
- retains all routes associated with this connection according - retains all routes associated with this connection according
to section "Procedures for the Receiving Speaker" of this to section "Procedures for the Receiving Speaker" of this
(Graceful Restart) specification, (Graceful Restart) specification,
- releases all other BGP resources, - releases all other BGP resources,
- drops the TCP connection, - drops the TCP connection,
- increments the ConnectRetryCounter by 1, - increments the ConnectRetryCounter by 1, and
- changes its state to Idle. - changes its state to Idle.
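
The two replacement branches above for the Established state differ only in whether routes are deleted or retained. A minimal sketch, assuming a hypothetical Session record and string event names (neither appears in the compared documents):

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical model of the fields changed by the FSM actions."""
    gr_capability_received: bool = False  # GR Capability with >= 1 AFI/SAFI seen
    state: str = "Established"
    connect_retry_timer: int = 30
    connect_retry_counter: int = 0
    routes: set = field(default_factory=set)
    routes_retained: bool = False


def on_session_failure(s: Session, event: str) -> Session:
    """event is 'NOTIFICATION' (Event 24/25) or 'TcpConnectionFails' (Event 18)."""
    if event not in ("NOTIFICATION", "TcpConnectionFails"):
        raise ValueError(f"unexpected event: {event}")
    # Routes are retained only for a TCP failure on a session for which the
    # Graceful Restart Capability was received; a NOTIFICATION always deletes.
    retain = event == "TcpConnectionFails" and s.gr_capability_received
    s.connect_retry_timer = 0
    if retain:
        s.routes_retained = True
    else:
        s.routes.clear()
    s.connect_retry_counter += 1
    s.state = "Idle"
    return s
```

Both branches set the ConnectRetryTimer to zero, increment the ConnectRetryCounter, and move to Idle, so the sketch factors those steps out of the conditional.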
7. Deployment Considerations 6. Deployment Considerations
While the procedures described in this document would help minimize Although the procedures described in this document would help
the effect of routing flaps, it is noted, however, that when a BGP minimize the effect of routing flaps, it is noted that when a BGP
Graceful Restart capable router restarts, or if it restarts without Graceful Restart-capable router restarts, or if it restarts without
preserving its forwarding state (for example due to a power failure) preserving its forwarding state (e.g., due to a power failure), there
there is a potential for transient routing loops or blackholes in the is a potential for transient routing loops or blackholes in the
network if routing information changes before the involved routers network if routing information changes before the involved routers
complete routing updates and convergence. Also, depending on the complete routing updates and convergence. Also, depending on the
network topology, if not all IBGP speakers are Graceful Restart network topology, if not all IBGP speakers are Graceful Restart
capable, there could be an increased exposure to transient routing capable, there could be an increased exposure to transient routing
loops or blackholes when the Graceful Restart procedures are loops or blackholes when the Graceful Restart procedures are
exercised. exercised.
The Restart Time, the upper bound for retaining routes and the upper The Restart Time, the upper bound for retaining routes, and the upper
bound for deferring route selection may need to be tuned as more bound for deferring route selection may need to be tuned as more
deployment experience is gained. deployment experience is gained.
Finally, it is noted that the benefits of deploying BGP Graceful Finally, it is noted that the benefits of deploying BGP Graceful
Restart in an AS whose IGPs and BGP are tightly coupled (i.e., BGP Restart in an Autonomous System (AS) whose IGPs and BGP are tightly
and IGPs would both restart) and IGPs have no similar Graceful coupled (i.e., BGP and IGPs would both restart) and IGPs have no
Restart capability are reduced relative to the scenario where IGPs do similar Graceful Restart Capability are reduced relative to the
have similar Graceful Restart capability. scenario where IGPs do have similar Graceful Restart Capability.
8. Security Considerations 7. Security Considerations
Since with this proposal a new connection can cause an old one to be Since with this proposal a new connection can cause an old one to be
terminated, it might seem to open the door to denial of service terminated, it might seem to open the door to denial of service
attacks. However, it is noted that unauthenticated BGP is already attacks. However, it is noted that unauthenticated BGP is already
known to be vulnerable to denials of service through attacks on the known to be vulnerable to denials of service through attacks on the
TCP transport. The TCP transport is commonly protected through use TCP transport. The TCP transport is commonly protected through use
of [BGP-AUTH]. Such authentication will equally protect against of [BGP-AUTH]. Such authentication will equally protect against
denials of service through spurious new connections. denials of service through spurious new connections.
If an attacker is able to successfully open a TCP connection If an attacker is able to successfully open a TCP connection
impersonating a legitimate peer, the attacker's connection will impersonating a legitimate peer, the attacker's connection will
replace the legitimate one, potentially enabling the attacker to replace the legitimate one, potentially enabling the attacker to
advertise bogus routes. We note, however, that the window for such a advertise bogus routes. We note, however, that the window for such a
route insertion attack is small since through normal operation of the route insertion attack is small since through normal operation of the
protocol the legitimate peer would open a new connection, in turn protocol the legitimate peer would open a new connection, in turn
causing the attacker's connection to be terminated. Thus, this causing the attacker's connection to be terminated. Thus, this
attack devolves to a form of denial of service. attack devolves to a form of denial of service.
It is thus concluded that this proposal does not change the It is thus concluded that this proposal does not change the
underlying security model (and issues) of BGP-4. underlying security model (and issues) of BGP-4.
We also note that implementations may allow use of graceful restart We also note that implementations may allow use of graceful restart
to be controlled by configuration. If graceful restart is not to be controlled by configuration. If graceful restart is not
enabled, naturally the underlying security model of BGP-4 is enabled, naturally the underlying security model of BGP-4 is
unchanged. unchanged.
9. Intellectual Property Considerations 8. Acknowledgments
This section is taken from Section 5 of RFC 3668.
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at ietf-
ipr@ietf.org.
10. Copyright Notice
Copyright (C) The Internet Society (2006).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an The authors would like to thank Bruce Cole, Lars Eggert, Bill Fenner,
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS Eric Gray, Jeffrey Haas, Sam Hartman, Alvaro Retana, Pekka Savola
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET Naiming Shen, Satinder Singh, Mark Townsley, David Ward, Shane
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, Wright, and Alex Zinin for their review and comments.
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
11. IANA Considerations 9. IANA Considerations
This document defines a new BGP Capability - Graceful Restart This document defines a new BGP capability - Graceful Restart
Capability. The Capability Code for Graceful Restart Capability is Capability. The Capability Code for Graceful Restart Capability is
64. 64.
12. Acknowledgments 10. References
The authors would like to thank Bruce Cole, Lars Eggert, Bill Fenner,
Eric Gray Jeffrey Haas, Sam Hartman Alvaro Retana, Pekka Savola
Naiming Shen, Satinder Singh, Mark Townsley, David Ward, Shane Wright
and Alex Zinin for their review and comments.
13. Normative References 10.1. Normative References
[BGP-4] Rekhter, Y., T. Li, Hares, S., "A Border Gateway Protocol 4 [BGP-4] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
(BGP-4)", RFC4271, January 2006. Protocol 4 (BGP-4)", RFC 4271, January 2006.
[BGP-MP] Bates, T., Chandra, R., Katz, D., and Rekhter, Y., [BGP-MP] Bates, T., Rekhter, Y., Chandra, R., and D. Katz,
"Multiprotocol Extensions for BGP-4", RFC2858, June 2000. "Multiprotocol Extensions for BGP-4", RFC 2858, June
2000.
[BGP-CAP] Chandra, R., Scudder, J., "Capabilities Advertisement with [BGP-CAP] Chandra, R. and J. Scudder, "Capabilities Advertisement
BGP-4", RFC3392, November 2002. with BGP-4", RFC 3392, November 2002.
[BGP-AUTH] Heffernan A., "Protection of BGP Sessions via the TCP MD5 [BGP-AUTH] Heffernan, A., "Protection of BGP Sessions via the TCP
Signature Option", RFC 2385, August 1998. MD5 Signature Option", RFC 2385, August 1998.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[IANA-AFI] http://www.iana.org/assignments/address-family-numbers. [IANA-AFI] http://www.iana.org/assignments/address-family-numbers
[IANA-SAFI] http://www.iana.org/assignments/safi-namespace. [IANA-SAFI] http://www.iana.org/assignments/safi-namespace
14. Non-normative References 10.2. Informative References
[BFD] Katz, D., Ward, D., "Bidirectional Forwarding Detection", [BFD] Katz, D. and D. Ward, "Bidirectional Forwarding
draft-ietf-bfd-base-03.txt, work in progress Detection", Work in Progress.
15. Author Information Authors' Addresses
Srihari R. Sangli Srihari R. Sangli
Cisco Systems, Inc. Cisco Systems, Inc.
EMail: rsrihari@cisco.com EMail: rsrihari@cisco.com
Yakov Rekhter Yakov Rekhter
Juniper Networks, Inc. Juniper Networks, Inc.
EMail: yakov@juniper.net EMail: yakov@juniper.net
Rex Fernando Rex Fernando
e-mail: rex_f@yahoo.com Juniper Networks, Inc.
John G. Scudder EMail: rex@juniper.net
Cisco Systems, Inc.
EMail: jgs@cisco.com John G. Scudder
Juniper Networks, Inc.
EMail: jgs@juniper.net
Enke Chen Enke Chen
Cisco Systems, Inc. Cisco Systems, Inc.
EMail: enkechen@cisco.com EMail: enkechen@cisco.com
Full Copyright Statement
Copyright (C) The IETF Trust (2007).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgement
Funding for the RFC Editor function is currently provided by the
Internet Society.
 End of changes. 97 change blocks. 
246 lines changed or deleted 219 lines changed or added
