draft-ietf-idr-restart-03.txt   draft-ietf-idr-restart-04.txt 
Network Working Group Srihari R. Sangli (Procket Networks) Network Working Group Srihari R. Sangli (Procket Networks)
Internet Draft Yakov Rekhter (Juniper Networks) Internet Draft Yakov Rekhter (Juniper Networks)
Expiration Date: October 2002 Rex Fernando (Procket Networks) Expiration Date: December 2002 Rex Fernando (Procket Networks)
John G. Scudder (Cisco Systems) John G. Scudder (Cisco Systems)
Enke Chen (Redback Networks) Enke Chen (Redback Networks)
Graceful Restart Mechanism for BGP Graceful Restart Mechanism for BGP
draft-ietf-idr-restart-03.txt draft-ietf-idr-restart-04.txt
1. Status of this Memo 1. Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
skipping to change at page 2, line 28 skipping to change at page 2, line 28
the negative effects on routing caused by BGP restart. An End-of-RIB the negative effects on routing caused by BGP restart. An End-of-RIB
marker is specified and can be used to convey routing convergence marker is specified and can be used to convey routing convergence
information. A new BGP capability, termed "Graceful Restart information. A new BGP capability, termed "Graceful Restart
Capability", is defined which would allow a BGP speaker to express Capability", is defined which would allow a BGP speaker to express
its ability to preserve forwarding state during BGP restart. Finally, its ability to preserve forwarding state during BGP restart. Finally,
procedures are outlined for temporarily retaining routing information procedures are outlined for temporarily retaining routing information
across a TCP transport reset. across a TCP transport reset.
4. Marker for End-of-RIB 4. Marker for End-of-RIB
An UPDATE message with empty withdrawn NLRI is specified as the End- An UPDATE message with no reachable NLRI and empty withdrawn NLRI is
Of-RIB Marker that can be used by a BGP speaker to indicate to its specified as the End-Of-RIB Marker that can be used by a BGP speaker
peer the completion of the initial routing update after the session to indicate to its peer the completion of the initial routing update
is established. For IPv4 unicast address family, the End-Of-RIB after the session is established. For IPv4 unicast address family,
Marker is an UPDATE message with the minimum length [BGP-4]. For any the End-Of-RIB Marker is an UPDATE message with the minimum length
other address family, it is an UPDATE message that contains only [BGP-4]. For any other address family, it is an UPDATE message that
MP_UNREACH_NLRI [BGP-MP] with no withdrawn routes for that <AFI, Sub- contains only the MP_UNREACH_NLRI attribute [BGP-MP] with no
AFI>. withdrawn routes for that <AFI, SAFI>.
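The two forms of the End-of-RIB marker described above can be illustrated with a minimal sketch. It is not taken from the draft: the function name end_of_rib is hypothetical, and the header layout (16-octet all-ones marker, 2-octet length, type code 2 for UPDATE) and the MP_UNREACH_NLRI type code 15 are assumed from [BGP-4] and [BGP-MP].

```python
import struct

BGP_MARKER = b"\xff" * 16      # all-ones marker field of the BGP-4 header
MSG_TYPE_UPDATE = 2            # UPDATE message type code (assumed from [BGP-4])
ATTR_MP_UNREACH_NLRI = 15      # attribute type code (assumed from [BGP-MP])
ATTR_FLAG_OPTIONAL = 0x80      # optional, non-transitive attribute


def end_of_rib(afi: int = 1, safi: int = 1) -> bytes:
    """Build an End-of-RIB marker for the given <AFI, SAFI> (illustrative only)."""
    if (afi, safi) == (1, 1):
        # IPv4 unicast: minimum-length UPDATE, i.e. zero withdrawn routes,
        # zero path attributes and no NLRI.
        body = struct.pack("!HH", 0, 0)
    else:
        # Any other family: a single MP_UNREACH_NLRI attribute that names
        # the <AFI, SAFI> and lists no withdrawn routes.
        value = struct.pack("!HB", afi, safi)
        attr = struct.pack("!BBB", ATTR_FLAG_OPTIONAL,
                           ATTR_MP_UNREACH_NLRI, len(value)) + value
        body = struct.pack("!H", 0) + struct.pack("!H", len(attr)) + attr
    length = len(BGP_MARKER) + 2 + 1 + len(body)
    return BGP_MARKER + struct.pack("!HB", length, MSG_TYPE_UPDATE) + body
```

Under these assumptions the IPv4 unicast marker is the 23-octet minimum UPDATE, while the marker for another family, for example IPv6 unicast (AFI 2, SAFI 1), is 29 octets long.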
Although the End-of-RIB Marker is specified for the purpose of BGP Although the End-of-RIB Marker is specified for the purpose of BGP
graceful restart, it is noted that the generation of such a marker graceful restart, it is noted that the generation of such a marker
upon completion of the initial update would be useful for routing upon completion of the initial update would be useful for routing
convergence in general, and thus the practice is recommended. convergence in general, and thus the practice is recommended.
In addition, it would be beneficial for routing convergence if a BGP In addition, it would be beneficial for routing convergence if a BGP
speaker can indicate to its peer up-front that it will generate the speaker can indicate to its peer up-front that it will generate the
End-Of-RIB marker, regardless of its ability to preserve its End-Of-RIB marker, regardless of its ability to preserve its
forwarding state during BGP restart. This can be accomplished using forwarding state during BGP restart. This can be accomplished using
skipping to change at page 3, line 19 skipping to change at page 3, line 19
its forwarding state during BGP restart. It can also be used to its forwarding state during BGP restart. It can also be used to
convey to its peer its intention of generating the End-Of-RIB marker convey to its peer its intention of generating the End-Of-RIB marker
upon the completion of its initial routing updates. upon the completion of its initial routing updates.
This capability is defined as follows: This capability is defined as follows:
Capability code: 64 Capability code: 64
Capability length: variable Capability length: variable
Capability value: Consists of the "Restart Flags" field, Capability value: Consists of the "Restart Flags" field, "Restart
"Restart Time" field, and zero or more of the tuples <AFI, Time" field, and zero or more of the tuples <AFI, SAFI, Flags for
Sub-AFI, Flags for address family> as follows. address family> as follows:
+--------------------------------------------------+ +--------------------------------------------------+
| Restart Flags (4 bits) | | Restart Flags (4 bits) |
+--------------------------------------------------+ +--------------------------------------------------+
| Restart Time in seconds (12 bits) | | Restart Time in seconds (12 bits) |
+--------------------------------------------------+ +--------------------------------------------------+
| Address Family Identifier (16 bits) | | Address Family Identifier (16 bits) |
+--------------------------------------------------+ +--------------------------------------------------+
| Subsequent Address Family Identifier (8 bits) | | Subsequent Address Family Identifier (8 bits) |
+--------------------------------------------------+ +--------------------------------------------------+
skipping to change at page 3, line 49 skipping to change at page 3, line 49
+--------------------------------------------------+ +--------------------------------------------------+
| Flags for Address Family (8 bits) | | Flags for Address Family (8 bits) |
+--------------------------------------------------+ +--------------------------------------------------+
The use and meaning of the fields are as follows: The use and meaning of the fields are as follows:
Restart Flags: Restart Flags:
This field contains bit flags related to restart. This field contains bit flags related to restart.
The most significant bit is defined as the Restart State bit 0 1 2 3
which can be used to avoid possible deadlock caused by waiting +-+-+-+-+
for the End-of-RIB marker when multiple BGP speakers peering |R|Resv.|
with each other restart. When set (value 1), this bit indicates +-+-+-+-+
that the BGP speaker has restarted, and its peer should not wait The most significant bit is defined as the Restart State (R)
for the End-of-RIB marker from the speaker before advertising bit which can be used to avoid possible deadlock caused by
routing information to the speaker. waiting for the End-of-RIB marker when multiple BGP speakers
peering with each other restart. When set (value 1), this bit
indicates that the BGP speaker has restarted, and its peer
should not wait for the End-of-RIB marker from the speaker
before advertising routing information to the speaker.
The remaining bits are reserved. The remaining bits are reserved, and should be set to zero by
the sender and ignored by the receiver.
Restart Time: Restart Time:
This is the estimated time (in seconds) it will take for the BGP This is the estimated time (in seconds) it will take for the
session to be re-established after a restart. This can be used to BGP session to be re-established after a restart. This can be
speed up routing convergence by its peer in case that the BGP used to speed up routing convergence by its peer in case that
speaker does not come back after a restart. the BGP speaker does not come back after a restart.
Address Family Identifier (AFI): Address Family Identifier (AFI):
This field carries the identity of the Network Layer protocol This field carries the identity of the Network Layer protocol
for which the Graceful Restart support is advertised. Presently for which the Graceful Restart support is advertised. Presently
defined values for this field are specified in RFC1700 (see defined values for this field are specified in [IANA-AFI].
the Address Family Numbers section).
Subsequent Address Family Identifier (Sub-AFI): Subsequent Address Family Identifier (SAFI):
This field provides additional information about the type of This field provides additional information about the type of
the Network Layer Reachability Information carried in the the Network Layer Reachability Information carried in the
attribute. attribute. Presently defined values for this field are
specified in [IANA-SAFI].
Flags for Address Family: Flags for Address Family:
This field contains bit flags for the <AFI, Sub-AFI>. This field contains bit flags for the <AFI, SAFI>.
The most significant bit is defined as the Forwarding State 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|F| Reserved |
+-+-+-+-+-+-+-+-+
The most significant bit is defined as the Forwarding State (F)
bit which can be used to indicate if the forwarding state for bit which can be used to indicate if the forwarding state for
the <AFI, Sub-AFI> has indeed been preserved during the previous the <AFI, SAFI> has indeed been preserved during the previous
BGP restart. When set (value 1), the bit indicates that the BGP restart. When set (value 1), the bit indicates that the
forwarding state has been preserved. forwarding state has been preserved.
The remaining bits are reserved. The remaining bits are reserved, and should be set to zero by
the sender and ignored by the receiver.
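As an illustration of the field layout in the -04 text, the following sketch encodes and decodes the capability value. The names GracefulRestart, encode and decode are hypothetical, and the sketch assumes that the 4-bit Restart Flags and 12-bit Restart Time share one 16-bit word in network byte order, with the flags in the high-order bits.

```python
import struct
from typing import List, NamedTuple, Tuple

RESTART_STATE_BIT = 0x8       # R bit: most significant of the 4 Restart Flags bits
FORWARDING_STATE_BIT = 0x80   # F bit: most significant of the 8 per-family flag bits


class GracefulRestart(NamedTuple):
    restarted: bool                          # Restart State (R) bit
    restart_time: int                        # seconds, 12 bits
    families: List[Tuple[int, int, bool]]    # (AFI, SAFI, forwarding state preserved)


def encode(cap: GracefulRestart) -> bytes:
    """Serialize the capability value; reserved bits are sent as zero."""
    if not 0 <= cap.restart_time < 4096:
        raise ValueError("Restart Time must fit in 12 bits")
    flags = RESTART_STATE_BIT if cap.restarted else 0
    out = struct.pack("!H", (flags << 12) | cap.restart_time)
    for afi, safi, preserved in cap.families:
        out += struct.pack("!HBB", afi, safi,
                           FORWARDING_STATE_BIT if preserved else 0)
    return out


def decode(value: bytes) -> GracefulRestart:
    """Parse the capability value; reserved bits are ignored on receipt."""
    if len(value) < 2 or (len(value) - 2) % 4:
        raise ValueError("malformed Graceful Restart Capability value")
    (head,) = struct.unpack_from("!H", value, 0)
    families = []
    for off in range(2, len(value), 4):
        afi, safi, fam_flags = struct.unpack_from("!HBB", value, off)
        families.append((afi, safi, bool(fam_flags & FORWARDING_STATE_BIT)))
    return GracefulRestart(bool((head >> 12) & RESTART_STATE_BIT),
                           head & 0x0FFF, families)
```

A value with an empty family list is two octets long, which corresponds to the case of a speaker that only announces its intention to send the End-of-RIB marker.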
The advertisement of this capability by a BGP speaker also implies When a sender of this capability doesn't include any <AFI, SAFI> in
that it will generate the End-of-RIB marker (for all address families the capability, it means that the sender is not capable of preserving
exchanged) upon completion of its initial routing update to its peer. its forwarding state during BGP restart, but is going to generate the
The value of the "Restart Time" field is irrelevant in the case that End-of-RIB marker upon the completion of its initial routing updates.
the capability does not carry any <AFI, Sub-AFI>. The value of the "Restart Time" field is irrelevant in that case.
A BGP speaker should not include more than one instance of the
Graceful Restart Capability in the capability advertisement [BGP-
CAP]. If more than one instance of the Graceful Restart Capability
is carried in the capability advertisement, the receiver of the
advertisement should ignore all but the last instance of the Graceful
Restart Capability.
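The rule of ignoring all but the last instance might be sketched as follows. The function name and the representation of the received capabilities as (code, value) pairs are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

GRACEFUL_RESTART_CODE = 64   # capability code given in this document


def select_graceful_restart(capabilities: List[Tuple[int, bytes]]) -> Optional[bytes]:
    """Return the value of the last Graceful Restart Capability, or None if absent."""
    chosen = None
    for code, value in capabilities:
        if code == GRACEFUL_RESTART_CODE:
            chosen = value   # a later instance overrides any earlier one
    return chosen
```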
Including <AFI=IPv4, SAFI=unicast> into the Graceful Restart
Capability doesn't imply that the IPv4 unicast routing information
should be carried by using the BGP Multiprotocol extensions [BGP-MP]
- it could be carried in the NLRI field of the BGP UPDATE message.
6. Operation 6. Operation
A BGP speaker may advertise the Graceful Restart Capability for an A BGP speaker may advertise the Graceful Restart Capability for an
address family to its peer only if it has the ability to preserve its address family to its peer if it has the ability to preserve its
forwarding state for the address family when BGP restarts. forwarding state for the address family when BGP restarts. In
addition, even if the speaker does not have the ability to preserve
Even if the speaker does not have the ability to preserve its its forwarding state for any address family during BGP restart, it is
forwarding state for any address family during BGP restart, it is
still recommended that the speaker advertise the Graceful Restart still recommended that the speaker advertise the Graceful Restart
Capability to its peer to indicate its intention of generating the Capability to its peer to indicate its intention of generating the
End-of-RIB marker upon the completion of its initial routing updates. End-of-RIB marker upon the completion of its initial routing updates
(as mentioned before this is done by not including any <AFI, SAFI> in
the advertised capability), as doing this would be useful for routing
convergence in general.
The End-of-RIB marker should be sent by a BGP speaker to its peer The End-of-RIB marker should be sent by a BGP speaker to its peer
once it completes the initial routing update (including the case when once it completes the initial routing update (including the case when
there is no update to send) for an address family after the BGP there is no update to send) for an address family after the BGP
session is established. session is established.
It is noted that the normal BGP procedures MUST be followed when the It is noted that the normal BGP procedures must be followed when the
TCP session terminates due to the sending or receiving of a BGP TCP session terminates due to the sending or receiving of a BGP
NOTIFICATION message. NOTIFICATION message.
In general the Restart Time SHOULD NOT be greater than the HOLDTIME In general the Restart Time should not be greater than the HOLDTIME
carried in the OPEN. carried in the OPEN.
In the following sections, "Restarting Speaker" refers to a router In the following sections, "Restarting Speaker" refers to a router
whose BGP has restarted, and "Receiving Speaker" refers to a router whose BGP has restarted, and "Receiving Speaker" refers to a router
that peers with the restarting speaker. that peers with the restarting speaker.
Consider that the Graceful Restart Capability for an address family Consider that the Graceful Restart Capability for an address family
is advertised by the Restarting Speaker, and is understood by the is advertised by the Restarting Speaker, and is understood by the
Receiving Speaker, and a BGP session between them is established. Receiving Speaker, and a BGP session between them is established.
The following sections detail the procedures that shall be followed The following sections detail the procedures that shall be followed
skipping to change at page 6, line 31 skipping to change at page 6, line 51
of the speaker shall be updated and any previously marked stale of the speaker shall be updated and any previously marked stale
information shall be removed. The Adj-RIB-Out can then be advertised information shall be removed. The Adj-RIB-Out can then be advertised
to its peers. Once the initial update is complete for an address to its peers. Once the initial update is complete for an address
family (including the case that there is no routing update to send), family (including the case that there is no routing update to send),
the End-of-RIB marker shall be sent. the End-of-RIB marker shall be sent.
To put an upper bound on the amount of time a router defers its route To put an upper bound on the amount of time a router defers its route
selection, an implementation must support a (configurable) timer that selection, an implementation must support a (configurable) timer that
imposes this upper bound. imposes this upper bound.
If one wants to apply graceful restart only when the restart is
planned (as opposed to both planned and unplanned restart), then one
way to accomplish this would be to set the Forwarding State bit to 1
after a planned restart, and to 0 in all other cases. Other
approaches to accomplish this are outside the scope of this document.
6.2. Procedures for the Receiving Speaker 6.2. Procedures for the Receiving Speaker
When the Restarting Speaker restarts, the Receiving Speaker may or When the Restarting Speaker restarts, the Receiving Speaker may or
may not detect the termination of the TCP session with the Restarting may not detect the termination of the TCP session with the Restarting
Speaker, depending on the underlying TCP implementation, whether or Speaker, depending on the underlying TCP implementation, whether or
not [BGP-AUTH] is in use, and the specific circumstances of the not [BGP-AUTH] is in use, and the specific circumstances of the
restart. In case it does not detect the TCP reset and still restart. In case it does not detect the TCP reset and still
considers the BGP session as being established, it shall treat the considers the BGP session as being established, it shall treat the
subsequent open connection from the peer as an indication of TCP subsequent open connection from the peer as an indication of TCP
reset and act accordingly (when the Graceful Restart Capabilty has reset and act accordingly (when the Graceful Restart Capability has
been received from the peer). been received from the peer).
"Acting accordingly" in this context means that the previous TCP
session should be closed, and the new one retained. Note that this
behavior differs from the default behavior, as specified in [BGP-4]
section 6.8. Since the previous connection is considered to be
reset, no NOTIFICATION message should be sent -- the previous TCP
session is simply closed.
When the Receiving Speaker detects TCP reset for a BGP session with a When the Receiving Speaker detects TCP reset for a BGP session with a
peer that has advertised the Graceful Restart Capability, it shall peer that has advertised the Graceful Restart Capability, it shall
retain the routes received from the peer for all the address families retain the routes received from the peer for all the address families
that were previously received in the Graceful Restart Capability, and that were previously received in the Graceful Restart Capability, and
shall mark them as stale routing information. To deal with possible shall mark them as stale routing information. To deal with possible
consecutive restarts, a route (from the peer) previously marked as consecutive restarts, a route (from the peer) previously marked as
stale shall be deleted. The router should not differentiate between stale shall be deleted. The router should not differentiate between
stale and other routing information during forwarding. stale and other routing information during forwarding.
In re-establishing the session, the "Restart State" bit in the In re-establishing the session, the "Restart State" bit in the
skipping to change at page 7, line 17 skipping to change at page 7, line 49
Speaker shall not be set unless the Receiving Speaker has restarted. Speaker shall not be set unless the Receiving Speaker has restarted.
The presence and the setting of the "Forwarding State" bit for an The presence and the setting of the "Forwarding State" bit for an
address family depends upon the actual forwarding state and address family depends upon the actual forwarding state and
configuration. configuration.
If the session does not get re-established within the "Restart Time" If the session does not get re-established within the "Restart Time"
that the peer advertised previously, the Receiving Speaker shall that the peer advertised previously, the Receiving Speaker shall
delete all the stale routes from the peer that it is retaining. delete all the stale routes from the peer that it is retaining.
Once the session is re-established, if the "Forwarding State" bit for Once the session is re-established, if the "Forwarding State" bit for
an address family is not set in the received Graceful Restart a specific address family is not set in the newly received Graceful
Capability, or if the capability is not received for an address Restart Capability, or if a specific address family is not included
family, the Receiving Speaker shall immediately remove all the stale in the newly received Graceful Restart Capability, or if the Graceful
Restart Capability isn't received in the re-established session at
all, then Receiving Speaker shall immediately remove all the stale
routes from the peer that it is retaining for that address family. routes from the peer that it is retaining for that address family.
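The three conditions in the -04 wording can be read as a single decision per address family. The sketch below is one hedged reading of that text; families_to_flush is a hypothetical name, and the newly received capability is modeled as a mapping from <AFI, SAFI> to the Forwarding State bit, or None when the capability is absent from the re-established session.

```python
from typing import Dict, Optional, Set, Tuple

Family = Tuple[int, int]   # (AFI, SAFI)


def families_to_flush(retained: Set[Family],
                      received: Optional[Dict[Family, bool]]) -> Set[Family]:
    """Return the families whose stale routes should be removed immediately."""
    if received is None:
        # Capability not received at all in the re-established session.
        return set(retained)
    # Family missing from the capability, or present with the F bit clear.
    return {fam for fam in retained if not received.get(fam, False)}
```

For example, with stale routes retained for IPv4 and IPv6 unicast and a new capability that lists only IPv4 unicast with the Forwarding State bit set, only the IPv6 unicast routes would be removed.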
The Receiving Speaker shall send the End-of-RIB marker once it The Receiving Speaker shall send the End-of-RIB marker once it
completes the initial update for an address family (including the completes the initial update for an address family (including the
case that it has no routes to send) to the peer. case that it has no routes to send) to the peer.
The Receiving Speaker shall replace the stale routes by the routing The Receiving Speaker shall replace the stale routes by the routing
updates received from the peer. Once the End-of-RIB marker for an updates received from the peer. Once the End-of-RIB marker for an
address family is received from the peer, it shall immediately remove address family is received from the peer, it shall immediately remove
any routes from the peer that are still marked as stale for that any routes from the peer that are still marked as stale for that
address family. address family.
To put an upper bound on the amount of time a router retains the To put an upper bound on the amount of time a router retains the
stale routes, an implementation may support a (configurable) timer stale routes, an implementation may support a (configurable) timer
that imposes this upper bound. that imposes this upper bound.
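The Receiving Speaker's handling of stale routes can be pictured with a small in-memory model. This is a sketch under stated assumptions, not an implementation from the draft: the class PeerRoutes, its method names and the use of strings for prefixes and attributes are all hypothetical, and routes for families outside the Graceful Restart Capability are assumed to be removed on reset as in normal BGP operation.

```python
from typing import Callable, Dict, Set, Tuple

Family = Tuple[int, int]        # (AFI, SAFI)
RouteKey = Tuple[Family, str]   # (family, prefix)


class PeerRoutes:
    """Routes learned from one peer, with a stale marking per route."""

    def __init__(self) -> None:
        self.routes: Dict[RouteKey, str] = {}
        self.stale: Set[RouteKey] = set()

    def learn(self, family: Family, prefix: str, attrs: str) -> None:
        # A fresh update replaces a stale route and clears its marking.
        key = (family, prefix)
        self.routes[key] = attrs
        self.stale.discard(key)

    def on_tcp_reset(self, gr_families: Set[Family]) -> None:
        # Consecutive restarts: routes already marked stale are deleted.
        self._drop_stale(lambda key: True)
        # Retain and mark routes for families in the peer's capability;
        # remove the rest (assumed normal BGP behavior).
        for key in list(self.routes):
            if key[0] in gr_families:
                self.stale.add(key)
            else:
                del self.routes[key]

    def on_end_of_rib(self, family: Family) -> None:
        # Remove whatever is still stale for this family.
        self._drop_stale(lambda key: key[0] == family)

    def on_restart_timer_expired(self) -> None:
        # Session not re-established within the Restart Time.
        self._drop_stale(lambda key: True)

    def _drop_stale(self, matches: Callable[[RouteKey], bool]) -> None:
        for key in [k for k in self.stale if matches(k)]:
            del self.routes[key]
            self.stale.discard(key)
```

Forwarding continues to use everything in routes, stale or not, which matches the statement that the router should not differentiate between stale and other routing information during forwarding. The optional upper-bound timer would simply call the same removal path as on_restart_timer_expired.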
7. Deployment Considerations 7. Deployment Considerations
While the procedures described in this document would help minimize While the procedures described in this document would help minimize
the effect of routing flaps, it is noted, however, that when a BGP the effect of routing flaps, it is noted, however, that when a BGP
Graceful-Restart capable router restarts, there is a potential for Graceful Restart capable router restarts, there is a potential for
transient routing loops or blackholes in the network if routing transient routing loops or blackholes in the network if routing
information changes before the involved routers complete routing information changes before the involved routers complete routing
updates and convergence. Also, depending on the network topology, if updates and convergence. Also, depending on the network topology, if
not all IBGP speakers are Graceful-Restart capable, there could be an not all IBGP speakers are Graceful Restart capable, there could be an
increased exposure to transient routing loops or blackholes when the increased exposure to transient routing loops or blackholes when the
Graceful-Restart procedures are exercised. Graceful Restart procedures are exercised.
The Restart Time, the upper bound for retaining routes and the upper The Restart Time, the upper bound for retaining routes and the upper
bound for deferring route selection may need to be tuned as more bound for deferring route selection may need to be tuned as more
deployment experience is gained. deployment experience is gained.
Finally, it is noted that there is little benefit deploying BGP Finally, it is noted that the benefits of deploying BGP Graceful
Graceful-Restart in an AS whose IGPs and BGP are tightly coupled Restart in an AS whose IGPs and BGP are tightly coupled (i.e., BGP
(i.e., BGP and IGPs would both restart), and IGPs have no similar and IGPs would both restart) and IGPs have no similar Graceful
Graceful-Restart capability. Restart capability are reduced relative to the scenario where IGPs do
have similar Graceful Restart capability.
8. Security Considerations 8. Security Considerations
Since with this proposal a new connection can cause an old one to be Since with this proposal a new connection can cause an old one to be
terminated, it might seem to open the door to denial of service terminated, it might seem to open the door to denial of service
attacks. However, it is noted that unauthenticated BGP is already attacks. However, it is noted that unauthenticated BGP is already
known to be vulnerable to denials of service through attacks on the known to be vulnerable to denials of service through attacks on the
TCP transport. The TCP transport is commonly protected through use TCP transport. The TCP transport is commonly protected through use
of [BGP-AUTH]. Such authentication will equally protect against of [BGP-AUTH]. Such authentication will equally protect against
denials of service through spurious new connections. denials of service through spurious new connections.
It is thus concluded that this proposal does not change the It is thus concluded that this proposal does not change the
underlying security model (and issues) of BGP-4. underlying security model (and issues) of BGP-4.
9. Acknowledgments 9. Acknowledgments
The authors would like to thank Alvaro Retana, Satinder Singh, David The authors would like to thank Bruce Cole, Bill Fenner, Eric Gray,
Ward, Naiming Shen and Bruce Cole for their review and comments. Jeffrey Haas, Alvaro Retana, Naiming Shen, Satinder Singh, David
Ward, Shane Wright and Alex Zinin for their review and comments.
10. References 10. References
[BGP-4] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP- [BGP-4] Rekhter, Y., and T. Li, "A Border Gateway Protocol 4 (BGP-
4)", RFC 1771, March 1995. 4)", RFC 1771, March 1995.
[BGP-MP] Bates, T., Chandra, R., Katz, D., and Rekhter, Y., [BGP-MP] Bates, T., Chandra, R., Katz, D., and Rekhter, Y.,
"Multiprotocol Extensions for BGP-4", RFC 2283, March 1998. "Multiprotocol Extensions for BGP-4", RFC 2858, June 2000.
[BGP-CAP] Chandra, R., Scudder, J., "Capabilities Advertisement with [BGP-CAP] Chandra, R., Scudder, J., "Capabilities Advertisement with
BGP-4", RFC 2842, May 2000. BGP-4", draft-ietf-idr-rfc2842bis-02.txt, April 2002.
[BGP-AUTH] Heffernan A., "Protection of BGP Sessions via the TCP MD5 [BGP-AUTH] Heffernan A., "Protection of BGP Sessions via the TCP MD5
Signature Option", RFC 2385, August 1998. Signature Option", RFC 2385, August 1998.
[IANA-AFI] http://www.iana.org/assignments/address-family-numbers.
[IANA-SAFI] http://www.iana.org/assignments/safi-namespace.
11. Author Information 11. Author Information
Srihari R. Sangli Srihari R. Sangli
Procket Networks, Inc. Procket Networks, Inc.
1100 Cadillac Court 1100 Cadillac Court
Milpitas, CA 95035 Milpitas, CA 95035
e-mail: srihari@procket.com e-mail: srihari@procket.com
Yakov Rekhter Yakov Rekhter
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/