draft-ietf-rift-applicability-02.txt   draft-ietf-rift-applicability-03.txt 
RIFT WG Yuehua. Wei, Ed. RIFT WG Yuehua. Wei, Ed.
Internet-Draft Zheng. Zhang Internet-Draft Zheng. Zhang
Intended status: Informational ZTE Corporation Intended status: Informational ZTE Corporation
Expires: 12 April 2021 Dmitry. Afanasiev Expires: 16 April 2021 Dmitry. Afanasiev
Yandex Yandex
Tom. Verhaeg Tom. Verhaeg
Juniper Networks Juniper Networks
Jaroslaw. Kowalczyk Jaroslaw. Kowalczyk
Orange Polska Orange Polska
P. Thubert P. Thubert
Cisco Systems Cisco Systems
9 October 2020 13 October 2020
RIFT Applicability RIFT Applicability
draft-ietf-rift-applicability-02 draft-ietf-rift-applicability-03
Abstract Abstract
This document discusses the properties, applicability and operational This document discusses the properties, applicability and operational
considerations of RIFT in different network scenarios. It intends to considerations of RIFT in different network scenarios. It intends to
provide a rough guide to how RIFT can be deployed to simplify routing provide a rough guide to how RIFT can be deployed to simplify routing
operations in Clos topologies and their variations. operations in Clos topologies and their variations.
Status of This Memo Status of This Memo
skipping to change at page 1, line 41 skipping to change at page 1, line 41
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 12 April 2021. This Internet-Draft will expire on 16 April 2021.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 4, line 13 skipping to change at page 4, line 13
and RIFT protocol concepts. and RIFT protocol concepts.
3.1. Overview of RIFT 3.1. Overview of RIFT
RIFT is a dynamic routing protocol for Clos and fat-tree network RIFT is a dynamic routing protocol for Clos and fat-tree network
topologies. It defines a link-state protocol when "pointing north" topologies. It defines a link-state protocol when "pointing north"
and a path-vector protocol when "pointing south". and a path-vector protocol when "pointing south".
It floods flat link-state information northbound only so that each It floods flat link-state information northbound only so that each
level obtains the full topology of levels south of it. That level obtains the full topology of levels south of it. That
information is never flooded East-West or back South again. So a top information is never flooded east-west or back South again. So a top
tier node has the full set of prefixes from the SPF calculation. tier node has the full set of prefixes from the SPF calculation.
In the southbound direction the protocol operates like a "fully In the southbound direction the protocol operates like a "fully
summarizing, unidirectional" path vector protocol or rather a summarizing, unidirectional" path vector protocol or rather a
distance vector with implicit split horizon where the information distance vector with implicit split horizon where the information
propagates one hop south and is 're-advertised' by nodes at the next propagates one hop south and is 're-advertised' by nodes at the next
lower level, normally just the default route. lower level, normally just the default route.
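The directional flooding rule described above can be modeled in a few lines of Python. This is a minimal sketch under simplifying assumptions, not the [RIFT] flooding procedure: levels are numbered as in Figure 1 (higher means further north), southbound reflection and the east-west cases are left out, and the names `Direction`, `Tie` and `should_reflood` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    NORTH = "north"  # N-TIE: link-state information flowing toward the ToF
    SOUTH = "south"  # S-TIE: summarized information, normally a default route


@dataclass(frozen=True)
class Tie:
    originator: str
    direction: Direction


def should_reflood(tie: Tie, my_level: int, neighbor_level: int,
                   originated_here: bool) -> bool:
    """Decide whether a node sends `tie` to a neighbor (simplified model).

    Levels grow northward: leaves are level 0, the ToF has the highest level.
    """
    if tie.direction is Direction.NORTH:
        # N-TIEs are re-flooded northbound only: never east-west, never back south.
        return neighbor_level > my_level
    # S-TIEs travel a single hop south: a node only sends the ones it originated,
    # and the next lower level recomputes and re-advertises its own summary.
    return originated_here and neighbor_level < my_level
```

For instance, a spine at level 1 passes a leaf's N-TIE up to a level-2 ToF, but never to another spine or back down to a leaf.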
+-----------+ +-----------+ +-----------+ +-----------+
| ToF | | ToF | LEVEL 2 | ToF | | ToF | LEVEL 2
+ +-----+--+--+ +-+--+------+ + +-----+--+--+ +-+--+------+
| | | | | | | | | ^ | | | | | | | | | ^
+ | | | +-------------------------+ | + | | | +-------------------------+ |
Distance | +-------------------+ | | | | | Distance | +-------------------+ | | | | |
Vector | | | | | | | | + Vector | | | | | | | | +
South | | | | +--------+ | | | Link+State South | | | | +--------+ | | | Link-state
+ | | | | | | | | Flooding + | | | | | | | | Flooding
| | | +-------------+ | | | North | | | +-------------+ | | | North
v | | | | | | | | + v | | | | | | | | +
+-+--+-+ +------+ +-------+ +--+--+-+ | +-+--+-+ +------+ +-------+ +--+--+-+ |
|SPINE | |SPINE | | SPINE | | SPINE | | LEVEL 1 |SPINE | |SPINE | | SPINE | | SPINE | | LEVEL 1
+ ++----++ ++---+-+ +--+--+-+ ++----+-+ | + ++----++ ++---+-+ +--+--+-+ ++----+-+ |
+ | | | | | | | | | ^ N + | | | | | | | | | ^ N
Distance | +-------+ | | +--------+ | | | E Distance | +-------+ | | +--------+ | | | E
Vector | | | | | | | | | +------> Vector | | | | | | | | | +------>
South | +-------+ | | | +-------+ | | | | South | +-------+ | | | +-------+ | | | |
skipping to change at page 4, line 51 skipping to change at page 4, line 51
v ++--++ +-+-++ ++-+-+ +-+--++ + v ++--++ +-+-++ ++-+-+ +-+--++ +
|LEAF| |LEAF| |LEAF| |LEAF | LEVEL 0 |LEAF| |LEAF| |LEAF| |LEAF | LEVEL 0
+----+ +----+ +----+ +-----+ +----+ +----+ +----+ +-----+
Figure 1: RIFT overview Figure 1: RIFT overview
A middle tier node has only the information necessary for its level, A middle tier node has only the information necessary for its level,
namely all destinations south of the node based on the SPF namely all destinations south of the node based on the SPF
calculation, the default route and potential disaggregated routes. calculation, the default route and potential disaggregated routes.
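To illustrate how such a middle-tier table is used, here is a small Python sketch that forwards by longest-prefix match over south prefixes, the default route and one disaggregated prefix. The node names and addresses are hypothetical, and a real FIB would use a trie rather than a linear scan.

```python
import ipaddress


def lookup(rib: dict[str, list[str]], destination: str) -> list[str]:
    """Return the next hops of the longest prefix in `rib` covering `destination`."""
    addr = ipaddress.ip_address(destination)
    best_len, best_hops = -1, []
    for prefix, next_hops in rib.items():
        net = ipaddress.ip_network(prefix)
        # `in` is False across address families, so v4 and v6 entries can coexist.
        if addr in net and net.prefixlen > best_len:
            best_len, best_hops = net.prefixlen, next_hops
    return best_hops


# Hypothetical RIB of a middle-tier (spine) node.
spine_rib = {
    "10.0.1.0/24": ["leaf1"],       # south prefix computed by SPF over N-TIEs
    "10.0.2.0/24": ["leaf2"],       # south prefix computed by SPF over N-TIEs
    "0.0.0.0/0": ["tof1", "tof2"],  # default route learned from S-TIEs
    "10.0.9.0/24": ["tof2"],        # disaggregated: only tof2 can reach it
}
```

Traffic to 10.0.9.7 follows the more specific route via tof2 only, while any other unknown destination is spread over both ToF nodes by the default route.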
RIFT combines the advantages of both Link-State and Distance Vector: RIFT combines the advantages of both link-state and distance vector:
* Fastest Possible Convergence * Fastest Possible Convergence
* Automatic Detection of Topology * Automatic Detection of Topology
* Minimal Routes/Info on TORs * Minimal Routes/Info on TORs
* High Degree of ECMP * High Degree of ECMP
* Fast De-commissioning of Nodes * Fast De-commissioning of Nodes
* Maximum Propagation Speed with Flexible Prefixes in an Update * Maximum Propagation Speed with Flexible Prefixes in an Update
And RIFT eliminates the disadvantages of Link-State or Distance And RIFT eliminates the disadvantages of link-state or distance
Vector: vector:
* Reduced and Balanced Flooding * Reduced and Balanced Flooding
* Automatic Neighbor Detection * Automatic Neighbor Detection
So there are two types of link state database which are "north So there are two types of link-state database which are "north
representation" N-TIEs and "south representation" S-TIEs. The N-TIEs representation" N-TIEs and "south representation" S-TIEs. The N-TIEs
contain a link state topology description of lower levels and S-TIEs contain a link-state topology description of lower levels and S-TIEs
carry simply default routes for the lower levels. carry simply default routes for the lower levels.
RIFT has further unique advantages, listed below and described in RIFT has further unique advantages, listed below and described in
detail in [RIFT]. detail in [RIFT].
* True ZTP * True ZTP
* Minimal Blast Radius on Failures * Minimal Blast Radius on Failures
* Can Utilize All Paths Through Fabric Without Looping * Can Utilize All Paths Through Fabric Without Looping
skipping to change at page 6, line 15 skipping to change at page 6, line 15
3.2. Applicable Topologies 3.2. Applicable Topologies
Although RIFT is specified primarily for "proper" Clos or "fat-tree" Although RIFT is specified primarily for "proper" Clos or "fat-tree"
structures, it already supports PoD concepts which are strictly structures, it already supports PoD concepts which are strictly
speaking not found in original Clos concepts. speaking not found in original Clos concepts.
Further, the specification explains and supports operations of multi- Further, the specification explains and supports operations of multi-
plane Clos variants where the protocol relies on a set of rings to plane Clos variants where the protocol relies on a set of rings to
allow the reconciliation of topology view of different planes as most allow the reconciliation of topology view of different planes as most
desirable solution making proper disaggregation viable in case of desirable solution making proper disaggregation viable in case of
failures. This observations hold not only in case of RIFT but in the failures. These observations hold not only in case of RIFT but in
generic case of dynamic routing on Clos variants with multiple planes the generic case of dynamic routing on Clos variants with multiple
and failures in bi-sectional bandwidth, especially on the leaves. planes and failures in bi-sectional bandwidth, especially on the
leaves.
3.2.1. Horizontal Links 3.2.1. Horizontal Links
RIFT is not limited to pure Clos divided into PoD and multi-planes RIFT is not limited to pure Clos divided into PoD and multi-planes
but supports horizontal links below the top of fabric level. Those but supports horizontal links below the top of fabric level. Those
links are used however only as routes of last resort northbound when links are used however only as routes of last resort northbound when
a spine loses all northbound links or cannot compute a default route a spine loses all northbound links or cannot compute a default route
through them. through them.
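The "last resort" rule for horizontal links can be sketched as a simple preference in Python. This is a simplified model with hypothetical names: it assumes the node already knows which northbound neighbors still yield a default route and which horizontal neighbors still offer one.

```python
def default_next_hops(northbound: list[str], horizontal: list[str]) -> list[str]:
    """Next hops for the fabric default at a spine below the ToF (simplified).

    northbound: north neighbors through which a default route can be computed
    horizontal: same-level neighbors, reached over horizontal links, that still
                offer a default route
    """
    if northbound:
        # Normal case: ECMP over all usable northbound links.
        return sorted(northbound)
    # Route of last resort: used only once every northbound option is gone.
    return sorted(horizontal)
```

Horizontal links therefore never share load with healthy northbound links; they only take over when the northbound set is empty.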
A possible configuration is a "ring" of horizontal links at a level. A possible configuration is a "ring" of horizontal links at a level.
skipping to change at page 7, line 8 skipping to change at page 7, line 8
implementations can be extended to support vertical "shortcuts" as implementations can be extended to support vertical "shortcuts" as
proposed by e.g. [I-D.white-distoptflood]. The RIFT specification proposed by e.g. [I-D.white-distoptflood]. The RIFT specification
itself does not provide the exact details since the resulting itself does not provide the exact details since the resulting
solution suffers from either a much larger blast radius with increased solution suffers from either a much larger blast radius with increased
flooding volumes or, in the case of maximum aggregation routing, bow-tie flooding volumes or, in the case of maximum aggregation routing, bow-tie
problems. problems.
3.2.3. Generalizing to any Directed Acyclic Graph 3.2.3. Generalizing to any Directed Acyclic Graph
RIFT is an anisotropic routing protocol, meaning that it has a sense RIFT is an anisotropic routing protocol, meaning that it has a sense
of direction (Northbound, Southbound, East-West) and that it operates of direction (northbound, southbound, east-west) and that it operates
differently depending on the direction. differently depending on the direction.
* Northbound, RIFT operates as a Link State IGP, whereby the control * Northbound, RIFT operates as a link-state IGP, whereby the control
packets are reflooded first all the way North and only interpreted packets are reflooded first all the way North and only interpreted
later. All the individual fine-grained routes are advertised. later. All the individual fine-grained routes are advertised.
* Southbound, RIFT operates as a Distance Vector IGP, whereby the * Southbound, RIFT operates as a distance vector IGP, whereby the
control packets are flooded only one hop, interpreted, and the control packets are flooded only one hop, interpreted, and the
consequence of that computation is what gets flooded one more hop consequence of that computation is what gets flooded one more hop
South. In the most common use-cases, a ToF node can reach most of South. In the most common use-cases, a ToF node can reach most of
the prefixes in the fabric. If that is the case, the ToF node the prefixes in the fabric. If that is the case, the ToF node
advertises the fabric default and disaggregates the prefixes that advertises the fabric default and disaggregates the prefixes that
it cannot reach. On the other hand, a ToF Node that can reach it cannot reach. On the other hand, a ToF Node that can reach
only a small subset of the prefixes in the fabric will preferably only a small subset of the prefixes in the fabric will preferably
advertise those prefixes and refrain from aggregating. advertise those prefixes and refrain from aggregating.
In the general case, what gets advertised South is, in more In the general case, what gets advertised South is, in more
detail: detail:
1. A fabric default that aggregates all the prefixes that are 1. A fabric default that aggregates all the prefixes that are
reachable within the fabric, and that could be a default route reachable within the fabric, and that could be a default route
or a prefix that is dedicated to this particular fabric. or a prefix that is dedicated to this particular fabric.
2. The loopback addresses of the Northbound nodes, e.g., for 2. The loopback addresses of the northbound nodes, e.g., for
inband management. inband management.
3. The disaggregated prefixes for the dynamic exceptions to the 3. The disaggregated prefixes for the dynamic exceptions to the
fabric default, advertised to route around the black hole that fabric default, advertised to route around the black hole that
may form. may form.
* East-West routing can optionally be used, with specific * East-west routing can optionally be used, with specific
restrictions. It is useful in particular when a sibling has restrictions. It is useful in particular when a sibling has
access to the fabric default but this node does not. access to the fabric default but this node does not.
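The choice a ToF node makes about what to advertise South can be sketched in Python as follows. It is a simplified model, not the [RIFT] algorithm: `threshold` is a hypothetical knob standing in for "most of the prefixes in the fabric", and `SouthAdvertisement` and its fields are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class SouthAdvertisement:
    fabric_default: bool
    loopbacks: set[str] = field(default_factory=set)
    positive: set[str] = field(default_factory=set)  # prefixes advertised explicitly
    negative: set[str] = field(default_factory=set)  # exceptions to the fabric default


def south_advertisement(fabric_prefixes: set[str], reachable: set[str],
                        loopbacks: set[str], threshold: float = 0.5) -> SouthAdvertisement:
    """What a ToF node advertises South (simplified model)."""
    reachable = reachable & fabric_prefixes
    if len(reachable) >= threshold * len(fabric_prefixes):
        # Reaches most prefixes: fabric default plus exceptions for the rest.
        return SouthAdvertisement(True, set(loopbacks),
                                  negative=fabric_prefixes - reachable)
    # Reaches only a small subset: advertise those prefixes, no aggregation.
    return SouthAdvertisement(False, set(loopbacks), positive=reachable)
```

A ToF node that lost a single prefix keeps advertising the default with one exception, whereas a node that reaches only one prefix out of many advertises just that prefix.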
A Directed Acyclic Graph (DAG) provides a sense of North (the A Directed Acyclic Graph (DAG) provides a sense of North (the
direction of the DAG) and of South (the reverse), which can be used direction of the DAG) and of South (the reverse), which can be used
to apply RIFT. For the purpose of RIFT, a vertex in the DAG that has to apply RIFT. For the purpose of RIFT, a vertex in the DAG that has
only incoming edges is a ToF node. only incoming edges is a ToF node.
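Identifying the ToF nodes of a DAG is straightforward once the edges are oriented North. The Python sketch below assumes edges are given as hypothetical (south_node, north_node) pairs; it also checks acyclicity with the standard `graphlib` module, since the DAG must exist before RIFT starts.

```python
from graphlib import CycleError, TopologicalSorter


def tof_nodes(edges: list[tuple[str, str]]) -> set[str]:
    """Return the ToF nodes of a DAG given as (south_node, north_node) edges.

    Edges point North, i.e. in the direction of the DAG, so a ToF node is a
    vertex with only incoming edges. Raises CycleError if the graph has a cycle.
    """
    predecessors: dict[str, set[str]] = {}
    for south, north in edges:
        predecessors.setdefault(north, set()).add(south)
        predecessors.setdefault(south, set())
    # static_order() raises CycleError when the graph is not acyclic.
    tuple(TopologicalSorter(predecessors).static_order())
    has_outgoing = {south for south, _ in edges}
    return set(predecessors) - has_outgoing
```

The companion protocol that builds the DAG is out of scope here; the function only validates and reads the structure it produced.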
There are a number of caveats though: There are a number of caveats though:
* The DAG structure must exist before RIFT starts, so there is a * The DAG structure must exist before RIFT starts, so there is a
need for a companion protocol to establish the logical DAG need for a companion protocol to establish the logical DAG
structure. structure.
* A generic DAG does not have a sense of East and West. The * A generic DAG does not have a sense of east and west. The
operation specified for East-West links and the Southbound operation specified for east-west links and the southbound
reflection between nodes are not applicable. reflection between nodes are not applicable.
* In order to aggregate and disaggregate routes, RIFT requires that * In order to aggregate and disaggregate routes, RIFT requires that
all the ToF nodes share the full knowledge of the prefixes in the all the ToF nodes share the full knowledge of the prefixes in the
fabric. This can be achieved with a ring as suggested by the RIFT fabric. This can be achieved with a ring as suggested by the RIFT
main specification, by some preconfiguration, or by synchronizing main specification, by some preconfiguration, or by synchronizing
with a common repository where all the active with a common repository where all the active
prefixes are registered. prefixes are registered.
3.3. Use Cases 3.3. Use Cases
skipping to change at page 18, line 7 skipping to change at page 18, line 7
| +--------+ | +--------+
| | | |
+-+---+-+ +-+---+-+
|Spine11| |Spine11|
+-------+ +-------+
Figure 7: Fallen spine Figure 7: Fallen spine
4.6. Positive vs. Negative Disaggregation 4.6. Positive vs. Negative Disaggregation
Disaggregation is the procedure whereby [RIFT] advertises more a Disaggregation is the procedure whereby [RIFT] advertises a more
specific route Southwards as an exception to the aggregated fabric- specific route Southwards as an exception to the aggregated fabric-
default North. Disaggregation is useful when a prefix within the default North. Disaggregation is useful when a prefix within the
aggregation is reachable via some of the parents but not the others aggregation is reachable via some of the parents but not the others
at the same level of the fabric. It is mandatory when the level is at the same level of the fabric. It is mandatory when the level is
the ToF since a ToF node that cannot reach a prefix becomes a black the ToF since a ToF node that cannot reach a prefix becomes a black
hole for that prefix. The hard problem is to know which prefixes are hole for that prefix. The hard problem is to know which prefixes are
reachable by whom. reachable by whom.
In the general case, [RIFT] solves that problem by interconnecting In the general case, [RIFT] solves that problem by interconnecting
the ToF nodes so they can exchange the full list of prefixes that the ToF nodes so they can exchange the full list of prefixes that
exist in the fabric and figure out when a ToF node lacks reachability exist in the fabric and figure out when a ToF node lacks reachability
to an existing prefix. This requires additional ports at the ToF, to an existing prefix. This requires additional ports at the ToF,
typically 2 ports per ToF node to form a ToF-spanning ring. [RIFT] typically 2 ports per ToF node to form a ToF-spanning ring. [RIFT]
also defines the Southbound Reflection procedure that enables a also defines the southbound reflection procedure that enables a
parent to explore the direct connectivity of its peers, meaning their parent to explore the direct connectivity of its peers, meaning their
own parents and children; based on the advertisements received from own parents and children; based on the advertisements received from
the shared parents and children, it may enable the parent to infer the shared parents and children, it may enable the parent to infer
the prefixes its peers can reach. the prefixes its peers can reach.
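Once a parent has inferred what its peers can reach, deciding what to disaggregate positively is a set difference. The Python sketch below is a simplified model with hypothetical names; how `peer_reach` is obtained (for example through southbound reflection) is assumed, not shown.

```python
def positive_disaggregation(my_reach: set[str],
                            peer_reach: dict[str, set[str]]) -> set[str]:
    """Prefixes a parent advertises South as more specific routes (simplified).

    my_reach:   prefixes this parent can reach
    peer_reach: for each same-level peer, the prefixes this parent infers
                that peer can reach
    """
    disaggregate: set[str] = set()
    for reach in peer_reach.values():
        # Children using that peer's default for these prefixes would hit a
        # black hole; a more specific route steers them to this parent instead.
        disaggregate |= my_reach - reach
    return disaggregate
```

Since more specific routes win by longest-prefix match, children keep using the default for everything else.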
When a parent lacks reachability to a prefix, it may disaggregate the When a parent lacks reachability to a prefix, it may disaggregate the
prefix negatively, i.e., advertise that this parent can be used to prefix negatively, i.e., advertise that this parent can be used to
reach any prefix in the aggregation except that one. The Negative reach any prefix in the aggregation except that one. The Negative
Disaggregation signaling is simple and functions transitively from Disaggregation signaling is simple and functions transitively from
ToF to ToP and then from ToP to Leaf. But it is hard for a parent to ToF to ToP and then from ToP to Leaf. But it is hard for a parent to
skipping to change at page 19, line 44 skipping to change at page 19, line 44
on sending a portion of the traffic to the black hole in the on sending a portion of the traffic to the black hole in the
meantime. In the case of Negative Disaggregation, the last ToF meantime. In the case of Negative Disaggregation, the last ToF
node(s) that inject the route may also incur an incast issue; this node(s) that inject the route may also incur an incast issue; this
problem would occur if a prefix that becomes totally unreachable is problem would occur if a prefix that becomes totally unreachable is
disaggregated, but doing so is mostly useless and is not recommended. disaggregated, but doing so is mostly useless and is not recommended.
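The effect of Negative Disaggregation on a child, and the transitive step toward the next level down, can be sketched in Python. This is a simplified model: `negative` is a hypothetical map from each parent to the prefixes it disaggregated negatively, and all names are illustrative.

```python
def next_hops(prefix: str, default_parents: list[str],
              negative: dict[str, set[str]]) -> list[str]:
    """Parents usable for `prefix`: every parent offering the fabric default,
    minus those that negatively disaggregated that prefix."""
    return sorted(p for p in default_parents if prefix not in negative.get(p, set()))


def must_propagate_negative(prefix: str, default_parents: list[str],
                            negative: dict[str, set[str]]) -> bool:
    """Transitivity: if every parent excludes `prefix`, this node cannot reach
    it either and must disaggregate it negatively toward its own children."""
    return bool(default_parents) and not next_hops(prefix, default_parents, negative)
```

The last case is the one the paragraph above warns about: when every parent excludes a prefix, the prefix is totally unreachable and the negative route mainly adds load.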
4.7. Mobile Edge and Anycast 4.7. Mobile Edge and Anycast
When a physical or a virtual node changes its point of attachment in When a physical or a virtual node changes its point of attachment in
the fabric from a previous-leaf to a next-leaf, new routes must be the fabric from a previous-leaf to a next-leaf, new routes must be
installed that supercede the old ones. Since the flooding flows installed that supersede the old ones. Since the flooding flows
Northwards, the nodes (if any) between the previous-leaf and the Northwards, the nodes (if any) between the previous-leaf and the
common parent are not immediately aware that the path via previous- common parent are not immediately aware that the path via previous-
leaf is obsolete, and a stale route may exist for a while. The leaf is obsolete, and a stale route may exist for a while. The
common parent needs to select the freshest route advertisement in common parent needs to select the freshest route advertisement in
order to install the correct route via the next-leaf. This requires order to install the correct route via the next-leaf. This requires
that the fabric determines the sequence of the movements of the that the fabric determines the sequence of the movements of the
mobile node. mobile node.
On the one hand, a classical sequence counter provides a total order On the one hand, a classical sequence counter provides a total order
for a while but it will eventually wrap. On the other hand, a for a while but it will eventually wrap. On the other hand, a
skipping to change at page 20, line 31 skipping to change at page 20, line 31
routes. Otherwise, the sequence counter from the mobile node, if routes. Otherwise, the sequence counter from the mobile node, if
available, is used. One caveat is that the sequence counter must not available, is used. One caveat is that the sequence counter must not
wrap within the precision of the timing protocol. Another is that wrap within the precision of the timing protocol. Another is that
the mobile node may not even provide a sequence counter, in which the mobile node may not even provide a sequence counter, in which
case the mobility itself must be slower than the precision of the case the mobility itself must be slower than the precision of the
timing. timing.
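The freshness comparison can be sketched in Python as below. The sketch assumes that timestamps are compared first and decide whenever they differ by more than the timing precision, and that sequence counters are compared with RFC 1982 style serial number arithmetic; the 16-bit counter width and all names are hypothetical, not taken from [RIFT].

```python
from typing import Optional


def seq_newer(a: int, b: int, bits: int = 16) -> bool:
    """True if counter `a` is newer than `b`, tolerating wrap-around
    (serial number arithmetic in the style of RFC 1982)."""
    diff = (a - b) % (1 << bits)
    return 0 < diff < (1 << (bits - 1))


def fresher(ts_a: float, seq_a: Optional[int], ts_b: float, seq_b: Optional[int],
            precision: float, bits: int = 16) -> Optional[bool]:
    """True if advertisement A is fresher than B, False if B is fresher,
    None if the two cannot be ordered."""
    if abs(ts_a - ts_b) > precision:
        # Timestamps far enough apart are trusted as they are.
        return ts_a > ts_b
    if seq_a is not None and seq_b is not None:
        # Within the timing precision, fall back to the mobile node's counter,
        # which must not wrap within that precision.
        if seq_newer(seq_a, seq_b, bits):
            return True
        if seq_newer(seq_b, seq_a, bits):
            return False
    # No usable counter: movements faster than the precision cannot be ordered.
    return None
```

The wrap-around case shows why a plain integer comparison is not enough: counter 0 is treated as newer than 65535.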
Mobility must not be confused with Anycast. In both cases, the same Mobility must not be confused with Anycast. In both cases, the same
address is injected in RIFT at different leaves. In the case of address is injected in RIFT at different leaves. In the case of
mobility, only the freshest route must be conserved, since the mobile mobility, only the freshest route must be conserved, since the mobile
node changed its point of attachement from one leaf to the next. In the node changed its point of attachment from one leaf to the next. In the
case of anycast, the node may be either multihomed (attached to case of anycast, the node may be either multihomed (attached to
multiple leaves in parallel) or reachable beyond the fabric via multiple leaves in parallel) or reachable beyond the fabric via
multiple routes that are redistributed to different leaves; either multiple routes that are redistributed to different leaves; either
way, in the case of anycast, the multiple routes are equally valid way, in the case of anycast, the multiple routes are equally valid
and should be conserved. Without further information from the and should be conserved. Without further information from the
redistributed routing protocol, it is impossible to sort out a redistributed routing protocol, it is impossible to sort out a
movement from a redistribution that happens asynchronously on movement from a redistribution that happens asynchronously on
different leaves. [RIFT] expects that anycast addresses are different leaves. [RIFT] expects that anycast addresses are
advertised within the timing precision, which is typically the case advertised within the timing precision, which is typically the case
with a low-precision timing and a multihomed node. Beyond that time with a low-precision timing and a multihomed node. Beyond that time
skipping to change at page 23, line 42 skipping to change at page 23, line 42
| | | | | | | | | | | | | | | |
+---+ +---+ ...............+---+ +---+ +---+ +---+ ...............+---+ +---+
SV(1) SV(2) SV(n+1) SV(n) SV(1) SV(2) SV(n+1) SV(n)
Figure 10: Dual-homing servers Figure 10: Dual-homing servers
In the single plane, the worst condition is disaggregation of every In the single plane, the worst condition is disaggregation of every
other server at the same level. Suppose the links from ToR1 to all other server at the same level. Suppose the links from ToR1 to all
the leaves become unavailable. All the servers' routes are the leaves become unavailable. All the servers' routes are
disaggregated and the FIB of the servers will be expanded with n-1 disaggregated and the FIB of the servers will be expanded with n-1
more spicific routes. more specific routes.
Sometimes, pleople may prefer to disaggregate from ToR to servers Sometimes, people may prefer to disaggregate from ToR to servers from
from the start, i.e., the servers have a couple of tens of routes in the FIB the start, i.e., the servers have a couple of tens of routes in the FIB
from the start besides default routes to avoid breakages at rack level. from the start besides default routes to avoid breakages at rack level.
Full disaggregation of the fabric could be achieved by configuration Full disaggregation of the fabric could be achieved by configuration
supported by RIFT. supported by RIFT.
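As a quick check on the worst-case FIB growth described above, here is a small Python helper. `default_routes` is a hypothetical parameter, assumed here to be a single ECMP default route shared by both ToRs of a dual-homed server.

```python
def worst_case_server_fib(n_servers: int, default_routes: int = 1) -> int:
    """FIB size of a server in the single-plane worst case: the prefix of every
    other server at the same level (n - 1 of them) is disaggregated on top of
    the default route(s)."""
    if n_servers < 1:
        raise ValueError("need at least one server")
    return default_routes + (n_servers - 1)
```

For example, with 32 servers the worst case is 31 more specific routes plus the default, i.e. 32 routes.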
4.11. Fabric With A Controller 4.11. Fabric With A Controller
There are many different ways to deploy the controller. One There are many different ways to deploy the controller. One
possibility is attaching a controller to the RIFT domain from ToF and possibility is attaching a controller to the RIFT domain from ToF and
another possibility is attaching a controller from the leaf. another possibility is attaching a controller from the leaf.
+------------+ +------------+
End of changes. 23 change blocks.
29 lines changed or deleted 30 lines changed or added

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/