draft-ietf-rift-applicability-00.txt   draft-ietf-rift-applicability-01.txt 
RIFT WG Yuehua. Wei RIFT WG Yuehua. Wei, Ed.
Internet-Draft Zheng. Zhang Internet-Draft Zheng. Zhang
Intended status: Informational ZTE Corporation Intended status: Informational ZTE Corporation
Expires: August 25, 2020 Dmitry. Afanasiev Expires: 5 October 2020 Dmitry. Afanasiev
Yandex Yandex
Tom. Verhaeg Tom. Verhaeg
Interconnect Services B.V. Juniper Networks
Jaroslaw. Kowalczyk Jaroslaw. Kowalczyk
Orange Polska Orange Polska
February 22, 2020 P. Thubert
Cisco Systems
3 April 2020
RIFT Applicability RIFT Applicability
draft-ietf-rift-applicability-00 draft-ietf-rift-applicability-01
Abstract Abstract
This document discusses the properties, applicability and operational This document discusses the properties, applicability and operational
considerations of RIFT in different network scenarios. It intends to considerations of RIFT in different network scenarios. It intends to
provide a rough guide how RIFT can be deployed to simplify routing provide a rough guide how RIFT can be deployed to simplify routing
operations in Clos topologies and their variations. operations in Clos topologies and their variations.
Status of This Memo Status of This Memo
skipping to change at page 1, line 39 skipping to change at page 1, line 41
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 25, 2020. This Internet-Draft will expire on 5 October 2020.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents (https://trustee.ietf.org/
(https://trustee.ietf.org/license-info) in effect on the date of license-info) in effect on the date of publication of this document.
publication of this document. Please review these documents Please review these documents carefully, as they describe your rights
carefully, as they describe your rights and restrictions with respect and restrictions with respect to this document. Code Components
to this document. Code Components extracted from this document must extracted from this document must include Simplified BSD License text
include Simplified BSD License text as described in Section 4.e of as described in Section 4.e of the Trust Legal Provisions and are
the Trust Legal Provisions and are provided without warranty as provided without warranty as described in the Simplified BSD License.
described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Problem Statement of Routing in Modern IP Fabric Fat Tree 2. Problem Statement of Routing in Modern IP Fabric Fat Tree
Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Networks . . . . . . . . . . . . . . . . . . . . . . . . 3
3. Applicability of RIFT to Clos IP Fabrics . . . . . . . . . . 3 3. Applicability of RIFT to Clos IP Fabrics . . . . . . . . . . 3
3.1. Overview of RIFT . . . . . . . . . . . . . . . . . . . . 3 3.1. Overview of RIFT . . . . . . . . . . . . . . . . . . . . 3
3.2. Applicable Topologies . . . . . . . . . . . . . . . . . . 5 3.2. Applicable Topologies . . . . . . . . . . . . . . . . . . 5
3.2.1. Horizontal Links . . . . . . . . . . . . . . . . . . 6 3.2.1. Horizontal Links . . . . . . . . . . . . . . . . . . 6
3.2.2. Vertical Shortcuts . . . . . . . . . . . . . . . . . 6 3.2.2. Vertical Shortcuts . . . . . . . . . . . . . . . . . 6
3.3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 6 3.2.3. Generalizing to any Directed Acyclic Graph . . . . . 6
3.3.1. DC Fabrics . . . . . . . . . . . . . . . . . . . . . 6 3.3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 8
3.3.2. Metro Fabrics . . . . . . . . . . . . . . . . . . . . 7 3.3.1. DC Fabrics . . . . . . . . . . . . . . . . . . . . . 8
3.3.3. Building Cabling . . . . . . . . . . . . . . . . . . 7 3.3.2. Metro Fabrics . . . . . . . . . . . . . . . . . . . . 8
3.3.4. Internal Router Switching Fabrics . . . . . . . . . . 7 3.3.3. Building Cabling . . . . . . . . . . . . . . . . . . 8
3.3.5. CloudCO . . . . . . . . . . . . . . . . . . . . . . . 7 3.3.4. Internal Router Switching Fabrics . . . . . . . . . . 8
4. Deployment Considerations . . . . . . . . . . . . . . . . . . 9 3.3.5. CloudCO . . . . . . . . . . . . . . . . . . . . . . . 9
4.1. South Reflection . . . . . . . . . . . . . . . . . . . . 10 4. Deployment Considerations . . . . . . . . . . . . . . . . . . 11
4.2. Suboptimal Routing on Link Failures . . . . . . . . . . . 10 4.1. South Reflection . . . . . . . . . . . . . . . . . . . . 12
4.3. Black-Holing on Link Failures . . . . . . . . . . . . . . 12 4.2. Suboptimal Routing on Link Failures . . . . . . . . . . . 12
4.4. Zero Touch Provisioning (ZTP) . . . . . . . . . . . . . . 13 4.3. Black-Holing on Link Failures . . . . . . . . . . . . . . 14
4.5. Miscabling Examples . . . . . . . . . . . . . . . . . . . 13 4.4. Zero Touch Provisioning (ZTP) . . . . . . . . . . . . . . 15
4.6. IPv4 over IPv6 . . . . . . . . . . . . . . . . . . . . . 16 4.5. Miscabling Examples . . . . . . . . . . . . . . . . . . . 15
4.7. In-Band Reachability of Nodes . . . . . . . . . . . . . . 17 4.6. Positive vs. Negative Disaggregation . . . . . . . . . . 18
4.7.1. Reachability of Leafs . . . . . . . . . . . . . . . . 17 4.7. Mobile Edge and Anycast . . . . . . . . . . . . . . . . . 19
4.7.2. Reachability of Spines . . . . . . . . . . . . . . . 17 4.8. IPv4 over IPv6 . . . . . . . . . . . . . . . . . . . . . 21
4.8. Dual Homing Servers . . . . . . . . . . . . . . . . . . . 17 4.9. In-Band Reachability of Nodes . . . . . . . . . . . . . . 21
4.9. Fabric With A Controller . . . . . . . . . . . . . . . . 18 4.10. Dual Homing Servers . . . . . . . . . . . . . . . . . . . 22
4.9.1. Controller Attached to ToFs . . . . . . . . . . . . . 19 4.11. Fabric With A Controller . . . . . . . . . . . . . . . . 23
4.9.2. Controller Attached to Leaf . . . . . . . . . . . . . 19 4.11.1. Controller Attached to ToFs . . . . . . . . . . . . 24
4.10. Internet Connectivity Without Underlay . . . . . . . . . 19 4.11.2. Controller Attached to Leaf . . . . . . . . . . . . 24
4.10.1. Internet Default on the Leafs . . . . . . . . . . . 19 4.12. Internet Connectivity With Underlay . . . . . . . . . . . 24
4.10.2. Internet Default on the ToFs . . . . . . . . . . . . 20 4.12.1. Internet Default on the Leaf . . . . . . . . . . . . 25
4.11. Subnet Mismatch and Address Families . . . . . . . . . . 20 4.12.2. Internet Default on the ToFs . . . . . . . . . . . . 25
4.12. Anycast Considerations . . . . . . . . . . . . . . . . . 20 4.13. Subnet Mismatch and Address Families . . . . . . . . . . 25
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 21 4.14. Anycast Considerations . . . . . . . . . . . . . . . . . 25
6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 21 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26
7. Normative References . . . . . . . . . . . . . . . . . . . . 22 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 23 7. Normative References . . . . . . . . . . . . . . . . . . . . 27
8. Informative References . . . . . . . . . . . . . . . . . . . 28
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 28
1. Introduction 1. Introduction
This document intends to explain the properties and applicability of This document intends to explain the properties and applicability of
RIFT [I-D.ietf-rift-rift] in different deployment scenarios and "Routing in Fat Trees" [RIFT] in different deployment scenarios and
highlight the operational simplicity of the technology compared to highlight the operational simplicity of the technology compared to
traditional routing solutions. It also documents special traditional routing solutions. It also documents special
considerations when RIFT is used with or without overlays, considerations when RIFT is used with or without overlays,
controllers and corrects topology miscablings and/or node and link controllers and corrects topology miscablings and/or node and link
failures. failures.
2. Problem Statement of Routing in Modern IP Fabric Fat Tree Networks 2. Problem Statement of Routing in Modern IP Fabric Fat Tree Networks
Clos and Fat-Tree topologies have gained prominence in today's Clos and Fat-Tree topologies have gained prominence in today's
networking, primarily as result of the paradigm shift towards a networking, primarily as result of the paradigm shift towards a
centralized data-center based architecture that is poised to deliver centralized data-center based architecture that is poised to deliver
a majority of computation and storage services in the future. a majority of computation and storage services in the future.
Today's current routing protocols were geared towards a network with Today's current routing protocols were geared towards a network with
an irregular topology and low degree of connectivity originally. an irregular topology and low degree of connectivity originally.
When they are applied to Fat-Tree topologies: When they are applied to Fat-Tree topologies:
o they tend to need extensive configuration or provisioning during * they tend to need extensive configuration or provisioning during
bring up and re-dimensioning. bring up and re-dimensioning.
o spine and leaf nodes have the entire network topology and routing * spine and leaf nodes have the entire network topology and routing
information, which is in fact, not needed on the leaf nodes during information, which is in fact, not needed on the leaf nodes during
normal operation. normal operation.
o significant Link State PDUs (LSPs) flooding duplication between * significant Link State PDUs (LSPs) flooding duplication between
spine nodes and leaf nodes occurs during network bring up and spine nodes and leaf nodes occurs during network bring up and
topology updates. It consumes both spine and leaf nodes' CPU and topology updates. It consumes both spine and leaf nodes' CPU and
link bandwidth resources and with that limits protocol link bandwidth resources and with that limits protocol
scalability. scalability.
3. Applicability of RIFT to Clos IP Fabrics 3. Applicability of RIFT to Clos IP Fabrics
Further content of this document assumes that the reader is familiar Further content of this document assumes that the reader is familiar
with the terms and concepts used in OSPF [RFC2328] and IS-IS with the terms and concepts used in OSPF [RFC2328] and IS-IS
[ISO10589-Second-Edition] link-state protocols and at least the [ISO10589-Second-Edition] link-state protocols and at least the
sections of RIFT [I-D.ietf-rift-rift] outlining the requirement of sections of [RIFT] outlining the requirement of routing in IP fabrics
routing in IP fabrics and RIFT protocol concepts. and RIFT protocol concepts.
3.1. Overview of RIFT 3.1. Overview of RIFT
RIFT is a dynamic routing protocol for Clos and fat-tree network RIFT is a dynamic routing protocol for Clos and fat-tree network
topologies. It defines a link-state protocol when "pointing north" topologies. It defines a link-state protocol when "pointing north"
and path-vector protocol when "pointing south". and path-vector protocol when "pointing south".
It floods flat link-state information northbound only so that each It floods flat link-state information northbound only so that each
level obtains the full topology of levels south of it. That level obtains the full topology of levels south of it. That
information is never flooded East-West or back South again. So a top information is never flooded East-West or back South again. So a top
skipping to change at page 4, line 47 skipping to change at page 4, line 47
+----+ +----+ +----+ +-----+ +----+ +----+ +----+ +-----+
Figure 1: Rift overview Figure 1: Rift overview
A middle tier node has only information necessary for its level, A middle tier node has only information necessary for its level,
which are all destinations south of the node based on SPF which are all destinations south of the node based on SPF
calculation, default route and potential disaggregated routes. calculation, default route and potential disaggregated routes.
RIFT combines the advantage of both Link-State and Distance Vector: RIFT combines the advantage of both Link-State and Distance Vector:
o Fastest Possible Convergence * Fastest Possible Convergence
o Automatic Detection of Topology * Automatic Detection of Topology
o Minimal Routes/Info on TORs
o High Degree of ECMP * Minimal Routes/Info on TORs
* High Degree of ECMP
o Fast De-commissioning of Nodes * Fast De-commissioning of Nodes
o Maximum Propagation Speed with Flexible Prefixes in an Update * Maximum Propagation Speed with Flexible Prefixes in an Update
And RIFT eliminates the disadvantages of Link-State or Distance And RIFT eliminates the disadvantages of Link-State or Distance
Vector: Vector:
o Reduced and Balanced Flooding * Reduced and Balanced Flooding
o Automatic Neighbor Detection * Automatic Neighbor Detection
So there are two types of link state database which are "north So there are two types of link state database which are "north
representation" N-TIEs and "south representation" S-TIEs. The N-TIEs representation" N-TIEs and "south representation" S-TIEs. The N-TIEs
contain a link state topology description of lower levels and S-TIEs contain a link state topology description of lower levels and S-TIEs
carry simply default routes for the lower levels. carry simply default routes for the lower levels.
There are a bunch of more advantages unique to RIFT listed below There are a bunch of more advantages unique to RIFT listed below
which could be understood if you read the details of RIFT which could be understood if you read the details of [RIFT].
[I-D.ietf-rift-rift].
o True ZTP * True ZTP
o Minimal Blast Radius on Failures * Minimal Blast Radius on Failures
o Can Utilize All Paths Through Fabric Without Looping * Can Utilize All Paths Through Fabric Without Looping
o Automatic Disaggregation on Failures * Automatic Disaggregation on Failures
o Simple Leaf Implementation that Can Scale Down to Servers * Simple Leaf Implementation that Can Scale Down to Servers
o Key-Value Store * Key-Value Store
o Horizontal Links Used for Protection Only * Horizontal Links Used for Protection Only
o Supports Non-Equal Cost Multipath and Can Replace MC-LAG * Supports Non-Equal Cost Multipath and Can Replace MC-LAG
o Optimal Flooding Reduction and Load-Balancing * Optimal Flooding Reduction and Load-Balancing
3.2. Applicable Topologies 3.2. Applicable Topologies
Albeit RIFT is specified primarily for "proper" Clos or "fat-tree" Albeit RIFT is specified primarily for "proper" Clos or "fat-tree"
structures, it already supports PoD concepts which are strictly structures, it already supports PoD concepts which are strictly
speaking not found in original Clos concepts. speaking not found in original Clos concepts.
Further, the specification explains and supports operations of multi- Further, the specification explains and supports operations of multi-
plane Clos variants where the protocol relies on set of rings to plane Clos variants where the protocol relies on set of rings to
allow the reconciliation of topology view of different planes as most allow the reconciliation of topology view of different planes as most
skipping to change at page 6, line 40 skipping to change at page 6, line 40
its northbound adjacencies (as long as any of the other nodes in the its northbound adjacencies (as long as any of the other nodes in the
level are northbound connected) to still participate in northbound level are northbound connected) to still participate in northbound
forwarding. forwarding.
3.2.2. Vertical Shortcuts 3.2.2. Vertical Shortcuts
Through relaxations of the specified adjacency forming rules RIFT Through relaxations of the specified adjacency forming rules RIFT
implementations can be extended to support vertical "shortcuts" as implementations can be extended to support vertical "shortcuts" as
proposed by e.g. [I-D.white-distoptflood]. The RIFT specification proposed by e.g. [I-D.white-distoptflood]. The RIFT specification
itself does not provide the exact details since the resulting itself does not provide the exact details since the resulting
solution suffers from either much larger blast radii with increased solution suffers from either much larger blast radius with increased
flooding volumes or in case of maximum aggregation routing bow-tie flooding volumes or in case of maximum aggregation routing bow-tie
problems. problems.
3.2.3. Generalizing to any Directed Acyclic Graph
RIFT is an anisotropic routing protocol, meaning that it has a sense
of direction (Northbound, Southbound, East-West) and that it operates
differently depending on the direction.
* Northbound, RIFT operates as a Link State IGP, whereby the control
packets are reflooded first all the way North and only interpreted
later. All the individual fine grained routes are advertised.
* Southbound, RIFT operates as a Distance Vector IGP, whereby the
control packets are flooded only one hop, interpreted, and the
consequence of that computation is what gets flooded one more hop
South. In the most common use-cases, a ToF node can reach most of
the prefixes in the fabric. If that is the case, the ToF node
advertises the fabric default and disaggregates the prefixes that
it cannot reach. On the other hand, a ToF node that can reach
only a small subset of the prefixes in the fabric will preferably
advertise those prefixes and refrain from aggregating.
In the general case, what gets advertised South is, in more
detail:
1. A fabric default that aggregates all the prefixes that are
reachable within the fabric, and that could be a default route
or a prefix that is dedicated to this particular fabric.
2. The loopback addresses of the Northbound nodes, e.g., for
inband management.
3. The disaggregated prefixes for the dynamic exceptions to the
fabric default, advertised to route around the black hole that
may form.
* East-West routing can optionally be used, with specific
restrictions. It is useful in particular when a sibling has
access to the fabric default but this node does not.
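The Southbound behavior described above can be sketched as follows. This is a minimal illustration, not the algorithm from the RIFT specification: the function name, the dictionary layout, and the `threshold` parameter (standing in for "most of the prefixes") are assumptions introduced here.

```python
def southbound_advertisement(fabric_prefixes, reachable, loopbacks, threshold=0.5):
    """Return what a ToF node advertises South in this simplified model.

    threshold is a hypothetical cut-off for "can reach most prefixes";
    the source text does not define a numeric value.
    """
    fabric_prefixes = set(fabric_prefixes)
    reachable = set(reachable) & fabric_prefixes
    unreachable = fabric_prefixes - reachable

    if len(reachable) >= threshold * len(fabric_prefixes):
        # Common case: advertise the fabric default and list the
        # prefixes this node cannot reach as exceptions to it.
        return {
            "default": True,
            "loopbacks": sorted(loopbacks),
            "disaggregated": sorted(unreachable),
            "specific": [],
        }

    # The node reaches only a small subset: advertise those prefixes
    # individually and refrain from aggregating.
    return {
        "default": False,
        "loopbacks": sorted(loopbacks),
        "disaggregated": [],
        "specific": sorted(reachable),
    }


well_connected = southbound_advertisement(
    ["p1", "p2", "p3", "p4"], ["p1", "p2", "p3"], ["lo-tof"]
)
poorly_connected = southbound_advertisement(
    ["p1", "p2", "p3", "p4"], ["p1"], ["lo-tof"]
)
```

In this model `well_connected` carries the default plus one exception, while `poorly_connected` carries only the single prefix it can actually reach.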
A Directed Acyclic Graph (DAG) provides a sense of North (the
direction of the DAG) and of South (the reverse), which can be used
to apply RIFT. For the purpose of RIFT, a vertex in the DAG that has
only incoming edges is a ToF node.
There are a number of caveats though:
* The DAG structure must exist before RIFT starts, so there is a
need for a companion protocol to establish the logical DAG
structure.
* A generic DAG does not have a sense of East and West. The
operation specified for East-West links and the Southbound
reflection between nodes are not applicable.
* In order to aggregate and disaggregate routes, RIFT requires that
all the ToF nodes share the full knowledge of the prefixes in the
fabric. This can be achieved with a ring as suggested by the RIFT
main specification, by some preconfiguration, or using a
synchronization with a common repository where all the active
prefixes are registered.
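As an illustration of how a sense of North can be read off a DAG, the sketch below treats every edge as pointing North and reports the vertices that have incoming edges but no outgoing ones as ToF nodes. The edge-list representation and the node names are assumptions made for this example only.

```python
def tof_nodes(northbound_edges):
    """Return the vertices that have incoming edges but no outgoing edge.

    northbound_edges is an iterable of (southern_node, northern_node) pairs.
    """
    edges = list(northbound_edges)
    sources = {south for south, _ in edges}
    targets = {north for _, north in edges}
    # A ToF node is pointed at by others but points at nobody further North.
    return sorted(targets - sources)


# Hypothetical three-level DAG.
example_edges = [
    ("leaf1", "spine1"),
    ("leaf1", "spine2"),
    ("leaf2", "spine1"),
    ("spine1", "tof1"),
    ("spine2", "tof1"),
    ("spine2", "tof2"),
]
example_tofs = tof_nodes(example_edges)
```

The sketch assumes the DAG already exists, in line with the first caveat above; it does not build or validate the graph.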
3.3. Use Cases 3.3. Use Cases
3.3.1. DC Fabrics 3.3.1. DC Fabrics
RIFT is largely driven by demands and hence ideally suited for RIFT is largely driven by demands and hence ideally suited for
application in underlay of data center IP fabrics, vast majority of application in underlay of data center IP fabrics, vast majority of
which seem to be currently (and for the foreseeable future) Clos which seem to be currently (and for the foreseeable future) Clos
architectures. It significantly simplifies operation and deployment architectures. It significantly simplifies operation and deployment
of such fabrics as described in Section 4 for environments compared of such fabrics as described in Section 4 for environments compared
to extensive proprietary provisioning and operational solutions. to extensive proprietary provisioning and operational solutions.
skipping to change at page 8, line 44 skipping to change at page 10, line 44
| |--------| |--------| |----------| |-------| | | |--------| |--------| |----------| |-------| |
| |--------| |--------| |----------| |-------| | | |--------| |--------| |----------| |-------| |
| || VAS7 || || VAS4 || || vIGMP || ||BAA || | | || VAS7 || || VAS4 || || vIGMP || ||BAA || |
| |--------| |--------| |----------| |-------| | | |--------| |--------| |----------| |-------| |
| +--------+ +--------+ +----------+ +-------+ | | +--------+ +--------+ +----------+ +-------+ |
| | | |
++-----------+ +---------++ ++-----------+ +---------++
|Network I/O | |Access I/O| |Network I/O | |Access I/O|
+------------+ +----------+ +------------+ +----------+
Figure 2: An example of CloudCO architecture Figure 2: An example of CloudCO architecture
The Spine-Leaf architectures deployed inside CloudCO meet the The Spine-Leaf architectures deployed inside CloudCO meet the
network requirements of being adaptable, agile, scalable and dynamic. network requirements of being adaptable, agile, scalable and dynamic.
4. Deployment Considerations 4. Deployment Considerations
RIFT presents the opportunity for organizations building and RIFT presents the opportunity for organizations building and
operating IP fabrics to simplify their operation and deployments operating IP fabrics to simplify their operation and deployments
while achieving many desirable properties of a dynamic routing on while achieving many desirable properties of a dynamic routing on
such a substrate: such a substrate:
o RIFT design follows minimum blast radius and minimum necessary * RIFT design follows minimum blast radius and minimum necessary
epistemological scope philosophy which leads to very good scaling epistemological scope philosophy which leads to very good scaling
properties while delivering maximum reactiveness. properties while delivering maximum reactiveness.
o RIFT allows for extensive Zero Touch Provisioning within the * RIFT allows for extensive Zero Touch Provisioning within the
protocol. In its most extreme version RIFT does not rely on any protocol. In its most extreme version RIFT does not rely on any
specific addressing and for IP fabric can operate using IPv6 ND specific addressing and for IP fabric can operate using IPv6 ND
[RFC4861] only. [RFC4861] only.
o RIFT has provisions to detect common IP fabric mis-cabling * RIFT has provisions to detect common IP fabric mis-cabling
scenarios. scenarios.
o RIFT negotiates automatically BFD per link allowing this way for * RIFT negotiates automatically BFD per link allowing this way for
IP and micro-BFD [RFC7130] to replace LAGs which do hide bandwidth IP and micro-BFD [RFC7130] to replace LAGs which do hide bandwidth
imbalances in case of constituent failures. Further automatic imbalances in case of constituent failures. Further automatic
link validation techniques similar to [RFC5357] could be supported link validation techniques similar to [RFC5357] could be supported
as well. as well.
o RIFT inherently solves many difficult problems associated with the * RIFT inherently solves many difficult problems associated with the
use of traditional routing topologies with dense meshes and high use of traditional routing topologies with dense meshes and high
degrees of ECMP by including automatic bandwidth balancing, flood degrees of ECMP by including automatic bandwidth balancing, flood
reduction and automatic disaggregation on failures while providing reduction and automatic disaggregation on failures while providing
maximum aggregation of prefixes in default scenarios. maximum aggregation of prefixes in default scenarios.
o RIFT reduces FIB size towards the bottom of the IP fabric where * RIFT reduces FIB size towards the bottom of the IP fabric where
most nodes reside and allows with that for cheaper hardware on the most nodes reside and allows with that for cheaper hardware on the
edges and introduction of modern IP fabric architectures that edges and introduction of modern IP fabric architectures that
encompass e.g. server multi-homing. encompass e.g. server multi-homing.
o RIFT provides valley-free routing and with that is loop free. * RIFT provides valley-free routing and with that is loop free.
This allows the use of any such valley-free path in bi-sectional This allows the use of any such valley-free path in bi-sectional
fabric bandwidth between two destination irrespective of their fabric bandwidth between two destination irrespective of their
metrics which can be used to balance load on the fabric in metrics which can be used to balance load on the fabric in
different ways. different ways.
o RIFT includes a key-value distribution mechanism which allows for * RIFT includes a key-value distribution mechanism which allows for
many future applications such as automatic provisioning of basic many future applications such as automatic provisioning of basic
overlay services or automatic key roll-overs over whole fabrics. overlay services or automatic key roll-overs over whole fabrics.
o RIFT is designed for minimum delay in case of prefix mobility on * RIFT is designed for minimum delay in case of prefix mobility on
the fabric. the fabric.
o Many further operational and design points collected over many * Many further operational and design points collected over many
years of routing protocol deployments have been incorporated in years of routing protocol deployments have been incorporated in
RIFT such as fast flooding rates, protection of information RIFT such as fast flooding rates, protection of information
lifetimes and operationally easily recognizable remote ends of lifetimes and operationally easily recognizable remote ends of
links and node names. links and node names.
4.1. South Reflection 4.1. South Reflection
South reflection is a mechanism whereby South Node TIEs are "reflected" South reflection is a mechanism whereby South Node TIEs are "reflected"
back up north to allow nodes in same level without E-W links to "see" back up north to allow nodes in same level without E-W links to "see"
each other. each other.
skipping to change at page 12, line 51 skipping to change at page 14, line 49
Without disaggregation mechanism, when linkTS3 and linkTS4 both fail, Without disaggregation mechanism, when linkTS3 and linkTS4 both fail,
the packet from leaf111 to prefix122 would suffer 50% black-holing the packet from leaf111 to prefix122 would suffer 50% black-holing
based on pure default route. The packet supposed to go up through based on pure default route. The packet supposed to go up through
linkSL1 to linkTS1 then go down through linkTS3 or linkTS4 will be linkSL1 to linkTS1 then go down through linkTS3 or linkTS4 will be
dropped. The packet supposed to go up through linkSL3 to linkTS2 dropped. The packet supposed to go up through linkSL3 to linkTS2
then go down through linkTS3 or linkTS4 will be dropped as well. then go down through linkTS3 or linkTS4 will be dropped as well.
It's the case of black-holing. It's the case of black-holing.
With disaggregation mechanism, when linkTS3 and linkTS4 both fail, With disaggregation mechanism, when linkTS3 and linkTS4 both fail,
ToF22 will detect the failure according to the reflected node S-TIE ToF22 will detect the failure according to the reflected node S-TIE
of ToF21 from Spine111\Spine112\Spine121\Spine122. Based on the of ToF21 from Spine111\Spine112. Based on the disaggregation
disaggregation algorithm provided by RIFT, ToF22 will explicitly algorithm provided by RIFT, ToF22 will explicitly originate an S-TIE
originate an S-TIE with prefix 121 and prefix 122, that is flooded to with prefix 121 and prefix 122, that is flooded to spines 111, 112,
spines 111, 112, 121 and 122. 121 and 122.
The packet from leaf111 to prefix122 will not be routed to linkTS1 or The packet from leaf111 to prefix122 will not be routed to linkTS1 or
linkTS2. The packet from leaf111 to prefix122 will only be routed to linkTS2. The packet from leaf111 to prefix122 will only be routed to
linkTS5 or linkTS7 following a longest-prefix match to prefix122. linkTS5 or linkTS7 following a longest-prefix match to prefix122.
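The effect of the disaggregated prefix on forwarding can be illustrated with a small longest-prefix-match lookup. This is a sketch under stated assumptions: the IPv6 prefixes come from the documentation range and are invented for the example, and the FIB is a plain dictionary rather than any real forwarding structure.

```python
import ipaddress


def lookup(fib, destination):
    """Return the next hops of the longest prefix in fib covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hops) for net, hops in fib.items() if addr in net]
    if not matches:
        return None
    # The longest prefix wins, so a disaggregated route overrides the default.
    _, hops = max(matches, key=lambda match: match[0].prefixlen)
    return hops


# Hypothetical spine FIB after ToF22 has disaggregated prefix122.
spine_fib = {
    ipaddress.ip_network("::/0"): ["ToF21", "ToF22"],
    ipaddress.ip_network("2001:db8:122::/48"): ["ToF22"],
}

to_prefix122 = lookup(spine_fib, "2001:db8:122::1")
to_elsewhere = lookup(spine_fib, "2001:db8:111::1")
```

Traffic to the disaggregated prefix resolves to ToF22 only, while all other traffic keeps using the default over both ToF nodes.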
4.4. Zero Touch Provisioning (ZTP) 4.4. Zero Touch Provisioning (ZTP)
Each RIFT node may operate in zero touch provisioning (ZTP) mode. It Each RIFT node may operate in zero touch provisioning (ZTP) mode. It
has no configuration (unless it is a Top-of-Fabric at the top of the has no configuration (unless it is a Top-of-Fabric at the top of the
topology or it is desired to confine it to leaf role w/o leaf-2-leaf topology or it is desired to confine it to leaf role w/o leaf-2-leaf
skipping to change at page 14, line 25 skipping to change at page 16, line 4
|Spin111| |Spin112| | |Spin121| |Spin122| LEVEL 1 |Spin111| |Spin112| | |Spin121| |Spin122| LEVEL 1
+-+---+-+ ++----+-+ | +-+---+-+ ++----+-+ +-+---+-+ ++----+-+ | +-+---+-+ ++----+-+
| | | | | | | | | | | | | | | | | |
| +---------+ | link-M | +---------+ | | +---------+ | link-M | +---------+ |
| | | | | | | | | | | | | | | | | |
| +-------+ | | | | +-------+ | | | +-------+ | | | | +-------+ | |
| | | | | | | | | | | | | | | | | |
+-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+ | +-+---+-+ +--+--+-+
|Leaf111| |Leaf112+-----+ |Leaf121| |Leaf122| LEVEL 0 |Leaf111| |Leaf112+-----+ |Leaf121| |Leaf122| LEVEL 0
+-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+
Figure 5: A single plane miscabling example Figure 5: A single plane miscabling example
Figure Figure 5 shows a single plane miscabling example. It's a Figure 5 shows a single plane miscabling example. It's a perfect
perfect fat-tree fabric except link-M connecting Leaf112 to ToF22. fat-tree fabric except link-M connecting Leaf112 to ToF22.
The RIFT control protocol can discover the physical links The RIFT control protocol can discover the physical links
automatically and be able to detect cabling that violates fat-tree automatically and be able to detect cabling that violates fat-tree
topology constraints. It reacts accordingly to such mis-cabling topology constraints. It reacts accordingly to such mis-cabling
attempts, at a minimum preventing adjacencies between nodes from attempts, at a minimum preventing adjacencies between nodes from
being formed and traffic from being forwarded on those mis-cabled being formed and traffic from being forwarded on those mis-cabled
links. Leaf112 will in such scenario use link-M to derive its level links. Leaf112 will in such scenario use link-M to derive its level
(unless it is leaf) and can report links to spines 111 and 112 as (unless it is leaf) and can report links to spines 111 and 112 as
miscabled unless the implementations allows horizontal links. miscabled unless the implementations allows horizontal links.
Figure Figure 6 shows a multiple plane miscabling example. Since Figure 6 shows a multiple plane miscabling example. Since Leaf112
Leaf112 and Spine121 belong to two different PoDs, the adjacency and Spine121 belong to two different PoDs, the adjacency between
between Leaf112 and Spine121 can not be formed. link-W would be Leaf112 and Spine121 can not be formed. link-W would be detected and
detected and prevented. prevented.
+-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+
|ToF A1| |ToF A2| |ToF B1| |ToF B2| LEVEL 2 |ToF A1| |ToF A2| |ToF B1| |ToF B2| LEVEL 2
+-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+ +-------+
| | | | | | | | | | | | | | | |
| | | +-----------------+ | | | | | | +-----------------+ | | |
| +--------------------------+ | | | | | +--------------------------+ | | | |
| | | | | | | | | | | | | | | |
| +------+ | | | +------+ | | +------+ | | | +------+ |
| | +-----------------+ | | | | | | | +-----------------+ | | | | |
skipping to change at page 16, line 31 skipping to change at page 17, line 41
+-+---+-+ +--+--+-+ +-----+-+ +-----+-+ +-+---+-+ +--+--+-+ +-----+-+ +-----+-+
|Leaf111| |Leaf112| |Leaf111| |Leaf112| |Leaf111| |Leaf112| |Leaf111| |Leaf112|
+-------+ +-------+ +-+-----+ +-+-----+ +-------+ +-------+ +-+-----+ +-+-----+
| | | |
| +--------+ | +--------+
| | | |
+-+---+-+ +-+---+-+
|Spine11| |Spine11|
+-------+ +-------+
Figure 7: Fallen spine Figure 7: Fallen spine
4.6. IPv4 over IPv6 4.6. Positive vs. Negative Disaggregation
Disaggregation is the procedure whereby [RIFT] advertises a more
specific route Southwards as an exception to the aggregated fabric-
default North. Disaggregation is useful when a prefix within the
aggregation is reachable via some of the parents but not the others
at the same level of the fabric. It is mandatory when the level is
the ToF since a ToF node that cannot reach a prefix becomes a black
hole for that prefix. The hard problem is to know which prefixes are
reachable by whom.
In the general case, [RIFT] solves that problem by interconnecting
the ToF nodes so they can exchange the full list of prefixes that
exist in the fabric and figure out when a ToF node lacks reachability
to an existing prefix. This requires additional ports at the ToF,
typically 2 ports per ToF node to form a ToF-spanning ring. [RIFT]
also defines the Southbound Reflection
procedure that enables a parent to explore the direct connectivity of
its peers, meaning their own parents and children; based on the
advertisements received from the shared parents and children, it may
enable the parent to infer the prefixes its peers can reach.
When a parent lacks reachability to a prefix, it may disaggregate the
prefix negatively, i.e., advertise that this parent can be used to
reach any prefix in the aggregation except that one. The Negative
Disaggregation signaling is simple and functions transitively from
ToF to ToP and then from ToP to Leaf. But it is hard for a parent to
figure out which prefix it needs to disaggregate, because it does not
know what it does not know; as a result, the use of a spanning ring
at the ToF is required to operate the Negative Disaggregation. Also,
though it is only an implementation problem, the programming of the
FIB is complex compared to normal routes, and may incur recursions.
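As a rough illustration of why the FIB programming is more involved,
the following Python sketch resolves next hops in the presence of
negative advertisements. The data model (dictionaries keyed by prefix,
parent names such as "ToP1") is hypothetical and is not taken from
[RIFT]; the sketch only shows the idea that a negative route removes a
parent from the next-hop set inherited from the covering aggregate, and
it ignores the case of a positive route nested inside a negative one.

```python
import ipaddress

def resolve_next_hops(dest, positive, negative):
    """Return the set of parents usable to reach dest.

    positive: {prefix: set of parents advertising it}, e.g. the default
    negative: {prefix: set of parents that negatively disaggregate it}
    Both structures are assumptions made for this sketch.
    """
    addr = ipaddress.ip_address(dest)
    # Start from the longest positive match covering the destination.
    covering = [p for p in positive if addr in ipaddress.ip_network(p)]
    if not covering:
        return set()
    best = max(covering, key=lambda p: ipaddress.ip_network(p).prefixlen)
    hops = set(positive[best])
    # Remove every parent that excluded a prefix covering the destination.
    for prefix, parents in negative.items():
        if addr in ipaddress.ip_network(prefix):
            hops -= parents
    return hops

positive = {"0.0.0.0/0": {"ToP1", "ToP2"}}
negative = {"10.1.2.0/24": {"ToP2"}}
print(resolve_next_hops("10.1.2.7", positive, negative))
print(resolve_next_hops("10.9.9.9", positive, negative))
```

In this example only ToP1 remains usable for 10.1.2.7, while both
parents remain usable for destinations outside the negatively
disaggregated prefix.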
The more classical alternative is, for the parents that can reach a
prefix that peers at the same level cannot, to advertise a more
specific route to that prefix. This leverages the normal longest
prefix match in the FIB, and does not require a special
implementation. But as opposed to the Negative Disaggregation, the
Positive Disaggregation is difficult and inefficient to operate
transitively.
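The reliance on the normal longest prefix match can be sketched as
follows. This is a minimal illustration, not an implementation from
[RIFT]: the FIB is modeled as a plain dictionary and the node names are
made up. A positively disaggregated /24 simply wins over the default
route for the destinations it covers.

```python
import ipaddress

def longest_prefix_match(dest, fib):
    """Return the next hops of the most specific FIB entry covering dest."""
    addr = ipaddress.ip_address(dest)
    covering = [p for p in fib if addr in ipaddress.ip_network(p)]
    if not covering:
        return None
    best = max(covering, key=lambda p: ipaddress.ip_network(p).prefixlen)
    return fib[best]

# Hypothetical leaf FIB: default via both ToFs, plus a more specific
# route advertised only by the ToF that can still reach the prefix.
fib = {
    "0.0.0.0/0": ["ToF1", "ToF2"],
    "10.1.2.0/24": ["ToF1"],
}
print(longest_prefix_match("10.1.2.7", fib))
print(longest_prefix_match("10.9.9.9", fib))
```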
Transitivity is not needed to a grandchild if all its parents
received the Positive Disaggregation, meaning that they shall all
avoid the black hole; when that is the case, they collectively build
a ceiling that protects the grandchild. But until then, a parent
that received a Positive Disaggregation may believe that some peers
are lacking the reachability and readvertise too early, or defer and
maintain a black hole situation longer than necessary.
In a non-partitioned fabric, all the ToF nodes see one another
through the reflection and can figure out if one is missing a child.
In that case it is possible to compute the prefixes that the peer
cannot reach and disaggregate positively without a ToF-spanning ring.
The ToF nodes can also ascertain that the ToP nodes are each connected
to at least one ToF node that can still reach the prefix, meaning that
the transitive operation is not required.
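The computation described above can be sketched as a set difference.
The function and its inputs are hypothetical: it assumes a ToF node
already knows, from the reflected advertisements, which children each
peer is adjacent to and which prefixes sit behind each child.

```python
def prefixes_to_disaggregate(my_children, peer_children, child_prefixes):
    """Prefixes reachable via my children but not via the peer's children.

    my_children, peer_children: iterables of child node names
    child_prefixes: {child name: set of prefixes behind that child}
    All three inputs are assumptions made for this sketch.
    """
    mine = set()
    for child in my_children:
        mine |= child_prefixes.get(child, set())
    peers = set()
    for child in peer_children:
        peers |= child_prefixes.get(child, set())
    return mine - peers

child_prefixes = {
    "ToP1": {"10.1.0.0/16"},
    "ToP2": {"10.2.0.0/16"},
}
# The peer ToF lost its adjacency to ToP2.
print(prefixes_to_disaggregate(["ToP1", "ToP2"], ["ToP1"], child_prefixes))
```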
The bottom line is that in a fabric that is partitioned (e.g., using
multiple planes) and/or where the ToP nodes are not guaranteed to
always form a ceiling for their children, it is mandatory to use the
Negative Disaggregation. On the other hand, in a highly symmetrical
and fully connected fabric (e.g., a canonical Clos Network), the
Positive Disaggregation method saves the complexity and cost
associated with the ToF-spanning ring.
Note that in the case of Positive Disaggregation, the first ToF
node(s) that announces a more-specific route attracts all the traffic
for that route and may suffer from a transient incast. A ToP node
that defers injecting the longer prefix in the FIB, in order to
receive more advertisements and spread the packets better, also keeps
on sending a portion of the traffic to the black hole in the
meantime. In the case of Negative Disaggregation, the last ToF
node(s) that injects the route may also incur an incast issue; this
problem would occur if a prefix that becomes totally unreachable is
disaggregated, but doing so is mostly useless and is not recommended.
4.7. Mobile Edge and Anycast
When a physical or a virtual node changes its point of attachment in
the fabric from a previous-leaf to a next-leaf, new routes must be
installed that supersede the old ones. Since the flooding flows
Northwards, the nodes (if any) between the previous-leaf and the
common parent are not immediately aware that the path via previous-
leaf is obsolete, and a stale route may exist for a while. The
common parent needs to select the freshest route advertisement in
order to install the correct route via the next-leaf. This requires
that the fabric determines the sequence of the movements of the
mobile node.
On the one hand, a classical sequence counter provides a total order
for a while but it will eventually wrap. On the other hand, a
timestamp provides a permanent order but it may miss a movement that
happens too quickly vs. the granularity of the timing information.
It is not envisioned in the short term that the average fabric
supports a Precision Time Protocol, and the precision that may be
available with the Network Time Protocol [RFC5905], on the order of
100 to 200ms, may not necessarily be enough to cover, e.g., the fast
mobility of a Virtual Machine.
Section 4.3.3. "Mobility" of [RIFT] specifies a hybrid method that
combines a sequence counter from the mobile node and a timestamp from
the network taken at the leaf when the route is injected. If the
timestamps of the concurrent advertisements are comparable (i.e.,
more distant than the precision of the timing protocol), then the
timestamp alone is used to determine the relative freshness of the
routes. Otherwise, the sequence counter from the mobile node, if
available, is used. One caveat is that the sequence counter must not
wrap within the precision of the timing protocol. Another is that
the mobile node may not even provide a sequence counter, in which
case the mobility itself must be slower than the precision of the
timing.
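The hybrid comparison can be sketched as follows. This is only an
illustration under stated assumptions: the Advert record, its field
names and the precision parameter are made up for the example, and the
sequence counters are compared as plain integers, which relies on the
caveat above that the counter does not wrap within the timing
precision.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Advert:
    leaf: str                  # leaf that injected the route
    timestamp: float           # seconds, taken at the leaf
    seq: Optional[int] = None  # counter from the mobile node, if any

def fresher(a, b, precision):
    """Return the fresher advertisement, or None if they cannot be ordered."""
    # Timestamps further apart than the timing precision are comparable.
    if abs(a.timestamp - b.timestamp) > precision:
        return a if a.timestamp > b.timestamp else b
    # Otherwise fall back to the sequence counter, when both provide one.
    if a.seq is not None and b.seq is not None and a.seq != b.seq:
        return a if a.seq > b.seq else b
    return None

old = Advert("previous-leaf", timestamp=100.00, seq=7)
new = Advert("next-leaf", timestamp=100.05, seq=8)
print(fresher(old, new, precision=0.2).leaf)
```

Here the two timestamps are closer than the assumed 200 ms precision,
so the sequence counter decides in favor of the next-leaf.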
Mobility must not be confused with Anycast. In both cases, the same
address is injected in RIFT at different leaves. In the case of
mobility, only the freshest route must be conserved, since the mobile
node changed its point of attachment from one leaf to the next. In the
case of anycast, the node may be either multihomed (attached to
multiple leaves in parallel) or reachable beyond the fabric via
multiple routes that are redistributed to different leaves; either
way, in the case of anycast, the multiple routes are equally valid
and should be conserved. Without further information from the
redistributed routing protocol, it is impossible to sort out a
movement from a redistribution that happens asynchronously on
different leaves. [RIFT] expects that anycast addresses are
advertised within the timing precision, which is typically the case
with a low-precision timing and a multihomed node. Beyond that time
interval, RIFT interprets the lag as a mobility and only the freshest
route is retained.
When using IPv6 [RFC8200], RIFT suggests leveraging "Registration
Extensions for IPv6 over Low-Power Wireless Personal Area Network
(6LoWPAN) Neighbor Discovery (ND)" [RFC8505] as the IPv6 ND
interaction between the mobile node and the leaf. This provides not
only a sequence counter but also a lifetime and a security token that
may be used to protect the ownership of an address. When using
[RFC8505], the parallel registration of an anycast address to
multiple leaves is done with the same sequence counter, whereas the
sequence counter is incremented when the point of attachment
changes. This way, it is possible to differentiate a mobile node
from a multihomed node, even when the mobility happens within the
timing precision. It is also possible for a mobile node to be
multihomed as well, e.g., to change only one of its points of
attachment.
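The distinction can be sketched with a simple comparison of the
sequence counters seen in two registrations of the same address at
different leaves. The function name and return labels are hypothetical,
and the counters are compared as plain integers, so counter wrap is
deliberately not handled in this sketch.

```python
def classify_registration(known_seq, new_seq):
    """Classify a registration of an already known address at another leaf.

    known_seq: counter of the registration already installed
    new_seq:   counter of the registration just received
    """
    if new_seq == known_seq:
        # Same counter at several leaves: parallel (anycast) registration.
        return "multihomed"
    if new_seq > known_seq:
        # Counter was incremented: the point of attachment changed.
        return "moved"
    # An older counter refers to an obsolete point of attachment.
    return "stale"

print(classify_registration(5, 5))
print(classify_registration(5, 6))
print(classify_registration(5, 4))
```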
4.8. IPv4 over IPv6
RIFT allows advertising IPv4 prefixes over an IPv6 RIFT network. The
IPv6 AF configures via the usual ND mechanisms and then V4 can use V6
nexthops analogous to RFC5549. It is expected that the whole fabric
supports the same type of forwarding of address families on all the
links. RIFT provides an indication whether a node is v4 forwarding
capable and implementations are possible where different routing
tables are computed per address family as long as the computation
remains loop-free.
skipping to change at page 17, line 30 skipping to change at page 21, line 43
+---+---+ |LEAF | | LEAF| +---+---+ |LEAF | | LEAF|
+--+--+ +--+--+ +--+--+ +--+--+
| | | |
IPv4 prefixes| |IPv4 prefixes IPv4 prefixes| |IPv4 prefixes
| | | |
+---+----+ +---+----+ +---+----+ +---+----+
| V4 | | V4 | | V4 | | V4 |
| subnet | | subnet | | subnet | | subnet |
+--------+ +--------+ +--------+ +--------+
Figure 8: IPv4 over IPv6 Figure 8: IPv4 over IPv6
4.7. In-Band Reachability of Nodes 4.9. In-Band Reachability of Nodes
4.7.1. Reachability of Leafs RIFT doesn't precondition that nodes of the fabric have reachable
addresses. But the operational purposes to reach the internal nodes
may exist. Figure 9 shows an example that the NMS attaches to LEAF1.
TODO +-------+ +-------+
| ToF1 | | ToF2 |
++---- ++ ++-----++
| | | |
| +----------+ |
| +--------+ | |
| | | |
++-----++ +--+---++
|SPINE1 | |SPINE2 |
++-----++ ++-----++
| | | |
| +----------+ |
| +--------+ | |
| | | |
++-----++ +--+---++
| LEAF1 | | LEAF2 |
+---+---+ +-------+
|
|NMS
4.7.2. Reachability of Spines Figure 9: In-Band reachability of node
If the NMS wants to access LEAF2, it simply works, because the
loopback address of LEAF2 is flooded in its Prefix North TIE.
If the NMS wants to access SPINE2, it simply works too, because a
spine node always advertises its loopback address in the Prefix North
TIE. The NMS may reach SPINE2 via LEAF1-SPINE2 or via
LEAF1-SPINE1-ToF1/ToF2-SPINE2.
If the NMS wants to access ToF2, ToF2's loopback address needs to be
injected into its Prefix South TIE. Otherwise, the traffic from the
NMS may be sent to ToF1.
In case of failure between ToF2 and the spine nodes, ToF2's loopback
address must be sent all the way down to the leaves.
4.10. Dual Homing Servers
Each RIFT node may operate in zero touch provisioning (ZTP) mode. It
has no configuration (unless it is a Top-of-Fabric at the top of the
topology or it must operate in the topology as leaf and/or support
leaf-2-leaf procedures) and it will fully configure itself after
being attached to the topology.
+---+ +---+ +---+ +---+ +---+ +---+
|ToF| |ToF| |ToF| |ToF| |ToF| |ToF|
+---+ +---+ +---+ +---+ +---+ +---+
skipping to change at page 18, line 28 skipping to change at page 23, line 28
| +-----------------+ | | | | +-----------------+ | | |
| | | +-------------+ | | | | | +-------------+ | |
+ | + | | |-----------------+ | + | + | | |-----------------+ |
X | X | +--------x-----+ | X | X | X | +--------x-----+ | X |
+ | + | | | + | + | + | | | + |
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+
| | | | | | | | | | | | | | | |
+---+ +---+ ...............+---+ +---+ +---+ +---+ ...............+---+ +---+
SV(1) SV(2) SV(n+1) SV(n) SV(1) SV(2) SV(n+1) SV(n)
Figure 9: Dual-homing servers Figure 10: Dual-homing servers
In the single plane, the worst condition is disaggregation of every
other server at the same level. Suppose the links from ToR1 to all
the leaves become unavailable. All the servers' routes are
disaggregated and the FIB of the servers will be expanded with n-1
more specific routes.
Sometimes, people may prefer to disaggregate from ToR to servers
from the start, i.e., the servers have a couple tens of routes in the
FIB from the start besides default routes, to avoid breakages at rack
level. Full disaggregation of the fabric could be achieved by
configuration supported by RIFT.
4.9. Fabric With A Controller 4.11. Fabric With A Controller
There are many different ways to deploy the controller. One There are many different ways to deploy the controller. One
possibility is attaching a controller to the RIFT domain from ToF and possibility is attaching a controller to the RIFT domain from ToF and
another possibility is attaching a controller from the leaf. another possibility is attaching a controller from the leaf.
+------------+ +------------+
| Controller | | Controller |
++----------++ ++----------++
| | | |
| | | |
+----++ ++----+ +----++ ++----+
---------- | ToF | | ToF | ------- | ToF | | ToF |
| +--+--+ +-----+ | +--+--+ +-----+
| | | | | | | | | |
| | +-------------+ | | | +-------------+ |
| | +--------+ | | | | +--------+ | |
| | | | | | | | | |
+-----+ +-+---+ +-----+ +-+---+
RIFT domain |SPINE| |SPINE| RIFT domain |SPINE| |SPINE|
+--+--+ +-----+ +--+--+ +-----+
| | | | | | | | | |
| | +-------------+ | | | +-------------+ |
| | +--------+ | | | | +--------+ | |
| | | | | | | | | |
| +-----+ +-+---+ | +-----+ +-+---+
---------- |LEAF | | LEAF| ------- |LEAF | | LEAF|
+-----+ +-----+ +-----+ +-----+
Figure 10: Fabric with a controller Figure 11: Fabric with a controller
4.9.1. Controller Attached to ToFs 4.11.1. Controller Attached to ToFs
If a controller is attaching to the RIFT domain from ToF, it usually
uses dual-homing connections. The loopback prefix of the controller
should be advertised down by the ToF and spine to leaves. If the
controller loses its link to a ToF, make sure the ToF withdraws the
prefix of the controller (use different mechanisms).
4.9.2. Controller Attached to Leaf 4.11.2. Controller Attached to Leaf
If the controller is attaching from a leaf to the fabric, no special If the controller is attaching from a leaf to the fabric, no special
provisions are needed. provisions are needed.
4.10. Internet Connectivity Without Underlay 4.12. Internet Connectivity With Underlay
4.10.1. Internet Default on the Leafs If global addressing is running without overlay, an external default
route needs to be advertised through rift fabric to achieve internet
connectivity. For the purpose of forwarding of the entire rift
fabric, an internal fabric prefix needs to be advertised in the South
Prefix TIE by ToF and spine nodes.
TODO 4.12.1. Internet Default on the Leaf
4.10.2. Internet Default on the ToFs In case that an internet access request comes from a leaf and the
internet gateway is another leaf, the leaf node as the internet
gateway needs to advertise a default route in its Prefix North TIE.
TODO 4.12.2. Internet Default on the ToFs
4.11. Subnet Mismatch and Address Families In case that an internet access request comes from a leaf and the
internet gateway is a ToF, the ToF and spine nodes need to advertise
a default route in the Prefix South TIE.
4.13. Subnet Mismatch and Address Families
+--------+ +--------+ +--------+ +--------+
| | LIE LIE | | | | LIE LIE | |
| A | +----> <----+ | B | | A | +----> <----+ | B |
| +---------------------+ | | +---------------------+ |
+--------+ +--------+ +--------+ +--------+
X/24 Y/24 X/24 Y/24
Figure 11: subnet mismatch Figure 12: subnet mismatch
LIEs are exchanged over all links running RIFT to perform Link LIEs are exchanged over all links running RIFT to perform Link
(Neighbor) Discovery. A node MUST NOT originate LIEs on an address (Neighbor) Discovery. A node MUST NOT originate LIEs on an address
family if it does not process received LIEs on that family. LIEs on family if it does not process received LIEs on that family. LIEs on
same link are considered part of the same negotiation independent on same link are considered part of the same negotiation independent on
the address family they arrive on. An implementation MUST be ready the address family they arrive on. An implementation MUST be ready
to accept TIEs on all addresses it used as source of LIE frames. to accept TIEs on all addresses it used as source of LIE frames.
As shown in the above figure, without further checks adjacency of As shown in the above figure, without further checks adjacency of
node A and B may form, but the forwarding between node A and node B node A and B may form, but the forwarding between node A and node B
may fail because subnet X mismatches with subnet Y. may fail because subnet X mismatches with subnet Y.
To prevent this, a RIFT implementation should check for subnet
mismatch just like, e.g., IS-IS does. This can lead to scenarios where
an adjacency, despite the exchange of LIEs in both address families,
may end up being formed in a single AF only. This is a consideration
especially in Section 4.8 scenarios.
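A subnet mismatch check of this kind can be sketched with Python's
standard ipaddress module. The function name and the way the interface
addresses are obtained are assumptions made for the example; a real
implementation would take them from the LIE exchange and the local
interface configuration.

```python
import ipaddress

def same_subnet(local_if, remote_if):
    """True if both interface addresses are in the same subnet and AF."""
    a = ipaddress.ip_interface(local_if)
    b = ipaddress.ip_interface(remote_if)
    return a.version == b.version and a.network == b.network

# Node A and node B from the figure, with made-up documentation addresses.
print(same_subnet("192.0.2.1/24", "192.0.2.2/24"))     # same subnet
print(same_subnet("192.0.2.1/24", "198.51.100.1/24"))  # X/24 vs. Y/24
```

Running the check per address family naturally yields the case where
the adjacency is accepted in one AF and rejected in the other.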
4.12. Anycast Considerations 4.14. Anycast Considerations
+ traffic + traffic
| |
v v
+------+------+ +------+------+
| ToF | | ToF |
+---+-----+---+ +---+-----+---+
| | | | | | | |
+------------+ | | +------------+ +------------+ | | +------------+
| | | | | | | |
+---+---+ +-------+ +-------+ +---+---+ +---+---+ +-------+ +-------+ +---+---+
skipping to change at page 21, line 31 skipping to change at page 26, line 31
| | | | | | | | | | | | | | | |
+-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+ +-+---+-+ +--+--+-+
| | | | | | | | | | | | | | | |
|Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0 |Leaf111| |Leaf112| |Leaf121| |Leaf122| LEVEL 0
+-+-----+ ++------+ +-----+-+ +-----+-+ +-+-----+ ++------+ +-----+-+ +-----+-+
+ + + ^ | + + + ^ |
PrefixA PrefixB PrefixA | PrefixC PrefixA PrefixB PrefixA | PrefixC
| |
+ traffic + traffic
Figure 12: Anycast Figure 13: Anycast
If the traffic comes from the ToF to Leaf111 or Leaf121, which have
the anycast prefix PrefixA, RIFT can deal with this case well. But if
the traffic comes from Leaf122, it arrives at Spine21 or Spine22 at
level 1. Spine21 and Spine22 do not know about the other PrefixA
attached to Leaf111, so the traffic will always get to Leaf121 and
never get to Leaf111. If the intention is that the traffic should be
offloaded to Leaf111, then use policy guided prefixes [PGP reference].
5. Acknowledgements 5. Acknowledgements
6. Contributors 6. Contributors
The following people (listed in alphabetical order) contributed The following people (listed in alphabetical order) contributed
significantly to the content of this document and should be significantly to the content of this document and should be
considered co-authors: considered co-authors:
Tony Przygienda Tony Przygienda
skipping to change at page 22, line 14 skipping to change at page 27, line 16
1194 N. Mathilda Ave 1194 N. Mathilda Ave
Sunnyvale, CA 94089 Sunnyvale, CA 94089
US US
Email: prz@juniper.net Email: prz@juniper.net
7. Normative References 7. Normative References
[I-D.ietf-rift-rift]
Przygienda, T., Sharma, A., Thubert, P., Rijsman, B., and
D. Afanasiev, "RIFT: Routing in Fat Trees", draft-ietf-
rift-rift-10 (work in progress), January 2020.
[I-D.white-distoptflood]
White, R., Hegde, S., and S. Zandi, "IS-IS Optimal
Distributed Flooding for Dense Topologies", draft-white-
distoptflood-01 (work in progress), September 2019.
[ISO10589-Second-Edition] [ISO10589-Second-Edition]
International Organization for Standardization, International Organization for Standardization,
"Intermediate system to Intermediate system intra-domain "Intermediate system to Intermediate system intra-domain
routeing information exchange protocol for use in routeing information exchange protocol for use in
conjunction with the protocol for providing the conjunction with the protocol for providing the
connectionless-mode Network Service (ISO 8473)", Nov 2002. connectionless-mode Network Service (ISO 8473)", November
2002.
[TR-384] Broadband Forum Technical Report, "TR-384 Cloud Central
Office Reference Architectural Framework", January 2018.
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328, [RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328,
DOI 10.17487/RFC2328, April 1998, DOI 10.17487/RFC2328, April 1998,
<https://www.rfc-editor.org/info/rfc2328>. <https://www.rfc-editor.org/info/rfc2328>.
[RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, [RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
"Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
DOI 10.17487/RFC4861, September 2007, DOI 10.17487/RFC4861, September 2007,
<https://www.rfc-editor.org/info/rfc4861>. <https://www.rfc-editor.org/info/rfc4861>.
skipping to change at page 23, line 5 skipping to change at page 27, line 47
Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
RFC 5357, DOI 10.17487/RFC5357, October 2008, RFC 5357, DOI 10.17487/RFC5357, October 2008,
<https://www.rfc-editor.org/info/rfc5357>. <https://www.rfc-editor.org/info/rfc5357>.
[RFC7130] Bhatia, M., Ed., Chen, M., Ed., Boutros, S., Ed., [RFC7130] Bhatia, M., Ed., Chen, M., Ed., Boutros, S., Ed.,
Binderberger, M., Ed., and J. Haas, Ed., "Bidirectional Binderberger, M., Ed., and J. Haas, Ed., "Bidirectional
Forwarding Detection (BFD) on Link Aggregation Group (LAG) Forwarding Detection (BFD) on Link Aggregation Group (LAG)
Interfaces", RFC 7130, DOI 10.17487/RFC7130, February Interfaces", RFC 7130, DOI 10.17487/RFC7130, February
2014, <https://www.rfc-editor.org/info/rfc7130>. 2014, <https://www.rfc-editor.org/info/rfc7130>.
[TR-384] Broadband Forum Technical Report, "TR-384 Cloud Central [RIFT] Przygienda, T., Sharma, A., Thubert, P., Rijsman, B., and
Office Reference Architectural Framework", Jan 2018. D. Afanasiev, "RIFT: Routing in Fat Trees", Work in
Progress, Internet-Draft, draft-ietf-rift-rift-11, 10
March 2020,
<https://tools.ietf.org/html/draft-ietf-rift-rift-11>.
[I-D.white-distoptflood]
White, R., Hegde, S., and S. Zandi, "IS-IS Optimal
Distributed Flooding for Dense Topologies", Work in
Progress, Internet-Draft, draft-white-distoptflood-01, 30
September 2019,
<https://tools.ietf.org/html/draft-white-distoptflood-01>.
8. Informative References
[RFC5905] Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
"Network Time Protocol Version 4: Protocol and Algorithms
Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010,
<https://www.rfc-editor.org/info/rfc5905>.
[RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6
(IPv6) Specification", STD 86, RFC 8200,
DOI 10.17487/RFC8200, July 2017,
<https://www.rfc-editor.org/info/rfc8200>.
[RFC8505] Thubert, P., Ed., Nordmark, E., Chakrabarti, S., and C.
Perkins, "Registration Extensions for IPv6 over Low-Power
Wireless Personal Area Network (6LoWPAN) Neighbor
Discovery", RFC 8505, DOI 10.17487/RFC8505, November 2018,
<https://www.rfc-editor.org/info/rfc8505>.
Authors' Addresses Authors' Addresses
Yuehua Wei Yuehua Wei (editor)
ZTE Corporation ZTE Corporation
No.50, Software Avenue No.50, Software Avenue
Nanjing 210012 Nanjing
P. R. China 210012
China
Email: wei.yuehua@zte.com.cn Email: wei.yuehua@zte.com.cn
Zheng Zhang Zheng Zhang
ZTE Corporation ZTE Corporation
No.50, Software Avenue No.50, Software Avenue
Nanjing 210012 Nanjing
P. R. China 210012
China
Email: zzhang_ietf@hotmail.com Email: zzhang_ietf@hotmail.com
Dmitry Afanasiev Dmitry Afanasiev
Yandex Yandex
Email: fl0w@yandex-team.ru Email: fl0w@yandex-team.ru
Tom Verhaeg Tom Verhaeg
Interconnect Services B.V. Juniper Networks
Email: t.verhaeg@interconnect.nl Email: tverhaeg@juniper.net
Jaroslaw Kowalczyk Jaroslaw Kowalczyk
Orange Polska Orange Polska
Email: jaroslaw.kowalczyk2@orange.com Email: jaroslaw.kowalczyk2@orange.com
Pascal Thubert
Cisco Systems, Inc
Building D
45 Allee des Ormes - BP1200
06254 MOUGINS - Sophia Antipolis
France
Phone: +33 497 23 26 34
Email: pthubert@cisco.com
 End of changes. 84 change blocks. 
163 lines changed or deleted 436 lines changed or added
