--- 1/draft-ietf-rift-applicability-02.txt 2020-10-13 03:13:09.556446830 -0700 +++ 2/draft-ietf-rift-applicability-03.txt 2020-10-13 03:13:09.592447321 -0700 @@ -1,26 +1,26 @@ RIFT WG Yuehua. Wei, Ed. Internet-Draft Zheng. Zhang Intended status: Informational ZTE Corporation -Expires: 12 April 2021 Dmitry. Afanasiev +Expires: 16 April 2021 Dmitry. Afanasiev Yandex Tom. Verhaeg Juniper Networks Jaroslaw. Kowalczyk Orange Polska P. Thubert Cisco Systems - 9 October 2020 + 13 October 2020 RIFT Applicability - draft-ietf-rift-applicability-02 + draft-ietf-rift-applicability-03 Abstract This document discusses the properties, applicability and operational considerations of RIFT in different network scenarios. It intends to provide a rough guide how RIFT can be deployed to simplify routing operations in Clos topologies and their variations. Status of This Memo @@ -30,21 +30,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on 12 April 2021. + This Internet-Draft will expire on 16 April 2021. Copyright Notice Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights @@ -138,37 +138,37 @@ and RIFT protocol concepts. 3.1. 
Overview of RIFT RIFT is a dynamic routing protocol for Clos and fat-tree network topologies. It defines a link-state protocol when "pointing north" and path-vector protocol when "pointing south". It floods flat link-state information northbound only so that each level obtains the full topology of levels south of it. That - information is never flooded East-West or back South again. So a top + information is never flooded east-west or back South again. So a top tier node has full set of prefixes from the SPF calculation. In the southbound direction the protocol operates like a "fully summarizing, unidirectional" path vector protocol or rather a distance vector with implicit split horizon whereas the information propagates one hop south and is 're-advertised' by nodes at next lower level, normally just the default route. +-----------+ +-----------+ | ToF | | ToF | LEVEL 2 + +-----+--+--+ +-+--+------+ | | | | | | | | | ^ + | | | +-------------------------+ | Distance | +-------------------+ | | | | | Vector | | | | | | | | + - South | | | | +--------+ | | | Link+State + South | | | | +--------+ | | | Link-state + | | | | | | | | Flooding | | | +-------------+ | | | North v | | | | | | | | + +-+--+-+ +------+ +-------+ +--+--+-+ | |SPINE | |SPINE | | SPINE | | SPINE | | LEVEL 1 + ++----++ ++---+-+ +--+--+-+ ++----+-+ | + | | | | | | | | | ^ N Distance | +-------+ | | +--------+ | | | E Vector | | | | | | | | | +------> South | +-------+ | | | +-------+ | | | | @@ -176,44 +176,44 @@ v ++--++ +-+-++ ++-+-+ +-+--++ + |LEAF| |LEAF| |LEAF| |LEAF | LEVEL 0 +----+ +----+ +----+ +-----+ Figure 1: Rift overview A middle tier node has only information necessary for its level, which are all destinations south of the node based on SPF calculation, default route and potential disaggregated routes. 
- RIFT combines the advantage of both Link-State and Distance Vector: + RIFT combines the advantages of both link-state and distance vector: * Fastest Possible Convergence * Automatic Detection of Topology * Minimal Routes/Info on TORs * High Degree of ECMP * Fast De-commissioning of Nodes * Maximum Propagation Speed with Flexible Prefixes in an Update - And RIFT eliminates the disadvantages of Link-State or Distance - Vector: + And RIFT eliminates the disadvantages of link-state or distance + vector: * Reduced and Balanced Flooding * Automatic Neighbor Detection - So there are two types of link state database which are "north + So there are two types of link-state database which are "north representation" N-TIEs and "south representation" S-TIEs. The N-TIEs - contain a link state topology description of lower levels and S-TIEs + contain a link-state topology description of lower levels and S-TIEs carry simply default routes for the lower levels. There are further advantages unique to RIFT, listed below, which are explained in detail in [RIFT]. * True ZTP * Minimal Blast Radius on Failures * Can Utilize All Paths Through Fabric Without Looping @@ -233,23 +233,24 @@ 3.2. Applicable Topologies Although RIFT is specified primarily for "proper" Clos or "fat-tree" structures, it already supports PoD concepts which are strictly speaking not found in original Clos concepts. Further, the specification explains and supports operations of multi-plane Clos variants where the protocol relies on a set of rings to allow the reconciliation of topology view of different planes as the most desirable solution making proper disaggregation viable in case of - failures. This observations hold not only in case of RIFT but in the - generic case of dynamic routing on Clos variants with multiple planes - and failures in bi-sectional bandwidth, especially on the leafs. + failures.
These observations hold not only in case of RIFT but in + the generic case of dynamic routing on Clos variants with multiple + planes and failures in bi-sectional bandwidth, especially on the + leafs. 3.2.1. Horizontal Links RIFT is not limited to pure Clos divided into PoD and multi-planes but supports horizontal links below the top of fabric level. Those links are used however only as routes of last resort northbound when a spine loses all northbound links or cannot compute a default route through them. A possible configuration is a "ring" of horizontal links at a level. @@ -271,68 +272,68 @@ implementations can be extended to support vertical "shortcuts" as proposed by e.g. [I-D.white-distoptflood]. The RIFT specification itself does not provide the exact details since the resulting solution suffers from either much larger blast radius with increased flooding volumes or in case of maximum aggregation routing bow-tie problems. 3.2.3. Generalizing to any Directed Acyclic Graph RIFT is an anisotropic routing protocol, meaning that it has a sense - of direction (Northbound, Southbound, East-West) and that it operates + of direction (northbound, southbound, east-west) and that it operates differently depending on the direction. - * Northbound, RIFT operates as a Link State IGP, whereby the control + * Northbound, RIFT operates as a link-state IGP, whereby the control packets are reflooded first all the way North and only interpreted later. All the individual fine-grained routes are advertised. - * Southbound, RIFT operates as a Distance Vector IGP, whereby the + * Southbound, RIFT operates as a distance vector IGP, whereby the control packets are flooded only one hop, interpreted, and the consequence of that computation is what gets flooded one more hop South. In the most common use-cases, a ToF node can reach most of the prefixes in the fabric. If that is the case, the ToF node advertises the fabric default and disaggregates the prefixes that it cannot reach.
On the other hand, a ToF node that can reach only a small subset of the prefixes in the fabric will preferably advertise those prefixes and refrain from aggregating. In the general case, what gets advertised South is, in more detail: 1. A fabric default that aggregates all the prefixes that are reachable within the fabric, and that could be a default route or a prefix that is dedicated to this particular fabric. - 2. The loopback addresses of the Northbound nodes, e.g., for + 2. The loopback addresses of the northbound nodes, e.g., for inband management. 3. The disaggregated prefixes for the dynamic exceptions to the fabric default, advertised to route around the black hole that may form - * East-West routing can optionally be used, with specific + * East-west routing can optionally be used, with specific restrictions. It is useful in particular when a sibling has access to the fabric default but this node does not. A Directed Acyclic Graph (DAG) provides a sense of North (the direction of the DAG) and of South (the reverse), which can be used to apply RIFT. For the purpose of RIFT, a vertex in the DAG that has only incoming edges is a ToF node. There are a number of caveats though: * The DAG structure must exist before RIFT starts, so there is a need for a companion protocol to establish the logical DAG structure. - * A generic DAG does not have a sense of East and West. The - operation specified for East-West links and the Southbound + * A generic DAG does not have a sense of east and west. The + operation specified for east-west links and the southbound reflection between nodes are not applicable. * In order to aggregate and disaggregate routes, RIFT requires that all the ToF nodes share the full knowledge of the prefixes in the fabric. This can be achieved with a ring as suggested by the RIFT main specification, by some preconfiguration, or using a synchronization with a common repository where all the active prefixes are registered. 3.3.
Use Cases @@ -733,35 +734,35 @@ | +--------+ | | +-+---+-+ |Spine11| +-------+ Figure 7: Fallen spine 4.6. Positive vs. Negative Disaggregation - Disaggregation is the procedure whereby [RIFT] advertises more a + Disaggregation is the procedure whereby [RIFT] advertises a more specific route Southwards as an exception to the aggregated fabric-default North. Disaggregation is useful when a prefix within the aggregation is reachable via some of the parents but not the others at the same level of the fabric. It is mandatory when the level is the ToF since a ToF node that cannot reach a prefix becomes a black hole for that prefix. The hard problem is to know which prefixes are reachable by whom. In the general case, [RIFT] solves that problem by interconnecting the ToF nodes so they can exchange the full list of prefixes that exist in the fabric and figure out when a ToF node lacks reachability to an existing prefix. This requires additional ports at the ToF, typically 2 ports per ToF node to form a ToF-spanning ring. [RIFT] - also defines the Southbound Reflection procedure that enables a + also defines the southbound reflection procedure that enables a parent to explore the direct connectivity of its peers, meaning their own parents and children; based on the advertisements received from the shared parents and children, it may enable the parent to infer the prefixes its peers can reach. When a parent lacks reachability to a prefix, it may disaggregate the prefix negatively, i.e., advertise that this parent can be used to reach any prefix in the aggregation except that one. The Negative Disaggregation signaling is simple and functions transitively from ToF to ToP and then from ToP to Leaf. But it is hard for a parent to @@ -812,21 +813,21 @@ on sending a portion of the traffic to the black hole in the meantime.
In the case of Negative Disaggregation, the last ToF node(s) that inject the route may also incur an incast issue; this problem would occur if a prefix that becomes totally unreachable is disaggregated, but doing so is mostly useless and is not recommended. 4.7. Mobile Edge and Anycast When a physical or a virtual node changes its point of attachment in the fabric from a previous-leaf to a next-leaf, new routes must be - installed that supercede the old ones. Since the flooding flows + installed that supersede the old ones. Since the flooding flows Northwards, the nodes (if any) between the previous-leaf and the common parent are not immediately aware that the path via previous-leaf is obsolete, and a stale route may exist for a while. The common parent needs to select the freshest route advertisement in order to install the correct route via the next-leaf. This requires that the fabric determines the sequence of the movements of the mobile node. On the one hand, a classical sequence counter provides a total order for a while but it will eventually wrap. On the other hand, a @@ -847,21 +848,21 @@ routes. Otherwise, the sequence counter from the mobile node, if available, is used. One caveat is that the sequence counter must not wrap within the precision of the timing protocol. Another is that the mobile node may not even provide a sequence counter, in which case the mobility itself must be slower than the precision of the timing. Mobility must not be confused with Anycast. In both cases, the same address is injected in RIFT at different leaves. In the case of mobility, only the freshest route must be conserved, since the mobile - node changed its point of attachement for a leaf to the next. In the + node changed its point of attachment from one leaf to the next.
In the case of anycast, the node may be either multihomed (attached to multiple leaves in parallel) or reachable beyond the fabric via multiple routes that are redistributed to different leaves; either way, in the case of anycast, the multiple routes are equally valid and should be conserved. Without further information from the redistributed routing protocol, it is impossible to distinguish a movement from a redistribution that happens asynchronously on different leaves. [RIFT] expects that anycast addresses are advertised within the timing precision, which is typically the case with a low-precision timing and a multihomed node. Beyond that time @@ -994,25 +995,25 @@ | | | | | | | | +---+ +---+ ...............+---+ +---+ SV(1) SV(2) SV(n+1) SV(n) Figure 10: Dual-homing servers In the single plane, the worst condition is disaggregation of all other servers at the same level. Suppose the links from ToR1 to all the leaves become unavailable. All the servers' routes are disaggregated and the FIB of the servers will be expanded with n-1 - more spicific routes. + more specific routes. - Sometimes, pleople may prefer to disaggregate from ToR to servers - from start on, i.e. the servers have couple tens of routes in FIB - from start on beside default routes to avoid breakages at rack level. + Sometimes, people may prefer to disaggregate from ToR to servers from + start on, i.e. the servers have a couple of tens of routes in the FIB from + start on besides default routes to avoid breakages at rack level. Full disaggregation of the fabric could be achieved by configuration supported by RIFT. 4.11. Fabric With A Controller There are many different ways to deploy the controller. One possibility is attaching a controller to the RIFT domain from ToF and another possibility is attaching a controller from the leaf. +------------+