--- 1/draft-ietf-rift-applicability-01.txt 2020-10-09 19:13:10.965773555 -0700 +++ 2/draft-ietf-rift-applicability-02.txt 2020-10-09 19:13:11.005774115 -0700 @@ -1,26 +1,26 @@ RIFT WG Yuehua. Wei, Ed. Internet-Draft Zheng. Zhang Intended status: Informational ZTE Corporation -Expires: 5 October 2020 Dmitry. Afanasiev +Expires: 12 April 2021 Dmitry. Afanasiev Yandex Tom. Verhaeg Juniper Networks Jaroslaw. Kowalczyk Orange Polska P. Thubert Cisco Systems - 3 April 2020 + 9 October 2020 RIFT Applicability - draft-ietf-rift-applicability-01 + draft-ietf-rift-applicability-02 Abstract This document discusses the properties, applicability and operational considerations of RIFT in different network scenarios. It intends to provide a rough guide how RIFT can be deployed to simplify routing operations in Clos topologies and their variations. Status of This Memo @@ -30,21 +30,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on 5 October 2020. + This Internet-Draft will expire on 12 April 2021. Copyright Notice Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights @@ -52,52 +52,52 @@ extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Problem Statement of Routing in Modern IP Fabric Fat Tree Networks . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Applicability of RIFT to Clos IP Fabrics . . . . . . . . . . 3 - 3.1. Overview of RIFT . . . . . . . . . . . . . . . . . . . . 3 - 3.2. Applicable Topologies . . . . . . . . . . . . . . . . . . 5 + 3.1. Overview of RIFT . . . . . . . . . . . . . . . . . . . . 4 + 3.2. Applicable Topologies . . . . . . . . . . . . . . . . . . 6 3.2.1. Horizontal Links . . . . . . . . . . . . . . . . . . 6 3.2.2. Vertical Shortcuts . . . . . . . . . . . . . . . . . 6 - 3.2.3. Generalizing to any Directed Acyclic Graph . . . . . 6 + 3.2.3. Generalizing to any Directed Acyclic Graph . . . . . 7 3.3. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 8 3.3.1. DC Fabrics . . . . . . . . . . . . . . . . . . . . . 8 3.3.2. Metro Fabrics . . . . . . . . . . . . . . . . . . . . 8 3.3.3. Building Cabling . . . . . . . . . . . . . . . . . . 8 - 3.3.4. Internal Router Switching Fabrics . . . . . . . . . . 8 + 3.3.4. Internal Router Switching Fabrics . . . . . . . . . . 9 3.3.5. CloudCO . . . . . . . . . . . . . . . . . . . . . . . 9 4. Deployment Considerations . . . . . . . . . . . . . . . . . . 11 4.1. South Reflection . . . . . . . . . . . . . . . . . . . . 12 4.2. Suboptimal Routing on Link Failures . . . . . . . . . . . 12 4.3. Black-Holing on Link Failures . . . . . . . . . . . . . . 14 4.4. Zero Touch Provisioning (ZTP) . . . . . . . . . . . . . . 15 4.5. Miscabling Examples . . . . . . . . . . . . . . . . . . . 15 4.6. Positive vs. Negative Disaggregation . . . . . . . . . . 18 4.7. 
Mobile Edge and Anycast . . . . . . . . . . . . . . . . . 19 4.8. IPv4 over IPv6 . . . . . . . . . . . . . . . . . . . . . 21 - 4.9. In-Band Reachability of Nodes . . . . . . . . . . . . . . 21 - 4.10. Dual Homing Servers . . . . . . . . . . . . . . . . . . . 22 - 4.11. Fabric With A Controller . . . . . . . . . . . . . . . . 23 + 4.9. In-Band Reachability of Nodes . . . . . . . . . . . . . . 22 + 4.10. Dual Homing Servers . . . . . . . . . . . . . . . . . . . 23 + 4.11. Fabric With A Controller . . . . . . . . . . . . . . . . 24 4.11.1. Controller Attached to ToFs . . . . . . . . . . . . 24 4.11.2. Controller Attached to Leaf . . . . . . . . . . . . 24 - 4.12. Internet Connectivity With Underlay . . . . . . . . . . . 24 + 4.12. Internet Connectivity With Underlay . . . . . . . . . . . 25 4.12.1. Internet Default on the Leaf . . . . . . . . . . . . 25 4.12.2. Internet Default on the ToFs . . . . . . . . . . . . 25 4.13. Subnet Mismatch and Address Families . . . . . . . . . . 25 - 4.14. Anycast Considerations . . . . . . . . . . . . . . . . . 25 + 4.14. Anycast Considerations . . . . . . . . . . . . . . . . . 26 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 26 - 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 26 + 6. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 27 7. Normative References . . . . . . . . . . . . . . . . . . . . 27 8. Informative References . . . . . . . . . . . . . . . . . . . 28 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 28 1. Introduction This document intends to explain the properties and applicability of "Routing in Fat Trees" [RIFT] in different deployment scenarios and highlight the operational simplicity of the technology compared to traditional routing solutions. It also documents special @@ -283,21 +284,21 @@ * Northbound, RIFT operates as a Link State IGP, whereby the control packets are reflooded first all the way North and only interpreted later. 
All the individual fine grained routes are advertised. * Southbound, RIFT operates as a Distance Vector IGP, whereby the control packets are flooded only one hop, interpreted, and the consequence of that computation is what gets flooded on more hop South. In the most common use-cases, a ToF node can reach most of the prefixes in the fabric. If that is the case, the ToF node advertises the fabric default and disaggregates the prefixes that - it cannot reach. On the oethr hand, a ToF Node that can reach + it cannot reach. On the other hand, a ToF Node that can reach only a small subset of the prefixes in the fabric will preferably advertise those prefixes and refrain from aggregating. In the general case, what gets advertised South is in more details: 1. A fabric default that aggregates all the prefixes that are reachable within the fabric, and that could be a default route or a prefix that is dedicated to this particular fabric. @@ -307,21 +308,21 @@ 3. The disaggregated prefixes for the dynamic exceptions to the fabric Default, advertised to route around the black hole that may form * East-West routing can optionally be used, with specific restrictions. It is useful in particular when a sibling has access to the fabric default but this node does not. A Directed Acyclic Graph (DAG) provides a sense of North (the direction of the DAG) and of South (the reverse), which can be used - to apply RIFT. For the purpose of RIFT an edge in the DAG that has + to apply RIFT. For the purpose of RIFT, an edge in the DAG that has only incoming vertices is a ToF node. There are a number of caveats though: * The DAG structure must exist before RIFT starts, so there is a need for a companion protocol to establish the logical DAG structure. * A generic DAG does not have a sense of East and West. 
The operation specified for East-West links and the Southbound @@ -423,22 +424,22 @@ | || VAS7 || || VAS4 || || vIGMP || ||BAA || | | |--------| |--------| |----------| |-------| | | +--------+ +--------+ +----------+ +-------+ | | | ++-----------+ +---------++ |Network I/O | |Access I/O| +------------+ +----------+ Figure 2: An example of CloudCO architecture - The Spine-Leaf architectures deployed inside CloudCO meets the - network requirements of adaptable, agile, scalable and dynamic. + The Spine-Leaf architecture deployed inside CloudCO meets the network + requirements of adaptable, agile, scalable and dynamic. 4. Deployment Considerations RIFT presents the opportunity for organizations building and operating IP fabrics to simplify their operation and deployments while achieving many desirable properties of a dynamic routing on such a substrate: * RIFT design follows minimum blast radius and minimum necessary epistemological scope philosophy which leads to very good scaling @@ -745,34 +746,34 @@ aggregation is reachable via some of the parents but not the others at the same level of the fabric. It is mandatory when the level is the ToF since a ToF node that cannot reach a prefix becomes a black hole for that prefix. The hard problem is to know which prefixes are reachable by whom. In the general case, [RIFT] solves that problem by interconnecting the ToF nodes so they can exchange the full list of prefixes that exist in the fabric and figure when a ToF node lacks reachability and to existing prefix. This requires additional ports at the ToF, - typically 2 ports per ToF node to form a ToF-spanning ring. xref - target='I-D.ietf-rift-rift'/> also defines the Southbound Reflection - procedure that enables a parent to explore the direct connectivity of - its peers, meaning their own parents and children; based on the - advertisements received from the shared parents and children, it may - enable the parent to infer the prefixes its peers can reach. 
+ typically 2 ports per ToF node to form a ToF-spanning ring. [RIFT] + also defines the Southbound Reflection procedure that enables a + parent to explore the direct connectivity of its peers, meaning their + own parents and children; based on the advertisements received from + the shared parents and children, it may enable the parent to infer + the prefixes its peers can reach. When a parent lacks reachability to a prefix, it may disaggregate the prefix negatively, i.e., advertise that this parent can be used to reach any prefix in the aggregation except that one. The Negative Disaggregation signaling is simple and functions transitively from ToF to ToP and then from Top to Leaf. But it is hard for a parent to figure which prefix it needs to disaggregate, because it does not - know what it does not know; it results thet the use of a spanning + know what it does not know; it results that the use of a spanning ring at the ToF is required to operate the Negative Disaggregation. Also, though it is only an implementation problem, the programmation of the FIB is complex compared to normal routes, and may incur recursions. The more classical alternative is, for the parents that can reach a prefix that peers at the same level cannot, to advertise a more specific route to that prefix. This leverages the normal longest prefix match in the FIB, and does not require a special implementation. But as opposed to the Negative Disaggregation, the @@ -784,21 +785,21 @@ avoid the black hole; when that is the case, they collectively build a ceiling that protects the grandchild. But until then, a parent that received a Positive Disaggregation may believe that some peers are lacking the reachability and readvertise too early, or defer and maintain a black hole situation longer than necessary. In a non-partitioned fabric, all the ToF nodes see one another through the reflection and can figure if one is missing a child. 
In that case it is possible to compute the prefixes that the peer cannot reach and disaggregate positively without a ToF-spanning ring. The - ToF nodes can also acertain that the ToP nodes are connected each to + ToF nodes can also ascertain that the ToP nodes are connected each to at least a ToF node that can still reach the prefix, meaning that the transitive operation is not required. The bottom line is that in a fabric that is partitioned (e.g., using multiple planes) and/or where the ToP nodes are not guaranteed to always form a ceiling for their children, it is mandatory to use the Negative Disaggregation. On the other hand, in a highly symmetrical and fully connected fabric, (e.g., a canonical Clos Network), the Positive Disaggregation methods allows to save the complexity and cost associated to the ToF-spanning ring. @@ -846,21 +847,21 @@ routes. Otherwise, the sequence counter from the mobile node, if available, is used. One caveat is that the sequence counter must not wrap within the precision of the timing protocol. Another is that the mobile node may not even provide a sequence counter, in which case the mobility itself must be slower than the precision of the timing. Mobility must not be confused with Anycast. In both cases, a same address is injected in RIFT at different leaves. In the case of mobility, only the freshest route must be conserved, since mobile - node changed its point of attachement for a leaf ot the next. In the + node changed its point of attachement for a leaf to the next. In the case of anycast, the node may be either multihomed (attached to multiple leaves in parallel) or reachable beyond the fabric via multiple routes that are redistributed to different leaves; either way, in the case of anycast, the multiple routes are equally valid and should be conserved. 
Without further information from the redistributed routing protocol, it is impossible to sort out a movement from a redistribution that happens asynchronously on different leaves. [RIFT] expects that anycast addresses are advertised within the timing precision, which is typically the case with a low-precision timing and a multihomed node. Beyond that time @@ -1185,30 +1187,30 @@ . [RFC7130] Bhatia, M., Ed., Chen, M., Ed., Boutros, S., Ed., Binderberger, M., Ed., and J. Haas, Ed., "Bidirectional Forwarding Detection (BFD) on Link Aggregation Group (LAG) Interfaces", RFC 7130, DOI 10.17487/RFC7130, February 2014, . [RIFT] Przygienda, T., Sharma, A., Thubert, P., Rijsman, B., and D. Afanasiev, "RIFT: Routing in Fat Trees", Work in - Progress, Internet-Draft, draft-ietf-rift-rift-11, 10 - March 2020, - . + Progress, Internet-Draft, draft-ietf-rift-rift-12, 26 May + 2020, + . [I-D.white-distoptflood] White, R., Hegde, S., and S. Zandi, "IS-IS Optimal Distributed Flooding for Dense Topologies", Work in - Progress, Internet-Draft, draft-white-distoptflood-01, 30 - September 2019, - . + Progress, Internet-Draft, draft-white-distoptflood-04, 27 + July 2020, + . 8. Informative References [RFC5905] Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch, "Network Time Protocol Version 4: Protocol and Algorithms Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010, . [RFC8200] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", STD 86, RFC 8200, @@ -1224,29 +1226,29 @@ Authors' Addresses Yuehua Wei (editor) ZTE Corporation No.50, Software Avenue Nanjing 210012 China Email: wei.yuehua@zte.com.cn - Zheng Zhang ZTE Corporation No.50, Software Avenue Nanjing 210012 China - Email: zzhang_ietf@hotmail.com + Email: zhang.zheng@zte.com.cn + Dmitry Afanasiev Yandex Email: fl0w@yandex-team.ru Tom Verhaeg Juniper Networks Email: tverhaeg@juniper.net