draft-ietf-intarea-frag-fragile-05.txt   draft-ietf-intarea-frag-fragile-06.txt 
Internet Area WG R. Bonica Internet Area WG R. Bonica
Internet-Draft Juniper Networks Internet-Draft Juniper Networks
Intended status: Best Current Practice F. Baker Intended status: Best Current Practice F. Baker
Expires: July 15, 2019 Unaffiliated Expires: August 2, 2019 Unaffiliated
G. Huston G. Huston
APNIC APNIC
R. Hinden R. Hinden
Check Point Software Check Point Software
O. Troan O. Troan
Cisco Cisco
F. Gont F. Gont
SI6 Networks SI6 Networks
January 11, 2019 January 29, 2019
IP Fragmentation Considered Fragile IP Fragmentation Considered Fragile
draft-ietf-intarea-frag-fragile-05 draft-ietf-intarea-frag-fragile-06
Abstract Abstract
This document describes IP fragmentation and explains how it reduces This document describes IP fragmentation and explains how it reduces
the reliability of Internet communication. the reliability of Internet communication.
This document also proposes alternatives to IP fragmentation and This document also proposes alternatives to IP fragmentation and
provides recommendations for developers and network operators. provides recommendations for developers and network operators.
Status of This Memo Status of This Memo
skipping to change at page 1, line 43 skipping to change at page 1, line 43
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on July 15, 2019. This Internet-Draft will expire on August 2, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 26 skipping to change at page 2, line 26
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. IP Fragmentation . . . . . . . . . . . . . . . . . . . . . . 3 2. IP Fragmentation . . . . . . . . . . . . . . . . . . . . . . 3
2.1. Links, Paths, MTU and PMTU . . . . . . . . . . . . . . . 3 2.1. Links, Paths, MTU and PMTU . . . . . . . . . . . . . . . 3
2.2. Fragmentation Procedures . . . . . . . . . . . . . . . . 5 2.2. Fragmentation Procedures . . . . . . . . . . . . . . . . 5
2.3. Upper-Layer Reliance on IP Fragmentation . . . . . . . . 6 2.3. Upper-Layer Reliance on IP Fragmentation . . . . . . . . 6
3. Requirements Language . . . . . . . . . . . . . . . . . . . . 7 3. Requirements Language . . . . . . . . . . . . . . . . . . . . 7
4. Reduced Reliability . . . . . . . . . . . . . . . . . . . . . 7 4. Reduced Reliability . . . . . . . . . . . . . . . . . . . . . 7
4.1. Policy-Based Routing . . . . . . . . . . . . . . . . . . 7 4.1. Policy-Based Routing . . . . . . . . . . . . . . . . . . 7
4.2. Network Address Translation (NAT) . . . . . . . . . . . . 8 4.2. Network Address Translation (NAT) . . . . . . . . . . . . 8
4.3. Stateless Firewalls . . . . . . . . . . . . . . . . . . . 8 4.3. Stateless Firewalls . . . . . . . . . . . . . . . . . . . 9
4.4. Stateless Load Balancers . . . . . . . . . . . . . . . . 9 4.4. Stateless Load Balancers . . . . . . . . . . . . . . . . 9
4.5. IPv4 Reassembly Errors at High Data Rates . . . . . . . . 10 4.5. Equal Cost Multipath (ECMP) . . . . . . . . . . . . . . . 10
4.6. Security Vulnerabilities . . . . . . . . . . . . . . . . 10 4.6. IPv4 Reassembly Errors at High Data Rates . . . . . . . . 10
4.7. Blackholing Due to ICMP Loss . . . . . . . . . . . . . . 11 4.7. Security Vulnerabilities . . . . . . . . . . . . . . . . 10
4.7.1. Transient Loss . . . . . . . . . . . . . . . . . . . 12 4.8. PMTU Blackholing Due to ICMP Loss . . . . . . . . . . . . 11
4.7.2. Incorrect Implementation of Security Policy . . . . . 12 4.8.1. Transient Loss . . . . . . . . . . . . . . . . . . . 12
4.7.3. Persistent Loss Caused By Anycast . . . . . . . . . . 13 4.8.2. Incorrect Implementation of Security Policy . . . . . 12
4.8. Blackholing Due To Filtering . . . . . . . . . . . . . . 13 4.8.3. Persistent Loss Caused By Anycast . . . . . . . . . . 13
4.9. Blackholing Due To Filtering or Loss . . . . . . . . . . 13
5. Alternatives to IP Fragmentation . . . . . . . . . . . . . . 14 5. Alternatives to IP Fragmentation . . . . . . . . . . . . . . 14
5.1. Transport Layer Solutions . . . . . . . . . . . . . . . . 14 5.1. Transport Layer Solutions . . . . . . . . . . . . . . . . 14
5.2. Application Layer Solutions . . . . . . . . . . . . . . . 15 5.2. Application Layer Solutions . . . . . . . . . . . . . . . 15
6. Applications That Rely on IPv6 Fragmentation . . . . . . . . 16 6. Applications That Rely on IPv6 Fragmentation . . . . . . . . 16
6.1. DNS . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 6.1. DNS . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
6.2. OSPF . . . . . . . . . . . . . . . . . . . . . . . . . . 17 6.2. OSPF . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6.3. Packet-in-Packet Encapsulations . . . . . . . . . . . . . 17 6.3. Packet-in-Packet Encapsulations . . . . . . . . . . . . . 17
6.4. Licklider Transmission Protocol (LTP) . . . . . . . . . . 17 6.4. Licklider Transmission Protocol (LTP) . . . . . . . . . . 17
7. Recommendations . . . . . . . . . . . . . . . . . . . . . . . 18 7. Recommendations . . . . . . . . . . . . . . . . . . . . . . . 18
7.1. For Application and Protocol Developers . . . . . . . . . 18 7.1. For Application and Protocol Developers . . . . . . . . . 18
skipping to change at page 3, line 45 skipping to change at page 3, line 46
link to the next. link to the next.
Internet paths are dynamic. Assume that the path from one node to Internet paths are dynamic. Assume that the path from one node to
another contains a set of links and routers. If the network topology another contains a set of links and routers. If the network topology
changes, that path can also change so that it includes a different changes, that path can also change so that it includes a different
set of links and routers. set of links and routers.
Each link is constrained by the number of bytes that it can convey in Each link is constrained by the number of bytes that it can convey in
a single IP packet. This constraint is called the link Maximum a single IP packet. This constraint is called the link Maximum
Transmission Unit (MTU). IPv4 [RFC0791] requires every link to Transmission Unit (MTU). IPv4 [RFC0791] requires every link to
support a specified MTU (see footnote). IPv6 [RFC8200] requires support a specified MTU (see NOTE 1). IPv6 [RFC8200] requires every
every link to support an MTU of 1280 bytes or greater. These are link to support an MTU of 1280 bytes or greater. These are called
called the IPv4 and IPv6 minimum link MTU's. the IPv4 and IPv6 minimum link MTU's.
Likewise, each Internet path is constrained by the number of bytes Likewise, each Internet path is constrained by the number of bytes
that it can convey in a single IP packet. This constraint is called that it can convey in a single IP packet. This constraint is called
the Path MTU (PMTU). For any given path, the PMTU is equal to the the Path MTU (PMTU). For any given path, the PMTU is equal to the
smallest of its link MTU's. Because Internet paths are dynamic, PMTU smallest of its link MTU's. Because Internet paths are dynamic, PMTU
is also dynamic. is also dynamic.
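As a minimal illustration of the definition above, the PMTU of a path is the minimum of its link MTUs. The link MTU values and the function name are hypothetical and chosen only for illustration.

```python
def path_mtu(link_mtus):
    """Return the PMTU: the smallest link MTU along the path."""
    return min(link_mtus)

# Hypothetical link MTUs (bytes) for one path, before and after a
# topology change replaces the third link.
before = [1500, 9000, 1400, 1500]
after = [1500, 9000, 1500, 1500]

print(path_mtu(before))  # 1400
print(path_mtu(after))   # 1500
```

Because the path can change, the second call shows how the PMTU between the same two nodes can change with it.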
For reasons described below, source nodes estimate the PMTU between For reasons described below, source nodes estimate the PMTU between
themselves and destination nodes. A source node can produce themselves and destination nodes. A source node can produce
extremely conservative PMTU estimates in which: extremely conservative PMTU estimates in which:
skipping to change at page 4, line 30 skipping to change at page 4, line 31
performance. performance.
By executing Path MTU Discovery (PMTUD) [RFC1191] [RFC8201] By executing Path MTU Discovery (PMTUD) [RFC1191] [RFC8201]
procedures, a source node can maintain a less conservative estimate procedures, a source node can maintain a less conservative estimate
of the PMTU between itself and a destination node. In PMTUD, the of the PMTU between itself and a destination node. In PMTUD, the
source node produces an initial PMTU estimate. This initial estimate source node produces an initial PMTU estimate. This initial estimate
is equal to the MTU of the first link along the path to the is equal to the MTU of the first link along the path to the
destination node. It can be greater than the actual PMTU. destination node. It can be greater than the actual PMTU.
Having produced an initial PMTU estimate, the source node sends non- Having produced an initial PMTU estimate, the source node sends non-
fragmentable IP packets to the destination node. If one of these fragmentable IP packets to the destination node (see NOTE 2). If one
packets is larger than the actual PMTU, a downstream router will not of these packets is larger than the actual PMTU, a downstream router
be able to forward the packet through the next link along the path. will not be able to forward the packet through the next link along
Therefore, the downstream router drops the packet and sends an the path. Therefore, the downstream router drops the packet and
Internet Control Message Protocol (ICMP) [RFC0792] [RFC4443] Packet sends an Internet Control Message Protocol (ICMP) [RFC0792] [RFC4443]
Too Big (PTB) message to the source node. The ICMP PTB message Packet Too Big (PTB) message to the source node (see NOTE 3). The
indicates the MTU of the link through which the packet could not be ICMP PTB message indicates the MTU of the link through which the
forwarded. The source node uses this information to refine its PMTU packet could not be forwarded. The source node uses this information
estimate. to refine its PMTU estimate.
PMTUD produces a running estimate of the PMTU between a source node PMTUD produces a running estimate of the PMTU between a source node
and a destination node. Because PMTU is dynamic, at any given time, and a destination node. Because PMTU is dynamic, at any given time,
the PMTU estimate can differ from the actual PMTU. In order to the PMTU estimate can differ from the actual PMTU. In order to
detect PMTU increases, PMTUD occasionally resets the PMTU estimate to detect PMTU increases, PMTUD occasionally resets the PMTU estimate to
its initial value and repeats the procedure described above. its initial value and repeats the procedure described above.
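The procedure above can be sketched as a small simulation. This is not an implementation of [RFC1191] or [RFC8201]; it assumes a static path, assumes that every ICMP PTB message is delivered, and uses hypothetical function names and link MTU values.

```python
def send_non_fragmentable(packet_len, link_mtus):
    """Simulate forwarding one non-fragmentable packet along a path.

    Returns None if the packet is delivered. Otherwise returns the MTU
    carried in the ICMP PTB message sent by the first router that cannot
    forward the packet through its next link.
    """
    for mtu in link_mtus:
        if packet_len > mtu:
            return mtu  # the router drops the packet and reports this MTU
    return None

def discover_pmtu(link_mtus):
    """Refine a PMTU estimate, starting from the MTU of the first link."""
    estimate = link_mtus[0]  # initial estimate; may exceed the actual PMTU
    while True:
        ptb_mtu = send_non_fragmentable(estimate, link_mtus)
        if ptb_mtu is None:
            return estimate  # a packet of this size reached the destination
        estimate = ptb_mtu   # refine the estimate using the PTB message

# Hypothetical path: the estimate is refined 1500 -> 1400 -> 1280.
print(discover_pmtu([1500, 1400, 1280, 1500]))  # 1280
```

If the simulated PTB message were lost instead of returned, the loop would never learn the smaller MTU, which is why the procedure depends on ICMP delivery.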
PMTUD has the following characteristics: Ideally, PMTUD operates as described above. However, in some
scenarios, PMTUD fails. For example:
o It relies on the network's ability to deliver ICMP PTB messages to o PMTUD relies on the network's ability to deliver ICMP PTB messages
the source node. to the source node. If the network cannot deliver ICMP PTB
messages to the source node, PMTUD fails.
o It is susceptible to attack because ICMP messages are easily o PMTUD is susceptible to attack because ICMP messages are easily
forged [RFC5927]. forged [RFC5927]. Such attacks can cause PMTUD to produce
unnecessarily conservative PMTU estimates.
FOOTNOTE: In IPv4, every host must be capable of receiving a packet NOTE 1: In IPv4, every host must be capable of receiving a packet
whose length is equal to 576 bytes. However, the IPv4 minimum link whose length is equal to 576 bytes. However, the IPv4 minimum link
MTU is not 576. Section 3.2 of RFC 791 explicitly states that the MTU is not 576. Section 3.2 of RFC 791 explicitly states that the
IPv4 minimum link MTU is 68 bytes. But for practical purposes, many IPv4 minimum link MTU is 68 bytes. But for practical purposes, many
network operators consider the IPv4 minimum link MTU to be 576 bytes. network operators consider the IPv4 minimum link MTU to be 576 bytes.
So, for the purposes of this document, we assume that the IPv4 So, for the purposes of this document, we assume that the IPv4
minimum link MTU is 576 bytes. minimum link MTU is 576 bytes.
FOOTNOTE: In the paragraphs above, the term "non-fragmentable packet" NOTE 2: A non-fragmentable packet can be fragmented at its source.
is introduced. A non-fragmentable packet can be fragmented at its However, it cannot be fragmented by a downstream node. An IPv4
source. However, it cannot be fragmented by a downstream node. An packet whose DF-bit is set to zero is fragmentable. An IPv4 packet
IPv4 packet whose DF-bit is set to zero is fragmentable. An IPv4 whose DF-bit is set to one is non-fragmentable. All IPv6 packets are
packet whose DF-bit is set to one is non-fragmentable. All IPv6 also non-fragmentable.
packets are also non-fragmentable.
FOOTNOTE: In the paragraphs above, the term "ICMP PTB message" is NOTE 3: The ICMP PTB message has two instantiations. In ICMPv4
introduced. The ICMP PTB message has two instantiations. In ICMPv4
[RFC0792], the ICMP PTB message is a Destination Unreachable message [RFC0792], the ICMP PTB message is a Destination Unreachable message
with Code equal to (4) fragmentation needed and DF set. This message with Code equal to (4) fragmentation needed and DF set. This message
was augmented by [RFC1191] to indicates the MTU of the link through was augmented by [RFC1191] to indicate the MTU of the link through
which the packet could not be forwarded. In ICMPv6 [RFC4443], the which the packet could not be forwarded. In ICMPv6 [RFC4443], the
ICMP PTB message is a Packet Too Big Message with Code equal to (0). ICMP PTB message is a Packet Too Big Message with Code equal to (0).
This message also indicates the MTU of the link through which the This message also indicates the MTU of the link through which the
packet could not be forwarded. packet could not be forwarded.
2.2. Fragmentation Procedures 2.2. Fragmentation Procedures
When an upper-layer protocol submits data to the underlying IP When an upper-layer protocol submits data to the underlying IP
module, and the resulting IP packet's length is greater than the module, and the resulting IP packet's length is greater than the
PMTU, the packet can be divided into fragments. Each fragment PMTU, the packet is divided into fragments. Each fragment includes
includes an IP header and a portion of the original packet. an IP header and a portion of the original packet.
[RFC0791] describes IPv4 fragmentation procedures. An IPv4 packet [RFC0791] describes IPv4 fragmentation procedures. An IPv4 packet
whose DF-bit is set to one cannot be fragmented. An IPv4 packet whose DF-bit is set to one can be fragmented by the source node, but
whose DF-bit is set to zero can be fragmented by the source node or cannot be fragmented by a downstream router. An IPv4 packet whose
by any downstream router. When an IPv4 packet is fragmented, all IP DF-bit is set to zero can be fragmented by the source node or by a
options appear in the first fragment, but only options whose "copy" downstream router. When an IPv4 packet is fragmented, all IP options
bit is set to one appear in subsequent fragments. appear in the first fragment, but only options whose "copy" bit is
set to one appear in subsequent fragments.
[RFC8200] describes IPv6 fragmentation procedures. An IPv6 packets [RFC8200] describes IPv6 fragmentation procedures. An IPv6 packet
can be fragmented at the source node only. When an IPv6 packet is can be fragmented at the source node only. When an IPv6 packet is
fragmented, all extension headers appear in the first fragment, but fragmented, all extension headers appear in the first fragment, but
only per-fragment headers appear in subsequent fragments. Per- only per-fragment headers appear in subsequent fragments. Per-
fragment headers include the following: fragment headers include the following:
o The IPv6 header. o The IPv6 header.
o The Hop-by-hop Options header (if present) o The Hop-by-hop Options header (if present)
o The Destination Options header (if present and if it precedes a o The Destination Options header (if present and if it precedes a
skipping to change at page 8, line 5 skipping to change at page 8, line 8
| | | | | | | | | | | |
| 2 | Policy- | 2001:db8::1/128 | TCP / 80 | 2001:db8::3 | | 2 | Policy- | 2001:db8::1/128 | TCP / 80 | 2001:db8::3 |
| | based | | | | | | based | | | |
+-------+--------------+-----------------+------------+-------------+ +-------+--------------+-----------------+------------+-------------+
Table 1: Policy-Based Routing FIB Table 1: Policy-Based Routing FIB
Assume that a router maintains the FIB in Table 1. The first FIB Assume that a router maintains the FIB in Table 1. The first FIB
entry is destination-based. It maps a destination prefix entry is destination-based. It maps a destination prefix
(2001:db8::1/128) to a next-hop (2001:db8::2). The second FIB entry (2001:db8::1/128) to a next-hop (2001:db8::2). The second FIB entry
is a policy-based. It maps the same destination prefix is policy-based. It maps the same destination prefix
(2001:db8::1/128) and a destination port ( TCP / 80 ) to a different (2001:db8::1/128) and a destination port ( TCP / 80 ) to a different
next-hop (2001:db8::3). The second entry is more specific than the next-hop (2001:db8::3). The second entry is more specific than the
first. first.
When the router receives the first fragment of a packet that is When the router receives the first fragment of a packet that is
destined for TCP port 80 on 2001:db8::1, it interrogates the FIB. destined for TCP port 80 on 2001:db8::1, it interrogates the FIB.
Both FIB entries satisfy the query. The router selects the second Both FIB entries satisfy the query. The router selects the second
FIB entry because it is more specific and forwards the packet to FIB entry because it is more specific and forwards the packet to
2001:db8::3. 2001:db8::3.
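The lookup described above can be sketched as follows. The table layout and the function name are assumptions made for illustration, not a description of any router's FIB. The second call models a trailing fragment, which carries no TCP header and so presents no port to match against.

```python
import ipaddress

# The two FIB entries from Table 1. A "proto"/"port" of None marks the
# destination-based entry.
FIB = [
    {"prefix": ipaddress.ip_network("2001:db8::1/128"),
     "proto": None, "port": None, "next_hop": "2001:db8::2"},
    {"prefix": ipaddress.ip_network("2001:db8::1/128"),
     "proto": "TCP", "port": 80, "next_hop": "2001:db8::3"},
]

def lookup(dst, proto=None, port=None):
    """Return the next hop, preferring a matching policy-based entry."""
    addr = ipaddress.ip_address(dst)
    best = None
    for entry in FIB:
        if addr not in entry["prefix"]:
            continue
        is_policy = entry["proto"] is not None
        if is_policy and (entry["proto"], entry["port"]) != (proto, port):
            continue
        if best is None or is_policy:
            best = entry  # a policy-based match is more specific
    return best["next_hop"] if best else None

print(lookup("2001:db8::1", "TCP", 80))  # first fragment: 2001:db8::3
print(lookup("2001:db8::1"))             # no port visible: 2001:db8::2
```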
skipping to change at page 8, line 45 skipping to change at page 8, line 48
o The Destination IP Address and Destination Port on each inbound o The Destination IP Address and Destination Port on each inbound
packet. packet.
A+P [RFC6346] and Carrier Grade NAT (CGN) [RFC6888] are two common A+P [RFC6346] and Carrier Grade NAT (CGN) [RFC6888] are two common
NAT strategies. In both approaches the NAT device must virtually NAT strategies. In both approaches the NAT device must virtually
reassemble fragmented packets in order to translate and forward each reassemble fragmented packets in order to translate and forward each
fragment. fragment.
Virtual reassembly in the network is problematic, because it is Virtual reassembly in the network is problematic, because it is
computationally expensive and because it is prone to attacks computationally expensive and because it is prone to attacks
(Section 4.6). (Section 4.7).
4.3. Stateless Firewalls 4.3. Stateless Firewalls
IP fragmentation causes problems for stateless firewalls whose rules IP fragmentation causes problems for stateless firewalls whose rules
include TCP and UDP ports. Because port information is not available include TCP and UDP ports. Because port information is not available
in the trailing fragments, the firewall is limited to the following in the trailing fragments, the firewall is limited to the following
options: options:
o Accept all trailing fragments, possibly admitting certain classes o Accept all trailing fragments, possibly admitting certain classes
of attack. of attack.
o Block all trailing fragments, possibly blocking legitimate o Block all trailing fragments, possibly blocking legitimate
traffic. traffic.
Neither option is attractive. Neither option is attractive.
This problem does not occur in stateful firewalls. This problem does not occur in stateful firewalls or Network Address
Translation (NAT) devices. Such devices maintain state so that they
can afford identical treatment to each fragment that belongs to a
packet.
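The dilemma faced by a stateless firewall can be sketched as below. The rule set, the packet representation, and the function name are hypothetical. The filter sees port information only in a packet whose fragment offset is zero, so trailing fragments must be handled by a blanket policy.

```python
# Hypothetical rule set: permit traffic to TCP port 80 only.
ALLOWED = {("TCP", 80)}

def forward(packet, trailing_policy):
    """Return True if the stateless filter forwards the packet.

    trailing_policy is "accept" or "block" and applies to every trailing
    fragment, because such fragments carry no transport-layer header.
    """
    if packet["frag_offset"] > 0:
        return trailing_policy == "accept"
    return (packet["proto"], packet["dst_port"]) in ALLOWED

web_tail = {"frag_offset": 185, "proto": "TCP", "dst_port": None}
other_tail = {"frag_offset": 185, "proto": "TCP", "dst_port": None}

# The two trailing fragments are indistinguishable to the filter, so
# both are admitted or both are dropped.
print(forward(web_tail, "accept"), forward(other_tail, "accept"))
print(forward(web_tail, "block"), forward(other_tail, "block"))
```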
4.4. Stateless Load Balancers 4.4. Stateless Load Balancers
IP fragmentation causes problems for stateless load balancers. In IP fragmentation causes problems for stateless load balancers. In
order to assign a packet or packet fragment to a link, the load- order to assign a packet or packet fragment to a link, the load-
balancer executes an algorithm. If the packet or packet fragment balancer executes an algorithm. The following paragraphs describe a
contains a transport-layer header, the load balancing algorithm commonly deployed load-balancing algorithm.
accepts the following 5-tuple as input:
If the packet or packet fragment contains a transport-layer header,
the load balancing algorithm accepts the following 5-tuple as input:
o IP Source Address. o IP Source Address.
o IP Destination Address. o IP Destination Address.
o IPv4 Protocol or IPv6 Next Header. o IPv4 Protocol or IPv6 Next Header.
o transport-layer source port. o transport-layer source port.
o transport-layer destination port. o transport-layer destination port.
skipping to change at page 10, line 5 skipping to change at page 10, line 12
o IP Destination Address. o IP Destination Address.
o IPv4 Protocol or IPv6 Next Header. o IPv4 Protocol or IPv6 Next Header.
Therefore, non-fragmented packets belonging to a flow can be assigned Therefore, non-fragmented packets belonging to a flow can be assigned
to one link while fragmented packets belonging to the same flow can to one link while fragmented packets belonging to the same flow can
be divided between that link and another. This can cause suboptimal be divided between that link and another. This can cause suboptimal
load balancing. load balancing.
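The behavior described above can be sketched with a hash-based link selector. The hash function, the packet representation, and the function names are assumptions; deployed load balancers use their own algorithms. The sketch only shows that the input to the algorithm differs depending on whether a transport-layer header is present.

```python
import hashlib

def pick_link(fields, num_links):
    """Map a tuple of header fields to a link index with a stable hash."""
    key = "|".join(str(f) for f in fields).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_links

def assign(packet, num_links):
    """Use the 5-tuple when ports are visible, else fall back to a 3-tuple."""
    if packet["src_port"] is not None:
        fields = (packet["src"], packet["dst"], packet["proto"],
                  packet["src_port"], packet["dst_port"])
    else:
        fields = (packet["src"], packet["dst"], packet["proto"])
    return pick_link(fields, num_links)

base = {"src": "2001:db8::10", "dst": "2001:db8::1", "proto": "TCP"}
whole = dict(base, src_port=49152, dst_port=80)    # non-fragmented packet
tail = dict(base, src_port=None, dst_port=None)    # trailing fragment

# The two inputs differ, so the two results are not guaranteed to match.
print(assign(whole, 4), assign(tail, 4))
```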
4.5. IPv4 Reassembly Errors at High Data Rates 4.5. Equal Cost Multipath (ECMP)
IP fragmentation causes problems for routers that support Equal Cost
Multipath (ECMP). Many routers that support ECMP execute the
algorithm described in Section 4.4. Therefore, they exhibit the same
problematic behaviors described in Section 4.4.
4.6. IPv4 Reassembly Errors at High Data Rates
IPv4 fragmentation is not sufficiently robust for use under some IPv4 fragmentation is not sufficiently robust for use under some
conditions in today's Internet. At high data rates, the 16-bit IP conditions in today's Internet. At high data rates, the 16-bit IP
identification field is not large enough to prevent frequent identification field is not large enough to prevent frequent
incorrectly assembled IP fragments, and the TCP and UDP checksums are incorrectly assembled IP fragments, and the TCP and UDP checksums are
insufficient to prevent the resulting corrupted datagrams from being insufficient to prevent the resulting corrupted datagrams from being
delivered to higher protocol layers. [RFC4963] describes some easily delivered to higher protocol layers. [RFC4963] describes some easily
reproduced experiments demonstrating the problem, and discusses some reproduced experiments demonstrating the problem, and discusses some
of the operational implications of these observations. of the operational implications of these observations.
These reassembly issues are not easily reproducible in IPv6 because These reassembly issues are not easily reproducible in IPv6 because
the IPv6 identification field is 32 bits long. the IPv6 identification field is 32 bits long.
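A rough calculation shows the scale of the difference. It assumes a single source sending 1500-byte packets to one destination at a constant 1 Gb/s, with every packet consuming one identification value; these figures and the function name are illustrative assumptions, not measurements.

```python
def id_wrap_seconds(rate_bps, packet_bytes, id_bits):
    """Time for the identification field to cycle through all values."""
    return (2 ** id_bits) * packet_bytes * 8 / rate_bps

RATE_BPS = 1_000_000_000   # assumed sending rate: 1 Gb/s
PACKET_BYTES = 1500        # assumed packet size

ipv4_wrap = id_wrap_seconds(RATE_BPS, PACKET_BYTES, 16)
ipv6_wrap = id_wrap_seconds(RATE_BPS, PACKET_BYTES, 32)

print(f"{ipv4_wrap:.3f} s")          # 0.786 s
print(f"{ipv6_wrap / 3600:.1f} h")   # 14.3 h
```

Under these assumptions, a 16-bit identification value can be reused in under one second, well within the time a receiver may hold fragments for reassembly, while a 32-bit value takes hours to repeat.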
4.6. Security Vulnerabilities 4.7. Security Vulnerabilities
Security researchers have documented several attacks that exploit IP Security researchers have documented several attacks that exploit IP
fragmentation. The following are examples: fragmentation. The following are examples:
o Overlapping fragment attacks [RFC1858][RFC3128][RFC5722] o Overlapping fragment attacks [RFC1858][RFC3128][RFC5722]
o Resource exhaustion attacks (such as the Rose Attack) o Resource exhaustion attacks (such as the Rose Attack)
o Attacks based on predictable fragment identification values o Attacks based on predictable fragment identification values
[RFC7739] [RFC7739]
skipping to change at page 11, line 25 skipping to change at page 11, line 39
for an attacker to forge malicious IP fragments that would cause the for an attacker to forge malicious IP fragments that would cause the
reassembly procedure for legitimate packets to fail. reassembly procedure for legitimate packets to fail.
NIDS aims at identifying malicious activity by analyzing network NIDS aims at identifying malicious activity by analyzing network
traffic. Ambiguity in the possible result of the fragment reassembly traffic. Ambiguity in the possible result of the fragment reassembly
process may allow an attacker to evade these systems. Many of these process may allow an attacker to evade these systems. Many of these
systems try to mitigate some of these evasion techniques (e.g., by systems try to mitigate some of these evasion techniques (e.g., by
computing all possible outcomes of the fragment reassembly process, computing all possible outcomes of the fragment reassembly process,
at the expense of increased processing requirements). at the expense of increased processing requirements).
4.7. Blackholing Due to ICMP Loss 4.8. PMTU Blackholing Due to ICMP Loss
As mentioned in Section 2.3, upper-layer protocols can be configured As mentioned in Section 2.3, upper-layer protocols can be configured
to rely on PMTUD. Because PMTUD relies upon the network to deliver to rely on PMTUD. Because PMTUD relies upon the network to deliver
ICMP PTB messages, those protocols also rely on the networks to ICMP PTB messages, those protocols also rely on the networks to
deliver ICMP PTB messages. deliver ICMP PTB messages.
According to [RFC4890], ICMP PTB messages must not be filtered. According to [RFC4890], ICMP PTB messages must not be filtered.
However, ICMP PTB delivery is not reliable. It is subject to both However, ICMP PTB delivery is not reliable. It is subject to both
transient and persistent loss. transient and persistent loss.
Transient loss of ICMP PTB messages can cause transient black holes. Transient loss of ICMP PTB messages can cause transient PMTU black
When the conditions contributing to transient loss abate, the network holes. When the conditions contributing to transient loss abate, the
regains its ability to deliver ICMP PTB messages and connectivity network regains its ability to deliver ICMP PTB messages and
between the source and destination nodes is restored. Section 4.7.1 connectivity between the source and destination nodes is restored.
of this document describes conditions that lead to transient loss of Section 4.8.1 of this document describes conditions that lead to
ICMP PTB messages. transient loss of ICMP PTB messages.
Persistent loss of ICMP PTB messages can cause persistent black Persistent loss of ICMP PTB messages can cause persistent black
holes. Section 4.7.2 and Section 4.7.3 of this document describe holes. Section 4.8.2 and Section 4.8.3 of this document describe
conditions that lead to persistent loss of ICMP PTB messages. conditions that lead to persistent loss of ICMP PTB messages.
The problem described in this section is specific to PMTUD. It does The problem described in this section is specific to PMTUD. It does
not occur when the upper-layer protocol obtains its PMTU estimate not occur when the upper-layer protocol obtains its PMTU estimate
from PLPMTUD or from any other source. from PLPMTUD or from any other source.
4.7.1. Transient Loss 4.8.1. Transient Loss
The following factors can contribute to transient loss of ICMP PTB The following factors can contribute to transient loss of ICMP PTB
messages: messages:
o Network congestion. o Network congestion.
o Packet corruption. o Packet corruption.
o Transient routing loops. o Transient routing loops.
o ICMP rate limiting. o ICMP rate limiting.
The effect of rate limiting may be severe, as RFC 4443 recommends The effect of rate limiting may be severe, as RFC 4443 recommends
strict rate limiting of ICMPv6 traffic. strict rate limiting of ICMPv6 traffic.
4.7.2. Incorrect Implementation of Security Policy 4.8.2. Incorrect Implementation of Security Policy
Incorrect implementation of security policy can cause persistent loss Incorrect implementation of security policy can cause persistent loss
of ICMP PTB messages. of ICMP PTB messages.
Assume that a Customer Premise Equipment (CPE) router implements the Assume that a Customer Premise Equipment (CPE) router implements the
following zone-based security policy: following zone-based security policy:
o Allow any traffic to flow from the inside zone to the outside o Allow any traffic to flow from the inside zone to the outside
zone. zone.
skipping to change at page 13, line 5 skipping to change at page 13, line 15
allows the ICMP PTB to flow from the outside zone to the inside zone. allows the ICMP PTB to flow from the outside zone to the inside zone.
If not, the implementation discards the ICMP PTB message. If not, the implementation discards the ICMP PTB message.
When an incorrect implementation of the above-mentioned security When an incorrect implementation of the above-mentioned security
policy receives an ICMP PTB message, it discards the packet because policy receives an ICMP PTB message, it discards the packet because
its source address is not associated with an existing flow. its source address is not associated with an existing flow.
The security policy described above is implemented incorrectly on The security policy described above is implemented incorrectly on
many consumer CPE routers. many consumer CPE routers.
4.7.3. Persistent Loss Caused By Anycast 4.8.3. Persistent Loss Caused By Anycast
Anycast can cause persistent loss of ICMP PTB messages. Consider the Anycast can cause persistent loss of ICMP PTB messages. Consider the
example below: example below:
A DNS client sends a request to an anycast address. The network A DNS client sends a request to an anycast address. The network
routes that DNS request to the nearest instance of that anycast routes that DNS request to the nearest instance of that anycast
address (i.e., a DNS Server). The DNS server generates a response address (i.e., a DNS Server). The DNS server generates a response
and sends it back to the DNS client. While the response does not and sends it back to the DNS client. While the response does not
exceed the DNS server's PMTU estimate, it does exceed the actual exceed the DNS server's PMTU estimate, it does exceed the actual
PMTU. PMTU.
A downstream router drops the packet and sends an ICMP PTB message A downstream router drops the packet and sends an ICMP PTB message
to the packet's source (i.e., the anycast address). The network routes to the packet's source (i.e., the anycast address). The network routes
the ICMP PTB message to the anycast instance closest to the the ICMP PTB message to the anycast instance closest to the
downstream router. That anycast instance may not be the DNS server downstream router. That anycast instance may not be the DNS server
that originated the DNS response. It may be another DNS server with that originated the DNS response. It may be another DNS server with
the same anycast address. The DNS server that originated the the same anycast address. The DNS server that originated the
response may never receive the ICMP PTB message and may never updates response may never receive the ICMP PTB message and may never update
it PMTU estimate. its PMTU estimate.
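The anycast scenario above can be modeled with a toy routing function. This is a sketch under stated assumptions: the instance names, the distance tables, and the `nearest` helper are hypothetical, and real anycast routing depends on the routing protocol rather than a single distance value.

```python
from typing import Dict


def nearest(distances: Dict[str, int]) -> str:
    """Return the anycast instance with the smallest distance (hypothetical metric)."""
    return min(distances, key=distances.get)


# Distance to each anycast instance, as seen from two different vantage points.
from_client = {"dns-a": 1, "dns-b": 5}             # the DNS client
from_downstream_router = {"dns-a": 4, "dns-b": 2}  # the router that sends the PTB

responder = nearest(from_client)                # instance that sent the response
ptb_receiver = nearest(from_downstream_router)  # instance that receives the PTB

ptb_lost = responder != ptb_receiver
print(responder, ptb_receiver, ptb_lost)  # dns-a dns-b True
```

Because the PTB is addressed to the shared anycast address, it is delivered according to the downstream router's view of the topology, which need not match the client's.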
4.8. Blackholing Due To Filtering 4.9. Blackholing Due To Filtering or Loss
In RFC 7872, researchers sampled Internet paths to determine whether In RFC 7872, researchers sampled Internet paths to determine whether
they would convey packets that contain IPv6 extension headers. they would convey packets that contain IPv6 extension headers.
Sampled paths terminated at popular Internet sites (e.g., popular Sampled paths terminated at popular Internet sites (e.g., popular
web, mail and DNS servers). web, mail and DNS servers).
The study revealed that at least 28% of the sampled paths did not The study revealed that at least 28% of the sampled paths did not
convey packets containing the IPv6 Fragment extension header. In convey packets containing the IPv6 Fragment extension header. In
most cases, fragments were dropped in the destination autonomous most cases, fragments were dropped in the destination autonomous
system. In other cases, the fragments were dropped in transit system. In other cases, the fragments were dropped in transit
skipping to change at page 14, line 7 skipping to change at page 14, line 16
Possible causes follow: Possible causes follow:
o Hardware inability to process fragmented packets. o Hardware inability to process fragmented packets.
o Failure to change vendor defaults. o Failure to change vendor defaults.
o Unintentional misconfiguration. o Unintentional misconfiguration.
o Intentional configuration (e.g., network operators consciously o Intentional configuration (e.g., network operators consciously
choose to drop IPv6 fragments in order to address the issues choose to drop IPv6 fragments in order to address the issues
raised in Section 4.1 through Section 4.7, above.) raised in Section 4.1 through Section 4.8, above.)
5. Alternatives to IP Fragmentation 5. Alternatives to IP Fragmentation
5.1. Transport Layer Solutions 5.1. Transport Layer Solutions
The Transmission Control Protocol (TCP) [RFC0793] can be operated in a The Transmission Control Protocol (TCP) [RFC0793] can be operated in a
mode that does not require IP fragmentation. mode that does not require IP fragmentation.
Applications submit a stream of data to TCP. TCP divides that stream Applications submit a stream of data to TCP. TCP divides that stream
of data into segments, with no segment exceeding the TCP Maximum of data into segments, with no segment exceeding the TCP Maximum
skipping to change at page 18, line 41 skipping to change at page 18, line 46
(e.g., PLPMTUD). (e.g., PLPMTUD).
7.2. For System Developers 7.2. For System Developers
Software libraries SHOULD include provision for PLPMTUD for each Software libraries SHOULD include provision for PLPMTUD for each
supported transport protocol. supported transport protocol.
7.3. For Middle Box Developers 7.3. For Middle Box Developers
Middle boxes SHOULD process IP fragments in a manner that is Middle boxes SHOULD process IP fragments in a manner that is
compliant with RFC 791 and RFC 8200. In many cases, middle boxes consistent with [RFC0791] and [RFC8200]. In many cases, middle boxes
must maintain state in order to achieve this goal. must maintain state in order to achieve this goal.
Price and performance considerations frequently motivate network Price and performance considerations frequently motivate network
operators to deploy stateless middle boxes. These stateless middle operators to deploy stateless middle boxes. These stateless middle
boxes may perform sub-optimally, process IP fragments in a manner boxes may perform sub-optimally, process IP fragments in a manner
that is not compliant with RFC 791 or RFC 8200, or even discard IP that is not compliant with RFC 791 or RFC 8200, or even discard IP
fragments completely. Such behaviors are NOT RECOMMENDED. If a fragments completely. Such behaviors are NOT RECOMMENDED. If a
middle box implements non-standard behavior with respect to IP middle box implements non-standard behavior with respect to IP
fragmentation, then that behavior MUST be clearly documented. fragmentation, then that behavior MUST be clearly documented.
7.4. For Network Operators 7.4. For Network Operators
Operators MUST ensure proper PMTUD operation in their network,
including making sure the network generates PTB packets when dropping
packets too large compared to outgoing interface MTU.
As per RFC 4890, network operators MUST NOT filter ICMPv6 PTB As per RFC 4890, network operators MUST NOT filter ICMPv6 PTB
messages unless they are known to be forged or otherwise messages unless they are known to be forged or otherwise
illegitimate. As stated in Section 4.7, filtering ICMPv6 PTB packets illegitimate. As stated in Section 4.8, filtering ICMPv6 PTB packets
causes PMTUD to fail. Operators MUST ensure proper PMTUD operation causes PMTUD to fail. Many upper-layer protocols rely on PMTUD.
in their network, including making sure the network generates PTB
packets when dropping packets too large compared to outgoing
interface MTU. Many upper-layer protocols rely on PMTUD.
As per RFC 8200, network operators MUST NOT deploy IPv6 links whose As per RFC 8200, network operators MUST NOT deploy IPv6 links whose
MTU is less than 1280 bytes. MTU is less than 1280 bytes.
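The two operator requirements above (generate a PTB when dropping an oversized packet, and never deploy an IPv6 link with an MTU below 1280 bytes) can be illustrated with a simplified forwarding decision. This is a hedged sketch: the `forward` function and its return shape are hypothetical, and an actual router performs this in its forwarding plane rather than in application code.

```python
from typing import Optional, Tuple

IPV6_MIN_LINK_MTU = 1280  # bytes, per RFC 8200


def forward(packet_len: int, out_mtu: int) -> Tuple[str, Optional[dict]]:
    """Decide what to do with an IPv6 packet on a link with MTU out_mtu.

    Returns (action, icmp), where icmp describes the PTB to send, if any.
    """
    if out_mtu < IPV6_MIN_LINK_MTU:
        raise ValueError("IPv6 link MTU must be at least 1280 bytes")
    if packet_len <= out_mtu:
        return ("forward", None)
    # IPv6 routers do not fragment: drop the packet and report the link MTU
    # to the source so that PMTUD can lower its estimate.
    return ("drop", {"type": "PTB", "mtu": out_mtu})


print(forward(1400, 1500))  # ('forward', None)
print(forward(1500, 1400))  # ('drop', {'type': 'PTB', 'mtu': 1400})
```

Silently dropping the packet in the second case, or filtering the resulting PTB elsewhere in the network, is what leaves the sender with a stale PMTU estimate.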
Network operators SHOULD NOT filter IP fragments if they originated Network operators SHOULD NOT filter IP fragments if they originated
at a domain name server or are destined for a domain name server. at a domain name server or are destined for a domain name server.
8. IANA Considerations 8. IANA Considerations
This document makes no request of IANA. This document makes no request of IANA.
 End of changes. 36 change blocks. 
76 lines changed or deleted 92 lines changed or added
