draft-ietf-intarea-frag-fragile-00.txt   draft-ietf-intarea-frag-fragile-01.txt 
Internet Area WG R. Bonica Internet Area WG R. Bonica
Internet-Draft Juniper Networks Internet-Draft Juniper Networks
Intended status: Best Current Practice F. Baker Intended status: Best Current Practice F. Baker
Expires: February 16, 2019 Unaffiliated Expires: April 13, 2019 Unaffiliated
G. Huston G. Huston
APNIC APNIC
R. Hinden R. Hinden
Check Point Software Check Point Software
O. Troan O. Troan
Cisco Cisco
F. Gont F. Gont
SI6 Networks SI6 Networks
August 15, 2018 October 10, 2018
IP Fragmentation Considered Fragile IP Fragmentation Considered Fragile
draft-ietf-intarea-frag-fragile-00 draft-ietf-intarea-frag-fragile-01
Abstract Abstract
This document provides an overview of IP fragmentation. It also This document describes IP fragmentation and explains how it reduces
explains how IP fragmentation reduces the reliability of Internet the reliability of Internet communication.
communication.
Finally, this document proposes alternatives to IP fragmentation and This document also proposes alternatives to IP fragmentation and
provides recommendations for application developers and network provides recommendations for developers and network operators.
operators.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on February 16, 2019. This Internet-Draft will expire on April 13, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2018 IETF Trust and the persons identified as the Copyright (c) 2018 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 25 skipping to change at page 2, line 20
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. IP Fragmentation . . . . . . . . . . . . . . . . . . . . . . 3 2. IP Fragmentation . . . . . . . . . . . . . . . . . . . . . . 3
2.1. Links, Paths, MTU and PMTU . . . . . . . . . . . . . . . 3 2.1. Links, Paths, MTU and PMTU . . . . . . . . . . . . . . . 3
2.2. Upper-layer Protocols . . . . . . . . . . . . . . . . . . 5 2.2. Fragmentation Procedures . . . . . . . . . . . . . . . . 5
2.3. Upper-Layer Reliance on IP Fragmentation . . . . . . . . 6
3. Requirements Language . . . . . . . . . . . . . . . . . . . . 7 3. Requirements Language . . . . . . . . . . . . . . . . . . . . 7
4. IP Fragmentation Reduces Reliability . . . . . . . . . . . . 7 4. Reduced Reliability . . . . . . . . . . . . . . . . . . . . . 7
4.1. Middle Box Failures . . . . . . . . . . . . . . . . . . . 8 4.1. Policy-Based Routing . . . . . . . . . . . . . . . . . . 7
4.2. Partial Filtering . . . . . . . . . . . . . . . . . . . . 8 4.2. Network Address Translation (NAT) . . . . . . . . . . . . 8
4.3. Telemetry and Monitoring and monitoring Failures . . . . 9 4.3. Stateless Firewalls . . . . . . . . . . . . . . . . . . . 8
4.4. Suboptimal Load Balancing . . . . . . . . . . . . . . . . 9 4.4. Stateless Load Balancers . . . . . . . . . . . . . . . . 9
4.5. Security Vulnerabilities . . . . . . . . . . . . . . . . 10 4.5. Security Vulnerabilities . . . . . . . . . . . . . . . . 9
4.6. Blackholing Due to ICMP Loss . . . . . . . . . . . . . . 11 4.6. Blackholing Due to ICMP Loss . . . . . . . . . . . . . . 11
4.6.1. Transient Loss . . . . . . . . . . . . . . . . . . . 12 4.6.1. Transient Loss . . . . . . . . . . . . . . . . . . . 11
4.6.2. Incorrect Implementation of Security Policy . . . . . 12 4.6.2. Incorrect Implementation of Security Policy . . . . . 12
4.6.3. Persistant Loss Caused By Anycast . . . . . . . . . . 13 4.6.3. Persistent Loss Caused By Anycast . . . . . . . . . . 12
4.7. Blackholing Due To Filtering . . . . . . . . . . . . . . 13 4.7. Blackholing Due To Filtering . . . . . . . . . . . . . . 13
5. Alternatives to IP Fragmentation . . . . . . . . . . . . . . 14 5. Alternatives to IP Fragmentation . . . . . . . . . . . . . . 13
5.1. Transport Layer Solutions . . . . . . . . . . . . . . . . 14 5.1. Transport Layer Solutions . . . . . . . . . . . . . . . . 13
5.2. Application Layer Solutions . . . . . . . . . . . . . . . 15 5.2. Application Layer Solutions . . . . . . . . . . . . . . . 15
6. Applications That Rely on IPv6 Fragmentation . . . . . . . . 16 6. Applications That Rely on IPv6 Fragmentation . . . . . . . . 16
6.1. DNS . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 6.1. DNS . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
6.2. OSPFv3 . . . . . . . . . . . . . . . . . . . . . . . . . 17 6.2. OSPF . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6.3. Packet-in-Packet Encapsulations . . . . . . . . . . . . . 17 6.3. Packet-in-Packet Encapsulations . . . . . . . . . . . . . 17
7. Recommendations . . . . . . . . . . . . . . . . . . . . . . . 17 7. Recommendations . . . . . . . . . . . . . . . . . . . . . . . 17
7.1. For Application Developers . . . . . . . . . . . . . . . 17 7.1. For Application Developers . . . . . . . . . . . . . . . 17
7.2. For Network Operators . . . . . . . . . . . . . . . . . . 18 7.2. For System Developers . . . . . . . . . . . . . . . . . . 17
7.3. For Middle Box Developers . . . . . . . . . . . . . . . . 17
7.4. For Network Operators . . . . . . . . . . . . . . . . . . 18
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18
9. Security Considerations . . . . . . . . . . . . . . . . . . . 18 9. Security Considerations . . . . . . . . . . . . . . . . . . . 18
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 18 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 18
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 18 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 18
11.1. Normative References . . . . . . . . . . . . . . . . . . 18 11.1. Normative References . . . . . . . . . . . . . . . . . . 18
11.2. Informative References . . . . . . . . . . . . . . . . . 20 11.2. Informative References . . . . . . . . . . . . . . . . . 20
Appendix A. Contributors' Address . . . . . . . . . . . . . . . 22 Appendix A. Contributors' Address . . . . . . . . . . . . . . . 22
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 22 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 23
1. Introduction 1. Introduction
Operational experience [RFC7872] [Huston] reveals that IP Operational experience [Kent] [Huston] [RFC7872] reveals that IP
fragmentation reduces the reliability of Internet communication. fragmentation reduces the reliability of Internet communication.
This document provides an overview of IP fragmentation. It also This document describes IP fragmentation and explains how it reduces
explains how IP fragmentation reduces the reliability of Internet the reliability of Internet communication. This document also
communication. proposes alternatives to IP fragmentation and provides
recommendations for developers and network operators.
Finally, this document proposes alternatives to IP fragmentation and
provides recommendations for application developers and network
operators.
While this document identifies issues associated with IP While this document identifies issues associated with IP
fragmentation, it does not recommend deprecation of IP fragmentation fragmentation, it does not recommend deprecation. Some applications
features. This document recognizes that IP fragmentation is required (e.g., [I-D.ietf-intarea-tunnels]) require IP fragmentation.
for some applications [I-D.ietf-intarea-tunnels].
Rather than deprecating IP Fragmentation, this document recommends
that upper-layer protocols address the problem of fragmentation at
their layer, reducing their reliance on IP fragmentation to the
greatest degree possible.
2. IP Fragmentation 2. IP Fragmentation
2.1. Links, Paths, MTU and PMTU 2.1. Links, Paths, MTU and PMTU
An Internet path connects a source node to a destination node. A An Internet path connects a source node to a destination node. A
path can contain links and intermediate systems. If a path contains path can contain links and routers. If a path contains more than one
more than one link, the links are connected in series and an link, the links are connected in series and a router connects each
intermediate system connects each link to the next. An intermediate link to the next.
system can be a router or a middle box.
Internet paths are dynamic. Assume that the path from one node to Internet paths are dynamic. Assume that the path from one node to
another contains a set of links and intermediate systems. If the another contains a set of links and routers. If the network topology
network topology changes, that path can also change so that it changes, that path can also change so that it includes a different
includes a different set of links and intermediate systems. set of links and routers.
Each link is constrained by the number of bytes that it can convey in Each link is constrained by the number of bytes that it can convey in
a single IP packet. This constraint is called the link Maximum a single IP packet. This constraint is called the link Maximum
Transmission Unit (MTU). IPv4 [RFC0791] requires every link to have Transmission Unit (MTU). IPv4 [RFC0791] requires every link to
an MTU of 68 bytes or greater. IPv6 [RFC8200] requires every link to support a specified MTU (see footnote). IPv6 [RFC8200] requires
have an MTU of 1280 bytes or greater. These are called the IPv4 and every link to support an MTU of 1280 bytes or greater. These are
IPv6 minimum link MTU's. called the IPv4 and IPv6 minimum link MTU's.
Each Internet path is constrained by the number of bytes that it can Likewise, each Internet path is constrained by the number of bytes
convey in a single IP packet. This constraint is called the Path MTU that it can convey in a single IP packet. This constraint is called
(PMTU). For any given path, the PMTU is equal to the smallest of its the Path MTU (PMTU). For any given path, the PMTU is equal to the
link MTU's. Because Internet paths are dynamic, PMTU is also smallest of its link MTU's. Because Internet paths are dynamic, PMTU
dynamic. is also dynamic.
For reasons described below, source nodes estimate the PMTU between For reasons described below, source nodes estimate the PMTU between
themselves and destination nodes. A source node can produce themselves and destination nodes. A source node can produce
extremely conservative PMTU estimates in which: extremely conservative PMTU estimates in which:
o The estimate for each IPv4 path is equal to the IPv4 minimum link o The estimate for each IPv4 path is equal to the IPv4 minimum link
MTU. MTU.
o The estimate for each IPv6 path is equal to the IPv6 minimum link o The estimate for each IPv6 path is equal to the IPv6 minimum link
MTU. MTU.
While these conservative estimates are guaranteed to be less than or While these conservative estimates are guaranteed to be less than or
equal to the actual PMTU, they are likely to be much less than the equal to the actual PMTU, they are likely to be much less than the
actual PMTU. This may adversely affect upper-layer protocol actual PMTU. This may adversely affect upper-layer protocol
performance. performance.
By executing Path MTU Discovery (PMTUD) [RFC1191] [RFC8201] By executing Path MTU Discovery (PMTUD) [RFC1191] [RFC8201]
procedures, a source node can maintain a less conservative, running procedures, a source node can maintain a less conservative estimate
estimate of the PMTU between itself and a destination node. of the PMTU between itself and a destination node. In PMTUD, the
According to these procedures, the source node produces an initial source node produces an initial PMTU estimate. This initial estimate
PMTU estimate. This initial estimate is equal to the MTU of the is equal to the MTU of the first link along the path to the
first link along the path to the destination node. It can be greater destination node. It can be greater than the actual PMTU.
than the actual PMTU.
Having produced an initial PMTU estimate, the source node sends non- Having produced an initial PMTU estimate, the source node sends non-
fragmentable IP packets to the destination node. If one of these fragmentable IP packets to the destination node. If one of these
packets is larger than the actual PMTU, a downstream router will not packets is larger than the actual PMTU, a downstream router will not
be able to forward the packet through the next link along the path. be able to forward the packet through the next link along the path.
Therefore, the downstream router drops the packet and sends an Therefore, the downstream router drops the packet and sends an
Internet Control Message Protocol (ICMP) [RFC0792] [RFC4443] Packet Internet Control Message Protocol (ICMP) [RFC0792] [RFC4443] Packet
Too Big (PTB) message to the source node. The ICMP PTB message Too Big (PTB) message to the source node. The ICMP PTB message
indicates the MTU of the link through which the packet could not be indicates the MTU of the link through which the packet could not be
forwarded. The source node uses this information to refine its PMTU forwarded. The source node uses this information to refine its PMTU
estimate. estimate.
PMTUD produces a running estimate of the PMTU between a source node PMTUD produces a running estimate of the PMTU between a source node
and a destination node. Because PMTU is dynamic, at any given time, and a destination node. Because PMTU is dynamic, at any given time,
the PMTU estimate can differ from the actual PMTU. In order to the PMTU estimate can differ from the actual PMTU. In order to
detect PMTU increases, PMTUD occasionally resets the PMTU estimate to detect PMTU increases, PMTUD occasionally resets the PMTU estimate to
the MTU of the first link along the path to the destination node. It its initial value and repeats the procedure described above.
then repeats the procedure described above.
PMTUD has the following characteristics: PMTUD has the following characteristics:
o It relies on the network's ability to deliver ICMP PTB messages to o It relies on the network's ability to deliver ICMP PTB messages to
the source node. the source node.
o It is susceptible to attack because ICMP messages are easily o It is susceptible to attack because ICMP messages are easily
forged [RFC5927]. forged [RFC5927].
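The PMTUD procedure described above can be sketched as a small simulation. This is a minimal illustration, not an implementation of [RFC1191] or [RFC8201]: the names forward and discover_pmtu are hypothetical, a path is modeled as a plain list of link MTUs, and every ICMP PTB message is assumed to reach the source node.

```python
from typing import List, Optional


def forward(packet_len: int, link_mtus: List[int]) -> Optional[int]:
    """Send a non-fragmentable packet along the path.

    Returns None if the packet is delivered, otherwise the MTU of the
    link through which it could not be forwarded (the value that the
    ICMP PTB message would carry).
    """
    for mtu in link_mtus:
        if packet_len > mtu:
            return mtu  # packet is dropped; PTB reports this link's MTU
    return None


def discover_pmtu(link_mtus: List[int]) -> int:
    """Refine the PMTU estimate until a packet of that size is delivered."""
    # Initial estimate: MTU of the first link along the path.
    estimate = link_mtus[0]
    while True:
        reported = forward(estimate, link_mtus)
        if reported is None:
            return estimate
        # Assumption: the PTB message always arrives at the source.
        estimate = reported


if __name__ == "__main__":
    path = [1500, 1400, 1280, 1500]  # hypothetical link MTUs, in bytes
    print(discover_pmtu(path))
```

In this model the estimate converges on the smallest link MTU of the path. If forward were changed to drop PTB messages silently, the loop would never learn why its packets disappear, which is the first characteristic listed above.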
FOOTNOTE: According to RFC 0791, every IPv4 host must be capable of FOOTNOTE: In IPv4, every host must be capable of receiving a packet
receiving a packet whose length is equal to 576 bytes. However, the whose length is equal to 576 bytes. However, the IPv4 minimum link
IPv4 minimum link MTU is not 576. Section 3.2 of RFC 0791 explicitly MTU is not 576. Section 3.2 of RFC 791 explicitly states that the
states that the IPv4 minimum link MTU is 68 bytes. IPv4 minimum link MTU is 68 bytes. But for practical purposes, many
network operators consider the IPv4 minimum link MTU to be 576 bytes.
So, for the purposes of this document, we assume that the IPv4
minimum link MTU is 576 bytes.
FOOTNOTE: In the paragraphs above, the term "non-fragmentable packet" FOOTNOTE: In the paragraphs above, the term "non-fragmentable packet"
is introduced. A non-fragmentable packet can be fragmented at its is introduced. A non-fragmentable packet can be fragmented at its
source. However, it cannot be fragmented by a downstream node. An source. However, it cannot be fragmented by a downstream node. An
IPv4 packet whose DF-bit is set to zero is fragmentable. An IPv4 IPv4 packet whose DF-bit is set to zero is fragmentable. An IPv4
packet whose DF-bit is set to one is non-fragmentable. All IPv6 packet whose DF-bit is set to one is non-fragmentable. All IPv6
packets are also non-fragmentable. packets are also non-fragmentable.
FOOTNOTE: In the paragraphs above, the term "ICMP PTB message" is FOOTNOTE: In the paragraphs above, the term "ICMP PTB message" is
introduced. The ICMP PTB message has two instantiations. In ICMPv4 introduced. The ICMP PTB message has two instantiations. In ICMPv4
[RFC0792], the ICMP PTB message is a Destination Unreachable message [RFC0792], the ICMP PTB message is a Destination Unreachable message
with Code equal to (4) fragmentation needed and DF set. This message with Code equal to (4) fragmentation needed and DF set. This message
was augmented by [RFC1191] to indicate the MTU of the link through was augmented by [RFC1191] to indicate the MTU of the link through
which the packet could not be forwarded. In ICMPv6 [RFC4443], the which the packet could not be forwarded. In ICMPv6 [RFC4443], the
ICMP PTB message is a Packet Too Big Message with Code equal to (0). ICMP PTB message is a Packet Too Big Message with Code equal to (0).
This message also indicates the MTU of the link through which the This message also indicates the MTU of the link through which the
packet could not be forwarded. packet could not be forwarded.
2.2. Upper-layer Protocols 2.2. Fragmentation Procedures
When an upper-layer protocol submits data to the underlying IP When an upper-layer protocol submits data to the underlying IP
module, and the resulting IP packet's length is greater than the module, and the resulting IP packet's length is greater than the
PMTU, IP fragmentation may be required. IP fragmentation divides a PMTU, the packet can be divided into fragments. Each fragment
packet into fragments. Each fragment includes an IP header and a includes an IP header and a portion of the original packet.
portion of the original packet.
[RFC0791] describes IPv4 fragmentation procedures. IPv4 packets [RFC0791] describes IPv4 fragmentation procedures. An IPv4 packet
whose DF-bit is set to one cannot be fragmented. IPv4 packets whose whose DF-bit is set to one cannot be fragmented. An IPv4 packet
DF-bit is set to zero can be fragmented at the source node or by any whose DF-bit is set to zero can be fragmented by the source node or
downstream router. [RFC8200] describes IPv6 fragmentation by any downstream router. When an IPv4 packet is fragmented, all IP
procedures. IPv6 packets can be fragmented at the source node only. options appear in the first fragment, but only options whose "copy"
bit is set to one appear in subsequent fragments.
IPv4 fragmentation differs slightly from IPv6 fragmentation. [RFC8200] describes IPv6 fragmentation procedures. An IPv6 packet
However, in both IP versions, the upper-layer header appears in the can be fragmented at the source node only. When an IPv6 packet is
first fragment only. It does not appear in subsequent fragments. fragmented, all extension headers appear in the first fragment, but
only per-fragment headers appear in subsequent fragments. Per-
fragment headers include the following:
o The IPv6 header.
o The Hop-by-hop Options header (if present)
o The Destination Options header (if present and if it precedes a
Routing header)
o The Routing Header (if present)
o The Fragment Header
In both IPv4 and IPv6, the upper-layer header appears in the first
fragment only. It does not appear in subsequent fragments.
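The following sketch illustrates why the upper-layer header appears in the first fragment only. It models IPv4 source fragmentation in a simplified way: the function name fragment is hypothetical, the IPv4 header is assumed to be 20 bytes (no options), and each fragment is represented as a tuple rather than as a real packet. Fragment offsets are expressed in 8-byte units, as in [RFC0791].

```python
from typing import List, Tuple

IPV4_HEADER_LEN = 20  # assumption: IPv4 header without options


def fragment(payload: bytes, mtu: int) -> List[Tuple[int, bool, bytes]]:
    """Split an IP payload into fragments that fit within the given MTU.

    Each fragment is (offset_in_8_byte_units, more_fragments, data).
    """
    # Every fragment except the last carries a multiple of 8 bytes,
    # because the offset field counts 8-byte units.
    max_data = (mtu - IPV4_HEADER_LEN) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any payload")
    fragments = []
    for start in range(0, len(payload), max_data):
        data = payload[start:start + max_data]
        more = start + max_data < len(payload)
        fragments.append((start // 8, more, data))
    return fragments


if __name__ == "__main__":
    # Hypothetical datagram: an 8-byte upper-layer header plus 3000 data bytes.
    upper_layer_header = b"H" * 8
    payload = upper_layer_header + b"d" * 3000
    for offset, more, data in fragment(payload, 1500):
        has_header = data.startswith(upper_layer_header)
        print(offset, more, len(data), has_header)
```

Because the upper-layer header occupies the first bytes of the payload, it lands entirely in the fragment whose offset is zero. Later fragments carry only data bytes, so a device that inspects them in isolation cannot recover port numbers.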
2.3. Upper-Layer Reliance on IP Fragmentation
Upper-layer protocols can operate in the following modes: Upper-layer protocols can operate in the following modes:
o Do not rely on IP fragmentation. o Do not rely on IP fragmentation.
o Rely on IP source fragmentation only (i.e., fragmentation at the o Rely on IP fragmentation by the source node only.
source node).
o Rely on IP source fragmentation and downstream fragmentation o Rely on IP fragmentation by any node.
(i.e., fragmentation at any node along the path).
Upper-layer protocols running over IPv4 can operate in all of the Upper-layer protocols running over IPv4 can operate in all of the
above-mentioned modes. Upper-layer protocols running over IPv6 can above-mentioned modes. Upper-layer protocols running over IPv6 can
operate in the first and second modes only. operate in the first and second modes only.
Upper-layer protocols that operate in the first two modes (above) Upper-layer protocols that operate in the first two modes (above)
require access to the PMTU estimate. In order to fulfil this require access to the PMTU estimate. In order to fulfil this
requirement, they can requirement, they can:
o Estimate the PMTU to be equal to the IPv4 or IPv6 minimum link o Estimate the PMTU to be equal to the IPv4 or IPv6 minimum link
MTU. MTU.
o Access the estimate that PMTUD produced. o Access the estimate that PMTUD produced.
o Execute PMTUD procedures themselves. o Execute PMTUD procedures themselves.
o Execute Packetization Layer PMTUD (PLPMTUD) [RFC4821] o Execute Packetization Layer PMTUD (PLPMTUD) [RFC4821]
[I-D.fairhurst-tsvwg-datagram-plpmtud] procedures. [I-D.ietf-tsvwg-datagram-plpmtud] procedures.
According to PLPMTUD procedures, the upper-layer protocol maintains a According to PLPMTUD procedures, the upper-layer protocol maintains a
running PMTU estimate. It does so by sending probe packets of running PMTU estimate. It does so by sending probe packets of
various sizes to its peer and receiving acknowledgements. This various sizes to its upper-layer peer and receiving acknowledgements.
strategy differs from PMTUD in that it relies on acknowledgement of This strategy differs from PMTUD in that it relies on acknowledgement
received messages, as opposed to ICMP PTB messages concerning dropped of received messages, as opposed to ICMP PTB messages concerning
messages. Therefore, PLPMTUD does not rely on the network's ability dropped messages. Therefore, PLPMTUD does not rely on the network's
to deliver ICMP PTB messages to the source. ability to deliver ICMP PTB messages to the source.
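The probing idea behind PLPMTUD can be sketched as follows. This is an illustration under stated assumptions, not the search algorithm of [RFC4821]: the function name probe_search and the acked callback are hypothetical, a binary search is used for brevity, and a missing acknowledgement is assumed to mean "probe too large" (real paths also lose probes to congestion).

```python
from typing import Callable


def probe_search(acked: Callable[[int], bool], low: int, high: int) -> int:
    """Return the largest probe size in [low, high] that is acknowledged.

    Assumes that a probe of size `low` is always acknowledged and that
    probes are lost only because they exceed the actual PMTU.
    """
    while low < high:
        size = (low + high + 1) // 2  # probe the upper midpoint
        if acked(size):
            low = size        # acknowledged: the estimate can grow
        else:
            high = size - 1   # no acknowledgement: treat as too big
    return low


if __name__ == "__main__":
    actual_pmtu = 1400  # hypothetical value, unknown to the sender

    def acked(size: int) -> bool:
        # Stand-in for "send a probe and wait for the peer to acknowledge".
        return size <= actual_pmtu

    print(probe_search(acked, 1280, 1500))
```

Note that the sketch never consults an ICMP message. The only signal is whether the upper-layer peer acknowledged the probe, which is why this approach keeps working when ICMP PTB messages are filtered.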
An upper-layer protocol that does not rely on IP fragmentation never
causes the underlying IP module to emit
o A fragmentable IP packet (i.e., an IPv4 packet with the DF-bit set
to zero).
o An IP fragment.
o A packet whose length is greater than the PMTU estimate.
However, when the PMTU estimate is greater than the actual PMTU, the
upper-layer protocol can cause the underlying IP module to emit a
packet whose length is greater than the actual PMTU. When this
occurs, a downstream router drops the packet and the source node
refines its PMTU estimate, employing either PMTUD or PLPMTUD
procedures.
When an upper-layer protocol that relies on IP source fragmentation
only submits data to the underlying IP module, and the resulting
packet is larger than the PMTU estimate, the underlying IP module
fragments the packet and emits the fragments. However, the upper-
layer protocol never causes the underlying IP module to emit
o A fragmentable IP packet.
o A packet whose length is greater than the PMTU estimate.
When the PMTU estimate is greater than the actual PMTU, the upper-
layer protocol can cause the underlying IP module to emit a packet
whose length is greater than the actual PMTU. When this occurs, a
downstream router drops the packet and the source node refines its
PMTU estimate, employing either PMTUD or PLPMTUD procedures.
An upper-layer protocol that relies on IP source fragmentation and
downstream fragmentation can cause the underlying IP module to emit
o A fragmentable IP packet.
o An IP fragment.
o A packet whose length is greater than the PMTU estimate.
A protocol that relies on IP source fragmentation and downstream
fragmentation does not require access to the PMTU estimate. For
these protocols, the underlying IP module:
o Fragments all packets whose length exceeds the MTU of the first
link along the path to the destination.
o Sets the DF-bit to zero, so that downstream nodes can fragment the
packet.
3. Requirements Language 3. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119] [RFC8174] when, and only when, they appear in all 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
4. IP Fragmentation Reduces Reliability 4. Reduced Reliability
This section explains how IP fragmentation reduces the reliability of This section explains how IP fragmentation reduces the reliability of
Internet communication. Internet communication.
4.1. Middle Box Failures 4.1. Policy-Based Routing
Many middle boxes require access to the transport-layer header. IP fragmentation causes problems for routers that implement policy-
However, when a packet is divided into fragments, the transport-layer based routing.
header appears in the first fragment only. It does not appear in
subsequent fragments. This omission can prevent middle boxes from
delivering their intended services.
For example, assume that a router diverts selected packets from their When a router receives a packet, it identifies the next-hop on route
normal path towards network appliances that support deep packet to the packet's destination and forwards the packet to that next-hop.
inspection and lawful intercept. The router selects packets for In order to identify the next-hop, the router interrogates a local
diversion based upon the following 5-tuple: data structure called the Forwarding Information Base (FIB).
o IP Source Address. Normally, the FIB contains destination-based entries that map a
destination prefix to a next-hop. Policy-based routing allows
destination-based and policy-based entries to coexist in the same
FIB. A policy-based FIB entry maps multiple fields, drawn from
either the IP or transport-layer header, to a next-hop.
o IP Destination Address. +-------+--------------+-----------------+------------+-------------+
| Entry | Type | Dest. Prefix | Next Hdr / | Next-Hop |
| | | | Dest. Port | |
+-------+--------------+-----------------+------------+-------------+
| | | | | |
| 1 | Destination- | 2001:db8::1/128 | Any / Any | 2001:db8::2 |
| | based | | | |
| | | | | |
| 2 | Policy- | 2001:db8::1/128 | TCP / 80 | 2001:db8::3 |
| | based | | | |
+-------+--------------+-----------------+------------+-------------+
o IPv4 Protocol or IPv6 Next Header. Table 1: Policy-Based Routing FIB
o transport-layer source port. Assume that a router maintains the FIB in Table 1. The first FIB
entry is destination-based. It maps a destination prefix
(2001:db8::1/128) to a next-hop (2001:db8::2). The second FIB entry
is policy-based. It maps the same destination prefix
(2001:db8::1/128) and a destination port ( TCP / 80 ) to a different
next-hop (2001:db8::3). The second entry is more specific than the
first.
o transport-layer destination port. When the router receives the first fragment of a packet that is
destined for TCP port 80 on 2001:db8::1, it interrogates the FIB.
Both FIB entries satisfy the query. The router selects the second
FIB entry because it is more specific and forwards the packet to
2001:db8::3.
IP fragmentation causes this selection algorithm to behave When the router receives the second fragment of the packet, it
suboptimally, because the transport-layer header appears only in the interrogates the FIB again. This time, only the first FIB entry
first fragment of each packet. satisfies the query, because the second fragment contains no
indication that the packet is destined for TCP port 80. Therefore,
the router selects the first FIB entry and forwards the packet to
2001:db8::2.
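The behavior described for Table 1 can be reproduced with a short sketch. It is a simplified model, not a router implementation: the names Packet, FibEntry and lookup are hypothetical, the /128 prefix is matched as an exact string, and a missing destination port is represented by None.

```python
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    dest: str
    protocol: Optional[str]
    dest_port: Optional[int]  # None when no transport-layer header is present


class FibEntry(NamedTuple):
    dest: str
    protocol: Optional[str]   # None means "Any"
    dest_port: Optional[int]  # None means "Any"
    next_hop: str


# The two entries of Table 1.
FIB = [
    FibEntry("2001:db8::1", None, None, "2001:db8::2"),  # destination-based
    FibEntry("2001:db8::1", "TCP", 80, "2001:db8::3"),   # policy-based
]


def lookup(packet: Packet) -> Optional[str]:
    """Return the next-hop of the most specific matching FIB entry."""
    best = None
    for entry in FIB:
        if entry.dest != packet.dest:
            continue
        if entry.protocol is not None and entry.protocol != packet.protocol:
            continue
        if entry.dest_port is not None and entry.dest_port != packet.dest_port:
            continue
        # A policy-based entry (extra fields set) is more specific than a
        # destination-based entry, so it replaces any earlier match.
        if best is None or entry.protocol is not None:
            best = entry
    return best.next_hop if best else None


if __name__ == "__main__":
    first_fragment = Packet("2001:db8::1", "TCP", 80)
    second_fragment = Packet("2001:db8::1", "TCP", None)  # no port visible
    print(lookup(first_fragment))
    print(lookup(second_fragment))
```

In this model the first fragment matches both entries and is sent to 2001:db8::3, while the second fragment matches only the destination-based entry and is sent to 2001:db8::2. The two fragments of one packet therefore follow different paths.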
In another example, a middle box remarks a packet's Differentiated Policy-based routing is also known as filter-based forwarding.
Services Code Point [RFC2474] based upon the above-mentioned 5-tuple.
IP fragmentation causes this process to behave suboptimally, because
the transport-layer header appears only in the first fragment of each
packet.
In all of the above-mentioned examples, the middle box cannot deliver 4.2. Network Address Translation (NAT)
its intended service without reassembling fragmented packets.
4.2. Partial Filtering IP fragmentation causes problems for Network Address Translation
(NAT) devices. When a NAT device detects a new, outbound flow, it
maps that flow's source port and IP address to another source port
and IP address. Having created that mapping, the NAT device
translates:
IP fragments cause problems for firewalls whose filter rules include o The Source IP Address and Source Port on each outbound packet.
decision making based on TCP and UDP ports. As the port information
is not in the trailing fragments the firewall may elect to accept all
trailing fragments, which may admit certain classes of attack, or may
elect to block all trailing fragments, which may block otherwise
legitimate traffic, or may elect to reassemble all fragmented
packets, which may be inefficient and negatively affect performance.
4.3. Telemetry and Monitoring and monitoring Failures o The Destination IP Address and Destination Port on each inbound
packet.
Stateless telemetry and monitoring strategies may require the A+P [RFC6346] and Carrier Grade NAT (CGN) [RFC6888] are two common
transport-layer header to appear in every packet. However, when a NAT strategies. In both approaches the NAT device must virtually
packet is divided into fragments, the transport-layer header appears reassemble fragmented packets in order to translate and forward each
in the first fragment only. It does not appear in subsequent fragment.
fragments. This omission can prevent some stateless telemetry
strategies from functioning correctly.
4.4. Suboptimal Load Balancing Virtual reassembly in the network is problematic, because it is
computationally expensive and because it is prone to attacks
(Section 4.5).
Many stateless load-balancers require access to the transport-layer 4.3. Stateless Firewalls
header. Assume that a load-balancer distributes flows among parallel
links. In order to optimize load balancing, the load-balancer sends
every packet or packet fragment belonging to a flow through the same
link.
In order to assign a packet or packet fragment to a link, the load- IP fragmentation causes problems for stateless firewalls whose rules
include TCP and UDP ports. Because port information is not available
in the trailing fragments the firewall is limited to the following
options:
o Accept all trailing fragments, possibly admitting certain classes
of attack.
o Block all trailing fragments, possibly blocking legitimate
traffic.
Neither option is attractive.
This problem does not occur in stateful firewalls.
4.4. Stateless Load Balancers
IP fragmentation causes problems for stateless load balancers. In
order to assign a packet or packet fragment to a link, the load-
balancer executes an algorithm. If the packet or packet fragment balancer executes an algorithm. If the packet or packet fragment
contains a transport-layer header, the load balancing algorithm contains a transport-layer header, the load balancing algorithm
accepts the following 5-tuple as input: accepts the following 5-tuple as input:
o IP Source Address. o IP Source Address.
o IP Destination Address. o IP Destination Address.
o IPv4 Protocol or IPv6 Next Header. o IPv4 Protocol or IPv6 Next Header.
o transport-layer source port. o transport-layer source port.
o transport-layer destination port. o transport-layer destination port.
However, if the packet or packet fragment does not contain a If the packet or packet fragment does not contain a transport-layer
transport-layer header, the load balancing algorithm accepts only the header, the load balancing algorithm accepts only the following
following 3-tuple as input: 3-tuple as input:
o IP Source Address. o IP Source Address.
o IP Destination Address. o IP Destination Address.
o IPv4 Protocol or IPv6 Next Header. o IPv4 Protocol or IPv6 Next Header.
Therefore, non-fragmented packets belonging to a flow can be assigned Therefore, non-fragmented packets belonging to a flow can be assigned
to one link while fragmented packets belonging to the same flow can to one link while fragmented packets belonging to the same flow can
be divided between that link and another. This can cause suboptimal be divided between that link and another. This can cause suboptimal
load balancing. load balancing.
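The effect described above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any particular load balancer: the packet dictionaries, the field names, and the use of SHA-256 as the hash are all assumptions made for the example.

```python
import hashlib

def flow_key(pkt):
    """Return the 5-tuple when ports are visible, otherwise the 3-tuple."""
    base = (pkt["src"], pkt["dst"], pkt["proto"])
    if pkt.get("sport") is not None and pkt.get("dport") is not None:
        return base + (pkt["sport"], pkt["dport"])
    # Trailing fragments carry no transport-layer header.
    return base

def select_link(pkt, num_links):
    """Hash the flow key onto one of num_links parallel links."""
    digest = hashlib.sha256(repr(flow_key(pkt)).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_links

# Hypothetical packets from the same flow (UDP is protocol 17).
whole = {"src": "192.0.2.1", "dst": "198.51.100.7", "proto": 17,
         "sport": 5000, "dport": 53}
trailing = {"src": "192.0.2.1", "dst": "198.51.100.7", "proto": 17,
            "sport": None, "dport": None}
```

Because `whole` and `trailing` hash different inputs, nothing guarantees that `select_link` returns the same link for both, which is the split described in the text.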
4.5. Security Vulnerabilities 4.5. Security Vulnerabilities
Security researchers have documented several attacks that rely on IP Security researchers have documented several attacks that exploit IP
fragmentation. The following are examples: fragmentation. The following are examples:
o Overlapping fragment attack [RFC1858][RFC3128] [RFC5722] o Overlapping fragment attacks [RFC1858][RFC3128][RFC5722]
o Resource exhaustion attacks (such as the Rose Attack) o Resource exhaustion attacks (such as the Rose Attack)
o Attacks based on predictable fragment identification values o Attacks based on predictable fragment identification values
[RFC7739] [RFC7739]
o Attacks based on bugs in the implementation of the fragment
reassembly algorithm
o Evasion of Network Intrusion Detection Systems (NIDS) [Ptacek1998] o Evasion of Network Intrusion Detection Systems (NIDS) [Ptacek1998]
In the overlapping fragment attack, an attacker constructs a series In the overlapping fragment attack, an attacker constructs a series
of packet fragments. The first fragment contains an IP header, a of packet fragments. The first fragment contains an IP header, a
transport-layer header, and some transport-layer payload. This transport-layer header, and some transport-layer payload. This
fragment complies with local security policy and is allowed to pass fragment complies with local security policy and is allowed to pass
through a stateless firewall. A second fragment, having a non-zero through a stateless firewall. A second fragment, having a non-zero
offset, overlaps with the first fragment. The second fragment also offset, overlaps with the first fragment. The second fragment also
passes through the stateless firewall. When the packet is passes through the stateless firewall. When the packet is
reassembled, the transport layer header from the first fragment is reassembled, the transport layer header from the first fragment is
overwritten by data from the second fragment. The reassembled packet overwritten by data from the second fragment. The reassembled packet
does not comply with local security policy. Had it traversed the does not comply with local security policy. Had it traversed the
firewall in one piece, the firewall would have rejected it. firewall in one piece, the firewall would have rejected it.
A stateless firewall cannot protect against the overlapping fragment A stateless firewall cannot protect against the overlapping fragment
attack. However, destination nodes can protect against the attack. However, destination nodes can protect against the
overlapping fragment attack by implementing the reassembly procedures overlapping fragment attack by implementing the procedures described
described in RFC 1858, RFC 3128 and RFC 8200. These reassembly in RFC 1858, RFC 3128 and RFC 8200. These reassembly procedures
procedures detect the overlap and discard the packet. detect the overlap and discard the packet.
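As a rough sketch of the overlap check, assuming each fragment is represented only by a hypothetical (offset, length) pair in bytes, a destination node could refuse to reassemble as follows. Real implementations track considerably more state, and this sketch ignores payload contents; exact duplicates are simply collapsed before the check.

```python
def has_overlap(fragments):
    """Return True if any two distinct (offset, length) ranges overlap."""
    end_of_previous = 0
    # set() collapses exact duplicates, which may occur in the network.
    for offset, length in sorted(set(fragments)):
        if offset < end_of_previous:
            return True
        end_of_previous = max(end_of_previous, offset + length)
    return False

def reassemble_or_discard(fragments):
    """Return the ordered fragment ranges, or None if the packet is discarded."""
    frags = list(fragments)
    if has_overlap(frags):
        return None  # overlap detected: discard the whole packet
    return sorted(set(frags))
```

For example, a second fragment at offset 8 that follows a 24-byte first fragment overlaps the transport-layer header region, so the packet is discarded.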
The fragment reassembly algorithm is a stateful procedure for an The fragment reassembly algorithm is a stateful procedure for an
otherwise stateless protocol. As such, it can be exploited for otherwise stateless protocol. Therefore, it can be exploited by
resource exhaustion attacks. An attacker can construct a series of resource exhaustion attacks. An attacker can construct a series of
fragmented packets, with one fragment missing from each packet so fragmented packets, with one fragment missing from each packet so
that the reassembly process cannot complete. Thus, this attack that the reassembly is impossible. Thus, this attack causes resource
causes resource exhaustion on the destination node, possibly denying exhaustion on the destination node, possibly denying reassembly
reassembly services to other flows. This type of attack can be services to other flows. This type of attack can be mitigated by
mitigated by flushing fragment reassembly buffers when necessary, at flushing fragment reassembly buffers when necessary, at the expense
the expense of possibly dropping legitimate fragments. of possibly dropping legitimate fragments.
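One possible form of the mitigation is a reassembly table with a fixed capacity that flushes its oldest incomplete packet when full. The sketch below is an assumption about how such a table might look; the class name, the capacity parameter, and the oldest-first eviction policy are illustrative and not taken from any specification.

```python
from collections import OrderedDict

class ReassemblyTable:
    """Bounded store of incomplete packets, flushed oldest-first."""

    def __init__(self, max_packets):
        self.max_packets = max_packets
        self.pending = OrderedDict()  # packet key -> list of fragments

    def add_fragment(self, key, fragment):
        if key not in self.pending and len(self.pending) >= self.max_packets:
            # Flush the oldest incomplete packet. It may have been
            # legitimate, which is the cost noted in the text.
            self.pending.popitem(last=False)
        self.pending.setdefault(key, []).append(fragment)
```

An attacker who sends many incomplete packets can still evict legitimate entries, but memory use stays bounded.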
An IP fragment contains an "Identification" field that, together with
the IP Source Address and Destination Address of a packet, identifies
fragments that correspond to the same original datagram, so that they
can be reassembled together by the receiving host. Many
implementations have employed predictable values for the
Identification field, thus making it easy for an attacker to forge
malicious IP fragments that would cause the reassembly procedure for
legitimate packets to fail.
Over the years multiple IPv4 and IPv6 implementations have been found Each IP fragment contains an "Identification" field that destination
to have flaws in their implementation of the IP fragment reassembly nodes use to reassemble fragmented packets. Many implementations set
algorithm, typically resulting in buffer overflows. These buffer the Identification field to a predictable value, thus making it easy
overflows have been exploitable for denial of service and remote code for an attacker to forge malicious IP fragments that would cause the
execution attacks. reassembly procedure for legitimate packets to fail.
NIDS aims at identifying malicious activity by analyzing network NIDS aims at identifying malicious activity by analyzing network
traffic. Ambiguity in the possible result of the fragment reassembly traffic. Ambiguity in the possible result of the fragment reassembly
process may allow an attacker to evade these systems. Many of these process may allow an attacker to evade these systems. Many of these
systems try to mitigate some of these evasion techniques by e.g. systems try to mitigate some of these evasion techniques (e.g., by
computing all possible outcomes of the fragment reassembly process, computing all possible outcomes of the fragment reassembly process,
at the expense of increased processing requirements. at the expense of increased processing requirements).
4.6. Blackholing Due to ICMP Loss 4.6. Blackholing Due to ICMP Loss
As stated above, an upper-layer protocol requires access the PMTU As mentioned in Section 2.3, upper-layer protocols can be configured
estimate if it: to rely on PMTUD. Because PMTUD relies upon the network to deliver
ICMP PTB messages, those protocols also rely on the networks to
o Does not rely on IP fragmentation. deliver ICMP PTB messages.
o Relies on IP source fragmentation only (i.e., fragmentation at the
source node).
In order to satisfy this requirement, the upper-layer protocol can:
o Estimate the PMTU to be equal to the IPv4 or IPv6 minimum link
MTU.
o Access the estimate that PMTUD produced.
o Execute PMTUD procedures itself.
o Execute PLPMTUD procedures.
PMTUD relies upon the network's ability to deliver ICMP PTB messages
to the source node. Therefore, if an upper-layer protocol relies on
PMTUD, it also relies on the network's ability to deliver ICMP PTB
messages to the source node.
According to [RFC4890], ICMP PTB messages must not be filtered. According to [RFC4890], ICMP PTB messages must not be filtered.
However, ICMP PTB delivery is not reliable. It is subject to both However, ICMP PTB delivery is not reliable. It is subject to both
transient and persistent loss. transient and persistent loss.
Transient loss of ICMP PTB messages causes PMTUD to perform less Transient loss of ICMP PTB messages can cause transient black holes.
efficiently, but does not cause it to fail completely. When the When the conditions contributing to transient loss abate, the network
conditions contributing to transient loss abate, the network regains regains its ability to deliver ICMP PTB messages and connectivity
its ability to deliver ICMP PTB messages and PMTUD regains its between the source and destination nodes is restored. Section 4.6.1
ability to function. Section 4.6.1 of this document describes of this document describes conditions that lead to transient loss of
conditions that lead to transient loss of ICMP PTB messages. ICMP PTB messages.
However, persistent loss of ICMP PTB messages causes PMTUD to fail Persistent loss of ICMP PTB messages can cause persistent black
completely. Section 4.6.2 and Section 4.6.3 of this document holes. Section 4.6.2 and Section 4.6.3 of this document describe
describe conditions that lead to persistent loss of ICMP PTB conditions that lead to persistent loss of ICMP PTB messages.
messages.
The problem described in this section is specific to PMTUD. It does The problem described in this section is specific to PMTUD. It does
not occur when the upper-layer protocol obtains its PMTU estimate not occur when the upper-layer protocol obtains its PMTU estimate
from PLPMTUD or any other source. from PLPMTUD or from any other source.
4.6.1. Transient Loss 4.6.1. Transient Loss
The following factors can contribute to transient loss of ICMP PTB The following factors can contribute to transient loss of ICMP PTB
messages: messages:
o Network congestion. o Network congestion.
o Packet corruption. o Packet corruption.
skipping to change at page 13, line 7 skipping to change at page 12, line 22
o Allow any traffic to flow from the inside zone to the outside o Allow any traffic to flow from the inside zone to the outside
zone. zone.
o Do not allow any traffic to flow from the outside zone to the o Do not allow any traffic to flow from the outside zone to the
inside zone unless it is part of an existing flow (i.e., it was inside zone unless it is part of an existing flow (i.e., it was
elicited by an outbound packet). elicited by an outbound packet).
When a correct implementation of the above-mentioned security policy When a correct implementation of the above-mentioned security policy
receives an ICMP PTB message, it examines the ICMP PTB payload in receives an ICMP PTB message, it examines the ICMP PTB payload in
order to determine the original packet (i.e., the packet that order to determine whether the original packet (i.e., the packet that
elicited the ICMP PTB message) belonged to an existing flow. If the elicited the ICMP PTB message) belonged to an existing flow. If the
original packet belonged to an existing flow, the implementation original packet belonged to an existing flow, the implementation
allows the ICMP PTB to flow from the outside zone to the inside zone. allows the ICMP PTB to flow from the outside zone to the inside zone.
If not, the implementation discards the ICMP PTB message. If not, the implementation discards the ICMP PTB message.
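The difference between a correct and an incorrect implementation can be sketched as follows. The flow table layout, the field names, and both function names are hypothetical; the point is only that the correct check keys on the packet embedded in the ICMP PTB payload rather than on the ICMP message's own source address.

```python
def allow_icmp_ptb(embedded_packet, flow_table):
    """Correct check: does the packet quoted in the PTB payload belong to a flow?"""
    key = (embedded_packet["src"], embedded_packet["dst"],
           embedded_packet["proto"],
           embedded_packet["sport"], embedded_packet["dport"])
    return key in flow_table

def allow_icmp_ptb_incorrect(ptb_source, flow_table):
    """Incorrect check: is the PTB's own source address a known flow destination?"""
    return any(flow[1] == ptb_source for flow in flow_table)

# Hypothetical outbound TCP flow from an inside host to an outside server.
flows = {("192.0.2.10", "198.51.100.7", 6, 40000, 443)}
quoted = {"src": "192.0.2.10", "dst": "198.51.100.7", "proto": 6,
          "sport": 40000, "dport": 443}
router = "203.0.113.1"  # intermediate router that generated the PTB
```

Here `allow_icmp_ptb(quoted, flows)` admits the message, while the incorrect check rejects it because the router's address never appears in the flow table.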
When an incorrect implementation of the above-mentioned security When an incorrect implementation of the above-mentioned security
policy receives an ICMP PTB message, it discards the packet because policy receives an ICMP PTB message, it discards the packet because
its source address is not associated with an existing flow. its source address is not associated with an existing flow.
The security policy described above is implemented incorrectly on The security policy described above is implemented incorrectly on
many consumer CPE routers. many consumer CPE routers.
4.6.3. Persistant Loss Caused By Anycast 4.6.3. Persistent Loss Caused By Anycast
Anycast can cause persistent loss of ICMP PTB messages. Consider the Anycast can cause persistent loss of ICMP PTB messages. Consider the
example below: example below:
A DNS client sends a request to an anycast address. The network A DNS client sends a request to an anycast address. The network
routes that DNS request to the nearest instance of that anycast routes that DNS request to the nearest instance of that anycast
address (i.e., a DNS Server). The DNS server generates a response address (i.e., a DNS Server). The DNS server generates a response
and sends it back to the DNS client. While the response does not and sends it back to the DNS client. While the response does not
exceed the DNS server's PMTU estimate, it does exceed the actual exceed the DNS server's PMTU estimate, it does exceed the actual
PMTU. PMTU.
A downstream router drops the packet and sends an ICMP PTB message A downstream router drops the packet and sends an ICMP PTB message
to the packet's source (i.e., the anycast address). The network routes to the packet's source (i.e., the anycast address). The network routes
the ICMP PTB message to the anycast instance closest to the the ICMP PTB message to the anycast instance closest to the
downstream router. Sadly, that anycast instance may not be the DNS downstream router. That anycast instance may not be the DNS server
server that originated the DNS response. It may be another DNS that originated the DNS response. It may be another DNS server with
server with the same anycast address. The DNS server that originated the same anycast address. The DNS server that originated the
the response may never receive the ICMP PTB message and may never response may never receive the ICMP PTB message and may never update
update its PMTU estimate. its PMTU estimate.
4.7. Blackholing Due To Filtering 4.7. Blackholing Due To Filtering
In RFC 7872, researchers sampled Internet paths to determine whether In RFC 7872, researchers sampled Internet paths to determine whether
they would convey packets that contain IPv6 extension headers. they would convey packets that contain IPv6 extension headers.
Sampled paths terminated at popular Internet sites (e.g., popular Sampled paths terminated at popular Internet sites (e.g., popular
web, mail and DNS servers). web, mail and DNS servers).
The study revealed that at least 28% of the sampled paths did not The study revealed that at least 28% of the sampled paths did not
convey packets containing the IPv6 Fragment extension header. In convey packets containing the IPv6 Fragment extension header. In
skipping to change at page 14, line 14 skipping to change at page 13, line 29
Another recent study [Huston] confirmed this finding. It reported Another recent study [Huston] confirmed this finding. It reported
that 37% of sampled endpoints used IPv6-capable DNS resolvers that that 37% of sampled endpoints used IPv6-capable DNS resolvers that
were incapable of receiving a fragmented IPv6 response. were incapable of receiving a fragmented IPv6 response.
It is difficult to determine why network operators drop fragments. It is difficult to determine why network operators drop fragments.
Possible causes follow: Possible causes follow:
o Hardware inability to process fragmented packets. o Hardware inability to process fragmented packets.
o Failure to change a vendor defaults. o Failure to change vendor defaults.
o Unintentional misconfiguration. o Unintentional misconfiguration.
o Intentional configuration (e.g., network operators consciously o Intentional configuration (e.g., network operators consciously
choose to drop IPv6 fragments in order to address the issues choose to drop IPv6 fragments in order to address the issues
raised in Section 4.1 through Section 4.6, above.) raised in Section 4.1 through Section 4.6, above.)
5. Alternatives to IP Fragmentation 5. Alternatives to IP Fragmentation
5.1. Transport Layer Solutions 5.1. Transport Layer Solutions
skipping to change at page 14, line 47 skipping to change at page 14, line 17
Therefore, IP fragmentation is not required. Therefore, IP fragmentation is not required.
TCP offers the following mechanisms for MSS management: TCP offers the following mechanisms for MSS management:
o Manual configuration o Manual configuration
o PMTUD o PMTUD
o PLPMTUD o PLPMTUD
For IPv6 nodes, manual configuration is always applicable. If the Manual configuration is always applicable. If the MSS is configured
MSS is manually configured to 1220 bytes and the packet does not to a sufficiently low value, the IP layer will never produce a packet
contain extension headers, the IP layer will never produce a packet whose length is greater than the protocol minimum link MTU. However,
whose length is greater than the IPv6 minimum link MTU (1280 bytes). manual configuration prevents TCP from taking advantage of larger
However, manual configuration prevents TCP from taking advantage of link MTUs.
larger link MTUs.
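The arithmetic behind a "sufficiently low" MSS is simple enough to show directly. The sketch below assumes a fixed 40-byte IPv6 header with no extension headers and a 20-byte TCP header with no options; the function and constant names are chosen for the example.

```python
IPV6_MIN_LINK_MTU = 1280  # bytes
IPV6_HEADER_LEN = 40      # fixed IPv6 header, no extension headers
TCP_HEADER_LEN = 20       # TCP header without options

def mss_for_mtu(mtu, ip_header_len, tcp_header_len=TCP_HEADER_LEN):
    """Largest TCP segment payload that fits in one packet of size mtu."""
    return mtu - ip_header_len - tcp_header_len

print(mss_for_mtu(IPV6_MIN_LINK_MTU, IPV6_HEADER_LEN))  # 1220
```

Any extension headers or TCP options reduce the usable payload further, so a configured MSS may need to be lower than this figure.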
RFC 8200 strongly recommends that IPv6 nodes implement PMTUD, in Upper-layer protocols can implement PMTUD in order to discover and
order to discover and take advantage of path MTUs greater than 1280 take advantage of larger path MTUs. However, as mentioned in
bytes. However, as mentioned in Section 2.1, PMTUD relies upon the Section 2.1, PMTUD relies upon the network to deliver ICMP PTB
network's ability to deliver ICMP PTB messages. Therefore, PMTUD is messages. Therefore, PMTUD is applicable only in environments where
applicable only in environments where the risk of ICMP PTB loss is the risk of ICMP PTB loss is acceptable.
acceptable.
By contrast, PLPMTUD does not rely upon the network's ability to By contrast, PLPMTUD does not rely upon the network's ability to
deliver ICMP PTB messages. However, in many loss-based TCP deliver ICMP PTB messages. However, in many loss-based TCP
congestion control algorithms, the dropping of a packet may cause the congestion control algorithms, the dropping of a packet may cause the
TCP control algorithm to drop the congestion control window, or even TCP control algorithm to drop the congestion control window, or even
re-start with the entire slow start process. For high capacity, long re-start with the entire slow start process. For high capacity, long
round-trip time, large volume TCP streams, the deliberate probing round-trip time, large volume TCP streams, the deliberate probing
with large packets and the consequent packet drop may impose too with large packets and the consequent packet drop may impose too
harsh a penalty on total TCP throughput for it to be a viable harsh a penalty on total TCP throughput for it to be a viable
approach. [RFC4821] defines PLPMTUD procedures for TCP. approach. [RFC4821] defines PLPMTUD procedures for TCP.
skipping to change at page 15, line 35 skipping to change at page 15, line 4
occurs, the packet is dropped, the PMTU estimate is updated, the occurs, the packet is dropped, the PMTU estimate is updated, the
segment is divided into smaller segments and each smaller segment is segment is divided into smaller segments and each smaller segment is
submitted to the underlying IP module. submitted to the underlying IP module.
The Datagram Congestion Control Protocol (DCCP) [RFC4340] and the The Datagram Congestion Control Protocol (DCCP) [RFC4340] and the
Stream Control Transmission Protocol (SCTP) [RFC4960] also can be operated in a Stream Control Transmission Protocol (SCTP) [RFC4960] also can be operated in a
mode that does not require IP fragmentation. They both accept data mode that does not require IP fragmentation. They both accept data
from an application and divide that data into segments, with no from an application and divide that data into segments, with no
segment exceeding a maximum size. Both DCCP and SCTP offer manual segment exceeding a maximum size. Both DCCP and SCTP offer manual
configuration, PMTUD and PLPMTUD as mechanisms for managing that configuration, PMTUD and PLPMTUD as mechanisms for managing that
maximum size. [I-D.fairhurst-tsvwg-datagram-plpmtud] proposes maximum size. [I-D.ietf-tsvwg-datagram-plpmtud] proposes PLPMTUD
PLPMTUD procedures for DCCP and SCTP. procedures for DCCP and SCTP.
Currently, User Datagram Protocol (UDP) [RFC0768] lacks a fragmentation Currently, User Datagram Protocol (UDP) [RFC0768] lacks a fragmentation
mechanism of its own and relies on IP fragmentation. However, mechanism of its own and relies on IP fragmentation. However,
[I-D.ietf-tsvwg-udp-options] proposes a fragmentation mechanism for [I-D.ietf-tsvwg-udp-options] proposes a fragmentation mechanism for
UDP. UDP.
5.2. Application Layer Solutions 5.2. Application Layer Solutions
[RFC8085] recognizes that IP fragmentation reduces the reliability of [RFC8085] recognizes that IP fragmentation reduces the reliability of
Internet communication. It also recognizes that UDP lacks a Internet communication. It also recognizes that UDP lacks a
skipping to change at page 16, line 36 skipping to change at page 16, line 11
small, even though the IPv4 minimum link MTU is 68 bytes. small, even though the IPv4 minimum link MTU is 68 bytes.
This advice applies equally to applications that run directly over IP. This advice applies equally to applications that run directly over IP.
6. Applications That Rely on IPv6 Fragmentation 6. Applications That Rely on IPv6 Fragmentation
The following applications rely on IPv6 fragmentation: The following applications rely on IPv6 fragmentation:
o DNS [RFC1035] o DNS [RFC1035]
o OSPFv3 [RFC5340] o OSPFv3 [RFC2328][RFC5340]
o Packet-in-packet encapsulations o Packet-in-packet encapsulations
Each of these applications relies on IPv6 fragmentation to a varying Each of these applications relies on IPv6 fragmentation to a varying
degree. In some cases, that reliance is essential, and cannot be degree. In some cases, that reliance is essential, and cannot be
broken without fundamentally changing the protocol. In other cases, broken without fundamentally changing the protocol. In other cases,
that reliance is incidental, and most implementations already take that reliance is incidental, and most implementations already take
appropriate steps to avoid fragmentation. appropriate steps to avoid fragmentation.
This list is not comprehensive, and other protocols that rely on IPv6 This list is not comprehensive, and other protocols that rely on IP
fragmentation may exist. They are not specifically considered in the fragmentation may exist. They are not specifically considered in the
context of this document. context of this document.
6.1. DNS 6.1. DNS
DNS relies on UDP for efficiency, and the consequence is the use of DNS relies on UDP for efficiency, and the consequence is the use of
IP fragmentation for large responses, as permitted by the DNS EDNS(0) IP fragmentation for large responses, as permitted by the DNS EDNS(0)
options in the query. It is possible to mitigate the issue of options in the query. It is possible to mitigate the issue of
fragmentation-based packet loss by having queries use smaller EDNS(0) fragmentation-based packet loss by having queries use smaller EDNS(0)
UDP buffer sizes, but then the operational issue of the partial level UDP buffer sizes, or by having the DNS server limit the size of its
of support for DNS over TCP over IPv6 becomes a limiting factor of UDP responses to some self-imposed maximum packet size that may be
the efficacy of this approach in an IPv6 context [Damas]. less than the preferred EDNS(0) UDP Buffer Size. In both cases,
large responses are truncated in the DNS, signalling to the client to
re-query using TCP to obtain the complete response. However, the
operational issue of the partial level of support for DNS over TCP,
particularly in the case where IPv6 transport is being used, becomes
a limiting factor of the efficacy of this approach [Damas].
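The truncation decision described above can be sketched as a single comparison. The function name and parameters are hypothetical, and the 1232-byte server limit in the usage lines is only an example value, not a figure from this document.

```python
def should_truncate(response_len, edns_buffer_size, server_max_udp_size):
    """Return True if the UDP response must be truncated (TC bit set).

    The effective limit is the smaller of the size advertised by the
    client in EDNS(0) and the server's own self-imposed maximum.
    """
    limit = min(edns_buffer_size, server_max_udp_size)
    return response_len > limit

# Client advertises 4096 bytes; server caps its UDP responses at 1232 bytes.
print(should_truncate(1500, 4096, 1232))
print(should_truncate(500, 4096, 1232))
```

A truncated response prompts the client to re-query over TCP, which is where the partial support for DNS over TCP becomes the limiting factor.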
Larger DNS responses can normally be avoided by aggressively pruning Larger DNS responses can normally be avoided by aggressively pruning
the Additional section of DNS responses. One scenario where such the Additional section of DNS responses. One scenario where such
pruning is ineffective is in the use of DNSSEC, where large key sizes pruning is ineffective is in the use of DNSSEC, where large key sizes
act to increase the response size to certain DNS queries. There is act to increase the response size to certain DNS queries. There is
no effective response to this situation within the DNS other than no effective response to this situation within the DNS other than
using smaller cryptographic keys and adoption of DNSSEC using smaller cryptographic keys and adoption of DNSSEC
administrative practices that attempt to keep DNS responses as short administrative practices that attempt to keep DNS responses as short
as possible. as possible.
6.2. OSPFv3 6.2. OSPF
OSPFv3 implementations can emit messages large enough to cause IPv6 OSPF implementations can emit messages large enough to cause
fragmentation. However, in keeping with the recommendations of fragmentation. However, in order to optimize performance, most OSPF
RFC8200, and in order to optimize performance, most OSPFv3 implementations restrict their maximum message size to a value that
implementations restrict their maximum message size to the IPv6 will not cause fragmentation.
minimum link MTU.
6.3. Packet-in-Packet Encapsulations 6.3. Packet-in-Packet Encapsulations
In this document, packet-in-packet encapsulations include IP-in-IP In this document, packet-in-packet encapsulations include IP-in-IP
[RFC2003], Generic Routing Encapsulation (GRE) [RFC2784], GRE-in-UDP [RFC2003], Generic Routing Encapsulation (GRE) [RFC2784], GRE-in-UDP
[RFC8086] and Generic Packet Tunneling in IPv6 [RFC2473]. [RFC4459] [RFC8086] and Generic Packet Tunneling in IPv6 [RFC2473]. [RFC4459]
describes fragmentation issues associated with all of the above- describes fragmentation issues associated with all of the above-
mentioned encapsulations. mentioned encapsulations.
The fragmentation strategy described for GRE in [RFC7588] has been The fragmentation strategy described for GRE in [RFC7588] has been
deployed for all of the above-mentioned encapsulations. This deployed for all of the above-mentioned encapsulations. This
strategy does not rely on IPv6 fragmentation except in one corner strategy does not rely on IP fragmentation except in one corner case.
case. (see Section 3.3.2.2 of RFC 7588 and Section 7.1 of RFC 2473). (see Section 3.3.2.2 of RFC 7588 and Section 7.1 of RFC 2473).
Section 3.3 of [RFC7676] further describes this corner case. Section 3.3 of [RFC7676] further describes this corner case.
7. Recommendations 7. Recommendations
7.1. For Application Developers 7.1. For Application Developers
Application developers SHOULD NOT develop applications that rely on Application developers SHOULD NOT develop new applications that rely
IPv6 fragmentation. on IP fragmentation.
Application-layer protocols then depend upon IPv6 fragmentation Application-layer protocols that depend upon IPv6 fragmentation
SHOULD be updated to break that dependency. SHOULD be updated to break that dependency. This can be achieved by
using a sufficiently small MTU (e.g., the protocol minimum link MTU),
disabling fragmentation, and ensuring that the transport protocol in
use adapts its segment size to that MTU. This would avoid the
problem of PMTUD failure described in Section 4.6. Another approach
is to use PLPMTUD in a way suitable for the transport protocol in use
(e.g. [I-D.ietf-tsvwg-datagram-plpmtud] for UDP).
7.2. For Network Operators 7.2. For System Developers
Software libraries SHOULD include provision for PLPMTUD for each
supported transport protocol.
7.3. For Middle Box Developers
Middle box developers SHOULD implement devices that support IP
fragmentation. These boxes SHOULD NOT fail or cause failures when
processing fragmented IP packets.
For example, in order to support IP fragmentation, a load balancer
might execute the following procedure:
o Receive a fragmented packet
o Identify a next-hop using information drawn from the first
fragment
o Forward the first fragment and all subsequent fragments through
the above-mentioned next-hop
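The procedure above might be sketched as follows. This is one possible reading, under several assumptions: fragments are represented as dictionaries with hypothetical field names, the next-hop is chosen by hashing the 5-tuple of the first fragment, and the packet is identified by source, destination, protocol and Identification value. State expiry and out-of-order arrival are deliberately left out.

```python
import hashlib

class FragmentAwareBalancer:
    """Remember the next-hop chosen for a first fragment and reuse it."""

    def __init__(self, next_hops):
        self.next_hops = next_hops
        self.by_packet = {}  # (src, dst, proto, ident) -> next-hop

    def forward(self, frag):
        packet_id = (frag["src"], frag["dst"], frag["proto"], frag["ident"])
        if frag["offset"] == 0:
            # Only the first fragment carries the transport-layer ports.
            five_tuple = packet_id[:3] + (frag["sport"], frag["dport"])
            digest = hashlib.sha256(repr(five_tuple).encode()).digest()
            hop = self.next_hops[digest[0] % len(self.next_hops)]
            self.by_packet[packet_id] = hop
        # Returns None when the first fragment has not been seen yet;
        # a real device would have to queue or otherwise handle that case.
        return self.by_packet.get(packet_id)

lb = FragmentAwareBalancer(["link-a", "link-b"])
first = {"src": "192.0.2.1", "dst": "198.51.100.7", "proto": 17,
         "ident": 77, "offset": 0, "sport": 5000, "dport": 53}
later = {"src": "192.0.2.1", "dst": "198.51.100.7", "proto": 17,
         "ident": 77, "offset": 1480}
```

Calling `lb.forward(first)` and then `lb.forward(later)` yields the same next-hop for both fragments, at the cost of per-packet state in the load balancer.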
7.4. For Network Operators
As per RFC 4890, network operators MUST NOT filter ICMPv6 PTB As per RFC 4890, network operators MUST NOT filter ICMPv6 PTB
messages unless they are known to be forged or otherwise messages unless they are known to be forged or otherwise
illegitimate. As stated in Section 4.6, filtering ICMPv6 PTB packets illegitimate. As stated in Section 4.6, filtering ICMPv6 PTB packets
causes PMTUD to fail. Operators MUST ensure proper PMTUD operation causes PMTUD to fail. Operators MUST ensure proper PMTUD operation
in their network, including making sure the network generates PTB in their network, including making sure the network generates PTB
packets when dropping packets too large compared to outgoing packets when dropping packets too large compared to outgoing
interface MTU. interface MTU.
Many upper-layer protocols rely on PMTUD. Many upper-layer protocols rely on PMTUD.
8. IANA Considerations 8. IANA Considerations
This document makes no request of IANA. This document makes no request of IANA.
9. Security Considerations 9. Security Considerations
This document mitigates some of the security considerations This document mitigates some of the security considerations
associated with IP fragmentation by discouraging the use of IP associated with IP fragmentation by discouraging its use. It does
fragmentation. It does not introduce any new security not introduce any new security vulnerabilities, because it does not
vulnerabilities, because it does not introduce any new alternatives introduce any new alternatives to IP fragmentation. Instead, it
to IP fragmentation. Instead, it recommends well-understood recommends well-understood alternatives.
alternatives.
10. Acknowledgements 10. Acknowledgements
Thanks to Mikael Abrahamsson, Lorenzo Colitti, Mike Heard, Tom Thanks to Mikael Abrahamsson, Brian Carpenter, Silambu Chelvan,
Herbert, Tatuya Jinmei, Paolo Lucente, Eric Nygren, and Joe Touch for Lorenzo Colitti, Mike Heard, Tom Herbert, Tatuya Jinmei, Paolo
their comments. Lucente, Manoj Nayak, Eric Nygren, and Joe Touch for their comments.
11. References 11. References
11.1. Normative References 11.1. Normative References
[RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, [RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768,
DOI 10.17487/RFC0768, August 1980, DOI 10.17487/RFC0768, August 1980,
<https://www.rfc-editor.org/info/rfc768>. <https://www.rfc-editor.org/info/rfc768>.
[RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791,
skipping to change at page 20, line 14 skipping to change at page 20, line 19
11.2. Informative References 11.2. Informative References
[Damas] Damas, J. and G. Huston, "Measuring ATR", April 2018, [Damas] Damas, J. and G. Huston, "Measuring ATR", April 2018,
<http://www.potaroo.net/ispcol/2018-04/atr.html>. <http://www.potaroo.net/ispcol/2018-04/atr.html>.
[Huston] Huston, G., "IPv6, Large UDP Packets and the DNS [Huston] Huston, G., "IPv6, Large UDP Packets and the DNS
(http://www.potaroo.net/ispcol/2017-08/xtn-hdrs.html)", (http://www.potaroo.net/ispcol/2017-08/xtn-hdrs.html)",
August 2017. August 2017.
[I-D.fairhurst-tsvwg-datagram-plpmtud]
Fairhurst, G., Jones, T., Tuexen, M., and I. Ruengeler,
"Packetization Layer Path MTU Discovery for Datagram
Transports", draft-fairhurst-tsvwg-datagram-plpmtud-02
(work in progress), December 2017.
[I-D.ietf-intarea-tunnels] [I-D.ietf-intarea-tunnels]
Touch, J. and M. Townsley, "IP Tunnels in the Internet Touch, J. and M. Townsley, "IP Tunnels in the Internet
Architecture", draft-ietf-intarea-tunnels-09 (work in Architecture", draft-ietf-intarea-tunnels-09 (work in
progress), July 2018. progress), July 2018.
[I-D.ietf-tsvwg-datagram-plpmtud]
Fairhurst, G., Jones, T., Tuexen, M., and I. Ruengeler,
"Packetization Layer Path MTU Discovery for Datagram
Transports", draft-ietf-tsvwg-datagram-plpmtud-05 (work in
progress), October 2018.
[I-D.ietf-tsvwg-udp-options] [I-D.ietf-tsvwg-udp-options]
Touch, J., "Transport Options for UDP", draft-ietf-tsvwg- Touch, J., "Transport Options for UDP", draft-ietf-tsvwg-
udp-options-05 (work in progress), July 2018. udp-options-05 (work in progress), July 2018.
[Kent] Kent, C. and J. Mogul, "Fragmentation Considered
Harmful", In Proc. SIGCOMM '87 Workshop on Frontiers in
Computer Communications Technology, DOI
10.1145/55483.55524, August 1987,
<http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-87-3.pdf>.
[Ptacek1998] [Ptacek1998]
Ptacek, T. and T. Newsham, "Insertion, Evasion and Denial Ptacek, T. and T. Newsham, "Insertion, Evasion and Denial
of Service: Eluding Network Intrusion Detection", 1998, of Service: Eluding Network Intrusion Detection", 1998,
<http://www.aciri.org/vern/Ptacek-Newsham-Evasion-98.ps>. <http://www.aciri.org/vern/Ptacek-Newsham-Evasion-98.ps>.
[RFC1122] Braden, R., Ed., "Requirements for Internet Hosts - [RFC1122] Braden, R., Ed., "Requirements for Internet Hosts -
Communication Layers", STD 3, RFC 1122, Communication Layers", STD 3, RFC 1122,
DOI 10.17487/RFC1122, October 1989, DOI 10.17487/RFC1122, October 1989,
<https://www.rfc-editor.org/info/rfc1122>. <https://www.rfc-editor.org/info/rfc1122>.
[RFC1858] Ziemba, G., Reed, D., and P. Traina, "Security [RFC1858] Ziemba, G., Reed, D., and P. Traina, "Security
Considerations for IP Fragment Filtering", RFC 1858, Considerations for IP Fragment Filtering", RFC 1858,
DOI 10.17487/RFC1858, October 1995, DOI 10.17487/RFC1858, October 1995,
<https://www.rfc-editor.org/info/rfc1858>. <https://www.rfc-editor.org/info/rfc1858>.
[RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003, [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003,
DOI 10.17487/RFC2003, October 1996, DOI 10.17487/RFC2003, October 1996,
<https://www.rfc-editor.org/info/rfc2003>. <https://www.rfc-editor.org/info/rfc2003>.
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328,
DOI 10.17487/RFC2328, April 1998,
<https://www.rfc-editor.org/info/rfc2328>.
[RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in [RFC2473] Conta, A. and S. Deering, "Generic Packet Tunneling in
IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473, IPv6 Specification", RFC 2473, DOI 10.17487/RFC2473,
December 1998, <https://www.rfc-editor.org/info/rfc2473>. December 1998, <https://www.rfc-editor.org/info/rfc2473>.
[RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black,
"Definition of the Differentiated Services Field (DS
Field) in the IPv4 and IPv6 Headers", RFC 2474,
DOI 10.17487/RFC2474, December 1998,
<https://www.rfc-editor.org/info/rfc2474>.
[RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
DOI 10.17487/RFC2784, March 2000, DOI 10.17487/RFC2784, March 2000,
<https://www.rfc-editor.org/info/rfc2784>. <https://www.rfc-editor.org/info/rfc2784>.
[RFC3128] Miller, I., "Protection Against a Variant of the Tiny [RFC3128] Miller, I., "Protection Against a Variant of the Tiny
Fragment Attack (RFC 1858)", RFC 3128, Fragment Attack (RFC 1858)", RFC 3128,
DOI 10.17487/RFC3128, June 2001, DOI 10.17487/RFC3128, June 2001,
<https://www.rfc-editor.org/info/rfc3128>. <https://www.rfc-editor.org/info/rfc3128>.
skipping to change at page 22, line 5 skipping to change at page 22, line 13
<https://www.rfc-editor.org/info/rfc5340>. <https://www.rfc-editor.org/info/rfc5340>.
[RFC5722] Krishnan, S., "Handling of Overlapping IPv6 Fragments", [RFC5722] Krishnan, S., "Handling of Overlapping IPv6 Fragments",
RFC 5722, DOI 10.17487/RFC5722, December 2009, RFC 5722, DOI 10.17487/RFC5722, December 2009,
<https://www.rfc-editor.org/info/rfc5722>. <https://www.rfc-editor.org/info/rfc5722>.
[RFC5927] Gont, F., "ICMP Attacks against TCP", RFC 5927, [RFC5927] Gont, F., "ICMP Attacks against TCP", RFC 5927,
DOI 10.17487/RFC5927, July 2010, DOI 10.17487/RFC5927, July 2010,
<https://www.rfc-editor.org/info/rfc5927>. <https://www.rfc-editor.org/info/rfc5927>.
[RFC6346] Bush, R., Ed., "The Address plus Port (A+P) Approach to
the IPv4 Address Shortage", RFC 6346,
DOI 10.17487/RFC6346, August 2011,
<https://www.rfc-editor.org/info/rfc6346>.
[RFC6888] Perreault, S., Ed., Yamagata, I., Miyakawa, S., Nakagawa,
A., and H. Ashida, "Common Requirements for Carrier-Grade
NATs (CGNs)", BCP 127, RFC 6888, DOI 10.17487/RFC6888,
April 2013, <https://www.rfc-editor.org/info/rfc6888>.
[RFC7588] Bonica, R., Pignataro, C., and J. Touch, "A Widely [RFC7588] Bonica, R., Pignataro, C., and J. Touch, "A Widely
Deployed Solution to the Generic Routing Encapsulation Deployed Solution to the Generic Routing Encapsulation
(GRE) Fragmentation Problem", RFC 7588, (GRE) Fragmentation Problem", RFC 7588,
DOI 10.17487/RFC7588, July 2015, DOI 10.17487/RFC7588, July 2015,
<https://www.rfc-editor.org/info/rfc7588>. <https://www.rfc-editor.org/info/rfc7588>.
[RFC7676] Pignataro, C., Bonica, R., and S. Krishnan, "IPv6 Support [RFC7676] Pignataro, C., Bonica, R., and S. Krishnan, "IPv6 Support
for Generic Routing Encapsulation (GRE)", RFC 7676, for Generic Routing Encapsulation (GRE)", RFC 7676,
DOI 10.17487/RFC7676, October 2015, DOI 10.17487/RFC7676, October 2015,
<https://www.rfc-editor.org/info/rfc7676>. <https://www.rfc-editor.org/info/rfc7676>.
 End of changes. 89 change blocks. 
315 lines changed or deleted 324 lines changed or added
