Sunday

[Bug 1909062] Re: qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP tx csum offload

** Description changed:

BugLink: https://bugs.launchpad.net/bugs/1909062

[Impact]

For users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000
Series 10/25/40/50GbE Controller, Kubernetes internal DNS requests fail
after upgrading from the 4.15 kernel to the 5.4 kernel, because the
request packets are corrupted on transmit.

Kubernetes uses IPIP tunnelled packets for internal DNS resolution. This
packet type is not supported for hardware tx checksum offload, so the
packets end up corrupted when the qede driver attempts to checksum them.

This only affects internal Kubernetes DNS. DNS lookups for regular
external domains still succeed, because they do not use IPIP packets.

[Fix]

Marvell has developed a fix for the qede driver that checks the packet
type and, if it is IPPROTO_IPIP, disables csum offloads for that socket
buffer.

commit 5d5647dad259bb416fd5d3d87012760386d97530
Author: Manish Chopra <manishc@marvell.com>
Date: Mon Dec 21 06:55:30 2020 -0800
Subject: qede: fix offload for IPIP tunnel packets
- Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=5d5647dad259bb416fd5d3d87012760386d97530
+ Link: https://github.com/torvalds/linux/commit/5d5647dad259bb416fd5d3d87012760386d97530

- This commit is currently in the netdev tree, awaiting merge to mainline.
- The commit is queued for upstream stable.
+ This commit landed in mainline in 5.11-rc3. The commit is queued for
+ upstream stable.

[Testcase]

The system must have a QLogic QL41xxx series NIC fitted, and needs to be
a part of a Kubernetes cluster.

Firstly, get a list of all devices in the system:

$ sudo ifconfig

Next, set all devices down with:

$ sudo ifconfig <device> down

Next, bring up the QLogic QL41xxx device:

$ sudo ifconfig <qlogic nic device> up

Then, attempt to look up an internal Kubernetes domain:

$ nslookup <internal kubernetes domain address>

Without the patch, the connection will time out:

;; connection timed out; no servers could be reached

If we look at packet traces with tcpdump, we see the request leave the
source, but it never arrives at the destination.

There is a test kernel available in the following PPA:

https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test

If you install it, then Kubernetes internal DNS lookups will succeed.

[Where problems could occur]

If a regression were to occur, then users of the qede driver would be
affected. This is limited to those with QLogic QL41xxx series NICs. The
patch explicitly checks for IPIP type packets, so only those particular
packets would be affected.

Since IPIP type packets are uncommon and most packets are not IPIP
tunnelled, a regression would not cause a total outage. It could
potentially cause problems for users who frequently handle VPN or
Kubernetes internal DNS traffic.

A workaround would be to use ethtool to disable tx csum offload for all
packet types, or to revert to an older kernel.

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/1909062

Title:
qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not
supporting IPIP tx csum offload

Status in linux package in Ubuntu:
Confirmed
Status in linux source package in Focal:
In Progress
Status in linux source package in Groovy:
In Progress

Bug description:
BugLink: https://bugs.launchpad.net/bugs/1909062

[Impact]

For users with QLogic QL41xxx series NICs, such as the FastLinQ
QL41000 Series 10/25/40/50GbE Controller, Kubernetes internal DNS
requests fail after upgrading from the 4.15 kernel to the 5.4 kernel,
because the request packets are corrupted on transmit.

Kubernetes uses IPIP tunnelled packets for internal DNS resolution.
This packet type is not supported for hardware tx checksum offload, so
the packets end up corrupted when the qede driver attempts to checksum
them.

This only affects internal Kubernetes DNS. DNS lookups for regular
external domains still succeed, because they do not use IPIP packets.

[Fix]

Marvell has developed a fix for the qede driver that checks the packet
type and, if it is IPPROTO_IPIP, disables csum offloads for that
socket buffer.

commit 5d5647dad259bb416fd5d3d87012760386d97530
Author: Manish Chopra <manishc@marvell.com>
Date: Mon Dec 21 06:55:30 2020 -0800
Subject: qede: fix offload for IPIP tunnel packets
Link: https://github.com/torvalds/linux/commit/5d5647dad259bb416fd5d3d87012760386d97530

This commit landed in mainline in 5.11-rc3. The commit is queued for
upstream stable.

[Testcase]

The system must have a QLogic QL41xxx series NIC fitted, and needs to
be a part of a Kubernetes cluster.

Firstly, get a list of all devices in the system:

$ sudo ifconfig

Next, set all devices down with:

$ sudo ifconfig <device> down

Next, bring up the QLogic QL41xxx device:

$ sudo ifconfig <qlogic nic device> up

Then, attempt to look up an internal Kubernetes domain:

$ nslookup <internal kubernetes domain address>

Without the patch, the connection will time out:

;; connection timed out; no servers could be reached
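
The manual steps above can be collected into a small script. The
following is only a sketch: the function names plan_isolation and
lookup_timed_out are hypothetical, invented for illustration, and the
script prints the ifconfig commands instead of running them, since
taking interfaces down will cut any remote session.

```shell
#!/bin/sh
# Hypothetical helpers for the test case; the names are illustrative only.

# Print the ifconfig commands that take every listed interface down
# except the one under test, then bring the interface under test up.
# Nothing is executed here; review the output before applying it.
plan_isolation() {
    keep="$1"
    shift
    for dev in "$@"; do
        if [ "$dev" != "$keep" ]; then
            echo "ifconfig $dev down"
        fi
    done
    echo "ifconfig $keep up"
}

# Read nslookup output on stdin and succeed (exit status 0) if it
# contains the timeout message quoted in this bug report.
lookup_timed_out() {
    grep -q 'connection timed out; no servers could be reached'
}
```

For example, "plan_isolation <qlogic nic device> <device>..." prints the
commands to review, and piping the output of
"nslookup <internal kubernetes domain address> 2>&1" into
lookup_timed_out reports whether the lookup hit the failure described
here.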

If we look at packet traces with tcpdump, we see the request leave the
source, but it never arrives at the destination.

There is a test kernel available in the following PPA:
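
To narrow a capture to the tunnelled traffic, a pcap filter on IP
protocol number 4 (IP-in-IP, the value of IPPROTO_IPIP) can be used.
This is a sketch under assumptions: ipip_capture_cmd is a hypothetical
helper, and it only prints the tcpdump command line so that it can be
reviewed and then run as root on both the sending and the receiving
node.

```shell
#!/bin/sh
# Hypothetical helper: print a tcpdump command line that captures only
# IPIP packets on the given interface.
ipip_capture_cmd() {
    iface="$1"
    # -n: skip name resolution, since DNS is what is being debugged.
    # -i: interface to capture on.
    # 'ip proto 4' matches packets whose IP protocol field is 4 (IPIP).
    echo "tcpdump -n -i $iface 'ip proto 4'"
}
```

Comparing the two captures should show whether the tunnelled request
leaves one node and fails to appear on the other.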

https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test

If you install it, then Kubernetes internal DNS lookups will succeed.

[Where problems could occur]

If a regression were to occur, then users of the qede driver would be
affected. This is limited to those with QLogic QL41xxx series NICs.
The patch explicitly checks for IPIP type packets, so only those
particular packets would be affected.

Since IPIP type packets are uncommon and most packets are not IPIP
tunnelled, a regression would not cause a total outage. It could
potentially cause problems for users who frequently handle VPN or
Kubernetes internal DNS traffic.

A workaround would be to use ethtool to disable tx csum offload for
all packet types, or to revert to an older kernel.
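
The ethtool workaround might look like the following sketch. The helper
names tx_csum_state and tx_csum_off_cmd are hypothetical; the underlying
commands are "ethtool -k <device>" to list offload features and
"ethtool -K <device> tx off" to disable tx checksum offload. The helper
prints the second command rather than running it.

```shell
#!/bin/sh
# Hypothetical helpers around ethtool; the names are illustrative only.

# Read "ethtool -k <device>" output on stdin and print the state of
# tx checksum offload ("on" or "off").
tx_csum_state() {
    awk '$1 == "tx-checksumming:" { print $2; exit }'
}

# Print (do not run) the command that disables tx checksum offload
# for all packet types on the given device.
tx_csum_off_cmd() {
    echo "ethtool -K $1 tx off"
}
```

For example, "ethtool -k <qlogic nic device> | tx_csum_state" shows the
current setting. A change made with ethtool -K does not persist across
a reboot unless it is also added to the network configuration.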

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1909062/+subscriptions
