
[Bug 1799393] Comment bridged from LTC Bugzilla

------- Comment From clsoto@us.ibm.com 2019-03-01 00:25 EDT-------
I had already verified this bugzilla, but I reverified it with this kernel level (4.18.0-16-generic).
# uname -r
4.18.0-16-generic
# netstat -in
Kernel Interface table
Iface      MTU   RX-OK RX-ERR RX-DRP RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flg
enP2p1s0  1500    9161      0      9      0     379      0      0      0 BMRU
enp1s0f0  1500 5459302      0      0      0 5455281      0      0      0 BMRU
lo       65536      12      0      0      0      12      0      0      0 LRU
virbr0    1500       0      0      0      0       0      0      0      0 BMU
# ethtool -S enp1s0f0 | grep rx_wqe
rx_wqe_err: 0
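
The netstat -in table above can also be checked by script instead of by eye. The following is only a minimal sketch, assuming the column layout shown in this comment; the function name parse_netstat_in is hypothetical and not part of any tool.

```python
def parse_netstat_in(text):
    """Parse `netstat -in` style output into {iface: {column: value}}.

    Assumes the layout shown in this bug comment: a header row starting
    with "Iface", then one whitespace-separated row per interface.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # Skip the "Kernel Interface table" banner by locating the header row.
    header_index = next(i for i, cols in enumerate(rows) if cols[0] == "Iface")
    header = rows[header_index]
    table = {}
    for cols in rows[header_index + 1:]:
        row = dict(zip(header[1:], cols[1:]))
        # Every column except the trailing Flg field is an integer counter.
        table[cols[0]] = {k: (v if k == "Flg" else int(v)) for k, v in row.items()}
    return table


SAMPLE = """\
Kernel Interface table
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
enP2p1s0 1500 9161 0 9 0 379 0 0 0 BMRU
enp1s0f0 1500 5459302 0 0 0 5455281 0 0 0 BMRU
lo 65536 12 0 0 0 12 0 0 0 LRU
virbr0 1500 0 0 0 0 0 0 0 0 BMU
"""

table = parse_netstat_in(SAMPLE)
# Report interfaces that show receive errors or drops.
for iface, stats in table.items():
    if stats["RX-ERR"] or stats["RX-DRP"]:
        print(iface, "RX-ERR:", stats["RX-ERR"], "RX-DRP:", stats["RX-DRP"])
```

On the sample above, the CX5 port enp1s0f0 reports no receive errors or drops after more than five million packets, which matches the rx_wqe_err value of 0.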

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/1799393

Title:
Mellanox CX5 stops pinging with rx_wqe_err (mlx5_core)

Status in The Ubuntu-power-systems project:
Fix Released
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Cosmic:
Fix Released

Bug description:

== SRU Justification ==
The requested commit fixes a regression introduced by mainline commit
3a2f70331226, in v4.18-rc1. The commit is only needed in Cosmic. Due to
the regression, a Mellanox CX5 stops pinging with rx_wqe_err (mlx5_core).

== Fix ==
37fdffb217a4 ("net/mlx5: WQ, fixes for fragmented WQ buffers API")

== Regression Potential ==
Low. This commit has been cc'd to stable, so it has had additional
upstream review.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.



== Comment: #0 - Michael Ranweiler - 2018-10-18 11:34:40 ==

---Problem Description---
On the system, run:
ethtool -S enP48p1s0f0 | grep wqe_err
     rx_wqe_err: 1
     rx0_wqe_err: 0
     rx1_wqe_err: 0
     rx2_wqe_err: 0
     rx3_wqe_err: 1
     rx4_wqe_err: 0
     rx5_wqe_err: 0
     rx6_wqe_err: 0
     rx7_wqe_err: 0
     rx8_wqe_err: 0
     rx9_wqe_err: 0
     rx10_wqe_err: 0
     rx11_wqe_err: 0
     rx12_wqe_err: 0
     rx13_wqe_err: 0
     rx14_wqe_err: 0
     rx15_wqe_err: 0

This shows that the RX side is hitting errors.
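
To pick out the affected receive rings from a long counter list like the one above, the output can be filtered by script. This is only a sketch, assuming the "name: value" line format shown in this report; the helper names parse_wqe_err and failing_rings are hypothetical.

```python
import re

# Matches lines such as "     rx3_wqe_err: 1" as printed by `ethtool -S <iface>`.
_WQE_LINE = re.compile(r"^\s*(rx\d*_wqe_err):\s*(\d+)\s*$")


def parse_wqe_err(ethtool_output):
    """Return {counter_name: value} for every *_wqe_err line in the text."""
    counters = {}
    for line in ethtool_output.splitlines():
        match = _WQE_LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def failing_rings(counters):
    """Names of the per-ring counters (rxN_wqe_err) that are non-zero."""
    return sorted(
        name for name, value in counters.items()
        if name != "rx_wqe_err" and value > 0
    )


# First lines of the output quoted in the problem description.
SAMPLE = """\
     rx_wqe_err: 1
     rx0_wqe_err: 0
     rx1_wqe_err: 0
     rx2_wqe_err: 0
     rx3_wqe_err: 1
"""

counters = parse_wqe_err(SAMPLE)
print("total:", counters["rx_wqe_err"], "rings:", failing_rings(counters))
```

In practice the text would come from the real command (for example through subprocess.run), which is left out here so the sketch runs without the hardware.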

---Additional Hardware Info---
Mellanox CX5 Ethernet 100G
lspci
0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]

Machine Type = P9

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Using a CX5 Ethernet 100G card
lspci
0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]

Configure an IP address:
ifconfig enP48p1s0f0 33.33.33.33 netmask 255.255.255.0 up
Then configure an IP address on the partner system and run ping -f from there:
ping -f 33.33.33.33
PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
........................................^C
--- 33.33.33.33 ping statistics ---
5413 packets transmitted, 5373 received, 0% packet loss, time 934ms
rtt min/avg/max/mdev = 0.015/0.019/0.669/0.010 ms, ipg/ewma 0.172/0.020 ms
# ping 33.33.33.33
PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
^C
--- 33.33.33.33 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1071ms

Then, on the receiving system, run:
ethtool -S enP48p1s0f0 | grep wqe_err
     rx_wqe_err: 1
     rx0_wqe_err: 0
     rx1_wqe_err: 0
     rx2_wqe_err: 0
     rx3_wqe_err: 1
     rx4_wqe_err: 0
     rx5_wqe_err: 0
     rx6_wqe_err: 0
     rx7_wqe_err: 0
     rx8_wqe_err: 0
     rx9_wqe_err: 0
     rx10_wqe_err: 0
     rx11_wqe_err: 0
     rx12_wqe_err: 0
     rx13_wqe_err: 0
     rx14_wqe_err: 0
     rx15_wqe_err: 0
You will see a non-zero rx_wqe_err counter.
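
The two ping summaries in the steps above can be compared numerically. The sketch below is an illustration only, and the helper name ping_loss is hypothetical. It computes the loss from the packet counts rather than from the printed percentage, because the flood ping summary prints 0% even though 40 replies were missing.

```python
import re

# Matches the counts in a ping summary line, as quoted in the steps above.
_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")


def ping_loss(summary_line):
    """Return (sent, received, loss_fraction) from a ping summary line."""
    match = _SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line: %r" % summary_line)
    sent, received = int(match.group(1)), int(match.group(2))
    # Guard against division by zero if nothing was sent.
    loss = (sent - received) / sent if sent else 0.0
    return sent, received, loss


flood = "5413 packets transmitted, 5373 received, 0% packet loss, time 934ms"
plain = "2 packets transmitted, 0 received, 100% packet loss, time 1071ms"

for line in (flood, plain):
    sent, received, loss = ping_loss(line)
    print(f"{sent} sent, {received} received, {loss:.2%} lost")
```

A loss fraction of 1.0 on the plain ping right after the flood ping is the symptom described in this bug: the interface has stopped answering.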

This is fixed by this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=37fdffb217a45609edccbb8b407d031143f551c0

== Comment: #1 - Carol L. Soto - 2018-10-18 11:46:00 ==
I cloned the cosmic git tree and booted the resulting kernel on a system.

With kernel 4.18.12 I can recreate the problem.

lspci | grep Mell | grep ConnectX-5
0000:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0000:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.0 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.1 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
:~# ethtool -S enp1s0f0 | grep wqe_err
     rx_wqe_err: 2
     rx0_wqe_err: 1
     rx1_wqe_err: 1
     rx2_wqe_err: 0
     rx3_wqe_err: 0
     rx4_wqe_err: 0
     rx5_wqe_err: 0
     rx6_wqe_err: 0
     rx7_wqe_err: 0
     rx8_wqe_err: 0
     rx9_wqe_err: 0
     rx10_wqe_err: 0
...

Let me check whether the proposed patch needs a backport.

== Comment: #3 - Carol L. Soto - 2018-10-18 13:34:46 ==
I was able to apply the proposed patch as is to the cosmic git tree with no issues (no backport needed),
using kernel 4.18.12+.

With the proposed patch I do not see any wqe_err counts and ping does not stop.
ethtool -S enp1s0f0 | grep wqe_err
     rx_wqe_err: 0
     rx0_wqe_err: 0
     rx1_wqe_err: 0
     rx2_wqe_err: 0
     rx3_wqe_err: 0
     rx4_wqe_err: 0
     rx5_wqe_err: 0
     rx6_wqe_err: 0
     rx7_wqe_err: 0
     rx8_wqe_err: 0
     rx9_wqe_err: 0
     rx10_wqe_err: 0
...

