
[Bug 2139322] Re: Enable mlx5 ovs hardware offload causes multiple issues

** Description changed:

+ BugLink: https://bugs.launchpad.net/bugs/2139322
+
+ [Impact]
+
+
With mlx5 OVS hardware offload enabled on the 6.8 kernel, we see several different issues in our production environment.
They occur only under real, heavy workloads.

Issue 1, general protection fault:

[75202.650580] general protection fault, probably for non-canonical address 0x9cad655f9b42c237: 0000 [#1] PREEMPT SMP NOPTI
[75202.661464] CPU: 29 PID: 0 Comm: swapper/29 Kdump: loaded Not tainted 6.8.0-51-generic #52~22.04.1-Ubuntu
[75202.671039] Hardware name: Dell Inc. PowerEdge R7525/0H3K7P, BIOS 2.15.2 04/02/2024
[75202.678701] RIP: 0010:kmalloc_trace+0xd7/0x360
[75202.683158] Code: 83 78 10 00 48 8b 38 0f 84 36 02 00 00 48 85 ff 0f 84 2d 02 00 00 41 8b 44 24 28 49 8b 9c 24 b8 00 00 00 49 8b 34 24 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 20 00 00
[75202.701933] RSP: 0018:ffffabfc19a08990 EFLAGS: 00010282
[75202.707166] RAX: 9cad655f9b42c237 RBX: 1c00e25717636e48 RCX: 0000000000000000
[75202.714310] RDX: 000000bec1e5c01d RSI: 000000000003b980 RDI: 9cad655f9b42c1b7
[75202.721449] RBP: ffffabfc19a089e0 R08: 0000000000000000 R09: 0000000000000000
[75202.728593] R10: ffffabfc19a08a00 R11: 0000000000000000 R12: ffff94db00050c00
[75202.735735] R13: 0000000000000920 R14: 00000000000000d8 R15: 0000000000000000
[75202.742876] FS: 0000000000000000(0000) GS:ffff95da7cc80000(0000) knlGS:0000000000000000
[75202.750971] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[75202.756722] CR2: 00007a5f6af90010 CR3: 0000010263b44002 CR4: 0000000000f70ef0
[75202.763866] PKRU: 55555554
[75202.766581] Call Trace:
[75202.769033] <IRQ>
[75202.771053] ? show_regs+0x6d/0x80
[75202.774483] ? die_addr+0x37/0xa0
[75202.777807] ? exc_general_protection+0x1db/0x480
[75202.782525] ? asm_exc_general_protection+0x27/0x30
[75202.787412] ? kmalloc_trace+0xd7/0x360
[75202.791261] ? flow_offload_alloc+0x64/0x120 [nf_flow_table]
[75202.796938] flow_offload_alloc+0x64/0x120 [nf_flow_table]
[75202.802431] ? nf_conntrack_in+0x113/0x360 [nf_conntrack]
[75202.807846] ? flow_offload_alloc+0x64/0x120 [nf_flow_table]
[75202.813517] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct]
[75202.819444] tcf_ct_act+0x6c8/0xae0 [act_ct]
[75202.823726] tcf_action_exec+0xbc/0x190
[75202.827571] __tcf_classify+0xcb/0x1f0
[75202.831332] tcf_classify+0xff/0x260
[75202.834920] tc_run+0xa3/0x110
[75202.837987] __netif_receive_skb_core.constprop.0+0x459/0xf90
[75202.843744] ? dev_gro_receive+0xc0/0x350
[75202.847763] ? srso_alias_return_thunk+0x5/0xfbef5
[75202.852565] ? napi_gro_receive+0x73/0x220
[75202.856675] __netif_receive_skb_list_core+0xfd/0x250
[75202.861736] netif_receive_skb_list_internal+0x1a3/0x2d0
[75202.867056] ? srso_alias_return_thunk+0x5/0xfbef5
[75202.871858] ? mlx5e_rx_cq_process_basic_cqe_comp+0x2f7/0x310 [mlx5_core]
[75202.878752] napi_complete_done+0x74/0x1c0
[75202.882855] mlx5e_napi_poll+0x190/0x7b0 [mlx5_core]
[75202.887911] __napi_poll+0x33/0x200
[75202.891753] net_rx_action+0x181/0x2e0
[75202.895849] handle_softirqs+0xdb/0x340
[75202.900027] __irq_exit_rcu+0xd9/0x100
[75202.904103] irq_exit_rcu+0xe/0x20
[75202.907828] common_interrupt+0xa4/0xb0
[75202.911983] </IRQ>
[75202.914387] <TASK>
[75202.916786] asm_common_interrupt+0x27/0x40
[75202.921258] RIP: 0010:mwait_idle+0x50/0x80

This is caused by a use-after-free in slab (kmalloc-256).

-
Issue 2, soft lockup:

[148720.717134] watchdog: BUG: soft lockup - CPU#3 stuck for 7923s! [swapper/3:0]
- [148720.725207] Modules linked in: act_csum act_pedit act_tunnel_key vhost_net vhost tap vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd xt_CT xt_tcpudp nft_compat nf_tables veth
+ [148720.725207] Modules linked in: act_csum act_pedit act_tunnel_key vhost_net vhost tap vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd xt_CT xt_tcpudp nft_compat nf_tables veth
act_ct nf_flow_table nf_conntrack_netlink nvme_fabrics nvme_keyring xfs dm_crypt act_skbedit act_vlan act_mirred cls_matchall geneve ip6_udp_tunnel udp_tunnel nfnetlink_cttimeout nfnetlink
act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nf_nat 8021q garp mrp stp llc bonding sunrpc binfmt_misc nls_iso8859_1 mlx5_vdpa vringh vhost_iotlb vdpa intel_rapl_msr
intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm irqbypass rapl dell_wmi video ledtrig_audio sparse_keymap dell_smbios dcdbas dell_wmi_descriptor wmi_bmof ipmi_ssif ccp ptdma k10temp
acpi_power_meter ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler mac_hid dm_service_time sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 msr efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov
[148720.725328] async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c mlx5_ib ib_uverbs macsec ib_core ses enclosure raid1 raid0 bcache mlx5_core crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel mlxfw mpt3sas sha256_ssse3 nvme psample ahci sha1_ssse3 raid_class tg3 nvme_core tls libahci xhci_pci mgag200 nvme_auth scsi_transport_sas i2c_algo_bit pci_hyperv_intf i2c_piix4 xhci_pci_renesas wmi aesni_intel crypto_simd cryptd
[148720.725385] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G L 6.8.0-57-generic #59~22.04.1-Ubuntu
[148720.725388] Hardware name: Dell Inc. PowerEdge R7525/0H3K7P, BIOS 2.16.3 09/10/2024
[148720.725390] RIP: 0010:flow_offload_hash_cmp+0x1f/0x40 [nf_flow_table]
[148720.725398] Code: 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 8b 47 08 ba 32 00 00 00 48 8d 7e 08 48 89 c6 48 89 e5 e8 62 4a b6 fa 5d <85> c0 0f 95 c0 0f b6 c0 31 d2 31 f6 31 ff e9 b9 3b ee fa 66 66 2e
[148720.725401] RSP: 0018:ffffad9f403fc928 EFLAGS: 00000246
[148720.725404] RAX: 0000000000000004 RBX: ffff8a8f9a3c3a40 RCX: 0000000000000000
[148720.725406] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[148720.725409] RBP: ffffad9f403fc990 R08: 0000000000000000 R09: 000000000000003c
[148720.725411] R10: 000000000000003c R11: 0000000000000000 R12: ffff89b49b080000
[148720.725413] R13: 0000000000000000 R14: ffff89b49b09e6b8 R15: ffff89b2ba69ea58
[148720.725415] FS: 0000000000000000(0000) GS:ffff8a8f3bf80000(0000) knlGS:0000000000000000
[148720.725417] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[148720.725419] CR2: 000056c0ae793900 CR3: 000000021d904002 CR4: 0000000000f70ef0
[148720.725421] PKRU: 55555554
[148720.725423] Call Trace:
[148720.725426] <IRQ>
[148720.725428] ? show_regs+0x6d/0x80
[148720.725435] ? watchdog_timer_fn+0x206/0x290
[148720.725441] ? __pfx_watchdog_timer_fn+0x10/0x10
[148720.725445] ? __hrtimer_run_queues+0x112/0x2a0
[148720.725450] ? srso_alias_return_thunk+0x5/0xfbef5
[148720.725457] ? hrtimer_interrupt+0xf6/0x250
[148720.725462] ? __sysvec_apic_timer_interrupt+0x51/0x120
[148720.725467] ? sysvec_apic_timer_interrupt+0x3b/0xd0
[148720.725473] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[148720.725479] ? flow_offload_hash_cmp+0x1f/0x40 [nf_flow_table]
[148720.725484] ? flow_offload_lookup+0xb2/0x180 [nf_flow_table]
[148720.725491] tcf_ct_flow_table_lookup.isra.0+0x244/0x6b0 [act_ct]
[148720.725494] ? srso_alias_return_thunk+0x5/0xfbef5
[148720.725499] ? ovs_dp_process_packet+0x1af/0x220 [openvswitch]
[148720.725518] tcf_ct_act+0x23d/0xae0 [act_ct]
[148720.725524] tcf_action_exec+0xbc/0x190
[148720.725531] __tcf_classify+0xcb/0x1f0
[148720.725535] tcf_classify+0xff/0x260
[148720.725539] tc_run+0xa3/0x110
[148720.725543] ? srso_alias_return_thunk+0x5/0xfbef5
[148720.725547] __netif_receive_skb_core.constprop.0+0x459/0xf90
[148720.725552] ? dev_gro_receive+0x150/0x350
[148720.725557] ? srso_alias_return_thunk+0x5/0xfbef5
[148720.725560] ? napi_gro_receive+0x73/0x220
[148720.725564] __netif_receive_skb_list_core+0xfd/0x250
[148720.725569] netif_receive_skb_list_internal+0x1a3/0x2d0
[148720.725573] ? srso_alias_return_thunk+0x5/0xfbef5
[148720.725578] ? mlx5e_rx_cq_process_basic_cqe_comp+0x2f7/0x310 [mlx5_core]
[148720.725688] napi_complete_done+0x74/0x1c0
[148720.725693] mlx5e_napi_poll+0x190/0x7b0 [mlx5_core]
[148720.725782] __napi_poll+0x33/0x200
[148720.725786] net_rx_action+0x181/0x2e0
[148720.725792] handle_softirqs+0xdb/0x340
[148720.725799] __irq_exit_rcu+0xd9/0x100
[148720.725802] irq_exit_rcu+0xe/0x20

Before the soft lockup, we see error messages from mlx5, for example:

[486111.016058] mlx5_core 0000:41:00.1 ens3f1: NETDEV WATCHDOG: CPU: 119: transmit queue 0 timed out 17547 ms
[486111.025773] mlx5_core 0000:41:00.1 ens3f1: TX timeout detected
[486111.031726] mlx5_core 0000:41:00.1 ens3f1: TX timeout on queue: 0, SQ: 0x11d0, CQ: 0x1487, SQ Cons: 0xae7a SQ Prod: 0xaec3, usecs since last trans: 17562000
[486111.045845] mlx5_core 0000:41:00.1 ens3f1: EQ 0x7: Cons = 0x8ac57014, irqn = 0x5f5

-
Kernel cmdline:
GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS0,115200n8 nvme_core.multipath=0 amd_iommu=on iommu=pt probe_vf=0 transparent_hugepage=never hugepagesz=1G hugepages=1536 default_hugepagesz=1G"
+
+ [Fix]
+
+ This upstream commit fixes it:
+
+ commit 03428ca5cee9f0792edc996c06ce4514816af1fb
+ Author: Florian Westphal <fw@strlen.de>
+ Date: Tue Jan 14 00:50:36 2025 +0100
+
+ netfilter: conntrack: rework offload nf_conn timeout extension logic
+
+ This patch fixes a ct use-after-free and an issue where packets get stuck, which
+ should be related to the two call traces above.
+
+
+ [Test Plan]
+
+ This issue can only be reproduced in our production environment, with an mlx5 NIC and OVS hw-offload enabled.
+ We need to run the patched kernel in that environment for a few weeks to confirm that it is fixed.
+
+ [Where problems could occur]
+
+ The patch makes sure to take a refcount on the ct and to test the offload bits, which should prevent the ct from being used after it is removed.
+ It also modifies the flow offload teardown logic; if anything is wrong there, OVS flow offload might break.

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2139322

Title:
Enable mlx5 ovs hardware offload causes multiple issues

Status in linux package in Ubuntu:
In Progress
Status in linux source package in Noble:
In Progress

Bug description:
BugLink: https://bugs.launchpad.net/bugs/2139322

[Impact]


With mlx5 OVS hardware offload enabled on the 6.8 kernel, we see several different issues in our production environment.
They occur only under real, heavy workloads.

Issue 1, general protection fault:

[75202.650580] general protection fault, probably for non-canonical address 0x9cad655f9b42c237: 0000 [#1] PREEMPT SMP NOPTI
[75202.661464] CPU: 29 PID: 0 Comm: swapper/29 Kdump: loaded Not tainted 6.8.0-51-generic #52~22.04.1-Ubuntu
[75202.671039] Hardware name: Dell Inc. PowerEdge R7525/0H3K7P, BIOS 2.15.2 04/02/2024
[75202.678701] RIP: 0010:kmalloc_trace+0xd7/0x360
[75202.683158] Code: 83 78 10 00 48 8b 38 0f 84 36 02 00 00 48 85 ff 0f 84 2d 02 00 00 41 8b 44 24 28 49 8b 9c 24 b8 00 00 00 49 8b 34 24 48 01 f8 <48> 33 18 48 89 c1 48 89 f8 48 0f c9 48 31 cb 48 8d 8a 00 20 00 00
[75202.701933] RSP: 0018:ffffabfc19a08990 EFLAGS: 00010282
[75202.707166] RAX: 9cad655f9b42c237 RBX: 1c00e25717636e48 RCX: 0000000000000000
[75202.714310] RDX: 000000bec1e5c01d RSI: 000000000003b980 RDI: 9cad655f9b42c1b7
[75202.721449] RBP: ffffabfc19a089e0 R08: 0000000000000000 R09: 0000000000000000
[75202.728593] R10: ffffabfc19a08a00 R11: 0000000000000000 R12: ffff94db00050c00
[75202.735735] R13: 0000000000000920 R14: 00000000000000d8 R15: 0000000000000000
[75202.742876] FS: 0000000000000000(0000) GS:ffff95da7cc80000(0000) knlGS:0000000000000000
[75202.750971] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[75202.756722] CR2: 00007a5f6af90010 CR3: 0000010263b44002 CR4: 0000000000f70ef0
[75202.763866] PKRU: 55555554
[75202.766581] Call Trace:
[75202.769033] <IRQ>
[75202.771053] ? show_regs+0x6d/0x80
[75202.774483] ? die_addr+0x37/0xa0
[75202.777807] ? exc_general_protection+0x1db/0x480
[75202.782525] ? asm_exc_general_protection+0x27/0x30
[75202.787412] ? kmalloc_trace+0xd7/0x360
[75202.791261] ? flow_offload_alloc+0x64/0x120 [nf_flow_table]
[75202.796938] flow_offload_alloc+0x64/0x120 [nf_flow_table]
[75202.802431] ? nf_conntrack_in+0x113/0x360 [nf_conntrack]
[75202.807846] ? flow_offload_alloc+0x64/0x120 [nf_flow_table]
[75202.813517] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct]
[75202.819444] tcf_ct_act+0x6c8/0xae0 [act_ct]
[75202.823726] tcf_action_exec+0xbc/0x190
[75202.827571] __tcf_classify+0xcb/0x1f0
[75202.831332] tcf_classify+0xff/0x260
[75202.834920] tc_run+0xa3/0x110
[75202.837987] __netif_receive_skb_core.constprop.0+0x459/0xf90
[75202.843744] ? dev_gro_receive+0xc0/0x350
[75202.847763] ? srso_alias_return_thunk+0x5/0xfbef5
[75202.852565] ? napi_gro_receive+0x73/0x220
[75202.856675] __netif_receive_skb_list_core+0xfd/0x250
[75202.861736] netif_receive_skb_list_internal+0x1a3/0x2d0
[75202.867056] ? srso_alias_return_thunk+0x5/0xfbef5
[75202.871858] ? mlx5e_rx_cq_process_basic_cqe_comp+0x2f7/0x310 [mlx5_core]
[75202.878752] napi_complete_done+0x74/0x1c0
[75202.882855] mlx5e_napi_poll+0x190/0x7b0 [mlx5_core]
[75202.887911] __napi_poll+0x33/0x200
[75202.891753] net_rx_action+0x181/0x2e0
[75202.895849] handle_softirqs+0xdb/0x340
[75202.900027] __irq_exit_rcu+0xd9/0x100
[75202.904103] irq_exit_rcu+0xe/0x20
[75202.907828] common_interrupt+0xa4/0xb0
[75202.911983] </IRQ>
[75202.914387] <TASK>
[75202.916786] asm_common_interrupt+0x27/0x40
[75202.921258] RIP: 0010:mwait_idle+0x50/0x80

This is caused by a use-after-free in slab (kmalloc-256).
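
To make the "non-canonical address" part of the oops concrete, here is a minimal Python sketch that checks the register values from the trace above. The helper name is_canonical and the 48-bit default are assumptions for illustration (systems with 5-level paging implement 57 bits). The observation that RAX equals RDI plus 0x80 comes straight from the register dump; reading that as a corrupted object pointer being dereferenced at a fixed offset is only an interpretation, not something the trace proves.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    With va_bits implemented address bits, bits 63..(va_bits - 1)
    must be all zeros or all ones.
    """
    top = addr >> (va_bits - 1)                # upper bits that must agree
    all_ones = (1 << (64 - va_bits + 1)) - 1   # same width, every bit set
    return top == 0 or top == all_ones


# Values copied from the register dump of issue 1.
RAX = 0x9CAD655F9B42C237   # faulting address reported by the oops
RDI = 0x9CAD655F9B42C1B7
R12 = 0xFFFF94DB00050C00   # an ordinary kernel pointer, for comparison

print(is_canonical(RAX))   # False
print(is_canonical(R12))   # True
print(hex(RAX - RDI))      # 0x80
```

A garbage value such as RAX here is what one would expect when freed memory has been overwritten, so the output is at least consistent with the use-after-free diagnosis.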

Issue 2, soft lockup:

[148720.717134] watchdog: BUG: soft lockup - CPU#3 stuck for 7923s! [swapper/3:0]
[148720.725207] Modules linked in: act_csum act_pedit act_tunnel_key vhost_net vhost tap vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd xt_CT xt_tcpudp nft_compat nf_tables veth
act_ct nf_flow_table nf_conntrack_netlink nvme_fabrics nvme_keyring xfs dm_crypt act_skbedit act_vlan act_mirred cls_matchall geneve ip6_udp_tunnel udp_tunnel nfnetlink_cttimeout nfnetlink
act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nf_nat 8021q garp mrp stp llc bonding sunrpc binfmt_misc nls_iso8859_1 mlx5_vdpa vringh vhost_iotlb vdpa intel_rapl_msr
intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm irqbypass rapl dell_wmi video ledtrig_audio sparse_keymap dell_smbios dcdbas dell_wmi_descriptor wmi_bmof ipmi_ssif ccp ptdma k10temp
acpi_power_meter ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler mac_hid dm_service_time sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 msr efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov
[148720.725328] async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c mlx5_ib ib_uverbs macsec ib_core ses enclosure raid1 raid0 bcache mlx5_core crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel mlxfw mpt3sas sha256_ssse3 nvme psample ahci sha1_ssse3 raid_class tg3 nvme_core tls libahci xhci_pci mgag200 nvme_auth scsi_transport_sas i2c_algo_bit pci_hyperv_intf i2c_piix4 xhci_pci_renesas wmi aesni_intel crypto_simd cryptd
[148720.725385] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G L 6.8.0-57-generic #59~22.04.1-Ubuntu
[148720.725388] Hardware name: Dell Inc. PowerEdge R7525/0H3K7P, BIOS 2.16.3 09/10/2024
[148720.725390] RIP: 0010:flow_offload_hash_cmp+0x1f/0x40 [nf_flow_table]
[148720.725398] Code: 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 8b 47 08 ba 32 00 00 00 48 8d 7e 08 48 89 c6 48 89 e5 e8 62 4a b6 fa 5d <85> c0 0f 95 c0 0f b6 c0 31 d2 31 f6 31 ff e9 b9 3b ee fa 66 66 2e
[148720.725401] RSP: 0018:ffffad9f403fc928 EFLAGS: 00000246
[148720.725404] RAX: 0000000000000004 RBX: ffff8a8f9a3c3a40 RCX: 0000000000000000
[148720.725406] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[148720.725409] RBP: ffffad9f403fc990 R08: 0000000000000000 R09: 000000000000003c
[148720.725411] R10: 000000000000003c R11: 0000000000000000 R12: ffff89b49b080000
[148720.725413] R13: 0000000000000000 R14: ffff89b49b09e6b8 R15: ffff89b2ba69ea58
[148720.725415] FS: 0000000000000000(0000) GS:ffff8a8f3bf80000(0000) knlGS:0000000000000000
[148720.725417] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[148720.725419] CR2: 000056c0ae793900 CR3: 000000021d904002 CR4: 0000000000f70ef0
[148720.725421] PKRU: 55555554
[148720.725423] Call Trace:
[148720.725426] <IRQ>
[148720.725428] ? show_regs+0x6d/0x80
[148720.725435] ? watchdog_timer_fn+0x206/0x290
[148720.725441] ? __pfx_watchdog_timer_fn+0x10/0x10
[148720.725445] ? __hrtimer_run_queues+0x112/0x2a0
[148720.725450] ? srso_alias_return_thunk+0x5/0xfbef5
[148720.725457] ? hrtimer_interrupt+0xf6/0x250
[148720.725462] ? __sysvec_apic_timer_interrupt+0x51/0x120
[148720.725467] ? sysvec_apic_timer_interrupt+0x3b/0xd0
[148720.725473] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[148720.725479] ? flow_offload_hash_cmp+0x1f/0x40 [nf_flow_table]
[148720.725484] ? flow_offload_lookup+0xb2/0x180 [nf_flow_table]
[148720.725491] tcf_ct_flow_table_lookup.isra.0+0x244/0x6b0 [act_ct]
[148720.725494] ? srso_alias_return_thunk+0x5/0xfbef5
[148720.725499] ? ovs_dp_process_packet+0x1af/0x220 [openvswitch]
[148720.725518] tcf_ct_act+0x23d/0xae0 [act_ct]
[148720.725524] tcf_action_exec+0xbc/0x190
[148720.725531] __tcf_classify+0xcb/0x1f0
[148720.725535] tcf_classify+0xff/0x260
[148720.725539] tc_run+0xa3/0x110
[148720.725543] ? srso_alias_return_thunk+0x5/0xfbef5
[148720.725547] __netif_receive_skb_core.constprop.0+0x459/0xf90
[148720.725552] ? dev_gro_receive+0x150/0x350
[148720.725557] ? srso_alias_return_thunk+0x5/0xfbef5
[148720.725560] ? napi_gro_receive+0x73/0x220
[148720.725564] __netif_receive_skb_list_core+0xfd/0x250
[148720.725569] netif_receive_skb_list_internal+0x1a3/0x2d0
[148720.725573] ? srso_alias_return_thunk+0x5/0xfbef5
[148720.725578] ? mlx5e_rx_cq_process_basic_cqe_comp+0x2f7/0x310 [mlx5_core]
[148720.725688] napi_complete_done+0x74/0x1c0
[148720.725693] mlx5e_napi_poll+0x190/0x7b0 [mlx5_core]
[148720.725782] __napi_poll+0x33/0x200
[148720.725786] net_rx_action+0x181/0x2e0
[148720.725792] handle_softirqs+0xdb/0x340
[148720.725799] __irq_exit_rcu+0xd9/0x100
[148720.725802] irq_exit_rcu+0xe/0x20

Before the soft lockup, we see error messages from mlx5, for example:

[486111.016058] mlx5_core 0000:41:00.1 ens3f1: NETDEV WATCHDOG: CPU: 119: transmit queue 0 timed out 17547 ms
[486111.025773] mlx5_core 0000:41:00.1 ens3f1: TX timeout detected
[486111.031726] mlx5_core 0000:41:00.1 ens3f1: TX timeout on queue: 0, SQ: 0x11d0, CQ: 0x1487, SQ Cons: 0xae7a SQ Prod: 0xaec3, usecs since last trans: 17562000
[486111.045845] mlx5_core 0000:41:00.1 ens3f1: EQ 0x7: Cons = 0x8ac57014, irqn = 0x5f5
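
For readers who want to pull numbers out of the TX timeout line, the following Python sketch parses the producer and consumer counters and the stall time. The function name parse_tx_timeout and the regular expression are illustrative only. Treating the counters as 16-bit wrapping values is an assumption, as is reading "Prod minus Cons" as the number of send-queue entries posted but not yet completed.

```python
import re

# The "TX timeout on queue" line quoted above.
LINE = (
    "[486111.031726] mlx5_core 0000:41:00.1 ens3f1: TX timeout on queue: 0, "
    "SQ: 0x11d0, CQ: 0x1487, SQ Cons: 0xae7a SQ Prod: 0xaec3, "
    "usecs since last trans: 17562000"
)

PATTERN = re.compile(
    r"SQ Cons: (0x[0-9a-f]+) SQ Prod: (0x[0-9a-f]+), "
    r"usecs since last trans: (\d+)"
)


def parse_tx_timeout(line: str, counter_bits: int = 16):
    """Extract pending entries and stall time from a TX timeout line.

    counter_bits is an assumption: the counters are treated as
    wrapping unsigned integers of that width.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    cons = int(match.group(1), 16)
    prod = int(match.group(2), 16)
    pending = (prod - cons) % (1 << counter_bits)
    stalled_seconds = int(match.group(3)) / 1_000_000
    return {"pending": pending, "stalled_seconds": stalled_seconds}


print(parse_tx_timeout(LINE))
```

For the line above this yields 73 pending entries and a stall of about 17.56 seconds, which lines up with the 17547 ms reported by the NETDEV WATCHDOG message.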

Kernel cmdline:
GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS0,115200n8 nvme_core.multipath=0 amd_iommu=on iommu=pt probe_vf=0 transparent_hugepage=never hugepagesz=1G hugepages=1536 default_hugepagesz=1G"

[Fix]

This upstream commit fixes it:

commit 03428ca5cee9f0792edc996c06ce4514816af1fb
Author: Florian Westphal <fw@strlen.de>
Date: Tue Jan 14 00:50:36 2025 +0100

netfilter: conntrack: rework offload nf_conn timeout extension logic

This patch fixes a ct use-after-free and an issue where packets get stuck, which
should be related to the two call traces above.


[Test Plan]

This issue can only be reproduced in our production environment, with an mlx5 NIC and OVS hw-offload enabled.
We need to run the patched kernel in that environment for a few weeks to confirm that it is fixed.
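
Since the soak test runs for weeks, it may help to scan the collected kernel logs automatically for the signatures of the issues above. The Python sketch below is one possible way to do that. The function name scan_kernel_log, the trigger strings chosen, and the 60-line look-ahead window are all assumptions for illustration, not part of the actual test procedure.

```python
import re

# Substrings taken from the traces in this report, mapped to short labels.
TRIGGERS = {
    "general protection fault": "gpf",
    "BUG: soft lockup": "soft_lockup",
    "TX timeout detected": "tx_timeout",
}
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def scan_kernel_log(text: str, window: int = 60):
    """Return (timestamp, kind, touches_flow_table) for each trigger line.

    touches_flow_table is True when a line within `window` lines of the
    trigger mentions the nf_flow_table module, as both call traces do.
    """
    lines = text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        for needle, kind in TRIGGERS.items():
            if needle not in line:
                continue
            match = STAMP.match(line)
            stamp = float(match.group(1)) if match else None
            nearby = lines[index:index + window]
            touches = any("[nf_flow_table]" in entry for entry in nearby)
            hits.append((stamp, kind, touches))
    return hits


sample = "\n".join([
    "[75202.650580] general protection fault, probably for non-canonical "
    "address 0x9cad655f9b42c237: 0000 [#1] PREEMPT SMP NOPTI",
    "[75202.796938] flow_offload_alloc+0x64/0x120 [nf_flow_table]",
    "[486111.025773] mlx5_core 0000:41:00.1 ens3f1: TX timeout detected",
])
for hit in scan_kernel_log(sample):
    print(hit)
```

An empty result over the whole soak period would be weak evidence that the fix holds. Any hit flagged as touching nf_flow_table would deserve a closer look.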

[Where problems could occur]

The patch makes sure to take a refcount on the ct and to test the offload bits, which should prevent the ct from being used after it is removed.
It also modifies the flow offload teardown logic; if anything is wrong there, OVS flow offload might break.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2139322/+subscriptions
