** Changed in: linux (Ubuntu Noble)
       Status: In Progress => Fix Committed

--
You received this bug notification because you are subscribed to linux in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2149762

Title:
  Revert "netfilter: conntrack: fix erronous removal of offload bit"

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Noble:
  Fix Committed

Bug description:
  [Impact]

  With this commit applied:
    netfilter: conntrack: fix erronous removal of offload bit

  we hit a regression on PS6/7: every node running a kernel with this commit hits a soft lockup every 1-2 days and has to be rebooted to recover, e.g.:

  [1022567.831263] watchdog: BUG: soft lockup - CPU#352 stuck for 26s! [kworker/u789:13:1036823]
  [1022567.831271] Modules linked in: scsi_transport_iscsi mlx5_vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd vhost_net tap xfs act_csum act_pedit act_ct nf_flow_table act_tunnel_key xt_CT xt_tcpudp nft_compat dm_crypt ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_raw iptable_mangle iptable_nat iptable_filter nf_tables veth nf_conntrack_netlink vhost_vsock vmw_vsock_virtio_transport_common vhost vsock nvme_fabrics nvme_keyring act_mirred act_skbedit act_vlan cls_matchall 8021q garp mrp geneve ip6_udp_tunnel udp_tunnel nfnetlink_cttimeout nfnetlink act_gact cls_flower sch_ingress openvswitch nsh nf_conncount nf_nat bridge stp llc bonding sunrpc binfmt_misc intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm irqbypass rapl ipmi_ssif nls_iso8859_1 joydev input_leds ipmi_si ipmi_devintf k10temp ccp ipmi_msghandler mac_hid mlx5_vdpa vringh vhost_iotlb vdpa dm_service_time sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua nf_conntrack
  [1022567.831352] nf_defrag_ipv6 nf_defrag_ipv4 efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 mlx5_ib
  ib_uverbs macsec ib_core raid10 hid_generic usbhid ses hid enclosure dax_hmem mlx5_core crct10dif_pclmul crc32_pclmul cxl_acpi polyval_clmulni cxl_port mlxfw polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 cxl_core psample nvme mpt3sas ahci raid_class nvme_core tls ast tg3 scsi_transport_sas libahci pci_hyperv_intf nvme_auth i2c_algo_bit xhci_pci xhci_pci_renesas i2c_piix4 aesni_intel crypto_simd cryptd
  [1022567.831407] CPU: 352 PID: 1036823 Comm: kworker/u789:13 Kdump: loaded Tainted: G L 6.8.0-106-generic #106~22.04.1+hf399032v20260316b0-Ubuntu
  [1022567.831411] Hardware name: Lenovo ThinkSystem SR665 V3/SB27B75430, BIOS KAE140F-5.70 09/03/2025
  [1022567.831413] Workqueue: events_power_efficient nf_flow_offload_work_gc [nf_flow_table]
  [1022567.831425] RIP: 0010:rhashtable_walk_next+0x17/0xd0
  [1022567.831433] Code: 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 41 57 41 56 41 55 41 54 53 4c 8b 37 48 89 fb 4c 8b 7f 10 <4c> 8b 6f 08 45 0f b6 66 38 41 80 fc 01 0f 87 e4 b1 98 00 41 83 e4
  [1022567.831435] RSP: 0018:ff732ea21e2cbda8 EFLAGS: 00000202
  [1022567.831438] RAX: ff2b1e3b98ea4058 RBX: ff732ea21e2cbde0 RCX: 0000000000000000
  [1022567.831440] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff732ea21e2cbde0
  [1022567.831441] RBP: ff732ea21e2cbdd0 R08: 0000000000000000 R09: 0000000000000000
  [1022567.831442] R10: 0000000000000000 R11: 0000000000000000 R12: ff2b1cb8611cbee8
  [1022567.831444] R13: ff2b1cb8611cbe40 R14: ff2b1cb8611cbe48 R15: ff2b1e3b98ea4058
  [1022567.831445] FS: 0000000000000000(0000) GS:ff2b1fb3bb000000(0000) knlGS:0000000000000000
  [1022567.831447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [1022567.831449] CR2: 0000775a42eee000 CR3: 0000019530ce8003 CR4: 0000000000f71ef0
  [1022567.831450] PKRU: 55555554
  [1022567.831452] Call Trace:
  [1022567.831454] <TASK>
  [1022567.831461] nf_flow_offload_work_gc+0x5a/0xf0 [nf_flow_table]
  [1022567.831468] process_one_work+0x181/0x3a0
  [1022567.831475] worker_thread+0x306/0x440
  [1022567.831479] ? __pfx_worker_thread+0x10/0x10
  [1022567.831481] kthread+0xef/0x120
  [1022567.831485] ? __pfx_kthread+0x10/0x10
  [1022567.831487] ret_from_fork+0x44/0x70
  [1022567.831492] ? __pfx_kthread+0x10/0x10
  [1022567.831494] ret_from_fork_asm+0x1b/0x30
  [1022567.831502] </TASK>
  [1022568.208115] R13: 0000000000000002 R14: 0000000000000002 R15: 0003a1fef9c1d979
  [1022568.216666] ? cpuidle_enter_state+0xca/0x720
  [1022568.222092] ? tick_nohz_stop_tick+0x70/0x210
  [1022568.227521] cpuidle_enter+0x2e/0x50
  [1022568.232060] call_cpuidle+0x23/0x60
  [1022568.236472] cpuidle_idle_call+0x11d/0x190
  [1022568.241562] do_idle+0x87/0xf0
  [1022568.245472] cpu_startup_entry+0x2a/0x30
  [1022568.250357] start_secondary+0x129/0x160
  [1022568.255243] secondary_startup_64_no_verify+0x184/0x18b
  [1022568.261586] </TASK>

  [Fix]

  We ran PS6/7 nodes without this commit for a few weeks and confirmed that they do not hit any lockup.

  This commit is a follow-up fix for:
    netfilter: conntrack: rework offload nf_conn timeout extension logic

  However, the call path and logic starting from flow_offload_fixup_ct changed between these two commits. Manually backporting only this commit, without all the changes in between, causes the issue. The commit message also says that what it fixes is harmless.

  [Test Plan]

  We have run a test kernel without this commit on PS6/7 for almost 2 weeks without hitting any soft lockup.

  [Where problems could occur]

  The commit message of the commit being reverted says that what it fixes is harmless, and we have already run the kernel without it for almost 2 weeks without any issue, so the risk is very low.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2149762/+subscriptions
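
As a rough aid for the test plan described in this report, the following is a minimal sketch of how saved kernel log text could be scanned for soft lockup reports in the format quoted in the bug description. The function name `find_soft_lockups` and the regular expression are illustrative assumptions, not part of the bug report or of any kernel tooling; the pattern only covers the watchdog line format shown in the log excerpt.

```python
import re

# Hypothetical pattern, modelled on the watchdog line quoted in the report:
# [1022567.831263] watchdog: BUG: soft lockup - CPU#352 stuck for 26s! [kworker/u789:13:1036823]
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>[^\]]+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft lockup report found in kernel log text."""
    hits = []
    for m in SOFT_LOCKUP.finditer(log_text):
        hits.append({
            "timestamp": float(m.group("ts")),
            "cpu": int(m.group("cpu")),
            "stuck_seconds": int(m.group("secs")),
            "task": m.group("task"),
        })
    return hits


# Sample input copied from the log excerpt in the bug description.
sample = (
    "[1022567.831263] watchdog: BUG: soft lockup - CPU#352 stuck for 26s! "
    "[kworker/u789:13:1036823]\n"
    "[1022567.831413] Workqueue: events_power_efficient "
    "nf_flow_offload_work_gc [nf_flow_table]\n"
)
lockups = find_soft_lockups(sample)
```

An empty result over the test period would be consistent with the "no soft lockup" observation in the test plan; it does not by itself prove the absence of other failures.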