Public bug reported:

=== SUMMARY ===
A userspace process reading /proc/net/tcp triggers a softlockup in the TCP
listening hash table traversal (tcp_seq_next -> listening_get_first ->
_raw_spin_lock), stalling CPU#4 for 210+ seconds and cascading into a full
networking-stack deadlock via mutex chains through kworker -> wpa_supplicant ->
NetworkManager -> tailscaled and others. The system requires a hard reset.

=== ENVIRONMENT ===
Kernel: 6.17.0-20-generic #20-Ubuntu SMP PREEMPT(voluntary)
Distribution: Ubuntu 25.10 (questing)
Compiler: gcc 15.2.0 (Ubuntu 15.2.0-4ubuntu4)
Architecture: x86_64
Hardware: ASUS ROG STRIX Z890-E GAMING WIFI, BIOS 2201 09/12/2025
CPUs: 20 (Intel)
Taint: P (PROPRIETARY_MODULE), O (OOT_MODULE), L (SOFTLOCKUP)

=== DESCRIPTION ===
On April 10 2026 at ~22:03 local time, the system became completely
unresponsive. Two CPU cores (4 and 1) entered softlockup state spinning on
the TCP listening hash table spinlock via the seq_file read path for
/proc/net/tcp. The lockup started at 22:03:29 and persisted until manual
reboot at ~22:46. CPU#4 was stuck for 210+ seconds (confirmed by the RCU
stall at 22:07:07 reporting 210003 ms of cputime). CPU#1 was also stuck for
75+ seconds.

This triggered a cascading mutex dependency chain that blocked the entire
networking stack:

  node (PID 495127) reads /proc/net/tcp
  -> kworker/u80:5 (485336) blocked on mutex owned by node
  -> wpa_supplicant (2546) blocked on mutex owned by kworker
  -> NetworkManager (2540) blocked on mutex owned by wpa_supplicant
  -> tailscaled (2891) blocked on mutex owned by wpa_supplicant
  -> nxserver.bin (3077) blocked on mutex owned by wpa_supplicant
  -> connector-threa (5600) blocked on mutex owned by wpa_supplicant
  -> P2P_DISCOVER (7304) blocked on mutex owned by wpa_supplicant
  -> ThreadPoolForeg (5808) blocked on mutex owned by wpa_supplicant
  -> kworker/10:2 (43769) blocked on mutex owned by node

10+ tasks in total were blocked for 122+ seconds before hung-task warnings
were suppressed.
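The chain above was reconstructed from the hung-task reports after the fact. On a system that is still partially responsive, the same picture can be sketched from procfs; the snippet below is a hypothetical diagnostic (the PIDs are the ones from this report, and reading /proc/<pid>/stack requires root):

```shell
#!/bin/sh
# Sketch: check scheduler state and kernel stacks of the suspect tasks.
# PIDs are the ones from this report; substitute live PIDs on your system.
for pid in 495127 485336 2546 2540 2891; do
    [ -d "/proc/$pid" ] || continue                       # task may be gone
    printf '%s: ' "$pid"
    awk '/^State:/ {print $2, $3}' "/proc/$pid/status"    # D = uninterruptible sleep
    cat "/proc/$pid/stack" 2>/dev/null                    # kernel stack (root only)
done
```

Tasks stuck in "D ((disk) sleep)" with a stack ending in __mutex_lock would confirm the mutex chain; the /proc/net/tcp reader itself shows as running (R) since it is spinning, not sleeping.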
=== CALL TRACE (softlockup on CPU#4) ===
watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [DefaultDispatch:21259]
Tainted: P O L  6.17.0-20-generic #20-Ubuntu PREEMPT(voluntary)
Call Trace:
 <TASK>
 _raw_spin_lock+0x3f/0x60
 listening_get_first+0x90/0x120
 listening_get_next+0xb0/0xd0
 tcp_seq_next+0x60/0x90
 seq_read_iter+0x2f9/0x490
 seq_read+0x11b/0x160
 proc_reg_read+0x6a/0xd0
 vfs_read+0xbc/0x3a0
 ksys_read+0x71/0xf0
 __x64_sys_read+0x19/0x30
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
 </TASK>
RIP: 0010:native_queued_spin_lock_slowpath+0x24e/0x330
RSP: 0018:ffffcd1602b978f0 EFLAGS: 00000206
RAX: 0000000000082056 RBX: ffff8cc5440be010 RCX: ffff8cc5440be010

=== RCU STALL (CPU#4) ===
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu:  4-....: (240003 ticks this GP) idle=c7b4/1/0x4000000000000000 softirq=7014058/7014058 fqs=58549
rcu:            hardirqs   softirqs   csw/system
rcu:   number:    243545        331            0
rcu:  cputime:         0          0       210003   ==> 210003(ms)
rcu: (t=240005 jiffies g=10296721 q=126115 ncpus=20)

=== HUNG TASKS (all blocked 122+ seconds) ===
INFO: task NetworkManager:2540 blocked for more than 122 seconds.
INFO: task NetworkManager:2540 is blocked on a mutex likely owned by task wpa_supplicant:2546.
INFO: task wpa_supplicant:2546 blocked for more than 122 seconds.
INFO: task wpa_supplicant:2546 is blocked on a mutex likely owned by task kworker/u80:5:485336.
INFO: task tailscaled:2891 blocked for more than 122 seconds.
INFO: task tailscaled:2891 is blocked on a mutex likely owned by task wpa_supplicant:2546.
INFO: task nxserver.bin:3077 blocked for more than 122 seconds.
INFO: task connector-threa:5600 blocked for more than 122 seconds.
INFO: task connector-threa:5600 is blocked on a mutex likely owned by task wpa_supplicant:2546.
INFO: task P2P_DISCOVER:7304 blocked for more than 122 seconds.
INFO: task P2P_DISCOVER:7304 is blocked on a mutex likely owned by task wpa_supplicant:2546.
INFO: task ThreadPoolForeg:5808 blocked for more than 122 seconds.
INFO: task ThreadPoolForeg:5808 is blocked on a mutex likely owned by task wpa_supplicant:2546.
INFO: task kworker/10:2:43769 blocked for more than 122 seconds.
INFO: task kworker/10:2:43769 is blocked on a mutex likely owned by task (node):495127.
INFO: task kworker/u80:5:485336 blocked for more than 122 seconds.
INFO: task (node):495127 blocked for more than 122 seconds.
Future hung task reports are suppressed (kernel.hung_task_warnings=10).

=== SOFTLOCKUP TIMELINE ===
22:03:29  CPU#4 stuck for 23s [DefaultDispatch:21259]
22:03:57  CPU#4 stuck for 49s
22:04:07  RCU stall detected on CPU#4 (cputime 29999ms)
22:04:13  CPU#1 stuck for 23s [DefaultDispatch:7886]
22:04:33  CPU#4 stuck for 82s
22:04:41  CPU#1 stuck for 49s
22:05:01  CPU#4 stuck for 108s
22:05:09  CPU#1 stuck for 75s
22:05:29  CPU#4 stuck for 134s
22:05:37  CPU#1 stuck for 108s
22:05:57  CPU#4 stuck for 160s
22:06:05  CPU#1 stuck for 134s
22:06:25  CPU#4 stuck for 186s
22:06:29  Hung task reports (10+ tasks blocked 122+ seconds)
22:07:07  RCU stall on CPU#4 (cputime 210003ms, 240s in grace period)
...system unresponsive until manual reboot at ~22:46

=== ROOT CAUSE ANALYSIS ===
The seq_file implementation for /proc/net/tcp walks the TCP listening hash
table (listening_hash) under a spinlock. Under heavy network connection load
with concurrent modifications, this spinlock becomes contended. With
PREEMPT(voluntary), a CPU spinning on the lock cannot be preempted, which
trips the softlockup watchdog.

The stalled read then cascades through mutex dependencies in the networking
subsystem (kworker -> wpa_supplicant -> NetworkManager -> etc.), creating a
system-wide deadlock of all networking-related processes.

This appears to be a known class of bug in the TCP hash table seq_file
implementation that has been partially addressed in mainline via RCU-based
read-side access. The listening_hash walk may not be fully converted.
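Based on the analysis above, a reproduction attempt would hammer /proc/net/tcp from several concurrent readers while listening sockets churn. The sketch below covers only the reader side and is an unverified assumption; it did not get a chance to be tested against this exact bug:

```shell
#!/bin/sh
# Sketch of a reproduction attempt (assumption: contention needs many
# concurrent /proc/net/tcp readers plus listening-socket churn in another
# process; this loop alone may well not trigger the softlockup).
NREADERS=8
DURATION=10   # seconds

end=$(( $(date +%s) + DURATION ))
i=0
while [ "$i" -lt "$NREADERS" ]; do
    (
        # Each reader re-walks the listening hash table on every read().
        while [ "$(date +%s)" -lt "$end" ]; do
            cat /proc/net/tcp > /dev/null
        done
    ) &
    i=$((i + 1))
done
wait
echo "readers finished"
```

Running this alongside a server that rapidly opens and closes listening sockets would stress the same lock the trace shows contended.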
=== WORKAROUND ===
Setting softlockup_panic=1 (sysctl kernel.softlockup_panic, or
softlockup_panic=1 on the kernel command line) allows the kernel to panic and
auto-reboot shortly after a softlockup is detected instead of hanging
indefinitely.

=== REPRODUCIBILITY ===
Happened once. The trigger was a node process (PID 495127) reading
/proc/net/tcp while the system was under moderate network load. Not
deterministically reproducible, but the code path is clear from the traces.

=== KERNEL CONFIG ===
CONFIG_PREEMPT_VOLUNTARY=y (PREEMPT(voluntary))
Kernel command line: root=zfs:rpool/root loglevel=7 spl.spl_hostid=0x00bab10c

ProblemType: Bug
DistroRelease: Ubuntu 25.10
Package: linux-image-6.17.0-20-generic 6.17.0-20.20
ProcVersionSignature: Ubuntu 6.17.0-20.20-generic 6.17.13
Uname: Linux 6.17.0-20-generic x86_64
NonfreeKernelModules: zfs
ApportVersion: 2.33.1-0ubuntu3
Architecture: amd64
AudioDevicesInUse:
 USER  PID  ACCESS  COMMAND
 /dev/snd/controlC1:  rperez  3496  F....  wireplumber
 /dev/snd/controlC2:  rperez  3496  F....  wireplumber
 /dev/snd/controlC0:  rperez  3496  F....  wireplumber
 /dev/snd/seq:        rperez  3492  F....  pipewire
CasperMD5CheckResult: unknown
CurrentDesktop: KDE
Date: Fri Apr 10 23:20:18 2026
MachineType: ASUS System Product Name
ProcFB: 0 nvidia-drmdrmfb
ProcKernelCmdLine: root=zfs:rpool/root loglevel=7 spl.spl_hostid=0x00bab10c
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 firmware-sof    N/A
 linux-firmware  20250901.git993ff19b-0ubuntu1.10
SourcePackage: linux
UpgradeStatus: Upgraded to questing on 2025-12-13 (118 days ago)
dmi.bios.date: 09/12/2025
dmi.bios.release: 22.1
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2201
dmi.board.asset.tag: Default string
dmi.board.name: ROG STRIX Z890-E GAMING WIFI
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2201:bd09/12/2025:br22.1:svnASUS:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXZ890-EGAMINGWIFI:rvrRev1.xx:cvnDefaultstring:ct3:cvrDefaultstring:skuSKU:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: SKU
dmi.product.version: System Version
dmi.sys.vendor: ASUS

** Affects: linux (Ubuntu)
   Importance: Undecided
       Status: New

** Tags: amd64 apport-bug questing

--
You received this bug notification because you are subscribed to linux in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2148046

Title:
  softlockup in tcp_seq_next / listening_get_first causes full networking deadlock

Status in linux package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2148046/+subscriptions