Thursday

[Bug 2154748] Comment bridged from LTC Bugzilla

------- Comment From Boris.mail@de.ibm.com 2026-06-18 06:13 EDT-------
That's awesome Massimiliano, thanks a lot for your work!

-- 
You received this bug notification because you are subscribed to linux in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2154748

Title:
  [Ubuntu 26.04] Severe Performance Degradation on kernel 7.0.0-15

Status in Ubuntu on IBM z Systems:
  In Progress
Status in linux package in Ubuntu:
  In Progress

Bug description:

  [ Impact ]

  s390 selects GENERIC_LOCKBREAK if PREEMPT is enabled. The reason is a historic, 18-year-old commit [1] which fixed a compile error for PREEMPT enabled kernels. Back then only PREEMPT_NONE and PREEMPT_VOLUNTARY kernels were considered to be important for s390. PREEMPT should "just work".

  However, PREEMPT has recently become always enabled [2], which also causes GENERIC_LOCKBREAK to be always enabled. For some workloads this leads to massive performance degradation; e.g. a simple kernel compile on machines with many CPUs may take up to four times longer.

  To fix this, just remove GENERIC_LOCKBREAK from s390's Kconfig, since the compile error from 18 years ago does not exist anymore.

  [1] commit b6b40c532a36 ("[S390] Define GENERIC_LOCKBREAK.")
  [2] commit 7dadeaa6e851 ("sched: Further restrict the preemption modes")

  [ Fix ]

  Backport commit:
  1f57f68c4dd1 ("s390: Remove GENERIC_LOCKBREAK Kconfig option")

  [ Test Plan ]

  Compile and boot tested.
  Tested performance by compiling a kernel and monitoring execution with perf.

  [ Regression Potential ]

  The regression potential of the patch is low. It affects only the s390x spinlock implementation.

  ---

  == Comment: #2 - Mete Durlu <Mete.Durlu@ibm.com> - 2026-06-01 08:59:07 ==

  ---Problem Description---
  Ubuntu 26.04 shows massive performance degradation. On large machines with more than 20 COREs (40 CPUs with SMT), CPU bound workloads suffer greatly.
  Example: a Linux kernel compilation takes more than 10x longer.

  Resource utilization shows up to 100% system time during the workload. The perf top output indicates excessive lock contention in the kernel.

  $ make -j$(nproc)
  $ perf top
    52.41% [kernel] [k] arch_spin_trylock_retry
     8.76% [kernel] [k] _raw_spin_lock_irqsave
     2.03% [kernel] [k] arch_spin_relax
     1.09% cc1      [.] ht_lookup_with_hash(ht*, unsigned char
     0.97% [kernel] [k] diag49c
     0.95% [kernel] [k] lru_gen_add_folio
     0.80% [kernel] [k] post_alloc_hook.localalias
     0.77% [kernel] [k] lru_gen_del_folio.constprop.0
     0.63% cc1      [.] htab_find_slot_with_hash
     0.60% [kernel] [k] folios_put_refs
     0.49% [kernel] [k] arch_vcpu_is_preempted
     0.48% cc1      [.] ggc_internal_alloc_no_dtor(unsigned lo
     0.44% cc1      [.] _cpp_lex_direct
    ...

  The lock contention seems to be linked directly to the thread count of the workload:

  # on a system with 34 COREs (68 CPUs w SMT)
  $ make -j20
    perf top shows no arch_spin_trylock_retry
  $ make -j25
    perf top shows ~2% arch_spin_trylock_retry
  $ make -j30
    perf top shows ~5% arch_spin_trylock_retry
  $ make -j34 # thread count = core count
    perf top shows ~15% arch_spin_trylock_retry
  $ make -j40 # thread count > core count
    perf top shows >30% arch_spin_trylock_retry

  There have also been hints of delayed workqueue execution in the dmesg output:
  ...
  [10600.136975] workqueue: vmstat_update hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
  [10806.428576] workqueue: delayed_vfree_work hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
  [10819.822422] workqueue: delayed_vfree_work hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
  [10885.381900] workqueue: delayed_vfree_work hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
  [10915.209117] workqueue: pcpu_balance_workfn hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
  [11059.719121] workqueue: pcpu_balance_workfn hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
  [20223.529295] workqueue: inode_switch_wbs_work_fn hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
  [22584.374168] workqueue: mmput_async_fn hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
  [22602.115559] workqueue: delayed_vfree_work hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND
  [22817.328172] workqueue: vmstat_update hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
  [22840.202092] workqueue: delayed_vfree_work hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND
  [26834.512017] workqueue: delayed_vfree_work hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND
  [26883.480296] workqueue: vmstat_update hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
  ...

  Systems with fewer COREs don't seem to be affected. The limit seems to be around 15 COREs (30 CPUs).

  ---uname output---
  Linux localhost 7.0.0-15-generic #15-Ubuntu SMP PREEMPT Wed Apr 22 15:04:00 UTC 2026 s390x GNU/Linux

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/2154748/+subscriptions
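
Since the fix described in the bug removes GENERIC_LOCKBREAK from s390's Kconfig, one way to see whether a given kernel build still carries the option is to look for CONFIG_GENERIC_LOCKBREAK=y in its build config. The following is only a minimal sketch, not part of the bug report: the helper names lockbreak_enabled and check_running_kernel are hypothetical, and it assumes the build config is installed as /boot/config-<kernel release>, as Ubuntu kernel packages do.

```python
import platform
from pathlib import Path


def lockbreak_enabled(config_text: str) -> bool:
    """Return True if the kernel config text sets CONFIG_GENERIC_LOCKBREAK=y."""
    for line in config_text.splitlines():
        # Disabled or absent options never appear as "<NAME>=y".
        if line.strip() == "CONFIG_GENERIC_LOCKBREAK=y":
            return True
    return False


def check_running_kernel() -> None:
    # Assumption: the build config lives at /boot/config-<kernel release>.
    path = Path("/boot") / f"config-{platform.release()}"
    if not path.is_file():
        print(f"{path} not found; cannot check")
        return
    state = "enabled" if lockbreak_enabled(path.read_text()) else "not set"
    print(f"CONFIG_GENERIC_LOCKBREAK is {state} in {path}")


if __name__ == "__main__":
    check_running_kernel()
```

On a kernel that includes the backport, the option should no longer appear in the config, so the script would be expected to report it as not set. This only checks the build configuration; it does not measure lock contention.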
