This bug is awaiting verification that the linux/6.11.0-17.17 kernel in
-proposed solves the problem. Please test the kernel and update this bug
with the results. If the problem is solved, change the tag
'verification-needed-oracular-linux' to
'verification-done-oracular-linux'. If the problem still exists, change
the tag 'verification-needed-oracular-linux' to
'verification-failed-oracular-linux'.
If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!
** Tags added: kernel-spammed-oracular-linux-v2 verification-needed-oracular-linux
--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2089327
Title:
By always inlining _compound_head(), clone() sees 3%+ performance
increase
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Noble:
Fix Committed
Status in linux source package in Oracular:
Fix Committed
Bug description:
BugLink: https://bugs.launchpad.net/bugs/2089327
[Impact]
_compound_head() is called frequently during clone()-heavy workloads with
CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y set, often enough that it is worthwhile
always inlining it for a slight 3%+ performance improvement during clone().
Over the lifecycle of Noble and Oracular, this could save a significant amount
of CPU time during clone(), and with it a large amount of electricity. We should
always inline _compound_head() and take advantage of the performance boost.
[Fix]
This was fixed in 6.12-rc1 by:
commit ef5f379de302884b9b7ad9b62587a942a9f0bb55
Author: David Hildenbrand <david@redhat.com>
Date: Tue Aug 20 14:22:10 2024 +0200
Subject: mm: always inline _compound_head() with CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP=y
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ef5f379de302884b9b7ad9b62587a942a9f0bb55
This commit is intended to offset the performance loss caused by:
c0bff412e67b ("mm: allow anon exclusive check over hugetlb tail pages")
which landed in 6.10-rc1, but the change is generic enough that Noble users
would benefit from the fix as well. It gives both Noble and Oracular a 3%+
improvement.
[Testcase]
clone()-heavy workloads show the performance increase best.
The user who originally requested this runs an Ansible-heavy workload,
and finds that clone() becomes a bottleneck during large runs of Ansible
against thousands of containers and hosts.
They benchmarked 6.8.0-49-generic against a patched test kernel of the same
6.8.0-49-generic and found:
Before:
08:24:23: Rename subiquity netplan config
08:36:12: hostendpoint_monitoring: Create log directory (10990)
= 11m49s
08:37:59: Rename subiquity netplan config
08:49:49: hostendpoint_monitoring: Create log directory (10991)
= 11m50s
After:
08:55:16: Rename subiquity netplan config
09:06:28: hostendpoint_monitoring: Create log directory (10991)
= 11m12s
09:08:59: Rename subiquity netplan config
09:20:22: hostendpoint_monitoring: Create log directory (10991)
= 11m23s
Taking the slowest patched run (11m23s) against the fastest unpatched run
(11m49s) gives a 3.6%+ performance improvement. This adds up over thousands
of hosts.
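The arithmetic behind that figure can be checked with a short sketch. The durations are copied from the runs above; the helper names parse_duration and improvement_percent are made up for this illustration.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a string such as '11m49s' into whole seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def improvement_percent(before: str, after: str) -> float:
    """Relative reduction in wall-clock time, as a percentage of `before`."""
    b = parse_duration(before)
    a = parse_duration(after)
    return (b - a) / b * 100.0


# Durations copied from the Ansible runs in this bug description.
before_runs = ["11m49s", "11m50s"]
after_runs = ["11m12s", "11m23s"]

# Most conservative pairing: fastest unpatched run vs. slowest patched run.
conservative = improvement_percent(
    min(before_runs, key=parse_duration),
    max(after_runs, key=parse_duration),
)
print(f"{conservative:.2f}%")  # prints 3.67%
```

Pairing the fastest "before" run with the slowest "after" run gives the lower bound; any other pairing of these four runs yields a larger figure.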
I did some basic tests with stress-ng using the clone() stressor.
I ran:
$ sudo apt install stress-ng
$ sudo stress-ng --seq=5 --clone 5 --timeout=60 --metrics
Before:
ubuntu@jammy-test:~$ sudo stress-ng --seq=5 --clone 5 --timeout=60 --metrics
stress-ng: info: [953]    stressor bogo ops real time usr time sys time  bogo ops/s     bogo ops/s CPU used per
stress-ng: info: [953]                         (secs)   (secs)   (secs) (real time) (usr+sys time) instance (%)
stress-ng: info: [953]    clone       19919     61.80     2.19   232.84      322.29          84.75        76.06
stress-ng: info: [55777]  clone       19540     61.17     1.75   229.32      319.42          84.56        75.55
stress-ng: info: [107873] clone       19817     62.39     1.92   235.90      317.64          83.33        76.24
stress-ng: info: [177572] clone       19763     60.57     0.89   226.55      326.27          86.89        75.10
After:
ubuntu@jammy-test:~$ sudo stress-ng --seq=5 --clone 5 --timeout=60 --metrics
stress-ng: info: [914]    stressor bogo ops real time usr time sys time  bogo ops/s     bogo ops/s CPU used per
stress-ng: info: [914]                         (secs)   (secs)   (secs) (real time) (usr+sys time) instance (%)
stress-ng: info: [914]    clone       19446     60.67     1.83   229.60      320.50          84.03        76.29
stress-ng: info: [67984]  clone       19600     60.63     0.90   226.66      323.26          86.13        75.06
stress-ng: info: [117843] clone       19665     60.64     0.98   226.97      324.27          86.27        75.18
stress-ng: info: [167831] clone       19306     61.22     1.20   227.39      315.38          84.46        74.68
These numbers are a bit fuzzier, but it's about 3% extra bogo ops.
There is a test kernel available in the below ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf401086-test
If you install it, you too should see a 3%+ performance improvement on
clone()-heavy workloads.
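For a quick sanity check without stress-ng, a fork loop along the following lines could be run on both kernels. This is only a rough sketch: it assumes Linux, where glibc's fork() is normally implemented on top of clone(), and the helper name fork_rate is made up for this illustration. Interpreter overhead and run-to-run noise may well mask a difference of a few percent, so repeat it several times and compare averages.

```python
import os
import time


def fork_rate(iterations: int = 200) -> float:
    """Fork `iterations` short-lived children and return forks per second."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately, skipping Python cleanup handlers.
            os._exit(0)
        # Parent: reap the child before forking again.
        os.waitpid(pid, 0)
    elapsed = time.perf_counter() - start
    return iterations / elapsed


if __name__ == "__main__":
    print(f"{fork_rate():.0f} forks/s")
```

Each child exits straight away, so the measured time is dominated by process creation and teardown rather than by work done in the child.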
[Where problems could occur]
We are inlining a hot function in the clone() syscall call path. This
should increase performance by avoiding function call overhead on each
call to _compound_head(), without much of a downside apart from
slightly increased binary size and the inability to livepatch the function.
I checked on cscope, and _compound_head is called from:
compound_head()
page_folio()
both in page-flags.h as #defines. This is going to have a minuscule footprint
change.
The risk of regression is well worth the 3%+ performance gain.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2089327/+subscriptions