** Changed in: ubuntu-power-systems
Status: Fix Committed => Fix Released
--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2076147
Title:
Add 'mm: hold PTL from the first PTE while reclaiming a large folio'
to fix L2 Guest hang during LTP Test
Status in The Ubuntu-power-systems project:
Fix Released
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Noble:
Fix Released
Status in linux source package in Oracular:
Fix Released
Bug description:
SRU Justification:
[ Impact ]
* A KVM second-level (L2) guest, that is, a KVM VM running nested on top of
a Power 10 PowerVM hypervisor, hangs during the LTP (Linux Test Project) test suite.
* It hangs with:
"Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab"
* Diagnosing the issue points to this fix/upstream commit:
[commit message, by Barry Song <v-songbaohua@oppo.com>]
Within try_to_unmap_one(), page_vma_mapped_walk() races with other PTE
modifications preceded by pte clear. While iterating over PTEs of a large folio,
it only starts acquiring PTL from the first valid (present) PTE.
PTE modifications can temporarily set PTEs to pte_none.
Consequently, the initial PTEs of a large folio might be skipped
in try_to_unmap_one().
For example, for an anon folio, if we skip PTE0, we may have PTE0 which is
still present, while PTE1 ~ PTE(nr_pages - 1) are swap entries after
try_to_unmap_one().
So folio will be still mapped, the folio fails to be reclaimed and is put
back to LRU in this round.
This also breaks up PTEs optimization such as CONT-PTE on this large folio
and may lead to accident folio_split() afterwards.
And since a part of PTEs are now swap entries, accessing those parts will
introduce overhead - do_swap_page.
Although the kernel can withstand all of the above issues, the situation
still seems quite awkward and warrants making it more ideal.
The same race also occurs with small folios, but they have only one PTE,
thus, it won't be possible for them to be partially unmapped.
This patch [see below] holds PTL from PTE0, allowing us to avoid reading
PTE values that are in the process of being transformed. With stable PTE
values, we can ensure that this large folio is either completely reclaimed
or that all PTEs remain untouched in this round.
A corner case is that if we hold PTL from PTE0 and most initial PTEs have
been really unmapped before that, we may increase the duration of holding
PTL. Thus we only apply this optimization to folios which are still entirely
mapped (not in deferred_split list).
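The partial-unmap outcome described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the enum toy_pte states and the toy_unmap_pass() helper are hypothetical names invented for illustration, not kernel types or functions, and the concurrent writer is modelled sequentially rather than with real threads or locks.

```c
#include <assert.h>
#include <stddef.h>

/* Toy PTE states (hypothetical; not the kernel's pte_t). */
enum toy_pte { TOY_NONE, TOY_PRESENT, TOY_SWAP };

/*
 * Model one unmap pass over the nr PTEs of a large folio.
 *
 * sync == 0: start at the first PTE that looks present, so leading
 *            entries that are transiently none are skipped.
 * sync != 0: start at PTE0 with the lock held; the concurrent writer
 *            is modelled as having finished, so transient none
 *            entries have already settled back to present.
 *
 * Returns the number of PTEs still present after the pass.
 */
size_t toy_unmap_pass(enum toy_pte *pte, size_t nr, int sync)
{
    size_t i = 0, still_present = 0;

    assert(pte != NULL && nr > 0);

    if (sync) {
        /* Lock taken from PTE0: no transient values remain visible. */
        for (i = 0; i < nr; i++)
            if (pte[i] == TOY_NONE)
                pte[i] = TOY_PRESENT;
        i = 0;
    } else {
        /* Skip leading entries until the first present one. */
        while (i < nr && pte[i] != TOY_PRESENT)
            i++;
    }

    /* Unmap: every present PTE seen by the walk becomes a swap entry. */
    for (; i < nr; i++)
        if (pte[i] == TOY_PRESENT)
            pte[i] = TOY_SWAP;

    /* The concurrent writer completes: transient none turns present. */
    for (i = 0; i < nr; i++) {
        if (pte[i] == TOY_NONE)
            pte[i] = TOY_PRESENT;
        if (pte[i] == TOY_PRESENT)
            still_present++;
    }
    return still_present;
}
```

In this model, a folio whose PTE0 is transiently none ends up with PTE0 present and the remaining entries swapped out when sync is 0, which is the "still mapped" case from the commit message; with sync set, no PTE stays present.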
[ Fix ]
* 73bc32875ee9 73bc32875ee9b1881dd780308c6793fe463fe803
"mm: hold PTL from the first PTE while reclaiming a large folio"
[ Test Plan ]
* An IBM Power 10 system (where PowerVM is mandatory)
running Ubuntu Server 24.04 (kernel 6.8) or later
with (nested) KVM setup (so KVM on top of PowerVM).
* Run LTP test suite
Tests running: SLS(io,base)
* Without the patch the above test will hang with
Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab
[ Where problems could occur ]
* This is a common code change in the memory management sub-system,
hence great care needs to be taken, even if it was discussed upfront
at the https://lore.kernel.org/ mailing list and the upstream commit
provenance shows that many eyes had a look at this.
* The modification is relatively small with just one if statement
(across two lines) in mm/vmscan.c.
* This change is to assist 'try_to_unmap' to acquire page table locks (PTL)
from the first page table entry (PTE) and to eliminate the influence of
temporary and volatile PTE values.
* If done wrong, it can have a negative impact especially in the case of
large folios, and wrong hints might be given to try_to_unmap,
which may lead to bad page swapping.
* In case of an issue with this patch the result can also be decreased
performance and efficiency in the page table handling - the opposite
of what the patch is supposed to address.
* Fortunately several developers had their eyes on this commit,
as the provenance of the patch and the discussion at LKML shows.
* Further upstream conversation:
Link: https://lkml.kernel.org/r/20240306095219.71086-1-21cnbao@gmail.com
[ Other Info ]
* The commit has been upstream since v6.10(-rc1), hence it will be included
in oracular with the planned target kernel of 6.11.
* And since (nested) KVM virtualization on ppc64el was (re-)introduced
only with noble, no Ubuntu releases older than noble are affected.
__________
== Comment: #0 - SEETEENA THOUFEEK <sthoufee@in.ibm.com> - 2024-08-06 00:20:57 ==
+++ This bug was initially created as a clone of Bug #206372 +++
---Problem Description---
L2 Guest hung during LTP Tests. Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab
---uname output---
NA
---Additional Hardware Info---
NA
Contact Information = na
---Debugger Data---
NA
---Patches Installed---
NA
---Steps to Reproduce---
Tests running: SLS(io,base)
LPAR Config:
============
PHYP Environment: PowerVM
LPAR Hostname/IP: 10.33.2.107
Rootvg Filesystem: xfs
Network Interface: Shiner-T
vNIC/SR-IOV Config: n/a
IO Type: SAN
IO Disk Type: raw
Multipath Enabled: No
-------------------------------------------------------------------------------------
DUMP Config:
============
KDUMP configured: Yes
XMON enabled: no
DUMP Available: no
Machine Type = na
Userspace rpm: NA
The userspace tool has the following bit modes: NA
Userspace tool obtained from project website: na
Userspace tool common name: NA
*Additional Instructions for na:
-Post a private note with access information to the machine that is currently in the debugger.
-Attach ltrace and strace of userspace application.
Please include this commit in Ubuntu 24.04.
Upstream commit that resolves these data store lockups:
73bc32875ee9b1881dd780308c6793fe463fe803 mm: hold PTL from the first PTE while reclaiming a large folio
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/2076147/+subscriptions