Wednesday

[Bug 2076147] Re: Add 'mm: hold PTL from the first PTE while reclaiming a large folio' to fix L2 Guest hang during LTP Test

** Changed in: ubuntu-power-systems
Status: Fix Committed => Fix Released

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2076147

Title:
Add 'mm: hold PTL from the first PTE while reclaiming a large folio'
to fix L2 Guest hang during LTP Test

Status in The Ubuntu-power-systems project:
Fix Released
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Noble:
Fix Released
Status in linux source package in Oracular:
Fix Released

Bug description:
SRU Justification:

[ Impact ]

 * A KVM 2nd-level (L2) guest (that is, a KVM VM that runs nested on top of
   a Power 10 PowerVM hypervisor) hangs during the LTP (Linux Test Project)
   test suite.

 * It hangs with:
   "Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab"

 * Diagnosing the issue points to this fix/upstream commit:
   [commit message, by Barry Song <v-songbaohua@oppo.com>]
   Within try_to_unmap_one(), page_vma_mapped_walk() races with other PTE
   modifications preceded by pte clear. While iterating over PTEs of a large folio,
   it only starts acquiring PTL from the first valid (present) PTE.
   PTE modifications can temporarily set PTEs to pte_none.
   Consequently, the initial PTEs of a large folio might be skipped
   in try_to_unmap_one().
   For example, for an anon folio, if we skip PTE0, we may have PTE0 which is
   still present, while PTE1 ~ PTE(nr_pages - 1) are swap entries after
   try_to_unmap_one().
   So folio will be still mapped, the folio fails to be reclaimed and is put
   back to LRU in this round.
   This also breaks up PTEs optimization such as CONT-PTE on this large folio
   and may lead to accident folio_split() afterwards.
   And since a part of PTEs are now swap entries, accessing those parts will
   introduce overhead - do_swap_page.
   Although the kernel can withstand all of the above issues, the situation
   still seems quite awkward and warrants making it more ideal.
   The same race also occurs with small folios, but they have only one PTE,
   thus, it won't be possible for them to be partially unmapped.
   This patch [see below] holds PTL from PTE0, allowing us to avoid reading
   PTE values that are in the process of being transformed. With stable PTE
   values, we can ensure that this large folio is either completely reclaimed
   or that all PTEs remain untouched in this round.
   A corner case is that if we hold PTL from PTE0 and most initial PTEs have
   been really unmapped before that, we may increase the duration of holding
   PTL. Thus we only apply this optimization to folios which are still entirely
   mapped (not in deferred_split list).
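
The race described in the commit message can be illustrated with a small toy model. This is only a sketch and not kernel code: the function unmap_large_folio, its parameters and the string states are hypothetical names invented for illustration. It assumes a concurrent writer has momentarily cleared some PTEs (and will rewrite them as present), and that a walker holding the PTL never observes such a transient value.

```python
# Toy model of the partial-unmap race; NOT kernel code.
# All names here are hypothetical and exist only for illustration.
PRESENT, NONE, SWAP = "present", "none", "swap"


def unmap_large_folio(ptes, transient_none, hold_ptl_from_first_pte):
    """Walk the PTEs of one large folio and turn mapped entries into swap entries.

    ptes: stable PTE states of the folio (what a lock holder would see).
    transient_none: indices a concurrent writer has cleared for a moment
        and will rewrite as present.
    hold_ptl_from_first_pte: True models the patched behaviour.
    """
    result = list(ptes)
    locked = hold_ptl_from_first_pte
    for i, pte in enumerate(ptes):
        # Without the lock the walker may observe the transient pte_none value.
        observed = pte if locked else (NONE if i in transient_none else pte)
        if observed == NONE:
            continue  # entry skipped, lock still not taken
        locked = True  # lock is acquired at the first PTE that looks present
        if observed == PRESENT:
            result[i] = SWAP
    return result


def still_mapped(ptes):
    """A folio stays mapped as long as any PTE is still present."""
    return PRESENT in ptes


folio = [PRESENT] * 4
unpatched = unmap_large_folio(folio, transient_none={0}, hold_ptl_from_first_pte=False)
patched = unmap_large_folio(folio, transient_none={0}, hold_ptl_from_first_pte=True)
print(unpatched, still_mapped(unpatched))
print(patched, still_mapped(patched))
```

In this model the unpatched walk skips PTE0 and leaves the folio partially unmapped, while the walk that takes the lock from PTE0 unmaps every entry.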

[ Fix ]

 * 73bc32875ee9 73bc32875ee9b1881dd780308c6793fe463fe803
   "mm: hold PTL from the first PTE while reclaiming a large folio"
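
The decision the fix adds can be sketched as follows. This is a hedged Python model, not the kernel's C code: the Folio class, the TTU flag enum and reclaim_unmap_flags are hypothetical stand-ins, and the flag values are illustrative rather than taken from kernel headers. To my understanding the upstream change sets a "sync" flag for try_to_unmap when the folio is large and not on the deferred split list; treat the exact names as assumptions and consult the commit for the real code.

```python
# Hedged model of the flag decision; names and values are illustrative only.
from dataclasses import dataclass
from enum import Flag, auto


class TTU(Flag):
    NONE = 0
    BATCH_FLUSH = auto()
    SYNC = auto()  # assumed meaning: take the PTL from the first PTE


@dataclass
class Folio:
    nr_pages: int
    on_deferred_split_list: bool = False  # True if already partially unmapped

    @property
    def is_large(self) -> bool:
        return self.nr_pages > 1


def reclaim_unmap_flags(folio: Folio) -> TTU:
    """Pick the unmap flags reclaim would pass for this folio (toy version)."""
    flags = TTU.BATCH_FLUSH
    # Only folios that are large and still entirely mapped get the sync
    # behaviour, so the lock is not held longer than needed for folios
    # whose initial PTEs were already unmapped.
    if folio.is_large and not folio.on_deferred_split_list:
        flags |= TTU.SYNC
    return flags


print(reclaim_unmap_flags(Folio(nr_pages=16)))
print(reclaim_unmap_flags(Folio(nr_pages=1)))
print(reclaim_unmap_flags(Folio(nr_pages=16, on_deferred_split_list=True)))
```

The restriction to folios that are not on the deferred split list mirrors the corner case named in the commit message above.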

[ Test Plan ]

 * An IBM Power 10 system (where PowerVM is mandatory)
   running Ubuntu Server 24.04 (kernel 6.8) or later
   with (nested) KVM setup (so KVM on top of PowerVM).

 * Run LTP test suite
   Tests running: SLS(io,base)

 * Without the patch the above test will hang with
   Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab

[ Where problems could occur ]

 * This is a common code change in the memory management sub-system,
   hence great care needs to be taken, even if it was discussed upfront
   at the https://lore.kernel.org/ mailing list and the upstream commit
   provenance shows that many eyes had a look at this.

 * The modification is relatively small with just one if statement
   (across two lines) in mm/vmscan.c.

 * This change is to assist 'try_to_unmap' to acquire page table locks (PTL)
   from the first page table entry (PTE) and to eliminate the influence of
   temporary and volatile PTE values.

 * If done wrong, it can have a negative impact, especially in the case of
   large folios, and wrong hints might be given to try_to_unmap,
   which may lead to bad page swapping.

 * In case of an issue with this patch the result can also be decreased
   performance and efficiency in the page table handling - the opposite
   of what the patch is supposed to address.

 * Fortunately, several developers had their eyes on this commit,
   as the provenance of the patch and the discussion at LKML show.

 * Further upstream conversation:
   Link: https://lkml.kernel.org/r/20240306095219.71086-1-21cnbao@gmail.com

[ Other Info ]

 * The commit is upstream since v6.10(-rc1), hence it will be included
   in oracular with the planned target kernel of 6.11.

 * And since (nested) KVM virtualization on ppc64el was (re-)introduced
   just with noble, no Ubuntu releases older than noble are affected.

__________

== Comment: #0 - SEETEENA THOUFEEK <sthoufee@in.ibm.com> - 2024-08-06 00:20:57 ==
+++ This bug was initially created as a clone of Bug #206372 +++

---Problem Description---
L2 Guest hung during LTP Tests. Back trace of paca->saved_r1 (0xc000000c1bc8bb00) (possibly stale) @ new_slab

---uname output---
NA

---Additional Hardware Info---
NA

Contact Information = na

---Debugger Data---
NA

---Patches Installed---
NA

---Steps to Reproduce---

Tests running: SLS(io,base)
LPAR Config:
============
PHYP Environment: PowerVM
LPAR Hostname/IP: 10.33.2.107
Rootvg Filesystem: xfs
Network Interface: Shiner-T
vNIC/SR-IOV Config: n/a
IO Type: SAN
IO Disk Type: raw
Multipath Enabled: No
-------------------------------------------------------------------------------------
DUMP Config:
============
KDUMP configured: Yes
XMON enabled: no
DUMP Available: no

Machine Type = na

Userspace rpm: NA

The userspace tool has the following bit modes: NA

Userspace tool obtained from project website: na

Userspace tool common name: NA

*Additional Instructions for na:
-Post a private note with access information to the machine that is currently in the debugger.
-Attach ltrace and strace of userspace application.

Please include this commit in Ubuntu 24.04.

upstream commit which is solving these data store lockups:
73bc32875ee9b1881dd780308c6793fe463fe803 mm: hold PTL from the first PTE while reclaiming a large folio

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/2076147/+subscriptions
