[Bug 2060039] Re: [Ubuntu-24.04] FADump with recommended crash size is making the L1 hang

Hello Kowshik, thanks for the verification, even though the result is not
as expected :-/

I double-checked whether the patch is properly included:
~/ubuntu-noble-master-next/noble-clean$ git log --oneline --grep "powerpc/64s/radix/kfence: map __kfence_pool at page granularity"
ec65624fc069 powerpc/64s/radix/kfence: map __kfence_pool at page granularity
~/ubuntu-noble-master-next/noble-clean$ git tag --contains ec65624fc069
Ubuntu-6.8.0-48.48
And it is.
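
The tag check above can also be compared against the kernel a tester is actually running. The following is only a minimal sketch: it assumes that Ubuntu kernel tags look like `Ubuntu-<version>-<abi>.<upload>`, that `uname -r` reports `<version>-<abi>-<flavour>`, and that ABI numbers only grow within one series. The helper names are hypothetical and not part of any Ubuntu tooling.

```python
import re

def parse_tag(tag):
    """Split a tag such as 'Ubuntu-6.8.0-48.48' into (version, ABI number)."""
    m = re.fullmatch(r"Ubuntu-(\d+\.\d+\.\d+)-(\d+)\.\d+", tag)
    if m is None:
        raise ValueError(f"unexpected tag format: {tag!r}")
    return m.group(1), int(m.group(2))

def parse_release(release):
    """Split a `uname -r` string such as '6.8.0-11-generic' into (version, ABI number)."""
    m = re.fullmatch(r"(\d+\.\d+\.\d+)-(\d+)-\S+", release)
    if m is None:
        raise ValueError(f"unexpected release format: {release!r}")
    return m.group(1), int(m.group(2))

def has_fix(release, first_fixed_tag):
    """Return True if the running kernel is at or after the first tag containing the fix."""
    fixed_version, fixed_abi = parse_tag(first_fixed_tag)
    version, abi = parse_release(release)
    if version != fixed_version:
        # ABI numbers are only comparable within the same kernel series (assumption).
        raise ValueError("different kernel series, cannot compare ABI numbers")
    return abi >= fixed_abi

# The kernel from the original report predates the tag that contains the patch.
print(has_fix("6.8.0-11-generic", "Ubuntu-6.8.0-48.48"))
print(has_fix("6.8.0-48-generic", "Ubuntu-6.8.0-48.48"))
```

A verification run is only meaningful if the second case applies, i.e. the tester has booted a kernel at or after the tagged one.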

And CONFIG_KFENCE is currently set to 'y' for all architectures, including ppc64el:
grep -ri CONFIG_KFENCE\ debian.master/*
debian.master/config/annotations:CONFIG_KFENCE policy<{'amd64': 'y', 'arm64': 'y', 'armhf': 'y', 'ppc64el': 'y', 'riscv64': 'y', 's390x': 'y'}>
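
To read the per-architecture value out of such a grep hit without eyeballing it, something like the following could be used. It is a sketch only, assuming the policy is written as a Python-style dict literal exactly as in the line above; other fields that a real annotations file may carry are not handled, and `parse_policy` is a hypothetical helper.

```python
import ast
import re

def parse_policy(line):
    """Return (option name, {arch: value}) for an annotations policy line."""
    m = re.search(r"(CONFIG_\w+)\s+policy<(\{.*\})>", line)
    if m is None:
        raise ValueError("not a policy line")
    # literal_eval only accepts plain literals, so no code is executed here.
    return m.group(1), ast.literal_eval(m.group(2))

line = ("debian.master/config/annotations:CONFIG_KFENCE policy<{'amd64': 'y', "
        "'arm64': 'y', 'armhf': 'y', 'ppc64el': 'y', 'riscv64': 'y', 's390x': 'y'}>")

option, policy = parse_policy(line)
print(option, "on ppc64el:", policy["ppc64el"])
```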

Oh dear, so according to the Ubuntu SRU terms this is, first of all, a
"verification-failed".


Either the proposed fix (cherry-picking "powerpc/64s/radix/kfence: map __kfence_pool at page granularity") does not fix the situation as expected, or more is needed (for example, setting CONFIG_KFENCE to 'n' for ppc64el only)?!

Is a new root cause analysis needed for this issue?
And should we revert the commit from the Ubuntu kernel (which is tricky, since we are already late in the SRU cycle)?
@IBM what do you think?

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2060039

Title:
[Ubuntu-24.04] FADump with recommended crash size is making the L1
hang

Status in The Ubuntu-power-systems project:
Fix Committed
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Noble:
Fix Committed
Status in linux source package in Oracular:
Fix Released

Bug description:
SRU Justification:

[Impact]
 * The L1 host hangs when triggering FADump, which results in a crash.

[Fix]
 * 353d7a84c214f184d5a6b62acdec8b4424159b7c 353d7a84c214 "powerpc/64s/radix/kfence: map __kfence_pool at page granularity"

[Test Case]
 * Have an Ubuntu Server 24.04 LTS installation on ppc64el.
 * Enable FADump with 1GB: fadump=on crashkernel=1024M
 * A kernel panic occurs when the dump is triggered.

[Regression Potential]
* There is a certain risk of regression, but the change only maps the memory
  allocated for the KFENCE pool at page granularity, which reduces memory
  consumption when KFENCE is used.

* In addition, the commit has already been reviewed and accepted upstream.

* The modifications were done and tested by IBM.

* The fadump feature is supported only on IBM POWER systems.

[Other]
* The fix was accepted upstream in kernel v6.11-rc4,
  hence Oracular (with a planned kernel of 6.11) is not affected.

.......................

Problem description :
======================

Triggered FADump with the recommended crash size. The L1 host hung.

According to the public document at
https://wiki.ubuntu.com/ppc64el/Recommendations
the recommended crash kernel size for the system is 1024M. But with
1024M and 2048M, the L1 hangs. With 4096M, the crash dump is generated
and collected.

root@ubuntu2404:~# uname -ar
Linux ubuntu2404 6.8.0-11-generic #11-Ubuntu SMP Wed Feb 14 00:33:03 UTC 2024 ppc64le ppc64le ppc64le GNU/Linux

root@ubuntu2404:~# free -h
               total        used        free      shared  buff/cache   available
Mem:            48Gi       1.7Gi        46Gi        13Mi       687Mi        46Gi
Swap:          8.0Gi          0B       8.0Gi

root@ubuntu2404:~# cat /proc/cmdline
BOOT_IMAGE=/vmlinux-6.8.0-11-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro fadump=on crashkernel=1024M
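
When reproducing the test case, the two relevant parameters can be pulled out of the command line programmatically. This is a minimal sketch with hypothetical helper names. It only understands a plain `crashkernel=<size><K|M|G>` value and deliberately ignores the range and offset syntax that the kernel also accepts.

```python
def parse_cmdline(cmdline):
    """Map each kernel parameter to its value (None for bare flags such as 'ro')."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params

def crashkernel_mib(value):
    """Convert a simple size such as '1024M' or '4G' to MiB."""
    units = {"K": 2**10, "M": 2**20, "G": 2**30}
    suffix = value[-1].upper()
    if suffix not in units:
        raise ValueError(f"unsupported crashkernel value: {value!r}")
    return int(value[:-1]) * units[suffix] // 2**20

cmdline = ("BOOT_IMAGE=/vmlinux-6.8.0-11-generic "
           "root=/dev/mapper/ubuntu--vg-ubuntu--lv ro fadump=on crashkernel=1024M")

params = parse_cmdline(cmdline)
print("fadump:", params.get("fadump"))
print("crashkernel MiB:", crashkernel_mib(params["crashkernel"]))
```

On a live system the string would come from reading /proc/cmdline instead of the literal used here.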

root@ubuntu2404:~# dmesg | grep -i reser
[ 0.000000] fadump: Reserved 1024MB of memory at 0x00000040000000 (System RAM: 51200MB)
[ 0.000000] fadump: Initialized 0x40000000 bytes cma area at 1024MB from 0x40070000 bytes of memory reserved for firmware-assisted dump
[ 0.000000] Memory: 49316672K/52428800K available (23616K kernel code, 4096K rwdata, 25536K rodata, 8832K init, 2487K bss, 2063552K reserved, 1048576K cma-reserved)
[ 0.396408] ibmvscsi 30000066: Client reserve enabled
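
The figures in the dmesg lines above are consistent with each other, which can be checked with a little arithmetic. The sketch below only restates numbers copied from that output. The variable names are made up for illustration, and no claim is made about what the extra bytes in the fadump reservation are used for.

```python
# Values copied from the dmesg output above.
total_k = 52428800          # "Memory: .../52428800K available"
available_k = 49316672      # "Memory: 49316672K/..."
reserved_k = 2063552        # "2063552K reserved"
cma_reserved_k = 1048576    # "1048576K cma-reserved"
cma_bytes = 0x40000000      # "Initialized 0x40000000 bytes cma area"
fadump_bytes = 0x40070000   # "from 0x40070000 bytes of memory reserved"

def summarize():
    """Derive a few comparable quantities from the raw dmesg numbers."""
    return {
        "system_ram_mib": total_k // 1024,
        "cma_mib": cma_bytes // 2**20,
        "fadump_beyond_cma_kib": (fadump_bytes - cma_bytes) // 1024,
        "not_available_k": total_k - available_k,
        "reserved_plus_cma_k": reserved_k + cma_reserved_k,
    }

for name, value in summarize().items():
    print(f"{name}: {value}")
```

The system RAM works out to the 51200MB that fadump reports, the CMA area to the requested 1024MB, and the memory not counted as available equals the sum of the reserved and cma-reserved figures.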

root@ubuntu2404:~# kdump-config show
DUMP_MODE: fadump
USE_KDUMP: 1
KDUMP_COREDIR: /var/crash
   /var/lib/kdump/vmlinuz
kdump initrd:
   /var/lib/kdump/initrd.img
current state: ready to fadump

IBM is looking to update the crash kernel reservations section of the
wiki for Power.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/2060039/+subscriptions
