This bug is awaiting verification that the linux/6.8.0-50.51 kernel in
-proposed solves the problem. Please test the kernel and update this bug
with the results. If the problem is solved, change the tag
'verification-needed-noble-linux' to 'verification-done-noble-linux'. If
the problem still exists, change the tag
'verification-needed-noble-linux' to 'verification-failed-noble-linux'.
If verification is not done within 5 working days from today, this fix
will be dropped from the source code, and this bug will be closed.
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on
how to enable and use -proposed. Thank you!
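Before reporting results, it helps to confirm that the machine actually booted the kernel under test. The following is a minimal sketch, not an official verification tool. It assumes the running kernel reports a release string of the form '6.8.0-50-generic' (the '.51' upload number in the package version is not part of that string), and the helper names parse_release and is_at_least_proposed are made up for this illustration.

```python
import platform
import re

# Version and ABI taken from "linux/6.8.0-50.51" in this notification.
PROPOSED = (6, 8, 0, 50)


def parse_release(release):
    """Turn a release string such as '6.8.0-50-generic' into (6, 8, 0, 50).

    Assumption: the string starts with 'X.Y.Z', optionally followed by
    '-ABI'. A missing ABI number is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_at_least_proposed(release):
    """Return True if the release is 6.8.0-50 or newer."""
    return parse_release(release) >= PROPOSED


if __name__ == "__main__":
    running = platform.release()
    try:
        verdict = "OK" if is_at_least_proposed(running) else "too old"
    except ValueError:
        verdict = "unrecognized format"
    print(f"running kernel {running}: {verdict}")
```

A version comparison only shows which kernel is booted. Whether the fix works still has to be established by running the test plan in the bug description.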
** Tags added: kernel-spammed-noble-linux-v2 verification-needed-noble-linux
--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2081079
Title:
[SRU] Ubuntu 24.04 - GPU cannot be installed with DL380a Gen12 (2P,
SRF-SP)
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Noble:
Fix Committed
Status in linux source package in Oracular:
Fix Released
Bug description:
[Impact]
Description:
Failed to install GPU with Ubuntu 24.04 on a DL380a Gen12 with Intel Sierra Forest 2P
There is a random write to VF BAR0's memory region that causes the
kernel to hit an MCE (machine check exception) error.
Version-Release number :
Ubuntu 24.04
Additional info:
We have tracked this issue with RHEL9.4. It is caused by the following
patches.
cb4a6ccf3583 perf/x86/intel/uncore: Support Sierra Forest and Grand Ridge (v6.8-rc1)
388d76175bd9 perf/x86/intel/uncore: Support IIO free-running counters on GNR (v6.8-rc1)
632c4bf6d007 perf/x86/intel/uncore: Support Granite Rapids (v6.8-rc1)
b560e0cd882b perf/x86/uncore: Use u64 to replace unsigned for the uncore offsets array (v6.8-rc1)
cf35791476fc perf/x86/intel/uncore: Generic uncore_get_uncores and MMIO format of SPR (v6.8-rc1)
[Test Plan]
How reproducible:
Each time
Steps to reproduce
- Enable PCI segment, Intel VT-d, and SR-IOV in the BIOS
- Run a fresh install on a 2P DL380a server with a GPU in slot 17
Expected results
No MCE errors; the installation completes without problems.
Actual results
The kernel reports MCE errors.
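When checking the expected and actual results, a saved kernel log can be scanned for machine check reports instead of reading it by eye. This is a rough sketch under stated assumptions: it treats any line containing "mce: [Hardware Error]" or "machine check" as an MCE report, which may not cover every message format, and the function name find_mce_lines and the sample log are invented for illustration.

```python
import re

# Assumed markers of an MCE report in kernel log text. Other kernels or
# firmware paths may log machine checks with different wording.
MCE_PATTERN = re.compile(r"mce: \[Hardware Error\]|machine check", re.IGNORECASE)


def find_mce_lines(log_text):
    """Return the lines of log_text that look like MCE reports."""
    return [line for line in log_text.splitlines() if MCE_PATTERN.search(line)]


# Made-up log excerpt for illustration only, not taken from the affected system.
SAMPLE = """\
[    1.000000] pci 0000:00:00.0: example device initialized
[   12.345678] mce: [Hardware Error]: Machine check events logged
[   13.000000] systemd[1]: Started example.service
"""

for line in find_mce_lines(SAMPLE):
    print(line)
```

In practice the input would be the kernel log captured during the installation, for example the output of dmesg or journalctl -k saved to a file and read into a string. An empty result is consistent with the expected outcome but does not prove it, since the pattern list is only an assumption.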
[Fix]
Intel gave us a patch set that resolves the issue.
https://lore.kernel.org/lkml/20240614134631.1092359-1-kan.liang@linux.intel.com/#r
The following patches are required.
f8a86a9bb5f7 perf/x86/intel/uncore: Support HBM and CXL PMON counters (v6.11-rc1)
15a4bd51853b perf/x86/uncore: Cleanup unused unit structure (v6.11-rc1)
f76a8420444b perf/x86/uncore: Apply the unit control RB tree to PCI uncore units (v6.11-rc1)
b1d9ea2e1ca4 perf/x86/uncore: Apply the unit control RB tree to MSR uncore units (v6.11-rc1)
80580dae65b9 perf/x86/uncore: Apply the unit control RB tree to MMIO uncore units (v6.11-rc1)
585463fee642 perf/x86/uncore: Retrieve the unit ID from the unit control RB tree (v6.11-rc1)
c74443d92f68 perf/x86/uncore: Support per PMU cpumask (v6.11-rc1)
0007f3932592 perf/x86/uncore: Save the unit control address of all units (v6.11-rc1)
[Where problems could occur]
[Other Info]
https://code.launchpad.net/~mreed8855/ubuntu/+source/linux/+git/noble/+ref/lp_2081079_dl380a_gen12
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2081079/+subscriptions