Tuesday

[Bug 2141377] Re: [SRU]Fix xe GPU suspend/resume crash on Battlemage

** Description changed:

[ Impact ]
Intel Battlemage xe GPU (8086:e212) crashes during system resume with
NULL pointer dereference in xe_guc_ads_populate_post_load(), making
suspend/resume non-functional on affected systems.

Root cause: Noble 6.17 kernels have commit 59cebf0bdff48 but are missing
its prerequisite commit 1313351e71181. Without proper forcewake handling,
MMIO register access causes hardware corruption.

[ Fix ]
Cherry-pick upstream commit from v6.18-rc1:
- 1313351e71181 ("drm/xe: make xe_gt_idle_disable_c6() handle the forcewake internally")

[ Test Plan ]
1. System: Lenovo ThinkStation P3 Ultra G2 with Battlemage dGPU (8086:e212)
2. Reproduce: rtcwake -m mem -s 10
3. Verify: System resumes successfully without crashes

[ Where problems could occur ]
- Low risk - change is localized to xe driver GT idle/power management, only
- affects suspend/resume path. Commit is from mainline v6.18-rc1 with upstream
- review.
+ The change could break xe driver GT idle/power management and affects the suspend/resume path. The commit is from mainline v6.18-rc1.

** Also affects: linux (Ubuntu Resolute)
Importance: Undecided
Status: New

** Also affects: linux-oem-6.17 (Ubuntu Resolute)
Importance: Undecided
Status: New

** Also affects: linux (Ubuntu Noble)
Importance: Undecided
Status: New

** Also affects: linux-oem-6.17 (Ubuntu Noble)
Importance: Undecided
Status: New

** Also affects: linux (Ubuntu Questing)
Importance: Undecided
Status: New

** Also affects: linux-oem-6.17 (Ubuntu Questing)
Importance: Undecided
Status: New

** Changed in: linux (Ubuntu Noble)
Status: New => Won't Fix

** Description changed:

[ Impact ]
Intel Battlemage xe GPU (8086:e212) crashes during system resume with
NULL pointer dereference in xe_guc_ads_populate_post_load(), making
suspend/resume non-functional on affected systems.

Root cause: Noble 6.17 kernels have commit 59cebf0bdff48 but are missing
its prerequisite commit 1313351e71181. Without proper forcewake handling,
MMIO register access causes hardware corruption.

[ Fix ]
Cherry-pick upstream commit from v6.18-rc1:
- 1313351e71181 ("drm/xe: make xe_gt_idle_disable_c6() handle the forcewake internally")

[ Test Plan ]
1. System: Lenovo ThinkStation P3 Ultra G2 with Battlemage dGPU (8086:e212)
2. Reproduce: rtcwake -m mem -s 10
3. Verify: System resumes successfully without crashes

[ Where problems could occur ]
The change could break xe driver GT idle/power management and affects the suspend/resume path. The commit is from mainline v6.18-rc1.
+
+ The dGPU is not fully certified on the v6.8 kernel, so this SRU targets
+ Questing and oem-6.17.

** Changed in: linux (Ubuntu Questing)
Status: New => In Progress

** Changed in: linux (Ubuntu Resolute)
Status: New => Fix Released

** Changed in: linux-oem-6.17 (Ubuntu Questing)
Status: New => Invalid

** Changed in: linux-oem-6.17 (Ubuntu Resolute)
Status: New => Invalid

** Changed in: linux-oem-6.17 (Ubuntu Noble)
Status: New => In Progress

** Changed in: hwe-next
Assignee: (unassigned) => AaronMa (mapengyu)

** Changed in: hwe-next
Importance: Undecided => Medium

** Changed in: linux (Ubuntu Questing)
Importance: Undecided => Medium

** Changed in: linux-oem-6.17 (Ubuntu Noble)
Importance: Undecided => Medium

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2141377

Title:
[SRU]Fix xe GPU suspend/resume crash on Battlemage

Status in HWE Next:
New
Status in linux package in Ubuntu:
Fix Released
Status in linux-oem-6.17 package in Ubuntu:
Invalid
Status in linux source package in Noble:
Won't Fix
Status in linux-oem-6.17 source package in Noble:
In Progress
Status in linux source package in Questing:
In Progress
Status in linux-oem-6.17 source package in Questing:
Invalid
Status in linux source package in Resolute:
Fix Released
Status in linux-oem-6.17 source package in Resolute:
Invalid

Bug description:
[ Impact ]
Intel Battlemage xe GPU (8086:e212) crashes during system resume with
NULL pointer dereference in xe_guc_ads_populate_post_load(), making
suspend/resume non-functional on affected systems.

Root cause: Noble 6.17 kernels have commit 59cebf0bdff48 but are missing
its prerequisite commit 1313351e71181. Without proper forcewake handling,
MMIO register access causes hardware corruption.

[ Fix ]
Cherry-pick upstream commit from v6.18-rc1:
- 1313351e71181 ("drm/xe: make xe_gt_idle_disable_c6() handle the forcewake internally")
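
A minimal sketch of how the missing prerequisite could be detected before cherry-picking. It assumes the backported commit keeps its upstream subject line; the helper name `has_prereq` and the `PREREQ_SUBJECT` variable are hypothetical and not part of the bug report.

```shell
#!/bin/sh
# Subject of the prerequisite commit, copied from the [ Fix ] section.
PREREQ_SUBJECT='drm/xe: make xe_gt_idle_disable_c6() handle the forcewake internally'

# has_prereq (hypothetical helper): reads commit subjects on stdin, one per
# line, and succeeds only if one line equals the prerequisite's subject.
has_prereq() {
    grep -q -x -F -- "$PREREQ_SUBJECT"
}

# Possible use inside a kernel tree that can reach the upstream commit
# (commented out because it needs a real checkout):
#   git log --format=%s -- drivers/gpu/drm/xe | has_prereq \
#       || git cherry-pick -x -s 1313351e71181
```

Matching on the subject rather than the hash is a deliberate choice here: a cherry-picked commit gets a new hash in the target tree, so the upstream ID alone would not be found.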

[ Test Plan ]
1. System: Lenovo ThinkStation P3 Ultra G2 with Battlemage dGPU (8086:e212)
2. Reproduce: rtcwake -m mem -s 10
3. Verify: System resumes successfully without crashes
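
The verification step could be scripted roughly as below. This is a sketch, not the reporter's procedure: the helper name `check_resume_log` is hypothetical, and the two match strings are assumptions based on the crash described in [ Impact ]; the exact wording in the kernel log may differ.

```shell
#!/bin/sh
# check_resume_log (hypothetical helper): reads kernel log text on stdin.
# Prints "crash" and returns 1 if either assumed crash signature appears,
# otherwise prints "clean" and returns 0.
check_resume_log() {
    if grep -q -e 'NULL pointer dereference' \
               -e 'xe_guc_ads_populate_post_load'; then
        echo "crash"
        return 1
    fi
    echo "clean"
    return 0
}

# One suspend/resume cycle as in step 2, followed by the check
# (commented out because it needs root and the affected hardware):
#   rtcwake -m mem -s 10 && dmesg | check_resume_log
```

If the system hangs on resume, the script never reaches the check, so a "clean" result is only meaningful when the machine actually comes back.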

[ Where problems could occur ]
The change could break xe driver GT idle/power management and affects the suspend/resume path. The commit is from mainline v6.18-rc1.

The dGPU is not fully certified on the v6.8 kernel, so this SRU targets
Questing and oem-6.17.

To manage notifications about this bug go to:
https://bugs.launchpad.net/hwe-next/+bug/2141377/+subscriptions
