Public bug reported:

# Bug report: kernel 7.0 hangs on s2idle resume with nvidia-open on RTX 50 series (Blackwell)

## Summary

Lid-close suspend (s2idle) on Ubuntu 26.04 with kernel 7.0.0-14-generic and an NVIDIA RTX 5070 Max-Q using the nvidia-open driver never resumes and requires a hard reset. The same machine resumes cleanly on kernel 6.17.0-22-generic with the exact same nvidia-open driver and modprobe configuration.

Suspend also works cleanly on kernel 7.0 itself if all nvidia kernel modules are blacklisted, so the regression is in the interaction between the nvidia-open PCI `.suspend` callback (or code it invokes) and changes in the 7.0 kernel.

This is kernel-vs-driver, not a driver-version issue: both nvidia-driver-580-open (580.142) and nvidia-driver-595-open (595.58.03) reproduce.

## Hardware

- Framework Laptop 16, AMD Ryzen 7940HS ("Phoenix")
- Discrete GPU: NVIDIA GeForce RTX 5070 Max-Q (GB206M, Blackwell), PCI 0000:01:00.0
- GPU HDA function: 0000:01:00.1
- Integrated GPU: AMD Phoenix1, PCI 0000:c3:00.0 (drives the internal display via amdgpu)
- Platform only supports s2idle (`cat /sys/power/mem_sleep` → `[s2idle]`; no `deep`)

## Software

- Ubuntu 26.04 "resolute" (beta)
- Affected kernel: `linux-image-7.0.0-14-generic` (7.0.0-14.14)
- Working kernel: `linux-image-6.17.0-22-generic`
- Drivers tested: `nvidia-driver-580-open` (`580.142-0ubuntu*`) and `nvidia-driver-595-open` (`595.58.03-0ubuntu2`); both reproduce
- modprobe config: distro defaults only (no custom overrides)

## Steps to reproduce

1. Install `nvidia-driver-595-open` (or 580-open) on Ubuntu 26.04.
2. Boot `linux-image-7.0.0-14-generic`.
3. Log in to GNOME (Wayland, amdgpu drives internal display, nvidia modules loaded for CUDA / offload).
4. Close the lid.
5. Wait 10 seconds, open the lid.

## Expected

Normal s2idle resume: display wakes, session continues.

## Actual

Display never wakes; machine is unresponsive. Hard reset (long power button) is the only recovery.

`journalctl -b -1 -k` of the failed boot ends with the line below, and `PM: suspend entry (s2idle)` is *not* printed. The kernel logs that line at the start of the suspend transition (in the 6.17 log further down it precedes `Filesystems sync` and `Freezing user space processes`), so the journal stops before the kernel records entering suspend:

```
kernel: nvidia 0000:01:00.0: Enabling HDA controller
(then journal ends; nothing after)
```

On kernel 6.17.0-22-generic (same system, same nvidia-open driver, same modprobe.d contents), the equivalent sequence logs:

```
kernel: nvidia 0000:01:00.0: Enabling HDA controller
kernel: PM: suspend entry (s2idle)
kernel: Filesystems sync: 0.009 seconds
kernel: Freezing user space processes
... normal suspend flow ...
kernel: PM: resume from suspend-to-idle
kernel: PM: resume of devices complete after 330.827 msecs
kernel: PM: suspend exit
```

## Isolation

| Kernel | nvidia loaded | Result |
|---|---|---|
| 6.17.0-22-generic | yes | works |
| 6.17.0-22-generic | no | works |
| 7.0.0-14-generic | yes | hangs |
| 7.0.0-14-generic | no (`module_blacklist=nvidia_drm,nvidia_modeset,nvidia_uvm,nvidia`) | **works** |

Blacklisting all nvidia kernel modules on 7.0 produces a clean `PM: suspend entry (s2idle)` → `PM: resume from suspend-to-idle` → `PM: suspend exit` sequence (verified by journal). This isolates the regression to the nvidia-open PCI suspend path, triggered by something that changed between 6.17 and 7.0.

## Workarounds that did NOT help

- `module_blacklist=thunderbolt`
- Unloading `btusb` at runtime / blacklisting it at boot
- udev rule: `ATTR{d3cold_allowed}="0"` on both `0000:01:00.0` and `0000:01:00.1`. This moved the hang point past the `Enabling HDA controller` message to after `PM: suspend entry (s2idle)`, but did not prevent the hang.
- `options nvidia NVreg_DynamicPowerManagement=0`: broke suspend on 6.17 as well. **Do not try this.**
- `options nvidia NVreg_PreserveVideoMemoryAllocations=0 NVreg_EnableS0ixPowerManagement=1` (no effect on 7.0; proven unnecessary on 6.17 too)
- `no_console_suspend` + `pm_debug_messages` cmdline flags (did not surface additional kernel output: the kmsg → journald pipeline is frozen along with the system, so nothing is flushed to disk after the hang)

## Related upstream reports

NVIDIA developer forums, "Ubuntu 26.04: suspend resumes to black screen": same software stack (kernel 7.0.0, RTX 5070 Max-Q, nvidia-open 595.58.03), identical symptoms, reported on a Lenovo T1g Gen 8 (Intel CPU). This confirms the regression is not Framework-specific; it tracks the GPU + OS + kernel:
https://forums.developer.nvidia.com/t/ubuntu-26-04-suspend-resumes-to-black-screen/365785

## Diagnostic artifacts

Available on request:

- dmesg from failing 7.0 boot (nvidia loaded)
- dmesg from successful 7.0 boot (nvidia blacklisted)
- dmesg from successful 6.17 boot
- `journalctl --list-boots` with annotated suspend results
- `lspci -vvv` (GPU + HDA functions)
- `cat /proc/driver/nvidia/params` on 6.17 while working

ProblemType: Bug
DistroRelease: Ubuntu 26.04
Package: linux-image-7.0.0-14-generic 7.0.0-14.14
ProcVersionSignature: Ubuntu 7.0.0-14.14-generic 7.0.0
Uname: Linux 7.0.0-14-generic x86_64
ApportVersion: 2.34.0-0ubuntu2
Architecture: amd64
AudioDevicesInUse:
 USER        PID ACCESS COMMAND
 /dev/snd/controlC2:  andrei     6048 F.... wireplumber
 /dev/snd/controlC0:  andrei     6048 F.... wireplumber
 /dev/snd/controlC1:  andrei     6048 F.... wireplumber
 /dev/snd/seq:        andrei     6024 F.... pipewire
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Wed Apr 22 11:39:12 2026
InstallationDate: Installed on 2025-01-18 (459 days ago)
InstallationMedia: Ubuntu 24.10 "Oracular Oriole" - Release amd64 (20241009.4)
MachineType: Framework Laptop 16 (AMD Ryzen 7040 Series)
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-7.0.0-14-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro quiet splash crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M
SourcePackage: linux
UpgradeStatus: Upgraded to resolute on 2026-04-21 (1 days ago)
dmi.bios.date: 11/13/2025
dmi.bios.release: 4.2
dmi.bios.vendor: INSYDE Corp.
dmi.bios.version: 04.02
dmi.board.asset.tag: *
dmi.board.name: FRANMZCP09
dmi.board.vendor: Framework
dmi.board.version: A9
dmi.chassis.asset.tag: FRAGACCPAJ4386000W
dmi.chassis.type: 10
dmi.chassis.vendor: Framework
dmi.chassis.version: AJ
dmi.modalias: dmi:bvnINSYDECorp.:bvr04.02:bd11/13/2025:br4.2:svnFramework:pnLaptop16(AMDRyzen7040Series):pvrAJ:rvnFramework:rnFRANMZCP09:rvrA9:cvnFramework:ct10:cvrAJ:skuFRAGACCP0J:pfa16inLaptop:
dmi.product.family: 16in Laptop
dmi.product.name: Laptop 16 (AMD Ryzen 7040 Series)
dmi.product.sku: FRAGACCP0J
dmi.product.version: AJ
dmi.sys.vendor: Framework

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Tags: amd64 apport-bug resolute wayland-session

--
You received this bug notification because you are subscribed to linux in Ubuntu.
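
The journal comparison in the "Actual" section of the report can also be checked mechanically. The following is a minimal sketch, not part of the original report: the marker strings are copied from the log excerpts quoted in the report, while the function name `classify_suspend_cycle` and its outcome labels are hypothetical names chosen for illustration.

```python
# Sketch only: marker strings come from the log excerpts quoted in the report;
# the function name and the outcome labels are hypothetical.
SUSPEND_ENTRY = "PM: suspend entry"
RESUME = "PM: resume from suspend-to-idle"
SUSPEND_EXIT = "PM: suspend exit"


def classify_suspend_cycle(kernel_log: str) -> str:
    """Label a kernel log excerpt by how far its last suspend cycle got."""
    # Only consider the text from the last suspend entry onward.
    start = kernel_log.rfind(SUSPEND_ENTRY)
    if start == -1:
        return "no-suspend-entry"
    tail = kernel_log[start:]
    if SUSPEND_EXIT in tail:
        return "completed"
    if RESUME in tail:
        return "resume-started"
    return "entered-not-resumed"


# Excerpt quoted for the failing 7.0 boot.
failing_7_0 = "kernel: nvidia 0000:01:00.0: Enabling HDA controller\n"

# Excerpt quoted for the working 6.17 boot (elided lines omitted).
working_6_17 = "\n".join([
    "kernel: nvidia 0000:01:00.0: Enabling HDA controller",
    "kernel: PM: suspend entry (s2idle)",
    "kernel: Filesystems sync: 0.009 seconds",
    "kernel: Freezing user space processes",
    "kernel: PM: resume from suspend-to-idle",
    "kernel: PM: resume of devices complete after 330.827 msecs",
    "kernel: PM: suspend exit",
])

print(classify_suspend_cycle(failing_7_0))
print(classify_suspend_cycle(working_6_17))
```

To apply it to a real boot, the output of `journalctl -b -1 -k` could be saved to a file and its contents passed to the function. A log that ends right after `PM: suspend entry (s2idle)`, as described for the `d3cold_allowed` workaround, would fall into the `entered-not-resumed` case.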
https://bugs.launchpad.net/bugs/2149963

Title:
  kernel 7.0.0-14 hangs on s2idle resume with nvidia-open on RTX 50-series (Blackwell); 6.17.0-22 works on same hw

Status in linux package in Ubuntu:
  New