Wednesday

[Bug 2148761] Re: xhci_hcd "Controller not ready at resume -19" hard system hang on Zen 4 (Raphael) — runtime PM resume failure on 12:00.x PCIe complex

** Tags added: kernel-daily-bug

-- 
You received this bug notification because you are subscribed to linux in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2148761

Status in linux package in Ubuntu: New

Bug description:

## System

- Ubuntu 24.04
- Kernel: first known broken version is 6.8.0-106-generic (a large update in early March 2026 skipped several versions; the last known good version is unknown)
- CPU: AMD Ryzen 9 7950X3D (Zen 4, Raphael)
- GPU: NVIDIA RTX 4090 (primary); the AMD Raphael integrated GPU is present (0000:12:00.0, device 0x164e) but not used for display
- Motherboard: ASUS TUF GAMING X670E-PLUS WIFI
- Xorg (not Wayland)

## Symptom

Hard system hang: instant black screen, no kernel panic, no oops, and the machine reboots as if the reset button had been pressed. There is no kernel log output from the crash itself. Timing is random, anywhere from under a minute to several hours after the trigger, and the machine can crash after the triggering application has already exited.

## Dmesg evidence (captured via remote dmesg streaming)

The last kernel messages before every hang are consistently:

    xhci_hcd 0000:12:00.3: Controller not ready at resume -19
    xhci_hcd 0000:12:00.3: PCI post-resume error -19!
    xhci_hcd 0000:12:00.3: HC died; cleaning up

or the same sequence on 0000:12:00.4. Both are USB controllers that are functions of the Raphael APU PCIe device (the 0000:12:00.x complex, which also includes the iGPU, HD audio, and the PSP/crypto engine).

At the time of the crash, these controllers show:

    power/control: auto
    runtime_enabled: enabled
    runtime_status: suspended
    runtime_suspended_time: ~657000ms (suspended for almost the entire uptime)

## Trigger

In this case the crash was consistently triggered by running the WiVRn OpenXR server (https://github.com/WiVRn/WiVRn), a long-running Vulkan application. However, WiVRn is likely just making the crash happen faster; see the similar report below, which involves no VR software at all.

Note: crash timing is non-deterministic. The crash can occur while WiVRn is running, after WiVRn has exited, and even with no Quest headset connected. According to the similar report below, general desktop use alone is enough to trigger it eventually.

## Similar report

An identical dmesg signature was reported on Fedora 43 (December 2025) on a pure AMD system (Ryzen 7 9700X + RX 9070 XT) with no VR software, triggered by general desktop use:

https://discussion.fedoraproject.org/t/constant-random-crashing-unable-to-identify-cause/177192

This suggests the bug is not specific to NVIDIA+AMD configurations or to VR software, and is likely an upstream kernel regression. Because the Ryzen 7 9700X in the Fedora report is a Zen 5 part, the regression is probably not limited to Zen 4 either and may affect recent AMD desktop platforms more broadly.

## Workaround

Preventing runtime PM suspend of the Raphael PCIe complex:

    for dev in /sys/bus/pci/devices/0000:12:00.{1,3,4,6}/power/control; do
        echo on | sudo tee "$dev"
    done

This appears stable across extended sessions. Note that it addresses the symptom only and does not fix the underlying resume failure.

The Fedora user resolved the problem with these kernel boot parameters:

    pcie_port_pm=off usbcore.autosuspend=-1 iommu=pt

These are broader in scope: they disable PCIe port PM and USB autosuspend globally.

## What is NOT known

- Exactly which kernel version introduced the regression (several versions were skipped in a large update)
- Whether WiVRn/Vulkan/amdgpu initialization of the iGPU plays any role in triggering the PM instability, or whether the machine would eventually crash from general desktop use alone (not yet tested)
- Which of the three Fedora boot parameters is actually necessary

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2148761/+subscriptions
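For anyone trying to confirm the same state on their own machine, the runtime PM attributes quoted in the dmesg evidence section can be collected for every function of the 0000:12:00.x complex in one pass. The following is only a minimal sketch, not something taken from the report: the function name `pm_report` and its optional second argument (an alternate sysfs root, useful for dry runs against a copied tree) are hypothetical, and it assumes the standard `power/control`, `power/runtime_status`, and `power/runtime_suspended_time` sysfs files. Attributes that are missing on a given kernel are skipped.

```shell
#!/bin/sh
# Sketch: print runtime PM attributes for each function of one PCI device.
# pm_report and its arguments are illustrative names, not an existing tool.
#   $1  PCI address without the function part, e.g. 0000:12:00
#   $2  sysfs root (assumed default: /sys)
pm_report() {
    slot="$1"
    root="${2:-/sys}"
    for dev in "$root"/bus/pci/devices/"$slot".*; do
        # Skip the unexpanded glob pattern and entries without a power/ directory.
        [ -d "$dev/power" ] || continue
        # Start the line with the full address, e.g. 0000:12:00.3
        printf '%s' "${dev##*/}"
        for attr in control runtime_status runtime_suspended_time; do
            # Only print attributes that exist and are readable.
            if [ -r "$dev/power/$attr" ]; then
                printf ' %s=%s' "$attr" "$(cat "$dev/power/$attr")"
            fi
        done
        printf '\n'
    done
}

# Default to the address from this report when no argument is given.
pm_report "${1:-0000:12:00}"
```

Each output line starts with the PCI function address followed by `attribute=value` pairs. Running it before and after applying the workaround should show `control` changing from `auto` to `on` for the pinned functions; reading these files does not require root.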
