Likely same root cause as bug #2150605 (HP ZBook Fury G1i, Arrow Lake-S). Same i915 Failed to bring PHY A to idle / PHY A Read 0c70 / DPLL hw state mismatch signature, same code path in intel_cx0_phy.c, same Ubuntu 26.04 / Wayland / Blackwell-hybrid environment, same ineffective workarounds (i915.enable_psr=0 converts the hang into a slow recovery but doesn't fix it). Key differences in my case: (1) Arrow Lake-P variant rather than Arrow Lake-S, (2) bug triggers on every s2idle resume regardless of dwell time, not only long dwells, (3) recovery time 30–75s rather than 5–10s, and (4) occasional silent hard freezes about a minute after a seemingly clean resume (see "Silent freeze case" in the description). Ubuntu kernel team to decide formal duplicate status.

--
You received this bug notification because you are subscribed to linux in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2154646

Title:
  i915 Arrow Lake: PHY A powerdown failure + DPLL hw state mismatch on s2idle resume — recovers most cycles, hard-freezes some (Dell Pro Max 16, kernel 7.0.0-22)

Status in linux package in Ubuntu:
  New

Bug description:

# i915 Arrow Lake: PHY A powerdown failure + DPLL hw state mismatch on s2idle resume — recovers most cycles, hard-freezes some (Dell Pro Max 16, kernel 7.0.0-22)

## Summary

Every s2idle suspend/resume cycle triggers an identical i915 failure on the internal display (eDP-1). The bug reproduces 100% of the time, independent of external monitors, dock state, or session activity.

Most resume cycles now recover after 30–75 seconds of degraded display state via timeout-based fallback paths in i915. However, **some resume cycles produce a complete, silent system freeze**: the desktop remains visible but is entirely unresponsive (no keyboard, mouse, or remote SSH), the kernel log stops with no further entries, no soft lockup watchdog trace fires, and only a power button hold recovers the system. The proportion of cycles that result in a silent freeze vs. a noisy recovery is non-deterministic.

On earlier kernels (6.17.0-1017-oem, 7.0.0-15-generic) the same bug presented as a soft lockup in `nvidia-modeset` blocked downstream of the stuck i915 display state, with a visible watchdog trace in dmesg. As of 7.0.0-22 the soft-lockup path no longer fires, but the underlying race is unchanged, and now manifests as a silent hang of the entire graphics pipeline when it does cause a freeze.

## Hardware

- **Machine**: Dell Pro Max 16 Premium (MA16250), BIOS 1.9.0 (2026-03-31), latest available from Dell
- **iGPU**: Intel Arrow Lake-P / Arc Pro 130T (rev 03) at `0000:00:02.0`
- **dGPU**: NVIDIA RTX PRO 2000 Blackwell (Open kernel modules, driver 595.71.05), verified not involved in this trace
- **Internal display**: eDP-1 (CONNECTOR:507), driven via PHY A / DPLL 0 / pipe A / PLANE:34
- **RAM**: 61 GiB

## Software

- **OS**: Ubuntu 26.04 LTS
- **Kernel**: `7.0.0-22-generic` (`linux-image-7.0.0-22-generic`)
- **Session**: Wayland (GNOME)
- **Suspend mode**: `s2idle` only; BIOS does not expose `deep`/S3 (`cat /sys/power/mem_sleep` returns `[s2idle]`)
- **Kernel command line**: `quiet splash i915.enable_psr=0 nvidia-drm.modeset=1`

## Steps to reproduce

1. Boot the system normally (any user activity, GNOME Wayland session)
2. `systemctl suspend`
3. Wake via keyboard/lid within ~30 seconds
4. Observe dmesg

## Expected behavior

Suspend/resume completes within a few seconds, no display errors, dmesg clean.

## Actual behavior

Resume takes 30–75 seconds during which the internal display is in a glitched/unresponsive state. dmesg shows the identical sequence on every cycle:

```
i915 0000:00:02.0: [drm] *ERROR* Failed to bring PHY A to idle.
i915 0000:00:02.0: [drm] *ERROR* PHY A Read 0c70 failed after 3 retries.
i915 0000:00:02.0: [drm] *ERROR* PHY A Write 0c70 failed after 3 retries.
i915 0000:00:02.0: [drm] *ERROR* [CRTC:150:pipe A] flip_done timed out
i915 0000:00:02.0: [drm] *ERROR* [CRTC:150:pipe A] mismatch in dpll_hw_state
i915 0000:00:02.0: [drm] DPLL 0: pll hw state mismatch
------------[ cut here ]------------
WARNING: drivers/gpu/drm/i915/display/intel_dpll_mgr.c:4945 at verify_single_dpll_state+0x2a6/0x6b0 [i915]
RIP: 0010:verify_single_dpll_state+0x2b3/0x6b0 [i915]
Call Trace:
 intel_dpll_state_verify+0x63/0x260 [i915]
 intel_modeset_verify_crtc+0x5a/0xb0 [i915]
 intel_atomic_commit_tail+0x8a3/0xc50 [i915]
 intel_atomic_commit+0x28e/0x2e0 [i915]
 drm_atomic_commit+0xad/0xf0
 drm_atomic_helper_commit_duplicated_state+0xfe/0x120
 __intel_display_driver_resume+0xb8/0x130 [i915]
 intel_display_driver_resume+0xc7/0x140 [i915]
 i915_drm_resume+0x147/0x1e0 [i915]
 i915_pm_resume+0x1b/0x30 [i915]
 pci_pm_resume+0x8c/0x140
 dpm_run_callback+0x5f/0x180
 device_resume+0x177/0x270
 async_resume+0x21/0x40
---[ end trace 0000000000000000 ]---
i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
i915 0000:00:02.0: [drm] *ERROR* [CRTC:150:pipe A] commit wait timed out
i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
i915 0000:00:02.0: [drm] *ERROR* [CONNECTOR:507:eDP-1] commit wait timed out
i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
i915 0000:00:02.0: [drm] *ERROR* [PLANE:34:plane 1A] commit wait timed out
[further trace through intel_dp_retrain_link → intel_dp_link_check → intel_encoder_link_check_work_fn]
i915 0000:00:02.0: [drm] PHY A failed to change powerdown state
```

## Test matrix (all produced the identical trace)

| # | Dock state | Outcome |
|---|------------|---------|
| 1 | Dell Thunderbolt dock attached throughout | recovered, 33s |
| 2 | Dock attached throughout (different cycle, same boot) | recovered, 74s |
| 3 | Dock attached at suspend, disconnected before wake | recovered, 74s |
| 4 | No dock attached at any time | recovered, 32s |

Conclusion: dock state is irrelevant to the recovery cases. Bug is internal to i915 handling of eDP-1 / PHY A during s2idle resume.

## Silent freeze case (2026-05-30, kernel 7.0.0-22-generic, docked)

A separate boot exhibited four consecutive recoverable resume cycles followed by a fifth cycle that hard-froze. Timeline from `journalctl -k -b -3`:

```
10:31:54  suspend 1: visible PHY A errors → recovered
10:43:44  suspend 2: visible PHY A errors → recovered
11:05:47  suspend 3: visible PHY A errors → recovered
11:17:08  suspend 4: visible PHY A errors → recovered
11:30:54  suspend 5: PM: suspend exit @ 11:30:55 (no PHY A errors logged)
11:30:56  evdi reconnects, displays reinitialize
11:31:02  WiFi reassociates, DisplayLink reconnects
11:31:05  usb 3-9: reset full-speed USB device number 7 using xhci_hcd
11:31:53  cgroup: Unknown subsys name 'memory'   ← LAST KERNEL LOG ENTRY
???       HARD FREEZE — no further entries, no watchdog trace
```

Operational symptoms during the freeze:

- Desktop on internal display remained visible (last frame held, not black)
- Keyboard, mouse, and trackpad completely unresponsive
- Lid state at freeze: undetermined (user docked, internal display in use)
- Recovery required holding the power button (no other path; SSH/SysRq not tried but kernel logging had stopped, suggesting they would not have worked)
- Boot session ended at uptime ~5500s (boot at 10:00, freeze at 11:31)

This freeze was preceded by a resume that *appeared* clean in dmesg: no PHY A errors were logged before the hang. The previous four resume cycles in the same boot all triggered the standard PHY A / DPLL trace but recovered. This suggests the same underlying race can either (a) trigger visible recovery, or (b) lock the entire display pipeline silently; the difference is not yet understood from available logs.

The kernel `watchdog: BUG: soft lockup` mechanism does **not** fire for this failure mode. Earlier kernels (6.17.0-1017-oem, 7.0.0-15-generic) produced a visible soft lockup trace in `nvidia-modeset` after ~22 seconds in cases where i915 hung; that downstream effect appears to have been mitigated in 7.0.0-22, but the upstream i915 deadlock that caused those lockups still exists and now hangs silently when it triggers.

## Workarounds attempted (no effect on the bug)

- NVIDIA driver upgrade `595.58.03` → `595.71.05`
- Kernel upgrade `6.17.0-1017-oem` → `7.0.0-15-generic` → `7.0.0-22-generic`
- BIOS upgrade `1.8.1` → `1.9.0`
- Dell dock firmware updates (MST VMM9 9.03→9.04, PD 1.34→1.37, WD25TB5 1.0.9→1.0.10, USB4v2 controller, dock package)
- Kernel command line already has `i915.enable_psr=0`

## Behavior across kernel versions

- **6.17.0-1017-oem / 7.0.0-15-generic**: Bug fired on most s2idle resumes and caused hard system freezes. Soft lockup watchdog visibly triggered in `nvidia-modeset` thread blocked downstream of the stuck i915 state. System unrecoverable without power button reset. Frequency: multiple per week under normal use.
- **7.0.0-22-generic**: Bug still fires on every s2idle resume (visible PHY A / DPLL trace in dmesg). i915 timeout-based recovery paths handle most cycles, leaving the system in a degraded display state for 30–75 seconds before continuing. However, occasional cycles result in a complete silent system freeze (see "Silent freeze case" section above): desktop visible but unresponsive, no kernel trace, no watchdog firing. Hard freeze frequency: at least one per week of normal use with the current workaround in place to avoid most suspends. Without the workaround (allowing GNOME idle suspend), prior measurements suggest higher frequency.

The 7.0.0-22 change has eliminated the *visible* soft lockup symptom, but has not fixed the underlying race in PHY A / DPLL state validation during s2idle resume.
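For triage, the per-cycle behavior described above can be tallied from a saved kernel log. The following is a minimal sketch, not part of the original report: it assumes each cycle begins with the kernel's `PM: suspend entry` line (only `PM: suspend exit` is quoted in this report), and the function names `split_cycles`, `classify_cycle`, and `summarize` are hypothetical. The signature strings are copied from the dmesg excerpt in this description.

```python
import re

# Signature substrings copied from the dmesg excerpt in this report.
SIGNATURES = (
    "Failed to bring PHY A to idle",
    "PHY A Read 0c70 failed",
    "pll hw state mismatch",
    "PHY A failed to change powerdown state",
)

# Assumption: every suspend/resume cycle starts with a "PM: suspend entry" line.
CYCLE_START = re.compile(r"PM: suspend entry")


def split_cycles(lines):
    """Group log lines into cycles, one per "PM: suspend entry" marker.

    Lines before the first marker (boot messages) are ignored.
    """
    cycles = []
    current = None
    for line in lines:
        if CYCLE_START.search(line):
            current = []
            cycles.append(current)
        if current is not None:
            current.append(line)
    return cycles


def classify_cycle(cycle_lines):
    """Return (label, hits), where hits counts lines matching any signature."""
    hits = sum(1 for line in cycle_lines if any(s in line for s in SIGNATURES))
    return ("phy-errors" if hits else "no-phy-errors", hits)


def summarize(lines):
    """Classify every cycle found in an iterable of kernel log lines."""
    return [classify_cycle(cycle) for cycle in split_cycles(lines)]
```

It could be run over saved `journalctl -k -b -3` output, for example `summarize(open("kernel.log"))` with a hypothetical file name. A `no-phy-errors` label does not mean the cycle was healthy: the silent freeze described above followed a resume that logged no PHY A errors at all.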
## Workaround currently in use

Avoid s2idle suspend entirely:

```
# /etc/systemd/logind.conf.d/no-suspend.conf
[Login]
HandleLidSwitch=lock
HandleLidSwitchDocked=ignore
HandleLidSwitchExternalPower=lock
```

```
gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-ac-type 'nothing'
gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-battery-type 'nothing'
```

System runs indefinitely with display blanked when idle. No bug triggered.

ProblemType: Bug
DistroRelease: Ubuntu 26.04
Package: linux-image-7.0.0-22-generic 7.0.0-22.22
ProcVersionSignature: Ubuntu 7.0.0-22.22-generic 7.0.0
Uname: Linux 7.0.0-22-generic x86_64
ApportVersion: 2.34.0-0ubuntu2
Architecture: amd64
CasperMD5CheckMismatches: ./casper/initrd ./casper/minimal.standard.live.hotfix.manifest ./casper/minimal.standard.live.hotfix.size ./casper/minimal.standard.live.size ./casper/minimal.manifest ./casper/minimal.standard.manifest ./casper/minimal.standard.size ./casper/minimal.hotfix.size ./casper/minimal.standard.live.hotfix.squashfs ./casper/minimal.standard.hotfix.squashfs ./casper/minimal.standard.hotfix.size ./casper/minimal.hotfix.squashfs ./casper/minimal.standard.live.manifest ./casper/minimal.size ./boot/grub/grub.cfg
CasperMD5CheckResult: fail
CurrentDesktop: ubuntu:GNOME
Date: Sat May 30 20:33:57 2026
DistributionChannelDescriptor:
 # This is the distribution channel descriptor for Ubuntu 24.04 for Dell
 # For more information see http://wiki.ubuntu.com/DistributionChannelDescriptor
 canonical-oem-somerville-noble-oem-24.04c-20251113-97
InstallationDate: Installed on 2026-03-04 (87 days ago)
InstallationMedia: Ubuntu OEM 24.04.3 LTS "Noble Numbat" - Release amd64 (20251111)
MachineType: Dell Inc. Dell Pro Max 16 Premium MA16250
ProcEnviron:
 LANG=C.UTF-8
 PATH=(custom, no user)
 SHELL=/usr/bin/zsh
 TERM=tmux-256color
 XDG_RUNTIME_DIR=<set>
ProcFB:
 0 i915drmfb
 1 nvidia-drmdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-7.0.0-22-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro quiet splash i915.enable_psr=0 nvidia-drm.modeset=1 ipv6.disable=1 crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/31/2026
dmi.bios.release: 1.9
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.9.0
dmi.board.name: 061K67
dmi.board.vendor: Dell Inc.
dmi.board.version: A02
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.ec.firmware.release: 1.8
dmi.modalias: dmi:bvnDellInc.:bvr1.9.0:bd03/31/2026:br1.9:efr1.8:svnDellInc.:pnDellProMax16PremiumMA16250:pvr:rvnDellInc.:rn061K67:rvrA02:cvnDellInc.:ct10:cvr:sku0D33:pfaDellProMaxLaptops:
dmi.product.family: Dell Pro Max Laptops
dmi.product.name: Dell Pro Max 16 Premium MA16250
dmi.product.sku: 0D33
dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2154646/+subscriptions