Thursday

[Bug 2153976] [NEW] Fix graceful fault handling after FPU softirq changes causes hard freeze on EFI runtime calls

Public bug reported:

[Impact]
Running `sudo fwts uefirttime` on HP systems (CID: 202511-38089, 202511-38088,
202511-38091, 202511-38068, 202511-38069) with 6.17 causes a hard system freeze.
The machine becomes unreachable by ping/SSH and requires a hard power cycle.

Root cause: commit d02198550423 ("x86/fpu: Improve crypto performance by making
kernel-mode FPU reliably usable in softirqs") changed kernel_fpu_begin() to use
local_bh_disable() instead of preempt_disable(). This sets SOFTIRQ_OFFSET in
preempt_count during EFI runtime calls, making in_interrupt() return true in
normal task context. The EFI graceful page fault handler
efi_crash_gracefully_on_page_fault() uses in_interrupt() to bail out for faults
in real interrupt context. With SOFTIRQ_OFFSET now set, the handler always bails
out, leaving firmware page faults unhandled. This escalates to die(), which also
sees in_interrupt() as true and calls panic("Fatal exception in interrupt"),
freezing the system.

[Fix]
Replace in_interrupt() with !in_task() in efi_crash_gracefully_on_page_fault().
This preserves the original intent of bailing out for interrupt or NMI faults,
while no longer falsely triggering from the FPU code path's local_bh_disable().

[Test Plan]
1. Boot affected HP machine with the patched 6.17-oem kernel.
2. Run:
   $ sudo fwts uefirttime

Without patch: system hard-freezes, requires power cycle.
With patch: fwts completes (pass or fail), system remains responsive.

[Where problems could occur]
The change is in the EFI page fault handler (arch/x86/platform/efi/quirks.c).

If in_task() incorrectly classified a real interrupt-context fault as task
context, the handler would try to process it as an EFI firmware fault instead
of letting the normal oops path handle it. This could mask real kernel bugs
during interrupt-context EFI faults, though such faults are extremely rare.

Also, a softirq taken between the efi_rts_work.efi_rts_id assignment and the
fpregs_lock() call could cause a page fault that gets misidentified. The use of
!in_task() (which incorporates in_serving_softirq()) handles this window
correctly.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: linux-oem-6.17 (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Noble)
     Importance: Undecided
         Status: Invalid

** Affects: linux-oem-6.17 (Ubuntu Noble)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Resolute)
     Importance: Undecided
         Status: New

** Affects: linux-oem-6.17 (Ubuntu Resolute)
     Importance: Undecided
         Status: Invalid

** Description changed:

- [Other Info]
- Upstream commit: 088f65e206087bf903743bd18417261d7a4c9644
- Fixes: d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs") SRU for oem-6.17.

** Also affects: linux-oem-6.17 (Ubuntu)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: linux-oem-6.17 (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Resolute)
   Importance: Undecided
       Status: New

** Also affects: linux-oem-6.17 (Ubuntu Resolute)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Noble)
       Status: New => Invalid

** Changed in: linux-oem-6.17 (Ubuntu Resolute)
       Status: New => Invalid

** Description changed:

  [Impact]
  Running `sudo fwts uefirttime` on HP systems (CID: 202511-38089, 202511-38088,
- 202511-38091, 202511-38068, 202511-38069) with 6.17.0-1018-oem causes a
- hard system freeze. The machine becomes unreachable by ping/SSH and requires a
- hard power cycle.
+ 202511-38091, 202511-38068, 202511-38069) with 6.17 causes a hard system freeze.
+ The machine becomes unreachable by ping/SSH and requires a hard power cycle.

-- 
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2153976

Title:
  Fix graceful fault handling after FPU softirq changes causes hard
  freeze on EFI runtime calls

Status in linux package in Ubuntu:
  New
Status in linux-oem-6.17 package in Ubuntu:
  New
Status in linux source package in Noble:
  Invalid
Status in linux-oem-6.17 source package in Noble:
  New
Status in linux source package in Resolute:
  New
Status in linux-oem-6.17 source package in Resolute:
  Invalid

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2153976/+subscriptions
