Friday

[Bug 2158377] [NEW] ext4: writeback causes kernel oops when low on space

Public bug reported:

SRU Justification:

[Impact]

The Noble upstream stable patchset 2026-05-28 (LP: #2154496) introduced refactoring patches targeting ext4 code, originating from the upstream stable branch linux-6.6.y. During regression testing of the generic Noble kernel, the LTP tests mmap14 and mmap16 were observed to fail, causing a kernel oops. Bisecting identified upstream commit f7d1331f16a8 ("ext4: get rid of ppath in ext4_ext_insert_extent()") as the cause. This commit was the first in the series of ext4 refactoring commits in v6.6.130.

The following stack trace was recorded on an arm64 OpenStack instance:

[ 1194.380996] Unable to handle kernel paging request at virtual address ffffffffffffffec
[ 1194.381993] Mem abort info:
[ 1194.382451] ESR = 0x0000000096000004
[ 1194.382950] EC = 0x25: DABT (current EL), IL = 32 bits
[ 1194.383615] SET = 0, FnV = 0
[ 1194.384067] EA = 0, S1PTW = 0
[ 1194.384534] FSC = 0x04: level 0 translation fault
[ 1194.385143] Data abort info:
[ 1194.385626] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 1194.386310] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 1194.387026] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 1194.391322] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000cc641000
[ 1194.393554] [ffffffffffffffec] pgd=0000000000000000, p4d=0000000000000000
[ 1194.394958] Internal error: Oops: 0000000096000004 [#1] SMP
[ 1194.395660] Modules linked in: brd overlay exfat bcachefs lz4hc_compress lz4_compress xfs sctp ip6_udp_tunnel udp_tunnel nfsd auth_rpcgss nfs_acl lockd grace sunrpc tls qrtr cfg80211 binfmt_misc nls_iso8859_1 input_leds sch_fq_codel dm_multipath efi_pstore nfnetlink dmi_sysfs qemu_fw_cfg ip_tables x_tables autofs4 hid_generic usbhid hid btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 virtio_gpu arm_smccc_trng sha1_ce virtio_rng virtio_dma_buf xhci_pci xhci_pci_renesas aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher [last unloaded: init_module(OE)]
[ 1194.405523] CPU: 1 PID: 95 Comm: kworker/u6:1 Tainted: G OE 6.8.0-132-generic #133-Ubuntu
[ 1194.407319] Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8~22.04.0~ppa3 05/14/2025
[ 1194.408502] Workqueue: writeback wb_workfn (flush-7:0)
[ 1194.409343] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 1194.410263] pc : ext4_ext_map_blocks+0x2a0/0xa18
[ 1194.410979] lr : ext4_ext_map_blocks+0x8e8/0xa18
[ 1194.411685] sp : ffff8000803db600
[ 1194.412280] x29: ffff8000803db6a0 x28: 0000000000000ee8 x27: ffff0000c3641280
[ 1194.413248] x26: ffff0000cb20b000 x25: ffffffffffffffe4 x24: 000000000000042f
[ 1194.414196] x23: ffff0000c286bc40 x22: ffff0000c36413a8 x21: ffff8000803db948
[ 1194.415141] x20: 0000000000000008 x19: 0000000000000000 x18: ffff800080381078
[ 1194.416131] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 1194.417083] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 1194.418051] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa6fb00b8227c
[ 1194.419026] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[ 1194.419949] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 1194.420897] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[ 1194.421851] Call trace:
[ 1194.422404] ext4_ext_map_blocks+0x2a0/0xa18
[ 1194.423106] ext4_map_blocks+0x1c4/0x650
[ 1194.423755] mpage_map_one_extent+0x7c/0x1e0
[ 1194.424456] mpage_map_and_submit_extent+0x94/0x428
[ 1194.425196] ext4_do_writepages+0x6c0/0x7d8
[ 1194.425883] ext4_writepages+0x88/0x128
[ 1194.426570] do_writepages+0x98/0x210
[ 1194.427222] __writeback_single_inode+0x50/0x390
[ 1194.427947] writeback_sb_inodes+0x230/0x4d8
[ 1194.428666] wb_writeback+0x128/0x410
[ 1194.429324] wb_do_writeback+0xb4/0x398
[ 1194.429998] wb_workfn+0x80/0x258
[ 1194.430604] process_one_work+0x17c/0x448
[ 1194.431289] worker_thread+0x1bc/0x3a0
[ 1194.431923] kthread+0xf8/0x110
[ 1194.432533] ret_from_fork+0x10/0x20
[ 1194.433178] Code: b9003fe2 f94023f9 b9000ea0 b4001a39 (79401337)
[ 1194.434077] ---[ end trace 0000000000000000 ]---

The crash is triggered during normal writeback when the filesystem is low on space. Any workload that writes to ext4 with unwritten (preallocated/fallocated) extents under space pressure can hit this. It is fatal to the affected writeback worker thread.

[Explanation]

The kernel crash (oops) occurs during ext4 writeback in `ext4_ext_map_blocks()` when the filesystem encounters an ENOSPC (no space left on device) condition while handling unwritten extents. The crash manifests as a page fault at address `0xffffffffffffffec` (which is `ERR_PTR(-ENOSPC) + 8`) on arm64, triggered from a writeback worker thread.

The ext4 extent code underwent a significant refactoring series that changed how extent path objects are passed between functions. Previously, functions used double-pointer (`struct ext4_ext_path **ppath`) semantics: on error, the callee would set `*ppath = NULL`, ensuring the caller never held a dangling or invalid pointer. The refactoring changed these functions to return the path directly (or `ERR_PTR` on error).

After this refactoring, functions like `ext4_ext_handle_unwritten_extents()` and `ext4_ext_insert_extent()` return `ERR_PTR(-ENOSPC)` on failure. The caller (`ext4_ext_map_blocks()`) stores this error pointer in its local `path` variable and jumps to its cleanup label, where `ext4_free_ext_path(path)` is called. However, `ext4_free_ext_path()` was never updated to handle `ERR_PTR` values; it only had a NULL check. As a result, it attempts to dereference the error pointer (specifically reading `path->p_depth`), causing the kernel page fault.

[Fix]

Cherry-pick this missing prerequisite patch: 6b854d552711 ("ext4: get rid of ppath in get_ext_path()").
This patch adds `IS_ERR_OR_NULL()` guards to both `ext4_ext_drop_refs()` and `ext4_free_ext_path()`. With the guards in place, these cleanup functions return immediately when handed an error pointer, which is necessary now that the refactored call chain can leave `path` set to an `ERR_PTR` value when reaching cleanup code.

[Test Plan]

Run the LTP tests mmap14 and mmap16, which reliably trigger the crash. The tests should complete successfully without triggering a kernel crash.

[Where problems could occur]

If problems with this fix were to occur, they would manifest as silent memory leaks in ext4 extent handling: specifically, any code path that previously relied on `ext4_free_ext_path()` to actually free a valid path object but now mistakenly passes an `IS_ERR()` value would silently skip the free and leak the already-freed (or never-allocated) path, making such bugs harder to detect rather than causing a visible crash. Conversely, if any code path were to accidentally store an `ERR_PTR` in a path variable that is later reused (rather than cleaned up), the `IS_ERR_OR_NULL()` guard would mask what should be a loud failure, potentially leading to subtle use-after-free or NULL-dereference bugs downstream when that variable is subsequently passed to `ext4_find_extent()` for recycling.

** Affects: linux (Ubuntu)
   Importance: Undecided
   Status: Invalid

** Affects: linux (Ubuntu Noble)
   Importance: High
   Assignee: Manuel Diewald (diewald)
   Status: In Progress

** Also affects: linux (Ubuntu Noble)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Noble)
   Assignee: (unassigned) => Manuel Diewald (diewald)

** Changed in: linux (Ubuntu Noble)
   Status: New => In Progress

** Changed in: linux (Ubuntu Noble)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
   Status: New => Invalid

--
You received this bug notification because you are subscribed to linux in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2158377

Title:
  ext4: writeback causes kernel oops when low on space

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Noble:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2158377/+subscriptions
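
The arithmetic behind the fault address given in [Explanation], and the effect of the `IS_ERR_OR_NULL()` guard described in [Fix], can be checked with a small user-space model. The sketch below is Python, not kernel code: it treats pointers as unsigned 64-bit integers, and every name in it (`err_ptr`, `is_err`, `free_path_null_check_only`, `free_path_guarded`, `P_DEPTH_OFFSET`) is hypothetical. The offset of 8 for `p_depth` is an assumption taken from the `+ 8` in the report, and `ENOSPC = 28` is the Linux errno value.

```python
"""User-space model of the kernel's ERR_PTR convention (not kernel code)."""

ENOSPC = 28              # Linux errno for "No space left on device"
MAX_ERRNO = 4095         # the kernel reserves the top 4095 addresses for errnos
MASK64 = (1 << 64) - 1   # pointers are modelled as unsigned 64-bit integers
P_DEPTH_OFFSET = 8       # assumed offset of p_depth, from the "+ 8" in the report


def err_ptr(errno_value: int) -> int:
    """Encode a negative errno as a pointer-sized integer, like ERR_PTR()."""
    return errno_value & MASK64


def is_err(ptr: int) -> bool:
    """True if ptr lies in the top MAX_ERRNO addresses, like IS_ERR()."""
    return ptr >= (-MAX_ERRNO & MASK64)


def is_err_or_null(ptr: int) -> bool:
    """Like IS_ERR_OR_NULL(): NULL (0) or an encoded errno."""
    return ptr == 0 or is_err(ptr)


def free_path_null_check_only(ptr: int):
    """Cleanup with only a NULL check.

    Returns the address it would read (path->p_depth), or None if it
    returns early without touching memory.
    """
    if ptr == 0:
        return None
    return (ptr + P_DEPTH_OFFSET) & MASK64


def free_path_guarded(ptr: int):
    """Cleanup with an IS_ERR_OR_NULL-style guard: error pointers are skipped."""
    if is_err_or_null(ptr):
        return None
    return (ptr + P_DEPTH_OFFSET) & MASK64


if __name__ == "__main__":
    path = err_ptr(-ENOSPC)  # what the caller holds after the failed call
    print(f"ERR_PTR(-ENOSPC)      = {path:#018x}")
    print(f"NULL-check-only reads = {free_path_null_check_only(path):#018x}")
    print(f"guarded cleanup reads = {free_path_guarded(path)}")
```

Run as a script, the model prints the encoded error pointer, which matches the value in register x25 of the trace (ffffffffffffffe4), and the address a NULL-check-only cleanup would read, which matches the reported fault address (ffffffffffffffec). With the guard, the cleanup returns before reading anything.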
