** Tags added: kernel-daily-bug

https://bugs.launchpad.net/bugs/2157755

Title:
  [linux 6.8.0-124-generic] Dentry-cache slab use-after-free under concurrent /proc lookup + process exit on high-density LXD hosts

Status in linux package in Ubuntu:
  New

Bug description:

**Package:** linux (Ubuntu Noble 24.04 LTS)
**Kernel:** 6.8.0-124-generic #124-Ubuntu SMP PREEMPT_DYNAMIC (Ubuntu 6.8.0-124.124, base 6.8.12)
**Severity:** High (repeatable hard crash, kernel panic / reboot, on production hosts roughly every 2–6 days)

---

## Summary

On heavily loaded LXD container hosts running 6.8.0-124-generic, the kernel panics repeatedly. The memory corruption is concentrated in the **dentry slab cache**.

A captured kdump vmcore shows a **use-after-free**. The dentry slab freelist is overwritten with non-pointer garbage. Concurrent threads doing `/proc` path lookups and process-exit dentry teardown then fault on the corrupted objects.

The corruption is confirmed in two ways:

- `crash`'s own slab validator: `kmem -s` reports `invalid freepointer` on dentry slabs.
- Three CPUs caught mid-fault in the same dentry alloc/free paths within a single vmcore.

The same workload on **6.8.0-90-generic has not exhibited the crash**. The controlled A/B test below is still pending a final result. This points to one of two causes:

- a regression in the dentry/procfs path between -90 and -124, or
- a latent race newly exposed by changes in that range.
---

## Environment

- **Hardware:** Supermicro SYS-611C-TN4R / X13DDW-A, BIOS 2.7 (07/23/2025), dual-socket, 64 logical CPUs, 256 GB RAM
- **Root/storage:** OpenZFS (zfs 2.2.2-0ubuntu9.4); kernel tainted `PO` (out-of-tree + proprietary ZFS module)
- **Workload:** ~90 LXD system containers (managed WordPress hosting)
  - very high concurrent fork/exit and `/proc` scanning from per-container nginx / php-fpm / mariadbd / redis
  - host-side monitoring (`ps`) and backups (`tar`)
- **Crash cadence:** every ~2–6 days; uptime at this capture was 8 days
- **EDAC/MCE:** clean, so this is not a hardware memory fault
  - `ras-mc-ctl --summary` / `--errors` show no memory or PCIe errors
  - IPMI SEL is clean apart from PSU/chassis noise

---

## Impact

Each event is a hard kernel panic. With `panic_on_oops=1` / `panic=10` the host self-reboots, but every crash is a full outage of ~90 tenant containers.

The corruption surfaces in unrelated subsystems: dentry teardown, dentry allocation, and socket/pid allocation. This is because it is a slab freelist UAF, where the faulting site is never the bug site. As a result, it looks like random instability until the dump is examined.
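The point that "the faulting site is never the bug site" can be illustrated with a toy freelist model. This is a sketch, not SLUB's implementation. It ignores:

- per-CPU slabs,
- where SLUB actually places the free pointer inside an object,
- `CONFIG_SLAB_FREELIST_HARDENED` pointer encoding.

The base address, object size and the helper names (`ToySlab`, `demo`) are hypothetical. A stale write into a freed object does not fault. The fault happens later, in whichever unrelated caller next pops the freelist. It is a general protection fault rather than a page fault because the garbage value is not a canonical x86-64 address.

The check assumes 57-bit virtual addresses (5-level paging). The kernel pointers in the crash analysis, such as `R14: ff1a80bec01f6800` and the `ffd8...` slab addresses, are canonical only with 57 address bits. That suggests this host runs with 5-level paging.

```python
"""Toy slab freelist showing why a use-after-free faults far from its cause.

Illustrative only, not SLUB's real code: ignores per-CPU slabs, the real
placement of the free pointer inside the object, and pointer hardening.
"""

VA_BITS = 57  # assumption: 5-level paging (see lead-in)


def is_canonical(addr: int, va_bits: int = VA_BITS) -> bool:
    """x86-64 rule: bits 63..(va_bits - 1) must be all zeros or all ones."""
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (65 - va_bits)) - 1


class ToySlab:
    """Each free object stores the address of the next free object in its first word."""

    def __init__(self, base: int, obj_size: int, nr_objs: int) -> None:
        self.first_word = {}  # object address -> contents of its first 8 bytes
        self.freelist = base  # address of the first free object (0 = empty)
        for i in range(nr_objs):
            addr = base + i * obj_size
            self.first_word[addr] = addr + obj_size if i + 1 < nr_objs else 0

    def alloc(self) -> int:
        obj = self.freelist
        if obj == 0:
            raise MemoryError("toy slab exhausted")
        if not is_canonical(obj):
            # The CPU refuses to dereference a non-canonical address (#GP).
            raise RuntimeError(f"general protection fault: non-canonical 0x{obj:x}")
        self.freelist = self.first_word[obj]  # follow the free pointer stored in the object
        return obj

    def free(self, obj: int) -> None:
        self.first_word[obj] = self.freelist  # freed object now holds the old head
        self.freelist = obj

    def write_first_word(self, obj: int, value: int) -> None:
        """Any store to the object's first word; harmless only while it is allocated."""
        self.first_word[obj] = value


def demo() -> str:
    slab = ToySlab(base=0xFF11000000100000, obj_size=192, nr_objs=4)  # hypothetical values
    a = slab.alloc()
    slab.free(a)                                  # correct free...
    slab.write_first_word(a, 0x627117ED820FC5A9)  # ...then a stale write: the BUG site, no fault
    slab.alloc()                                  # hands out `a` again, loads the garbage as next
    try:
        slab.alloc()                              # an unrelated caller: the FAULT site
    except RuntimeError as exc:
        return str(exc)
    return "no fault"


if __name__ == "__main__":
    print(demo())
```

Running it prints `general protection fault: non-canonical 0x627117ed820fc5a9`. The fault is attributed to the third `alloc()` call, while the actual bug was the earlier `write_first_word()` on a freed object. The garbage value used here is the `RDI` value from the `tar` GPF. In the real kernel the stored pointer is additionally encoded by freelist hardening, which this toy omits.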
---

## Crash analysis (from kdump vmcore, full matching dbgsym)

Panic task and primary oops:

```
PANIC: "Oops: 0000 [#1] PREEMPT SMP NOPTI"
COMMAND: "ps"
CPU: 37
[exception RIP: dentry_unlink_inode+251]  (NULL deref; RAX/RDX/RSI/RDI = 0)
 #8 dentry_unlink_inode
 #9 __dentry_kill
#10 shrink_dentry_list
#11 shrink_dcache_parent
#12 d_invalidate
#13 lookup_fast
#14 walk_component
#15 path_lookupat
#16 filename_lookup
#17 vfs_statx
#18 vfs_fstatat
#19 __do_sys_newfstatat
```

The corrupted dentry is a procfs pid entry, `/proc/<pid>/cmdline`:

```
struct dentry {
  d_name.name     = "cmdline"
  d_iname         = "cmdline"
  d_inode         = 0x0                    <-- already unlinked
  d_op            = pid_dentry_operations
  d_lockref.count = -128 (0xffffff80)      <-- dead marker written by lockref_mark_dead() in __dentry_kill()
}
```

`crash`'s slab validator independently flags the dentry cache as corrupt. No `slub_debug` was active at capture, so this is structural freelist validation:

```
kmem: dentry: slab: ffd8d770cc2fe300 invalid freepointer: 7d6cf1f4997700d6
kmem: dentry: slab: ffd8d770cc1abe00 invalid freepointer: 7d6cf1f494205b56
kmem: kmalloc-rcl-64: slab: ffd8d770cc26a700 invalid freepointer: 55ab8f7b3288b69a
```

Three CPUs were simultaneously in dentry alloc/free paths at panic time, capturing the race in a single snapshot:

| CPU | Task | Operation | Fault |
|-----|------|-----------|-------|
| 37 | ps | dentry teardown: `dentry_unlink_inode ← __dentry_kill ← shrink_dentry_list ← d_invalidate ← lookup_fast` (`/proc` stat walk) | NULL deref on already-freed dentry (panicked first) |
| 4 | ps | dentry teardown: `dentry_unlink_inode ← __dentry_kill ← dput ← lookup_fast ← open_last_lookups ← openat` | same fault site; spinning in `native_queued_spin_lock_slowpath` |
| 45 | tar | dentry **allocation**: `kmem_cache_alloc_lru ← __d_alloc ← d_alloc_parallel ← __lookup_slow` (stat walk) | GPF on corrupted freelist pointer; R14 = dentry cache address |

The `tar` GPF register state shows the corrupted pointer being consumed from the dentry slab:

```
[exception RIP: kmem_cache_alloc_lru+221]
general protection fault (non-canonical address)
RAX: 627117ed820fc609
RDI: 627117ed820fc5a9   <-- garbage freelist pointer
R14: ff1a80bec01f6800   <-- dentry kmem_cache
```

This matches earlier pstore-only captures from the same host. In those, the first event was consistently a GPF in `kmem_cache_alloc_lru` on a non-canonical freelist pointer. It was reached via `__d_alloc`, `alloc_pid` or `sock_alloc_file`, all dentry/slab allocations on the fork/exit hot paths.

---

## What is ruled out

- **Not ZFS.**
  - All ZFS caches (`zfs_znode_cache`, `dnode_t`, `dmu_buf_impl_t`, `arc_buf_*`) are intact in `kmem -s`, with no `invalid freepointer`, despite millions of live objects.
  - ZFS appears only as a passing frame on the clone path.
  - The kernel is ZFS-tainted, noted for completeness. The corrupted cache is the core VFS `dentry` cache, not any ZFS slab.
- **Not the AppArmor notification CVEs (USN-8373-1 / CVE-2026-47326..47328).**
  - `apparmor_auditcache` is clean/empty.
  - The AppArmor notification interface is not in active use on these hosts: no `aa-notify` consumer, and `features/policy/notify` is empty.
  - The fault is in core procfs/VFS dentry handling (`pid_dentry_operations`), unrelated to AppArmor.
- **Not hardware.**
  - EDAC/MCE/SEL are clean.
  - The corruption is structurally consistent (always the dentry slab, always teardown/alloc paths), unlike the random scatter of failing DIMMs.

---

## A/B test (kernel version isolation)

Two near-identical, heavily loaded hosts that both crashed on -124:

- **Host A (vps232):** kept on **6.8.0-124**, kdump-armed; used to capture this vmcore.
- **Host B (vps193):** rolled back to **6.8.0-90-generic** with the same workload (~90 containers), as the control.

Expected discriminator within one crash interval: if Host B on -90 stays up while Host A on -124 keeps crashing, the regression is localized to the -90→-124 range. The result will be added as a follow-up comment.
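The "within one crash interval" criterion can be made more quantitative under a simplifying assumption that the report does not establish. Assume that crashes on an affected kernel arrive as a memoryless (Poisson) process, with a mean interval somewhere in the observed 2–6 day range. Under that assumption, a host that still carries the bug survives `t` crash-free days with probability `exp(-t/m)`. A short quiet stretch on Host B is therefore weak evidence on its own. The function names below are illustrative.

```python
import math


def p_crash_free(days: float, mean_interval_days: float) -> float:
    """P(a still-affected host runs `days` days without crashing), assuming
    crashes form a Poisson process with the given mean interval (an assumption)."""
    return math.exp(-days / mean_interval_days)


def days_needed(confidence: float, mean_interval_days: float) -> float:
    """Crash-free days after which an affected host would have crashed with
    probability `confidence`: solves 1 - exp(-t / m) = confidence for t."""
    return -mean_interval_days * math.log(1.0 - confidence)


if __name__ == "__main__":
    for mean in (2.0, 4.0, 6.0):  # ends and midpoint of the observed 2-6 day cadence
        print(
            f"mean interval {mean:.0f} d: "
            f"P(6 quiet days) = {p_crash_free(6.0, mean):.2f}, "
            f"95% confidence after {days_needed(0.95, mean):.1f} crash-free days"
        )
```

With a 6-day mean, a still-affected host goes 6 days without crashing with probability e^-1, about 37%. Roughly 18 crash-free days (6 × ln 20) are needed before the chance of such a quiet stretch drops to 5%. Crash clustering or load variation would break the memoryless assumption, so treat these numbers as rough guidance only.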
Note: 6.8.0-124.124 is the newest generic kernel currently published for Noble, so there is no newer kernel to test against. Rolling back to -90 is the only available containment.

---

## Reproduction conditions

This has not yet been reduced to a minimal reproducer, but it is reliably reproduced in production with:

- a high logical-CPU-count host (64) with high process density (~90 LXD containers);
- sustained concurrent `/proc` traversal (host monitoring running `ps`/stat loops), **plus** continuous process churn (per-container php-fpm/nginx fork+exit), **plus** filesystem tree walks (`tar` backups);
- in kernel terms: heavy concurrent `__d_alloc` (lookup) and `__dentry_kill`/`proc_flush_pid` (exit + invalidate) against the shared dentry cache.

Mean time to corruption: ~2–6 days of normal production load.

---

## Artifacts available on request

- Full kdump vmcore (`/var/crash/...`, ~17 GB, PARTIAL DUMP via makedumpfile) captured against `linux-image-unsigned-6.8.0-124-generic-dbgsym` 6.8.0-124.124 (matching build-id)
- `crash` session output: `bt`, `bt -a` (all 64 CPUs), `kmem -s`, `kmem -S dentry`, `struct dentry` of the corrupted object, `log`
- Five prior pstore dmesg captures from the same host showing the recurring signature
- apport-collected host/config data (will attach via `ubuntu-bug linux`)

## Planned follow-up

Host A is being rebooted with `slub_debug=FZP` to catch the corruption close to the bad free (red-zone/poison validation).

SLUB records the allocation and free call sites only when the `U` (user tracking) flag is set. `slub_debug=FZPU` is therefore needed for the report to name the exact freeing path.

That trace will be attached as a follow-up comment once the next event is captured.

The full 17 GB kdump vmcore (PARTIAL DUMP, makedumpfile) was captured against linux-image-unsigned-6.8.0-124-generic-dbgsym 6.8.0-124.124 (matching build-id). It is retained on the affected host and available to the assigned engineer on request.
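As a possible starting point for reducing the reproduction conditions to a minimal reproducer, the three concurrent activities can be approximated at small scale. This is a hypothetical, untested load generator, not something the report ran. All function names and defaults are illustrative.

Stat-ing `/proc/<pid>/cmdline` for pids that may be exiting is intended to exercise the same revalidation path as CPU 37's backtrace (`vfs_statx` → `lookup_fast` → `d_invalidate`). Short-lived children exercise `proc_flush_pid` on exit. Run it only on a disposable test host or VM; at this scale it is not expected to trigger the bug quickly.

```python
"""Hypothetical small-scale load generator mimicking the reported workload mix.

Untested against this bug: a ps-like /proc stat loop, short-lived process
churn, and a tar-like tree walk, all running concurrently.
"""
import os
import subprocess
import sys
import threading
import time


def proc_stat_loop(stop: threading.Event, counts: dict, lock: threading.Lock) -> None:
    """ps-like: stat /proc/<pid>/cmdline for every pid currently listed."""
    while not stop.is_set():
        try:
            pids = [p for p in os.listdir("/proc") if p.isdigit()]
        except OSError:
            return  # no usable procfs (e.g. not Linux): nothing to exercise
        for pid in pids:
            try:
                os.stat(f"/proc/{pid}/cmdline")
            except OSError:
                pass  # pid exited after listdir(): its /proc dentries are being invalidated
        with lock:
            counts["proc_scans"] += 1


def process_churn(stop: threading.Event, counts: dict, lock: threading.Lock) -> None:
    """php-fpm-like: spawn and reap short-lived children (each exit flushes /proc/<pid>)."""
    while not stop.is_set():
        subprocess.run([sys.executable, "-c", "pass"], check=False)
        with lock:
            counts["children"] += 1


def tree_walk(stop: threading.Event, counts: dict, lock: threading.Lock, root: str) -> None:
    """tar-like: lstat every file under `root`, forcing dentry lookups and allocations."""
    if not os.path.isdir(root):
        return
    while not stop.is_set():
        for dirpath, _dirnames, filenames in os.walk(root):
            if stop.is_set():
                return
            for name in filenames:
                try:
                    os.lstat(os.path.join(dirpath, name))
                except OSError:
                    pass
            with lock:
                counts["dirs_walked"] += 1


def run(duration_s: float = 60.0, walkers: int = 8, churners: int = 8, tree_root: str = "/usr") -> dict:
    """Run all three activities concurrently for `duration_s` seconds; return counters."""
    stop, lock = threading.Event(), threading.Lock()
    counts = {"proc_scans": 0, "children": 0, "dirs_walked": 0}
    targets = [proc_stat_loop] * walkers + [process_churn] * churners
    threads = [threading.Thread(target=t, args=(stop, counts, lock)) for t in targets]
    threads.append(threading.Thread(target=tree_walk, args=(stop, counts, lock, tree_root)))
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()
    return counts


if __name__ == "__main__":
    print(run())
```

CPython threads release the GIL during `os.stat()`/`os.lstat()` and while waiting for children, so the syscalls themselves overlap. Loading all 64 CPUs would still require several copies running in parallel, for example one per container. That is a suggestion, not something the report has tested.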
ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: linux-image-6.8.0-124-generic 6.8.0-124.124
ProcVersionSignature: Ubuntu 6.8.0-124.124-generic 6.8.12
Uname: Linux 6.8.0-124-generic x86_64
NonfreeKernelModules: zfs
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116,  1 Jun 21 19:29 seq
 crw-rw---- 1 root audio 116, 33 Jun 21 19:29 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.28.1-0ubuntu3.8
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: pass
Date: Sun Jun 21 20:43:01 2026
InstallationDate: Installed on 2025-12-01 (202 days ago)
InstallationMedia: Ubuntu-Server 24.04.3 LTS "Noble Numbat" - Release amd64 (20250805.1)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 002: ID 1d6b:0107 Linux Foundation USB Virtual Hub
 Bus 001 Device 003: ID 0557:9241 ATEN International Co., Ltd SMCI HID KM
 Bus 001 Device 004: ID 0b1f:03ee Insyde Software Corp. RNDIS/Ethernet Gadget
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
MachineType: Supermicro SYS-611C-TN4R
PciMultimedia:
 
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm
 XDG_RUNTIME_DIR=<set>
ProcFB: 0 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.8.0-124-generic root=UUID=3e867032-21c4-416e-b45f-a17d1dae6788 ro crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M panic_on_oops=1 panic=10
RelatedPackageVersions:
 linux-restricted-modules-6.8.0-124-generic N/A
 linux-backports-modules-6.8.0-124-generic  N/A
 linux-firmware                             20240318.git3b128b60-0ubuntu2.26
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/23/2025
dmi.bios.release: 5.32
dmi.bios.vendor: American Megatrends International, LLC.
dmi.bios.version: 2.7
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: X13DDW-A
dmi.board.vendor: Supermicro
dmi.board.version: 1.01
dmi.chassis.asset.tag: Chassis Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias: dmi:bvnAmericanMegatrendsInternational,LLC.:bvr2.7:bd07/23/2025:br5.32:svnSupermicro:pnSYS-611C-TN4R:pvr0123456789:rvnSupermicro:rnX13DDW-A:rvr1.01:cvnSupermicro:ct1:cvr0123456789:skuTobefilledbyO.E.M.:
dmi.product.family: Family
dmi.product.name: SYS-611C-TN4R
dmi.product.sku: To be filled by O.E.M.
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro