The review has been sent to the mailing list:
https://lists.ubuntu.com/archives/kernel-team/2025-January/156160.html
--
https://bugs.launchpad.net/bugs/2089318
Title:
kernel hard lockup in cgroups during eBPF workload
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Jammy:
In Progress
Bug description:
SRU Justification:
[Impact]
There is a kernel hard lockup where all CPUs are stuck acquiring an
already-locked spinlock (css_set_lock) within the cgroup subsystem
when the user is running a certain eBPF program.
This has been hit in focal 5.15 backport kernels.
[Fix]
The bug was introduced in 5.15 via commit 74e4b956eb1c and fixed in 5.16 via 46307fd6e27a.
[Test Plan]
This change has been tested by the LP user maxwolffe who opened the
ticket and helped with the investigation.
[Where problems could occur]
Due to the changes in the cgroup code between 5.15 and 5.16, the fix
was not a clean cherry-pick and had to be adapted. This could
cause issues in the future if further backports to the 5.15
kernel touch the same area of the cgroup code.
Original bug description follows
----------------------------------
Hi friends,
We hit a kernel hard lockup where all CPUs are stuck acquiring an
already-locked spinlock (css_set_lock) within the cgroup subsystem.
Below are the call stacks from a memory dump of a two-core system
taken on Ubuntu 20.04 (5.15 kernel) on AWS, but the same issue occurs
on Azure and GCP too. It happens non-deterministically (less than 1%)
and can occur at any point during VM execution.
We suspect it's a deadlock triggered by some race condition, but we
don't know for sure.
```
PID: 21079 TASK: ffff91fdcd1dc000 CPU: 0 COMMAND: "sh"
#0 [fffffe7127850cb8] machine_kexec at ffffffffadc92680
#1 [fffffe7127850d18] __crash_kexec at ffffffffadda0b9f
#2 [fffffe7127850de0] panic at ffffffffae8f56be
#3 [fffffe7127850e70] unknown_nmi_error.cold at ffffffffae8eb4c8
#4 [fffffe7127850e90] default_do_nmi at ffffffffae99c639
#5 [fffffe7127850eb8] exc_nmi at ffffffffae99c7db
#6 [fffffe7127850ef0] end_repeat_nmi at ffffffffaea017f3
[exception RIP: native_queued_spin_lock_slowpath+63]
RIP: ffffffffadd40eff RSP: ffffa1f68589fc60 RFLAGS: 00000002 (interrupt disabled!!)
RAX: 0000000000000001 RBX: ffffffffb0ea5804 RCX: ffff91fb597c8980
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffb0ea5804
RBP: ffffa1f68589fc88 R8: 0000000000005259 R9: 00000000597c8980
R10: 0000000000000000 R11: 0000000000000000 R12: ffffa1f68589fdf8
R13: ffff91fdcd1d8000 R14: 0000000000004100 R15: ffff91fdcd1d8000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#7 [ffffa1f68589fc60] native_queued_spin_lock_slowpath at ffffffffadd40eff
#8 [ffffa1f68589fc90] _raw_spin_lock_irq at ffffffffae9af19a
#9 [ffffa1f68589fca0] cgroup_can_fork at ffffffffaddb0de8
#10 [ffffa1f68589fce8] copy_process at ffffffffadcc1938
#11 [ffffa1f68589fcf0] filemap_map_pages at ffffffffadeb68db
#12 [ffffa1f68589fdf0] __x64_sys_vfork at ffffffffadcc2a20
#13 [ffffa1f68589fe70] x64_sys_call at ffffffffadc068a9
#14 [ffffa1f68589fe80] do_syscall_64 at ffffffffae99a9e4
#15 [ffffa1f68589fec0] exit_to_user_mode_prepare at ffffffffadd725ad
#16 [ffffa1f68589ff00] irqentry_exit_to_user_mode at ffffffffae99f43e
#17 [ffffa1f68589ff10] irqentry_exit at ffffffffae99f46d
#18 [ffffa1f68589ff18] clear_bhb_loop at ffffffffaea018c5
#19 [ffffa1f68589ff28] clear_bhb_loop at ffffffffaea018c5
#20 [ffffa1f68589ff38] clear_bhb_loop at ffffffffaea018c5
#21 [ffffa1f68589ff50] entry_SYSCALL_64_after_hwframe at ffffffffaea00124
RIP: 00007fddfa4cebcc RSP: 00007fffaa741990 RFLAGS: 00000202
RAX: ffffffffffffffda RBX: 000055ea66750428 RCX: 00007fddfa4cebcc
RDX: 0000000000000000 RSI: 00007fffaa7419c0 RDI: 000055ea663c8866
RBP: 0000000000000003 R8: 00007fffaa7419c0 R9: 000055ea667505f0
R10: 0000000000000008 R11: 0000000000000202 R12: 00007fffaa7419c0
R13: 00007fffaa741ae0 R14: 0000000000000000 R15: 000055ea663de810
ORIG_RAX: 000000000000003a CS: 0033 SS: 002b
PID: 20304 TASK: ffff91fb05440000 CPU: 1 COMMAND: "Writer:Driver>C"
#0 [fffffe6c293d3e10] crash_nmi_callback at ffffffffadc81ec0
#1 [fffffe6c293d3e48] nmi_handle at ffffffffadc49b03
#2 [fffffe6c293d3e90] default_do_nmi at ffffffffae99c5a5
#3 [fffffe6c293d3eb8] exc_nmi at ffffffffae99c7db
#4 [fffffe6c293d3ef0] end_repeat_nmi at ffffffffaea017f3
[exception RIP: native_queued_spin_lock_slowpath+63]
RIP: ffffffffadd40eff RSP: ffffa1f6853afd00 RFLAGS: 00000002 (interrupt disabled!!)
RAX: 0000000000000001 RBX: ffffffffb0ea5804 RCX: ffff91fa1d0aee00
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffb0ea5804
RBP: ffffa1f6853afd28 R8: 000000000000525a R9: 000000001d0aee00
R10: 0000000000000000 R11: 0000000000000000 R12: ffffa1f6853afe98
R13: ffff91fd8eeea000 R14: 00000000003d0f00 R15: ffff91fd8eeea000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ffffa1f6853afd00] native_queued_spin_lock_slowpath at ffffffffadd40eff
#6 [ffffa1f6853afd30] _raw_spin_lock_irq at ffffffffae9af19a
#7 [ffffa1f6853afd40] cgroup_can_fork at ffffffffaddb0de8
#8 [ffffa1f6853afd88] copy_process at ffffffffadcc1938
#9 [ffffa1f6853afe20] kernel_clone at ffffffffadcc262d
#10 [ffffa1f6853afe90] __do_sys_clone at ffffffffadcc2a9d
#11 [ffffa1f6853aff10] __x64_sys_clone at ffffffffadcc2ae5
#12 [ffffa1f6853aff20] x64_sys_call at ffffffffadc05579
#13 [ffffa1f6853aff30] do_syscall_64 at ffffffffae99a9e4
#14 [ffffa1f6853aff50] entry_SYSCALL_64_after_hwframe at ffffffffaea00124
RIP: 00007f0d8bcac9f6 RSP: 00007f0cfabfcc38 RFLAGS: 00000206
RAX: ffffffffffffffda RBX: 00007f0cfabfcc90 RCX: 00007f0d8bcac9f6
RDX: 00007f0ced3ff910 RSI: 00007f0ced3feef0 RDI: 00000000003d0f00
RBP: ffffffffffffff80 R8: 00007f0ced3ff640 R9: 00007f0ced3ff640
R10: 00007f0ced3ff910 R11: 0000000000000206 R12: 00007f0ced3ff640
R13: 0000000000000016 R14: 00007f0d8bc1b7d0 R15: 00007f0cfabfcdf0
ORIG_RAX: 0000000000000038 CS: 0033 SS: 002b
```
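For anyone triaging a similar dump, here is a rough sketch of a script that summarizes this kind of `bt` output. It is not part of the original report. It assumes the text layout shown above, and it assumes that RDI at the interrupted instruction still holds the first argument of native_queued_spin_lock_slowpath (the lock pointer), which is not guaranteed in general. The names `summarize`, `group_by_lock` and `SAMPLE` are made up for illustration; `SAMPLE` is a trimmed excerpt of the stacks above.

```python
import re
from collections import defaultdict

# Patterns for the backtrace layout shown in this report (an assumption,
# other tool versions may print these lines differently).
TASK_RE = re.compile(r'^PID:\s*(\d+)\s+TASK:\s*[0-9a-f]+\s+CPU:\s*(\d+)\s+COMMAND:\s*"(.*)"')
EXC_RE = re.compile(r'\[exception RIP: ([\w.]+)\+\d+\]')
FRAME_RE = re.compile(r'^#\d+\s+\[[0-9a-f]+\]\s+([\w.]+)\s+at\s+[0-9a-f]+')
RDI_RE = re.compile(r'\bRDI:\s*([0-9a-f]+)')
LOCK_PREFIXES = ("native_queued_spin_lock", "_raw_spin_lock")

SAMPLE = """\
PID: 21079 TASK: ffff91fdcd1dc000 CPU: 0 COMMAND: "sh"
#6 [fffffe7127850ef0] end_repeat_nmi at ffffffffaea017f3
[exception RIP: native_queued_spin_lock_slowpath+63]
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffb0ea5804
--- <NMI exception stack> ---
#7 [ffffa1f68589fc60] native_queued_spin_lock_slowpath at ffffffffadd40eff
#8 [ffffa1f68589fc90] _raw_spin_lock_irq at ffffffffae9af19a
#9 [ffffa1f68589fca0] cgroup_can_fork at ffffffffaddb0de8
RDX: 0000000000000000 RSI: 00007fffaa7419c0 RDI: 000055ea663c8866
PID: 20304 TASK: ffff91fb05440000 CPU: 1 COMMAND: "Writer:Driver>C"
#4 [fffffe6c293d3ef0] end_repeat_nmi at ffffffffaea017f3
[exception RIP: native_queued_spin_lock_slowpath+63]
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffffb0ea5804
--- <NMI exception stack> ---
#5 [ffffa1f6853afd00] native_queued_spin_lock_slowpath at ffffffffadd40eff
#6 [ffffa1f6853afd30] _raw_spin_lock_irq at ffffffffae9af19a
#7 [ffffa1f6853afd40] cgroup_can_fork at ffffffffaddb0de8
RDX: 00007f0ced3ff910 RSI: 00007f0ced3feef0 RDI: 00000000003d0f00
"""


def summarize(bt_text):
    """Return one dict per task: pid, cpu, comm, exception symbol, RDI, caller."""
    tasks = []
    cur = None
    for raw in bt_text.splitlines():
        line = raw.strip()
        m = TASK_RE.match(line)
        if m:
            cur = {"pid": int(m.group(1)), "cpu": int(m.group(2)),
                   "comm": m.group(3), "exception": None, "rdi": None,
                   "caller": None, "past_nmi": False}
            tasks.append(cur)
            continue
        if cur is None:
            continue
        m = EXC_RE.search(line)
        if m:
            cur["exception"] = m.group(1)
            continue
        # Only the first RDI after the exception line is the kernel-mode value;
        # the later register dump belongs to the user-mode context.
        if cur["exception"] and cur["rdi"] is None:
            m = RDI_RE.search(line)
            if m:
                cur["rdi"] = m.group(1)
                continue
        if line.startswith("---"):
            cur["past_nmi"] = True
            continue
        m = FRAME_RE.match(line)
        if m and cur["past_nmi"] and cur["caller"] is None:
            # First frame below the NMI stack that is not a locking helper.
            if not m.group(1).startswith(LOCK_PREFIXES):
                cur["caller"] = m.group(1)
    return tasks


def group_by_lock(tasks):
    """Group tasks interrupted in a spinlock path by their RDI value."""
    groups = defaultdict(list)
    for t in tasks:
        if t["exception"] and t["exception"].startswith(LOCK_PREFIXES) and t["rdi"]:
            groups[t["rdi"]].append(t)
    return dict(groups)


if __name__ == "__main__":
    for addr, waiters in group_by_lock(summarize(SAMPLE)).items():
        print(f"candidate lock address {addr}:")
        for t in waiters:
            print(f"  CPU {t['cpu']} pid {t['pid']} ({t['comm']}) via {t['caller']}")
```

If every CPU lands in the same group, as both do in this dump, that is consistent with all of them spinning on one lock. It does not show who holds the lock.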
Environment
```
$ uname -a
Linux ip-172-31-16-171 5.15.0-1072-aws #78~20.04.1-Ubuntu SMP Wed Oct 9 15:30:47 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 106
model name : Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
stepping : 6
microcode : 0xd0003e8
cpu MHz : 2900.036
cache size : 55296 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 27
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd ida arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid md_clear flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs mmio_stale_data eibrs_pbrsb gds bhi
bogomips : 5800.07
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
```
We see this very infrequently, but have experienced it on a variety of
instance types: at least r6i.large, r6i.xlarge, and r6i.2xlarge.
Thanks!