Thursday

[Bug 2086198] [NEW] If a task in a CGroup with oom_score_adj set to -1000 goes over the memory limit, it is not killed but CPU usage of the machine goes high.

Public bug reported:

The oom-killer goes into a loop if a task in a memory cgroup with oom_score_adj set to -1000 goes over the memory limit.
The task (python3 in this case) is not killed, and CPU usage on the machine goes high, although the machine remains responsive.


Setup details
-------------

root@vm1:~# cat /etc/os-release
PRETTY_NAME="Ubuntu 24.10"
NAME="Ubuntu"
VERSION_ID="24.10"
VERSION="24.10 (Oracular Oriole)"
VERSION_CODENAME=oracular
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=oracular
LOGO=ubuntu-logo


root@vm1:~# uname -a
Linux vm1 6.11.0-8-generic #8-Ubuntu SMP PREEMPT_DYNAMIC Mon Sep 16 13:41:20 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
root@vm1:~#


How to reproduce
----------------

This was done on a VM with 1 GB of RAM, 2 CPU cores, and no swap:
root@vm1:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           960Mi       287Mi       203Mi       1.2Mi       608Mi       673Mi
Swap:             0B          0B          0B
root@vm1:~# nproc
2
root@vm1:~#


On terminal 1,

root@vm1:~# python3
Python 3.12.7 (main, Oct 3 2024, 15:15:22) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>


On terminal 2,

root@vm1:~# mkdir /sys/fs/cgroup/testcg
root@vm1:~# ps aux|grep python # get the PID of python3 (2536 in this case)
root@vm1:~# echo -1000 > /proc/2536/oom_score_adj
root@vm1:~# echo 2536 > /sys/fs/cgroup/testcg/cgroup.procs
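
The terminal 2 steps above can be sketched as a small Python helper. This is only a sketch under stated assumptions: the `memory.max` write and its 10 MiB value do not appear in the transcript and are inferred from the `limit 10240kB` line in the dmesg output. The function name `confine_task` and its `proc_root` / `cgroup_root` parameters are hypothetical, added so the steps can be dry-run against a scratch directory instead of the real `/proc` and `/sys/fs/cgroup` (which need root).

```python
import tempfile
from pathlib import Path


def confine_task(pid, cgroup_name, limit_bytes,
                 proc_root="/proc", cgroup_root="/sys/fs/cgroup"):
    """Mirror the manual steps: exempt `pid` from the OOM killer, then
    move it into a memory-limited cgroup (cgroup v2 layout assumed)."""
    cg = Path(cgroup_root) / cgroup_name
    cg.mkdir(exist_ok=True)                      # mkdir /sys/fs/cgroup/testcg

    # -1000 is the minimum oom_score_adj and disables OOM killing of the task.
    (Path(proc_root) / str(pid) / "oom_score_adj").write_text("-1000\n")

    # Assumption: the limit is set via memory.max; not shown in the transcript.
    (cg / "memory.max").write_text(f"{limit_bytes}\n")

    # echo <pid> > cgroup.procs
    (cg / "cgroup.procs").write_text(f"{pid}\n")
    return cg


# Dry run against a scratch directory; no root needed.
scratch = Path(tempfile.mkdtemp())
(scratch / "proc" / "2536").mkdir(parents=True)
(scratch / "cgroup").mkdir()
cg = confine_task(2536, "testcg", 10 * 1024 * 1024,
                  proc_root=scratch / "proc", cgroup_root=scratch / "cgroup")
print(sorted(p.name for p in cg.iterdir()))
```

Calling `confine_task(2536, "testcg", 10 * 1024 * 1024)` as root with the default paths would perform the same writes against the live system.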

On terminal 1,
>>> c2 = {i: i**4 for i in range(6000100)}
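
To see why this one-liner outgrows a small memory limit, a scaled-down sketch can measure a sample and extrapolate. The sample size `n` is an assumption chosen to keep the run short, and the extrapolation is rough: dict tables grow in steps and larger integers take more bytes, so treat the result as an order-of-magnitude figure only.

```python
import sys

n = 100_000  # scaled-down stand-in for range(6000100)
sample = {i: i**4 for i in range(n)}

table_bytes = sys.getsizeof(sample)  # the dict's hash table only
value_bytes = sum(sys.getsizeof(v) for v in sample.values())  # int objects held as values

per_entry = (table_bytes + value_bytes) / n
estimate_mib = per_entry * 6_000_100 / 2**20

print(f"~{per_entry:.0f} bytes per entry, ~{estimate_mib:.0f} MiB extrapolated")
```

Key objects are not counted here, so the real footprint is higher than the printed estimate.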


Logs
----

dmesg continuously logs the following message, along with other traces.
The collected dmesg output is attached.


[Thu Oct 31 13:11:56 2024] python3 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=-1000
[Thu Oct 31 13:11:56 2024] CPU: 1 UID: 0 PID: 2653 Comm: python3 Not tainted 6.11.0-8-generic #8-Ubuntu
[Thu Oct 31 13:11:56 2024] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[Thu Oct 31 13:11:56 2024] Call Trace:
[Thu Oct 31 13:11:56 2024] <TASK>
[Thu Oct 31 13:11:56 2024] show_stack+0x49/0x60
[Thu Oct 31 13:11:56 2024] dump_stack_lvl+0x5f/0x90
[Thu Oct 31 13:11:56 2024] dump_stack+0x10/0x18
[Thu Oct 31 13:11:56 2024] dump_header+0x46/0x1a6
[Thu Oct 31 13:11:56 2024] out_of_memory.cold+0x1d/0x8d
[Thu Oct 31 13:11:56 2024] mem_cgroup_out_of_memory+0x13b/0x170
[Thu Oct 31 13:11:56 2024] try_charge_memcg+0x40f/0x5c0
[Thu Oct 31 13:11:56 2024] __mem_cgroup_charge+0x45/0xd0
[Thu Oct 31 13:11:56 2024] alloc_anon_folio+0x21b/0x450
[Thu Oct 31 13:11:56 2024] do_anonymous_page+0x13b/0x400
[Thu Oct 31 13:11:56 2024] handle_pte_fault+0x1ad/0x1c0
[Thu Oct 31 13:11:56 2024] __handle_mm_fault+0x3d5/0x7a0
[Thu Oct 31 13:11:56 2024] handle_mm_fault+0xef/0x2d0
[Thu Oct 31 13:11:56 2024] do_user_addr_fault+0x2ff/0x7e0
[Thu Oct 31 13:11:56 2024] exc_page_fault+0x85/0x1c0
[Thu Oct 31 13:11:56 2024] asm_exc_page_fault+0x27/0x30
[Thu Oct 31 13:11:56 2024] RIP: 0033:0x72a4d5d961d3
[Thu Oct 31 13:11:56 2024] Code: c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 48 3b 15 69 20 08 00 73 77 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 0f 1f 40 00 c4 e2 79 78 c0 83 fa 10 7d
[Thu Oct 31 13:11:56 2024] RSP: 002b:00007ffd70a12718 EFLAGS: 00010287
[Thu Oct 31 13:11:56 2024] RAX: 0000000000000000 RBX: 0000000000b509f0 RCX: 00000000003d2020
[Thu Oct 31 13:11:56 2024] RDX: 000072a4d4900030 RSI: 0000000000000000 RDI: 000072a4d492e000
[Thu Oct 31 13:11:56 2024] RBP: 00007ffd70a12780 R08: 00000000ffffffff R09: 0000000000000000
[Thu Oct 31 13:11:56 2024] R10: 0000000000000022 R11: 000072a4d4800030 R12: 00000000003ffff0
[Thu Oct 31 13:11:56 2024] R13: 000072a4d5000010 R14: 000072a4d5bf59c0 R15: 000072a4d4800010
[Thu Oct 31 13:11:56 2024] </TASK>
[Thu Oct 31 13:11:56 2024] memory: usage 10240kB, limit 10240kB, failcnt 18041088
[Thu Oct 31 13:11:56 2024] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[Thu Oct 31 13:11:56 2024] Memory cgroup stats for /testcg:
[Thu Oct 31 13:11:56 2024] anon 10457088
[Thu Oct 31 13:11:56 2024] file 0
[Thu Oct 31 13:11:56 2024] kernel 28672
[Thu Oct 31 13:11:56 2024] kernel_stack 0
[Thu Oct 31 13:11:56 2024] pagetables 24576
[Thu Oct 31 13:11:56 2024] sec_pagetables 0
[Thu Oct 31 13:11:56 2024] percpu 0
[Thu Oct 31 13:11:56 2024] sock 0
[Thu Oct 31 13:11:56 2024] vmalloc 0
[Thu Oct 31 13:11:56 2024] shmem 0
[Thu Oct 31 13:11:56 2024] zswap 0
[Thu Oct 31 13:11:56 2024] zswapped 0
[Thu Oct 31 13:11:56 2024] file_mapped 0
[Thu Oct 31 13:11:56 2024] file_dirty 0
[Thu Oct 31 13:11:56 2024] file_writeback 0
[Thu Oct 31 13:11:56 2024] swapcached 0
[Thu Oct 31 13:11:56 2024] anon_thp 0
[Thu Oct 31 13:11:56 2024] file_thp 0
[Thu Oct 31 13:11:56 2024] shmem_thp 0
[Thu Oct 31 13:11:56 2024] inactive_anon 0
[Thu Oct 31 13:11:56 2024] active_anon 10457088
[Thu Oct 31 13:11:56 2024] inactive_file 0
[Thu Oct 31 13:11:56 2024] active_file 0
[Thu Oct 31 13:11:56 2024] unevictable 0
[Thu Oct 31 13:11:56 2024] slab_reclaimable 0
[Thu Oct 31 13:11:56 2024] slab_unreclaimable 2096
[Thu Oct 31 13:11:56 2024] slab 2096
[Thu Oct 31 13:11:56 2024] workingset_refault_anon 0
[Thu Oct 31 13:11:56 2024] workingset_refault_file 0
[Thu Oct 31 13:11:56 2024] workingset_activate_anon 0
[Thu Oct 31 13:11:56 2024] workingset_activate_file 0
[Thu Oct 31 13:11:56 2024] workingset_restore_anon 0
[Thu Oct 31 13:11:56 2024] workingset_restore_file 0
[Thu Oct 31 13:11:56 2024] workingset_nodereclaim 0
[Thu Oct 31 13:11:56 2024] pgscan 0
[Thu Oct 31 13:11:56 2024] pgsteal 0
[Thu Oct 31 13:11:56 2024] pgscan_kswapd 0
[Thu Oct 31 13:11:56 2024] pgscan_direct 0
[Thu Oct 31 13:11:56 2024] pgscan_khugepaged 0
[Thu Oct 31 13:11:56 2024] pgsteal_kswapd 0
[Thu Oct 31 13:11:56 2024] pgsteal_direct 0
[Thu Oct 31 13:11:56 2024] pgsteal_khugepaged 0
[Thu Oct 31 13:11:56 2024] pgfault 961958
[Thu Oct 31 13:11:56 2024] pgmajfault 36510
[Thu Oct 31 13:11:56 2024] pgrefill 0
[Thu Oct 31 13:11:56 2024] pgactivate 0
[Thu Oct 31 13:11:56 2024] pgdeactivate 0
[Thu Oct 31 13:11:56 2024] pglazyfree 0
[Thu Oct 31 13:11:56 2024] pglazyfreed 0
[Thu Oct 31 13:11:56 2024] zswpin 0
[Thu Oct 31 13:11:56 2024] zswpout 0
[Thu Oct 31 13:11:56 2024] zswpwb 0
[Thu Oct 31 13:11:56 2024] thp_fault_alloc 0
[Thu Oct 31 13:11:56 2024] thp_collapse_alloc 0
[Thu Oct 31 13:11:56 2024] thp_swpout 0
[Thu Oct 31 13:11:56 2024] thp_swpout_fallback 0
[Thu Oct 31 13:11:56 2024] Tasks state (memory values in pages):
[Thu Oct 31 13:11:56 2024] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[Thu Oct 31 13:11:56 2024] [   2653]     0  2653     7961     4564     3984      580         0         106496        0         -1000 python3
[Thu Oct 31 13:11:56 2024] Out of memory and no killable processes...
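
The accounting in the dump above can be cross-checked with a short script. This is a sketch, not a general dmesg parser: the sample lines are copied from the log with timestamps dropped, and the helper names `parse_stats` and `killable` are hypothetical. It checks that, in this dump, `anon` plus `kernel` adds up to the reported usage, and that the only task listed carries `oom_score_adj` -1000, which matches the "no killable processes" message.

```python
import re

LOG = """\
memory: usage 10240kB, limit 10240kB, failcnt 18041088
anon 10457088
kernel 28672
[ 2653] 0 2653 7961 4564 3984 580 0 106496 0 -1000 python3
"""

OOM_SCORE_ADJ_MIN = -1000  # tasks at this value are exempt from the OOM killer


def parse_stats(text):
    """Return (usage_kb, limit_kb, counters) from the excerpt."""
    m = re.search(r"memory: usage (\d+)kB, limit (\d+)kB", text)
    usage_kb, limit_kb = int(m.group(1)), int(m.group(2))
    counters = {k: int(v) for k, v in re.findall(r"^(\w+) (\d+)$", text, re.M)}
    return usage_kb, limit_kb, counters


def killable(text):
    """Return (pid, name) for tasks whose oom_score_adj is above the minimum."""
    tasks = []
    for line in text.splitlines():
        m = re.match(r"\[\s*(\d+)\]\s+(.*)", line)
        if m:
            fields = m.group(2).split()
            adj, name = int(fields[-2]), fields[-1]
            if adj > OOM_SCORE_ADJ_MIN:
                tasks.append((int(m.group(1)), name))
    return tasks


usage_kb, limit_kb, counters = parse_stats(LOG)
charged_kb = (counters["anon"] + counters["kernel"]) // 1024
print(usage_kb == limit_kb, charged_kb == usage_kb, killable(LOG))
```

For this excerpt both comparisons hold and the killable list is empty: the cgroup sits exactly at its limit and the kernel has no candidate to kill.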

** Affects: linux (Ubuntu)
Importance: Undecided
Status: New

** Attachment added: "dmesg.log"
https://bugs.launchpad.net/bugs/2086198/+attachment/5833418/+files/dmesg.log

https://bugs.launchpad.net/bugs/2086198
