Tuesday

[Bug 2064176] Re: LXD fan bridge causes blocked tasks

** Changed in: linux (Ubuntu Noble)
Status: Fix Released => Fix Committed

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2064176

Title:
LXD fan bridge causes blocked tasks

Status in linux package in Ubuntu:
Invalid
Status in linux source package in Jammy:
Fix Committed
Status in linux source package in Noble:
Fix Committed

Bug description:
SRU Justification:

[Impact]

A user can trigger a host crash on Jammy/Noble by launching
a container that uses an Ubuntu FAN network in LXD.

[Fix]

The first proposed patch fixes RCU locking by releasing rcu_read_lock
on the skb discard code path.

The second patch uses the proper helper, dev_core_stats_tx_dropped_inc(),
to increment the netdev's tx_dropped statistic.

[Test Plan]

As provided by Max Asnaashari:

# Install LXD from channel latest/stable
snap install lxd --channel latest/stable

# Configure LXD
lxd init --auto

# Create a FAN network
lxc network create lxdfan0 bridge.mode=fan ipv4.nat=true

# Launch a container using the FAN network
lxc launch ubuntu-minimal:22.04 c1 --network lxdfan0

# Try to interact with LXD
lxc ls
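
The last step above can be wrapped in a time limit so that a hang is
reported instead of blocking the terminal. This is only a minimal sketch:
check_responsive is a hypothetical helper name, the 30-second limit in the
usage note is an arbitrary assumption, and it relies on coreutils `timeout`
returning exit status 124 when the limit expires.

```shell
# Hypothetical helper: run a command under a time limit and print
# "ok", "failed" or "hung" depending on how it ended.
check_responsive() {
    limit="$1"
    shift
    # coreutils `timeout` exits with status 124 if the command ran out of time
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "hung"
    elif [ "$rc" -eq 0 ]; then
        echo "ok"
    else
        echo "failed"
    fi
}
```

For example, `check_responsive 30 lxc ls` would be expected to print "hung"
on an affected kernel and "ok" on a fixed one. If the client process itself
ends up in uninterruptible sleep, `timeout` may not be able to terminate it,
so treat the result as a hint rather than proof.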

[Where problems could occur]

The change is local and touches only the Ubuntu FAN code. I would not
expect any problems with this patchset.

Hi, cross posting this from
https://github.com/canonical/lxd/issues/12161

I've got an LXD cluster running across 3 VMs using the fan bridge. I'm
using a dev revision of LXD based on 6413a948. Creating a container
produces the trace in the attached syslog snippet, and the container
creation process then hangs indefinitely. ssh logins,
`lxc shell cluster1`, and `ps -aux` also hang.

Apr 29 17:15:01 cluster1 kernel: [ 161.250951] ------------[ cut here ]------------
Apr 29 17:15:01 cluster1 kernel: [ 161.250957] Voluntary context switch within RCU read-side critical section!
Apr 29 17:15:01 cluster1 kernel: [ 161.250990] WARNING: CPU: 2 PID: 510 at kernel/rcu/tree_plugin.h:320 rcu_note_context_switch+0x2a7/0x2f0
Apr 29 17:15:01 cluster1 kernel: [ 161.251003] Modules linked in: nft_masq nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vxlan ip6_udp_tunnel udp_tunnel dummy bridge stp llc zfs(PO) spl(O) nf_tables libcrc32c nfnetlink vhost_vsock vhost vhost_iotlb binfmt_misc nls_iso8859_1 intel_rapl_msr intel_rapl_common kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul virtio_gpu polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 virtio_dma_buf aesni_intel vmw_vsock_virtio_transport 9pnet_virtio xhci_pci drm_shmem_helper i2c_i801 ahci 9pnet vmw_vsock_virtio_transport_common xhci_pci_renesas drm_kms_helper libahci crypto_simd joydev virtio_input cryptd lpc_ich virtiofs i2c_smbus vsock psmouse input_leds mac_hid serio_raw rapl qemu_fw_cfg vmgenid nfsd dm_multipath auth_rpcgss scsi_dh_rdac nfs_acl lockd scsi_dh_emc scsi_dh_alua grace sch_fq_codel drm sunrpc efi_pstore virtio_rng ip_tables x_tables autofs4
Apr 29 17:15:01 cluster1 kernel: [ 161.251085] CPU: 2 PID: 510 Comm: nmbd Tainted: P O 6.5.0-28-generic #29~22.04.1-Ubuntu
Apr 29 17:15:01 cluster1 kernel: [ 161.251089] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS unknown 2/2/2022
Apr 29 17:15:01 cluster1 kernel: [ 161.251091] RIP: 0010:rcu_note_context_switch+0x2a7/0x2f0
Apr 29 17:15:01 cluster1 kernel: [ 161.251095] Code: 08 f0 83 44 24 fc 00 48 89 de 4c 89 f7 e8 d1 af ff ff e9 1e fe ff ff 48 c7 c7 d0 60 56 88 c6 05 e6 27 40 02 01 e8 79 b2 f2 ff <0f> 0b e9 bd fd ff ff a9 ff ff ff 7f 0f 84 75 fe ff ff 65 48 8b 3c
Apr 29 17:15:01 cluster1 kernel: [ 161.251098] RSP: 0018:ffffb9cbc11dbbc8 EFLAGS: 00010046
Apr 29 17:15:01 cluster1 kernel: [ 161.251101] RAX: 0000000000000000 RBX: ffff941ef7cb3f80 RCX: 0000000000000000
Apr 29 17:15:01 cluster1 kernel: [ 161.251103] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Apr 29 17:15:01 cluster1 kernel: [ 161.251104] RBP: ffffb9cbc11dbbe8 R08: 0000000000000000 R09: 0000000000000000
Apr 29 17:15:01 cluster1 kernel: [ 161.251106] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Apr 29 17:15:01 cluster1 kernel: [ 161.251111] R13: ffff941d893e9980 R14: 0000000000000000 R15: ffff941d80ad7a80
Apr 29 17:15:01 cluster1 kernel: [ 161.251113] FS: 00007c7dcbdb8a00(0000) GS:ffff941ef7c80000(0000) knlGS:0000000000000000
Apr 29 17:15:01 cluster1 kernel: [ 161.251115] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 29 17:15:01 cluster1 kernel: [ 161.251117] CR2: 00005a30877ae488 CR3: 0000000105888003 CR4: 0000000000170ee0
Apr 29 17:15:01 cluster1 kernel: [ 161.251122] Call Trace:
Apr 29 17:15:01 cluster1 kernel: [ 161.251128] <TASK>
Apr 29 17:15:01 cluster1 kernel: [ 161.251133] ? show_regs+0x6d/0x80
Apr 29 17:15:01 cluster1 kernel: [ 161.251145] ? __warn+0x89/0x160
Apr 29 17:15:01 cluster1 kernel: [ 161.251152] ? rcu_note_context_switch+0x2a7/0x2f0
Apr 29 17:15:01 cluster1 kernel: [ 161.251155] ? report_bug+0x17e/0x1b0
Apr 29 17:15:01 cluster1 kernel: [ 161.251172] ? handle_bug+0x46/0x90
Apr 29 17:15:01 cluster1 kernel: [ 161.251187] ? exc_invalid_op+0x18/0x80
Apr 29 17:15:01 cluster1 kernel: [ 161.251190] ? asm_exc_invalid_op+0x1b/0x20
Apr 29 17:15:01 cluster1 kernel: [ 161.251202] ? rcu_note_context_switch+0x2a7/0x2f0
Apr 29 17:15:01 cluster1 kernel: [ 161.251205] ? rcu_note_context_switch+0x2a7/0x2f0
Apr 29 17:15:01 cluster1 kernel: [ 161.251208] __schedule+0xcc/0x750
Apr 29 17:15:01 cluster1 kernel: [ 161.251218] schedule+0x63/0x110
Apr 29 17:15:01 cluster1 kernel: [ 161.251222] schedule_hrtimeout_range_clock+0xbc/0x130
Apr 29 17:15:01 cluster1 kernel: [ 161.251238] ? __pfx_hrtimer_wakeup+0x10/0x10
Apr 29 17:15:01 cluster1 kernel: [ 161.251245] schedule_hrtimeout_range+0x13/0x30
Apr 29 17:15:01 cluster1 kernel: [ 161.251248] ep_poll+0x33f/0x390
Apr 29 17:15:01 cluster1 kernel: [ 161.251254] ? __pfx_ep_autoremove_wake_function+0x10/0x10
Apr 29 17:15:01 cluster1 kernel: [ 161.251257] do_epoll_wait+0xdb/0x100
Apr 29 17:15:01 cluster1 kernel: [ 161.251259] __x64_sys_epoll_wait+0x6f/0x110
Apr 29 17:15:01 cluster1 kernel: [ 161.251265] do_syscall_64+0x5b/0x90
Apr 29 17:15:01 cluster1 kernel: [ 161.251270] ? do_epoll_ctl+0x3cb/0x860
Apr 29 17:15:01 cluster1 kernel: [ 161.251273] ? __task_pid_nr_ns+0x6c/0xc0
Apr 29 17:15:01 cluster1 kernel: [ 161.251279] ? exit_to_user_mode_prepare+0x30/0xb0
Apr 29 17:15:01 cluster1 kernel: [ 161.251284] ? syscall_exit_to_user_mode+0x37/0x60
Apr 29 17:15:01 cluster1 kernel: [ 161.251286] ? do_syscall_64+0x67/0x90
Apr 29 17:15:01 cluster1 kernel: [ 161.251288] ? syscall_exit_to_user_mode+0x37/0x60
Apr 29 17:15:01 cluster1 kernel: [ 161.251300] ? do_syscall_64+0x67/0x90
Apr 29 17:15:01 cluster1 kernel: [ 161.251304] ? syscall_exit_to_user_mode+0x37/0x60
Apr 29 17:15:01 cluster1 kernel: [ 161.251306] ? do_syscall_64+0x67/0x90
Apr 29 17:15:01 cluster1 kernel: [ 161.251309] ? do_syscall_64+0x67/0x90
Apr 29 17:15:01 cluster1 kernel: [ 161.251313] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Apr 29 17:15:01 cluster1 kernel: [ 161.251316] RIP: 0033:0x7c7dcf325dea
Apr 29 17:15:01 cluster1 kernel: [ 161.251333] Code: 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 e8 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 5e c3 0f 1f 44 00 00 48 83 ec 28 89 54 24 18
Apr 29 17:15:01 cluster1 kernel: [ 161.251335] RSP: 002b:00007ffdde5e0278 EFLAGS: 00000246 ORIG_RAX: 00000000000000e8
Apr 29 17:15:01 cluster1 kernel: [ 161.251338] RAX: ffffffffffffffda RBX: 00005a30877a2ea0 RCX: 00007c7dcf325dea
Apr 29 17:15:01 cluster1 kernel: [ 161.251340] RDX: 0000000000000001 RSI: 00007ffdde5e02ac RDI: 0000000000000005
Apr 29 17:15:01 cluster1 kernel: [ 161.251341] RBP: 00005a3087794590 R08: 00000000000f423f R09: 00007ffdde5e0357
Apr 29 17:15:01 cluster1 kernel: [ 161.251343] R10: 00000000000003e8 R11: 0000000000000246 R12: 00005a30877a2f30
Apr 29 17:15:01 cluster1 kernel: [ 161.251345] R13: 00000000000003e8 R14: 0000000000000090 R15: 000000000000000a
Apr 29 17:15:01 cluster1 kernel: [ 161.251348] </TASK>
Apr 29 17:15:01 cluster1 kernel: [ 161.251349] ---[ end trace 0000000000000000 ]---
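
To check whether a host has hit the same warning, the log can be searched
for the message shown in the trace above. This is a rough sketch:
count_rcu_warnings is a hypothetical helper name, and the log path is
whatever file holds the kernel messages on the system in question
(/var/log/syslog in the usage note is an assumption).

```shell
# Hypothetical helper: print how many times the RCU warning from the
# trace appears in the given log file.
count_rcu_warnings() {
    # -F matches the message as a fixed string, -c prints the match count
    grep -c -F 'Voluntary context switch within RCU read-side critical section!' "$1"
}
```

For example, `count_rcu_warnings /var/log/syslog` prints 0 when the warning
has not been logged. Note that grep exits with a non-zero status in that
case, which matters if the helper is used under `set -e`.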

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2064176/+subscriptions
