** Changed in: linux (Ubuntu Bionic)
Status: In Progress => Fix Committed
--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/1933172
Title:
btrfs: Attempting to balance a nearly full filesystem with relocated
root nodes fails
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Bionic:
Fix Committed
Bug description:
BugLink: https://bugs.launchpad.net/bugs/1933172
[Impact]
If you attempt to balance a btrfs filesystem that is nearly full, and
this filesystem has had a lot of small, medium and large files created
and deleted, such that the b-tree needs to be rotated, then when the
balance fails due to not having enough free space, the kernel oopses
and the btrfs filesystem hangs.
It doesn't appear to cause any filesystem corruption, and is
reproducible every time on affected filesystems.
The following oops is generated:
general protection fault: 0000 [#1] SMP PTI
CPU: 0 PID: 18440 Comm: btrfs Not tainted 4.15.0-136-generic #140-Ubuntu
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
RIP: 0010:btrfs_set_root_node+0x5/0x60 [btrfs]
RSP: 0018:ffffb3db890a79e0 EFLAGS: 00010282
RAX: ffff8d7f73861ad0 RBX: ffff8d7f78455708 RCX: ffff8d7f6d9a5390
RDX: ffff8d7f73861ad0 RSI: a023775cfc0348a3 RDI: ffff8d7f6d9a5028
RBP: ffffb3db890a7a78 R08: 0000000000000044 R09: 0000000000000228
R10: ffff8d7f6d9a5000 R11: 0000000000000010 R12: ffffb3db890a7a08
R13: ffff8d7f6d9a5000 R14: ffff8d7f6d9a5028 R15: ffff8d7f74560000
FS: 00007f48d84498c0(0000) GS:ffff8d7f7fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe4fbc1f000 CR3: 00000001799fc001 CR4: 0000000000160ef0
Call Trace:
? commit_fs_roots+0x130/0x1b0 [btrfs]
? btrfs_run_delayed_refs.part.70+0x80/0x190 [btrfs]
btrfs_commit_transaction+0x42c/0x910 [btrfs]
? start_transaction+0x191/0x430 [btrfs]
relocate_block_group+0x1e7/0x640 [btrfs]
btrfs_relocate_block_group+0x18f/0x280 [btrfs]
btrfs_relocate_chunk+0x38/0xd0 [btrfs]
__btrfs_balance+0x972/0xcd0 [btrfs]
? insert_balance_item.isra.35+0x391/0x3c0 [btrfs]
btrfs_balance+0x32c/0x5a0 [btrfs]
btrfs_ioctl_balance+0x320/0x390 [btrfs]
btrfs_ioctl+0x5a6/0x2490 [btrfs]
? lru_cache_add_active_or_unevictable+0x36/0xb0
? __handle_mm_fault+0x9fd/0x1290
do_vfs_ioctl+0xa8/0x630
? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
? do_vfs_ioctl+0xa8/0x630
? __do_page_fault+0x2a1/0x4b0
SyS_ioctl+0x79/0x90
do_syscall_64+0x73/0x130
entry_SYSCALL_64_after_hwframe+0x41/0xa6
RIP: 0033:0x7f48d7228317
RSP: 002b:00007ffd76d03e38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f48d7228317
RDX: 00007ffd76d03ec8 RSI: 00000000c4009420 RDI: 0000000000000003
RBP: 00007ffd76d03ec8 R08: 0000000000000078 R09: 0000000000000000
R10: 0000562086e7f010 R11: 0000000000000246 R12: 0000000000000003
R13: 00007ffd76d057cb R14: 0000000000000002 R15: 0000000000000000
Code: 4d 85 e4 0f 84 56 fe ff ff 4d 89 04 24 41 c6 44 24 08 84 4d 89 4c 24 09 e9 42 fe ff ff 0f 0b e8 02 24 5e e0 66 90 0f 1f 44 00 00 <48> 8b 06 48 8b 0d c9 d4 99 e1 48 8b 15 d2 d4 99 e1 55 48 89 87
RIP: btrfs_set_root_node+0x5/0x60 [btrfs] RSP: ffffb3db890a79e0
I don't see this behaviour on any upstream kernel, and the first
kernel to show this behaviour is 4.15.0-109-generic. The current
4.15.0-145-generic is still affected.
I believe that this is a regression introduced in the fixing of
CVE-2019-19036.
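As a rough aid for triage, the version boundary described above can be sketched as a small shell check. This is only an illustration: the function name is hypothetical, only the lower bound (4.15.0-109) comes from this report, and since no fixed version is stated here, a match means "possibly affected", not "definitely affected".

```shell
#!/bin/sh
# Sketch: decide whether a kernel release string falls in the Bionic 4.15
# range this report describes as showing the oops.
# Only the lower bound (4.15.0-109) is taken from the report; no upper
# bound is known here, so none is applied.
is_possibly_affected() {
    release="$1"                      # e.g. "4.15.0-136-generic"
    case "$release" in
        4.15.0-*) ;;                  # only the Bionic 4.15 series is considered
        *) return 1 ;;
    esac
    abi="${release#4.15.0-}"          # "136-generic"
    abi="${abi%%-*}"                  # "136"
    case "$abi" in
        ''|*[!0-9]*) return 1 ;;      # not a plain number: treat as unknown
    esac
    [ "$abi" -ge 109 ]
}

# Usage: is_possibly_affected "$(uname -r)" && echo "possibly affected"
```

A kernel that already carries the fix would still match this check, so it is only useful for ruling out kernels older than 4.15.0-109.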
[Testcase]
I haven't been able to create a script that reliably places a btrfs
filesystem into the state necessary to reproduce this issue, so I have
provided my qcow2 image with my btrfs filesystem, which reproduces
the issue 100% of the time.
Download the image from here (warning: the size is 8.0 GB):
https://people.canonical.com/~mruffell/sf311164/ubuntu18.04-server-2.qcow2
Create an Ubuntu 18.04 VM. Attach the ubuntu18.04-server-2.qcow2 image to
a new virtio disk. Note that ubuntu18.04-server-2.qcow2 does not have an
operating system; it is a data-only volume.
Mount the volume:
$ sudo mount /dev/vdb /mnt
Attempt to balance:
$ sudo btrfs filesystem balance start --full-balance /mnt
Segmentation fault (core dumped)
Check dmesg for kernel oops:
https://paste.ubuntu.com/p/wjJNqKBCfh/
If you install the test kernel from the following ppa:
https://launchpad.net/~mruffell/+archive/ubuntu/sf311164-test
You should see this instead:
$ sudo btrfs filesystem balance start --full-balance /mnt
ERROR: error during balancing '/mnt': No space left on device
There may be more info in syslog - try dmesg | tail
Checking dmesg shows no kernel oops, and just info about the volume
being too full to balance:
https://paste.ubuntu.com/p/4J8Gq2dtz4/
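To make the pass/fail check on dmesg repeatable, the comparison can be sketched as a small shell filter. This is a minimal sketch, not part of the original test case: the function name is hypothetical, and the two strings it looks for are copied from the oops quoted in this report, so any other failure mode would be reported as "no-oops".

```shell
#!/bin/sh
# Sketch: classify kernel log text read from stdin.
# Prints "oops" only when both markers from the oops in this report
# ("general protection fault" and "btrfs_set_root_node") are present.
classify_balance_log() {
    log="$(cat)"
    case "$log" in
        *'general protection fault'*) ;;
        *) echo "no-oops"; return 0 ;;
    esac
    case "$log" in
        *btrfs_set_root_node*) echo "oops" ;;
        *) echo "no-oops" ;;
    esac
}

# Usage on the test VM, after attempting the balance:
#   sudo dmesg | classify_balance_log
```

On an affected kernel this would be expected to print "oops" after the balance attempt, and "no-oops" with the test kernel installed.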
[Fix]
The problem was introduced in 4.15.0-109-generic; 4.15.0-108-generic
and earlier work fine, which means a regression was introduced between
those two releases.
I bisected the problem down to the following commit:
ubuntu-bionic 6f536ce7a978531d38a21d092394616cefb54436
Author: Qu Wenruo <wqu@suse.com>
Date: Tue May 19 10:13:20 2020 +0800
Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
Link: https://paste.ubuntu.com/p/4qfWCM8ykh/
Unfortunately, I believe this is a bad backport. If you examine the
original upstream commit:
commit 51415b6c1b117e223bc083e30af675cb5c5498f3
Author: Qu Wenruo <wqu@suse.com>
Date: Tue May 19 10:13:20 2020 +0800
Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
Link: https://github.com/torvalds/linux/commit/51415b6c1b117e223bc083e30af675cb5c5498f3
You will see the 4.15 backport has calls to free_extent_buffer() and
btrfs_put_fs_root(). Now, btrfs_put_fs_root() was renamed to
btrfs_put_root() in the newer patches, and contains logic to free
relocated roots, so I think we might not need the calls to
free_extent_buffer() to free the extents first, since it might be
handled later.
The core issue is that we hit a general protection fault when
attempting to access a root node, which means we have freed a root
node we shouldn't have.
If we look at the backport in 5.4.y, aka, the one in Focal:
ubuntu-focal ecaee3a76ea998bc2fe20f056eb27f9bc837d116
Author: Qu Wenruo <wqu@suse.com>
Date: Tue May 19 10:13:20 2020 +0800
Subject: btrfs: reloc: fix reloc root leak and NULL pointer dereference
Link: https://paste.ubuntu.com/p/PZrMqVt8Yk/
It seems upstream -stable omitted the calls to btrfs_put_root()
entirely, and as a result the calls to free_extent_buffer() are not
needed either.
If I revert 6f536ce7a978531d38a21d092394616cefb54436 from
ubuntu-bionic, cherry-pick ecaee3a76ea998bc2fe20f056eb27f9bc837d116 from
ubuntu-focal, and build, the problem no longer reproduces.
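The revert-and-replace step can be sketched as follows. This is an illustration under assumptions: the helper names and the RUN variable are hypothetical, the script assumes it is run inside an ubuntu-bionic kernel tree with the ubuntu-focal commit already fetched (the remote to fetch from is not given here), and by default it only prints the git commands instead of running them.

```shell
#!/bin/sh
# Sketch of the revert-and-replace described above.
# Default is a dry run that prints the commands; set RUN=1 to execute them.
# Assumption: the current directory is an ubuntu-bionic tree and the
# ubuntu-focal commit object is already available locally.
BAD_BACKPORT=6f536ce7a978531d38a21d092394616cefb54436    # ubuntu-bionic
GOOD_BACKPORT=ecaee3a76ea998bc2fe20f056eb27f9bc837d116   # ubuntu-focal

# Either run the given command or just print it, depending on RUN.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

replace_backport() {
    # Revert the Bionic backport, then apply the Focal one.
    # -x records the source commit hash in the new commit message.
    run git revert --no-edit "$BAD_BACKPORT" &&
    run git cherry-pick -x "$GOOD_BACKPORT"
}

# Usage: replace_backport          (dry run)
#        RUN=1 replace_backport    (actually modify the tree)
```

The dry-run default is a deliberate choice so the commands can be reviewed before touching a kernel tree; building and testing the result is a separate step.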
[Where problems could occur]
If a regression were to occur, it would affect users of btrfs
filesystems, and would likely show during a routine balance operation.
Since the issue is triggered during the cancellation of a balance
operation, problems might occur for users with nearly full filesystems
or filesystems that have existing corruption.
We are replacing a patch that was backported during the fixing of
CVE-2019-19036 with a backport provided by upstream developers, which
cherry-picks from 5.4.y to Bionic. The patch in 5.4.y is well tested
by the community and is currently in the Focal kernel.
With all modifications to btrfs, there is a risk of data corruption
and filesystem corruption for all btrfs users, since balances happen
automatically and on a regular basis. If a regression does happen,
users should remount their filesystems with the "skip_balance" mount
option, back up their data, and attempt a repair if necessary.
[Other info]
A community member has hit this issue before I did, and has reported
it upstream to linux-btrfs here, although no one knew what was
happening:
https://www.spinics.net/lists/linux-btrfs/msg103367.html
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1933172/+subscriptions