
[Bug 1765998] Re: FS access deadlock with btrfs quotas enabled

Hm, it's been a while...
I think back then I made some btrfs developers aware of it on IRC, but never got around to sending it to the mailing list.
I'm running my own kernel builds for now (I had to do that to fix some other issues anyway) with the patch from comment #4 applied, which seems to reliably fix this issue.

I am very occasionally getting parent transid verify errors on the quota
tree though, which I believe must be originating from another bug added
at some point after I posted that patch here, because initially I didn't
have any of those for several months. It seems that those can be cleaned
up by temporarily disabling and re-enabling quota, so they are no big
deal to me right now, despite causing some annoying downtime
occasionally.
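
For anyone wanting to script the quota off/on cycle described above, here is a minimal sketch. It assumes the standard btrfs-progs commands `btrfs quota disable <path>` and `btrfs quota enable <path>`; the helper names and the mount point `/mnt/pool` are hypothetical, and this is not necessarily how the cleanup was done on the affected system.

```python
import subprocess


def quota_toggle_commands(mount_point):
    """Return the two btrfs-progs invocations, in order: disable, then enable."""
    return [
        ["btrfs", "quota", "disable", mount_point],
        ["btrfs", "quota", "enable", mount_point],
    ]


def toggle_quota(mount_point, run=subprocess.run):
    """Run both commands, stopping at the first failure (needs root)."""
    for argv in quota_toggle_commands(mount_point):
        run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical mount point; only prints the commands, does not run them.
    for argv in quota_toggle_commands("/mnt/pool"):
        print(" ".join(argv))
```

Calling `toggle_quota("/mnt/pool")` as root would actually execute the two commands; the `run` parameter exists only so the call can be stubbed out for a dry run.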

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/1765998

Title:
FS access deadlock with btrfs quotas enabled

Status in linux package in Ubuntu:
Triaged
Status in linux source package in Bionic:
Triaged

Bug description:
I'm running into an issue on Ubuntu Bionic (but not Xenial) where
shortly after boot, under heavy load from many LXD containers starting
at once, access to the btrfs filesystem that the containers are on
deadlocks.

The issue is quite hard to reproduce on other systems, quite likely
related to the size of the filesystem involved (4 devices with a total
of 8TB, millions of files, ~20 subvolumes with tens of snapshots each)
and the access pattern from many LXD containers at once. It definitely
goes away when disabling btrfs quotas though. Another prerequisite to
trigger this bug may be the container subvolumes sharing extents (from
their parent image or due to deduplication).

I can only reliably reproduce it on a production system that I can only do very limited testing on. However, I have been able to gather the following information:
- Many threads are stuck trying to acquire locks on various tree roots, which are never released by their current holders.
- There always seem to be (at least) two threads executing rmdir syscalls which are creating the circular dependency: One of them is in btrfs_cow_block => ... => btrfs_qgroup_trace_extent_post => ... => find_parent_nodes and wants to acquire a lock that was already acquired by btrfs_search_slot of the other rmdir.
- Reverting this patch seems to prevent it from happening: https://patchwork.kernel.org/patch/9573267/
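
To illustrate the kind of circular dependency described in the list above, here is a toy sketch that looks for a cycle in a "who waits for whom" relation. It is not kernel code: the thread names (`rmdir_1`, `rmdir_2`) and lock names (`root_a`, `root_b`) are hypothetical stand-ins, and it assumes each thread waits for at most one lock and each lock has one holder.

```python
def find_wait_cycle(holds, wants):
    """Return a list of threads forming a wait cycle, or None.

    holds: dict mapping lock name -> thread currently holding it
    wants: dict mapping thread -> lock it is blocked on
    """
    for start in wants:
        seen = [start]
        current = start
        while True:
            lock = wants.get(current)   # lock this thread is blocked on
            holder = holds.get(lock)    # thread that owns that lock
            if holder is None:
                break                   # chain ends, no cycle from here
            if holder in seen:
                return seen[seen.index(holder):]
            seen.append(holder)
            current = holder
    return None


if __name__ == "__main__":
    # Hypothetical snapshot: each rmdir holds one tree lock and wants the other.
    holds = {"root_a": "rmdir_1", "root_b": "rmdir_2"}
    wants = {"rmdir_1": "root_b", "rmdir_2": "root_a"}
    print(find_wait_cycle(holds, wants))
```

With the snapshot above, each thread waits on a lock the other one holds, so neither can ever make progress, which is the pattern the stuck rmdir threads appear to show.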

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1765998/+subscriptions
