Tuesday

[Bug 1884766] Re: use-after-free in af_alg_accept() due to bh_lock_sock()

** Description changed:

- This bug is for tracking and submitting this commit [1] once it lands in
- v5.8-rcN.
+ [Impact]

- [1]
- https://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
+ * Users of the Linux kernel's crypto userspace API
+ reported BUG() / kernel NULL pointer dereference
+ errors after kernel upgrades.
+
+ * The stack trace signature is an accept() syscall
+ going through af_alg_accept() and hitting errors
+ usually in one of:
+ - apparmor_sk_clone_security()
+ - apparmor_sock_graft()
+ - release_sock()
+
+ [Fix]
+
+ * This is a regression introduced by upstream commit
+ 37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
+ in sk_destruct") which made its way through stable.
+
+ * The offending patch allows the critical regions
+ of af_alg_accept() and af_alg_release_parent() to
+ run concurrently. With the right timing on 2 CPUs,
+ the non-atomic reference counter of the alg_sock
+ can be dropped, followed by the reference on the
+ sock, so a sock that is still in use is released.
+
+ * The fix is upstream commit 34c86f4c4a7b ("crypto:
+ af_alg - fix use-after-free in af_alg_accept() due
+ to bh_lock_sock()") [1]. It changes alg_sock's ref
+ counter to atomic, which addresses the root cause.
+
+ [Test Case]
+
+ * There is a synthetic test case available, which
+ uses a kprobes kernel module to synchronize the
+ concurrent CPUs on the instructions responsible
+ for the problem; and a userspace part to run it.
+
+ * The organic reproducer is the Varnish Cache Plus
+ software with the Crypto vmod (which uses the kernel
+ crypto userspace API) under long, very high load.
+
+ * The patch has been verified on both reproducers
+ with the 4.15 and 5.7 kernels.
+
+ * Further tests were run with 'stress-ng --af-alg'
+ using 11 CPUs/hogs on Bionic/Disco/Eoan/Focal
+ (all with the same stress-ng version, V0.11.14).
+ No regressions were observed relative to the
+ original kernel. The af-alg stressor can exercise
+ almost all crypto modules shipped with the kernel,
+ so it covers more paths and crypto alg interfaces.
+
+ [Regression Potential]
+
+ * The fix makes a fundamental change to how
+ alg_sock reference counters work, plus another
+ change to the 'nokey' counting, so it does
+ carry a risk of regression.
+
+ * Regressions could theoretically manifest as
+ use-after-free errors in the af_alg functions (in
+ case of undercounting) or as silent memory leaks
+ (in case of overcounting), and possibly as other
+ misbehavior that depends on the reference count.
+
+ * The patch was written by the crypto subsystem
+ maintainer, who knows the normal and corner
+ cases well, which lends the patch additional
+ credibility.
+
+ * Testing with the organic reproducer ran as long
+ as 5 days, without issues, so it does look good.
+
+ [Other Info]
+
+ * [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
+
+ [Stack Trace Examples]
+
+ Examples:
+
+ BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
+ ...
+ RIP: 0010:apparmor_sk_clone_security+0x26/0x70
+ ...
+ Call Trace:
+ security_sk_clone+0x33/0x50
+ af_alg_accept+0x81/0x1c0 [af_alg]
+ alg_accept+0x15/0x20 [af_alg]
+ SYSC_accept4+0xff/0x210
+ SyS_accept+0x10/0x20
+ do_syscall_64+0x73/0x130
+ entry_SYSCALL_64_after_hwframe+0x3d/0xa2
+
+ general protection fault: 0000 [#1] SMP PTI
+ ...
+ RIP: 0010:__release_sock+0x54/0xe0
+ ...
+ Call Trace:
+ release_sock+0x30/0xa0
+ af_alg_accept+0x122/0x1c0 [af_alg]
+ alg_accept+0x15/0x20 [af_alg]
+ SYSC_accept4+0xff/0x210
+ SyS_accept+0x10/0x20
+ do_syscall_64+0x73/0x130
+ entry_SYSCALL_64_after_hwframe+0x3d/0xa2

** Also affects: linux (Ubuntu Groovy)
Importance: Medium
Assignee: Mauricio Faria de Oliveira (mfo)
Status: Confirmed

** Also affects: linux (Ubuntu Xenial)
Importance: Undecided
Status: New

** Also affects: linux (Ubuntu Bionic)
Importance: Undecided
Status: New

** Also affects: linux (Ubuntu Eoan)
Importance: Undecided
Status: New

** Also affects: linux (Ubuntu Focal)
Importance: Undecided
Status: New

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/1884766

Title:
use-after-free in af_alg_accept() due to bh_lock_sock()

Status in linux package in Ubuntu:
Confirmed
Status in linux source package in Xenial:
New
Status in linux source package in Bionic:
New
Status in linux source package in Eoan:
New
Status in linux source package in Focal:
New
Status in linux source package in Groovy:
Confirmed

Bug description:
[Impact]

* Users of the Linux kernel's crypto userspace API
reported BUG() / kernel NULL pointer dereference
errors after kernel upgrades.

* The stack trace signature is an accept() syscall
going through af_alg_accept() and hitting errors
usually in one of:
- apparmor_sk_clone_security()
- apparmor_sock_graft()
- release_sock()
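
For context, the accept() call in the signature above is the one an AF_ALG client issues to obtain an operation socket from a bound algorithm socket. A minimal userspace sketch follows, assuming a Linux build of Python that exposes socket.AF_ALG and a kernel that provides the "sha256" hash; the helper name af_alg_sha256 is made up for illustration and is not part of any reproducer mentioned in this bug.

```python
import hashlib
import socket
from typing import Optional


def af_alg_sha256(data: bytes) -> Optional[bytes]:
    """Hash `data` via the kernel crypto API; return None if AF_ALG is unavailable."""
    family = getattr(socket, "AF_ALG", None)  # only defined on Linux builds
    if family is None:
        return None
    try:
        with socket.socket(family, socket.SOCK_SEQPACKET, 0) as parent:
            parent.bind(("hash", "sha256"))   # select algorithm type and name
            child, _ = parent.accept()        # accept() on the algorithm socket
            with child:
                child.sendall(data)
                return child.recv(32)         # a SHA-256 digest is 32 bytes
    except OSError:
        return None                           # e.g. algorithm missing or sandboxed


digest = af_alg_sha256(b"hello")
if digest is not None:
    # Cross-check against the userspace implementation.
    assert digest == hashlib.sha256(b"hello").digest()
```

Closing the child socket is what eventually releases its reference on the parent, which is the other half of the race described in [Fix].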

[Fix]

* This is a regression introduced by upstream commit
37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
in sk_destruct") which made its way through stable.

* The offending patch allows the critical regions
of af_alg_accept() and af_alg_release_parent() to
run concurrently. With the right timing on 2 CPUs,
the non-atomic reference counter of the alg_sock
can be dropped, followed by the reference on the
sock, so a sock that is still in use is released.

* The fix is upstream commit 34c86f4c4a7b ("crypto:
af_alg - fix use-after-free in af_alg_accept() due
to bh_lock_sock()") [1]. It changes alg_sock's ref
counter to atomic, which addresses the root cause.
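
To illustrate why a non-atomic counter becomes unsafe once the two critical regions can overlap, here is a toy, single-threaded model of one possible lost-update interleaving. The Parent class, its fields, and both function names are hypothetical and only mimic the idea of the alg_sock counter; this is a sketch, not the kernel code or the exact sequence hit in production.

```python
from dataclasses import dataclass


@dataclass
class Parent:
    """Toy stand-in for the parent alg_sock (not the kernel structure)."""
    refcnt: int = 0  # number of child sockets referencing the parent
    holds: int = 0   # extra reference the children keep on the parent sock


def racy_interleaving(p: Parent) -> None:
    """accept() of a new child overlaps with release of an older child."""
    a_seen = p.refcnt        # CPU A (accept): load
    b_seen = p.refcnt        # CPU B (release): load
    p.refcnt = a_seen + 1    # CPU A: store
    if a_seen == 0:          # counter was non-zero, so A takes no extra hold
        p.holds += 1
    p.refcnt = b_seen - 1    # CPU B: store overwrites A's increment
    if p.refcnt == 0:        # CPU B believes it dropped the last reference
        p.holds -= 1


def atomic_interleaving(p: Parent) -> None:
    """Same events, with each read-modify-write modeled as indivisible."""
    p.refcnt += 1            # CPU A: increment and test
    if p.refcnt == 1:
        p.holds += 1
    p.refcnt -= 1            # CPU B: decrement and test
    if p.refcnt == 0:
        p.holds -= 1


racy = Parent(refcnt=1, holds=1)    # one existing child
racy_interleaving(racy)
print(racy)    # Parent(refcnt=0, holds=0): the new child no longer pins the parent

fixed = Parent(refcnt=1, holds=1)
atomic_interleaving(fixed)
print(fixed)   # Parent(refcnt=1, holds=1): the new child keeps the parent pinned
```

In the racy case the parent loses its hold while a child still exists, which is the kind of state that lets a sock be released while in use.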

[Test Case]

* There is a synthetic test case available, which
uses a kprobes kernel module to synchronize the
concurrent CPUs on the instructions responsible
for the problem; and a userspace part to run it.

* The organic reproducer is the Varnish Cache Plus
software with the Crypto vmod (which uses the kernel
crypto userspace API) under long, very high load.

* The patch has been verified on both reproducers
with the 4.15 and 5.7 kernels.

* Further tests were run with 'stress-ng --af-alg'
using 11 CPUs/hogs on Bionic/Disco/Eoan/Focal
(all with the same stress-ng version, V0.11.14).
No regressions were observed relative to the
original kernel. The af-alg stressor can exercise
almost all crypto modules shipped with the kernel,
so it covers more paths and crypto alg interfaces.

[Regression Potential]

* The fix makes a fundamental change to how
alg_sock reference counters work, plus another
change to the 'nokey' counting, so it does
carry a risk of regression.

* Regressions could theoretically manifest as
use-after-free errors in the af_alg functions (in
case of undercounting) or as silent memory leaks
(in case of overcounting), and possibly as other
misbehavior that depends on the reference count.

* The patch was written by the crypto subsystem
maintainer, who knows the normal and corner
cases well, which lends the patch additional
credibility.

* Testing with the organic reproducer ran as long
as 5 days, without issues, so it does look good.

[Other Info]

* [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d

[Stack Trace Examples]

Examples:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
RIP: 0010:apparmor_sk_clone_security+0x26/0x70
...
Call Trace:
security_sk_clone+0x33/0x50
af_alg_accept+0x81/0x1c0 [af_alg]
alg_accept+0x15/0x20 [af_alg]
SYSC_accept4+0xff/0x210
SyS_accept+0x10/0x20
do_syscall_64+0x73/0x130
entry_SYSCALL_64_after_hwframe+0x3d/0xa2

general protection fault: 0000 [#1] SMP PTI
...
RIP: 0010:__release_sock+0x54/0xe0
...
Call Trace:
release_sock+0x30/0xa0
af_alg_accept+0x122/0x1c0 [af_alg]
alg_accept+0x15/0x20 [af_alg]
SYSC_accept4+0xff/0x210
SyS_accept+0x10/0x20
do_syscall_64+0x73/0x130
entry_SYSCALL_64_after_hwframe+0x3d/0xa2

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1884766/+subscriptions
