вторник

[Bug 1884766] Re: use-after-free in af_alg_accept() due to bh_lock_sock()

** Description changed:

[Impact]

- * Users of the Linux kernel's crypto userspace API
- reported BUG() / kernel NULL pointer dereference
- errors after kernel upgrades.
+  * Users of the Linux kernel's crypto userspace API
+    reported BUG() / kernel NULL pointer dereference
+    errors after kernel upgrades.

- * The stack trace signature is an accept() syscall
- going through af_alg_accept() and hitting errors
- usually in one of:
- - apparmor_sk_clone_security()
- - apparmor_sock_graft()
- - release_sock()
+  * The stack trace signature is an accept() syscall
+    going through af_alg_accept() and hitting errors
+    usually in one of:
+    - apparmor_sk_clone_security()
+    - apparmor_sock_graft()
+    - release_sock()

[Fix]
-
- * This is a regression introduced by upstream commit
- 37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
- in sk_destruct") which made its way through stable.
-
- * The offending patch allows the critical regions
- of af_alg_accept() and af_alg_release_parent() to
- run concurrently; now with the "right" events on 2
- CPUs it might drop the non-atomic reference counter
- of the alg_sock then the sock, thus release a sock
- that is still in use.

- * The fix is upstream commit 34c86f4c4a7b ("crypto:
- af_alg - fix use-after-free in af_alg_accept() due
- to bh_lock_sock()") [1]. It changes alg_sock's ref
- counter to atomic, which addresses the root cause.
-
+  * This is a regression introduced by upstream commit
+    37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
+    in sk_destruct") which made its way through stable.
+
+  * The offending patch allows the critical regions
+    of af_alg_accept() and af_alg_release_parent() to
+    run concurrently; now with the "right" events on 2
+    CPUs it might drop the non-atomic reference counter
+    of the alg_sock then the sock, thus release a sock
+    that is still in use.
+
+  * The fix is upstream commit 34c86f4c4a7b ("crypto:
+    af_alg - fix use-after-free in af_alg_accept() due
+    to bh_lock_sock()") [1]. It changes alg_sock's ref
+    counter to atomic, which addresses the root cause.
+
[Test Case]

- * There is a synthetic test case available, which
- uses a kprobes kernel module to synchronize the
- concurrent CPUs on the instructions responsible
- for the problem; and a userspace part to run it.
+  * There is a synthetic test case available, which
+    uses a kprobes kernel module to synchronize the
+    concurrent CPUs on the instructions responsible
+    for the problem; and a userspace part to run it.

- * The organic reproducer is the Varnish Cache Plus
- software with the Crypto vmod (which uses kernel
- crypto userspace API) under long, very high load.
-
- * The patch has been verified on both reproducers
- with the 4.15 and 5.7 kernels.
-
+  * The organic reproducer is the Varnish Cache Plus
+    software with the Crypto vmod (which uses kernel
+    crypto userspace API) under long, very high load.
+
+  * The patch has been verified on both reproducers
+    with the 4.15 and 5.7 kernels.
+
* More tests performed with 'stress-ng --af-alg'
- with 11 CPUs/hogs on Bionic/Disco/Eoan/Focal
+ with 11 CPUs on Xenial/Bionic/Disco/Eoan/Focal
(all on same version of stress-ng, V0.11.14)
+
No regressions observed from original kernel.
(the af-alg stressor can exercise almost all
kernel crypto modules shipped with the kernel;
so it checks more paths/crypto alg interfaces.)
-
+
[Regression Potential]

- * The fix patch does a fundamental change in how
- alg_sock reference counters work, plus another
- change to the 'nokey' counting. This of course
- *has* a risk of regression.
+  * The fix patch does a fundamental change in how
+    alg_sock reference counters work, plus another
+    change to the 'nokey' counting. This of course
+    *has* a risk of regression.

- * Regressions theoretically could manifest as use
- after free errors (in case of undercounting) in
- the af_alg functions or silent memory leaks (in
- case of overcounting), but also other behaviors
- since reference counting is key to many things.
-
- * FWIW, this patch has been written by the crypto
- subsystem maintainer, who certainly knows a lot
- of the normal and corner cases, thus giving the
- patch more credit.
-
- * Testing with the organic reproducer ran as long
- as 5 days, without issues, so it does look good.
+  * Regressions theoretically could manifest as use
+    after free errors (in case of undercounting) in
+    the af_alg functions or silent memory leaks (in
+    case of overcounting), but also other behaviors
+    since reference counting is key to many things.
+
+  * FWIW, this patch has been written by the crypto
+    subsystem maintainer, who certainly knows a lot
+    of the normal and corner cases, thus giving the
+    patch more credit.
+
+  * Testing with the organic reproducer ran as long
+    as 5 days, without issues, so it does look good.

[Other Info]

- * [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
-
+  * [1] Patch:
+ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
+
[Stack Trace Examples]

Examples:

- BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
- ...
- RIP: 0010:apparmor_sk_clone_security+0x26/0x70
- ...
- Call Trace:
- security_sk_clone+0x33/0x50
- af_alg_accept+0x81/0x1c0 [af_alg]
- alg_accept+0x15/0x20 [af_alg]
- SYSC_accept4+0xff/0x210
- SyS_accept+0x10/0x20
- do_syscall_64+0x73/0x130
- entry_SYSCALL_64_after_hwframe+0x3d/0xa2
+     BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
+     ...
+     RIP: 0010:apparmor_sk_clone_security+0x26/0x70
+     ...
+     Call Trace:
+      security_sk_clone+0x33/0x50
+      af_alg_accept+0x81/0x1c0 [af_alg]
+      alg_accept+0x15/0x20 [af_alg]
+      SYSC_accept4+0xff/0x210
+      SyS_accept+0x10/0x20
+      do_syscall_64+0x73/0x130
+      entry_SYSCALL_64_after_hwframe+0x3d/0xa2

- general protection fault: 0000 [#1] SMP PTI
- ...
- RIP: 0010:__release_sock+0x54/0xe0
- ...
- Call Trace:
- release_sock+0x30/0xa0
- af_alg_accept+0x122/0x1c0 [af_alg]
- alg_accept+0x15/0x20 [af_alg]
- SYSC_accept4+0xff/0x210
- SyS_accept+0x10/0x20
- do_syscall_64+0x73/0x130
- entry_SYSCALL_64_after_hwframe+0x3d/0xa2
+     general protection fault: 0000 [#1] SMP PTI
+     ...
+     RIP: 0010:__release_sock+0x54/0xe0
+     ...
+     Call Trace:
+      release_sock+0x30/0xa0
+      af_alg_accept+0x122/0x1c0 [af_alg]
+      alg_accept+0x15/0x20 [af_alg]
+      SYSC_accept4+0xff/0x210
+      SyS_accept+0x10/0x20
+      do_syscall_64+0x73/0x130
+      entry_SYSCALL_64_after_hwframe+0x3d/0xa2

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/1884766

Title:
use-after-free in af_alg_accept() due to bh_lock_sock()

Status in linux package in Ubuntu:
Confirmed
Status in linux source package in Xenial:
New
Status in linux source package in Bionic:
New
Status in linux source package in Eoan:
New
Status in linux source package in Focal:
New
Status in linux source package in Groovy:
Confirmed

Bug description:
[Impact]

 * Users of the Linux kernel's crypto userspace API
   reported BUG() / kernel NULL pointer dereference
   errors after kernel upgrades.

 * The stack trace signature is an accept() syscall
   going through af_alg_accept() and hitting errors
   usually in one of:
   - apparmor_sk_clone_security()
   - apparmor_sock_graft()
   - release_sock()

[Fix]

 * This is a regression introduced by upstream commit
   37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
   in sk_destruct") which made its way through stable.

 * The offending patch allows the critical regions
   of af_alg_accept() and af_alg_release_parent() to
   run concurrently; now with the "right" events on 2
   CPUs it might drop the non-atomic reference counter
   of the alg_sock then the sock, thus release a sock
   that is still in use.

 * The fix is upstream commit 34c86f4c4a7b ("crypto:
   af_alg - fix use-after-free in af_alg_accept() due
   to bh_lock_sock()") [1]. It changes alg_sock's ref
   counter to atomic, which addresses the root cause.

[Test Case]

 * There is a synthetic test case available, which
   uses a kprobes kernel module to synchronize the
   concurrent CPUs on the instructions responsible
   for the problem; and a userspace part to run it.

 * The organic reproducer is the Varnish Cache Plus
   software with the Crypto vmod (which uses kernel
   crypto userspace API) under long, very high load.

 * The patch has been verified on both reproducers
   with the 4.15 and 5.7 kernels.

* More tests performed with 'stress-ng --af-alg'
with 11 CPUs on Xenial/Bionic/Disco/Eoan/Focal
(all on same version of stress-ng, V0.11.14)

No regressions observed from original kernel.
(the af-alg stressor can exercise almost all
kernel crypto modules shipped with the kernel;
so it checks more paths/crypto alg interfaces.)

[Regression Potential]

 * The fix patch does a fundamental change in how
   alg_sock reference counters work, plus another
   change to the 'nokey' counting. This of course
   *has* a risk of regression.

 * Regressions theoretically could manifest as use
   after free errors (in case of undercounting) in
   the af_alg functions or silent memory leaks (in
   case of overcounting), but also other behaviors
   since reference counting is key to many things.

 * FWIW, this patch has been written by the crypto
   subsystem maintainer, who certainly knows a lot
   of the normal and corner cases, thus giving the
   patch more credit.

 * Testing with the organic reproducer ran as long
   as 5 days, without issues, so it does look good.

[Other Info]

 * [1] Patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d

[Stack Trace Examples]

Examples:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
    ...
    RIP: 0010:apparmor_sk_clone_security+0x26/0x70
    ...
    Call Trace:
     security_sk_clone+0x33/0x50
     af_alg_accept+0x81/0x1c0 [af_alg]
     alg_accept+0x15/0x20 [af_alg]
     SYSC_accept4+0xff/0x210
     SyS_accept+0x10/0x20
     do_syscall_64+0x73/0x130
     entry_SYSCALL_64_after_hwframe+0x3d/0xa2

    general protection fault: 0000 [#1] SMP PTI
    ...
    RIP: 0010:__release_sock+0x54/0xe0
    ...
    Call Trace:
     release_sock+0x30/0xa0
     af_alg_accept+0x122/0x1c0 [af_alg]
     alg_accept+0x15/0x20 [af_alg]
     SYSC_accept4+0xff/0x210
     SyS_accept+0x10/0x20
     do_syscall_64+0x73/0x130
     entry_SYSCALL_64_after_hwframe+0x3d/0xa2

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1884766/+subscriptions

Комментариев нет:

Отправить комментарий