
[Bug 2146707] Re: Boot-time SCSI async probe leaks nr_iowait (procs_blocked) — USB card reader 05e3:0751

Update: The procs_blocked leak has TWO sources, not just the USB card
reader.

Testing with usb-storage.quirks=05e3:0751:i (card reader fully ignored,
confirmed "device ignored" in dmesg, no /dev/sdc created) still shows
procs_blocked=4.

The second source is empty AHCI/SATA ports. This system has an Intel
AHCI controller (0000:00:17.0) with 6 ports, 2 occupied (sda, sdb) and 4
empty (ata3-6, all showing "SATA link down").

Adding libata.force=3:disable,4:disable,5:disable,6:disable drops
procs_blocked from 4 to 2.
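
For systems with a different set of empty ports, the parameter can be generated rather than typed by hand. A minimal sketch, assuming the port numbers are the ataN numbers shown in dmesg; the helper name libata_force_disable is illustrative and not part of any tool:

```python
def libata_force_disable(ports):
    """Build a libata.force= parameter that disables the given ATA ports."""
    if not ports:
        raise ValueError("at least one port is required")
    # De-duplicate and sort so the output is stable regardless of input order.
    entries = ",".join(f"{int(p)}:disable" for p in sorted(set(ports)))
    return f"libata.force={entries}"
```

Calling libata_force_disable([3, 4, 5, 6]) reproduces the string used in the test above.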

Both sources share the same boot-time async scan code path:
async_run_entry_fn → do_scan_async → ... → blk_execute_rq → io_schedule_timeout

This is not USB-specific — it's a general SCSI async scan accounting bug
affecting both USB mass-storage and AHCI during boot.

Summary of all test results:

  Configuration            procs_blocked   Notes
  No workarounds           4
  :i only (card reader)    4               AHCI still leaks
  libata.force only        2
  :i + libata.force        2               same as libata.force alone
  Warm reprobe (ftrace)    unchanged       boot-specific

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2146707

Title:
Boot-time SCSI async probe leaks nr_iowait (procs_blocked) — USB card
reader 05e3:0751

Status in linux package in Ubuntu:
New

Bug description:

Subject: Boot-time SCSI async probe leaks nr_iowait (procs_blocked) — USB mass-storage card reader

Category: Drivers → SCSI
CC: linux-scsi@vger.kernel.org, linux-usb@vger.kernel.org

== Summary ==

The kernel's nr_iowait / procs_blocked counter becomes permanently
elevated (+4) during boot-time SCSI probing of a USB mass-storage card
reader. No actual D-state processes exist, and iotop shows zero disk IO.
All monitoring tools (top, vmstat, htop, node_exporter) report phantom
40-60% iowait.

The counter remains stuck until reboot.
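
For reference, monitoring tools typically derive the iowait percentage by sampling the aggregate cpu line of /proc/stat twice and dividing the iowait delta by the total delta. The sketch below assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are illustrative and not taken from any of the tools listed above.

```python
def parse_cpu_line(stat_text):
    """Return the first eight counters of the aggregate 'cpu' line as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal
            # (guest and guest_nice are already included in user and nice)
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before_text, after_text):
    """iowait share of the CPU time elapsed between two /proc/stat samples."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait column
```

To use it, read /proc/stat, sleep for a few seconds, read it again, and pass both texts to iowait_percent(). On an otherwise idle system with a stuck nr_iowait, idle time is reported as iowait, which is how the phantom percentage arises.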

Critical finding: if the same device is unbound and rebound after boot,
the SCSI probe runs again through the identical code path but does NOT
leak the counter. This indicates the bug is specific to boot-time async
scanning rather than the probe logic itself.

== Hardware ==

Genesys Logic microSD Card Reader (05e3:0751, bcdDevice 14.04)
Behind VIA Labs USB hub (2109:2817) built into a monitor
Intel xHCI host controller (0000:00:14.0)
Intel i7-9700K (8 cores, no HT)
MSI motherboard

== Kernel versions affected ==

6.8.0-106-generic: procs_blocked stuck at 4-6 after boot
6.17.0-19-generic: procs_blocked stuck at 2-4 after boot (improved but not fixed)
Ubuntu 24.04.4 LTS (Noble Numbat) with HWE kernel

== Reproducer ==

1. Boot system with USB card reader attached (no SD card inserted).

2. After boot completes:

$ grep procs /proc/stat
procs_running 2
procs_blocked 4

3. Verify no D-state processes exist:

$ ps -eo pid,state,comm | awk '$2 == "D"'
(empty)

4. Verify zero actual disk IO:

$ iotop -bon1 | head -5
(shows zero read/write)

5. Verify all SCSI devices are healthy:

$ cat /sys/class/scsi_device/*/device/state
running
running
running

6. Rebind the device to trigger a second probe:

# echo 1-4.4:1.0 > /sys/bus/usb/drivers/usb-storage/unbind
# sleep 3
# echo 1-4.4:1.0 > /sys/bus/usb/drivers/usb-storage/bind
# sleep 10

7. Check procs_blocked again:

$ grep procs_blocked /proc/stat
procs_blocked 4

Counter stays at 4 — the warm reprobe does NOT leak.
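
Steps 2 and 3 can be combined into one check that compares procs_blocked against the number of tasks actually in D state. This is a minimal sketch, assuming the usual /proc/stat and /proc/<pid>/task/<tid>/stat layouts; the function names are illustrative. The two readings are not taken atomically, so a brief mismatch on a busy system is expected, but a persistent gap on an idle one matches the symptom described here.

```python
import glob


def procs_blocked(stat_text):
    """Extract the procs_blocked value from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "procs_blocked":
            return int(fields[1])
    raise ValueError("procs_blocked not found")


def task_state(pid_stat_text):
    """Return the state letter from /proc/<pid>/stat contents.

    The comm field is parenthesised and may itself contain spaces or ')',
    so split after the last ')' instead of splitting on whitespace.
    """
    return pid_stat_text.rsplit(")", 1)[1].split()[0]


def count_d_state():
    """Count tasks (threads included) currently in uninterruptible sleep."""
    count = 0
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                if task_state(f.read()) == "D":
                    count += 1
        except (OSError, IndexError):
            continue  # task exited while we were scanning
    return count
```

Typical use is to read /proc/stat, pass its text to procs_blocked(), and print the result next to count_d_state().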

== ftrace analysis ==

Traced io_schedule, io_schedule_timeout, blk_execute_rq, scsi_execute_cmd,
and scsi_test_unit_ready with func_stack_trace enabled.

The boot probe path (which leaks) and the warm reprobe path (which does
not leak) both execute:

io_schedule_timeout
<- __wait_for_common
<- wait_for_completion_io_timeout
<- blk_execute_rq
<- scsi_execute_cmd
<- scsi_probe_lun
<- scsi_probe_and_add_lun
<- __scsi_scan_target
<- scsi_scan_host_selected
<- do_scsi_scan_host
<- do_scan_async
<- async_run_entry_fn
<- process_one_work

The leak count of 4 matches the number of synchronous SCSI commands issued
during probe (TEST UNIT READY, REQUEST SENSE, READ CAPACITY, MODE SENSE),
each going through blk_execute_rq() → io_schedule_timeout().
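
With 62k lines of trace, counting the matching stacks by hand is impractical. The sketch below assumes each stack has first been cut out into its own text block in the layout quoted above (innermost frame first, callers prefixed with "<-"); it also accepts the "=>" prefix, but splitting the raw trace file into blocks is left out. The function names are illustrative.

```python
def parse_stack(block):
    """Turn one quoted stack (innermost frame first) into a list of names."""
    frames = []
    for line in block.splitlines():
        name = line.strip()
        for prefix in ("<-", "=>"):
            if name.startswith(prefix):
                name = name[len(prefix):].strip()
        if name:
            frames.append(name)
    return frames


def is_async_scan_iowait(frames):
    """True if the stack enters io_schedule_timeout from the async SCSI scan."""
    return (
        bool(frames)
        and frames[0] == "io_schedule_timeout"
        and "blk_execute_rq" in frames
        and "do_scan_async" in frames
    )
```

Summing is_async_scan_iowait(parse_stack(b)) over all blocks gives the number of io waits issued from the async scan, which can then be compared with the observed leak count.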

During boot, the io_schedule_prepare/io_schedule_finish pairing appears to
be broken — the in_iowait flag is set but never cleared, leaking one
nr_iowait increment per command.

During warm reprobe, the same code path completes correctly.
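
To make the suspected failure mode concrete, here is a toy model of the accounting. It is a simplification under stated assumptions, not kernel code: the counter is assumed to go up when a task with in_iowait set is scheduled out and down when such a task is woken, and the restore_flag=False case is a hypothetical stand-in for whatever prevents the flag from being restored during boot. All class and function names are invented for illustration.

```python
class Task:
    def __init__(self, name):
        self.name = name
        self.in_iowait = False


class RunQueue:
    """Toy stand-in for the nr_iowait counter."""

    def __init__(self):
        self.nr_iowait = 0

    def sleep(self, task):
        # Scheduling out: a task flagged in_iowait is counted as blocked on IO.
        if task.in_iowait:
            self.nr_iowait += 1

    def wake(self, task):
        # Waking up: the same flag decides whether the count is dropped again.
        if task.in_iowait:
            self.nr_iowait -= 1


def io_wait(rq, task, restore_flag=True):
    """Model one io_schedule_timeout() call: flag, sleep, wake, restore."""
    old = task.in_iowait
    task.in_iowait = True        # models io_schedule_prepare()
    rq.sleep(task)
    rq.wake(task)
    if restore_flag:
        task.in_iowait = old     # models io_schedule_finish()


def idle_workers_after_scan(n_workers, restore_flag):
    """Run one IO wait per worker, then park each worker idle."""
    rq = RunQueue()
    for i in range(n_workers):
        worker = Task(f"kworker-{i}")
        io_wait(rq, worker, restore_flag=restore_flag)
        rq.sleep(worker)         # worker goes idle waiting for more work
    return rq.nr_iowait
```

In this model idle_workers_after_scan(4, True) leaves the counter at 0, while idle_workers_after_scan(4, False) leaves it at 4 even though no task is waiting on IO. Whether the real leak is counted per command or per parked worker is not established by this sketch.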

== Hypothesis ==

The bug is likely in the interaction between the async scan worker
(do_scan_async via async_run_entry_fn) and the block layer's IO wait
accounting during early boot. Possible causes:

1. Async worker lifecycle issue: the worker thread exits or is requeued
   before io_schedule_finish clears current->in_iowait
2. Block queue initialization race: a request completes while the queue
   is only partially initialized, bypassing normal cleanup
3. Early boot async scan concurrency: multiple devices scanning
   simultaneously cause task flag inheritance issues

Recent fixes in related areas suggest this is a known problem class:
- Block layer iowait accounting in blk_execute_rq
- blk-mq wait accounting in blk_mq_wait_for_completion
- SCSI synchronous command completion handling
- Async scanning races during device discovery

== Suggested diagnostic patch ==

To confirm, add a check in do_scan_async() just before return:

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ do_scan_async
 	do_scsi_scan_host(shost);
 
+	if (current->in_iowait)
+		pr_warn("scsi_scan: worker exited with in_iowait still set (pid=%d)\n",
+			current->pid);
+
 	scsi_autopm_put_host(shost);
 }

If the hypothesis is correct, this should fire during boot.
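
Once a kernel with the diagnostic patch has booted, the warning can be picked out of the log mechanically. A minimal sketch, assuming the message text is exactly the one in the patch above and that the log has been captured as text (for example from dmesg); the helper name is illustrative.

```python
import re

# Matches the pr_warn() text from the diagnostic patch and captures the pid.
WARN_RE = re.compile(
    r"scsi_scan: worker exited with in_iowait still set \(pid=(\d+)\)"
)


def leaked_worker_pids(dmesg_text):
    """Return the pids reported by the diagnostic warning, in log order."""
    return [int(m.group(1)) for m in WARN_RE.finditer(dmesg_text)]
```

An empty list after a boot that still shows an elevated procs_blocked would argue against the stuck-flag hypothesis, at least at this point in the scan path.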

== Workaround ==

usb-storage.quirks=05e3:0751:i

This prevents the device from being probed at all, eliminating the leak.
But it completely disables the card reader.

== Parameters tested (none helped) ==

- usb-storage.delay_use=10
- scsi_mod.scan=sync
- usb-storage.quirks=05e3:0751:u (ignore UAS)
- Booting with SD card inserted

== dmesg during boot probe ==

usb 1-4.4: New USB device found, idVendor=05e3, idProduct=0751
usb-storage 1-4.4:1.0: USB Mass Storage device detected
scsi host6: usb-storage 1-4.4:1.0
[8 seconds later]
scsi 6:0:0:0: Direct-Access Generic STORAGE DEVICE 1404 PQ: 0 ANSI: 6
sd 6:0:0:0: [sdc] Media removed, stopped polling
sd 6:0:0:0: [sdc] Attached SCSI removable disk

No SCSI errors logged. Device appears to enumerate successfully.

== Attachments ==

- Full dmesg from boot
- Full ftrace output (62k lines, filtered timeline also provided)
- lsusb -t (USB topology)
- lsusb -v (full USB descriptors)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2146707/+subscriptions
