Public bug reported:
[Impact]
A user reported a bug in the NVMe over TCP (nvme-tcp) driver affecting aarch64 systems.
On weakly ordered architectures such as arm64, the CPU (as well as the compiler) can reorder the accesses
that read and set queue->cmd and queue->rcv_state, which can lead to dropped I/Os and hung I/O.
The bug has been fixed upstream in [1], which landed in 6.14.
[Test Plan]
The bug is reproducible on arm64.
Set up NVMe over TCP.
Using an Arm-based machine as the target, run a fio test with the
following config:
[global]
ioengine=libaio
max_latency=45s
end_fsync=1
create_serialize=0
size=3200m
directory=/path/to/storage
ramp_time=30
lat_percentiles=1
direct=1
filename_format=fiodata.$jobnum
verify_dump=1
numjobs=16
fallocate=native
stonewall=1
group_reporting=1
file_service_type=random
iodepth=16
runtime=5m
time_based=1
[random_0_100_4k]
bs=4k
rw=randwrite
[Where problems could occur]
To fix the bug, the patch reads and writes queue->cmd with the READ_ONCE/WRITE_ONCE
macros and queue->rcv_state with smp_load_acquire/smp_store_release.
The patch modifies the nvme-tcp driver, so any potential regressions
would affect setups using NVMe over TCP.
[Other Info]
The user is able to reproduce the issue with kernels 5.19 (no longer
supported), 6.8, and 6.11.
[1] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/nvme?id=a16f88964c647103dad7743a484b216d488a6352
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Affects: linux (Ubuntu Noble)
Importance: Undecided
Status: New
** Affects: linux (Ubuntu Oracular)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Noble)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Oracular)
Importance: Undecided
Status: New
https://bugs.launchpad.net/bugs/2106381
Title:
nvme/tcp hangs IO on arm
Status in linux package in Ubuntu:
New
Status in linux source package in Noble:
New
Status in linux source package in Oracular:
New