[Bug 2106381] [NEW] nvme/tcp hangs IO on arm

Public bug reported:

[Impact]

A user reported a bug in the NVMe over TCP driver affecting aarch64 systems.
On weakly ordered architectures, the compiler or CPU can reorder the accesses that read and set
queue->cmd and queue->rcv_state, which can lead to dropped I/Os and I/O hangs.

The bug has been fixed upstream by commit [1], which landed in 6.14.

[Test Plan]
The bug is reproducible on arm64 architectures.
Set up NVMe over TCP.

Using an arm-based machine as the target, run a fio test with the
following config:

[global]
ioengine=libaio
max_latency=45s
end_fsync=1
create_serialize=0
size=3200m
directory=/path/to/storage
ramp_time=30
lat_percentiles=1
direct=1
filename_format=fiodata.$jobnum
verify_dump=1
numjobs=16
fallocate=native
stonewall=1
group_reporting=1
file_service_type=random
iodepth=16
runtime=5m
time_based=1
[random_0_100_4k]
bs=4k
rw=randwrite

[Where problems could occur]
To fix the bug, the patch accesses queue->cmd with READ_ONCE/WRITE_ONCE
and queue->rcv_state with smp_load_acquire and smp_store_release.
The patch modifies the nvme-tcp driver, so any potential regressions
would affect setups using NVMe over TCP.
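
For illustration only, the ordering pattern described above can be sketched
in userspace C11. This is not the kernel patch: the names demo_queue,
demo_cmd, demo_publish and demo_consume are hypothetical, relaxed atomics
stand in for READ_ONCE/WRITE_ONCE, and release/acquire atomics stand in for
smp_store_release/smp_load_acquire.

```c
/* Userspace analogy, assuming C11 atomics; not the actual driver code. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical receive states, loosely modelled on the bug description. */
enum demo_rcv_state { DEMO_RECV_PDU, DEMO_RECV_DATA };

/* Hypothetical command carrying a single value. */
struct demo_cmd { int payload; };

struct demo_queue {
    _Atomic(struct demo_cmd *) cmd; /* accessed like READ_ONCE/WRITE_ONCE */
    _Atomic int rcv_state;          /* accessed with acquire/release */
};

/* Writer side: store the command pointer first, then publish the new state.
 * The release store keeps the cmd store from being ordered after it. */
void demo_publish(struct demo_queue *q, struct demo_cmd *cmd)
{
    assert(cmd != NULL);
    atomic_store_explicit(&q->cmd, cmd, memory_order_relaxed);
    atomic_store_explicit(&q->rcv_state, DEMO_RECV_DATA, memory_order_release);
}

/* Reader side: check the state with an acquire load. Only when the state has
 * been published is the cmd pointer read, so the reader cannot observe the
 * new state together with a stale cmd. Returns NULL if nothing is ready. */
struct demo_cmd *demo_consume(struct demo_queue *q)
{
    if (atomic_load_explicit(&q->rcv_state, memory_order_acquire) != DEMO_RECV_DATA)
        return NULL;
    return atomic_load_explicit(&q->cmd, memory_order_relaxed);
}
```

In this sketch the writer and reader would normally run on different CPUs.
With plain stores and loads, a weakly ordered machine could let the reader
see the new rcv_state before the new cmd, which is the kind of window the
description attributes to the dropped I/Os.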


[Other Info]

The user is able to reproduce the issue with kernels 5.19 (no longer
supported), 6.8 and 6.11.


[1] https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/nvme?id=a16f88964c647103dad7743a484b216d488a6352

** Affects: linux (Ubuntu)
Importance: Undecided
Status: New

** Affects: linux (Ubuntu Noble)
Importance: Undecided
Status: New

** Affects: linux (Ubuntu Oracular)
Importance: Undecided
Status: New

** Also affects: linux (Ubuntu Noble)
Importance: Undecided
Status: New

** Also affects: linux (Ubuntu Oracular)
Importance: Undecided
Status: New

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2106381

Title:
nvme/tcp hangs IO on arm

Status in linux package in Ubuntu:
New
Status in linux source package in Noble:
New
Status in linux source package in Oracular:
New

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2106381/+subscriptions
