** Attachment added: "reproduce-ceph-punch-hole-corruption.py"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2144592/+attachment/5953206/+files/reproduce-ceph-punch-hole-corruption.py
** Description changed:
Running CephFS on Ubuntu 24.04 with the 6.8 kernel (Ubuntu
6.8.0-100.100-generic 6.8.12).
The enclosed script reproduce-ceph-punch-hole-corruption.py exposes an
issue we have found on recent kernels: CephFS silently corrupts the 16KB
of data immediately before the requested hole when punching a hole in a
file (the test uses fallocate()). The corruption only occurs when the hole
ends at or crosses a 4MB RADOS object boundary (4MB is the default stripe size).
Execution shows the corruption:
root@EdgeOS-5HB6Q54:/home/eceuser# python3 ./reproduce-ceph-punch-hole-corruption.py /Shared_DataStore/
CephFS PUNCH_HOLE data corruption reproducer
============================================================
Mount point: /Shared_DataStore/
Object size: 4194304 (4 MiB)
Tests crossing 4MB object boundary (expect FAIL on buggy kernels):
------------------------------------------------------------
- FAIL 1 page before boundary, 2 pages
- hole=[4190208, 4198400) checked=[4173824, 4190208)
- 16384/16384 bytes read as 0x00 (expected 0xFF)
- FAIL 2 pages before boundary, 4 pages
- hole=[4186112, 4202496) checked=[4169728, 4186112)
- 16384/16384 bytes read as 0x00 (expected 0xFF)
- FAIL 4 pages before boundary, 8 pages
- hole=[4177920, 4210688) checked=[4161536, 4177920)
- 16384/16384 bytes read as 0x00 (expected 0xFF)
- FAIL ends at boundary, 2 pages
- hole=[4186112, 4194304) checked=[4169728, 4186112)
- 16384/16384 bytes read as 0x00 (expected 0xFF)
- FAIL ends at boundary, 1 page
- hole=[4190208, 4194304) checked=[4173824, 4190208)
- 16384/16384 bytes read as 0x00 (expected 0xFF)
+ FAIL 1 page before boundary, 2 pages
+ hole=[4190208, 4198400) checked=[4173824, 4190208)
+ 16384/16384 bytes read as 0x00 (expected 0xFF)
+ FAIL 2 pages before boundary, 4 pages
+ hole=[4186112, 4202496) checked=[4169728, 4186112)
+ 16384/16384 bytes read as 0x00 (expected 0xFF)
+ FAIL 4 pages before boundary, 8 pages
+ hole=[4177920, 4210688) checked=[4161536, 4177920)
+ 16384/16384 bytes read as 0x00 (expected 0xFF)
+ FAIL ends at boundary, 2 pages
+ hole=[4186112, 4194304) checked=[4169728, 4186112)
+ 16384/16384 bytes read as 0x00 (expected 0xFF)
+ FAIL ends at boundary, 1 page
+ hole=[4190208, 4194304) checked=[4173824, 4190208)
+ 16384/16384 bytes read as 0x00 (expected 0xFF)
Tests NOT crossing boundary (should always PASS):
------------------------------------------------------------
- PASS within object 0
- hole=[4161536, 4169728) checked=[4145152, 4161536)
- PASS mid object 0
- hole=[1048576, 1056768) checked=[1032192, 1048576)
- PASS start of object 1
- hole=[4194304, 4202496) checked=[4177920, 4194304)
- PASS within object 1
- hole=[5242880, 5251072) checked=[5226496, 5242880)
+ PASS within object 0
+ hole=[4161536, 4169728) checked=[4145152, 4161536)
+ PASS mid object 0
+ hole=[1048576, 1056768) checked=[1032192, 1048576)
+ PASS start of object 1
+ hole=[4194304, 4202496) checked=[4177920, 4194304)
+ PASS within object 1
+ hole=[5242880, 5251072) checked=[5226496, 5242880)
============================================================
Results: 4 passed, 5 failed out of 9
BUG CONFIRMED: This kernel has the CephFS PUNCH_HOLE corruption bug.
- Enclosed is a patch submission detailing issue, 0001-ceph-fix-data-
- corruption-from-short-read-on-punch-hole.patch
+ Enclosed is a patch submission detailing issue (AI created): 0001-ceph-
+ fix-data-corruption-from-short-read-on-punch-hole.patch
With the patch applied, the test script passes:
root@EdgeOS-3CD6Q54:~# python3 /home/eceuser/reproduce-ceph-punch-hole-corruption.py /Shared_DataStore/
CephFS PUNCH_HOLE data corruption reproducer
============================================================
Mount point: /Shared_DataStore/
Object size: 4194304 (4 MiB)
Tests crossing 4MB object boundary (expect FAIL on buggy kernels):
------------------------------------------------------------
- PASS 1 page before boundary, 2 pages
- hole=[4190208, 4198400) checked=[4173824, 4190208)
- PASS 2 pages before boundary, 4 pages
- hole=[4186112, 4202496) checked=[4169728, 4186112)
- PASS 4 pages before boundary, 8 pages
- hole=[4177920, 4210688) checked=[4161536, 4177920)
- PASS ends at boundary, 2 pages
- hole=[4186112, 4194304) checked=[4169728, 4186112)
- PASS ends at boundary, 1 page
- hole=[4190208, 4194304) checked=[4173824, 4190208)
+ PASS 1 page before boundary, 2 pages
+ hole=[4190208, 4198400) checked=[4173824, 4190208)
+ PASS 2 pages before boundary, 4 pages
+ hole=[4186112, 4202496) checked=[4169728, 4186112)
+ PASS 4 pages before boundary, 8 pages
+ hole=[4177920, 4210688) checked=[4161536, 4177920)
+ PASS ends at boundary, 2 pages
+ hole=[4186112, 4194304) checked=[4169728, 4186112)
+ PASS ends at boundary, 1 page
+ hole=[4190208, 4194304) checked=[4173824, 4190208)
Tests NOT crossing boundary (should always PASS):
------------------------------------------------------------
- PASS within object 0
- hole=[4161536, 4169728) checked=[4145152, 4161536)
- PASS mid object 0
- hole=[1048576, 1056768) checked=[1032192, 1048576)
- PASS start of object 1
- hole=[4194304, 4202496) checked=[4177920, 4194304)
- PASS within object 1
- hole=[5242880, 5251072) checked=[5226496, 5242880)
+ PASS within object 0
+ hole=[4161536, 4169728) checked=[4145152, 4161536)
+ PASS mid object 0
+ hole=[1048576, 1056768) checked=[1032192, 1048576)
+ PASS start of object 1
+ hole=[4194304, 4202496) checked=[4177920, 4194304)
+ PASS within object 1
+ hole=[5242880, 5251072) checked=[5226496, 5242880)
============================================================
Results: 9 passed, 0 failed out of 9
All tests passed. This kernel is not affected (or the fix is applied).
The following commit appears to cause the issue:
92b6cc5d1e7c ("netfs: Add iov_iters to (sub)requests to describe various buffers") by David Howells, authored 2023-09-27, committed 2023-12-24, merged in v6.8-rc1.
The bug is only present in the 6.8 and 6.9 kernels. 6.10 rewrote this code
in ee4cdf7ba857 ("netfs: Speed up buffered reading", by David Howells,
2024-07-02, merged in v6.10) and no longer has the issue.
We are asking for the enclosed patch to be reviewed for inclusion in Stable,
or for guidance if there is a better way to fix this.
--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2144592
Title:
Punching hole through CephFS hosted file causes corruption when
crossing 4MB RADOS object boundary
Status in linux package in Ubuntu:
New
Bug description:
Running CephFS on Ubuntu 24.04 with the 6.8 kernel (Ubuntu
6.8.0-100.100-generic 6.8.12).
The enclosed script reproduce-ceph-punch-hole-corruption.py exposes an
issue we have found on recent kernels: CephFS silently corrupts the
16KB of data immediately before the requested hole when punching a hole
in a file (the test uses fallocate()). The corruption only occurs when
the hole ends at or crosses a 4MB RADOS object boundary (4MB is the
default stripe size).
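The enclosed script is not reproduced in this notification. The following is only a minimal sketch of the kind of check described above, under stated assumptions: a 64-bit Linux system, a file pre-filled with 0xFF, and a hole punched with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE through ctypes. The helper names (write_all, punch_hole, bad_bytes_before_hole) are hypothetical, and the real reproducer may work differently.

```python
import ctypes
import os
import tempfile

FALLOC_FL_KEEP_SIZE = 0x01     # value from <linux/falloc.h>
FALLOC_FL_PUNCH_HOLE = 0x02    # value from <linux/falloc.h>
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB, the object size printed by the reproducer
GUARD = 16 * 1024              # 16 KiB region checked in front of the hole


def write_all(fd, data, offset):
    """Write all of data at offset, retrying after partial writes."""
    view = memoryview(data)
    while view:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written


def punch_hole(fd, offset, length):
    """Punch a hole with fallocate(2); assumes Linux with a 64-bit off_t."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    libc.fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if libc.fallocate(fd, mode, offset, length) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def bad_bytes_before_hole(directory, hole_start, hole_end):
    """Count bytes in the GUARD region before the hole that are not 0xFF."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        write_all(fd, b"\xff" * (2 * OBJECT_SIZE), 0)  # two full objects
        os.fsync(fd)
        punch_hole(fd, hole_start, hole_end - hole_start)
        os.fsync(fd)
        data = os.pread(fd, GUARD, hole_start - GUARD)
        return GUARD - data.count(0xFF)  # a short read also counts as bad
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling bad_bytes_before_hole("/Shared_DataStore/", 4190208, 4198400) would exercise the first failing case from the report; on a filesystem without the bug the result should be 0. The run shown next is from the enclosed script itself, not from this sketch.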
Execution shows the corruption:
root@EdgeOS-5HB6Q54:/home/eceuser# python3 ./reproduce-ceph-punch-hole-corruption.py /Shared_DataStore/
CephFS PUNCH_HOLE data corruption reproducer
============================================================
Mount point: /Shared_DataStore/
Object size: 4194304 (4 MiB)
Tests crossing 4MB object boundary (expect FAIL on buggy kernels):
------------------------------------------------------------
FAIL 1 page before boundary, 2 pages
hole=[4190208, 4198400) checked=[4173824, 4190208)
16384/16384 bytes read as 0x00 (expected 0xFF)
FAIL 2 pages before boundary, 4 pages
hole=[4186112, 4202496) checked=[4169728, 4186112)
16384/16384 bytes read as 0x00 (expected 0xFF)
FAIL 4 pages before boundary, 8 pages
hole=[4177920, 4210688) checked=[4161536, 4177920)
16384/16384 bytes read as 0x00 (expected 0xFF)
FAIL ends at boundary, 2 pages
hole=[4186112, 4194304) checked=[4169728, 4186112)
16384/16384 bytes read as 0x00 (expected 0xFF)
FAIL ends at boundary, 1 page
hole=[4190208, 4194304) checked=[4173824, 4190208)
16384/16384 bytes read as 0x00 (expected 0xFF)
Tests NOT crossing boundary (should always PASS):
------------------------------------------------------------
PASS within object 0
hole=[4161536, 4169728) checked=[4145152, 4161536)
PASS mid object 0
hole=[1048576, 1056768) checked=[1032192, 1048576)
PASS start of object 1
hole=[4194304, 4202496) checked=[4177920, 4194304)
PASS within object 1
hole=[5242880, 5251072) checked=[5226496, 5242880)
============================================================
Results: 4 passed, 5 failed out of 9
BUG CONFIRMED: This kernel has the CephFS PUNCH_HOLE corruption bug.
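The PASS/FAIL pattern above can be reproduced with a little arithmetic. The sketch below is an inference from the printed ranges only, not the script's actual logic: it assumes the checked range is always the 16384 bytes ending at the hole start, and that a case fails exactly when the hole reaches or passes the end of the object in which it starts. The function names are hypothetical.

```python
OBJECT_SIZE = 4194304  # from the "Object size" line of the output
GUARD = 16384          # size of each "checked" range in the output


def checked_range(hole_start):
    """The 16 KiB range immediately before the hole, as printed."""
    return (hole_start - GUARD, hole_start)


def expected_to_fail(hole_start, hole_end, object_size=OBJECT_SIZE):
    """True if the hole reaches or passes the end of its starting object."""
    next_boundary = (hole_start // object_size + 1) * object_size
    return hole_end >= next_boundary


# (name, hole_start, hole_end, failed on the unpatched kernel) from the output
CASES = [
    ("1 page before boundary, 2 pages", 4190208, 4198400, True),
    ("2 pages before boundary, 4 pages", 4186112, 4202496, True),
    ("4 pages before boundary, 8 pages", 4177920, 4210688, True),
    ("ends at boundary, 2 pages", 4186112, 4194304, True),
    ("ends at boundary, 1 page", 4190208, 4194304, True),
    ("within object 0", 4161536, 4169728, False),
    ("mid object 0", 1048576, 1056768, False),
    ("start of object 1", 4194304, 4202496, False),
    ("within object 1", 5242880, 5251072, False),
]

for name, start, end, failed in CASES:
    predicted = expected_to_fail(start, end)
    status = "matches" if predicted == failed else "DIFFERS"
    print(f"{name}: predicted fail={predicted}, observed fail={failed} ({status})")
```

Note that "start of object 1" passes even though its hole begins exactly on the boundary, which is why the predicate looks only at where the hole ends relative to the object it starts in.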
Enclosed is a patch submission (AI created) that details the issue:
0001-ceph-fix-data-corruption-from-short-read-on-punch-hole.patch
With the patch applied, the test script passes:
root@EdgeOS-3CD6Q54:~# python3 /home/eceuser/reproduce-ceph-punch-hole-corruption.py /Shared_DataStore/
CephFS PUNCH_HOLE data corruption reproducer
============================================================
Mount point: /Shared_DataStore/
Object size: 4194304 (4 MiB)
Tests crossing 4MB object boundary (expect FAIL on buggy kernels):
------------------------------------------------------------
PASS 1 page before boundary, 2 pages
hole=[4190208, 4198400) checked=[4173824, 4190208)
PASS 2 pages before boundary, 4 pages
hole=[4186112, 4202496) checked=[4169728, 4186112)
PASS 4 pages before boundary, 8 pages
hole=[4177920, 4210688) checked=[4161536, 4177920)
PASS ends at boundary, 2 pages
hole=[4186112, 4194304) checked=[4169728, 4186112)
PASS ends at boundary, 1 page
hole=[4190208, 4194304) checked=[4173824, 4190208)
Tests NOT crossing boundary (should always PASS):
------------------------------------------------------------
PASS within object 0
hole=[4161536, 4169728) checked=[4145152, 4161536)
PASS mid object 0
hole=[1048576, 1056768) checked=[1032192, 1048576)
PASS start of object 1
hole=[4194304, 4202496) checked=[4177920, 4194304)
PASS within object 1
hole=[5242880, 5251072) checked=[5226496, 5242880)
============================================================
Results: 9 passed, 0 failed out of 9
All tests passed. This kernel is not affected (or the fix is applied).
The following commit appears to cause the issue:
92b6cc5d1e7c ("netfs: Add iov_iters to (sub)requests to describe various buffers") by David Howells, authored 2023-09-27, committed 2023-12-24, merged in v6.8-rc1.
The bug is only present in the 6.8 and 6.9 kernels. 6.10 rewrote this
code in ee4cdf7ba857 ("netfs: Speed up buffered reading", by David
Howells, 2024-07-02, merged in v6.10) and no longer has the issue.
We are asking for the enclosed patch to be reviewed for inclusion in
Stable, or for guidance if there is a better way to fix this.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2144592/+subscriptions