Tuesday

[Bug 1910866] Re: nvme drive fails after some time

@kaihengfeng

I have found that running the command "fio --name=basic
--directory=/path/to/empty/directory --size=1G --rw=randrw --numjobs=4
--loops=5" works fine on linux-image-5.4.0-59-generic, but on
linux-image-5.8.0-36-generic it freezes the system at the
"Laying out IO file" stage. I confirmed over two consecutive boots that
5.8 fails this way on an empty directory, and I will now use this as
my test of whether a kernel works.
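To make the test repeatable across boots, the fio command quoted above could be wrapped in a small script. This is only a sketch: the helper names and the 600-second timeout are assumptions of mine, not part of the original test, and the timeout only helps when fio hangs while the rest of the system stays responsive. On a full system freeze the script cannot report anything.

```python
import subprocess


def build_fio_command(directory):
    """Return the argv list for the fio run quoted in the comment."""
    return [
        "fio",
        "--name=basic",
        f"--directory={directory}",
        "--size=1G",
        "--rw=randrw",
        "--numjobs=4",
        "--loops=5",
    ]


def run_fio_test(directory, timeout_s=600):
    """Run fio and return "PASS" on exit code 0, "FAIL" otherwise.

    timeout_s is an assumed limit; a hung fio is counted as a failure.
    """
    try:
        result = subprocess.run(build_fio_command(directory), timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "FAIL"
    return "PASS" if result.returncode == 0 else "FAIL"
```

Calling run_fio_test("/path/to/empty/directory") after each boot would give one PASS/FAIL result per kernel.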

I have installed the 5.11-rc3 mainline kernel you linked; note that I
had to disable Secure Boot to be able to use it. This kernel passed
the fio test above on two boots.

So, in summary, the results so far on my system with the fio test:
linux-image-5.4.0-59-generic: PASS
linux-image-5.8.0-36-generic: FAIL
linux-image-unsigned-5.11.0-051100rc3-generic: PASS

Please advise how to proceed here. Should I start bisecting manually by
picking kernels between 5.8 and 5.11, or between 5.4 and 5.8?

I guess I should also try the 5.8 mainline kernel to rule out Ubuntu
patches as the cause?
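One way to read the three results is as two separate transitions, each of which could be bisected on its own. The sketch below is illustrative only: the function names are hypothetical, and the candidate list of mainline releases is an assumption, since the Ubuntu 5.4.0-59 and 5.8.0-36 kernels are not identical to mainline 5.4 and 5.8.

```python
def find_transitions(results):
    """Return (kind, last_before, first_after) for every PASS/FAIL flip
    between adjacent entries of an ordered list of (version, result)."""
    transitions = []
    for (prev_ver, prev_res), (next_ver, next_res) in zip(results, results[1:]):
        if prev_res == next_res:
            continue
        kind = "regression" if prev_res == "PASS" else "fix"
        transitions.append((kind, prev_ver, next_ver))
    return transitions


def midpoint(versions, good, bad):
    """Pick the version halfway between two known endpoints."""
    return versions[(versions.index(good) + versions.index(bad)) // 2]


# Results reported in the comment, oldest kernel first.
results = [
    ("5.4.0-59", "PASS"),
    ("5.8.0-36", "FAIL"),
    ("5.11.0-rc3", "PASS"),
]
print(find_transitions(results))
# [('regression', '5.4.0-59', '5.8.0-36'), ('fix', '5.8.0-36', '5.11.0-rc3')]

# Assumed candidates: mainline releases between the two endpoints.
print(midpoint(["5.4", "5.5", "5.6", "5.7", "5.8"], "5.4", "5.8"))
# 5.6
```

Under these assumptions, the 5.4 to 5.8 range locates the change that introduced the failure, while the 5.8 to 5.11 range locates the change that fixed it.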

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/1910866

Title:
nvme drive fails after some time

Status in linux package in Ubuntu:
Confirmed

Bug description:
Sorry for the vague title. I thought this was a hardware issue until
someone else online mentioned that their nvme drive goes "read only"
after some time. I tend not to reboot my system much, so I have a large
journal. Either way, this happens once in a while. The / drive is fine,
but /home is on the nvme drive, which just disappears. I reboot and
everything is fine. But leave it long enough and it'll fail again.

Here's the most recent snippet about the nvme drive before I restarted
the system.

Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, aborting
Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 449 QID 5 timeout, aborting
Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 450 QID 5 timeout, aborting
Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 451 QID 5 timeout, aborting
Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, reset controller
Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 22 QID 0 timeout, reset controller
Jan 08 19:21:04 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
Jan 08 19:21:04 robot kernel: nvme nvme1: Abort status: 0x371
Jan 08 19:21:25 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
Jan 08 19:21:25 robot kernel: nvme nvme1: Removing after probe failure status: -19
Jan 08 19:21:41 robot kernel: INFO: task jbd2/nvme1n1p1-:731 blocked for more than 120 seconds.
Jan 08 19:21:41 robot kernel: jbd2/nvme1n1p1- D 0 731 2 0x00004000
Jan 08 19:21:45 robot kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993784 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123967, lost async page write
Jan 08 19:21:45 robot kernel: EXT4-fs error (device nvme1n1p1): __ext4_find_entry:1535: inode #57278595: comm gsd-print-notif: reading directory lblock 0
Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993384 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123917, lost async page write
Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1920993320 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1833166472 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 240123909, lost async page write
Jan 08 19:21:45 robot kernel: blk_update_request: I/O error, dev nvme1n1, sector 1909398624 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
Jan 08 19:21:45 robot kernel: Buffer I/O error on dev nvme1n1p1, logical block 0, lost sync page write
Jan 08 19:21:45 robot kernel: EXT4-fs (nvme1n1p1): I/O error while writing superblock
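For comparing failures across kernels, the nvme messages in a snippet like the one above could be tallied per controller. This is a rough sketch: the event names and the summarize helper are my own labels, and the patterns cover only the message forms that appear in this log.

```python
import re
from collections import Counter

# Patterns for the nvme driver messages seen in the log above.
EVENT_PATTERNS = {
    "timeout_abort": re.compile(r"nvme (\w+): I/O \d+ QID \d+ timeout, aborting"),
    "timeout_reset": re.compile(r"nvme (\w+): I/O \d+ QID \d+ timeout, reset controller"),
    "not_ready": re.compile(r"nvme (\w+): Device not ready; aborting reset"),
    "removed": re.compile(r"nvme (\w+): Removing after probe failure"),
}


def summarize(lines):
    """Count matching events, keyed by (controller, event name)."""
    counts = Counter()
    for line in lines:
        for name, pattern in EVENT_PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), name)] += 1
                break
    return counts


sample = [
    "Jan 08 19:19:11 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, aborting",
    "Jan 08 19:19:42 robot kernel: nvme nvme1: I/O 448 QID 5 timeout, reset controller",
    "Jan 08 19:21:25 robot kernel: nvme nvme1: Removing after probe failure status: -19",
]
for (controller, event), count in summarize(sample).items():
    print(controller, event, count)
```

The same function could be fed kernel log lines saved from each boot, for example the output of journalctl -k written to a file.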

ProblemType: Bug
DistroRelease: Ubuntu 20.10
Package: linux-image-5.8.0-34-generic 5.8.0-34.37
ProcVersionSignature: Ubuntu 5.8.0-34.37-generic 5.8.18
Uname: Linux 5.8.0-34-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu50.3
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Sat Jan 9 11:56:28 2021
InstallationDate: Installed on 2020-08-15 (146 days ago)
InstallationMedia: Ubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
MachineType: Intel Corporation NUC8i7HVK
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.8.0-34-generic root=UUID=c212e9d4-a049-4da0-8e34-971cb7414e60 ro quiet splash vt.handoff=7
RebootRequiredPkgs:
linux-image-5.8.0-36-generic
linux-base
RelatedPackageVersions:
linux-restricted-modules-5.8.0-34-generic N/A
linux-backports-modules-5.8.0-34-generic N/A
linux-firmware 1.190.2
SourcePackage: linux
UpgradeStatus: Upgraded to groovy on 2020-09-20 (110 days ago)
dmi.bios.date: 12/17/2018
dmi.bios.release: 5.6
dmi.bios.vendor: Intel Corp.
dmi.bios.version: HNKBLi70.86A.0053.2018.1217.1739
dmi.board.name: NUC8i7HVB
dmi.board.vendor: Intel Corporation
dmi.board.version: J68196-502
dmi.chassis.type: 3
dmi.chassis.vendor: Intel Corporation
dmi.chassis.version: 2.0
dmi.modalias: dmi:bvnIntelCorp.:bvrHNKBLi70.86A.0053.2018.1217.1739:bd12/17/2018:br5.6:svnIntelCorporation:pnNUC8i7HVK:pvrJ71485-502:rvnIntelCorporation:rnNUC8i7HVB:rvrJ68196-502:cvnIntelCorporation:ct3:cvr2.0:
dmi.product.family: Intel NUC
dmi.product.name: NUC8i7HVK
dmi.product.version: J71485-502
dmi.sys.vendor: Intel Corporation

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1910866/+subscriptions
