Friday

[Bug 2033693] Re: Kernel bug lead to unresponsive system

Is this reproducible? Is it related to suspend/resume? Even if the
kernel hang is not reproducible, I'm interested in the PCIe Corrected
Errors. Some have reported that "pcie_aspm=off" avoids the errors. If
that's the case for you, see
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2043665/comments/6
and help me investigate it!
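Before testing, it is worth confirming whether `pcie_aspm=off` is actually present on the running kernel's command line (`/proc/cmdline`). The following is a minimal Python sketch. It assumes only that boot options are whitespace-separated `key=value` tokens. The helper name `pcie_aspm_values` and the constant `REPORTER_CMDLINE` are hypothetical; the constant is copied from the reporter's `ProcKernelCmdLine` field further down.

```python
from typing import List

# Copied from the ProcKernelCmdLine field of this bug report.
REPORTER_CMDLINE = (
    "BOOT_IMAGE=/vmlinuz-5.15.0-82-generic "
    "root=UUID=e9d2e278-e29c-4073-8aa4-5f90edf16c3a ro consoleblank=0 "
    "systemd.show_status=true consoleblank=0"
)


def pcie_aspm_values(cmdline: str) -> List[str]:
    """Return every value given to a bare `pcie_aspm=` option, in order.

    Hypothetical helper: it does not model how the kernel resolves repeated
    options. `pcie_aspm.policy=...` is a different key and is not matched.
    """
    values = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "pcie_aspm" and sep:
            values.append(value)
    return values
```

On a live system, call it as `pcie_aspm_values(open("/proc/cmdline").read())`. For the reporter's command line it returns an empty list, which means ASPM was not disabled at boot when the crash happened.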

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2033693

Title:
Kernel bug lead to unresponsive system

Status in linux package in Ubuntu:
Confirmed

Bug description:
A server became unresponsive. After a hardware reset, the server came
back up. /var/log/kern.log shows PCIe AER errors starting at Aug 31 19:42:36, followed by a kernel stack trace:

Aug 31 19:42:36 prod01 kernel: [1222010.240461] pcieport 0000:00:1c.1: AER: Multiple Corrected error received: 0000:00:1c.1
Aug 31 19:42:36 prod01 kernel: [1222010.364945] pcieport 0000:00:1c.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Aug 31 19:42:36 prod01 kernel: [1222010.364947] pcieport 0000:00:1c.1: device [8086:7ab9] error status/mask=00003140/00002000
Aug 31 19:42:36 prod01 kernel: [1222010.364949] pcieport 0000:00:1c.1: [ 6] BadTLP
Aug 31 19:42:36 prod01 kernel: [1222010.364950] pcieport 0000:00:1c.1: [ 8] Rollover
Aug 31 19:42:36 prod01 kernel: [1222010.364951] pcieport 0000:00:1c.1: [12] Timeout
Aug 31 19:42:36 prod01 kernel: [1222010.364968] pcieport 0000:00:1c.1: AER: Uncorrected (Fatal) error received: 0000:00:1c.1
Aug 31 19:42:36 prod01 kernel: [1222010.364977] pcieport 0000:00:1c.1: PCIe Bus Error: severity=Uncorrected (Fatal), type=Data Link Layer, (Receiver ID)
Aug 31 19:42:36 prod01 kernel: [1222010.364982] pcieport 0000:00:1c.1: device [8086:7ab9] error status/mask=00000010/00000000
Aug 31 19:42:36 prod01 kernel: [1222010.364985] pcieport 0000:00:1c.1: [ 4] DLP (First)
Aug 31 19:42:38 prod01 kernel: [1222011.619605] pcieport 0000:00:1c.1: AER: Root Port link has been reset (0)
Aug 31 19:42:38 prod01 kernel: [1222011.619634] igc 0000:06:00.0: can't change power state from D3cold to D0 (config space inaccessible)
Aug 31 19:42:38 prod01 kernel: [1222011.683503] igc 0000:06:00.0 enp6s0: PCIe link lost, device now detached
Aug 31 19:42:44 prod01 kernel: [1222017.857457] genirq: Flags mismatch irq 166. 00000000 (enp6s0) vs. 00000000 (enp6s0)
Aug 31 19:42:44 prod01 kernel: [1222017.858167] ------------[ cut here ]------------
Aug 31 19:42:44 prod01 kernel: [1222017.858169] kernel BUG at drivers/pci/msi.c:369!
Aug 31 19:42:44 prod01 kernel: [1222017.858179] invalid opcode: 0000 [#1] SMP NOPTI
Aug 31 19:42:44 prod01 kernel: [1222017.858183] CPU: 15 PID: 286 Comm: irq/127-aerdrv Not tainted 5.15.0-73-generic #80-Ubuntu
Aug 31 19:42:44 prod01 kernel: [1222017.858187] Hardware name: ASUSTeK COMPUTER INC. System Product Name/W680/MB DC, BIOS 2004 12/16/2022
Aug 31 19:42:44 prod01 kernel: [1222017.858189] RIP: 0010:free_msi_irqs+0x110/0x140
Aug 31 19:42:44 prod01 kernel: [1222017.858196] Code: 85 c0 0f 84 45 ff ff ff 45 31 f6 eb 11 41 83 c6 01 44 39 73 14 0f 86 32 ff ff ff 8b 7b 10 44 01 f7 e8 a4 44 a6 ff 84 c0 74 e3 <0f> 0b 49 8b 7d 60 e8 b5 93 9b ff 49 8b 45 00 eb 8d 49 8d b5 d0 00
Aug 31 19:42:44 prod01 kernel: [1222017.858200] RSP: 0000:ffffae8d00cf7c30 EFLAGS: 00010202
Aug 31 19:42:44 prod01 kernel: [1222017.858204] RAX: 0000000000000001 RBX: ffff8b42c34e5100 RCX: 0000000000000000
Aug 31 19:42:44 prod01 kernel: [1222017.858206] RDX: ffff8b42c4b97ef0 RSI: 00000000000000a6 RDI: ffffffffb357b540
Aug 31 19:42:44 prod01 kernel: [1222017.858209] RBP: ffffae8d00cf7c58 R08: 0000000000000000 R09: ffffffffb357b548
Aug 31 19:42:44 prod01 kernel: [1222017.858211] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b42c4095300
Aug 31 19:42:44 prod01 kernel: [1222017.858213] R13: ffff8b42c4095000 R14: 0000000000000000 R15: 00000000ffffffff
Aug 31 19:42:44 prod01 kernel: [1222017.858215] FS: 0000000000000000(0000) GS:ffff8b61bf3c0000(0000) knlGS:0000000000000000
Aug 31 19:42:44 prod01 kernel: [1222017.858217] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 31 19:42:44 prod01 kernel: [1222017.858219] CR2: 00007efd59269000 CR3: 0000001c1b610004 CR4: 0000000000770ee0
Aug 31 19:42:44 prod01 kernel: [1222017.858222] PKRU: 55555554
Aug 31 19:42:44 prod01 kernel: [1222017.858223] Call Trace:
Aug 31 19:42:44 prod01 kernel: [1222017.858226] <TASK>
Aug 31 19:42:44 prod01 kernel: [1222017.858228] pci_disable_msix+0x105/0x130
Aug 31 19:42:44 prod01 kernel: [1222017.858234] igc_reset_interrupt_capability+0x2d/0x120 [igc]
Aug 31 19:42:44 prod01 kernel: [1222017.858241] __igc_open+0x254/0x5e0 [igc]
Aug 31 19:42:44 prod01 kernel: [1222017.858248] ? igc_tsn_reset+0x76/0x130 [igc]
Aug 31 19:42:44 prod01 kernel: [1222017.858257] igc_io_resume+0x2a/0x70 [igc]
Aug 31 19:42:44 prod01 kernel: [1222017.858262] report_resume+0x80/0x90
Aug 31 19:42:44 prod01 kernel: [1222017.858265] ? resume_iter+0x50/0x50
Aug 31 19:42:44 prod01 kernel: [1222017.858268] pci_walk_bus+0x74/0xa0
Aug 31 19:42:44 prod01 kernel: [1222017.858271] pcie_do_recovery+0xf5/0x320
Aug 31 19:42:44 prod01 kernel: [1222017.858273] ? aer_print_port_info+0xc0/0xc0
Aug 31 19:42:44 prod01 kernel: [1222017.858277] aer_isr+0x4b8/0x5f0
Aug 31 19:42:44 prod01 kernel: [1222017.858280] ? kfree+0x21a/0x250
Aug 31 19:42:44 prod01 kernel: [1222017.858285] ? irq_forced_thread_fn+0x90/0x90
Aug 31 19:42:44 prod01 kernel: [1222017.858289] irq_thread_fn+0x25/0x70
Aug 31 19:42:44 prod01 kernel: [1222017.858293] irq_thread+0xdc/0x1b0
Aug 31 19:42:44 prod01 kernel: [1222017.858296] ? irq_thread_fn+0x70/0x70
Aug 31 19:42:44 prod01 kernel: [1222017.858299] ? irq_thread_check_affinity+0x100/0x100
Aug 31 19:42:44 prod01 kernel: [1222017.858303] kthread+0x127/0x150
Aug 31 19:42:44 prod01 kernel: [1222017.858307] ? set_kthread_struct+0x50/0x50
Aug 31 19:42:44 prod01 kernel: [1222017.858311] ret_from_fork+0x1f/0x30
Aug 31 19:42:44 prod01 kernel: [1222017.858316] </TASK>
Aug 31 19:42:44 prod01 kernel: [1222017.858317] Modules linked in: 8021q garp mrp stp llc ip6t_REJECT nf_reject_ipv6 xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog nft_limit xt_limit xt_addrtype xt_tcpudp nft_counter xt_recent xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables cpuid nfnetlink tls intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp nls_iso8859_1 coretemp kvm_intel kvm input_leds joydev serio_raw eeepc_wmi wmi_bmof acpi_tad acpi_pad mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c raid1 mfd_aaeon asus_wmi sparse_keymap platform_profile crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd intel_lpss_pci i2c_i801 nvme ahci
Aug 31 19:42:44 prod01 kernel: [1222017.858374] xhci_pci intel_lpss igc i2c_smbus libahci xhci_pci_renesas nvme_core idma64 wmi video pinctrl_alderlake
Aug 31 19:42:44 prod01 kernel: [1222017.858390] ---[ end trace 3e395bfa91f4ed5b ]---
Aug 31 19:42:44 prod01 kernel: [1222018.016432] RIP: 0010:free_msi_irqs+0x110/0x140
Aug 31 19:42:44 prod01 kernel: [1222018.016435] Code: 85 c0 0f 84 45 ff ff ff 45 31 f6 eb 11 41 83 c6 01 44 39 73 14 0f 86 32 ff ff ff 8b 7b 10 44 01 f7 e8 a4 44 a6 ff 84 c0 74 e3 <0f> 0b 49 8b 7d 60 e8 b5 93 9b ff 49 8b 45 00 eb 8d 49 8d b5 d0 00
Aug 31 19:42:44 prod01 kernel: [1222018.016436] RSP: 0000:ffffae8d00cf7c30 EFLAGS: 00010202
Aug 31 19:42:44 prod01 kernel: [1222018.016438] RAX: 0000000000000001 RBX: ffff8b42c34e5100 RCX: 0000000000000000
Aug 31 19:42:44 prod01 kernel: [1222018.016439] RDX: ffff8b42c4b97ef0 RSI: 00000000000000a6 RDI: ffffffffb357b540
Aug 31 19:42:44 prod01 kernel: [1222018.016440] RBP: ffffae8d00cf7c58 R08: 0000000000000000 R09: ffffffffb357b548
Aug 31 19:42:44 prod01 kernel: [1222018.016441] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b42c4095300
Aug 31 19:42:44 prod01 kernel: [1222018.016441] R13: ffff8b42c4095000 R14: 0000000000000000 R15: 00000000ffffffff
Aug 31 19:42:44 prod01 kernel: [1222018.016442] FS: 0000000000000000(0000) GS:ffff8b61bf3c0000(0000) knlGS:0000000000000000
Aug 31 19:42:44 prod01 kernel: [1222018.016443] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 31 19:42:44 prod01 kernel: [1222018.016444] CR2: 00007efd59269000 CR3: 0000000195ce0003 CR4: 0000000000770ee0
Aug 31 19:42:44 prod01 kernel: [1222018.016445] PKRU: 55555554
Aug 31 19:42:44 prod01 kernel: [1222018.016452] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Aug 31 19:42:44 prod01 kernel: [1222018.016453] BUG: unable to handle page fault for address: ffffae8d00cf7ef8
Aug 31 19:42:44 prod01 kernel: [1222018.016454] #PF: supervisor instruction fetch in kernel mod
Aug 31 19:42:44 prod01 kernel: [1222018.016454] #PF: supervisor instruction fetch in kernel mode
Aug 31 19:42:44 prod01 kernel: [1222018.016455] #PF: error_code(0x0011) - permissions violation
Aug 31 19:42:44 prod01 kernel: [1222018.016455] PGD 100000067 P4D 100000067 PUD 1001de067 PMD 106240067 PTE 80000001043e5163
Aug 31 19:42:44 prod01 kernel: [1222018.016457] Oops: 0011 [#2] SMP NOPTI
Aug 31 19:42:44 prod01 kernel: [1222018.016458] CPU: 15 PID: 286 Comm: irq/127-aerdrv Tainted: G D 5.15.0-73-generic #80-Ubuntu
Aug 31 19:42:44 prod01 kernel: [1222018.016460] Hardware name: ASUSTeK COMPUTER INC. System Product Name/W680/MB DC, BIOS 2004 12/16/2022
Aug 31 19:42:44 prod01 kernel: [1222018.016461] RIP: 0010:0xffffae8d00cf7ef8
Aug 31 19:42:44 prod01 kernel: [1222018.016462] Code: 00 00 e8 7e cf 00 8d ae ff ff 00 00 00 00 00 00 00 00 08 7f cf 00 8d ae ff ff aa 68 6e b1 ff ff ff ff c0 cb cb c3 42 8b ff ff <0b> 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 38 7f cf 00 8d ae
Aug 31 19:42:44 prod01 kernel: [1222018.016464] RSP: 0000:ffffae8d00cf7ee8 EFLAGS: 00010246
Aug 31 19:42:44 prod01 kernel: [1222018.016464] RAX: ffffae8d00cf7ef8 RBX: ffff8b42c3cbcbc0 RCX: 0000000000000170
Aug 31 19:42:44 prod01 kernel: [1222018.016465] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffae8d00cf7ec8
Aug 31 19:42:44 prod01 kernel: [1222018.016466] RBP: ffffae8d00cf7f08 R08: ffffffffb348a7c0 R09: 0000000000000019
Aug 31 19:42:44 prod01 kernel: [1222018.016467] R10: 000000000000000a R11: 000000000000000f R12: ffff8b42c3cbcbc0
Aug 31 19:42:44 prod01 kernel: [1222018.016468] R13: ffff8b42c3cbd7fc R14: 0000000000000001 R15: 0000000000000000
Aug 31 19:42:44 prod01 kernel: [1222018.016469] FS: 0000000000000000(0000) GS:ffff8b61bf3c0000(0000) knlGS:0000000000000000
Aug 31 19:42:44 prod01 kernel: [1222018.016469] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 31 19:42:44 prod01 kernel: [1222018.016470] CR2: ffffae8d00cf7ef8 CR3: 0000000195ce0003 CR4: 0000000000770ee0
Aug 31 19:42:44 prod01 kernel: [1222018.016471] PKRU: 55555554
Aug 31 19:42:44 prod01 kernel: [1222018.016474] Call Trace:
Aug 31 19:42:44 prod01 kernel: [1222018.016474] <TASK>
Aug 31 19:42:44 prod01 kernel: [1222018.016475] ? task_work_run+0x6a/0xb0
Aug 31 19:42:44 prod01 kernel: [1222018.016477] do_exit+0x217/0x3c0
Aug 31 19:42:44 prod01 kernel: [1222018.016479] make_task_dead+0x32/0x40
Aug 31 19:42:44 prod01 kernel: [1222018.016480] rewind_stack_and_make_dead+0x17/0x20
Aug 31 19:42:44 prod01 kernel: [1222018.016482] </TASK>
Aug 31 19:42:44 prod01 kernel: [1222018.016482] Modules linked in: 8021q garp mrp stp llc ip6t_REJECT nf_reject_ipv6 xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog nft_limit xt_limit xt_addrtype xt_tcpudp nft_counter xt_recent xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables cpuid nfnetlink tls intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp nls_iso8859_1 coretemp kvm_intel kvm input_leds joydev serio_raw eeepc_wmi wmi_bmof acpi_tad acpi_pad mac_hid sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr drm ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c raid1 mfd_aaeon asus_wmi sparse_keymap platform_profile crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd intel_lpss_pci i2c_i801 nvme ahci
Aug 31 19:42:44 prod01 kernel: [1222018.016500] xhci_pci intel_lpss igc i2c_smbus libahci xhci_pci_renesas nvme_core idma64 wmi video pinctrl_alderlake
Aug 31 19:42:44 prod01 kernel: [1222018.016506] CR2: ffffae8d00cf7ef8
Aug 31 19:42:44 prod01 kernel: [1222018.016507] ---[ end trace 3e395bfa91f4ed5c ]---
Aug 31 19:42:44 prod01 kernel: [1222018.159906] RIP: 0010:free_msi_irqs+0x110/0x140
Aug 31 19:42:44 prod01 kernel: [1222018.159908] Code: 85 c0 0f 84 45 ff ff ff 45 31 f6 eb 11 41 83 c6 01 44 39 73 14 0f 86 32 ff ff ff 8b 7b 10 44 01 f7 e8 a4 44 a6 ff 84 c0 74 e3 <0f> 0b 49 8b 7d 60 e8 b5 93 9b ff 49 8b 45 00 eb 8d 49 8d b5 d0 00
Aug 31 19:42:44 prod01 kernel: [1222018.159910] RSP: 0000:ffffae8d00cf7c30 EFLAGS: 00010202
Aug 31 19:42:44 prod01 kernel: [1222018.159912] RAX: 0000000000000001 RBX: ffff8b42c34e5100 RCX: 0000000000000000
Aug 31 19:42:44 prod01 kernel: [1222018.159913] RDX: ffff8b42c4b97ef0 RSI: 00000000000000a6 RDI: ffffffffb357b540
Aug 31 19:42:44 prod01 kernel: [1222018.159914] RBP: ffffae8d00cf7c58 R08: 0000000000000000 R09: ffffffffb357b548
Aug 31 19:42:44 prod01 kernel: [1222018.159915] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b42c4095300
Aug 31 19:42:44 prod01 kernel: [1222018.159916] R13: ffff8b42c4095000 R14: 0000000000000000 R15: 00000000ffffffff
Aug 31 19:42:44 prod01 kernel: [1222018.159917] FS: 0000000000000000(0000) GS:ffff8b61bf3c0000(0000) knlGS:0000000000000000
Aug 31 19:42:44 prod01 kernel: [1222018.159918] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 31 19:42:44 prod01 kernel: [1222018.159919] CR2: ffffae8d00cf7ef8 CR3: 0000000195ce0003 CR4: 0000000000770ee0
Aug 31 19:42:44 prod01 kernel: [1222018.159920] PKRU: 55555554
Aug 31 19:42:44 prod01 kernel: [1222018.159921] Fixing recursive fault but reboot is needed!
Aug 31 20:25:51 prod01 kernel: [ 0.000000] microcode: microcode updated early to revision 0x119, date = 2023-06-06
A
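The `error status/mask` pairs in the log explain which bits get printed. The AER driver lists only the status bits that are not masked, so `status & ~mask` gives the printed set. For the Corrected error, 0x00003140 & ~0x00002000 = 0x00001140, which is bits 6, 8 and 12: BadTLP, Rollover and Timeout. Bit 13 is set but masked, so it does not appear. For the Fatal error, 0x00000010 is bit 4 (DLP). The sketch below reproduces that decoding. Two things are assumptions: the hypothetical `decode_aer` helper, and the name tables. The tables are a partial list that should match the strings in the Linux AER driver (`drivers/pci/pcie/aer.c`); check them against your kernel version.

```python
from typing import Dict, List, Tuple

# Partial bit-name tables, assumed to match the AER driver's printed strings.
CORRECTABLE: Dict[int, str] = {
    0: "RxErr", 6: "BadTLP", 7: "BadDLLP", 8: "Rollover",
    12: "Timeout", 13: "NonFatalErr", 14: "CorrIntErr", 15: "HeaderOF",
}
UNCORRECTABLE: Dict[int, str] = {
    4: "DLP", 5: "SDES", 12: "TLP", 13: "FCP", 14: "CmpltTO",
    15: "CmpltAbrt", 16: "UnxCmplt", 17: "RxOF", 18: "MalfTLP",
    19: "ECRC", 20: "UnsupReq", 21: "ACSViol",
}


def decode_aer(status: int, mask: int, names: Dict[int, str]) -> List[Tuple[int, str]]:
    """Return (bit, name) for each status bit that is set and not masked."""
    unmasked = status & ~mask  # masked bits are suppressed in the log output
    return [
        (bit, names.get(bit, "Unknown Error Bit"))
        for bit in range(32)
        if (unmasked >> bit) & 1
    ]
```

For example, `decode_aer(0x00003140, 0x00002000, CORRECTABLE)` yields `[(6, "BadTLP"), (8, "Rollover"), (12, "Timeout")]`, which matches the three `[ n]` lines logged for 0000:00:1c.1.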

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-5.15.0-82-generic 5.15.0-82.91
ProcVersionSignature: Ubuntu 5.15.0-82.91-generic 5.15.111
Uname: Linux 5.15.0-82-generic x86_64
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Aug 31 20:25 seq
crw-rw---- 1 root audio 116, 33 Aug 31 20:25 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: unknown
Date: Thu Aug 31 23:44:27 2023
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 0e8f:0020 GreenAsia Inc. USB to PS/2 Adapter
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Lsusb-t:
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/9p, 20000M/x2
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
|__ Port 6: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
|__ Port 6: Dev 2, If 1, Class=Human Interface Device, Driver=usbhid, 1.5M
MachineType: ASUSTeK COMPUTER INC. System Product Name
PciMultimedia:

ProcEnviron:
LC_CTYPE=C.UTF-8
TERM=xterm-256color
PATH=(custom, no user)
LANG=C.UTF-8
SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.15.0-82-generic root=UUID=e9d2e278-e29c-4073-8aa4-5f90edf16c3a ro consoleblank=0 systemd.show_status=true consoleblank=0
RelatedPackageVersions:
linux-restricted-modules-5.15.0-82-generic N/A
linux-backports-modules-5.15.0-82-generic N/A
linux-firmware 20220329.git681281e4-0ubuntu3.18
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/16/2022
dmi.bios.release: 20.4
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2004
dmi.board.asset.tag: Default string
dmi.board.name: W680/MB DC
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2004:bd12/16/2022:br20.4:svnASUSTeKCOMPUTERINC.:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnW680/MBDC:rvrRev1.xx:cvnDefaultstring:ct3:cvrDefaultstring:skuSKU:
dmi.product.family: Server
dmi.product.name: System Product Name
dmi.product.sku: SKU
dmi.product.version: System Version
dmi.sys.vendor: ASUSTeK COMPUTER INC.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2033693/+subscriptions
