Thursday

[Bug 2131160] Re: Illegal memory access with Strix Halo

Yes, the internal team validated that the updated binaries from the PPA prevent
the crash from ROCm use on Strix Halo. Tags updated.

** Tags removed: verification-needed-noble-linux-oem-6.14 verification-needed-noble-linux-oem-6.17
** Tags added: verification-done-noble-linux-oem-6.14 verification-done-noble-linux-oem-6.17

--
You received this bug notification because you are subscribed to linux
in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2131160

Title:
Illegal memory access with Strix Halo

Status in linux package in Ubuntu:
New
Status in linux-oem-6.14 package in Ubuntu:
Invalid
Status in linux-oem-6.17 package in Ubuntu:
Invalid
Status in linux source package in Noble:
Won't Fix
Status in linux-oem-6.14 source package in Noble:
Fix Committed
Status in linux-oem-6.17 source package in Noble:
Fix Committed
Status in linux source package in Questing:
New
Status in linux-oem-6.14 source package in Questing:
Invalid
Status in linux-oem-6.17 source package in Questing:
Invalid
Status in linux source package in Resolute:
New
Status in linux-oem-6.14 source package in Resolute:
Invalid
Status in linux-oem-6.17 source package in Resolute:
Invalid

Bug description:
[Impact]
When running Comfy-UI on Strix Halo, an illegal memory access occurs intermittently, as shown below:

  0%| | 0/20 [00:00<?, ?it/s]
  5%|▌ | 1/20 [00:01<00:23, 1.21s/it]
 10%|█ | 2/20 [00:06<01:01, 3.43s/it]
 15%|█▌ | 3/20 [00:11<01:10, 4.14s/it]
 20%|██ | 4/20 [00:16<01:11, 4.49s/it]
 25%|██▌ | 5/20 [00:21<01:13, 4.90s/it]
 30%|███ | 6/20 [00:26<01:09, 4.93s/it]
 35%|███▌ | 7/20 [00:31<01:04, 4.96s/it]
 40%|████ | 8/20 [00:36<00:59, 4.98s/it]
 45%|████▌ | 9/20 [00:41<00:54, 4.98s/it]
 50%|█████ | 10/20 [00:46<00:49, 4.99s/it]
 55%|█████▌ | 11/20 [00:52<00:46, 5.20s/it]
 60%|██████ | 12/20 [00:57<00:41, 5.14s/it]
 65%|██████▌ | 13/20 [01:02<00:35, 5.09s/it]
 70%|███████ | 14/20 [01:07<00:30, 5.07s/it]
 75%|███████▌ | 15/20 [01:12<00:25, 5.06s/it]
 80%|████████ | 16/20 [01:18<00:21, 5.26s/it]
 85%|████████▌ | 17/20 [01:23<00:15, 5.24s/it]
 90%|█████████ | 18/20 [01:28<00:10, 5.24s/it]
 95%|█████████▌| 19/20 [01:33<00:05, 5.24s/it]
100%|██████████| 20/20 [01:39<00:00, 5.24s/it]
100%|██████████| 20/20 [01:39<00:00, 4.96s/it]
/home/taccuser/workspace/7.1-18/dart/scripts/rocm-on-radeon/semi-automated-scripts/pyt_rocm_flux_inf_comfy-ui/ComfyUI/comfy/sd.py:694: UserWarning: HIP warning: an illegal memory access was encountered (Triggered internally at /pytorch/aten/src/ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h:83.)
  out = self.process_output(self.first_stage_model.decode(samples, **vae_options).to(self.output_device).float())
!!! Exception during processing !!! HIP error: an illegal memory access was encountered

[Test Plan]
Run Comfy-UI using the ROCm stack with the matching ROCr change on Strix Halo and ensure it is stable.
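
One possible way to make the stability check repeatable is to capture the Comfy-UI console output across several runs and scan it for the two failure signatures quoted in [Impact]. The sketch below is only an illustration and not part of the validated test procedure; the function name find_illegal_access is hypothetical, and the pattern assumes the message text stays exactly as it appears in the log above.

```python
import re

# Signatures copied from the log in the bug description ("HIP warning: ..." and
# "HIP error: ..."). Assumption: the wording does not change between ROCm releases.
PATTERN = re.compile(
    r"HIP (?:warning|error): an illegal memory access was encountered"
)


def find_illegal_access(lines):
    """Return (line_number, text) for every line containing a failure signature.

    `lines` is any iterable of strings, for example an open log file.
    Line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

An empty result for a captured log means neither signature was seen in that run. Because the failure is intermittent, a single clean run is weak evidence, so scanning the output of several runs is probably more useful.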

[ Where problems could occur ]
ROCm applications (which use the KFD interface)

[Other Info]
This issue is caused by a mismatch of expectations between the KFD and thunk CWSR save area size. It is fixed by this change from 6.18:

https://git.kernel.org/torvalds/c/d15deafab5d72

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2131160/+subscriptions
