Friday

[Bug 2147137] Re: rocblas-test crashes with hipErrorIllegalAddress(700)

** Tags added: kernel-daily-bug

--
You received this bug notification because you are subscribed to linux in Ubuntu.
Matching subscriptions: Bgg, Bmail, Nb
https://bugs.launchpad.net/bugs/2147137

Title: rocblas-test crashes with hipErrorIllegalAddress(700)

Status in Linux: Unknown
Status in linux package in Ubuntu: Triaged
Status in rocblas package in Ubuntu: Triaged

Bug description:

The built-in test suite for rocblas crashes reproducibly with memory access errors, usually hipError 700, which translates to hipErrorIllegalAddress.

## Reproducing

Not every test in the suite triggers the problem. The easiest way to make the crash manifest is to run only the batched TRSM tests:

    ./rocblas-test --gtest_filter=*trsm_batched*

## Debug information

When run through gdb, '/usr/libexec/rocm/librocblas5-tests/rocblas-test --gtest_filter=*trsm_batched*' produces the following output and backtrace:

    Query device success: there are 1 devices
    -------------------------------------------------------------------------------
    Device ID 0 : AMD Radeon Pro W7900 gfx1100 with 48.3 GB memory, max. SCLK 1760 MHz, max. MCLK 1124 MHz, memoryBusWidth 48 Bytes, compute capability 11.0
    maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 32
    -------------------------------------------------------------------------------
    info: parsing of test data may take a couple minutes before any test output appears...
    Note: Google Test filter = *trsm_batched*
    [==========] Running 13969 tests from 3 test suites.
    [----------] Global test environment set-up.
    [----------] 11916 tests from _/trsm_batched
    [New Thread 0x7ffe641af6c0 (LWP 1407327)]
    [New Thread 0x7ffcea7ff6c0 (LWP 1407328)]
    [Thread 0x7ffcea7ff6c0 (LWP 1407328) exited]
    Signal 0x7ffce2d05900 time stamps may be invalid.
    clients/common/../include/blas3/testing_trsm_batched.hpp:546: Failure
    Expected equality of these values:
      hXorB_1.transfer_from(dXorB)
        Which is: 700
      hipSuccess
        Which is: 0
    Error: hipMemcpy post-guard copy failure.
    clients/gtest/../include/d_vector.hpp:165: Failure
    Expected equality of these values:
      memcmp(host.data(), m_guard, m_guard_len)
        Which is: -203
      0
    clients/gtest/../include/d_vector.hpp:190: Failure
    Expected equality of these values:
      (hipFree)(d)
        Which is: 700
      hipSuccess
        Which is: 0
    clients/gtest/../include/device_batch_matrix.hpp:391: Failure
    Expected equality of these values:
      (hipFree)(tmp_device_data)
        Which is: 700
      hipSuccess
        Which is: 0
    Error: hipMemcpy post-guard copy failure.
    clients/gtest/../include/d_vector.hpp:165: Failure
    Expected equality of these values:
      memcmp(host.data(), m_guard, m_guard_len)
        Which is: -203
      0
    Error: hipMemcpy pre-guard copy failure.
    clients/gtest/../include/d_vector.hpp:175: Failure
    Expected equality of these values:
      memcmp(host.data(), m_guard, m_guard_len)
        Which is: -203
      0
    clients/gtest/../include/d_vector.hpp:190: Failure
    Expected equality of these values:
      (hipFree)(d)
        Which is: 700
      hipSuccess
        Which is: 0
    clients/gtest/../include/device_batch_matrix.hpp:391: Failure
    Expected equality of these values:
      (hipFree)(tmp_device_data)
        Which is: 700
      hipSuccess
        Which is: 0
    rocBLAS error retreiving the device (deviceID: 32767)

    Thread 1 "rocblas-test" received signal SIGABRT, Aborted.
    __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=0) at ./nptl/pthread_kill.c:44
    warning: 44 ./nptl/pthread_kill.c: No such file or directory
    (gdb) bt
    #0 __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=0) at ./nptl/pthread_kill.c:44
    #1 __pthread_kill_internal (threadid=<optimized out>, signo=6) at ./nptl/pthread_kill.c:89
    #2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:100
    #3 0x00007fffebc2fb7e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
    #4 0x00007fffebc128ec in __GI_abort () at ./stdlib/abort.c:77
    #5 0x00007fffeead5477 in rocblas_abort () at /usr/src/rocblas-7.1.0-1ubuntu4/library/src/rocblas_ostream.cpp:81
    #6 0x00007fffeeace40e in _rocblas_handle::~_rocblas_handle (this=<optimized out>) at /usr/src/rocblas-7.1.0-1ubuntu4/library/src/include/rocblas_ostream.hpp:537
    #7 0x00007fffeead2cd3 in rocblas_destroy_handle (handle=0x5555d5bbcfd0) at /usr/src/rocblas-7.1.0-1ubuntu4/library/src/rocblas_auxiliary.cpp:230
    #8 0x000055555818e5ad in rocblas_local_handle::~rocblas_local_handle (this=0x7fffffffdad8) at /usr/src/rocblas-7.1.0-1ubuntu4/clients/common/client_utility.cpp:519
    #9 0x0000555557a2ec84 in testing_trsm_batched<double> (arg=...) at /usr/src/rocblas-7.1.0-1ubuntu4/clients/common/../include/blas3/testing_trsm_batched.hpp:775
    #10 0x000055555819e7b5 in std::function<void()>::operator() (this=0x7fffffffdc90) at /usr/lib/gcc/x86_64-linux-gnu/16/../../../../include/c++/16/bits/std_function.h:581
    #11 catch_signals_and_exceptions_as_failures (test=..., set_alarm=true) at /usr/src/rocblas-7.1.0-1ubuntu4/clients/common/gtest_helpers.cpp:199
    #12 0x0000555555c374c5 in (anonymous namespace)::trsm_batched_blas3_tensile_Test::TestBody (this=<optimized out>) at /usr/src/rocblas-7.1.0-1ubuntu4/clients/gtest/blas3/trsm_gtest.cpp:201
    #13 0x00005555581fb6c7 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
    #14 0x00005555581e0dae in testing::Test::Run() ()
    #15 0x00005555581e0f35 in testing::TestInfo::Run() ()
    #16 0x00005555581ebd47 in testing::TestSuite::Run() ()
    #17 0x00005555581f0ebc in testing::internal::UnitTestImpl::RunAllTests() ()
    #18 0x00005555581fbd27 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
    #19 0x00005555581e0fca in testing::UnitTest::Run() ()
    #20 0x000055555587efc9 in RUN_ALL_TESTS () at /usr/include/gtest/gtest.h:2334
    #21 main (argc=1, argv=0x7fffffffe458) at /usr/src/rocblas-7.1.0-1ubuntu4/clients/gtest/rocblas_gtest_main.cpp:344

## Versions and affected hardware

Arch: amd64
Tested version: rocblas 7.1.0-1ubuntu4
AMD GPU ISAs tested: gfx1100, gfx1101, gfx1201

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/2147137/+subscriptions
