[Bug Report] Result of a high dimension matmul corrupts when program cache is enabled and assigning to an already allocated tensor and the 2nd matrix is transposed. At index 10240 of the result tensor #9849
Comments
Hey Marty, I will take a look at this. We haven't done much testing with our matmul APIs from the C++ side, so thanks for pointing out this issue!
Can you provide the full test file with the includes, and show how you built and ran the test?
@TT-BrianLiu No problem. I have uploaded the example code into a self-contained repository on my GitHub. Please let me know if the instructions in the README are not clear. I just tried again on c52e153 and I am experiencing the same issue. LMK if you cannot replicate it on your machine.
Thank you! I will try running the test.
I was able to repro it and I pushed the test here: |
I figured out your issue. Our matmuls either support So, the fix is simple: I will add the missing asserts for the matmul variants that are missing them, but let me explain what you're seeing in your tests. I will leave your test below for future reference, since I will remove it when I merge the fix. I also removed everything that is not relevant (e.g. the transpose, the extra allocation of
Reference code:
- This adds these checks to matmul_multicore and matmul_multicore_reuse as an intended side effect
Added the appropriate checks here: #10013
Describe the bug
The result of a `1x10x16x256` by `1x20x256x16` matmul is corrupted when program cache is enabled and a few other very specific conditions are met. The situation is oddly specific, but it happens to be exactly what I am doing when running GGML. I was able to create a minimal example:
To Reproduce
Steps to reproduce the behavior:
NaN or corrupted value detected at index 10240
Expected behavior
There should not be a NaN in the result whatsoever.
Screenshots
If applicable, add screenshots to help explain your problem.
Please complete the following environment information:
Additional context
Add any other context about the problem here.