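Name and Version
Revision 9b169a4
Operating systems
Linux
GGML backends
CPU
Hardware
ARM Ampere
Models
Qwen2.5-14B-Instruct-1M-Q5_K_M
Problem description & steps to reproduce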
Setting pooling_type = LLAMA_POOLING_TYPE_MEAN and calling llama_init_from_model() causes this crash:
/build/source/ggml/src/ggml.c:2738: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed
Setting to LLAMA_POOLING_TYPE_LAST and changing nothing else works correctly.
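For readers unfamiliar with the assertion, here is a minimal sketch of the kind of shape precondition a matrix multiply asserts. It is a simplified stand-in based on my understanding of the check, not code taken from revision 9b169a4, and the names `shape4`, `can_mul_mat`, and `mul_mat_result_shape` are hypothetical. The assumption is that both operands must agree on their innermost dimension, and that the second operand's outer dimensions must be multiples of the first's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a tensor: only the four dimension sizes.
 * ne[0] is the innermost dimension (the length of one row). */
typedef struct {
    int64_t ne[4];
} shape4;

/* Assumed shape rule: a and b must share ne[0], and b's two outer
 * dimensions must be multiples of a's so that a can be broadcast. */
static bool can_mul_mat(const shape4 *a, const shape4 *b) {
    return a->ne[0] == b->ne[0] &&
           b->ne[2] % a->ne[2] == 0 &&
           b->ne[3] % a->ne[3] == 0;
}

/* Aborts on incompatible shapes, in the same spirit as the failed
 * assertion in the report; otherwise returns the result shape. */
static shape4 mul_mat_result_shape(const shape4 *a, const shape4 *b) {
    assert(can_mul_mat(a, b));
    shape4 r = {{ a->ne[1], b->ne[1], b->ne[2], b->ne[3] }};
    return r;
}
```

Under this rule, operands of shape {8, 5120, 1, 1} and {8, 8, 1, 1} are compatible, while {8, 5120, 1, 1} and {7, 8, 1, 1} are not, because the innermost sizes differ. The crash therefore suggests that the graph built for mean pooling multiplies two tensors whose shapes disagree in this way, although which tensors are involved is not stated in the report.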
First Bad Commit
No response
Possibly a duplicate of #12517