ggml : riscv: add 128-bit RVV support #12530
Open
+789
−405
This pull request adds `vec_dot` compatibility for RISC-V V extension (RVV) architectures using 128-bit VLEN configurations. While prior implementations in pull requests #2929 and #3453 established RVV support for systems with 256-bit VLEN and higher, those kernels proved incompatible with 128-bit VLEN architectures. To address this, the update implements dynamic kernel selection through runtime checks, adapts legacy kernels for 128-bit compatibility, and improves the performance of k-quant kernels for 128-bit VLEN. Additionally, the PR adds support for the RISC-V Zfhmin extension to accelerate float16 data type conversions.

Some k-quant kernels now use RVV 128b-optimized inline assembly to work around compiler limitations (riscv64-linux-gnu-gcc v14.2.0), avoiding the excessive register group spills the compiler generates when intrinsics are used. Manual assembly ensures efficient register allocation.
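The runtime kernel selection mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the kernel names (`vec_dot_scalar`, `vec_dot_rvv128`, `vec_dot_rvv256`), the `vec_dot_fn` type, and the helpers `detect_vlen_bits` and `select_vec_dot` are all hypothetical, and the "RVV" kernels are scalar stand-ins so the sketch builds on any host. It assumes the toolchain provides the `__riscv_vlenb()` intrinsic from `<riscv_vector.h>` when targeting RVV.

```c
#include <stddef.h>

#if defined(__riscv_v)
#include <riscv_vector.h>
#endif

/* Hypothetical kernel signature; the real vec_dot kernels work on quantized blocks. */
typedef float (*vec_dot_fn)(const float *x, const float *y, size_t n);

/* Plain scalar dot product, used as the fallback. */
float vec_dot_scalar(const float *x, const float *y, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Stand-in for a kernel tuned for VLEN = 128 (hypothetical name). */
float vec_dot_rvv128(const float *x, const float *y, size_t n) {
    return vec_dot_scalar(x, y, n);
}

/* Stand-in for a kernel that requires VLEN >= 256 (hypothetical name). */
float vec_dot_rvv256(const float *x, const float *y, size_t n) {
    return vec_dot_scalar(x, y, n);
}

/* Returns the vector register width in bits, or 0 when RVV is not compiled in. */
int detect_vlen_bits(void) {
#if defined(__riscv_v)
    return (int)(__riscv_vlenb() * 8); /* vlenb is the register width in bytes */
#else
    return 0;
#endif
}

/* Picks a kernel for the given VLEN; anything below 128 bits falls back to scalar. */
vec_dot_fn select_vec_dot(int vlen_bits) {
    if (vlen_bits >= 256) {
        return vec_dot_rvv256;
    }
    if (vlen_bits >= 128) {
        return vec_dot_rvv128;
    }
    return vec_dot_scalar;
}
```

A caller would typically resolve the kernel once, for example `vec_dot_fn dot = select_vec_dot(detect_vlen_bits());`, and reuse the pointer, so the VLEN check stays out of the hot loop.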
Verification
By running the Q2_K_L quantized model of DeepSeek-R1-Distill-Llama-8B, I've confirmed that the RVV-accelerated kernels do not introduce substantial numeric errors compared to the scalar implementation (RVV support disabled during compilation).
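The check above was done at the model level. A complementary kernel-level comparison could look like the sketch below, which is an assumption about how one might test this and not the procedure used here. All names (`dot_reference`, `dot_lanes4`, `results_match`) are hypothetical; `dot_lanes4` only mimics a vectorized accumulation order with four partial sums and contains no RVV code.

```c
#include <stddef.h>

/* Reference: sequential accumulation, as a scalar build would do. */
float dot_reference(const float *x, const float *y, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Candidate: four partial sums reduced at the end, imitating the
 * accumulation order of a 4-lane vector kernel. */
float dot_lanes4(const float *x, const float *y, size_t n) {
    float lanes[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < n; i++) {
        lanes[i % 4] += x[i] * y[i];
    }
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}

/* Returns 1 when got is within rel_tol of ref, relative to max(|ref|, 1). */
int results_match(float ref, float got, float rel_tol) {
    float diff = ref - got;
    if (diff < 0.0f) {
        diff = -diff;
    }
    float scale = ref < 0.0f ? -ref : ref;
    if (scale < 1.0f) {
        scale = 1.0f;
    }
    return diff <= rel_tol * scale;
}
```

Because a vector kernel sums in a different order than the scalar loop, results generally differ in the last bits, which is why the comparison uses a relative tolerance instead of exact equality.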
Performance
Performance was measured using the same model as above, on a 64-core RISC-V rv64gcv machine with 128-bit VLEN configuration.