
ggml : riscv: add 128-bit RVV support #12530

Open · wants to merge 4 commits into master
Conversation

@xctan (Contributor) commented Mar 23, 2025

This pull request adds vec_dot support for the RISC-V Vector extension (RVV) on 128-bit VLEN configurations. Prior implementations in pull requests #2929 and #3453 established RVV support for systems with 256-bit VLEN and higher, but those kernels are incompatible with 128-bit VLEN hardware. To address this, the update implements dynamic kernel selection through runtime checks, adapts the legacy kernels for 128-bit compatibility, and improves the performance of the k-quant kernels at 128-bit VLEN. Additionally, the PR adds support for the RISC-V Zfhmin extension to accelerate float16 conversions.

Some k-quant kernels now use RVV 128-bit-optimized inline assembly to work around compiler limitations (riscv64-linux-gnu-gcc v14.2.0): with intrinsics, poor register allocation caused excessive vector register group spills. Writing the assembly by hand ensures efficient register allocation. The affected kernels are:

ggml_vec_dot_q2_K_q8_K
ggml_vec_dot_q3_K_q8_K
ggml_vec_dot_q4_K_q8_K
ggml_vec_dot_q6_K_q8_K

Verification

By running the Q2_K_L quantized model of DeepSeek-R1-Distill-Llama-8B, I've confirmed that the RVV-accelerated kernels do not introduce substantial numeric error compared to the scalar implementation (built with RVV support disabled):

| scalar | rvv128 (this PR) |
| --- | --- |
| 20.0849 ± 0.17272 | 20.0669 ± 0.17253 |

Performance

Performance was measured using the same model as above, on a 64-core RISC-V rv64gcv machine with 128-bit VLEN configuration.

| model | size | params | backend | threads | test | t/s | note |
| --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | pp512 | 3.18 ± 0.00 | scalar |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | pp512 | 27.19 ± 0.11 | rvv128 |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | tg128 | 2.94 ± 0.00 | scalar |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | tg128 | 11.10 ± 0.03 | rvv128 |
