
ggml : riscv: add 128-bit RVV support #12530

Open · wants to merge 4 commits into master
Conversation

@xctan (Contributor) commented Mar 23, 2025

This pull request adds vec_dot support for the RISC-V Vector extension (RVV) on 128-bit VLEN configurations. Prior implementations in pull requests #2929 and #3453 established RVV support for systems with 256-bit VLEN and higher, but those kernels are incompatible with 128-bit VLEN hardware. To address this, the update implements dynamic kernel selection through runtime checks, adapts the legacy kernels for 128-bit compatibility, and improves the performance of the k-quant kernels at 128-bit VLEN. Additionally, the PR adds support for the RISC-V Zfhmin extension to accelerate float16 conversions.

Some k-quant kernels now use RVV 128-bit-optimized inline assembly to work around compiler limitations (riscv64-linux-gnu-gcc v14.2.0): with intrinsics, poor register allocation caused excessive vector register group spills. Writing the assembly by hand ensures efficient register allocation. The affected kernels are:

ggml_vec_dot_q2_K_q8_K
ggml_vec_dot_q3_K_q8_K
ggml_vec_dot_q4_K_q8_K
ggml_vec_dot_q6_K_q8_K

Verification

By running the Q2_K_L quantized model of DeepSeek-R1-Distill-Llama-8B, I've confirmed that the RVV-accelerated kernels do not introduce substantial numeric error compared to the scalar implementation (built with RVV support disabled):

| scalar | rvv128 (this PR) |
| --- | --- |
| 20.0849 ± 0.17272 | 20.0669 ± 0.17253 |

Performance

Performance was measured using the same model as above, on a 64-core RISC-V rv64gcv machine with 128-bit VLEN configuration.

| model | size | params | backend | threads | test | t/s | note |
| --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | pp512 | 3.18 ± 0.00 | scalar |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | pp512 | 27.19 ± 0.11 | rvv128 |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | tg128 | 2.94 ± 0.00 | scalar |
| llama 8B Q2_K - Medium | 3.07 GiB | 8.03 B | CPU | 64 | tg128 | 11.10 ± 0.03 | rvv128 |
