Popular repositories
llama.cpp_turboquant (Public)
LLM inference with 7x KV cache compression. Combines llama.cpp (production inference engine) with TurboQuant (KV quantization). Run 131K token context on 16GB VRAM. OpenAI-compatible API server. Supports 100+ model architectures.
Shell
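Since the repository advertises an OpenAI-compatible API server, a stock OpenAI-style HTTP client should be able to talk to it. A minimal sketch in Python, assuming the fork keeps upstream llama.cpp's llama-server defaults (port 8080 and the /v1/chat/completions route); those defaults are assumptions about this fork, not confirmed by the description:

    import requests

    # Assumes a locally running server; host, port, and route follow upstream
    # llama.cpp's llama-server defaults, which this fork may or may not keep.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "local",  # llama-server generally accepts any model name
            "messages": [
                {"role": "user", "content": "Explain KV cache quantization briefly."}
            ],
            "max_tokens": 128,
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])

Because the surface is OpenAI-compatible, existing OpenAI SDKs pointed at the local base URL should work the same way.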
quant.cpp (Public, forked from quantumaikr/quant.cpp)
LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.
C
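The "7x longer context" claim follows from KV cache memory growing linearly with sequence length: compress each token's K/V entries by 7x and a fixed memory budget stretches by roughly the same factor. A back-of-envelope check in Python, using assumed Llama-8B-class dimensions and a flat 7x ratio (illustrative assumptions, not numbers taken from this repository):

    # Assumed dims: 32 layers, 8 KV heads (GQA), head_dim 128, fp16 cache.
    n_layers, n_kv_heads, head_dim, bytes_fp16 = 32, 8, 128, 2
    compression = 7.0

    kv_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_fp16  # K and V
    for n_ctx in (8_192, 131_072):
        full = n_ctx * kv_per_token
        print(f"{n_ctx:>7} tokens: fp16 KV {full / 2**30:6.2f} GiB -> "
              f"compressed {full / compression / 2**30:5.2f} GiB")

Under these assumed dimensions an uncompressed fp16 cache alone would already consume 16 GiB at 131K tokens, while the compressed cache fits in about 2.3 GiB, which is consistent with the sibling repository's "131K token context on 16GB VRAM" claim.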
vllm (Public, forked from vllm-project/vllm)
A high-throughput and memory-efficient inference and serving engine for LLMs.
Python
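For orientation, upstream vLLM's documented offline-inference entry point is sketched below; the description does not say whether this fork changes that API, and the model name is simply the small example model from vLLM's own quickstart:

    from vllm import LLM, SamplingParams

    # Load a small model and generate with simple sampling settings.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, max_tokens=64)
    for out in llm.generate(["The key idea behind paged attention is"], params):
        print(out.outputs[0].text)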
People
This organization has no public members.