KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
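KVQuant compresses the key/value cache that grows linearly with context length during LLM inference. As a rough illustration of the general idea only, here is a minimal round-to-nearest sketch of low-bit KV quantization with per-channel scales; it is not KVQuant's actual algorithm (which, per the paper, uses techniques such as per-channel key quantization and non-uniform datatypes), and the function names are hypothetical.

```python
# Generic sketch of low-bit KV cache quantization (round-to-nearest,
# per-channel scales). Illustrative only -- NOT KVQuant's method.
import torch

def quantize_kv(x: torch.Tensor, bits: int = 4):
    """Simulate uniform quantization of a KV tensor, one scale per channel."""
    qmax = 2 ** (bits - 1) - 1
    # Per-channel scale over the sequence dimension (dim=-2).
    scale = x.abs().amax(dim=-2, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# Example: quantize a fake key cache of shape (seq_len, head_dim).
keys = torch.randn(128, 64)
q, s = quantize_kv(keys, bits=4)
print("max abs error:", (dequantize_kv(q, s) - keys).abs().max().item())
```

Storing 4-bit integers plus a small per-channel scale tensor in place of fp16 activations is what makes multi-million-token contexts fit in memory.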
Use your open-source local model from the terminal.
A local AI search assistant (web or CLI) for ollama and llama.cpp. Lightweight and easy to run, providing a Perplexity-like experience.
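Tools like the two entries above typically talk to a locally running ollama server over its HTTP API. As a minimal sketch, the snippet below calls ollama's documented /api/generate endpoint on the default port 11434; the model name "llama3" is an assumption, so substitute whatever model you have pulled.

```python
# Minimal sketch: query a local ollama server via its HTTP API.
# Assumes ollama is running on the default port and that the model
# named below has already been pulled ("ollama pull llama3").
import json
import urllib.request

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Non-streaming responses return the full text in "response".
        return json.loads(resp.read())["response"]

print(ask_local_model("Summarize what a KV cache is in one sentence."))
```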
A Thunderbird mail client extension that summarizes received emails via a locally run LLM. In early development.