Sharp little VRAM calculator for open LLMs. ⚡
Live app: https://kvanta.vcerny.cz
Source: github.com/vaclcer/kvanta
kvanta fetches public Hugging Face model configs and safetensors metadata, then calculates KV-cache memory and estimated model-weight footprint from context size, batch count, precision, and quantization.
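As a rough illustration of the KV-cache arithmetic for a standard decoder-only model with MHA or GQA, here is a minimal sketch. It is not kvanta's actual code: the `AttentionConfig` interface and the `kvCacheBytes` function are hypothetical names, the field names follow those commonly seen in Hugging Face `config.json` files, and the example values are illustrative rather than taken from a specific model.

```typescript
// Hypothetical shape: a subset of fields commonly found in a Hugging Face config.json.
interface AttentionConfig {
  num_hidden_layers: number;
  num_attention_heads: number;
  num_key_value_heads?: number; // absent is treated here as plain multi-head attention
  hidden_size: number;
  head_dim?: number; // assumed to default to hidden_size / num_attention_heads
}

// Full-attention KV cache only; sliding-window and hybrid layouts need different formulas.
function kvCacheBytes(
  cfg: AttentionConfig,
  contextTokens: number,
  batch: number,
  bytesPerElement: number,
): number {
  const kvHeads = cfg.num_key_value_heads ?? cfg.num_attention_heads;
  const headDim = cfg.head_dim ?? cfg.hidden_size / cfg.num_attention_heads;
  // The factor 2 covers one key tensor plus one value tensor per layer.
  return 2 * cfg.num_hidden_layers * kvHeads * headDim * contextTokens * batch * bytesPerElement;
}

// Illustrative GQA config: 32 layers, 32 query heads, 8 KV heads, hidden size 4096.
const example: AttentionConfig = {
  num_hidden_layers: 32,
  num_attention_heads: 32,
  num_key_value_heads: 8,
  hidden_size: 4096,
};

// 8192 tokens, batch 1, 2 bytes per element (fp16/bf16).
const bytes = kvCacheBytes(example, 8192, 1, 2);
console.log((bytes / 2 ** 30).toFixed(2), "GiB"); // prints "1.00 GiB"
```

Dropping `num_key_value_heads` from the example makes the sketch fall back to 32 KV heads, which quadruples the result and shows why GQA matters for long contexts.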
- Exact adapters for standard decoder-only, GQA/MQA, sliding-window, GLM MoE DSA, DeepSeek V4, and Qwen3.5 hybrid cache layouts.
- Model weights are estimated from Hugging Face `safetensors.total` metadata.
- Runtime VRAM can still drift by inference engine because allocators, kernels, paged attention, and activation buffers all have opinions.
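The weight estimate can be sketched as a parameter count multiplied by bits per weight. This is a simplified assumption, not kvanta's implementation: `weightBytes` is a hypothetical helper, `totalParams` stands in for the total parameter count reported in the safetensors metadata, and real quantization formats store extra scales and zero points, so their effective bits per weight is usually somewhat higher than the nominal value.

```typescript
// Hypothetical helper: estimated weight footprint in bytes.
// totalParams: total parameter count (assumed to come from safetensors metadata).
// bitsPerWeight: nominal storage width, e.g. 16 for fp16/bf16, 8 or 4 for quantized weights.
function weightBytes(totalParams: number, bitsPerWeight: number): number {
  if (totalParams < 0 || bitsPerWeight <= 0) {
    throw new RangeError("totalParams must be >= 0 and bitsPerWeight must be > 0");
  }
  // Round up so a partial trailing byte still counts.
  return Math.ceil((totalParams * bitsPerWeight) / 8);
}

// Illustrative 8-billion-parameter model at three nominal precisions.
const params = 8_000_000_000;
for (const bits of [16, 8, 4]) {
  const gb = weightBytes(params, bits) / 1e9;
  console.log(`${bits}-bit: ${gb.toFixed(1)} GB`);
}
```

Adding this figure to a KV-cache estimate gives a lower bound; the engine-specific overheads mentioned above sit on top of it.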
npm install
npm run dev

Useful checks:
npm run lint
npm run typecheck
npm test
npm run build