zetta.app (@zetta-app)

Game development studio based in Prague

Popular repositories

  1. llama.cpp

    Forked from ggml-org/llama.cpp

    LLM inference in C/C++

    C++

  2. llama.cpp_turboquant

    LLM inference with 7x KV cache compression. Combines llama.cpp (a production inference engine) with TurboQuant (KV quantization). Runs a 131K-token context on 16 GB of VRAM. OpenAI-compatible API server. Su…

    Shell

  3. quant.cpp

    Forked from quantumaikr/quant.cpp

    LLM inference with 7x longer context. Pure C, zero dependencies. Lossless KV cache compression + single-header library.

    C

  4. vllm

    Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python
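The llama.cpp_turboquant entry above advertises an OpenAI-compatible API server. Assuming it follows the widely used `/v1/chat/completions` request schema, a minimal client sketch might look like the following; the base URL, port, and model name are placeholders chosen for illustration, not values taken from the repository.

```python
import json
import urllib.request


def build_chat_request(prompt, model="default", max_tokens=128):
    """Build a standard OpenAI-style chat-completions payload.

    The model/messages/max_tokens schema is the common
    OpenAI-compatible format; the model name is a placeholder.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def send_chat_request(base_url, payload):
    """POST the payload to an OpenAI-compatible server (hypothetical URL)."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_chat_request("Summarize KV cache quantization in one sentence.")
    print(json.dumps(payload, indent=2))
    # To query a locally running server (address is an assumption):
    # reply = send_chat_request("http://localhost:8080", payload)
```

Because the request format is the de facto standard, the same sketch should work against any of the servers listed here that expose OpenAI-compatible endpoints.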


People

This organization has no public members.
