Peer-to-peer distributed AI inference using 1-bit quantized models. CPU-only, 70-82% energy savings, 103+ tokens/sec. Validated on Zen 4 & Zen 5 (+35% cross-gen improvement).
Python · Updated Apr 27, 2026
Windows-native BitNet and ternary LLM inference with CPU GGUF, GPU runtime, terminal and browser chat, and release zips.
High-performance hybrid architecture for Agent Zero & BitNet b1.58. Natively optimized for Windows ARM64 (Snapdragon X Elite / Copilot+ PCs) using raw C++ inference and Docker-based agent orchestration.
Desktop chat app for Microsoft's 1-bit BitNet LLMs. Windows-native, CPU-only, zero dependencies.
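The repositories above all build on 1.58-bit (ternary) weight quantization, in which each weight is constrained to {-1, 0, +1}. A minimal sketch of the "absmean" quantization scheme described in the BitNet b1.58 paper, with hypothetical function and variable names:

```python
# Sketch of BitNet b1.58-style ternary quantization (assumed "absmean" scheme):
# each weight is scaled by the mean absolute value of the tensor, rounded,
# and clamped to the ternary set {-1, 0, +1}.

def absmean_quantize(weights, eps=1e-8):
    """Quantize a flat list of floats to ternary values {-1, 0, +1}.

    Returns the quantized values and the scale needed to dequantize.
    """
    # Per-tensor scale: mean absolute value (eps avoids division by zero).
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to nearest integer, then clamp into the ternary range.
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

if __name__ == "__main__":
    w = [0.9, -0.02, -1.3, 0.4]
    q, s = absmean_quantize(w)
    print(q)  # ternary codes; multiply by s to approximately recover w
```

Because every weight becomes one of three values, matrix multiplication reduces to additions and subtractions, which is what makes the CPU-only inference claimed by these projects plausible.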