ZMLabs — Benchmarks & Proofs

Honesty first. If we can't measure it, we don't sell it. Every headline number in this repository is backed by a raw measurement file under benchmarks/, with full conditions (model, quantization, hardware, batch, sample count, warm/cold). Capacity is never reported as speed. Estimated or single-run numbers are labelled as such or kept out entirely. To "stand behind" a number means it was measured, not believed: a number ships only with its proof file attached, and anything we measured but could not stand behind (fragile edges, non-reproducible ratios) is listed with its reason in AUDIT_SUMMARY.md.

ZMLabs — Sète, France · contact: contact.zmlabs@proton.me

This repository gathers the measured benchmarks behind ZMLabs' four products. It contains only proofs with complete measurement conditions; modelled, synthetic, or not-yet-reproduced numbers are deliberately excluded (see benchmarks/BENCHMARKS_MANIFEST.md and AUDIT_SUMMARY.md).
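As an illustration of what "complete measurement conditions" can mean in practice, here is a minimal sketch of a completeness check. The field names (`model`, `quantization`, `hardware`, `batch`, `sample_count`, `warm_state`) are hypothetical, chosen to mirror the conditions listed above; they are not the schema of the files under benchmarks/. The example record is hand-written for the demonstration and is not a copy of any proof file.

```python
# Hypothetical field names mirroring the conditions named in this README.
REQUIRED_CONDITIONS = (
    "model", "quantization", "hardware", "batch", "sample_count", "warm_state",
)

def missing_conditions(record):
    """Return the required condition fields that are absent or empty."""
    return [key for key in REQUIRED_CONDITIONS if record.get(key) in (None, "")]

# Hand-written illustrative record, deliberately incomplete.
example = {
    "model": "Qwen2.5-7B",
    "quantization": "Q4_K_M",
    "hardware": "RTX 4070 Laptop",
    "sample_count": 1,
}

print(missing_conditions(example))  # ['batch', 'warm_state']
```

A record that returns a non-empty list would, under this sketch, be treated as a bare number rather than a benchmark.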

The four products

🧠 memown — run a large local AI on a small machine

A local inference layer focused on capacity, not raw speed: it boots, serves and decodes a 120B-class MoE model on an 8 GB GPU plus system RAM. The worst-case run shows that the model runs at all on an 8 GB GPU with only 16 GB of RAM (experts streamed from NVMe, ~1.5 tok/s, n=1; see benchmarks/memown/local_120b_8gb_decode.json). Read it as the worst case that still works, not as memown's nominal speed. With more system RAM, decode throughput rises: the same 120B model reaches 7.62 tok/s with 188 GB of RAM (a server run). On a mid-range MoE (Qwen3-30B-A3B Q4_K_M, RTX 4070 8 GB) it serves robustly at ~18–21 tok/s (up to ~22 tok/s at a memory-fragile edge, disclosed as such). Quality changes from memory tiering are reported honestly (e.g. perplexity +20 % on a GPT-2 / WikiText-2 reference, which is not "lossless"). → proofs: benchmarks/memown/
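To make the "perplexity +20 %" wording concrete, here is a small sketch of how a relative perplexity change is usually computed. It uses the standard definition (perplexity is the exponential of the mean per-token negative log-likelihood). The numeric values are hypothetical and chosen only to produce a round result; they are not the measured GPT-2 / WikiText-2 figures, which live in the proof files.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))

def relative_change_pct(baseline, candidate):
    """Signed change of `candidate` relative to `baseline`, in percent."""
    return 100.0 * (candidate - baseline) / baseline

# Hypothetical perplexities, for illustration only.
baseline_ppl = 25.0   # reference run without memory tiering
tiered_ppl = 30.0     # same evaluation with memory tiering

print(f"{relative_change_pct(baseline_ppl, tiered_ppl):+.1f} %")  # +20.0 %
```

Higher perplexity is worse, so a positive change is a quality cost, which is why the figure is reported rather than described as lossless.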

🎮 diciz — an LLM that runs in your browser

A WebGPU runtime that runs real GGUF models 100 % on-device, with no install and no Python. Measured throughput (single measurement, disclosed): 31.2 tok/s (Qwen2.5-7B Q4_K_M) and 94.2 tok/s (Qwen2.5-0.5B) on an RTX 4070 Laptop via Chrome WebGPU. It also ships browser-side frame interpolation (RIFE-Nano), measured rigorously (N=50, warm-up discarded). → proofs: benchmarks/diciz/

🛟 VRAMPilot — never crash on out-of-memory again

A UX/automation layer over llama.cpp that auto-fits a GGUF model to your GPU and recovers from a runtime out-of-memory error instead of crashing. Measured end-to-end (no mocks): a forced OOM at a 262 144-token context backs off to 131 072 tokens and keeps serving (RTX 3070 8 GB, Qwen2.5-7B-Q4_K_M); decode energy is 0.84 J/token (RAPL, CPU package only, disclosed as such). Cross-platform support is reported precisely: CUDA and Vulkan are demonstrated; Apple/Metal recovery is not yet demonstrated (and stated as such). → proofs: benchmarks/vrampilot/
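The back-off behaviour can be pictured with a minimal sketch. It assumes a halving policy, which is consistent with the measured 262 144 → 131 072 step but is not stated anywhere in this README as VRAMPilot's actual policy. `OutOfMemory`, `serve_with_backoff` and `fake_start_server` are hypothetical names for illustration, not VRAMPilot or llama.cpp APIs, and the sketch retries at start-up rather than recovering mid-run.

```python
class OutOfMemory(RuntimeError):
    """Stand-in for whatever error a real backend raises on OOM."""

def serve_with_backoff(start_server, ctx, min_ctx=2048):
    """Try to start at `ctx`; on OOM, halve the context and retry.

    Returns (server, ctx_used). Halving is an assumed policy.
    """
    while ctx >= min_ctx:
        try:
            return start_server(ctx), ctx
        except OutOfMemory:
            ctx //= 2
    raise OutOfMemory(f"no context size >= {min_ctx} fits")

def fake_start_server(ctx):
    # Hypothetical launcher: pretends anything above 131072 tokens does not fit.
    if ctx > 131_072:
        raise OutOfMemory(f"ctx={ctx} does not fit")
    return f"server(ctx={ctx})"

server, used = serve_with_backoff(fake_start_server, 262_144)
print(used)  # 131072
```

The point of the design is that the failure is caught and converted into a smaller, working configuration instead of terminating the process.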

🖼️ GlassBreakr — more frames per second through frame generation

A Windows frame-generation overlay (RIFE-Nano). The "up to 8×" figure is a throughput of generated frames (7 interpolated frames produced in 15.8 ms at 1080p, RTX 4070), disclosed as frame-gen — not an image-quality claim and not a render-speed claim. Real responsiveness is ~2×; impact on the main GPU is ≤ 9.6 % (measured, disclosed as low — never "zero"). → proofs: benchmarks/glassbreakr/
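The arithmetic behind the frame-generation figure can be sketched as follows. It assumes that "8×" counts one rendered frame plus the 7 interpolated frames produced from it; that reading is an interpretation of the numbers above, not a definition taken from the proof files.

```python
def frame_multiplier(rendered, interpolated):
    """Output frames per rendered frame (assumed meaning of the 'x' figure)."""
    return (rendered + interpolated) / rendered

def ms_per_interpolated_frame(window_ms, interpolated):
    """Average generation time per interpolated frame."""
    return window_ms / interpolated

# Figures quoted above: 7 interpolated frames in 15.8 ms at 1080p.
print(frame_multiplier(1, 7))                         # 8.0
print(f"{ms_per_interpolated_frame(15.8, 7):.2f} ms")  # 2.26 ms
```

Neither number says anything about image quality or input latency, which is why the responsiveness figure is reported separately.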

How to read these benchmarks

Capacity ≠ speed. Capacity means it runs at all, even if sustained decode is modest (~1.5 tok/s, n=1, on an 8 GB GPU with 16 GB of RAM).
Throughput ≠ quality. A frame-gen "8×" is frames produced, not visual fidelity.
Conditions matter. Every number lives next to its model, quantization, hardware, sample count and warm/cold state. A bare number is not a benchmark.
Single runs are labelled. Where n=1, we say so; rigorous figures state N and the discarded warm-up.
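The last rule can be sketched as a small summary helper that discards warm-up samples, reports N, and labels single runs explicitly. `summarize` and its output keys are hypothetical, and the sample values are made up for the demonstration; this is not the harness used to produce the files under benchmarks/.

```python
import statistics

def summarize(samples, warmup=0):
    """Summarize throughput samples after discarding `warmup` leading runs."""
    kept = list(samples)[warmup:]
    if not kept:
        raise ValueError("no samples left after discarding warm-up")
    n = len(kept)
    return {
        "n": n,
        "warmup_discarded": warmup,
        "mean": statistics.fmean(kept),
        # A spread cannot be estimated from one sample, so report None.
        "stdev": statistics.stdev(kept) if n > 1 else None,
        "label": "single run" if n == 1 else "repeated",
    }

# Made-up samples: the first run is a cold outlier and is dropped as warm-up.
print(summarize([40.0, 20.0, 22.0, 21.0], warmup=1))
# A lone measurement is still reported, but labelled as such.
print(summarize([31.2])["label"])  # single run
```

Reporting `stdev` as `None` for n=1 keeps a single run from looking as solid as a repeated one.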

See AUDIT_SUMMARY.md for the per-file nature-of-proof and the explicit caveats (what each measurement does not prove).

Published by ZMLabs. Every figure here is a real measurement with its proof file attached.
