Releases · timtoole02/Camelid

Camelid v0.1.0 is an evidence-first release: it claims exactly what the repository can defend with committed artifacts, and states its boundaries as plainly as its results.

What this is

A Rust-native local GGUF inference backend for Apple Silicon with an OpenAI-style API and a React/Vite chat frontend. Q8_0 weights load directly from GGUF (no conversion step), run on a Metal-resident GPU path with greedy sampling on the GPU, and fall back to validated CPU paths where the resident gates do not apply.
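To make "load directly from GGUF" concrete, here is a minimal sketch of reading Q8_0 data as it is laid out in the ggml/GGUF format: blocks of 32 weights, each block stored as a little-endian f16 scale followed by 32 signed 8-bit quants. The function names are hypothetical and this is not Camelid's code; it only illustrates why no conversion step is needed before the weights are usable.

```rust
/// Number of weights per Q8_0 block in the ggml/GGUF format.
const QK8_0: usize = 32;
/// Bytes per block: a 2-byte f16 scale followed by 32 signed 8-bit quants.
const BLOCK_BYTES: usize = 2 + QK8_0;

/// Convert IEEE 754 half-precision bits to f32 (hypothetical helper).
fn f16_to_f32(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 10) & 0x1f) as i32;
    let frac = (bits & 0x03ff) as f32;
    match exp {
        // Zero and subnormals: frac * 2^-24.
        0 => sign * frac * 2f32.powi(-24),
        // Infinity or NaN.
        0x1f => if frac == 0.0 { sign * f32::INFINITY } else { f32::NAN },
        // Normal numbers: (1 + frac / 1024) * 2^(exp - 15).
        _ => sign * (1.0 + frac / 1024.0) * 2f32.powi(exp - 15),
    }
}

/// Dequantize a buffer of Q8_0 blocks into f32 weights.
/// Returns None if the buffer is not a whole number of blocks.
fn dequantize_q8_0(raw: &[u8]) -> Option<Vec<f32>> {
    if raw.len() % BLOCK_BYTES != 0 {
        return None;
    }
    let mut out = Vec::with_capacity(raw.len() / BLOCK_BYTES * QK8_0);
    for block in raw.chunks_exact(BLOCK_BYTES) {
        // The scale is stored little-endian in the first two bytes.
        let scale = f16_to_f32(u16::from_le_bytes([block[0], block[1]]));
        for &q in &block[2..] {
            // Each quant byte is reinterpreted as a signed 8-bit integer.
            out.push(scale * (q as i8) as f32);
        }
    }
    Some(out)
}
```

A GPU path would typically keep the blocks quantized and apply the scale inside the matmul kernel rather than expanding to f32 up front; the expansion here is only for readability.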

Supported rows (exact, evidence-cited)

Support is per exact model row with row-specific artifacts — see SUPPORT_MATRIX_v0.1.md:

TinyLlama 1.1B Chat Q8_0 — verified support gate
Llama 3.2 1B Instruct Q8_0 — verified bounded support
Llama 3.2 3B Instruct Q8_0 — supported exact-row smoke
Llama 3 8B Instruct Q8_0 — verified bounded support
Mistral-7B / Mixtral — evidence-only bring-up; fail-closed in v0.1
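One way to picture "per exact model row" combined with "fail-closed" is an allow-list lookup that refuses anything it does not recognize. The sketch below is an assumption about the shape of such a gate, not Camelid's implementation: the table and the `admit` function are hypothetical, and only the row names and status labels are copied from the list above.

```rust
/// Support table mirroring the rows listed in these notes.
const SUPPORTED_ROWS: &[(&str, &str)] = &[
    ("TinyLlama 1.1B Chat Q8_0", "verified support gate"),
    ("Llama 3.2 1B Instruct Q8_0", "verified bounded support"),
    ("Llama 3.2 3B Instruct Q8_0", "supported exact-row smoke"),
    ("Llama 3 8B Instruct Q8_0", "verified bounded support"),
];

/// Fail-closed admission (hypothetical): only an exact match on a listed
/// row is allowed; every other model name is rejected with an error.
fn admit(row: &str) -> Result<&'static str, String> {
    SUPPORTED_ROWS
        .iter()
        .find(|(name, _)| *name == row)
        .map(|(_, status)| *status)
        .ok_or_else(|| format!("refusing to load '{}': not a supported row in v0.1", row))
}
```

Under this sketch, a Mistral-7B or Mixtral request falls through to the error branch instead of being attempted on a best-effort basis, which is what fail-closed means here.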

Performance (same host, same prompts, three alternating rounds; medians)

Host: one Apple M4 (10-core GPU, 16 GB). Comparators: llama.cpp (Metal, installed via brew) and MLX-LM (8-bit). Raw logs and methods are in the committed evidence bundles under qa/evidence-bundles/; reading boundaries are in BENCHMARKS.md.

Row / lane	Camelid	llama.cpp	MLX-LM
3B prefill, 601-token prompt (tok/s)	587.3	543.7	577.9
3B decode, short context (tok/s)	29.7	29.1	29.1
1B prefill (tok/s)	1664.3	1472.8	1670.0
1B decode (tok/s)	74.8	67.2	69.7
8B prefill (tok/s)	234.2	220.4	229.2
8B decode (tok/s)	12.1	12.1	12.0

Stated boundaries, in the same spirit:

These are short-prompt, same-session snapshots on exact rows; nothing transfers to other shapes or machines.
Past ~1.7k-token prompts, Camelid prefill reads 2–4× below llama.cpp. Decode at depth also reads below both comparators: 25.0 / 16.9 tok/s at 1.5k / 8k, versus 26.4 / 19.1 and 26.9 / 22.2. Both are recorded as known-behind lanes with their own bundles.
1B prefill is at parity with MLX-LM (no win claimed); 8B decode is a three-way parity band.
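For readers who want to check the table arithmetic, the sketch below shows how a median over rounds and a relative difference between two lanes could be computed. Both helper names are hypothetical, and the per-round samples behind each median are not reproduced in these notes, so only the published medians are meaningful inputs.

```rust
/// Median of per-round throughput samples in tok/s.
/// Returns None for an empty slice.
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut v = samples.to_vec();
    v.sort_by(|a, b| a.total_cmp(b));
    let mid = v.len() / 2;
    Some(if v.len() % 2 == 1 {
        v[mid]
    } else {
        (v[mid - 1] + v[mid]) / 2.0
    })
}

/// Relative difference of `a` over baseline `b`, in percent.
/// Positive means `a` is faster than `b`.
fn percent_over(a: f64, b: f64) -> f64 {
    (a / b - 1.0) * 100.0
}
```

Applied to the medians in the table, `percent_over(587.3, 543.7)` puts 3B prefill roughly 8% above llama.cpp, while `percent_over(1664.3, 1670.0)` is slightly negative, which matches the decision to claim parity rather than a win for 1B prefill against MLX-LM.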

Greedy continuations are token-parity-checked against the CPU reference path throughout, including the GPU sampling fast lane.
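A token-parity check of this kind can be sketched in a few lines: greedy sampling picks the highest logit at each step, and two continuations are at parity only if every token id matches. The function names below are hypothetical and the tie-breaking rule (first index wins) is an assumption; the notes do not say how Camelid resolves ties.

```rust
/// Greedy sampling: index of the largest logit.
/// Assumes finite logits; the first index wins ties.
fn argmax(logits: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &x) in logits.iter().enumerate() {
        match best {
            // Keep the current best unless x is strictly larger.
            Some((_, b)) if x <= b => {}
            _ => best = Some((i, x)),
        }
    }
    best.map(|(i, _)| i)
}

/// First position where two token streams disagree, or None if they
/// match exactly. A length mismatch counts as a divergence at the end
/// of the shorter stream.
fn first_divergence(reference: &[u32], candidate: &[u32]) -> Option<usize> {
    let common = reference.len().min(candidate.len());
    (0..common)
        .find(|&i| reference[i] != candidate[i])
        .or_else(|| (reference.len() != candidate.len()).then_some(common))
}
```

Reporting the first diverging position, rather than a plain pass/fail, makes it easier to tell a numerical drift late in a continuation from a sampling bug at the first token.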

Gate

The release gate (RELEASE_GATE_v0.1.md) is green on the tag head: format, clippy (-D warnings, all features), 533 tests, release build, public evidence-claim and scrub checks, benchmark harness self-test, and the frontend build + model-state smoke. The Ollama comparator baseline is explicitly deferred with rationale recorded in the gate.

Full notes: RELEASE_NOTES_v0.1.md
