Releases: vitaliikapliuk/modelharness
v0.3.0 — first public release
modelharness distils Fable 5's documented working practices into a zero-config Claude Code plugin, and measures exactly what that buys across four Claude models.
Benchmark
17 agentic-coding tasks × 8 configurations × 3 reps = 408 runs. Hidden binary grading, no LLM judge, a reference solution per task.
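To make the run count concrete, here is a minimal sketch that enumerates such a grid. It is not the project's harness: the task names are placeholders, and splitting the 8 configurations into 4 models × 2 arms (with and without the plugin) is an assumption, not something these notes state.

```python
from itertools import product

# Placeholder task names; the real task list lives in the repository.
TASKS = [f"task_{i:02d}" for i in range(1, 18)]  # 17 tasks

# Assumption: 8 configurations = 4 models x 2 arms (baseline vs plugin).
MODELS = ["opus-4.8", "fable-5", "sonnet-4.6", "haiku-4.5"]  # illustrative labels
ARMS = ["baseline", "plugin"]
REPS = range(1, 4)  # 3 repetitions per task and configuration


def build_run_matrix():
    """Return one (task, model, arm, rep) tuple per benchmark run."""
    configs = list(product(MODELS, ARMS))
    return [
        (task, model, arm, rep)
        for task, (model, arm), rep in product(TASKS, configs, REPS)
    ]


if __name__ == "__main__":
    runs = build_run_matrix()
    print(len(runs), "runs")
```

Under that assumption each configuration covers 17 × 3 = 51 runs, so a single failed run corresponds to roughly a 98% pass rate for that configuration.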
What the data supports (paired per-task, 95% CI — bench/stats.py)
- Opus 4.8 — cost −12.0% [−17.3, −6.7] and time −16.5% [−25.3, −7.7]: statistically significant. The flagship subscription model gets a real win.
- Fable 5 — significant speed-up (−11.4%), even against the model these patterns came from.
- Sonnet 4.6 / Haiku 4.5 — cost and time within run-to-run noise: not a reliable saving, but never a reliable loss. Haiku additionally goes 98% → 100% pass.
- Quality is an exact count over every run, not an estimate from a sample: 407 of 408 runs passed.
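For readers unfamiliar with a paired per-task comparison, the sketch below shows one common way to get a percentage change with a 95% interval. It is an illustration only and not the contents of bench/stats.py: the function name, the percentile bootstrap, and the sample numbers are all assumptions.

```python
import random
import statistics


def paired_pct_change_ci(baseline, treated, n_boot=10_000, seed=0):
    """Mean per-task percent change (treated vs baseline) with a 95% bootstrap CI.

    baseline and treated hold one value per task, in the same task order
    (for example, cost averaged over that task's repetitions).
    """
    if len(baseline) != len(treated) or not baseline:
        raise ValueError("need two non-empty lists of equal length")
    # Pairing: each task is compared only with itself.
    changes = [100.0 * (t - b) / b for b, t in zip(baseline, treated)]
    rng = random.Random(seed)
    n = len(changes)
    # Resample tasks with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(changes, k=n)) for _ in range(n_boot)
    )
    lo = means[int(0.025 * n_boot)]
    hi = means[int(0.975 * n_boot) - 1]
    return statistics.fmean(changes), lo, hi


if __name__ == "__main__":
    # Hypothetical per-task costs, not benchmark data.
    base = [10.0, 20.0, 30.0, 40.0]
    with_plugin = [9.0, 17.0, 27.0, 38.0]
    mean, lo, hi = paired_pct_change_ci(base, with_plugin)
    print(f"change {mean:.1f}% [{lo:.1f}, {hi:.1f}]")
```

An interval that lies entirely below zero is what "statistically significant" means in the list above; an interval that straddles zero corresponds to "within run-to-run noise".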
Install
/plugin marketplace add vitaliikapliuk/modelharness
/plugin install modelharness@modelharness
Full details are in the README; version history is in CHANGELOG.md; grading corrections are in bench/GRADING.md.