
Releases: vitaliikapliuk/modelharness

v0.3.0 — first public release

14 Jun 19:39


modelharness distils Fable 5's documented working practices into a zero-config
Claude Code plugin, and measures exactly what that buys across four Claude models.

Benchmark

17 agentic-coding tasks × 8 configurations × 3 reps = 408 runs. Grading is hidden and binary (pass/fail), with no LLM judge and a reference solution for every task.
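As a quick check of the run count, the benchmark matrix can be enumerated directly. This is a minimal sketch; the constant names are illustrative and not taken from the repository's bench/ code. It also shows that each configuration gets 17 × 3 = 51 runs, so a single failed run in one configuration works out to the 98% pass rate quoted for Haiku.

```python
from itertools import product

TASKS = 17          # agentic-coding tasks
CONFIGURATIONS = 8  # benchmark configurations (breakdown not listed in these notes)
REPS = 3            # repetitions per (task, configuration) pair

# One tuple per benchmark run: (task index, configuration index, rep index).
runs = list(product(range(TASKS), range(CONFIGURATIONS), range(REPS)))
runs_per_configuration = TASKS * REPS

print(len(runs))               # 408
print(runs_per_configuration)  # 51

# With 51 runs per configuration, one failure leaves 50/51 passing.
print(f"{50 / runs_per_configuration:.1%}")  # 98.0%
```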

What the data supports (paired per-task comparisons, 95% CI; see bench/stats.py)

  • Opus 4.8 — cost −12.0% [−17.3, −6.7] and time −16.5% [−25.3, −7.7]: statistically significant. The flagship subscription model gets a real win.
  • Fable 5 — statistically significant speed-up (time −11.4%), even on the model these patterns were distilled from.
  • Sonnet 4.6 / Haiku 4.5 — cost and time changes are within run-to-run noise: not a reliable saving, but not a reliable loss either. Haiku's pass rate also rises from 98% to 100%.
  • Quality is graded on every run, not sampled: 407 of 408 runs passed.
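bench/stats.py is not reproduced in these notes, so the following is only a minimal sketch of one common way to compute a paired per-task 95% interval. It assumes each reported percentage is the mean of per-task relative changes with a Student's t interval; that assumption, the function name paired_relative_change_ci, and the hardcoded critical value are illustrative, not the repository's actual method. A change counts as statistically significant when the interval excludes zero, as with Opus 4.8's cost interval [−17.3, −6.7].

```python
import math
import statistics

# Two-sided 95% Student's t critical value for 16 degrees of freedom (17 tasks - 1).
# Hardcoded to avoid a SciPy dependency (assumption: t interval, not bootstrap).
T_CRIT_95_DF16 = 2.120


def paired_relative_change_ci(baseline, treatment, t_crit=T_CRIT_95_DF16):
    """Return (mean % change, (low, high)) over tasks.

    baseline[i] and treatment[i] are the same task's average cost (or time)
    without and with the plugin, so each task is compared only with itself.
    """
    if len(baseline) != len(treatment) or len(baseline) < 2:
        raise ValueError("need two equal-length sequences with at least 2 tasks")
    # Per-task relative change in percent; negative means cheaper or faster.
    diffs = [100.0 * (t - b) / b for b, t in zip(baseline, treatment)]
    mean = statistics.fmean(diffs)
    half_width = t_crit * statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, (mean - half_width, mean + half_width)


def is_significant(ci):
    """A change is significant at this level when its interval excludes zero."""
    low, high = ci
    return high < 0 or low > 0


# Toy usage with made-up numbers for 3 tasks (t critical value for 2 df is 4.303).
mean, ci = paired_relative_change_ci([10, 10, 10], [9, 8, 10], t_crit=4.303)
print(mean)  # -10.0
print(is_significant(ci))  # False: the interval is wide enough to cross zero
```

Pairing by task matters here because task difficulty varies far more than the plugin's effect; comparing each task with itself removes that variation from the interval.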

Install

/plugin marketplace add vitaliikapliuk/modelharness
/plugin install modelharness@modelharness

Full details are in the README, version history in CHANGELOG.md, and grading corrections in bench/GRADING.md.