v0.1.0

@samrith-s samrith-s released this 20 Jun 19:11

First tagged release of bun-server-bench — a benchmark and trajectory dataset for evaluating AI coding agents on real-world Bun server engineering tasks.

Contents

  • 50 versioned tasks across 15+ backend categories (auth, databases, idempotency, rate limiting, websockets, …), each engineered so a plausible-but-wrong solution passes public tests and fails the hidden ones.
  • 120 verified supervised fine-tuning (SFT) trajectories and 120 patch records (full-credit runs only; hidden tests and reference solutions are excluded).

Published artifacts

Release assets below: full source tarball, SFT + patch JSONL, and the manifest (counts + source commit).
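
Since the manifest carries counts, a consumer may want to confirm that a downloaded JSONL file holds the expected number of records before training on it. The following is a minimal sketch of such a check, not part of the release: `countJsonlRecords` and `matchesExpectedCount` are hypothetical helper names, and the manifest's actual schema is not shown here, so the expected count is passed in as a plain number.

```typescript
// Hypothetical helper (not shipped with the release): count the records in
// a JSONL document, failing loudly if any non-blank line is not valid JSON.
function countJsonlRecords(text: string): number {
  let count = 0;
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim(); // also strips a trailing "\r"
    if (line === "") continue; // tolerate blank lines and a final newline
    try {
      JSON.parse(line);
    } catch {
      throw new Error(`line ${i + 1} is not valid JSON`);
    }
    count++;
  }
  return count;
}

// Hypothetical helper: compare the observed record count with the count the
// caller read from the manifest (manifest field names are not assumed here).
function matchesExpectedCount(text: string, expected: number): boolean {
  return countJsonlRecords(text) === expected;
}
```

Under Bun, the file text could be loaded with `await Bun.file(path).text()` and checked with `matchesExpectedCount(text, 120)` for the trajectory or patch file; the asset file names are not listed in these notes, so `path` is left to the reader.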