First tagged release of bun-server-bench, a benchmark and trajectory dataset for evaluating AI coding agents on real-world Bun server engineering tasks.
Contents
- 50 versioned tasks across 15+ backend categories (auth, databases, idempotency, rate limiting, websockets, …), each engineered so that a plausible-but-wrong solution passes the public tests but fails the hidden ones.
- 120 verified SFT trajectories + 120 patch records (full-credit runs only; hidden tests and reference solutions excluded).
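
To make the "plausible-but-wrong" design concrete, here is a small hypothetical illustration in the idempotency category. It is not taken from the benchmark: the handler names, the `Result` shape, and both checks are assumptions made up for this sketch.

```typescript
// Hypothetical illustration only; not a task from the benchmark.
type Result = { status: number; chargeId: string };
type Handler = (idempotencyKey: string) => Result;

// Plausible-but-wrong: ignores the key, so a retry creates a second charge.
function makeNaiveHandler(): Handler {
  let next = 0;
  return (_key: string): Result => ({ status: 201, chargeId: `ch_${++next}` });
}

// Correct: replays the stored result when the same key is seen again.
function makeIdempotentHandler(): Handler {
  let next = 0;
  const seen = new Map<string, Result>();
  return (key: string): Result => {
    const cached = seen.get(key);
    if (cached !== undefined) return cached;
    const result: Result = { status: 201, chargeId: `ch_${++next}` };
    seen.set(key, result);
    return result;
  };
}

// "Public" style check: a single request succeeds. Both handlers pass.
const publicCheck = (h: Handler): boolean => h("k1").status === 201;

// "Hidden" style check: a retry with the same key must return the same charge.
const hiddenCheck = (h: Handler): boolean => {
  const first = h("k2");
  const retry = h("k2");
  return first.chargeId === retry.chargeId;
};
```

In this sketch the naive handler satisfies `publicCheck` but not `hiddenCheck`, while the idempotent handler satisfies both.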
Published artifacts
- Hugging Face dataset: https://huggingface.co/datasets/tinycomputerai/bun-server-bench-trajectories
- Harbor dataset: https://hub.harborframework.com/datasets/tinycomputerai/bun-server-bench
Release assets below: full source tarball, SFT + patch JSONL, and the manifest (counts + source commit).
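
The JSONL assets can be read one record per line. Below is a minimal sketch, assuming each non-empty line holds one JSON value. It assumes no field names, since the record schema is not described here, and `parseJsonl` is a helper name introduced for illustration.

```typescript
// Parse JSONL text: one JSON value per non-empty line.
// The record schema is not assumed, so records are returned as unknown values.
function parseJsonl(text: string): unknown[] {
  const records: unknown[] = [];
  text.split(/\r?\n/).forEach((raw, index) => {
    const line = raw.trim();
    if (line === "") return; // tolerate blank lines and a trailing newline
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      throw new Error(`Invalid JSON on line ${index + 1}: ${reason}`);
    }
  });
  return records;
}
```

Under Bun, the input text could come from `await Bun.file("sft.jsonl").text()`, where `sft.jsonl` is a placeholder for whichever asset file name the release actually uses.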