OptimizeSpec: Build Eval-Driven Agent Optimization Systems

OptimizeSpec helps you make your agent better in a measured way, even if you have never built an eval suite or optimization loop before.

You start with a plain-language goal, such as "make support-triage answers more complete."

OptimizeSpec guides your coding agent through a spec-driven development workflow to turn your request into an eval spec, scoring criteria, and optimization code.

Even if you haven’t collected evals yet, this exercise will give you an understanding of what your evals should look like and what you need to collect.

What You Get

  • A structured workflow for turning an improvement idea into evals, scoring, and optimization code.
  • Production-equivalent evals against your real agent runtime, tools, skills, MCP servers, environment, and permissions.
  • Traceable optimization results with candidate IDs, per-case rollouts, scores, feedback, and a selected best candidate.

If OptimizeSpec helps you build better agent evals and optimization loops, give us a ⭐!

Quick Start

  1. Install the CLI:

     bun install -g optimizespec

  2. Install the skills:

     npx skills add terminaluse/optimizespec --skill '*'

  3. Initialize the project metadata once:

     optimizespec init

Now create or update your optimization system with the skills:

/optimizespec-new
Create an optimization system to improve the agent in this folder

Continue until all the spec artifacts are generated:

/optimizespec-continue

Implement the spec:

/optimizespec-apply

How OptimizeSpec Works

OptimizeSpec skills include contracts for building optimization systems for agents. Your coding agent uses those contracts to implement the runner, scorer, optimizer, adapter, evidence ledger, candidate registry, and verification flow for your agent.

The core contracts are runtime-neutral. The skills include a reference system for Python Claude Managed Agents, and contributions for other hosted agent runtimes and languages are welcome.
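To make the component list above concrete, here is an illustrative Python sketch of the kinds of seams the contracts describe. The names, fields, and signatures below are assumptions for illustration only; the real contracts live in the OptimizeSpec skills.

```python
# Illustrative sketch of OptimizeSpec-style contracts. All names here are
# hypothetical stand-ins, not the actual contract definitions from the skills.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Rollout:
    case_id: str
    output: str        # what the agent produced for this eval case
    trace: list[str]   # execution evidence recorded for the ledger


@dataclass
class Score:
    value: float       # numeric score for the rollout
    feedback: str      # textual feedback the optimizer can reflect on


class Adapter(Protocol):
    """Narrow seam that invokes the real agent with its tools and permissions."""
    def run(self, candidate: dict[str, str], case: dict) -> Rollout: ...


class Scorer(Protocol):
    """Grades a rollout against the eval spec's scoring criteria."""
    def score(self, rollout: Rollout, case: dict) -> Score: ...
```

The important design point is the narrow adapter: the runner, scorer, and optimizer never touch the agent directly, so the same optimization loop can wrap different runtimes.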

How Optimization Works

The generated optimization system uses GEPA's Optimize Anything API as the optimization engine. OptimizeSpec defines the eval runner, scorer, candidate surface, ASI feedback, and evidence ledger; GEPA uses those pieces to evaluate candidates, reflect on live failures, propose mutations, and select better candidates.

GEPA is a reflective evolutionary optimizer: it improves text-representable candidates by combining scores, traces, feedback, and Pareto-efficient search. Read How GEPA Works for the underlying optimization loop.
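As a rough intuition for that loop, here is a minimal, self-contained sketch of a reflective evolutionary optimizer. It is not GEPA's Optimize Anything API; the greedy best-per-pool selection below is a crude stand-in for GEPA's Pareto-efficient search, and `evaluate`/`mutate` are hypothetical callables.

```python
# Toy reflective evolutionary loop, in the spirit of (but much simpler than)
# GEPA: evaluate candidates, reflect on the weakest case's feedback, mutate,
# and keep the best candidate found within the budget.
def optimize(seed: str, cases: list[str], evaluate, mutate, budget: int = 8) -> str:
    """evaluate(candidate, case) -> (score, feedback); mutate(candidate, feedback) -> candidate."""
    pool = [seed]
    best, best_total = seed, sum(evaluate(seed, c)[0] for c in cases)
    for _ in range(budget):
        # Pick the strongest candidate in the pool as the parent.
        parent = max(pool, key=lambda cand: sum(evaluate(cand, c)[0] for c in cases))
        # Reflect on the parent's weakest case and mutate using its feedback.
        worst = min(cases, key=lambda c: evaluate(parent, c)[0])
        child = mutate(parent, evaluate(parent, worst)[1])
        total = sum(evaluate(child, c)[0] for c in cases)
        if total > best_total:
            best, best_total = child, total
        pool.append(child)
    return best
```

In the real system, `evaluate` is the eval runner plus scorer running against your actual agent, and `mutate` is an LLM reflection step proposing a new prompt or other candidate surface.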

What Spec Artifacts Get Created

OptimizeSpec keeps planning artifacts in one root folder:

optimizespec/changes/<change-name>/
  proposal.md
  design.md
  specs/
  tasks.md

The proposal records where the optimization-system code will live and where run evidence will be written:

## Optimization System Location

- Decision: create new folder|use existing folder
- Path: <repo-relative eval, tooling, or package-adjacent path>
- Import/runtime access plan: <how generated code imports or invokes the real agent modules>
- Run outputs path: runs/

/optimizespec-apply <change-name> writes runner, scorer, optimizer, adapter, and evidence-ledger code to the recorded executable path.

Note: Choose the path based on your repo's structure. The executable optimization system should usually live in an existing eval, test, tooling, or agent package-adjacent folder, where it can import or invoke the real agent, tools, skills, MCP servers, environment configuration, and permissions through a narrow adapter.

What a Run Produces

An optimizer run outputs:

  • optimizer-summary.json records the selected candidate, score summary, per-case live scores, budgets, and artifact paths.
  • candidates.json records every candidate with stable candidate IDs so scores can be traced back to prompts or other candidate surfaces.
  • rollout.json, score.json, and side_info.json capture per-case execution evidence, grader output, feedback, errors, and ASI inputs.
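Because every candidate carries a stable ID, you can trace a run's winner back to its prompt programmatically. The snippet below is a hypothetical sketch: the JSON field names (`selected_candidate_id`, `candidate_id`) are assumptions based on the artifact descriptions above, not a documented schema.

```python
# Hypothetical: load a run's summary and resolve the selected candidate to its
# full record in candidates.json. Field names are illustrative assumptions.
import json
from pathlib import Path


def selected_candidate(run_dir: str) -> dict:
    run = Path(run_dir)
    summary = json.loads((run / "optimizer-summary.json").read_text())
    candidates = json.loads((run / "candidates.json").read_text())
    # Index full candidate records by their stable candidate ID.
    by_id = {c["candidate_id"]: c for c in candidates}
    return by_id[summary["selected_candidate_id"]]
```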


Acknowledgements

OptimizeSpec is only possible due to all the great work Lakshya has done on GEPA.

OptimizeSpec's spec-driven development approach is inspired by OpenSpec, which we highly recommend for daily development.

License

MIT

About

TypeScript CLI and skill pack for spec-driven development of optimization systems for agents.
