OptimizeSpec: Build Eval-Driven Agent Optimization Systems

OptimizeSpec helps you make your agent better in a measured way, even if you have never built an eval suite or optimization loop before.

You start with a plain-language goal, such as "make support-triage answers more complete."

OptimizeSpec guides your coding agent through a spec-driven development workflow to turn your request into an eval spec, scoring criteria, and optimization code.

Even if you haven’t collected evals yet, this exercise will give you an understanding of what your evals should look like and what you need to collect.

What You Get

  • A structured workflow for turning an improvement idea into evals, scoring, and optimization code.
  • Production-equivalent evals against your real agent runtime, tools, skills, MCP servers, environment, and permissions.
  • Traceable optimization results with candidate IDs, per-case rollouts, scores, feedback, and a selected best candidate.

If OptimizeSpec helps you build better agent evals and optimization loops, give us a ⭐!

Quick Start

  1. Install the CLI:

     bun install -g optimizespec

  2. Install the skills:

     npx skills add terminaluse/optimizespec --skill '*'

  3. Initialize the project metadata once:

     optimizespec init

Now create or update your optimization system with the skills:

/optimizespec-new
Create an optimization system to improve the agent in this folder

Continue until all the spec artifacts are generated:

/optimizespec-continue

Implement the spec:

/optimizespec-apply

How OptimizeSpec Works

OptimizeSpec skills include contracts for building optimization systems for agents. Your coding agent uses those contracts to implement the runner, scorer, optimizer, adapter, evidence ledger, candidate registry, and verification flow for your agent.

The core contracts are runtime-neutral. The skills include a reference system for Python Claude Managed Agents, and contributions for other hosted agent runtimes and languages are welcome.
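To make the component list above concrete, here is an illustrative Python sketch of the kinds of seams the contracts describe. The names, fields, and signatures below are assumptions for illustration only; the real contracts live in the OptimizeSpec skills.

```python
# Illustrative sketch of OptimizeSpec-style contracts. All names here are
# hypothetical stand-ins, not the actual contract definitions from the skills.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Rollout:
    case_id: str
    output: str        # what the agent produced for this eval case
    trace: list[str]   # execution evidence recorded for the ledger


@dataclass
class Score:
    value: float       # numeric score for the rollout
    feedback: str      # textual feedback the optimizer can reflect on


class Adapter(Protocol):
    """Narrow seam that invokes the real agent with its tools and permissions."""
    def run(self, candidate: dict[str, str], case: dict) -> Rollout: ...


class Scorer(Protocol):
    """Grades a rollout against the eval spec's scoring criteria."""
    def score(self, rollout: Rollout, case: dict) -> Score: ...
```

The important design point is the narrow adapter: the runner, scorer, and optimizer never touch the agent directly, so the same optimization loop can wrap different runtimes.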

How Optimization Works

The generated optimization system uses GEPA's Optimize Anything API as the optimization engine. OptimizeSpec defines the eval runner, scorer, candidate surface, ASI feedback, and evidence ledger; GEPA uses those pieces to evaluate candidates, reflect on live failures, propose mutations, and select better candidates.

GEPA is a reflective evolutionary optimizer: it improves text-representable candidates by combining scores, traces, feedback, and Pareto-efficient search. Read How GEPA Works for the underlying optimization loop.
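As a rough intuition for that loop, here is a minimal, self-contained sketch of a reflective evolutionary optimizer. It is not GEPA's Optimize Anything API; the greedy best-per-pool selection below is a crude stand-in for GEPA's Pareto-efficient search, and `evaluate`/`mutate` are hypothetical callables.

```python
# Toy reflective evolutionary loop, in the spirit of (but much simpler than)
# GEPA: evaluate candidates, reflect on the weakest case's feedback, mutate,
# and keep the best candidate found within the budget.
def optimize(seed: str, cases: list[str], evaluate, mutate, budget: int = 8) -> str:
    """evaluate(candidate, case) -> (score, feedback); mutate(candidate, feedback) -> candidate."""
    pool = [seed]
    best, best_total = seed, sum(evaluate(seed, c)[0] for c in cases)
    for _ in range(budget):
        # Pick the strongest candidate in the pool as the parent.
        parent = max(pool, key=lambda cand: sum(evaluate(cand, c)[0] for c in cases))
        # Reflect on the parent's weakest case and mutate using its feedback.
        worst = min(cases, key=lambda c: evaluate(parent, c)[0])
        child = mutate(parent, evaluate(parent, worst)[1])
        total = sum(evaluate(child, c)[0] for c in cases)
        if total > best_total:
            best, best_total = child, total
        pool.append(child)
    return best
```

In the real system, `evaluate` is the eval runner plus scorer running against your actual agent, and `mutate` is an LLM reflection step proposing a new prompt or other candidate surface.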

What Spec Artifacts Get Created

OptimizeSpec keeps planning artifacts in one root folder:

optimizespec/changes/<change-name>/
  proposal.md
  design.md
  specs/
  tasks.md

The proposal records where the optimization-system code will live and where run evidence will be written:

## Optimization System Location

- Decision: create new folder|use existing folder
- Path: <repo-relative eval, tooling, or package-adjacent path>
- Import/runtime access plan: <how generated code imports or invokes the real agent modules>
- Run outputs path: runs/

/optimizespec-apply <change-name> writes runner, scorer, optimizer, adapter, and evidence-ledger code to the recorded executable path.

Note: Choose the path based on your repo's structure. The executable optimization system should usually live in an existing eval, test, tooling, or agent package-adjacent folder, where it can import or invoke the real agent, tools, skills, MCP servers, environment configuration, and permissions through a narrow adapter.

What a Run Produces

An optimizer run outputs:

  • optimizer-summary.json records the selected candidate, score summary, per-case live scores, budgets, and artifact paths.
  • candidates.json records every candidate with stable candidate IDs so scores can be traced back to prompts or other candidate surfaces.
  • rollout.json, score.json, and side_info.json capture per-case execution evidence, grader output, feedback, errors, and ASI inputs.
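Because every candidate carries a stable ID, you can trace a run's winner back to its prompt programmatically. The snippet below is a hypothetical sketch: the JSON field names (`selected_candidate_id`, `candidate_id`) are assumptions based on the artifact descriptions above, not a documented schema.

```python
# Hypothetical: load a run's summary and resolve the selected candidate to its
# full record in candidates.json. Field names are illustrative assumptions.
import json
from pathlib import Path


def selected_candidate(run_dir: str) -> dict:
    run = Path(run_dir)
    summary = json.loads((run / "optimizer-summary.json").read_text())
    candidates = json.loads((run / "candidates.json").read_text())
    # Index full candidate records by their stable candidate ID.
    by_id = {c["candidate_id"]: c for c in candidates}
    return by_id[summary["selected_candidate_id"]]
```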


Acknowledgements

OptimizeSpec is only possible due to all the great work Lakshya has done on GEPA.

OptimizeSpec's spec-driven development approach is inspired by OpenSpec, which we highly recommend for daily development.

License

MIT

About

TypeScript CLI and skill pack for spec-driven development of optimization systems for agents.
