refactor(bench): rename gepa-refine → improve-prompt (name by purpose, not method) #171
…, not method)

gepa-refine was named after the METHOD (GEPA) and used 'refine', which collides with the INNER within-run loop. This file is the OUTER improvement loop over a STRING surface (the refine directive), held-out gated.

Convention: improve-/optimize- = OUTER (across runs, optimize a surface); refine- = INNER (within a run, k rounds over one persistent artifact). GEPA stays in the docstring as the string-surface optimizer it uses.

- git mv bench/src/gepa-refine.ts → improve-prompt.ts
- npm script gepa-refine → improve-prompt; help text + HARNESS.md + docs refs updated
- header rewritten to name the surface type (MutableSurface = string | CodeSurface) and the OUTER/INNER distinction.

CLI entrypoint only: nothing imports it, so no code refs change.
✅ No Blockers

| | deepseek | glm | aggregate |
|---|---|---|---|
| Readiness | 89 | 89 | 89 |
| Confidence | 85 | 85 | 85 |
| Correctness | 89 | 89 | 89 |
| Security | 89 | 89 | 89 |
| Testing | 89 | 89 | 89 |
| Architecture | 89 | 89 | 89 |
Full multi-shot audit completed 5/5 planned shots over 7 changed files. Global verifier still owns final merge decision.
🟡 LOW Stale line-number reference in docs/roadmap-rsi.md:46 — docs/roadmap-rsi.md
Line 46 references `improve-prompt.ts:273-281` for the `heldoutSignificance(pairHoldout(...))` ship-gate pattern. The actual call site is at lines 589-602 (the block comment starts at :589, the `pairHoldout` call at :601, `heldoutSignificance` at :602). Lines 273-281 contain unrelated AppWorld driver JSON parsing code. The reference was already stale in the original `gepa-refine.ts` filename and was carried forward unchan
tangletools · 2026-06-05T23:22:08Z · trace
tangletools left a comment
✅ Approved — 1 non-blocking finding — eeddd6df
Full multi-shot audit completed 5/5 planned shots over 7 changed files. Global verifier still owns final merge decision.
Full immutable report for this review: trace
Summary comment for this run: full summary
tangletools · 2026-06-05T23:22:08Z · immutable trace
`gepa-refine` was a bad name twice over: named after the method (GEPA, which is swappable) instead of the purpose, and "refine" collides with the inner within-run loop, but this file is the outer cross-run improvement loop.

Naming convention this establishes (so the whole improvement system reads clearly):

- `improve-`/`optimize-` = the OUTER loop: across runs, optimize a surface, held-out gated.
- `refine-` = the INNER loop: within a run, k rounds over one persistent artifact.

`improve-skill.ts`, `improve-tool.ts`, `improve-config.ts`.

This file is the OUTER loop over a string surface (`MutableSurface = string | CodeSurface`), here the refine directive. The same loop optimizes skills / inter-agent messages (also strings) and code (`CodeSurface`) by swapping the surface, not the loop. GEPA stays in the docstring as the string-surface optimizer it uses.

Mechanical + safe: it's a CLI entrypoint, nothing imports it, so no code references change.

- `git mv` + the npm script (`gepa-refine` → `improve-prompt`) + help text + `HARNESS.md` + 2 doc refs.
- `bench/package.json` re-validated as JSON; entrypoint intact.

🤖 Generated with Claude Code
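
For readers new to the OUTER/INNER split, here is a minimal sketch of how the two loops might differ in shape. Only `MutableSurface = string | CodeSurface` comes from this PR. The fields of `CodeSurface`, the function names `improveSurface` and `refineArtifact`, and the demo proposer and scorer are all hypothetical. The strict score comparison is a simplified stand-in for the held-out gate, not the `heldoutSignificance(pairHoldout(...))` check the review mentions.

```typescript
// Assumed shape: the PR names CodeSurface but does not show its fields.
interface CodeSurface {
  files: Record<string, string>; // hypothetical: path -> source text
}

// Stated in the PR: a surface is either a string or a CodeSurface.
type MutableSurface = string | CodeSurface;

type Propose = (current: MutableSurface) => MutableSurface;
type HoldoutScore = (surface: MutableSurface) => number;

// OUTER loop (improve-/optimize-): across runs, propose a new surface and
// keep it only if it beats the incumbent on held-out data.
function improveSurface(
  seed: MutableSurface,
  propose: Propose,
  holdoutScore: HoldoutScore,
  runs: number,
): MutableSurface {
  let best = seed;
  for (let run = 0; run < runs; run++) {
    const candidate = propose(best);
    // Simplified gate: a strict improvement on the held-out score.
    if (holdoutScore(candidate) > holdoutScore(best)) {
      best = candidate;
    }
  }
  return best;
}

// INNER loop (refine-): within one run, k rounds over one persistent artifact.
// The directive is fixed here; the OUTER loop is what would change it.
function refineArtifact(artifact: string, directive: string, rounds: number): string {
  let current = artifact;
  for (let round = 0; round < rounds; round++) {
    current = `${current}\n[round ${round + 1}] ${directive}`;
  }
  return current;
}

// Toy proposer and scorer, for illustration only.
const demoPropose: Propose = (s) => (typeof s === "string" ? s + "!" : s);
const demoHoldoutScore: HoldoutScore = (s) =>
  typeof s === "string" ? Math.min(s.length, 5) : 0;
```

Swapping the surface type changes what `propose` and `holdoutScore` do, but not `improveSurface` itself. This matches the description's point that the loop stays the same while the surface changes.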