docs: sync optimization docs to selfImprove + bump agent-eval floor to 0.83 #175
Merged
…mp agent-eval floor

PR #172 deleted optimizePrompt + report-eval-runs (selfImprove is the one entry point), but the docs/skills/pins still documented the removed APIs. Synced every surface so the docs match the code:
- README + the SHIPPED adoption SKILL: the optimization story now points at agent-eval's selfImprove (@tangle-network/agent-eval/contract); agent-runtime contributes only the code-surface improvementDriver. reportOptimizationRun → analyzeRuns; /improvement export table corrected to its real exports.
- CLAUDE.md + bench/HARNESS.md: agent-eval pin ^0.76.0 → ^0.83.0; optimizePrompt → selfImprove.
- package.json peerDependency floor >=0.76.0 → >=0.83.0 (selfImprove needs analyzeGeneration, added in 0.83). This is a real correctness fix: a consumer on 0.76 would break.
- Drop a stale "0.76" comment label in improve-prompt.ts (heldoutSignificance is unchanged).

Verified: 0 remaining optimizePrompt/reportOptimizationRun/^0.76 refs in tracked source/docs; examples typecheck clean; root typecheck/lint/build green. agent-eval is on the latest published (0.83.0).
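The two kinds of version constraint touched here behave differently, which is why the peer floor bump matters on its own. A minimal sketch, assuming plain MAJOR.MINOR.PATCH strings with no prerelease or build tags; the helpers `satisfiesFloor` and `satisfiesCaret` are hypothetical names, not part of npm or agent-eval. It shows that a `>=0.83.0` floor rejects 0.76.0, and that npm caret ranges on 0.x versions only admit patch updates, so `^0.76.0` (that is, `>=0.76.0 <0.77.0`) can never resolve to 0.83.

```typescript
// Hypothetical helpers: a simplified subset of npm semver, MAJOR.MINOR.PATCH only.
type Version = [number, number, number];

function parseVersion(v: string): Version {
  const parts = v.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// ">=X.Y.Z": the form used by the peerDependency floor.
function satisfiesFloor(installed: string, floor: string): boolean {
  return compare(parseVersion(installed), parseVersion(floor)) >= 0;
}

// "^X.Y.Z": caret range. The upper bound depends on the first non-zero component:
// ^1.2.3 -> <2.0.0, ^0.76.0 -> <0.77.0, ^0.0.3 -> <0.0.4.
function satisfiesCaret(installed: string, base: string): boolean {
  const v = parseVersion(installed);
  const b = parseVersion(base);
  if (compare(v, b) < 0) return false;
  if (b[0] > 0) return v[0] === b[0];
  if (b[1] > 0) return v[0] === 0 && v[1] === b[1];
  return v[0] === 0 && v[1] === 0 && v[2] === b[2];
}

console.log(satisfiesFloor("0.76.0", "0.83.0")); // false: a 0.76 consumer fails the new floor
console.log(satisfiesFloor("0.83.0", "0.83.0")); // true
console.log(satisfiesCaret("0.83.0", "0.76.0")); // false: the old ^0.76.0 pin never reaches 0.83
console.log(satisfiesCaret("0.76.4", "0.76.0")); // true: patch bumps only
```

In a real package this comparison is done by npm's own range resolution; the sketch only makes the 0.x caret rule explicit.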
drewstone added a commit that referenced this pull request on Jun 6, 2026
Cuts the 58-commit backlog on main into a published release. Headline surface:
- runToolLoop / streamToolLoop: bounded turn-level tool-dispatch loop (#137)
- RSI agent tree: recursive Agent.act, Supervisor keystone, runProgram, the adaptive-driver channel (#139/#151/#165)
- optimization API collapsed onto agent-eval selfImprove; the runtime keeps the CODE-surface ImprovementDriver you pass as driver (#172)
- deployable benchmark adapters: AppWorld, commit0, aec-bench, EnterpriseOps-Gym; runBenchmarks over one ADAPTERS registry (#153/#156/#157)
- agent-eval floor raised to >=0.83.0 (#175)
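For readers unfamiliar with the term, a "bounded turn-level tool-dispatch loop" can be sketched generically. This illustrates the technique only: every type and name below (`Model`, `Tool`, `boundedToolLoop`, the string-array history) is hypothetical and is not agent-runtime's runToolLoop signature. Each turn calls the model once, dispatches any tool calls it emits, and appends the results to the history; the loop ends when a turn has no tool calls or when `maxTurns` is reached.

```typescript
// Hypothetical shapes for illustration; not the agent-runtime API.
interface ToolCall { name: string; args: unknown }
interface ModelTurn { text: string; toolCalls: ToolCall[] }
type Model = (history: string[]) => ModelTurn;
type Tool = (args: unknown) => string;

interface LoopResult { finalText: string; turns: number; stoppedByLimit: boolean }

function boundedToolLoop(
  model: Model,
  tools: Record<string, Tool>,
  prompt: string,
  maxTurns: number,
): LoopResult {
  const history: string[] = [prompt];
  let last = "";
  for (let turn = 1; turn <= maxTurns; turn++) {
    const out = model(history);
    last = out.text;
    history.push(out.text);
    // A turn without tool calls means the model has produced its answer.
    if (out.toolCalls.length === 0) {
      return { finalText: last, turns: turn, stoppedByLimit: false };
    }
    for (const call of out.toolCalls) {
      const tool = tools[call.name];
      // Unknown tools are reported back as text instead of throwing,
      // so the model gets a chance to recover on the next turn.
      const result = tool ? tool(call.args) : `error: unknown tool ${call.name}`;
      history.push(`tool ${call.name}: ${result}`);
    }
  }
  // The turn budget is the bound: stop even if the model still wants tools.
  return { finalText: last, turns: maxTurns, stoppedByLimit: true };
}

// Usage with a scripted fake model: call "add" once, then answer.
const fakeModel: Model = (history) =>
  history.some((h) => h.startsWith("tool add:"))
    ? { text: "The sum is 5.", toolCalls: [] }
    : { text: "calling add", toolCalls: [{ name: "add", args: [2, 3] }] };

const tools = {
  add: (args: unknown) => {
    const [a, b] = args as number[];
    return String(a + b);
  },
};

const r = boundedToolLoop(fakeModel, tools, "What is 2 + 3?", 4);
console.log(r.finalText, r.turns); // The sum is 5. 2
```

The hard `maxTurns` cap is what makes the loop "bounded": a model that keeps requesting tools cannot run forever.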
Hygiene follow-up to #172 (which deleted optimizePrompt + report-eval-runs). The code migrated to selfImprove, but the docs/skills/pins still documented the removed APIs. This syncs everything.

Fixed
- README + the SHIPPED adoption SKILL: the optimization story now points at agent-eval's selfImprove (@tangle-network/agent-eval/contract); agent-runtime contributes only the code-surface improvementDriver. reportOptimizationRun → analyzeRuns; the /improvement export table corrected to its actual exports.
- CLAUDE.md + bench/HARNESS.md: agent-eval pin ^0.76.0 → ^0.83.0; optimizePrompt → selfImprove.
- package.json peerDependency floor >=0.76.0 → >=0.83.0. This is a real correctness fix: selfImprove needs analyzeGeneration, added in 0.83, so a consumer on 0.76 would break.
- Dropped a stale 0.76 comment label in improve-prompt.ts.

Verified
- 0 remaining optimizePrompt/reportOptimizationRun/^0.76 refs in tracked source/docs.
- Examples typecheck clean; root typecheck/lint/build green.
- agent-eval is on the latest published version (0.83.0).
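The verification step (no remaining optimizePrompt, reportOptimizationRun, or ^0.76 references) amounts to a substring scan over tracked files. A minimal sketch, assuming file contents are already loaded into an in-memory map; `findStaleRefs`, `Hit`, and the sample files are illustrative and not a script from the repo. Only the removed names themselves come from this PR.

```typescript
// Names removed or retired by #172/#175; everything else here is hypothetical.
const REMOVED_REFS = ["optimizePrompt", "reportOptimizationRun", "^0.76"] as const;

interface Hit {
  file: string;
  line: number; // 1-based line number
  ref: string;
}

function findStaleRefs(
  files: Record<string, string>,
  refs: readonly string[] = REMOVED_REFS,
): Hit[] {
  const hits: Hit[] = [];
  for (const [file, text] of Object.entries(files)) {
    text.split("\n").forEach((content, i) => {
      for (const ref of refs) {
        // Plain substring match, so "^0.76" needs no regex escaping.
        if (content.includes(ref)) hits.push({ file, line: i + 1, ref });
      }
    });
  }
  return hits;
}

// Illustrative before/after contents, not the repo's actual files.
const before = {
  "README.md": "Call optimizePrompt(...) to tune prompts.",
  "CLAUDE.md": "agent-eval pin: ^0.76.0",
};
const after = {
  "README.md": "Call selfImprove from @tangle-network/agent-eval/contract.",
  "CLAUDE.md": "agent-eval pin: ^0.83.0",
};

console.log(findStaleRefs(before).length); // 2
console.log(findStaleRefs(after).length); // 0
```

In practice a `git grep` over tracked files does the same job; the sketch just spells out what "0 remaining refs" checks.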