docs(bench): unify rollout/shot terminology + honestly scope the HumanEval gate by drewstone · Pull Request #170 · tangle-network/agent-runtime

drewstone · 2026-06-05T22:26:43Z

"Shot" was overloaded. The canon (HARNESS.md, roadmap-rsi.md) already has the right unit: a rollout is one agent running an AgentProfile to completion, which is a full and possibly multi-turn, stateful trajectory. k counts rollouts, and turns live inside a single rollout. The HumanEval gate, however, used "shot" for a raw, stateless single completion that bypasses AgentProfile and the runtime. That is exactly the equal-k-on-stateless-samples comparison the harness explicitly warns against.
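The rollout/turn distinction can be pinned down as types. This is a minimal TypeScript sketch with hypothetical shapes. Only the `maxTurns` and `harness` fields come from this PR. `Turn`, `Rollout`, `countK`, `countTurns`, and `isStatelessCompletion` are illustrative names, not the agent-runtime API.

```typescript
// Hypothetical shapes for illustration; the real AgentProfile in
// agent-runtime may differ. Only maxTurns and harness are named in the PR.
interface AgentProfile {
  name: string;
  maxTurns: number;       // 0 = the generator gets no chance to self-correct
  harness: string | null; // null = bypasses the runtime harness entirely
}

interface Turn {
  prompt: string;
  response: string;
}

// One rollout (≡ one shot) = one AgentProfile run to completion.
// Turns live inside the rollout; they never add to k.
interface Rollout {
  profile: AgentProfile;
  turns: Turn[];
}

// k counts rollouts, regardless of how many turns each one took.
function countK(rollouts: Rollout[]): number {
  return rollouts.length;
}

// Total turns is a separate quantity (cost), not k.
function countTurns(rollouts: Rollout[]): number {
  return rollouts.reduce((sum, r) => sum + r.turns.length, 0);
}

// A stateless completion is the degenerate rollout: no turn budget, no harness.
function isStatelessCompletion(profile: AgentProfile): boolean {
  return profile.maxTurns === 0 && profile.harness === null;
}
```

With one single-turn rollout and one three-turn rollout, `countK` is 2 while `countTurns` is 4: the extra turns are spent inside a rollout and do not inflate k.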

- HARNESS.md: adds a one-word Terminology block: rollout ≡ shot = one AgentProfile run, and a stateless completion (maxTurns=0, harness:null) is the degenerate case. It names humaneval-gate.mts as the no-self-correction selector LOWER BOUND, distinct from the rollout-based keystone gate.
- humaneval-gate.mts: adds a SCOPE note to the header and a runtime regime banner. Its numbers isolate the selector with the generator unable to self-correct (the selector's maximum leverage), so a win is the science (the selector works in a deployable-checker regime), not the product. Bridge: run the same arms as real rollouts (AgentProfile through runLoop, dialing maxTurns).
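The regime banner and the bridge can be sketched together. This is a hypothetical illustration: `ArmConfig`, `regimeOf`, `regimeBanner`, `bridgeArms`, the arm names, and the banner wording are assumptions. In the real gate each bridged config would be handed to `runLoop`, whose signature is not shown in this PR, so it is left out here.

```typescript
// Hypothetical config; not the actual humaneval-gate.mts types.
interface ArmConfig {
  arm: string;            // selector arm under test
  maxTurns: number;
  harness: string | null; // null = raw completion that bypasses the runtime
}

type Regime = "selector-lower-bound" | "rollout";

// The no-self-correction regime: stateless completion with no harness.
function regimeOf(cfg: ArmConfig): Regime {
  return cfg.maxTurns === 0 && cfg.harness === null
    ? "selector-lower-bound"
    : "rollout";
}

// One log line stating what the numbers mean (wording is illustrative).
function regimeBanner(cfg: ArmConfig): string {
  return regimeOf(cfg) === "selector-lower-bound"
    ? `[${cfg.arm}] REGIME: selector lower bound (generator cannot self-correct); not a rollout/product number`
    : `[${cfg.arm}] REGIME: rollout (maxTurns=${cfg.maxTurns}, harness=${cfg.harness})`;
}

// Bridge: the same arms, re-run as real rollouts while dialing maxTurns.
function bridgeArms(arms: string[], turnDial: number[], harness: string): ArmConfig[] {
  return arms.flatMap((arm) =>
    turnDial.map((maxTurns) => ({ arm, maxTurns, harness })),
  );
}
```

Keeping `maxTurns: 0` in the dial gives an in-runtime point to set against the raw-completion lower bound, so any gap between the two regimes can be attributed to the harness rather than to turn count.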

No code-path change; docs + comments + one log line. Merges clean.

🤖 Generated with Claude Code

… scope the HumanEval gate

A 'shot' was overloaded: the canon (HARNESS.md, roadmap-rsi.md) means rollout = one agent running an AgentProfile to completion (multi-turn allowed; k counts ROLLOUTS, turns live inside one). The HumanEval gate used 'shot' for a raw stateless single completion that bypasses AgentProfile/the runtime, exactly the 'equal-k-on-stateless-samples' the harness warns against.

- HARNESS.md: add a one-word Terminology block (rollout ≡ shot = one AgentProfile run; the stateless completion is the degenerate maxTurns=0 case; name the HumanEval probe as the no-self-correction selector LOWER BOUND, distinct from the rollout-based keystone gate).
- humaneval-gate.mts: SCOPE note in the header + a runtime regime banner. Its numbers are the selector lower bound (generator can't self-correct), not a rollout/product number; bridge = run the same arms as real rollouts (AgentProfile through runLoop, dial maxTurns).

tangletools · 2026-06-05T22:30:24Z

✅ No Blockers — `5f41aa3f`

Readiness 95/100 · Confidence 70/100 · 0 findings (none)

| | deepseek | glm | aggregate |
|---|---|---|---|
| Readiness | 95 | 95 | 95 |
| Confidence | 70 | 70 | 70 |
| Correctness | 95 | 95 | 95 |
| Security | 95 | 95 | 95 |
| Testing | 95 | 95 | 95 |
| Architecture | 95 | 95 | 95 |

Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision.

No findings.

_tangletools · 2026-06-05T22:30:21Z · trace_

tangletools

✅ Clean — `5f41aa3f`

Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision.

Full immutable report for this review: trace

Summary comment for this run: full summary

_tangletools · 2026-06-05T22:30:21Z · immutable trace_

tangletools approved these changes Jun 5, 2026


drewstone merged commit 3e92f5b into main Jun 5, 2026
1 check passed

drewstone mentioned this pull request Jun 6, 2026

refactor(improvement): collapse optimization API onto agent-eval selfImprove #172

Merged

drewstone merged 1 commit into main from docs/rollout-terminology
