tangle-network · drewstone · Jun 5, 2026 · Jun 5, 2026
diff --git a/bench/HARNESS.md b/bench/HARNESS.md
@@ -20,7 +20,20 @@ within-run adaptive-driver layer): **does any non-blind topology beat blind comp
 (non-oracle) selector, at significant n?** Gate A is a **narrow diagnostic** — the cost-justification
 for parallel/adaptive topology, **NOT** the product verdict. A failed Gate A deletes within-run
 steering only; it never touches the corpus+policy product (Gate B). The invariant is equal-COMPUTE,
-not equal-k-on-stateless-samples. Two things to keep straight: today's judges grade a single
+not equal-k-on-stateless-samples.
+
+**Terminology (one word, used consistently).** A **rollout** (≡ a "shot") is ONE agent running an
+`AgentProfile` to completion — a full, possibly **multi-turn / stateful** trajectory. `k` counts
+*rollouts*; **turns live *inside* a rollout**, never as separate shots. A single **stateless
+completion** (`maxTurns=0`, `harness: null`, one model call, no persistent workspace) is the
+*degenerate* rollout — fine as a selector **lower bound**, never the canonical unit. The HumanEval
+probe (`bench/src/humaneval-gate.mts`) uses exactly that degenerate shape — it calls the router
+directly and does **not** route through `AgentProfile` / the sandbox / the keystone — so its numbers
+are the **no-self-correction lower bound** on the selector, distinct from the rollout-based keystone
+gate above. To bridge it to the product, run the same arms with real rollouts (an `AgentProfile`
+through `runLoop`) and vary `maxTurns` upward from 0.
+
+Two things to keep straight: today's judges grade a single
 *correctness* scalar (the multi-objective vector is the open contract, architecture.md §6), and every
 number below is single-objective + within-run — read them as Gate-A diagnostics, not Gate-B results.
 - Within-run STEER (verify-and-revise family) **LOSES** (rung-0, n=40: blind 37.5% →

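The terminology added to `HARNESS.md` above can be sketched as types. This is only an illustration under stated assumptions: `RolloutSpec`, `Rollout`, `isDegenerate`, and `countShots` are hypothetical names, not the repo's `AgentProfile` API. Only `maxTurns` and `harness: null` come from the text.

```typescript
// Hypothetical shapes for illustration; the real AgentProfile type is not shown in this diff.
interface RolloutSpec {
  maxTurns: number // 0 = a single model call with no follow-up turns
  harness: string | null // null = no sandbox or persistent workspace
}

interface Rollout {
  spec: RolloutSpec
  turnsUsed: number // turns live inside the rollout, never as separate shots
}

// A stateless completion is the degenerate rollout: no turns and no harness.
function isDegenerate(spec: RolloutSpec): boolean {
  return spec.maxTurns === 0 && spec.harness === null
}

// k counts rollouts, not turns: a 5-turn rollout is still one shot.
function countShots(rollouts: Rollout[]): number {
  return rollouts.length
}

const stateless: Rollout = { spec: { maxTurns: 0, harness: null }, turnsUsed: 0 }
const stateful: Rollout = { spec: { maxTurns: 8, harness: 'sandbox' }, turnsUsed: 5 }

console.log(isDegenerate(stateless.spec)) // true
console.log(isDegenerate(stateful.spec)) // false
console.log(countShots([stateless, stateful])) // 2
```

The point of the sketch is the accounting rule: `k` is the length of the rollout list, so comparing a stateless arm with a multi-turn arm at the same `k` is not an equal-compute comparison.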
diff --git a/bench/src/humaneval-gate.mts b/bench/src/humaneval-gate.mts
@@ -10,9 +10,19 @@
  * keeps a passer. This file asks: at EQUAL k, does diverse@k + a deployable
  * verifier-grounded pick beat random@k + the same pick, and beat blind@1?
  *
- * Two paired arms over the SAME tasks:
- *   random@K  — K identical-base-prompt shots/task (the compute control)
- *   diverse@K — K shots, the i-th prefixed with composeStrategies(base, K)[i]
+ * SCOPE: read the pass rates as a LOWER BOUND. Here a "shot" is a single STATELESS
+ * completion (one router call, `maxTurns=0`, NO `AgentProfile` / sandbox / keystone;
+ * it calls the router directly). That is the *degenerate* rollout (HARNESS.md's
+ * "Terminology"): it isolates the SELECTOR with the generator unable to self-correct,
+ * so the selector's marginal value is at its MAXIMUM while absolute pass rates are at
+ * their floor. A real rollout (an `AgentProfile` through `runLoop`, `maxTurns>0` over
+ * a persistent workspace) self-verifies by iterating, which shrinks the external
+ * selector's job; that is the next experiment, not this one. A positive result here
+ * is the science (the selector works in a deployable-checker regime), not the product.
+ *
+ * Two paired arms over the SAME tasks (each "shot" = one stateless completion):
+ *   random@K  — K identical-base-prompt completions/task (the compute control)
+ *   diverse@K — K completions, the i-th prefixed with composeStrategies(base, K)[i]
  *
  * The DEPLOYABLE CHECKER runs each candidate against the task's own `test` in an
  * isolated `--network=none` python:3.12-slim container (hard timeout) — exit 0 = pass.
@@ -266,6 +276,9 @@ async function main(): Promise<void> {
 
   console.log(`=== HumanEval deployable-verifier gate · N=${n} K=${k} offset=${offset} model=${model} ===`)
   console.log(`  router=${routerBaseUrl}  docker=${dockerImage} (--network=none, timeout ${dockerTimeoutMs}ms)`)
+  console.log(
+    '  regime: STATELESS single completions (maxTurns=0, no AgentProfile/sandbox) — the selector no-self-correction LOWER BOUND, not a rollout/product number',
+  )
 
   const tasks = await loadHumanEval(n, offset)
   console.log(`loaded ${tasks.length} HumanEval task(s): ${tasks.map((t) => t.taskId).join(', ')}`)
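The paired-arm design described in the header comment of `humaneval-gate.mts` can be sketched roughly as follows. Every name here (`strategyPrompts`, `pick`, `runArm`, `compareArms`) is a hypothetical stand-in, not a function from the file. The pick rule (keep the first candidate the checker accepts, else fall back to the first candidate) is an assumption, and the generator and checker are toy stubs rigged to show the mechanics, not measured behavior.

```typescript
type Generate = (prompt: string) => string // one stateless completion
type Check = (candidate: string) => boolean // deployable checker: true = exit 0

// Stand-in for composeStrategies(base, K): K prompts, the i-th carrying the i-th prefix.
function strategyPrompts(base: string, k: number, prefixes: string[]): string[] {
  return Array.from({ length: k }, (_, i) => `${prefixes[i % prefixes.length]}\n${base}`)
}

// Assumed pick rule: first candidate the checker accepts, else the first candidate.
function pick(candidates: string[], check: Check): string {
  const passer = candidates.find(check)
  if (passer !== undefined) return passer
  const [first = ''] = candidates
  return first
}

// One arm on one task: generate one completion per prompt, pick, then grade the pick.
function runArm(prompts: string[], generate: Generate, check: Check): boolean {
  return check(pick(prompts.map(generate), check))
}

function compareArms(base: string, k: number, prefixes: string[], generate: Generate, check: Check) {
  const blind1 = check(generate(base)) // blind@1: a single completion
  const random = runArm(Array.from({ length: k }, () => base), generate, check) // compute control
  const diverse = runArm(strategyPrompts(base, k, prefixes), generate, check)
  return { blind1, random, diverse }
}

// Toy stubs (not real data): this generator only succeeds when nudged toward edge cases.
const toyGenerate: Generate = (p) => (p.includes('edge cases') ? 'PASS' : 'FAIL')
const toyCheck: Check = (c) => c === 'PASS'

const result = compareArms(
  'Write is_prime(n).',
  2,
  ['Think step by step.', 'Consider edge cases first.'],
  toyGenerate,
  toyCheck,
)
console.log(result) // { blind1: false, random: false, diverse: true }
```

Both arms spend exactly `k` generator calls per task and share the same pick, so any gap between `random` and `diverse` is attributable to the prompts rather than to extra compute or a better selector.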