Release v2.4.6 — DEFCON Special · TSchonleber/brainctl

Top-heavy retrieval lift (plan-20260419) shipped end-to-end across a 6-way swarm (codex + claude-code).

Headline numbers

| Metric | Baseline → 2.4.6 | Plan target | Status |
|---|---|---|---|
| LoCoMo hybrid Hit@1 | 0.023 → 0.279 | +1.0pp | +25.5pp, crushed |
| LoCoMo hybrid MRR | 0.032 → 0.394 | +0.5pp | +36.2pp, crushed |
| LongMemEval Hit@1 | 0.882 → 0.869 | +0.8pp | -1.3pp, flat within noise at n=289; FULL beats ROLLBACK by +62.3pp like-for-like |
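For readers comparing the rows: Hit@1 is the fraction of questions whose top-ranked result is a gold item, and MRR averages 1/rank of the first gold item (0 when none is retrieved). Deltas are absolute differences in percentage points. Below is a minimal sketch of both metrics, assuming each query yields a ranked list of IDs plus a set of gold IDs; the function names are illustrative and are not brainctl's bench harness.

```python
from typing import Iterable, Sequence, Set, Tuple

def hit_at_1(ranked: Sequence[str], gold: Set[str]) -> float:
    """1.0 if the top-ranked ID is a gold ID, else 0.0."""
    return 1.0 if ranked and ranked[0] in gold else 0.0

def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1/rank of the first gold ID (ranks are 1-indexed), or 0.0 on a miss."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in gold:
            return 1.0 / rank
    return 0.0

def evaluate(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> Tuple[float, float]:
    """Mean Hit@1 and mean reciprocal rank (MRR) over (ranked, gold) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0, 0.0
    hit = sum(hit_at_1(r, g) for r, g in runs) / len(runs)
    mrr = sum(reciprocal_rank(r, g) for r, g in runs) / len(runs)
    return hit, mrr

def delta_pp(before: float, after: float) -> float:
    """Absolute change between two rates, in percentage points."""
    return round((after - before) * 100, 1)

if __name__ == "__main__":
    runs = [
        (["a", "b", "c"], {"a"}),  # gold at rank 1
        (["x", "y", "z"], {"z"}),  # gold at rank 3
        (["p", "q"], {"r"}),       # miss
    ]
    # Hit@1 = 1/3, MRR = (1 + 1/3 + 0) / 3 = 4/9
    print(evaluate(runs))
    print(delta_pp(0.032, 0.394))  # MRR row of the table
```

MRR rewards a gold item at rank 2 or 3 while Hit@1 does not, which is why the two LoCoMo rows can move by different amounts.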

I2/I3/I4 — unified Brain.search + cmd_search pipeline, regex intent router, last-mile cross-encoder (CE) reranker gated by BRAINCTL_CE_P95_BUDGET_MS
I6/I7 — BRAINCTL_TOPHEAVY_ROLLBACK=1 emergency bypass, docs refresh
I1 — frozen benchmarks/snapshots/baseline-20260419/ + --traces flag
I5 — benchmarks/snapshots/calibration-20260419/ 3-cell ablation + BRAINCTL_DISABLE_INTENT_ROUTER=1 ablation bypass
I8 — strict retrieval-gate CI job, per-slice Hit@1/MRR/nDCG@5 gates, cross-platform-aware p95 latency gate, PR-comment matrix
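To make the control surface concrete, here is a minimal sketch of how the flags named above could gate the stages. BRAINCTL_TOPHEAVY_ROLLBACK=1 bypasses the top-heavy path entirely, BRAINCTL_DISABLE_INTENT_ROUTER=1 skips only the router (the I5 ablation), and the CE rerank runs only while its measured p95 stays within BRAINCTL_CE_P95_BUDGET_MS. The env-var names come from these notes; the regex patterns, intent labels, stage names (including `legacy_search`), and the exact budget semantics are hypothetical and are not brainctl's actual implementation.

```python
import re
from typing import Dict, List, Mapping, Optional

# Hypothetical intent patterns; brainctl's real router rules are not in these notes.
INTENT_PATTERNS: Dict[str, re.Pattern] = {
    "temporal": re.compile(r"\b(when|what (day|date|year)|before|after)\b", re.I),
    "entity": re.compile(r"\b(who|whose|whom)\b", re.I),
}

def route_intent(query: str) -> str:
    """Return the first matching intent label, falling back to 'general'."""
    for label, pattern in INTENT_PATTERNS.items():
        if pattern.search(query):
            return label
    return "general"

def flag_on(env: Mapping[str, str], name: str) -> bool:
    """Treat only the exact value '1' as enabled."""
    return env.get(name, "") == "1"

def plan_stages(query: str, env: Mapping[str, str],
                ce_p95_ms: Optional[float]) -> List[str]:
    """Sketch of which stages would run for one query under the env flags."""
    if flag_on(env, "BRAINCTL_TOPHEAVY_ROLLBACK"):
        return ["legacy_search"]  # emergency bypass: no top-heavy stages at all
    stages = ["hybrid_retrieve"]
    if not flag_on(env, "BRAINCTL_DISABLE_INTENT_ROUTER"):
        stages.append("intent:" + route_intent(query))
    budget = env.get("BRAINCTL_CE_P95_BUDGET_MS")
    # Assumed semantics: rerank when no budget is set, no p95 has been measured
    # yet, or the measured p95 is within the budget.
    if budget is None or ce_p95_ms is None or ce_p95_ms <= float(budget):
        stages.append("ce_rerank")
    return stages
```

For example, `plan_stages("When did Alice move to Berlin?", {}, None)` yields retrieval, a `temporal` intent, and the CE rerank, while setting the rollback flag collapses the plan to the single bypass stage.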

init_schema.sql synced with migration 051 (code_ingest_cache), so fresh installs no longer need a post-init migrate
test_connection_lifecycle widened to filter by brain.py callsite (the single-connection invariant still holds after unification)
Cross-platform latency-gate skip for subprocess-bound CLI ops (a darwin baseline compared against a fresh ubuntu-latest run no longer false-positives)
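The I8 per-slice gates and the cross-platform skip can be pictured together: every benchmark slice must stay at or above a floor for each gated metric, and the latency leg for a subprocess-bound CLI op is skipped, not failed, when the baseline was recorded on a different OS than the CI runner. This is a minimal sketch; the data shapes, slice names, metric keys, and the skip rule as written are assumptions, not the actual CI job.

```python
from typing import Dict, List, Optional

# Hypothetical shape: {slice_name: {metric_name: value}}, with metric names such
# as "hit@1", "mrr", "ndcg@5". Slice names used below are illustrative only.
Metrics = Dict[str, Dict[str, float]]

def check_slices(observed: Metrics, floors: Metrics) -> List[str]:
    """Return one message per (slice, metric) that is missing or below its floor."""
    failures: List[str] = []
    for slice_name, slice_floors in floors.items():
        values = observed.get(slice_name, {})
        for metric, floor in slice_floors.items():
            value: Optional[float] = values.get(metric)
            if value is None or value < floor:
                failures.append(f"{slice_name}:{metric}={value} < floor {floor}")
    return failures

def skip_latency_gate(baseline_platform: str, runner_platform: str,
                      subprocess_bound: bool) -> bool:
    """Assumed rule: skip when a subprocess-bound op's baseline came from a
    different OS than the runner, since process-spawn cost is not comparable."""
    return subprocess_bound and baseline_platform != runner_platform
```

A driver built this way would fail the job whenever `check_slices` returns anything, and would report the latency leg as skipped whenever `skip_latency_gate` is true; treating a missing metric as a failure keeps a silently dropped slice from passing.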

pip install --upgrade brainctl==2.4.6

No runtime config change needed — main defaults already enable the top-heavy controls.

Wire args.rerank through tests/bench/{locomo,longmemeval}_eval.py so the CE dimension is measurable
Per-question_type slice analysis of the intent router (FULL == NO_INTENT on aggregate metrics)
Fix the p95 parse in the I5 driver's _extract_metrics and populate baseline_p95_ms in tests/bench/budgets/*.yaml, so the p95 leg flips from advisory to enforcing
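This follow-up ties two pieces together: the driver has to produce a p95 value, and the budget file has to carry a baseline_p95_ms, before a latency regression can fail the job. Below is a minimal sketch, assuming a nearest-rank p95 and a budget mapping that has already been loaded from YAML (loading is omitted to stay stdlib-only). The function names, the 1.25 tolerance, and the verdict strings are hypothetical.

```python
import math
from typing import Mapping, Optional, Sequence

def p95_nearest_rank(latencies_ms: Sequence[float]) -> Optional[float]:
    """Nearest-rank 95th percentile; None when there are no samples."""
    if not latencies_ms:
        return None
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-indexed rank
    return ordered[rank - 1]

def p95_verdict(latencies_ms: Sequence[float], budget: Mapping[str, float],
                tolerance: float = 1.25) -> str:
    """'advisory' until baseline_p95_ms is populated, then 'pass' or 'fail'."""
    observed = p95_nearest_rank(latencies_ms)
    baseline = budget.get("baseline_p95_ms")
    if baseline is None or observed is None:
        return "advisory"  # nothing to compare against: report, don't block
    return "pass" if observed <= baseline * tolerance else "fail"
```

With 100 samples of 1..100 ms the p95 is 95 ms, so an empty budget stays advisory, a baseline of 80 ms passes (limit 100 ms), and a baseline of 70 ms fails (limit 87.5 ms).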