v2.4.6 — DEFCON Special
Top-heavy retrieval lift (plan-20260419) shipped end-to-end across a 6-way swarm (codex + claude-code).
Headline numbers
| Metric | Baseline → 2.4.6 | Plan target | Status |
|---|---|---|---|
| LoCoMo hybrid Hit@1 | 0.023 → 0.279 | +1.0pp | +25.5pp — crushed |
| LoCoMo hybrid MRR | 0.032 → 0.394 | +0.5pp | +36.2pp — crushed |
| LongMemEval Hit@1 | 0.882 → 0.869 | +0.8pp | flat within noise on n=289; FULL beats ROLLBACK +62.3pp like-for-like |
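
For readers unfamiliar with the metrics in the table, here is a minimal sketch of how Hit@1 and MRR are conventionally computed from ranked results. It uses the textbook definitions on toy data; the function names and the sample queries are illustrative assumptions, not the code of the benchmark harness.

```python
from typing import Iterable, Sequence, Set


def hit_at_1(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """Return 1.0 if the top-ranked result is relevant, else 0.0."""
    return 1.0 if ranked_ids and ranked_ids[0] in relevant else 0.0


def reciprocal_rank(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean(values: Iterable[float]) -> float:
    """Arithmetic mean that tolerates an empty input."""
    items = list(values)
    return sum(items) / len(items) if items else 0.0


# Toy data (hypothetical): (ranked result ids, set of relevant ids) per query.
queries = [
    (["a", "b", "c"], {"a"}),  # relevant item at rank 1
    (["x", "y", "z"], {"y"}),  # relevant item at rank 2
    (["p", "q", "r"], {"s"}),  # relevant item never retrieved
]

hit1 = mean(hit_at_1(ranked, rel) for ranked, rel in queries)
mrr = mean(reciprocal_rank(ranked, rel) for ranked, rel in queries)
print(f"Hit@1={hit1:.3f} MRR={mrr:.3f}")
```

Both metrics lie in [0, 1], so a lift quoted in percentage points (pp) is the difference between the two values multiplied by 100.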
What landed
- I2/I3/I4: unified `Brain.search` + `cmd_search` pipeline, regex intent router, last-mile CE reranker with `BRAINCTL_CE_P95_BUDGET_MS` gate
- I6/I7: `BRAINCTL_TOPHEAVY_ROLLBACK=1` emergency bypass, docs refresh
- I1: frozen `benchmarks/snapshots/baseline-20260419/` + `--traces` flag
- I5: `benchmarks/snapshots/calibration-20260419/` 3-cell ablation + `BRAINCTL_DISABLE_INTENT_ROUTER=1` ablation bypass
- I8: strict `retrieval-gate` CI job, per-slice Hit@1/MRR/nDCG@5 gates, cross-platform-aware p95 latency gate, PR-comment matrix
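
To illustrate the idea behind a p95 budget gate on a last-mile reranker, here is a minimal sketch. It is an assumption about how such a gate could behave, not brainctl's implementation: only the environment variable name `BRAINCTL_CE_P95_BUDGET_MS` comes from these notes, while `p95`, `maybe_rerank`, the nearest-rank percentile method, and the 200 ms default are hypothetical.

```python
import os
from typing import Callable, List, Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of latency samples (0.0 when empty)."""
    if not samples_ms:
        return 0.0
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) in integer arithmetic to avoid float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def maybe_rerank(
    candidates: Sequence[str],
    rerank: Callable[[Sequence[str]], List[str]],
    recent_latencies_ms: Sequence[float],
    default_budget_ms: float = 200.0,  # hypothetical default
) -> List[str]:
    """Apply the reranker only while its observed p95 latency fits the budget."""
    budget_ms = float(os.environ.get("BRAINCTL_CE_P95_BUDGET_MS", default_budget_ms))
    if p95(recent_latencies_ms) > budget_ms:
        # Over budget: keep the first-stage ordering untouched.
        return list(candidates)
    return rerank(candidates)
```

A caller would record recent reranker latencies and pass them in on each search, so the expensive stage switches itself off when it becomes too slow.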
Fixes
- `init_schema.sql` synced for migration 051 `code_ingest_cache` (fresh installs no longer need post-init migrate)
- `test_connection_lifecycle` widened to filter by `brain.py` callsite (single-conn invariant holds post-unification)
- Cross-platform latency-gate skip for subprocess-bound CLI ops (darwin baseline vs ubuntu-latest fresh no longer false-positives)
Upgrade
pip install --upgrade brainctl==2.4.6
No runtime config change needed — main defaults already enable the top-heavy controls.
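
After upgrading, one way to confirm which version is installed is to query the package metadata from Python. This is a small sketch using the standard library's `importlib.metadata`; it assumes the distribution name matches the `pip` name above, and the helper `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("brainctl")
if found is None:
    print("brainctl is not installed in this environment")
elif found != "2.4.6":
    print(f"brainctl {found} is installed; expected 2.4.6")
else:
    print("brainctl 2.4.6 is installed")
```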
Follow-ups (not blocking)
- Wire `args.rerank` through `tests/bench/{locomo,longmemeval}_eval.py` so CE dimension is measurable
- Per-`question_type` slice analysis of the intent router (FULL == NO_INTENT on aggregate metrics)
- Fix I5 driver `_extract_metrics` p95 parse, populate `baseline_p95_ms:` in `tests/bench/budgets/*.yaml` so the p95 leg flips from advisory to enforcing
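
The last follow-up describes a gate that stays advisory until a baseline is recorded. A minimal sketch of that pattern follows, under stated assumptions: the `baseline_p95_ms` key comes from these notes, but the function `p95_gate`, the 20% tolerance, and the return shape are hypothetical and do not describe the actual CI job.

```python
from typing import Optional, Tuple


def p95_gate(
    observed_p95_ms: float,
    baseline_p95_ms: Optional[float],
    tolerance: float = 0.20,  # hypothetical allowed regression over baseline
) -> Tuple[str, bool]:
    """Return (mode, passed) for a p95 latency check.

    Without a recorded baseline the check can only report, so it is
    advisory and always passes. With a baseline it becomes enforcing.
    """
    if baseline_p95_ms is None:
        return ("advisory", True)
    limit_ms = baseline_p95_ms * (1.0 + tolerance)
    return ("enforcing", observed_p95_ms <= limit_ms)


print(p95_gate(300.0, None))   # no baseline yet: advisory
print(p95_gate(300.0, 200.0))  # baseline present: enforcing
```

Populating the baseline value in the budget files is what would move a check like this from the first branch to the second.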
Full changelog
See CHANGELOG.md.