v2.4.12
Fifth slice of the 2026-04-19 audit fix wave. Single-focus release
closing I23 — the CE cold-start issue that caused the recurring
latency-gate flake in CI.
Fixed
- Cross-encoder warmup no longer poisons the rolling p95 latency
window. The first _ce_rerank_timed call includes model loading
(typically 15–40s for bge-reranker-v2-m3). Before 2.4.12 that
first sample went straight into _CE_LATENCY_SAMPLES_MS and stayed
there for the full 64-call deque rotation; the strict latency
fallback at the call site then silently skipped CE for the next
minute+ of queries without a clear operator signal beyond the
missing cross_encoder_applied trace. Now the first
_CE_WARMUP_SAMPLES calls (default 1; env BRAINCTL_CE_WARMUP_SAMPLES
to override) are tracked separately, excluded from the p95 window,
and surfaced as cross_encoder_warmup_ms in _debug_skips so the
cost is still observable. This is what was causing the recurring
latency-gate flake in CI over the prior release chain.
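
For readers who want the shape of the fix without opening the diff, here is a minimal sketch of warmup exclusion on a rolling latency window. It is an illustration, not the shipped code: the class name WarmupAwareLatencyWindow, its methods, and the nearest-rank p95 calculation are assumptions made for this example. Only the BRAINCTL_CE_WARMUP_SAMPLES variable, its default of 1, and the 64-sample window come from the note above.

```python
import math
import os
from collections import deque

WINDOW_SIZE = 64  # deque length mentioned in the release note


class WarmupAwareLatencyWindow:
    """Hypothetical helper: rolling window that sets aside the first N samples."""

    def __init__(self, warmup_samples=None, window_size=WINDOW_SIZE):
        if warmup_samples is None:
            # Env override with a default of 1, as described in the note.
            warmup_samples = int(os.environ.get("BRAINCTL_CE_WARMUP_SAMPLES", "1"))
        self.warmup_remaining = max(0, warmup_samples)
        self.warmup_ms = []  # kept only so the cold-start cost stays observable
        self.samples_ms = deque(maxlen=window_size)

    def record(self, elapsed_ms):
        """Store one timing; return True if it entered the p95 window."""
        if self.warmup_remaining > 0:
            self.warmup_remaining -= 1
            self.warmup_ms.append(elapsed_ms)
            return False
        self.samples_ms.append(elapsed_ms)
        return True

    def p95(self):
        """Nearest-rank 95th percentile of the window, or None if empty."""
        if not self.samples_ms:
            return None
        ordered = sorted(self.samples_ms)
        rank = math.ceil(0.95 * len(ordered))
        return ordered[rank - 1]


# A 30 s cold start followed by ten fast calls.
window = WarmupAwareLatencyWindow(warmup_samples=1)
window.record(30_000.0)  # model load: tracked separately, not in the window
for _ in range(10):
    window.record(50.0)

# Same traffic with the opt-out (0 warmup samples): the outlier lands in the window.
poisoned = WarmupAwareLatencyWindow(warmup_samples=0)
poisoned.record(30_000.0)
for _ in range(10):
    poisoned.record(50.0)
```

Under the nearest-rank method assumed here, a single outlier is the p95 whenever the window holds fewer than 20 samples, so a short CI run is exactly where a cold-start sample would trip a strict latency gate.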
Testing
1878 passed, 28 skipped, 2 xfailed locally. Two new regression
tests in tests/test_ce_warmup_burn_in.py lock the warmup-exclusion
behavior and the BRAINCTL_CE_WARMUP_SAMPLES=0 opt-out path.