v2.4.12

@TSchonleber TSchonleber released this 21 Apr 02:54
· 96 commits to main since this release
1f3bb5c

Fifth slice of the 2026-04-19 audit fix wave. Single-focus release
closing I23, the cross-encoder (CE) cold-start issue that caused the
recurring latency-gate flake in CI.

Fixed

  • Cross-encoder warmup no longer poisons the rolling p95 latency
    window.
    The first _ce_rerank_timed call includes model loading
    (typically 15–40s for bge-reranker-v2-m3). Before 2.4.12 that
    first sample went straight into _CE_LATENCY_SAMPLES_MS and stayed
    there for the full 64-call deque rotation. The strict latency
    fallback at the call site then silently skipped CE for the next
    minute or more of queries, and the only operator signal was the
    missing cross_encoder_applied trace. Now the first
    _CE_WARMUP_SAMPLES calls (default 1; set the env var
    BRAINCTL_CE_WARMUP_SAMPLES to override) are tracked separately,
    excluded from the p95 window, and surfaced as
    cross_encoder_warmup_ms in _debug_skips so the cost is still
    observable. This cold-start sample was the cause of the recurring
    latency-gate flake in CI over the prior release chain.
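
The warmup exclusion described above can be illustrated with a minimal sketch. This is not the project's actual code: the LatencyWindow class, its record and p95 methods, and the nearest-rank percentile are hypothetical stand-ins chosen for illustration. Only the 64-sample window, the default of one warmup sample, and the BRAINCTL_CE_WARMUP_SAMPLES variable come from the notes above.

```python
import math
import os
from collections import deque

WINDOW = 64  # window length taken from the release notes


class LatencyWindow:
    """Hypothetical rolling latency window that sets warmup calls aside."""

    def __init__(self, warmup_samples=None, maxlen=WINDOW):
        if warmup_samples is None:
            # Env override named in the release notes; default is 1.
            warmup_samples = int(os.environ.get("BRAINCTL_CE_WARMUP_SAMPLES", "1"))
        self.warmup_samples = max(0, warmup_samples)
        self.warmup_ms = []                      # observable, but not in the p95
        self.samples_ms = deque(maxlen=maxlen)   # steady-state samples only
        self._calls = 0

    def record(self, elapsed_ms):
        """Route the first `warmup_samples` calls away from the p95 window."""
        self._calls += 1
        if self._calls <= self.warmup_samples:
            self.warmup_ms.append(elapsed_ms)
        else:
            self.samples_ms.append(elapsed_ms)

    def p95(self):
        """Nearest-rank p95 over the window (an assumed percentile method)."""
        if not self.samples_ms:
            return None
        ordered = sorted(self.samples_ms)
        rank = math.ceil(0.95 * len(ordered))
        return ordered[rank - 1]


# A 20 s cold start followed by ten 50 ms calls.
window = LatencyWindow(warmup_samples=1)
window.record(20_000.0)
for _ in range(10):
    window.record(50.0)
print(window.p95(), window.warmup_ms)
```

With warmup_samples=0 (the opt-out path) the cold-start sample lands in the window, and in this sketch it dominates the p95 for as long as the window holds fewer than 20 samples.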

Testing

1878 passed, 28 skipped, 2 xfailed locally. Two new regression
tests in tests/test_ce_warmup_burn_in.py lock in the
warmup-exclusion behavior and the BRAINCTL_CE_WARMUP_SAMPLES=0
opt-out path.