feat(gain): release-boundary slicing for --weak-filters#150
Merged
Conversation
`gain --weak-filters` reported lifetime stats by default — months of
behaviour from pre-upgrade binaries blended with current binary's
filter quality. Hard to read post-release whether a filter improved.
Add a self-correcting boundary mechanism:
1. New table `release_boundaries(id, version, installed_at)` populated
once per binary upgrade. On every `Tracker::new()`, compare
`env!("CARGO_PKG_VERSION")` to the latest row; insert a new row on
change. Cost: 1 SELECT per invocation, 1 INSERT once per upgrade.
Best-effort — boundary write failures never poison tracker
construction.
2. `get_weak_filters(project, since)` gains a `since` parameter that
filters command rows lexicographically by ISO-8601 timestamp.
`None` = no slice (lifetime).
3. `gain --weak-filters` defaults to slicing from
`latest_boundary_timestamp()`. New `--all-time` flag bypasses the
slice for cross-version analysis. Window banner makes the cutoff
explicit so the reader knows what they're looking at.
Live behaviour on v0.1.10 (post-install, 3 commands):
Window: since latest release boundary (2026-05-23T06:52:43Z).
1. ls 1 run 2.6M in 1.6M leaked 39.0%
2. git show 1 run 1.5K in 790 leaked 47.7%
3. git status 1 run 138 in 86 leaked 37.7%
vs --all-time (lifetime):
1. ls 1634 runs 29.8M 18.0M leaked 39.5%
2. proxy cargo 172 4.5M 4.5M leaked 0.0%
…
This makes the post-release audit honest: the user is identifying
gaps in the CURRENT binary, not gaps that have been historically
present and may already be fixed.
Tests (+3, all green):
- ensure_release_boundary_writes_first_row_on_fresh_db
- ensure_release_boundary_is_idempotent_on_same_version
- get_weak_filters_since_excludes_older_rows
2671/2671 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both reviewers returned YELLOW with the same top finding (TOCTOU race
on ensure_release_boundary). Closing all medium-priority items in one
follow-up.
1. ATOMIC INSERT (Codex + agy, the blocker):
Previous version did SELECT-latest then INSERT, with no transaction.
Two concurrent contextcrawler processes hitting a fresh-upgrade DB
could both observe the old latest-version and both write a duplicate
boundary row.
Fix: collapse to single atomic INSERT ... SELECT ... WHERE NOT
EXISTS. The WHERE clause evaluates against the LATEST row only
(id = MAX(id)), so a downgrade-then-upgrade scenario still appends
a new boundary row correctly (rejected only when version equals the
most-recently-written one).
2. TEST TIMESTAMP FORMAT (agy):
Production uses `Utc::now().to_rfc3339()` which yields `+00:00`
suffix. Existing tests used `Z` suffix — lexicographic comparison
safe within each fixture but would diverge against real DB data.
Aligned test timestamps to `+00:00`.
3. UX HINT FOR FIRST-RUN-ON-UPGRADE (agy):
On a pre-existing DB with command history, the first invocation
of a new binary writes a boundary row that slices out ALL history
by default. User sees an empty leaderboard with no indication
that --all-time would show data.
Fix: when the sliced query returns 0 rows AND the lifetime query
has rows, print a hint pointing at --all-time with the historical
run count.
4. THREE NEW TESTS (agy):
- latest_boundary_timestamp_is_none_on_empty_table
- ensure_release_boundary_appends_row_on_version_change
- ensure_release_boundary_handles_concurrent_safely (tight-loop
same-connection approximation of the original race)
Verification: 2674/2674 pass (+3 from this commit).
The silent-failure design at the `let _ = ensure_release_boundary()`
call site stays as-is per agy's reasoning: contextcrawler is a
transparent proxy and stderr warnings would pollute build/pipe
consumers. Codex's "log a warning" was a soft recommendation, not a
blocker.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-2 peer-review (Codex + agy) cleared the atomic INSERT race in production but both flagged that the regression test was a single- connection tight-loop — name suggested concurrency but the body did not exercise SQLite's write-locking behaviour across processes. Add a true multi-thread test: - Pin a private file-backed DB via the existing PinnedDb RAII helper. - Spawn 8 worker threads, each calling `Tracker::new()` independently. - Every Tracker::new() opens its own SQLite Connection AND runs `ensure_release_boundary()` — exactly the production race shape (two contextcrawler processes starting concurrently after a binary upgrade share the same DB path). - Assert exactly 1 row in release_boundaries after all workers join. This empirically pins the atomic INSERT...SELECT...WHERE NOT EXISTS guarantee against SQLite's actual lock arbitration, not just the Rust-side check that the previous test approximated. Also renamed the prior test from ensure_release_boundary_handles_concurrent_safely to ensure_release_boundary_is_idempotent_on_tight_loop because the old name was misleading — same-connection idempotency is a different invariant from cross-connection concurrency safety. Both are now pinned by distinct tests. Verification: 2675/2675 pass (+1 new test, total +5 in PR #150). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two pre-existing tracking-module issues surfaced by the new multi- thread boundary test in PR #150. Neither is a regression introduced by this PR, but both directly motivated by what the new test now exercises — so closing them in-PR rather than as a separate follow-up. 1. PRAGMA ordering (medium): Tracker::new() was running `PRAGMA journal_mode=WAL` BEFORE `PRAGMA busy_timeout=5000`. The WAL switch needs an exclusive lock and returns SQLITE_BUSY immediately if a peer holds the DB; without busy_timeout already armed, the connection then runs in non-WAL mode with no retry budget — write contention is non-deterministically fatal. Swapped to: PRAGMA busy_timeout=5000; PRAGMA journal_mode=WAL; Now any peer-lock encountered during the WAL upgrade is waited out for up to 5s. Eliminates a flakiness vector that the new multi-thread test could trigger under high CPU contention. 2. PinnedDb sidecar cleanup (low): PinnedDb::drop removed the .db file but not the WAL `-wal` and `-shm` sidecars. In WAL mode these are created on first write and persist if a worker thread panics or the connection closes abnormally — they pollute $HOME across runs. Added cleanup for both suffixes. Deferred to a follow-up issue (agy's third low): Worker threads inside `_no_duplicates_under_real_concurrency` read RTK_DB_PATH via std::env::var() concurrently. Other tests in the suite that don't hold ENV_LOCK can mutate RTK_DB_PATH at the same time — a process-wide concern that needs a broader audit of which tests touch the env, not a localised fix. Verification: 2675/2675 pass, full suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 24, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
contextcrawler gain --weak-filtersreported lifetime stats by default. Months of pre-upgrade binary behaviour got blended with current-binary filter quality. After a release it was unclear which gaps were still real and which had been closed.Exactly hit during the v0.1.10 post-ship audit today.
Fix
Self-correcting boundary tracking:
New table
release_boundaries(id, version, installed_at). On everyTracker::new(), compareenv!(\"CARGO_PKG_VERSION\")against the latest stored version. Insert a boundary row on mismatch.SELECTper invocation, 1INSERTonce per upgrade.get_weak_filters(project, since)gains an optional ISO-8601sinceparameter.None= no slice (lifetime).gain --weak-filtersdefaults to slicing fromlatest_boundary_timestamp().--all-timebypasses for cross-version analysis. Window banner makes the cutoff explicit.Verification
Same gaps surface immediately, without 17,000 historical rows muddying the picture.
Tests
3 new tests, 2671/2671 pass:
ensure_release_boundary_writes_first_row_on_fresh_dbensure_release_boundary_is_idempotent_on_same_versionget_weak_filters_since_excludes_older_rowsRisk
CREATE TABLE IF NOT EXISTS— safe on existing DBs.--all-time.gainview.🤖 Generated with Claude Code