feat(gain): release-boundary slicing for --weak-filters by thehoff · Pull Request #150 · thehoff/contextcrawler

thehoff · 2026-05-23T06:53:48Z

Problem

contextcrawler gain --weak-filters reported lifetime stats by default. Months of pre-upgrade binary behaviour got blended with current-binary filter quality. After a release it was unclear which gaps were still real and which had been closed.

Exactly hit during the v0.1.10 post-ship audit today.

Fix

Self-correcting boundary tracking:

New table release_boundaries(id, version, installed_at). On every Tracker::new(), compare env!(\"CARGO_PKG_VERSION\") against the latest stored version. Insert a boundary row on mismatch.
- Cost: 1 SELECT per invocation, 1 INSERT once per upgrade.
- Best-effort — boundary write failure never poisons tracker construction.
get_weak_filters(project, since) gains an optional ISO-8601 since parameter. None = no slice (lifetime).
gain --weak-filters defaults to slicing from latest_boundary_timestamp(). --all-time bypasses for cross-version analysis. Window banner makes the cutoff explicit.

Verification

$ contextcrawler gain --weak-filters
Window: since latest release boundary (2026-05-23T06:52:43Z).
  #   Tool        Runs   Input    Leaked   Savings
  1.  ls            1    2.6M    1.6M     39.0%
  2.  git show      1    1.5K     790     47.7%
  3.  git status    1    138       86     37.7%

$ contextcrawler gain --weak-filters --all-time
Window: lifetime (all-time).
  1.  ls         1634   29.8M   18.0M    39.5%
  2.  proxy cargo 172    4.5M    4.5M     0.0%
  …

Same gaps surface immediately, without 17,000 historical rows muddying the picture.

Tests

3 new tests, 2671/2671 pass:

ensure_release_boundary_writes_first_row_on_fresh_db
ensure_release_boundary_is_idempotent_on_same_version
get_weak_filters_since_excludes_older_rows

Risk

Schema migration: CREATE TABLE IF NOT EXISTS — safe on existing DBs.
Boundary insert failure: ignored; tracker still functions.
Cross-version analysis use case: preserved via --all-time.
No change to: command recording, retention, or any other gain view.

🤖 Generated with Claude Code

`gain --weak-filters` reported lifetime stats by default — months of behaviour from pre-upgrade binaries blended with current binary's filter quality. Hard to read post-release whether a filter improved. Add a self-correcting boundary mechanism: 1. New table `release_boundaries(id, version, installed_at)` populated once per binary upgrade. On every `Tracker::new()`, compare `env!("CARGO_PKG_VERSION")` to the latest row; insert a new row on change. Cost: 1 SELECT per invocation, 1 INSERT once per upgrade. Best-effort — boundary write failures never poison tracker construction. 2. `get_weak_filters(project, since)` gains a `since` parameter that filters command rows lexicographically by ISO-8601 timestamp. `None` = no slice (lifetime). 3. `gain --weak-filters` defaults to slicing from `latest_boundary_timestamp()`. New `--all-time` flag bypasses the slice for cross-version analysis. Window banner makes the cutoff explicit so the reader knows what they're looking at. Live behaviour on v0.1.10 (post-install, 3 commands): Window: since latest release boundary (2026-05-23T06:52:43Z). 1. ls 1 run 2.6M in 1.6M leaked 39.0% 2. git show 1 run 1.5K in 790 leaked 47.7% 3. git status 1 run 138 in 86 leaked 37.7% vs --all-time (lifetime): 1. ls 1634 runs 29.8M 18.0M leaked 39.5% 2. proxy cargo 172 4.5M 4.5M leaked 0.0% … This makes the post-release audit honest: the user is identifying gaps in the CURRENT binary, not gaps that have been historically present and may already be fixed. Tests (+3, all green): - ensure_release_boundary_writes_first_row_on_fresh_db - ensure_release_boundary_is_idempotent_on_same_version - get_weak_filters_since_excludes_older_rows 2671/2671 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Both reviewers returned YELLOW with the same top finding (TOCTOU race on ensure_release_boundary). Closing all medium-priority items in one follow-up. 1. ATOMIC INSERT (Codex + agy, the blocker): Previous version did SELECT-latest then INSERT, with no transaction. Two concurrent contextcrawler processes hitting a fresh-upgrade DB could both observe the old latest-version and both write a duplicate boundary row. Fix: collapse to single atomic INSERT ... SELECT ... WHERE NOT EXISTS. The WHERE clause evaluates against the LATEST row only (id = MAX(id)), so a downgrade-then-upgrade scenario still appends a new boundary row correctly (rejected only when version equals the most-recently-written one). 2. TEST TIMESTAMP FORMAT (agy): Production uses `Utc::now().to_rfc3339()` which yields `+00:00` suffix. Existing tests used `Z` suffix — lexicographic comparison safe within each fixture but would diverge against real DB data. Aligned test timestamps to `+00:00`. 3. UX HINT FOR FIRST-RUN-ON-UPGRADE (agy): On a pre-existing DB with command history, the first invocation of a new binary writes a boundary row that slices out ALL history by default. User sees an empty leaderboard with no indication that --all-time would show data. Fix: when the sliced query returns 0 rows AND the lifetime query has rows, print a hint pointing at --all-time with the historical run count. 4. THREE NEW TESTS (agy): - latest_boundary_timestamp_is_none_on_empty_table - ensure_release_boundary_appends_row_on_version_change - ensure_release_boundary_handles_concurrent_safely (tight-loop same-connection approximation of the original race) Verification: 2674/2674 pass (+3 from this commit). The silent-failure design at the `let _ = ensure_release_boundary()` call site stays as-is per agy's reasoning: contextcrawler is a transparent proxy and stderr warnings would pollute build/pipe consumers. Codex's "log a warning" was a soft recommendation, not a blocker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Round-2 peer-review (Codex + agy) cleared the atomic INSERT race in production but both flagged that the regression test was a single- connection tight-loop — name suggested concurrency but the body did not exercise SQLite's write-locking behaviour across processes. Add a true multi-thread test: - Pin a private file-backed DB via the existing PinnedDb RAII helper. - Spawn 8 worker threads, each calling `Tracker::new()` independently. - Every Tracker::new() opens its own SQLite Connection AND runs `ensure_release_boundary()` — exactly the production race shape (two contextcrawler processes starting concurrently after a binary upgrade share the same DB path). - Assert exactly 1 row in release_boundaries after all workers join. This empirically pins the atomic INSERT...SELECT...WHERE NOT EXISTS guarantee against SQLite's actual lock arbitration, not just the Rust-side check that the previous test approximated. Also renamed the prior test from ensure_release_boundary_handles_concurrent_safely to ensure_release_boundary_is_idempotent_on_tight_loop because the old name was misleading — same-connection idempotency is a different invariant from cross-connection concurrency safety. Both are now pinned by distinct tests. Verification: 2675/2675 pass (+1 new test, total +5 in PR #150). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two pre-existing tracking-module issues surfaced by the new multi- thread boundary test in PR #150. Neither is a regression introduced by this PR, but both directly motivated by what the new test now exercises — so closing them in-PR rather than as a separate follow-up. 1. PRAGMA ordering (medium): Tracker::new() was running `PRAGMA journal_mode=WAL` BEFORE `PRAGMA busy_timeout=5000`. The WAL switch needs an exclusive lock and returns SQLITE_BUSY immediately if a peer holds the DB; without busy_timeout already armed, the connection then runs in non-WAL mode with no retry budget — write contention is non-deterministically fatal. Swapped to: PRAGMA busy_timeout=5000; PRAGMA journal_mode=WAL; Now any peer-lock encountered during the WAL upgrade is waited out for up to 5s. Eliminates a flakiness vector that the new multi-thread test could trigger under high CPU contention. 2. PinnedDb sidecar cleanup (low): PinnedDb::drop removed the .db file but not the WAL `-wal` and `-shm` sidecars. In WAL mode these are created on first write and persist if a worker thread panics or the connection closes abnormally — they pollute $HOME across runs. Added cleanup for both suffixes. Deferred to a follow-up issue (agy's third low): Worker threads inside `_no_duplicates_under_real_concurrency` read RTK_DB_PATH via std::env::var() concurrently. Other tests in the suite that don't hold ENV_LOCK can mutate RTK_DB_PATH at the same time — a process-wide concern that needs a broader audit of which tests touch the env, not a localised fix. Verification: 2675/2675 pass, full suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

thehoff and others added 4 commits May 23, 2026 16:53

thehoff merged commit 85d8a4c into develop May 23, 2026
4 checks passed

thehoff deleted the feat/release-boundary-tracking branch May 23, 2026 10:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(gain): release-boundary slicing for --weak-filters#150

feat(gain): release-boundary slicing for --weak-filters#150
thehoff merged 4 commits into
developfrom
feat/release-boundary-tracking

thehoff commented May 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

thehoff commented May 23, 2026

Problem

Fix

Verification

Tests

Risk

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant