Skip to content

feat(gain): release-boundary slicing for --weak-filters#150

Merged
thehoff merged 4 commits into
developfrom
feat/release-boundary-tracking
May 23, 2026
Merged

feat(gain): release-boundary slicing for --weak-filters#150
thehoff merged 4 commits into
developfrom
feat/release-boundary-tracking

Conversation

@thehoff
Copy link
Copy Markdown
Owner

@thehoff thehoff commented May 23, 2026

Problem

contextcrawler gain --weak-filters reported lifetime stats by default. Months of pre-upgrade binary behaviour got blended with current-binary filter quality. After a release it was unclear which gaps were still real and which had been closed.

Exactly hit during the v0.1.10 post-ship audit today.

Fix

Self-correcting boundary tracking:

  1. New table release_boundaries(id, version, installed_at). On every Tracker::new(), compare env!(\"CARGO_PKG_VERSION\") against the latest stored version. Insert a boundary row on mismatch.

    • Cost: 1 SELECT per invocation, 1 INSERT once per upgrade.
    • Best-effort — boundary write failure never poisons tracker construction.
  2. get_weak_filters(project, since) gains an optional ISO-8601 since parameter. None = no slice (lifetime).

  3. gain --weak-filters defaults to slicing from latest_boundary_timestamp(). --all-time bypasses for cross-version analysis. Window banner makes the cutoff explicit.

Verification

$ contextcrawler gain --weak-filters
Window: since latest release boundary (2026-05-23T06:52:43Z).
  #   Tool        Runs   Input    Leaked   Savings
  1.  ls            1    2.6M    1.6M     39.0%
  2.  git show      1    1.5K     790     47.7%
  3.  git status    1    138       86     37.7%

$ contextcrawler gain --weak-filters --all-time
Window: lifetime (all-time).
  1.  ls         1634   29.8M   18.0M    39.5%
  2.  proxy cargo 172    4.5M    4.5M     0.0%
  …

Same gaps surface immediately, without 17,000 historical rows muddying the picture.

Tests

3 new tests, 2671/2671 pass:

  • ensure_release_boundary_writes_first_row_on_fresh_db
  • ensure_release_boundary_is_idempotent_on_same_version
  • get_weak_filters_since_excludes_older_rows

Risk

  • Schema migration: CREATE TABLE IF NOT EXISTS — safe on existing DBs.
  • Boundary insert failure: ignored; tracker still functions.
  • Cross-version analysis use case: preserved via --all-time.
  • No change to: command recording, retention, or any other gain view.

🤖 Generated with Claude Code

thehoff and others added 4 commits May 23, 2026 16:53
`gain --weak-filters` reported lifetime stats by default — months of
behaviour from pre-upgrade binaries blended with current binary's
filter quality. Hard to read post-release whether a filter improved.

Add a self-correcting boundary mechanism:

1. New table `release_boundaries(id, version, installed_at)` populated
   once per binary upgrade. On every `Tracker::new()`, compare
   `env!("CARGO_PKG_VERSION")` to the latest row; insert a new row on
   change. Cost: 1 SELECT per invocation, 1 INSERT once per upgrade.
   Best-effort — boundary write failures never poison tracker
   construction.

2. `get_weak_filters(project, since)` gains a `since` parameter that
   filters command rows lexicographically by ISO-8601 timestamp.
   `None` = no slice (lifetime).

3. `gain --weak-filters` defaults to slicing from
   `latest_boundary_timestamp()`. New `--all-time` flag bypasses the
   slice for cross-version analysis. Window banner makes the cutoff
   explicit so the reader knows what they're looking at.

Live behaviour on v0.1.10 (post-install, 3 commands):
  Window: since latest release boundary (2026-05-23T06:52:43Z).
  1. ls          1 run   2.6M in   1.6M leaked   39.0%
  2. git show    1 run   1.5K in    790 leaked   47.7%
  3. git status  1 run   138  in     86 leaked   37.7%
vs --all-time (lifetime):
  1. ls       1634 runs  29.8M  18.0M leaked  39.5%
  2. proxy cargo  172   4.5M    4.5M leaked   0.0%
  …

This makes the post-release audit honest: the user is identifying
gaps in the CURRENT binary, not gaps that have been historically
present and may already be fixed.

Tests (+3, all green):
- ensure_release_boundary_writes_first_row_on_fresh_db
- ensure_release_boundary_is_idempotent_on_same_version
- get_weak_filters_since_excludes_older_rows
2671/2671 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both reviewers returned YELLOW with the same top finding (TOCTOU race
on ensure_release_boundary). Closing all medium-priority items in one
follow-up.

1. ATOMIC INSERT (Codex + agy, the blocker):
   Previous version did SELECT-latest then INSERT, with no transaction.
   Two concurrent contextcrawler processes hitting a fresh-upgrade DB
   could both observe the old latest-version and both write a duplicate
   boundary row.

   Fix: collapse to single atomic INSERT ... SELECT ... WHERE NOT
   EXISTS. The WHERE clause evaluates against the LATEST row only
   (id = MAX(id)), so a downgrade-then-upgrade scenario still appends
   a new boundary row correctly (rejected only when version equals the
   most-recently-written one).

2. TEST TIMESTAMP FORMAT (agy):
   Production uses `Utc::now().to_rfc3339()` which yields `+00:00`
   suffix. Existing tests used `Z` suffix — lexicographic comparison
   safe within each fixture but would diverge against real DB data.
   Aligned test timestamps to `+00:00`.

3. UX HINT FOR FIRST-RUN-ON-UPGRADE (agy):
   On a pre-existing DB with command history, the first invocation
   of a new binary writes a boundary row that slices out ALL history
   by default. User sees an empty leaderboard with no indication
   that --all-time would show data.

   Fix: when the sliced query returns 0 rows AND the lifetime query
   has rows, print a hint pointing at --all-time with the historical
   run count.

4. THREE NEW TESTS (agy):
   - latest_boundary_timestamp_is_none_on_empty_table
   - ensure_release_boundary_appends_row_on_version_change
   - ensure_release_boundary_handles_concurrent_safely (tight-loop
     same-connection approximation of the original race)

Verification: 2674/2674 pass (+3 from this commit).

The silent-failure design at the `let _ = ensure_release_boundary()`
call site stays as-is per agy's reasoning: contextcrawler is a
transparent proxy and stderr warnings would pollute build/pipe
consumers. Codex's "log a warning" was a soft recommendation, not a
blocker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-2 peer-review (Codex + agy) cleared the atomic INSERT race in
production but both flagged that the regression test was a single-
connection tight-loop — name suggested concurrency but the body did
not exercise SQLite's write-locking behaviour across processes.

Add a true multi-thread test:
- Pin a private file-backed DB via the existing PinnedDb RAII helper.
- Spawn 8 worker threads, each calling `Tracker::new()` independently.
- Every Tracker::new() opens its own SQLite Connection AND runs
  `ensure_release_boundary()` — exactly the production race shape
  (two contextcrawler processes starting concurrently after a binary
  upgrade share the same DB path).
- Assert exactly 1 row in release_boundaries after all workers join.

This empirically pins the atomic INSERT...SELECT...WHERE NOT EXISTS
guarantee against SQLite's actual lock arbitration, not just the
Rust-side check that the previous test approximated.

Also renamed the prior test from
  ensure_release_boundary_handles_concurrent_safely
to
  ensure_release_boundary_is_idempotent_on_tight_loop
because the old name was misleading — same-connection idempotency is
a different invariant from cross-connection concurrency safety. Both
are now pinned by distinct tests.

Verification: 2675/2675 pass (+1 new test, total +5 in PR #150).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two pre-existing tracking-module issues surfaced by the new multi-
thread boundary test in PR #150. Neither is a regression introduced
by this PR, but both directly motivated by what the new test now
exercises — so closing them in-PR rather than as a separate follow-up.

1. PRAGMA ordering (medium):
   Tracker::new() was running `PRAGMA journal_mode=WAL` BEFORE
   `PRAGMA busy_timeout=5000`. The WAL switch needs an exclusive
   lock and returns SQLITE_BUSY immediately if a peer holds the DB;
   without busy_timeout already armed, the connection then runs in
   non-WAL mode with no retry budget — write contention is
   non-deterministically fatal. Swapped to:
     PRAGMA busy_timeout=5000;
     PRAGMA journal_mode=WAL;
   Now any peer-lock encountered during the WAL upgrade is waited
   out for up to 5s. Eliminates a flakiness vector that the new
   multi-thread test could trigger under high CPU contention.

2. PinnedDb sidecar cleanup (low):
   PinnedDb::drop removed the .db file but not the WAL `-wal` and
   `-shm` sidecars. In WAL mode these are created on first write
   and persist if a worker thread panics or the connection closes
   abnormally — they pollute $HOME across runs. Added cleanup for
   both suffixes.

Deferred to a follow-up issue (agy's third low):
   Worker threads inside `_no_duplicates_under_real_concurrency`
   read RTK_DB_PATH via std::env::var() concurrently. Other tests
   in the suite that don't hold ENV_LOCK can mutate RTK_DB_PATH at
   the same time — a process-wide concern that needs a broader
   audit of which tests touch the env, not a localised fix.

Verification: 2675/2675 pass, full suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant