Entry-state matrix
Registry populated + tree clean + branch up-to-date + HEAD == last_verified_sha → Trust: skip entirely
Registry populated + any drift (dirty tree / behind upstream / HEAD moved) → Verification pass: md5 compare per row
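The matrix, together with the "no files" and "registry empty" branches from the workflow flowchart, can be read as one small decision function. The sketch below is only an illustration: `EntryState` and `entry_action` are hypothetical names, not part of the plugin.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntryState:
    """Hypothetical snapshot of what the prescan observes at session start."""
    has_tracked_files: bool            # `git ls-files` is non-empty
    registry_populated: bool           # file_registry has at least one row
    tree_clean: bool                   # no uncommitted changes
    branch_up_to_date: bool            # not behind upstream
    head_sha: Optional[str]            # current git HEAD
    last_verified_sha: Optional[str]   # plugin_config value, None if never set


def entry_action(state: EntryState) -> str:
    """Return one of 'new_project', 'bootstrap', 'trust', 'verify'."""
    if not state.has_tracked_files:
        return "new_project"
    if not state.registry_populated:
        return "bootstrap"
    no_drift = (
        state.tree_clean
        and state.branch_up_to_date
        and state.head_sha is not None
        and state.head_sha == state.last_verified_sha
    )
    # Any single failed condition counts as drift and forces verification.
    return "trust" if no_drift else "verify"
```

Because trust requires every condition to hold, an unset `last_verified_sha` falls through to the verification pass instead of being trusted by accident.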
Invariants
Inside a session, registry is trusted — only TMB's SWE modifies files, atomic-close updates registry.
Across sessions, registry is suspect until the cheap proof (last_verified_sha check) passes.
Verification is never full re-summarization. md5 compare only. Drift → mark stale. Summaries regenerate lazily when architect/SWE actually reads the file.
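A minimal sketch of the third invariant, assuming an in-memory stand-in for a registry row. The issue does not say how staleness is stored, so the boolean `stale` flag, the `Row` class, and both function names are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Row:
    """Hypothetical in-memory stand-in for one file_registry row."""
    content_md5: str
    summary: Optional[str] = None
    stale: bool = False  # assumption: staleness is tracked as a flag


def md5_of(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def verify_row(row: Row, current: bytes) -> None:
    """Cheap pass: compare hashes only; never regenerate a summary here."""
    if md5_of(current) != row.content_md5:
        row.stale = True


def read_summary(row: Row, current: bytes,
                 summarize: Callable[[bytes], str]) -> str:
    """Lazy path: regenerate only when a reader actually needs the summary."""
    if row.stale or row.summary is None:
        row.summary = summarize(current)   # the expensive step
        row.content_md5 = md5_of(current)
        row.stale = False
    return row.summary
```

The expensive `summarize` callback runs only inside `read_summary`, so a verification pass over many drifted files costs hashing time and nothing more.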
Key insight
The bootstrap scan is not an upfront cost. On a cold session in an existing repo, Claude has to read files for context regardless. That read is the scan — we just persist it into file_registry.
The cooperation insight (new session / git pull / dirty tree): across sessions, other developers on other machines may have changed files. Verification is mandatory on the first code-touching ask of each session, unless last_verified_sha proves nothing has changed.
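One way the cheap proof could be gathered from git is sketched below. The three git invocations are standard commands, but `has_drift`, `run_git`, and the injectable `Runner` are hypothetical, and the sketch assumes an upstream branch is configured. Without a prior `git fetch`, the behind-count only reflects the last fetched state of the remote.

```python
import subprocess
from typing import Callable, List, Optional

Runner = Callable[[List[str]], str]


def run_git(args: List[str]) -> str:
    """Default runner: execute git in the current directory, return stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def has_drift(last_verified_sha: Optional[str], git: Runner = run_git) -> bool:
    """True if the registry must be re-verified before it can be trusted."""
    if last_verified_sha is None:
        return True
    # Non-empty porcelain output means modified, staged, or untracked files.
    if git(["status", "--porcelain"]):
        return True
    # HEAD moved since the last successful verify/bootstrap/update.
    if git(["rev-parse", "HEAD"]) != last_verified_sha:
        return True
    # Commits on the upstream branch that HEAD lacks. Raises
    # CalledProcessError if no upstream is configured (assumption: one is).
    behind = git(["rev-list", "--count", "HEAD..@{upstream}"])
    return int(behind) > 0
```

Passing the runner as a parameter keeps the check testable without a real repository.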
Schema additions
ALTER TABLE file_registry ADD COLUMN content_md5 TEXT;
ALTER TABLE file_registry ADD COLUMN summary TEXT;
ALTER TABLE file_registry ADD COLUMN summary_updated_at TEXT;
-- plus a plugin_config key:
-- last_verified_sha: TEXT, the git HEAD at the last successful verify/bootstrap/update
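A minimal sketch of applying these additions idempotently, assuming the registry lives in SQLite and that `plugin_config` is a key/value table with `key` as its primary key. Neither assumption is confirmed by this issue, and `migrate` and `set_last_verified_sha` are hypothetical helper names.

```python
import sqlite3

# The three columns added by this issue.
NEW_COLUMNS = ("content_md5", "summary", "summary_updated_at")


def migrate(conn: sqlite3.Connection) -> None:
    """Add the new columns, skipping any that already exist."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info(file_registry)")}
    for column in NEW_COLUMNS:
        if column not in existing:
            conn.execute(f"ALTER TABLE file_registry ADD COLUMN {column} TEXT")
    conn.commit()


def set_last_verified_sha(conn: sqlite3.Connection, sha: str) -> None:
    """Upsert the config key (assumes plugin_config.key is a PRIMARY KEY)."""
    conn.execute(
        "INSERT OR REPLACE INTO plugin_config (key, value) VALUES (?, ?)",
        ("last_verified_sha", sha),
    )
    conn.commit()
```

Checking `PRAGMA table_info` first makes the migration safe to rerun, since SQLite rejects adding a column that already exists.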
MCP surface additions
file_registry_update_summaries(paths) — called by SWE atomic-close for each committed path. md5 + regenerate summary + touch timestamp + advance last_verified_sha.
file_registry_verify() — called by project-prescan when drift detected. Returns per-path verdict (match / mismatch / missing / new) so the caller decides whether to mark stale.
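The per-path verdicts returned by file_registry_verify could be computed roughly as follows. This is a sketch under stated assumptions, not the tool's actual implementation: the registry is passed in as a plain `path -> md5` mapping, the caller supplies the `git ls-files` output, and the names `verify` and `file_md5` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def file_md5(path: Path) -> str:
    """Hash a file in chunks so large files do not load fully into memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(root: Path, registry: Dict[str, str],
           tracked: Iterable[str]) -> Dict[str, str]:
    """Return {path: 'match' | 'mismatch' | 'missing' | 'new'}.

    registry: path -> stored content_md5
    tracked:  paths currently listed by `git ls-files`
    """
    tracked_set = set(tracked)
    verdicts: Dict[str, str] = {}
    for rel, stored in registry.items():
        target = root / rel
        if rel not in tracked_set or not target.is_file():
            verdicts[rel] = "missing"
        elif file_md5(target) == stored:
            verdicts[rel] = "match"
        else:
            verdicts[rel] = "mismatch"
    # Tracked files with no registry row yet.
    for rel in sorted(tracked_set.difference(registry)):
        verdicts[rel] = "new"
    return verdicts
```

The function only reports verdicts and never writes to the registry, which leaves the stale-marking decision with the caller as described above.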
Skills / agents affected
skills/project-prescan/ — branch on entry-state matrix; call file_registry_verify when drift detected.
agents/swe.md — atomic close calls file_registry_update_summaries with commit's touched paths.
agents/architect.md — reads from file_registry instead of scanning; may call update tool when editing non-source files.
skills/lazy-regen-check/ — unchanged (still governs the markdown view in docs/trustmybot/architecture/auto/).
skills/refresh-architecture/ - unchanged.
Acceptance criteria
last_verified_sha config key read/written correctly.
Integration test — clean tree + unchanged HEAD + prior verify → next session trusts registry, no md5 pass.
Integration test — simulated git pull (HEAD moved) → next session runs verification pass, stale rows marked.
Integration test — simulated dirty working tree → verification detects mismatched md5, marks stale.
Integration test — another developer deletes a file upstream → after pull, verification deletes the registry row.
Integration test — another developer adds a file upstream → after pull, verification inserts a new row with null summary.
Budget: md5 pass on 500-file repo completes in ≤100ms.
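The integration tests above imply a caller that turns verdicts into registry writes. A minimal sketch follows, assuming a SQLite-backed `file_registry` keyed by `path` and assuming that a NULL `summary_updated_at` is how a stale summary is marked. The issue specifies neither, and `apply_verdicts` is a hypothetical name.

```python
import sqlite3
from typing import Dict


def apply_verdicts(conn: sqlite3.Connection, verdicts: Dict[str, str],
                   md5s: Dict[str, str]) -> None:
    """Apply per-path verdicts to file_registry.

    verdicts: path -> 'match' | 'mismatch' | 'missing' | 'new'
    md5s:     path -> freshly computed md5 for files that still exist
    """
    for path, verdict in verdicts.items():
        if verdict == "missing":
            # File deleted upstream: drop the registry row.
            conn.execute("DELETE FROM file_registry WHERE path = ?", (path,))
        elif verdict == "new":
            # File added upstream: insert a row with a null summary.
            conn.execute(
                "INSERT INTO file_registry (path, content_md5, summary) "
                "VALUES (?, ?, NULL)",
                (path, md5s[path]),
            )
        elif verdict == "mismatch":
            # Assumption: NULL summary_updated_at marks the summary stale.
            # The old summary text is kept until it is lazily regenerated.
            conn.execute(
                "UPDATE file_registry SET content_md5 = ?, "
                "summary_updated_at = NULL WHERE path = ?",
                (md5s[path], path),
            )
        # 'match' rows are left untouched.
    conn.commit()
```

An in-memory database with one row per verdict type is enough to exercise the delete, insert, and stale-mark cases in a single test.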
Open questions
Who writes summary text — architect (fresh LLM reasoning) or a dedicated tool? Prefer architect — it's reasoning, not mechanical.
Skip binaries / lockfiles? Yes; filter by type column.
If last_verified_sha points to a commit that has been rebased away (GC'd), verification falls through to the full md5 pass. Safe but expensive — acceptable because rebase-away is rare.
Notes
This supersedes earlier drafts of #45. Cooperation case (new session / git pull / dirty tree) added per discussion — without it, the registry would drift when teammates push changes.
Workflow
flowchart TD
    Start([Session start]) --> FirstCodeAsk[First code-touching ask arrives]
    FirstCodeAsk --> HasFiles{git ls-files<br/>non-empty?}
    HasFiles -- No --> NewProj[New project<br/>no scan, no registry write]
    HasFiles -- Yes --> HasMem{file_registry<br/>populated?}
    HasMem -- No --> Bootstrap[Bootstrap scan:<br/>git ls-files + md5 each<br/>insert rows; summary = null]
    HasMem -- Yes --> CleanCheck{Working tree clean<br/>AND branch up-to-date<br/>AND HEAD == last_verified_sha?}
    CleanCheck -- Yes --> Trust[Trust registry<br/>no scan]
    CleanCheck -- No --> Verify[Verification pass per row:<br/>re-md5 file.<br/>Match: keep summary.<br/>Mismatch: mark summary stale.<br/>Missing file: delete row.<br/>New file: insert row with md5.]
    NewProj --> Route[Route to architect<br/>→ SWE → pr-reviewer]
    Bootstrap --> Route
    Trust --> Route
    Verify --> RecordHEAD[config_set<br/>last_verified_sha = HEAD]
    RecordHEAD --> Route
    Route --> Close[SWE atomic close<br/>commits changes]
    Close --> Update[file_registry_update_summaries<br/>paths touched by commit<br/>+ advance last_verified_sha]
    Update --> Next{Next ask<br/>in same session?}
    Next -- Code-touching --> Route
    Next -- Read-only --> Next
    Next -- None --> End([Session end])
Related
docs/architecture/FLOWS.md flow 7
docs/architecture/ERD.md: file_registry