Codebase memory: lazy bootstrap + per-session verify + task-closure updates

## Workflow

```mermaid
flowchart TD
 Start([Session start]) --> FirstCodeAsk[First code-touching ask arrives]
 FirstCodeAsk --> HasFiles{git ls-files non-empty?}

 HasFiles -- No --> NewProj[New project:<br/>no scan, no registry write]
 HasFiles -- Yes --> HasMem{file_registry populated?}

 HasMem -- No --> Bootstrap[Bootstrap scan:<br/>git ls-files + md5 each file,<br/>insert rows with summary = null]
 HasMem -- Yes --> CleanCheck{Working tree clean AND<br/>branch up-to-date AND<br/>HEAD == last_verified_sha?}

 CleanCheck -- Yes --> Trust[Trust registry:<br/>no scan]
 CleanCheck -- No --> Verify[Verification pass:<br/>re-md5 each registered file.<br/>Match: keep summary.<br/>Mismatch: mark summary stale.<br/>Missing file: delete row.<br/>New tracked file: insert row with md5.]

 NewProj --> Route[Route to architect → SWE → pr-reviewer]
 Bootstrap --> RecordHEAD[config_set last_verified_sha = HEAD]
 Trust --> Route
 Verify --> RecordHEAD
 RecordHEAD --> Route

 Route --> Close[SWE atomic close commits changes]
 Close --> Update["file_registry_update_summaries(paths touched by commit)<br/>+ advance last_verified_sha"]
 Update --> Next{Next ask in same session?}

 Next -- Code-touching --> Route
 Next -- Read-only --> Next
 Next -- None --> End([Session end])
```
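The Verify node's per-row rules can be sketched with only the Python standard library. This is a minimal sketch, not the plugin's implementation, and it assumes details this issue leaves open: the registry lives in SQLite, rows are keyed by a hypothetical `path` column, the caller passes in the output of `git ls-files`, and "mark stale" is represented by storing the new md5 and clearing `summary` and `summary_updated_at`.

```python
import hashlib
import sqlite3
from pathlib import Path


def md5_of(path: Path) -> str:
    """Hex md5 of a file's bytes; used for change detection, not security."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def verification_pass(
    conn: sqlite3.Connection, root: Path, tracked: list[str]
) -> dict[str, str]:
    """Reconcile file_registry with the tracked files; return path -> verdict."""
    stored = dict(conn.execute("SELECT path, content_md5 FROM file_registry"))
    # A tracked path deleted in a dirty working tree counts as missing.
    present = [rel for rel in tracked if (root / rel).is_file()]
    verdicts: dict[str, str] = {}
    for rel in present:
        digest = md5_of(root / rel)
        if rel not in stored:
            # New file: insert a row with its md5 and no summary yet.
            conn.execute(
                "INSERT INTO file_registry (path, content_md5, summary) "
                "VALUES (?, ?, NULL)",
                (rel, digest),
            )
            verdicts[rel] = "new"
        elif stored[rel] == digest:
            verdicts[rel] = "match"  # Match: keep the existing summary.
        else:
            # Mismatch: store the new md5 and clear the summary (assumed stale marker).
            conn.execute(
                "UPDATE file_registry SET content_md5 = ?, summary = NULL, "
                "summary_updated_at = NULL WHERE path = ?",
                (digest, rel),
            )
            verdicts[rel] = "mismatch"
    for rel in stored.keys() - set(present):
        # Missing file: delete its row.
        conn.execute("DELETE FROM file_registry WHERE path = ?", (rel,))
        verdicts[rel] = "missing"
    conn.commit()
    return verdicts
```

Because a mismatched row gets its new md5 immediately, a second verification pass reports it as a match while its summary stays NULL until something regenerates it.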

## Entry-state matrix

| State at first code-touching ask | Action |
|---|---|
| Empty repo (no tracked files) | No scan, no registry write |
| Files exist + registry empty | Bootstrap scan: `git ls-files` + md5 per file; summaries null |
| Registry populated + tree clean + branch up-to-date + HEAD == `last_verified_sha` | Trust the registry; skip the scan entirely |
| Registry populated + any drift (dirty tree / behind upstream / HEAD moved) | Verification pass: md5 compare per row |
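The matrix reduces to a small pure function. The sketch below is illustrative only: `RepoState`, `EntryAction`, and `classify` are hypothetical names, and gathering the inputs from git is left to the caller.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EntryAction(Enum):
    NO_SCAN = "no scan, no registry write"
    BOOTSTRAP = "bootstrap scan"
    TRUST = "trust registry"
    VERIFY = "verification pass"


@dataclass(frozen=True)
class RepoState:
    has_tracked_files: bool  # `git ls-files` printed at least one path
    registry_populated: bool  # file_registry has at least one row
    tree_clean: bool  # no staged or unstaged changes to tracked files
    up_to_date: bool  # branch is not behind its upstream
    head: str  # current `git rev-parse HEAD`
    last_verified_sha: Optional[str]  # plugin_config value; None if never set


def classify(state: RepoState) -> EntryAction:
    """Map the state at the first code-touching ask to one matrix row."""
    if not state.has_tracked_files:
        return EntryAction.NO_SCAN
    if not state.registry_populated:
        return EntryAction.BOOTSTRAP
    if state.tree_clean and state.up_to_date and state.head == state.last_verified_sha:
        return EntryAction.TRUST
    # Any single drift signal is enough to force the md5 pass.
    return EntryAction.VERIFY
```

The check order matters: a repo with no tracked files never bootstraps, even though its registry is also empty.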

## Invariants

1. **Inside a session**, the registry is trusted: only TMB's SWE modifies files, and its atomic close updates the registry.
2. **Across sessions**, the registry is suspect until the cheap proof (the `last_verified_sha` check) passes.
3. **Verification is never full re-summarization.** It is an md5 compare only; drift marks the summary stale. Summaries regenerate lazily when the architect or SWE actually reads the file.
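Invariant 3 means a stale summary costs nothing until someone needs it. A minimal sketch of the lazy read path, assuming stale is encoded as a NULL `summary` and rows are keyed by a hypothetical `path` column; `summarize` is a placeholder for whatever writes the summary text (the open question below leans toward the architect).

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def read_summary(
    conn: sqlite3.Connection,
    root: Path,
    path: str,
    summarize: Callable[[str], str],
) -> str:
    """Return the cached summary, regenerating it only when it is stale (NULL)."""
    row = conn.execute(
        "SELECT summary FROM file_registry WHERE path = ?", (path,)
    ).fetchone()
    if row is not None and row[0] is not None:
        return row[0]  # Fresh: no file read, no summarization call.
    # Stale or never summarized: the agent is reading the file anyway.
    summary = summarize((root / path).read_text(encoding="utf-8"))
    # Rows are created by bootstrap/verification; for an unregistered path
    # this UPDATE matches nothing and the summary is simply not cached.
    conn.execute(
        "UPDATE file_registry SET summary = ?, summary_updated_at = ? WHERE path = ?",
        (summary, datetime.now(timezone.utc).isoformat(), path),
    )
    conn.commit()
    return summary
```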

## Key insight

The bootstrap scan is not an extra upfront cost. On a cold session in an existing repo, Claude has to read files for context regardless. That read *is* the scan; we just persist its results into `file_registry`.
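A minimal sketch of that persistence step, assuming SQLite, a hypothetical `path` column on `file_registry`, and a hypothetical `plugin_config(key, value)` table. It also records `last_verified_sha`, since the schema notes define it as HEAD at the last bootstrap. The `git` and `tracked_files` helpers are illustrative wrappers around the real `git ls-files -z` and `git rev-parse HEAD` commands.

```python
import hashlib
import sqlite3
import subprocess
from pathlib import Path


def git(root: Path, *args: str) -> str:
    """Run a git subcommand in `root` and return stdout (raises on failure)."""
    return subprocess.run(
        ["git", *args], cwd=root, check=True, capture_output=True, text=True
    ).stdout


def tracked_files(root: Path) -> list[str]:
    # -z gives NUL-separated, unquoted paths, so unusual names survive intact.
    return [p for p in git(root, "ls-files", "-z").split("\0") if p]


def bootstrap(
    conn: sqlite3.Connection, root: Path, tracked: list[str], head: str
) -> int:
    """Insert one row per tracked file (md5, summary NULL); record HEAD."""
    rows = [
        (rel, hashlib.md5((root / rel).read_bytes()).hexdigest())
        for rel in tracked
        if (root / rel).is_file()  # skip deleted paths and submodule dirs
    ]
    conn.executemany(
        "INSERT INTO file_registry (path, content_md5, summary) VALUES (?, ?, NULL)",
        rows,
    )
    conn.execute(
        "INSERT OR REPLACE INTO plugin_config (key, value) "
        "VALUES ('last_verified_sha', ?)",
        (head,),
    )
    conn.commit()
    return len(rows)
```

In a real checkout the call would be along the lines of `bootstrap(conn, root, tracked_files(root), git(root, "rev-parse", "HEAD").strip())`.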

The cooperation insight (new session / git pull / dirty tree): across sessions, other developers on other machines may have changed files. Verification is therefore mandatory on the first code-touching ask of each session, unless the `last_verified_sha` check proves nothing has changed.
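The cheap proof needs only three git queries. This sketch assumes "up-to-date" means "not behind the configured upstream as of the last fetch" (it does not fetch), and it treats a branch with no upstream as up-to-date, which is a judgment call rather than anything this issue specifies. `nothing_changed` and its injectable `run` parameter are hypothetical; the parameter exists so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, Optional

Runner = Callable[[list[str]], str]


def run_git(args: list[str]) -> str:
    """Default runner: git in the current directory; raises on non-zero exit."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout


def nothing_changed(last_verified_sha: Optional[str], run: Runner = run_git) -> bool:
    """True only if the registry can be trusted without an md5 pass."""
    if last_verified_sha is None:
        return False  # never verified: no proof available
    if run(["rev-parse", "HEAD"]).strip() != last_verified_sha:
        return False  # commits landed (pull, checkout, rebase) since last verify
    if run(["status", "--porcelain", "--untracked-files=no"]).strip():
        return False  # tracked files have uncommitted edits
    try:
        behind = run(["rev-list", "--count", "HEAD..@{upstream}"]).strip()
    except subprocess.CalledProcessError:
        return True  # no upstream configured (assumption: nothing to be behind)
    return behind == "0"
```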

## Schema additions

```sql
ALTER TABLE file_registry ADD COLUMN content_md5 TEXT;
ALTER TABLE file_registry ADD COLUMN summary TEXT;
ALTER TABLE file_registry ADD COLUMN summary_updated_at TEXT;

-- plus a plugin_config key:
-- last_verified_sha: TEXT — git HEAD at last successful verify/bootstrap/update
```
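If the backing store is SQLite (an assumption suggested by the `TEXT` columns, not stated in this issue), re-running the `ALTER TABLE ... ADD COLUMN` statements fails with a duplicate-column error, because SQLite has no `IF NOT EXISTS` for columns. A sketch that checks `PRAGMA table_info` first, plus hypothetical `config_get`/`config_set` helpers for the `last_verified_sha` key, assuming a `plugin_config(key, value)` table.

```python
import sqlite3
from typing import Optional

# Fixed column definitions from the schema additions above.
NEW_COLUMNS = {"content_md5": "TEXT", "summary": "TEXT", "summary_updated_at": "TEXT"}


def migrate(conn: sqlite3.Connection) -> None:
    """Apply the schema additions idempotently."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(file_registry)")}
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            # Names come from the constant dict above, never from user input.
            conn.execute(f"ALTER TABLE file_registry ADD COLUMN {name} {sql_type}")
    conn.commit()


def config_get(conn: sqlite3.Connection, key: str) -> Optional[str]:
    row = conn.execute("SELECT value FROM plugin_config WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def config_set(conn: sqlite3.Connection, key: str, value: str) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO plugin_config (key, value) VALUES (?, ?)", (key, value)
    )
    conn.commit()
```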

## MCP surface additions

- `file_registry_update_summaries(paths)`: called by the SWE atomic close with each committed path. Recomputes md5, regenerates the summary, updates `summary_updated_at`, and advances `last_verified_sha`.
- `file_registry_verify()`: called by `project-prescan` when drift is detected. Returns a per-path verdict (match / mismatch / missing / new) so the caller decides whether to mark rows stale.
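A minimal sketch of what `file_registry_update_summaries` could do for one commit, assuming SQLite, a hypothetical `path` key column, and a hypothetical `plugin_config(key, value)` table; `summarize` stands in for whoever writes the summary text. Dropping rows for paths the commit deleted is an assumption that mirrors the verification rule for missing files.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Iterable


def update_summaries(
    conn: sqlite3.Connection,
    root: Path,
    paths: Iterable[str],
    head: str,
    summarize: Callable[[str], str],
) -> None:
    """Refresh md5 + summary for committed paths, then advance last_verified_sha."""
    now = datetime.now(timezone.utc).isoformat()
    for rel in paths:
        file = root / rel
        if not file.is_file():
            # Deleted by the commit: drop the row instead of summarizing it.
            conn.execute("DELETE FROM file_registry WHERE path = ?", (rel,))
            continue
        data = file.read_bytes()
        values = (
            hashlib.md5(data).hexdigest(),
            summarize(data.decode("utf-8", errors="replace")),
            now,
            rel,
        )
        # UPDATE first so other columns (e.g. `type`) survive; INSERT if absent.
        cur = conn.execute(
            "UPDATE file_registry SET content_md5 = ?, summary = ?, "
            "summary_updated_at = ? WHERE path = ?",
            values,
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO file_registry "
                "(content_md5, summary, summary_updated_at, path) VALUES (?, ?, ?, ?)",
                values,
            )
    conn.execute(
        "INSERT OR REPLACE INTO plugin_config (key, value) "
        "VALUES ('last_verified_sha', ?)",
        (head,),
    )
    conn.commit()
```

The touched paths could come from `git diff-tree --no-commit-id --name-only -r --root HEAD`, which lists the files changed by the commit just made (`--root` covers the very first commit).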

## Skills / agents affected

- `skills/project-prescan/`: branch on the entry-state matrix; call `file_registry_verify` when drift is detected.
- `agents/swe.md`: atomic close calls `file_registry_update_summaries` with the commit's touched paths.
- `agents/architect.md`: reads from `file_registry` instead of scanning; may call `file_registry_update_summaries` when editing non-source files.
- `skills/lazy-regen-check/`: unchanged (still governs the markdown view in `docs/trustmybot/architecture/auto/`).
- `skills/refresh-architecture/`: unchanged.

## Acceptance criteria

- [ ] Schema additions landed.
- [ ] `last_verified_sha` config key read/written correctly.
- [ ] Integration test — empty repo → first task closes → registry populated only for touched files.
- [ ] Integration test — existing repo / registry empty → bootstrap scan runs once, subsequent asks don't re-scan.
- [ ] Integration test — clean tree + unchanged HEAD + prior verify → next session trusts registry, no md5 pass.
- [ ] Integration test — simulated `git pull` (HEAD moved) → next session runs verification pass, stale rows marked.
- [ ] Integration test — simulated dirty working tree → verification detects mismatched md5, marks stale.
- [ ] Integration test — another developer deletes a file upstream → after pull, verification deletes the registry row.
- [ ] Integration test — another developer adds a file upstream → after pull, verification inserts a new row with null summary.
- [ ] Budget: the md5 pass on a 500-file repo completes in ≤100 ms.

## Open questions

- Who writes summary text — architect (fresh LLM reasoning) or a dedicated tool? Prefer architect — it's reasoning, not mechanical.
- Skip binaries / lockfiles? Yes; filter by `type` column.
- If `last_verified_sha` points to a commit that has been rebased away (GC'd), verification falls through to the full md5 pass. Safe but expensive — acceptable because rebase-away is rare.

## Notes

This supersedes earlier drafts of #45. Cooperation case (new session / git pull / dirty tree) added per discussion — without it, the registry would drift when teammates push changes.

## Related

- `docs/architecture/FLOWS.md` flow 7
- `docs/architecture/ERD.md` — file_registry
