🐛 SWE atomic-close: file_registry_update_summaries + last_verified_sha unreliable; promote to Layer 2 hook #181

@ZaxShen

SWE doctrine (`agents/swe.md` line 17) says the atomic close MUST batch:
```
commit + task_update_status(status='completed', commit_sha)
       + file_registry_update_summaries(updates=[...], advance_verified_sha=)
```

The third call (`file_registry_update_summaries`) is prompt-only doctrine, not enforced. v0.5.0-rc.3 L5 dogfood (CI run 24982267041) showed SWE skipping it on multiple flows:

  • `02-simple-task` outcome 3/5 — `file_registry-has-md5-and-summary-after-swe-close (got 0)` + `last_verified_sha-was-set-after-close (got 0)`
  • `10-codebase-memory-cold-start` outcome 3/4 — `headless_fallback-event-recorded-for-cold-start (got 0)`
  • `11-codebase-memory-verify-on-drift` outcome 2/3 — `foo.py-md5-was-refreshed-after-verify (got NULL)`

Pattern: the same h3/h4 prompt-discipline ceiling. SWE sometimes complies and sometimes skips the call. This predates the v0.5.0 work: the failing assertions were added by #45 (codebase memory) and have been failing quietly in L5 ever since, because L5 only runs on RC tags and labeled PRs.

Proposed fix (Layer 2)

A PostToolUse hook on `task_update_status` (similar in shape to `cleanup-worktree-on-task-close.sh` from #172) that fires when SWE flips status to `completed` and:
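The gating condition could be sketched as follows. This is only an illustration: it assumes the hook receives a JSON payload with `tool_name` and `tool_input` keys, and the `should_fire` helper and the suffix match on the tool name are hypothetical, not taken from the existing hook.

```python
import json


def should_fire(payload: dict) -> bool:
    """Return True only for a task_update_status call that sets status='completed'."""
    tool_name = payload.get("tool_name", "")
    tool_input = payload.get("tool_input") or {}
    # Assumption: the tool name may carry a server prefix, so match on the suffix.
    is_status_call = tool_name.endswith("task_update_status")
    return is_status_call and tool_input.get("status") == "completed"


def parse_payload(raw: str) -> dict:
    """Decode the hook payload; a non-object payload is treated as empty."""
    data = json.loads(raw)
    return data if isinstance(data, dict) else {}


# Example payload shape (hypothetical field values).
sample = parse_payload(
    '{"tool_name": "task_update_status",'
    ' "tool_input": {"status": "completed", "commit_sha": "abc123"}}'
)
print(should_fire(sample))  # True
```

Any other tool call, or a status change to something other than `completed`, falls through and the hook exits without touching the registry.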

  1. Reads `tasks.commit_sha` for the just-closed task.
  2. Walks the worktree's commit diff against its parent (`git diff-tree --no-commit-id --name-only -r <commit_sha>`).
  3. For each touched file: computes md5 + writes/updates `file_registry` row with `content_md5` (deterministic).
  4. Advances `last_verified_sha` to `commit_sha`.
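The four steps above can be sketched in Python as below. This is a rough sketch under stated assumptions, not the project's implementation: the SQLite schema, the per-row `last_verified_sha` column, and the `refresh_registry` / `touched_files` names are all hypothetical. Only the `git diff-tree` invocation comes from the proposal. It also assumes the worktree is still checked out at `commit_sha` when the hook runs.

```python
import hashlib
import sqlite3
import subprocess
from pathlib import Path

# Hypothetical schema: the real tables may differ.
SCHEMA = """
CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, commit_sha TEXT);
CREATE TABLE file_registry (
    path TEXT PRIMARY KEY,
    content_md5 TEXT,
    summary TEXT,
    last_verified_sha TEXT
);
"""


def touched_files(repo: Path, commit_sha: str) -> list[str]:
    """Step 2: list the paths changed by the commit relative to its parent."""
    out = subprocess.run(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", commit_sha],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def refresh_registry(db, repo, task_id, list_files=touched_files):
    """Refresh content_md5 and last_verified_sha for files touched by a closed task."""
    # Step 1: read the commit recorded for the just-closed task.
    row = db.execute(
        "SELECT commit_sha FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    if row is None or row[0] is None:
        return []
    commit_sha = row[0]

    updated = []
    for rel in list_files(repo, commit_sha):
        path = repo / rel
        if not path.is_file():
            continue  # the commit deleted this file; nothing to hash
        # Step 3: deterministic md5 of the file contents.
        md5 = hashlib.md5(path.read_bytes()).hexdigest()
        # Steps 3 and 4: upsert the row; the summary column is never written here.
        db.execute(
            "INSERT INTO file_registry (path, content_md5, last_verified_sha) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT(path) DO UPDATE SET "
            "content_md5 = excluded.content_md5, "
            "last_verified_sha = excluded.last_verified_sha",
            (rel, md5, commit_sha),
        )
        updated.append(rel)
    db.commit()
    return updated
```

The `list_files` parameter exists only so the diff step can be swapped out in tests without a real repository.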

The `summary` field still requires LLM judgment (there is no good way to auto-generate it without a model call). The hook would leave `summary` empty, either for SWE to fill in at a later step or for bro to refresh on the next code-touching ask. The `content_md5` and `last_verified_sha` updates are the deterministic 80%: they close the doctrine-compliance gap on the parts that don't need an LLM.
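One way a later step could find the rows still waiting on a summary is a simple query over the registry. The sketch below assumes a hypothetical SQLite `file_registry` table with `path` and `summary` columns; the `files_needing_summary` helper is illustrative only.

```python
import sqlite3


def files_needing_summary(db: sqlite3.Connection) -> list[str]:
    """Return registry paths whose summary is missing or blank."""
    rows = db.execute(
        "SELECT path FROM file_registry "
        "WHERE summary IS NULL OR TRIM(summary) = '' "
        "ORDER BY path"
    ).fetchall()
    return [path for (path,) in rows]


# Demo against an in-memory database with a hypothetical schema.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE file_registry (path TEXT PRIMARY KEY, content_md5 TEXT, summary TEXT)"
)
demo.executemany(
    "INSERT INTO file_registry (path, content_md5, summary) VALUES (?, ?, ?)",
    [("a.py", "m1", None), ("b.py", "m2", "does x"), ("c.py", "m3", "  ")],
)
print(files_needing_summary(demo))  # ['a.py', 'c.py']
```

SWE or bro could then work through that list and fill in `summary` with a model call, leaving the hook itself fully deterministic.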

Why deferred for v0.5.0

This bug is pre-existing (it landed with #45), not a v0.5.0 regression. The affected outcome assertions in flows 02 / 10 / 11 are being temporarily relaxed for the v0.5.0 ship and should be re-tightened once this hook lands.

Related

Labels: Bug, Priority: High, Workflow
