
bug: kb.context returns archived/superseded claims; lifecycle mutations never re-index FTS5 status #78

Description

@galuis116

What happened

kb.context returns claims that have been archived, superseded,
contested, or redacted as if they were live, so agents receive
retracted knowledge in their context pack. Two bugs compound to
produce this:

Bug A — build_context_pack has no status filter.
src/vouch/context.py:75-101 walks the hits from _retrieve and
appends every match to the context pack. There is no check on
claim.status. Once a claim is indexed, it stays in retrieval
forever, regardless of subsequent lifecycle calls.

Bug B — store.update_claim never refreshes the FTS5 row.
src/vouch/storage.py:308-313:

def update_claim(self, claim: Claim) -> Claim:
    if not self._claim_path(claim.id).exists():
        raise ArtifactNotFoundError(f"claim {claim.id}")
    self._claim_path(claim.id).write_text(_yaml_dump(claim.model_dump(mode="json")))
    self._embed_and_store(kind="claim", id=claim.id, text=claim.text)
    return claim

The embedding cache is refreshed; the FTS5 row is not. So
lifecycle.archive, lifecycle.supersede, and
lifecycle.contradict — all of which finish with
store.update_claim(...) and never themselves call
index_db.index_claim — leave claims_fts.status stuck at whatever
value it had at first-index time.

index_db.index_claim is called by proposals.approve (at
src/vouch/proposals.py:258) on first approval and by
health.rebuild_index during vouch index / vouch doctor. No
update path keeps it in sync with lifecycle mutations.

The compound effect: even a future fix that adds a status filter
to context.py would still leak archived claims, because the FTS5
status column it would filter on is stale. Both bugs must be
fixed for retrieval to be correct.
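
The staleness can be reproduced outside vouch with nothing but sqlite3. This is a minimal sketch, not vouch's code: the three-column claims_fts schema, the live_hits helper, and the simplified index_claim signature are assumptions made for illustration, and it assumes the local SQLite build ships FTS5. It shows a status filter on the index column leaking an archived claim until the row is re-indexed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the real claims_fts table may have more columns.
conn.execute(
    "CREATE VIRTUAL TABLE claims_fts "
    "USING fts5(id UNINDEXED, text, status UNINDEXED)"
)


def index_claim(conn, *, id, text, status):
    """Replace the FTS5 row for one claim (delete, then insert)."""
    conn.execute("DELETE FROM claims_fts WHERE id = ?", (id,))
    conn.execute(
        "INSERT INTO claims_fts (id, text, status) VALUES (?, ?, ?)",
        (id, text, status),
    )


def live_hits(conn, query):
    """Full-text search that trusts the indexed status column."""
    rows = conn.execute(
        "SELECT id FROM claims_fts "
        "WHERE claims_fts MATCH ? "
        "AND status NOT IN ('archived', 'superseded', 'redacted')",
        (query,),
    )
    return [row[0] for row in rows]


# First approval indexes the claim with its initial status.
claim = {
    "id": "mongodb-is-faster-than-postgres",
    "text": "MongoDB is faster than Postgres",
    "status": "working",
}
index_claim(conn, **claim)

# Archiving changes only the source-of-truth record, as in Bug B.
claim["status"] = "archived"
stale = live_hits(conn, "mongodb")  # filter runs against the stale row

# Re-indexing after the mutation brings the status column back in sync.
index_claim(conn, **claim)
fresh = live_hits(conn, "mongodb")

print("before re-index:", stale)
print("after re-index: ", fresh)
```

Before the re-index the filter still returns the archived claim; after it, the claim drops out. A filter that reads the index column is only as correct as the last index write.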

This breaks the read-side promise of the KB. The whole point of
ClaimStatus.{ARCHIVED, SUPERSEDED, REDACTED} is to remove a
claim from active circulation while keeping its history. Today
those states are decorative.

What you expected

kb.context (and the underlying build_context_pack) should
exclude claims whose status is ARCHIVED, SUPERSEDED, or
REDACTED from the returned items. CONTESTED claims may be
surfaced (per kb.lint and vouch lint, "contested" is a
caution, not a removal), but the host should be able to see
their status — adding status to ContextItem lets agents and
hosts decide.

lifecycle.archive, lifecycle.supersede, and
lifecycle.contradict should keep claims_fts.status in sync
with the on-disk claim — either by calling index_claim directly
or by making store.update_claim re-index the FTS5 row in
addition to refreshing the embedding cache.

Reproduction

$ python archived_context_repro.py
work dir: /tmp/vouch-archived-XXXXXXXX

--- after approval (stable claim) ---
items: [('mongodb-is-faster-than-postgres', '«MongoDB» is faster than Postgres')]

--- claim mongodb-is-faster-than-postgres archived ---
on-disk claim.status: archived
FTS5 row after archive: id=mongodb-is-faster-than-postgres status='working'  (expected: 'archived')

--- after archive, kb.context for 'mongodb' ---
items returned: 1
  id='mongodb-is-faster-than-postgres'  summary='«MongoDB» is faster than Postgres'

BUG CONFIRMED: archived claim still surfaces in kb.context. Agents
will quote retracted knowledge as if it were live.

archived_context_repro.py does:

  1. KBStore.init(...); register a source.
  2. propose_claim(text="MongoDB is faster than Postgres", evidence=[src.id])
    then approve(...).
  3. lifecycle.archive(claim_id=...).
  4. Read the on-disk YAML → confirms status: archived.
  5. Direct SQL query against state.db shows
    claims_fts.status='working'. The FTS5 row was never
    refreshed; the column still reflects the status at first-index.
  6. context.build_context_pack(query='mongodb').
  7. Observe the archived claim in items — the agent would receive
    it as fresh context.

The same shape applies to supersede (status becomes
SUPERSEDED) and contradict (status becomes CONTESTED) —
both call update_claim and never re-index.

Environment

  • vouch version: vouch 0.0.1 at main
  • Python version: Python 3.12.13
  • OS: any (logic is platform-independent)
  • Host: any (CLI vouch context …, MCP kb.context, JSONL
    kb.context)

.vouch/ state

$ vouch doctor

Not informative — doctor rebuilds the index, which masks Bug B
after the rebuild completes. The bug reappears the moment any
new lifecycle op runs on a claim that survives the rebuild.

Anything else

Suggested fix

Two-file fix, both required:

  1. src/vouch/storage.py update_claim — after writing the
    YAML, also refresh the FTS5 row by calling
    index_db.index_claim(conn, id=claim.id, text=claim.text, type=claim.type, status=claim.status, tags=claim.tags)
    (mirroring what proposals.approve does on first-index at
    src/vouch/proposals.py:258). This keeps claims_fts.status
    in sync with every lifecycle mutation without touching
    lifecycle.py.

  2. src/vouch/context.py build_context_pack — after
    _retrieve and before assembling items, skip hits whose
    kind == "claim" and whose underlying claim.status is in
    {ARCHIVED, SUPERSEDED, REDACTED}. Resolve via
    store.get_claim(hid); if the claim is gone, drop the hit.
    Optionally surface claim.status on ContextItem so
    CONTESTED is visible without filtering.
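
A rough sketch of the filter in step 2, under stated assumptions: the ClaimStatus values, the Claim and ContextItem shapes, the (kind, id, summary) hit tuples, and the build_items name are all stand-ins invented here, not vouch's actual types. It also assumes the claim lookup returns None for a missing claim; if store.get_claim raises instead, the lookup would need a try/except.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional


class ClaimStatus(str, Enum):
    # Stand-in values; the real enum may define more members.
    WORKING = "working"
    CONTESTED = "contested"
    ARCHIVED = "archived"
    SUPERSEDED = "superseded"
    REDACTED = "redacted"


# Statuses that take a claim out of active circulation.
RETRACTED = frozenset(
    {ClaimStatus.ARCHIVED, ClaimStatus.SUPERSEDED, ClaimStatus.REDACTED}
)


@dataclass
class Claim:
    id: str
    text: str
    status: ClaimStatus


@dataclass
class ContextItem:
    id: str
    kind: str
    summary: str
    status: Optional[ClaimStatus] = None  # surfaced so hosts can see CONTESTED


def build_items(
    hits: Iterable[tuple[str, str, str]],
    get_claim: Callable[[str], Optional[Claim]],
) -> list[ContextItem]:
    """Turn retrieval hits into context items, dropping retracted claims."""
    items = []
    for kind, hid, summary in hits:
        status = None
        if kind == "claim":
            claim = get_claim(hid)  # read the source of truth, not the index
            if claim is None or claim.status in RETRACTED:
                continue  # claim is gone or retracted: drop the hit
            status = claim.status
        items.append(ContextItem(id=hid, kind=kind, summary=summary, status=status))
    return items


# Toy data: one archived claim, one contested claim, one dangling hit, one page.
claims = {
    "a": Claim("a", "MongoDB is faster than Postgres", ClaimStatus.ARCHIVED),
    "b": Claim("b", "Example contested claim", ClaimStatus.CONTESTED),
}
hits = [
    ("claim", "a", "MongoDB is faster than Postgres"),
    ("claim", "b", "Example contested claim"),
    ("claim", "gone", "dangling hit"),
    ("page", "p1", "Example page"),
]
items = build_items(hits, claims.get)
print([(item.id, item.status) for item in items])
```

Resolving status from the store rather than from the FTS5 column keeps this half correct even when the index is stale, which is why it pairs with step 1 instead of replacing it.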

Tests in tests/test_context.py and tests/test_lifecycle.py:

  • After archive, kb.context returns zero items matching
    the archived claim's text.
  • After supersede(old, new), kb.context returns new but
    not old.
  • After archive, the claims_fts row's status column is
    'archived' (direct SQL assertion).

Why it's worth fixing

  • The read-side promise of the KB is "what does vouch know
    now". The archive/supersede/redact statuses exist precisely
    to retract knowledge from that answer. Today retracted
    knowledge is still served, so the statuses are pure
    bookkeeping with no operational effect.
  • Affects every agent that uses kb.context (i.e., every
    intended consumer of vouch) on every platform and every
    transport (CLI, MCP, JSONL).
  • Compound — fixing only one half (filter at context, or
    re-index in update_claim) doesn't close the bug. The PR has
    to touch both files, with tests for both halves.
  • Adjacent to active team investment in retrieval quality:
    recently merged PR #44 (feat(embeddings): CLI sweep, MCP/JSONL
    parity, integration test, docs) reshaped the semantic-search
    code path, and open PR #61 (fix(sessions): index crystallize
    summary page into FTS5 (#60)) is fixing FTS5 indexing of the
    crystallize summary page reported in #60. Both signal that
    retrieval correctness is on the team's radar.
  • Sibling guarantee to the audit-truthfulness work the team has
    been investing in: "audit log truthfully says what happened"
    is meaningless if "kb.context truthfully says what's live" is
    broken.

Checked for duplicates
