salience already ranks entities that co-occur across the queries of a single session (compute_salience in src/vouch/salience.py, fed by the in-process registry in src/vouch/hot_memory.py), but that signal evaporates when the session ends — session_end calls salience.reset_session and the buffer is dropped. nothing looks across many finished sessions to notice that the same cluster of entities keeps showing up together in approved claims.
this asks for a durable, offline pass that mines recurring themes across sessions and approved claims: which entities co-occur on many approved claims, and which of those recur across distinct sessions over time. when a cluster clears a support threshold, propose a single "theme" synthesis page tying the entities and their strongest supporting claims together — routed through the gate, never written directly. it is the cross-session, durable counterpart to the per-session in-memory reflex in #223, and it produces a proposal rather than the ad-hoc, ungated prose that kb.synthesize (#222) returns at read time.
proposed surface
a new read-only detector plus a propose-only method:
kb.detect_themes — scan approved claims + their entity references and score candidate clusters. read-only; returns ranked cluster records {entity_ids, claim_ids, session_count, claim_count, score} and writes nothing. reuses the co-occurrence scoring shape already in salience.compute_salience but ranges over the durable store instead of a session ring buffer.
kb.propose_theme <cluster> (or kb.detect_themes --propose): for a cluster over threshold, build a synthesis body from the cited claims (deterministic, no llm; same posture as synthesize.synthesize, whose llm=True path raises) and emit a propose_page proposal with claim_ids pointing at the real supporting claims. the page's page_type is a config-declared "theme" kind (the extra-kind mechanism from the typed page kinds work in #234) or falls back to the built-in concept kind, not an invented enum value. the proposal enters list_pending and waits for a human kb.approve.
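a rough sketch of the propose step, to make the "deterministic body, real claim_ids only" idea concrete. everything here is an assumption for illustration: ThemeCluster, build_theme_proposal, the claim_text mapping and the payload keys are hypothetical, and the real propose_page arguments may look different.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ThemeCluster:
    """Hypothetical record mirroring the cluster fields listed above."""
    entity_ids: tuple[str, ...]
    claim_ids: tuple[str, ...]
    session_count: int
    claim_count: int
    score: float


def build_theme_proposal(cluster: ThemeCluster, claim_text: dict[str, str],
                         page_type: str = "concept") -> dict:
    """Build a propose_page-style payload (payload shape is assumed).

    No llm and no randomness: the same cluster and claims always
    produce the same body, because everything is sorted first.
    """
    entities = sorted(cluster.entity_ids)
    # Cite only claims that really exist in the supplied mapping.
    cited = sorted(cid for cid in cluster.claim_ids if cid in claim_text)
    lines = [f"theme: {', '.join(entities)}", ""]
    lines += [f"- {claim_text[cid]} [{cid}]" for cid in cited]
    return {
        "title": "theme: " + " + ".join(entities),
        "page_type": page_type,  # config-declared theme kind, else "concept"
        "body": "\n".join(lines),
        "claim_ids": cited,
    }
```

the payload would then be handed to propose_page and sit in list_pending; nothing in this sketch writes a page.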
config, read defensively like retrieval.reflex in salience.reflex_cfg (no new pydantic model — see #243 for that):
themes:
  min_sessions: 3   # a cluster must recur across ≥N distinct sessions
  min_claims: 5     # ≥N approved claims must support it
  top_k: 10         # cap proposals per run
  enabled: true
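one way the defensive read could look, as a sketch only. themes_cfg and DEFAULTS are hypothetical names, the defaults are assumed to match the example values above, and this does not claim to mirror how salience.reflex_cfg is actually written.

```python
from collections.abc import Mapping
from typing import Any

# Assumed defaults, copied from the example config above.
DEFAULTS: dict[str, Any] = {
    "min_sessions": 3,
    "min_claims": 5,
    "top_k": 10,
    "enabled": True,
}


def themes_cfg(raw: Any) -> dict[str, Any]:
    """Return themes.* settings, falling back per key on bad input."""
    cfg = dict(DEFAULTS)
    section = raw.get("themes") if isinstance(raw, Mapping) else None
    if not isinstance(section, Mapping):
        return cfg  # missing or malformed section: all defaults
    for key in ("min_sessions", "min_claims", "top_k"):
        value = section.get(key)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, int) and not isinstance(value, bool) and value >= 1:
            cfg[key] = value
    if isinstance(section.get("enabled"), bool):
        cfg["enabled"] = section["enabled"]
    return cfg
```

a malformed value such as min_sessions: "three" keeps the default for that key and leaves the other keys alone, so one typo does not disable the whole detector.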
scoring stays deterministic and zero-llm: entity co-occurrence counts over claim.entities, weighted by how many distinct sessions contributed the supporting claims (session attribution via the existing Session.proposal_ids backfill in session_end).
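a minimal sketch of that scoring, under stated assumptions: clusters are simplified to entity pairs, the score is claim_count * session_count (one possible weighting, not a decided formula), and the input is a plain list of (claim_id, entity_ids, session_id) tuples rather than the real claim and Session models. score_pairs is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def score_pairs(claims, min_sessions=3, min_claims=5, top_k=10):
    """Rank entity pairs by co-occurrence on claims, weighted by sessions.

    claims: iterable of (claim_id, entity_ids, session_id) tuples,
    assumed to be already filtered to eligible approved claims.
    """
    support = defaultdict(set)   # pair -> ids of claims mentioning both
    sessions = defaultdict(set)  # pair -> sessions that contributed them
    for claim_id, entity_ids, session_id in claims:
        for pair in combinations(sorted(set(entity_ids)), 2):
            support[pair].add(claim_id)
            sessions[pair].add(session_id)

    records = []
    for pair, claim_ids in support.items():
        n_claims, n_sessions = len(claim_ids), len(sessions[pair])
        if n_claims < min_claims or n_sessions < min_sessions:
            continue  # below the support threshold
        records.append({
            "entity_ids": list(pair),
            "claim_ids": sorted(claim_ids),
            "session_count": n_sessions,
            "claim_count": n_claims,
            "score": n_claims * n_sessions,
        })
    # Sort by score, then by entity ids, so ties break deterministically.
    records.sort(key=lambda r: (-r["score"], r["entity_ids"]))
    return records[:top_k]
```

growing pairs into larger clusters (for example by merging pairs that share supporting claims) is left out here; the point is only that the ranking needs nothing beyond counting and sorting.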
review gate & scope
the detector is a proposer, never a writer. every theme page it produces goes out as a propose_page proposal and materializes only when a human runs kb.approve — the write path stays proposals.approve() with no bypass. this is deliberately different from the session summary page that crystallize writes directly: that page is confined to server-generated fields the agent cannot influence (see _build_summary_body and #76), whereas a theme page carries synthesized prose and must go through the gate.
status/eligibility logic (what counts as a supporting claim, the support thresholds, dedup against already-approved theme pages) lives in a new module alongside proposals.py / lifecycle.py; storage.py stays pure i/o. the detector reads only approved, non-archived, non-superseded claims. everything runs locally against .vouch/ and state.db — no network, no hosted service, no change to the yaml storage format. a scheduled/background invocation is allowed only in propose-only mode: it files proposals and stops, never auto-approving.
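the eligibility and dedup rules could be as small as the sketch below. the dict fields (status, archived, superseded_by, entity_ids) are hypothetical stand-ins for whatever the real claim and page models expose, and matching theme pages by exact entity set is just one possible dedup key.

```python
def eligible_claims(claims):
    """Keep only approved, non-archived, non-superseded claims."""
    return [
        c for c in claims
        if c.get("status") == "approved"
        and not c.get("archived")
        and not c.get("superseded_by")
    ]


def is_duplicate(cluster_entity_ids, approved_theme_pages):
    """True if an approved theme page already covers this exact entity set."""
    key = frozenset(cluster_entity_ids)
    return any(
        frozenset(page.get("entity_ids", ())) == key
        for page in approved_theme_pages
    )
```

both functions are pure and take plain data, which fits the split described above: the policy lives in its own module and storage.py only hands over records.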
distinction from adjacent issues: #223 is the per-session, in-memory, read-time prefetch reflex — it never persists; this is the cross-session, durable, propose-time counterpart. #222 (kb.synthesize) answers a query in gated-read prose at request time and writes nothing; this mines standing themes offline and produces a reviewable page artifact.
acceptance criteria
kb.detect_themes returns ranked cluster records over approved claims and writes nothing (read-only)
cluster scoring is deterministic and zero-llm (entity co-occurrence + distinct-session weighting); no llm flag that silently degrades
--propose / kb.propose_theme emits theme pages via propose_page with a valid page_type (config-declared theme kind or built-in concept); they appear in kb.list_pending and require a human kb.approve to materialize
proposed theme pages carry only real supporting claim_ids; no approved page is ever written outside proposals.approve()
archived / superseded / pending claims are excluded from cluster support
themes.* config is read defensively with safe defaults; malformed values fall back rather than crash
a background/scheduled run proposes and never approves; re-running dedups against existing approved theme pages
new method registered at all four sites (server.py, jsonl_server.py, capabilities.METHODS, cli.py) with tests/test_themes.py covering the propose-not-write invariant
make check green (pytest, mypy src, ruff)
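for the propose-not-write invariant, a test in tests/test_themes.py might take roughly this shape. FakeKB and propose_themes are hypothetical stand-ins written for this sketch; the real test would run against the actual store and proposals.approve().

```python
class FakeKB:
    """In-memory stand-in; the real store and method names may differ."""

    def __init__(self):
        self.pending = []  # proposals waiting for review
        self.pages = []    # materialized (approved) pages

    def propose_page(self, payload):
        self.pending.append(payload)
        return len(self.pending) - 1

    def approve(self, index):
        # The only path that moves a proposal into pages.
        self.pages.append(self.pending.pop(index))


def propose_themes(kb, clusters):
    """Propose-only: file one proposal per cluster and never approve."""
    return [
        kb.propose_page({
            "page_type": "concept",
            "entity_ids": sorted(c["entity_ids"]),
            "claim_ids": sorted(c["claim_ids"]),
        })
        for c in clusters
    ]


def test_propose_does_not_write():
    kb = FakeKB()
    propose_themes(kb, [{"entity_ids": ["b", "a"], "claim_ids": ["c1"]}])
    assert len(kb.pending) == 1
    assert kb.pages == []  # nothing materialized by the detector
    kb.approve(0)          # the human review step
    assert kb.pending == []
    assert len(kb.pages) == 1
```

the key assertion is the empty pages list right after proposing: a scheduled run that only calls propose_themes can never produce an approved page.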